All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Improve open zones accounting
@ 2020-08-13  4:57 Shin'ichiro Kawasaki
  2020-08-13  4:57 ` [PATCH 1/6] zbd: Decrement open zones count at write command completion Shin'ichiro Kawasaki
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Shin'ichiro Kawasaki @ 2020-08-13  4:57 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Damien Le Moal, Dmitry Fomichev, Shinichiro Kawasaki

The max_open_zones and job_max_open_zones option control number of write target
open zones within the specified limit. Currently, this write target zone
accounting is not accurate enough. Even with these options set, the limits can
exceed during write workloads.

This patch series improves accuracy of the open zones accounting. The first
patch moves the moment to decrement number of currently open zones from write
command submit to write command completion. This avoids zone writes beyond the
limit caused by concurrent write commands at queue depths approaching the max
limits. The second patch fixes the open zone status initialization issue. Other
following patches improve t/zbd scripts to take maximum write target zone limit
and allow to test devices with the limit.

Shin'ichiro Kawasaki (6):
  zbd: Decrement open zones count at write command completion
  zbd: Initialize open zones list referring zone status at fio start
  t/zbd: Improve usage message of test-zbd-support script
  t/zbd: Add -o option to t/zbd/test-zoned-support
  t/zbd: Reset all zones before test when max open zones is specified
  t/zbd: Remove unnecessary option for zbc_reset_zone

 io_u.c                 |   4 +-
 io_u.h                 |   5 +-
 ioengines.c            |   4 +-
 t/zbd/functions        |   2 +-
 t/zbd/test-zbd-support | 124 +++++++++++++++++++++++++++++++++++------
 zbd.c                  | 113 ++++++++++++++++++++++++++++---------
 zbd.h                  |   9 +--
 7 files changed, 208 insertions(+), 53 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/6] zbd: Decrement open zones count at write command completion
  2020-08-13  4:57 [PATCH 0/6] Improve open zones accounting Shin'ichiro Kawasaki
@ 2020-08-13  4:57 ` Shin'ichiro Kawasaki
  2020-08-15  0:05   ` Dmitry Fomichev
  2020-08-13  4:57 ` [PATCH 2/6] zbd: Initialize open zones list referring zone status at fio start Shin'ichiro Kawasaki
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Shin'ichiro Kawasaki @ 2020-08-13  4:57 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Damien Le Moal, Dmitry Fomichev, Shinichiro Kawasaki

When max_open_zones or job_max_open_zones option is provided, fio counts
number of open zones. This open zone accounting is done during submission
of write commands. When write command is submitted to a zone for the
first time, the zone is counted as open. When a write command which will
fill the zone gets submitted, the zone is counted as closed. However,
this count at write command submission may open zones more than the
limit. When writes to zones more than the limit are submitted with high
queue depth, those writes in-flight open zones more than the limit.

To avoid such writes beyond max open zones limit, decrement number of
open zones count not at write command submission but at write command
completion. By doing this, the number of zones with in-flight write
commands are kept within the limit with accuracy. Introduce the helper
function zbd_end_zone_io() for this decrement and zbd_close_zone() call.

The zbd_close_zone() function requires thread_data argument. To refer
thread_data at write command completion, add the argument to zbd_put_io()
and zbd_queue_io() callbacks. Also add a loop to the zbd_close_zone()
function which converts zone_info array index to open_zones array index
to avoid loop code duplication.

Even when io_u points to an open zone, the zone may not be valid for
write since in-flight write commands may make the zone full. Check this
in zbd_open_zone() to handle such zones as in full status.

Because of the zone close timing change, there might be no open zone when
zbd_convert_to_open_zone() is called. Do not handle such case as an
error and open other zones.

When zbd_convert_to_open_zone() finds all open zones can not be used for
next write, it opens other zones. This zone open fails when the number of
open zones hits one of the limits: 1) number of zones in the fio write
target region, 2) max_open_zones option or 3) job_max_open_zones option.
To avoid the zone open failure, wait for writes in-flight completes and
open zones get closed before opening other zones.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 io_u.c      |  4 +--
 io_u.h      |  5 +--
 ioengines.c |  4 +--
 zbd.c       | 92 +++++++++++++++++++++++++++++++++++++++++------------
 zbd.h       |  9 +++---
 5 files changed, 84 insertions(+), 30 deletions(-)

diff --git a/io_u.c b/io_u.c
index 2ef5acec..62bfec47 100644
--- a/io_u.c
+++ b/io_u.c
@@ -795,7 +795,7 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
 {
 	const bool needs_lock = td_async_processing(td);
 
-	zbd_put_io_u(io_u);
+	zbd_put_io_u(td, io_u);
 
 	if (td->parent)
 		td = td->parent;
@@ -1364,7 +1364,7 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
 		if (!fill_io_u(td, io_u))
 			break;
 
-		zbd_put_io_u(io_u);
+		zbd_put_io_u(td, io_u);
 
 		put_file_log(td, f);
 		td_io_close_file(td, f);
diff --git a/io_u.h b/io_u.h
index 31100928..5a28689c 100644
--- a/io_u.h
+++ b/io_u.h
@@ -101,13 +101,14 @@ struct io_u {
 	 * @success == true means that the I/O operation has been queued or
 	 * completed successfully.
 	 */
-	void (*zbd_queue_io)(struct io_u *, int q, bool success);
+	void (*zbd_queue_io)(struct thread_data *td, struct io_u *, int q,
+			     bool success);
 
 	/*
 	 * ZBD mode zbd_put_io callback: called in after completion of an I/O
 	 * or commit of an async I/O to unlock the I/O target zone.
 	 */
-	void (*zbd_put_io)(const struct io_u *);
+	void (*zbd_put_io)(struct thread_data *td, const struct io_u *);
 
 	/*
 	 * Callback for io completion
diff --git a/ioengines.c b/ioengines.c
index 1c5970a4..476df58d 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -352,7 +352,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	ret = td->io_ops->queue(td, io_u);
-	zbd_queue_io_u(io_u, ret);
+	zbd_queue_io_u(td, io_u, ret);
 
 	unlock_file(td, io_u->file);
 
@@ -394,7 +394,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	if (!td->io_ops->commit) {
 		io_u_mark_submit(td, 1);
 		io_u_mark_complete(td, 1);
-		zbd_put_io_u(io_u);
+		zbd_put_io_u(td, io_u);
 	}
 
 	if (ret == FIO_Q_COMPLETED) {
diff --git a/zbd.c b/zbd.c
index e4a480b7..20f64b58 100644
--- a/zbd.c
+++ b/zbd.c
@@ -721,12 +721,18 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 
 /* The caller must hold f->zbd_info->mutex */
 static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
-			   unsigned int open_zone_idx)
+			   unsigned int zone_idx)
 {
-	uint32_t zone_idx;
+	uint32_t open_zone_idx = 0;
+
+	for (; open_zone_idx < f->zbd_info->num_open_zones; open_zone_idx++) {
+		if (f->zbd_info->open_zones[open_zone_idx] == zone_idx)
+			break;
+	}
+	if (open_zone_idx == f->zbd_info->num_open_zones)
+		return;
 
-	assert(open_zone_idx < f->zbd_info->num_open_zones);
-	zone_idx = f->zbd_info->open_zones[open_zone_idx];
+	dprint(FD_ZBD, "%s: closing zone %d\n", f->file_name, zone_idx);
 	memmove(f->zbd_info->open_zones + open_zone_idx,
 		f->zbd_info->open_zones + open_zone_idx + 1,
 		(ZBD_MAX_OPEN_ZONES - (open_zone_idx + 1)) *
@@ -765,13 +771,8 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 			continue;
 		zone_lock(td, f, z);
 		if (all_zones) {
-			unsigned int i;
-
 			pthread_mutex_lock(&f->zbd_info->mutex);
-			for (i = 0; i < f->zbd_info->num_open_zones; i++) {
-				if (f->zbd_info->open_zones[i] == nz)
-					zbd_close_zone(td, f, i);
-			}
+			zbd_close_zone(td, f, nz);
 			pthread_mutex_unlock(&f->zbd_info->mutex);
 
 			reset_wp = z->wp != z->start;
@@ -952,8 +953,15 @@ static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
 		return false;
 
 	pthread_mutex_lock(&f->zbd_info->mutex);
-	if (is_zone_open(td, f, zone_idx))
+	if (is_zone_open(td, f, zone_idx)) {
+		/*
+		 * If the zone is already open and going to be full by writes
+		 * in-flight, handle it as a full zone instead of an open zone.
+		 */
+		if (z->wp >= zbd_zone_capacity_end(z))
+			res = false;
 		goto out;
+	}
 	res = false;
 	/* Zero means no limit */
 	if (td->o.job_max_open_zones > 0 &&
@@ -995,6 +1003,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	unsigned int open_zone_idx = -1;
 	uint32_t zone_idx, new_zone_idx;
 	int i;
+	bool wait_zone_close;
 
 	assert(is_valid_offset(f, io_u->offset));
 
@@ -1030,11 +1039,9 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		if (td->o.max_open_zones == 0 && td->o.job_max_open_zones == 0)
 			goto examine_zone;
 		if (f->zbd_info->num_open_zones == 0) {
-			pthread_mutex_unlock(&f->zbd_info->mutex);
-			pthread_mutex_unlock(&z->mutex);
 			dprint(FD_ZBD, "%s(%s): no zones are open\n",
 			       __func__, f->file_name);
-			return NULL;
+			goto open_other_zone;
 		}
 
 		/*
@@ -1081,14 +1088,30 @@ examine_zone:
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 		goto out;
 	}
-	dprint(FD_ZBD, "%s(%s): closing zone %d\n", __func__, f->file_name,
-	       zone_idx);
-	if (td->o.max_open_zones || td->o.job_max_open_zones)
-		zbd_close_zone(td, f, open_zone_idx);
+
+open_other_zone:
+	/* Check if number of open zones reaches one of limits. */
+	wait_zone_close =
+		f->zbd_info->num_open_zones == f->max_zone - f->min_zone ||
+		(td->o.max_open_zones &&
+		 f->zbd_info->num_open_zones == td->o.max_open_zones) ||
+		(td->o.job_max_open_zones &&
+		 td->num_open_zones == td->o.job_max_open_zones);
+
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 
 	/* Only z->mutex is held. */
 
+	/*
+	 * When number of open zones reaches to one of limits, wait for
+	 * zone close before opening a new zone.
+	 */
+	if (wait_zone_close) {
+		dprint(FD_ZBD, "%s(%s): quiesce to allow open zones to close\n",
+		       __func__, f->file_name);
+		io_u_quiesce(td);
+	}
+
 	/* Zone 'z' is full, so try to open a new zone. */
 	for (i = f->io_size / f->zbd_info->zone_size; i > 0; i--) {
 		zone_idx++;
@@ -1203,6 +1226,29 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	return NULL;
 }
 
+/**
+ * zbd_end_zone_io - update zone status at command completion
+ * @io_u: I/O unit
+ * @z: zone info pointer
+ * @success: Whether or not the I/O unit has been queued successfully
+ *
+ * If the write command made the zone full, close it.
+ *
+ * The caller must hold z->mutex.
+ */
+static void zbd_end_zone_io(struct thread_data *td, const struct io_u *io_u,
+			    struct fio_zone_info *z, bool success)
+{
+	const struct fio_file *f = io_u->file;
+
+	if (io_u->ddir == DDIR_WRITE && success &&
+	    io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
+		pthread_mutex_lock(&f->zbd_info->mutex);
+		zbd_close_zone(td, f, z - f->zbd_info->zone_info);
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+	}
+}
+
 /**
  * zbd_queue_io - update the write pointer of a sequential zone
  * @io_u: I/O unit
@@ -1212,7 +1258,8 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
  * For write and trim operations, update the write pointer of the I/O unit
  * target zone.
  */
-static void zbd_queue_io(struct io_u *io_u, int q, bool success)
+static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
+			 bool success)
 {
 	const struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbd_info = f->zbd_info;
@@ -1258,6 +1305,9 @@ static void zbd_queue_io(struct io_u *io_u, int q, bool success)
 		break;
 	}
 
+	if (q == FIO_Q_COMPLETED)
+		zbd_end_zone_io(td, io_u, z, !io_u->error);
+
 unlock:
 	if (!success || q != FIO_Q_QUEUED) {
 		/* BUSY or COMPLETED: unlock the zone */
@@ -1270,7 +1320,7 @@ unlock:
  * zbd_put_io - Unlock an I/O unit target zone lock
  * @io_u: I/O unit
  */
-static void zbd_put_io(const struct io_u *io_u)
+static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
 {
 	const struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbd_info = f->zbd_info;
@@ -1292,6 +1342,8 @@ static void zbd_put_io(const struct io_u *io_u)
 	       "%s: terminate I/O (%lld, %llu) for zone %u\n",
 	       f->file_name, io_u->offset, io_u->buflen, zone_idx);
 
+	zbd_end_zone_io(td, io_u, z, true);
+
 	ret = pthread_mutex_unlock(&z->mutex);
 	assert(ret == 0);
 	zbd_check_swd(f);
diff --git a/zbd.h b/zbd.h
index 021174c1..bff55f99 100644
--- a/zbd.h
+++ b/zbd.h
@@ -98,18 +98,19 @@ static inline void zbd_close_file(struct fio_file *f)
 		zbd_free_zone_info(f);
 }
 
-static inline void zbd_queue_io_u(struct io_u *io_u, enum fio_q_status status)
+static inline void zbd_queue_io_u(struct thread_data *td, struct io_u *io_u,
+				  enum fio_q_status status)
 {
 	if (io_u->zbd_queue_io) {
-		io_u->zbd_queue_io(io_u, status, io_u->error == 0);
+		io_u->zbd_queue_io(td, io_u, status, io_u->error == 0);
 		io_u->zbd_queue_io = NULL;
 	}
 }
 
-static inline void zbd_put_io_u(struct io_u *io_u)
+static inline void zbd_put_io_u(struct thread_data *td, struct io_u *io_u)
 {
 	if (io_u->zbd_put_io) {
-		io_u->zbd_put_io(io_u);
+		io_u->zbd_put_io(td, io_u);
 		io_u->zbd_queue_io = NULL;
 		io_u->zbd_put_io = NULL;
 	}
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/6] zbd: Initialize open zones list referring zone status at fio start
  2020-08-13  4:57 [PATCH 0/6] Improve open zones accounting Shin'ichiro Kawasaki
  2020-08-13  4:57 ` [PATCH 1/6] zbd: Decrement open zones count at write command completion Shin'ichiro Kawasaki
@ 2020-08-13  4:57 ` Shin'ichiro Kawasaki
  2020-08-15  0:06   ` Dmitry Fomichev
  2020-08-13  4:57 ` [PATCH 3/6] t/zbd: Improve usage message of test-zbd-support script Shin'ichiro Kawasaki
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Shin'ichiro Kawasaki @ 2020-08-13  4:57 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Damien Le Moal, Dmitry Fomichev, Shinichiro Kawasaki

When fio starts write workloads to zones with open status, fio does not
reflect the zone status to open zone list. This results in inconsistent
open zone accounting.

To avoid this inconsistency, initialize the open zone list referring the
zone status by calling zbd_open_zone() function before workload start.
Since io_u is not available at that timing, make the function
independent from io_u.

Of note is that fio counts open zones within the write target range.
Open zones out of the range are not counted or checked.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 zbd.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/zbd.c b/zbd.c
index 20f64b58..123d5530 100644
--- a/zbd.c
+++ b/zbd.c
@@ -627,6 +627,9 @@ static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
 	return ret;
 }
 
+static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
+			  uint32_t zone_idx);
+
 int zbd_setup_files(struct thread_data *td)
 {
 	struct fio_file *f;
@@ -650,6 +653,8 @@ int zbd_setup_files(struct thread_data *td)
 
 	for_each_file(td, f, i) {
 		struct zoned_block_device_info *zbd = f->zbd_info;
+		struct fio_zone_info *z;
+		int zi;
 
 		if (!zbd)
 			continue;
@@ -665,6 +670,13 @@ int zbd_setup_files(struct thread_data *td)
 			log_err("'max_open_zones' value is limited by %u\n", ZBD_MAX_OPEN_ZONES);
 			return 1;
 		}
+
+		for (zi = f->min_zone; zi < f->max_zone; zi++) {
+			z = &zbd->zone_info[zi];
+			if (z->cond == ZBD_ZONE_COND_IMP_OPEN ||
+			    z->cond == ZBD_ZONE_COND_EXP_OPEN)
+				zbd_open_zone(td, f, zi);
+		}
 	}
 
 	return 0;
@@ -934,11 +946,10 @@ static bool is_zone_open(const struct thread_data *td, const struct fio_file *f,
  * was not yet open and opening a new zone would cause the zone limit to be
  * exceeded.
  */
-static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
+static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 			  uint32_t zone_idx)
 {
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
-	const struct fio_file *f = io_u->file;
 	struct fio_zone_info *z = &f->zbd_info->zone_info[zone_idx];
 	bool res = true;
 
@@ -1126,7 +1137,7 @@ open_other_zone:
 		zone_lock(td, f, z);
 		if (z->open)
 			continue;
-		if (zbd_open_zone(td, io_u, zone_idx))
+		if (zbd_open_zone(td, f, zone_idx))
 			goto out;
 	}
 
@@ -1169,7 +1180,7 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 	const struct fio_file *f = io_u->file;
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
 
-	if (!zbd_open_zone(td, io_u, z - f->zbd_info->zone_info)) {
+	if (!zbd_open_zone(td, f, z - f->zbd_info->zone_info)) {
 		pthread_mutex_unlock(&z->mutex);
 		z = zbd_convert_to_open_zone(td, io_u);
 		assert(z);
@@ -1581,7 +1592,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	case DDIR_WRITE:
 		if (io_u->buflen > f->zbd_info->zone_size)
 			goto eof;
-		if (!zbd_open_zone(td, io_u, zone_idx_b)) {
+		if (!zbd_open_zone(td, f, zone_idx_b)) {
 			pthread_mutex_unlock(&zb->mutex);
 			zb = zbd_convert_to_open_zone(td, io_u);
 			if (!zb)
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/6] t/zbd: Improve usage message of test-zbd-support script
  2020-08-13  4:57 [PATCH 0/6] Improve open zones accounting Shin'ichiro Kawasaki
  2020-08-13  4:57 ` [PATCH 1/6] zbd: Decrement open zones count at write command completion Shin'ichiro Kawasaki
  2020-08-13  4:57 ` [PATCH 2/6] zbd: Initialize open zones list referring zone status at fio start Shin'ichiro Kawasaki
@ 2020-08-13  4:57 ` Shin'ichiro Kawasaki
  2020-08-15  0:06   ` Dmitry Fomichev
  2020-08-13  4:57 ` [PATCH 4/6] t/zbd: Add -o option to t/zbd/test-zoned-support Shin'ichiro Kawasaki
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Shin'ichiro Kawasaki @ 2020-08-13  4:57 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Damien Le Moal, Dmitry Fomichev, Shinichiro Kawasaki

The usage message of t/zbd/test-zbd-support does not explain meaning of
each option. Elaborate meaning of the options.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/test-zbd-support | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 139495d3..9e398cab 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -5,7 +5,15 @@
 # This file is released under the GPL.
 
 usage() {
-    echo "Usage: $(basename "$0") [-d] [-e] [-l] [-r] [-v] [-t <test>] [-z] <SMR drive device node>"
+	echo "Usage: $(basename "$0") [OPTIONS] <test target device file>"
+	echo "Options:"
+	echo -e "\t-d Run fio with valgrind using DRD tool"
+	echo -e "\t-e Run fio with valgrind using helgrind tool"
+	echo -e "\t-v Run fio with valgrind --read-var-info option"
+	echo -e "\t-l Test with libzbc ioengine"
+	echo -e "\t-r Reset all zones before test start"
+	echo -e "\t-t <test #> Run only a single test case with specified number"
+	echo -e "\t-z Run fio with debug=zbd option"
 }
 
 max() {
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/6] t/zbd: Add -o option to t/zbd/test-zoned-support
  2020-08-13  4:57 [PATCH 0/6] Improve open zones accounting Shin'ichiro Kawasaki
                   ` (2 preceding siblings ...)
  2020-08-13  4:57 ` [PATCH 3/6] t/zbd: Improve usage message of test-zbd-support script Shin'ichiro Kawasaki
@ 2020-08-13  4:57 ` Shin'ichiro Kawasaki
  2020-08-15  0:07   ` Dmitry Fomichev
  2020-08-13  4:57 ` [PATCH 5/6] t/zbd: Reset all zones before test when max open zones is specified Shin'ichiro Kawasaki
  2020-08-13  4:57 ` [PATCH 6/6] t/zbd: Remove unnecessary option for zbc_reset_zone Shin'ichiro Kawasaki
  5 siblings, 1 reply; 17+ messages in thread
From: Shin'ichiro Kawasaki @ 2020-08-13  4:57 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Damien Le Moal, Dmitry Fomichev, Shinichiro Kawasaki

To specify maximum open zones of the test target device, add -o option
to t/zbd/test-zoned-support. When this option is specified, add
max_open_zones option to all fio commands in the test script. The option
ensures the number of zones opened by fio is within the limit. This is
useful when the test target device has maximum open zones limit.

When the fio command does not have multiple jobs and the test case does
not specify max_open_zones, add single max_open_zones option to the fio
command line.

When the fio command takes multiple jobs, add max_open_zones option to
each job. Introduce job_var_opts variable to keep options to be added to
each job. To distinguish it from global options for all jobs, rename
var_opts variable to global_var_opts. When a test case with multiple jobs
always specifies max_open_zones option, exclude the max_open_zones option
from job_var_opts, using job_var_opts_exclude() helper function.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/test-zbd-support | 70 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 57 insertions(+), 13 deletions(-)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 9e398cab..c21d6aad 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -12,6 +12,7 @@ usage() {
 	echo -e "\t-v Run fio with valgrind --read-var-info option"
 	echo -e "\t-l Test with libzbc ioengine"
 	echo -e "\t-r Reset all zones before test start"
+	echo -e "\t-o <max_open_zones> Run fio with max_open_zones limit"
 	echo -e "\t-t <test #> Run only a single test case with specified number"
 	echo -e "\t-z Run fio with debug=zbd option"
 }
@@ -103,14 +104,41 @@ is_scsi_device() {
     return 1
 }
 
+job_var_opts_exclude() {
+	local o
+	local ex_key="${1}"
+
+	for o in "${job_var_opts[@]}"; do
+		if [[ ${o} =~ "${ex_key}" ]]; then
+			continue
+		fi
+		echo -n "${o}"
+	done
+}
+
+has_max_open_zones() {
+	while (($# > 1)); do
+		if [[ ${1} =~ "--max_open_zones" ]]; then
+			return 0
+		fi
+		shift
+	done
+	return 1
+}
+
 run_fio() {
     local fio opts
 
     fio=$(dirname "$0")/../../fio
 
-    opts=("--max-jobs=16" "--aux-path=/tmp" "--allow_file_create=0" \
-	  "--significant_figures=10" "$@")
-    opts+=(${var_opts[@]})
+    opts=(${global_var_opts[@]})
+    opts+=("--max-jobs=16" "--aux-path=/tmp" "--allow_file_create=0" \
+			   "--significant_figures=10" "$@")
+    # When max_open_zones option is specified to this test script, add
+    # max_open_zones option to fio command unless the test case already add it.
+    if [[ -n ${max_open_zones_opt} ]] && ! has_max_open_zones "${opts[@]}"; then
+	    opts+=("--max_open_zones=${max_open_zones_opt}")
+    fi
     { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
 
     "${dynamic_analyzer[@]}" "$fio" "${opts[@]}"
@@ -128,13 +156,16 @@ write_and_run_one_fio_job() {
     local r
     local write_offset="${1}"
     local write_size="${2}"
+    local -a write_opts
 
     shift 2
     r=$(((RANDOM << 16) | RANDOM))
-    run_fio --filename="$dev" --randseed="$r"  --name="write_job" --rw=write \
-	    "$(ioengine "psync")" --bs="${logical_block_size}" \
-	    --zonemode=zbd --zonesize="${zone_size}" --thread=1 --direct=1 \
-	    --offset="${write_offset}" --size="${write_size}" \
+    write_opts=(--name="write_job" --rw=write "$(ioengine "psync")" \
+		      --bs="${logical_block_size}" --zonemode=zbd \
+		      --zonesize="${zone_size}" --thread=1 --direct=1 \
+		      --offset="${write_offset}" --size="${write_size}")
+    write_opts+=("${job_var_opts[@]}")
+    run_fio --filename="$dev" --randseed="$r" "${write_opts[@]}" \
 	    --name="$dev" --wait_for="write_job" "$@" --thread=1 --direct=1
 }
 
@@ -512,7 +543,7 @@ test25() {
 	opts+=("--offset=$((first_sequential_zone_sector*512 + zone_size*i))")
 	opts+=("--size=$zone_size" "$(ioengine "psync")" "--rw=write" "--bs=16K")
 	opts+=("--zonemode=zbd" "--zonesize=${zone_size}" "--group_reporting=1")
-	opts+=(${var_opts[@]})
+	opts+=(${job_var_opts[@]})
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
 }
@@ -557,7 +588,7 @@ test28() {
 	opts+=("--size=$zone_size" "--io_size=$capacity" "$(ioengine "psync")" "--rw=randwrite")
 	opts+=("--thread=1" "--direct=1" "--zonemode=zbd")
 	opts+=("--zonesize=${zone_size}" "--group_reporting=1")
-	opts+=(${var_opts[@]})
+	opts+=(${job_var_opts[@]})
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
     check_written $((jobs * $capacity)) || return $?
@@ -581,7 +612,8 @@ test29() {
 	opts+=("$(ioengine "psync")" "--rw=randwrite" "--direct=1")
 	opts+=("--max_open_zones=4" "--group_reporting=1")
 	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
-	opts+=(${var_opts[@]})
+	# max_open_zones is already specified
+	opts+=($(job_var_opts_exclude "--max_open_zones"))
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
     check_written $((jobs * zone_size)) || return $?
@@ -617,7 +649,7 @@ test31() {
 	opts+=("--bs=$bs" "--size=$zone_size" "$(ioengine "libaio")")
 	opts+=("--rw=write" "--direct=1" "--thread=1" "--stats=0")
 	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
-	opts+=(${var_opts[@]})
+	opts+=(${job_var_opts[@]})
     done
     "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
     # Next, run the test.
@@ -627,6 +659,7 @@ test31() {
     opts+=("--bs=$bs" "$(ioengine "psync")" "--rw=randread" "--direct=1")
     opts+=("--thread=1" "--time_based" "--runtime=30" "--zonemode=zbd")
     opts+=("--zonesize=${zone_size}")
+    opts+=(${job_var_opts[@]})
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
 }
 
@@ -843,6 +876,8 @@ test48() {
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
 	opts+=("--io_size=$zone_size" "--iodepth=256" "--thread=1")
 	opts+=("--group_reporting=1")
+	# max_open_zones is already specified
+	opts+=($(job_var_opts_exclude "--max_open_zones"))
     done
 
     fio=$(dirname "$0")/../../fio
@@ -880,6 +915,7 @@ dynamic_analyzer=()
 reset_all_zones=
 use_libzbc=
 zbd_debug=
+max_open_zones_opt=
 
 while [ "${1#-}" != "$1" ]; do
   case "$1" in
@@ -891,6 +927,7 @@ while [ "${1#-}" != "$1" ]; do
     -l) use_libzbc=1; shift;;
     -r) reset_all_zones=1; shift;;
     -t) tests+=("$2"); shift; shift;;
+    -o) max_open_zones_opt="${2}"; shift; shift;;
     -v) dynamic_analyzer=(valgrind "--read-var-info=yes");
 	shift;;
     -z) zbd_debug=1; shift;;
@@ -906,9 +943,10 @@ fi
 # shellcheck source=functions
 source "$(dirname "$0")/functions" || exit $?
 
-var_opts=()
+global_var_opts=()
+job_var_opts=()
 if [ -n "$zbd_debug" ]; then
-    var_opts+=("--debug=zbd")
+    global_var_opts+=("--debug=zbd")
 fi
 dev=$1
 realdev=$(readlink -f "$dev")
@@ -994,6 +1032,12 @@ elif [[ -c "$realdev" ]]; then
 	fi
 fi
 
+if [[ -n ${max_open_zones_opt} ]]; then
+	# Override max_open_zones with the script option value
+	max_open_zones="${max_open_zones_opt}"
+	job_var_opts+=("--max_open_zones=${max_open_zones_opt}")
+fi
+
 echo -n "First sequential zone starts at sector $first_sequential_zone_sector;"
 echo " zone size: $((zone_size >> 20)) MB"
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/6] t/zbd: Reset all zones before test when max open zones is specified
  2020-08-13  4:57 [PATCH 0/6] Improve open zones accounting Shin'ichiro Kawasaki
                   ` (3 preceding siblings ...)
  2020-08-13  4:57 ` [PATCH 4/6] t/zbd: Add -o option to t/zbd/test-zoned-support Shin'ichiro Kawasaki
@ 2020-08-13  4:57 ` Shin'ichiro Kawasaki
  2020-08-15  0:07   ` Dmitry Fomichev
  2020-08-13  4:57 ` [PATCH 6/6] t/zbd: Remove unnecessary option for zbc_reset_zone Shin'ichiro Kawasaki
  5 siblings, 1 reply; 17+ messages in thread
From: Shin'ichiro Kawasaki @ 2020-08-13  4:57 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Damien Le Moal, Dmitry Fomichev, Shinichiro Kawasaki

When the test target device has maximum open zones limit, the zones in
test target region may not be opened up to the limit, because the zones
out of the test target region may have open zones. To ensure that the
test target zones can be opened up to the limit, reset all zones of the
test target device before the test cases with write work load starts.

Introduce the helper function prep_write() to check if all zone reset is
required and do the reset. For test cases which reset all zones
regardless of the maximum open zones option., the helper function takes
"reset_all" argument.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/test-zbd-support | 44 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index c21d6aad..9ec406c3 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -181,6 +181,16 @@ run_fio_on_seq() {
     run_one_fio_job "${opts[@]}" "$@"
 }
 
+# Prepare for write test by resetting zones. When "reset_all" is specified,
+# reset all sequential write required zones. When max_open_zones option is
+# specified, reset all zones of the test target to ensure that zones out of the
+# test target range do not have open zones. This allows the write test to the
+# target range to be able to open zones up to max_open_zones.
+prep_write() {
+	[[ ${1} == "reset_all" || -n "${max_open_zones_opt}" ]] &&
+		[[ -n "${is_zbd}" ]] && reset_zone "${dev}" -1
+}
+
 # Check whether buffered writes are refused.
 test1() {
     run_fio --name=job1 --filename="$dev" --rw=write --direct=0 --bs=4K	\
@@ -252,6 +262,7 @@ test4() {
 test5() {
     local size off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
@@ -267,6 +278,7 @@ test5() {
 test6() {
     local size off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
@@ -285,6 +297,7 @@ test7() {
     local size=$((zone_size))
     local off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 1 $off $dev)
     run_fio_on_seq "$(ioengine "libaio")" --iodepth=1 --rw=randwrite	\
@@ -299,6 +312,7 @@ test7() {
 test8() {
     local size off capacity
 
+    prep_write
     size=$((4 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
@@ -319,6 +333,7 @@ test9() {
 	return 0
     fi
 
+    prep_write
     size=$((4 * zone_size))
     run_fio_on_seq --ioengine=sg					\
 		   --iodepth=1 --rw=randwrite --bs=16K			\
@@ -337,6 +352,7 @@ test10() {
 	return 0
     fi
 
+    prep_write
     size=$((4 * zone_size))
     run_fio_on_seq --ioengine=sg 					\
 		   --iodepth=64 --rw=randwrite --bs=16K			\
@@ -350,6 +366,7 @@ test10() {
 test11() {
     local size off capacity
 
+    prep_write
     size=$((4 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
@@ -364,6 +381,7 @@ test11() {
 test12() {
     local size off capacity
 
+    prep_write
     size=$((8 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 8 $off $dev)
@@ -378,6 +396,7 @@ test12() {
 test13() {
     local size off capacity
 
+    prep_write
     size=$((8 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 8 $off $dev)
@@ -393,6 +412,7 @@ test13() {
 test14() {
     local size
 
+    prep_write
     size=$((16 * 2**20)) # 20 MB
     if [ $size -gt $((first_sequential_zone_sector * 512)) ]; then
 	echo "$dev does not have enough sequential zones" \
@@ -417,6 +437,7 @@ test15() {
 	    reset_zone "$dev" $((first_sequential_zone_sector +
 				 i*sectors_per_zone))
     done
+    prep_write
     w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
     w_size=$((2 * zone_size))
     w_capacity=$(total_zone_capacity 2 $w_off $dev)
@@ -441,6 +462,7 @@ test16() {
 	    reset_zone "$dev" $((first_sequential_zone_sector +
 				 i*sectors_per_zone))
     done
+    prep_write
     w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
     w_size=$((2 * zone_size))
     w_capacity=$(total_zone_capacity 2 $w_off $dev)
@@ -463,6 +485,7 @@ test17() {
     if [ -n "$is_zbd" ]; then
 	reset_zone "$dev" $((off / 512)) || return $?
     fi
+    prep_write
     run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw --bs=4K \
 		    --zonemode=zbd --zonesize="${zone_size}"		\
 		    --offset=$off --loops=2 --norandommap=1\
@@ -516,6 +539,7 @@ test24() {
     local bs loops=9 size=$((zone_size))
     local off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 1 $off $dev)
 
@@ -538,6 +562,7 @@ test25() {
         [ -n "$is_zbd" ] &&
 	    reset_zone "$dev" $((first_sequential_zone_sector + i*sectors_per_zone))
     done
+    prep_write
     for ((i=0;i<16;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--thread=1" "--direct=1")
 	opts+=("--offset=$((first_sequential_zone_sector*512 + zone_size*i))")
@@ -552,6 +577,7 @@ write_to_first_seq_zone() {
     local loops=4 r
     local off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 1 $off $dev)
 
@@ -580,7 +606,7 @@ test28() {
     local i jobs=16 off opts
 
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
-    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    prep_write reset_all
     opts=("--debug=zbd")
     capacity=$(total_zone_capacity 1 $off $dev)
     for ((i=0;i<jobs;i++)); do
@@ -604,7 +630,7 @@ test29() {
 
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
-    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    prep_write reset_all
     opts=("--debug=zbd")
     for ((i=0;i<jobs;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
@@ -623,6 +649,7 @@ test29() {
 test30() {
     local off
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw	\
 		    --bs="$(max $((zone_size / 128)) "$logical_block_size")"\
@@ -636,6 +663,7 @@ test30() {
 test31() {
     local bs inc nz off opts size
 
+    prep_write
     # Start with writing 128 KB to 128 sequential zones.
     bs=128K
     nz=128
@@ -668,6 +696,7 @@ test31() {
 test32() {
     local off opts=() size
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     size=$((disk_size - off))
     opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
@@ -684,6 +713,7 @@ test33() {
     local bs io_size size
     local off capacity=0;
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 1 $off $dev)
     size=$((2 * zone_size))
@@ -700,6 +730,7 @@ test33() {
 test34() {
     local size
 
+    prep_write
     size=$((2 * zone_size))
     run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write --size=$size \
 		   --do_verify=1 --verify=md5 --bs=$((3 * zone_size / 4)) \
@@ -711,6 +742,7 @@ test34() {
 test35() {
     local bs off io_size size
 
+    prep_write
     off=$(((first_sequential_zone_sector + 1) * 512))
     size=$((zone_size - 2 * 512))
     bs=$((zone_size / 4))
@@ -725,6 +757,7 @@ test35() {
 test36() {
     local bs off io_size size
 
+    prep_write
     off=$(((first_sequential_zone_sector) * 512))
     size=$((zone_size - 512))
     bs=$((zone_size / 4))
@@ -739,6 +772,7 @@ test36() {
 test37() {
     local bs off size capacity
 
+    prep_write
     capacity=$(total_zone_capacity 1 $first_sequential_zone_sector $dev)
     if [ "$first_sequential_zone_sector" = 0 ]; then
 	off=0
@@ -758,6 +792,7 @@ test37() {
 test38() {
     local bs off size
 
+    prep_write
     size=$((logical_block_size))
     off=$((disk_size - logical_block_size))
     bs=$((logical_block_size))
@@ -828,6 +863,7 @@ test45() {
     local bs i
 
     [ -z "$is_zbd" ] && return 0
+    prep_write
     bs=$((logical_block_size))
     run_one_fio_job "$(ioengine "psync")" --iodepth=1 --rw=randwrite --bs=$bs\
 		    --offset=$((first_sequential_zone_sector * 512)) \
@@ -840,6 +876,7 @@ test45() {
 test46() {
     local size
 
+    prep_write
     size=$((4 * zone_size))
     run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=4K \
 		   --group_reporting=1 --numjobs=8 \
@@ -851,6 +888,7 @@ test46() {
 test47() {
     local bs
 
+    prep_write
     bs=$((logical_block_size))
     run_fio_on_seq "$(ioengine "psync")" --rw=write --bs=$bs --zoneskip=1 \
 		    >> "${logfile}.${test_number}" 2>&1 && return 1
@@ -865,7 +903,7 @@ test48() {
 
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
-    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    prep_write reset_all
     opts=("--aux-path=/tmp" "--allow_file_create=0" "--significant_figures=10")
     opts+=("--debug=zbd")
     opts+=("$(ioengine "libaio")" "--rw=randwrite" "--direct=1")
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/6] t/zbd: Remove unnecessary option for zbc_reset_zone
  2020-08-13  4:57 [PATCH 0/6] Improve open zones accounting Shin'ichiro Kawasaki
                   ` (4 preceding siblings ...)
  2020-08-13  4:57 ` [PATCH 5/6] t/zbd: Reset all zones before test when max open zones is specified Shin'ichiro Kawasaki
@ 2020-08-13  4:57 ` Shin'ichiro Kawasaki
  2020-08-15  0:07   ` Dmitry Fomichev
  5 siblings, 1 reply; 17+ messages in thread
From: Shin'ichiro Kawasaki @ 2020-08-13  4:57 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Damien Le Moal, Dmitry Fomichev, Shinichiro Kawasaki

When -all option is specified, zbc_reset_zone command does not take
offset option. Remove it.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/functions | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/zbd/functions b/t/zbd/functions
index 81b6f3f7..1a64a215 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -185,7 +185,7 @@ reset_zone() {
 	fi
     else
 	if [ "$offset" -lt 0 ]; then
-	    ${zbc_reset_zone} -all "$dev" "${offset}" >/dev/null
+	    ${zbc_reset_zone} -all "$dev" >/dev/null
 	else
 	    ${zbc_reset_zone} -sector "$dev" "${offset}" >/dev/null
 	fi
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] zbd: Decrement open zones count at write command completion
  2020-08-13  4:57 ` [PATCH 1/6] zbd: Decrement open zones count at write command completion Shin'ichiro Kawasaki
@ 2020-08-15  0:05   ` Dmitry Fomichev
  2020-08-19 12:35     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Fomichev @ 2020-08-15  0:05 UTC (permalink / raw)
  To: fio, axboe, Shinichiro Kawasaki; +Cc: Damien Le Moal

On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> When max_open_zones or job_max_open_zones option is provided, fio counts
> number of open zones. This open zone accounting is done during submission
> of write commands. When write command is submitted to a zone for the
> first time, the zone is counted as open. When a write command which will
> fill the zone gets submitted, the zone is counted as closed. However,
> this count at write command submission may open zones more than the
> limit. When writes to zones more than the limit are submitted with high
> queue depth, those writes in-flight open zones more than the limit.
> 
> To avoid such writes beyond max open zones limit, decrement number of
> open zones count not at write command submission but at write command
> completion. By doing this, the number of zones with in-flight write
> commands are kept within the limit with accuracy. Introduce the helper
> function zbd_end_zone_io() for this decrement and zbd_close_zone() call.
> 
> The zbd_close_zone() function requires thread_data argument. To refer
> thread_data at write command completion, add the argument to zbd_put_io()
> and zbd_queue_io() callbacks. Also add a loop to the zbd_close_zone()
> function which converts zone_info array index to open_zones array index
> to avoid loop code duplication.
> 
> Even when io_u points to an open zone, the zone may not be valid for
> write since in-flight write commands may make the zone full. Check this
> in zbd_open_zone() to handle such zones as in full status.
> 
> Because of the zone close timing change, there might be no open zone when
> zbd_convert_to_open_zone() is called. Do not handle such case as an
> error and open other zones.
> 
> When zbd_convert_to_open_zone() finds all open zones can not be used for
> next write, it opens other zones. This zone open fails when the number of
> open zones hits one of the limits: 1) number of zones in the fio write
> target region, 2) max_open_zones option or 3) job_max_open_zones option.
> To avoid the zone open failure, wait for writes in-flight completes and
> open zones get closed before opening other zones.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  io_u.c      |  4 +--
>  io_u.h      |  5 +--
>  ioengines.c |  4 +--
>  zbd.c       | 92 +++++++++++++++++++++++++++++++++++++++++------------
>  zbd.h       |  9 +++---
>  5 files changed, 84 insertions(+), 30 deletions(-)
> 
> diff --git a/io_u.c b/io_u.c
> index 2ef5acec..62bfec47 100644
> --- a/io_u.c
> +++ b/io_u.c
> @@ -795,7 +795,7 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
>  {
>  	const bool needs_lock = td_async_processing(td);
>  
> -	zbd_put_io_u(io_u);
> +	zbd_put_io_u(td, io_u);
>  
>  	if (td->parent)
>  		td = td->parent;
> @@ -1364,7 +1364,7 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
>  		if (!fill_io_u(td, io_u))
>  			break;
>  
> -		zbd_put_io_u(io_u);
> +		zbd_put_io_u(td, io_u);
>  
>  		put_file_log(td, f);
>  		td_io_close_file(td, f);
> diff --git a/io_u.h b/io_u.h
> index 31100928..5a28689c 100644
> --- a/io_u.h
> +++ b/io_u.h
> @@ -101,13 +101,14 @@ struct io_u {
>  	 * @success == true means that the I/O operation has been queued or
>  	 * completed successfully.
>  	 */
> -	void (*zbd_queue_io)(struct io_u *, int q, bool success);
> +	void (*zbd_queue_io)(struct thread_data *td, struct io_u *, int q,
> +			     bool success);
>  
>  	/*
>  	 * ZBD mode zbd_put_io callback: called in after completion of an I/O
>  	 * or commit of an async I/O to unlock the I/O target zone.
>  	 */
> -	void (*zbd_put_io)(const struct io_u *);
> +	void (*zbd_put_io)(struct thread_data *td, const struct io_u *);
>  
>  	/*
>  	 * Callback for io completion
> diff --git a/ioengines.c b/ioengines.c
> index 1c5970a4..476df58d 100644
> --- a/ioengines.c
> +++ b/ioengines.c
> @@ -352,7 +352,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
>  	}
>  
>  	ret = td->io_ops->queue(td, io_u);
> -	zbd_queue_io_u(io_u, ret);
> +	zbd_queue_io_u(td, io_u, ret);
>  
>  	unlock_file(td, io_u->file);
>  
> @@ -394,7 +394,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
>  	if (!td->io_ops->commit) {
>  		io_u_mark_submit(td, 1);
>  		io_u_mark_complete(td, 1);
> -		zbd_put_io_u(io_u);
> +		zbd_put_io_u(td, io_u);
>  	}
>  
>  	if (ret == FIO_Q_COMPLETED) {
> diff --git a/zbd.c b/zbd.c
> index e4a480b7..20f64b58 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -721,12 +721,18 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
>  
>  /* The caller must hold f->zbd_info->mutex */
>  static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
> -			   unsigned int open_zone_idx)
> +			   unsigned int zone_idx)
>  {
> -	uint32_t zone_idx;
> +	uint32_t open_zone_idx = 0;
> +
> +	for (; open_zone_idx < f->zbd_info->num_open_zones; open_zone_idx++) {
> +		if (f->zbd_info->open_zones[open_zone_idx] == zone_idx)
> +			break;
> +	}
> +	if (open_zone_idx == f->zbd_info->num_open_zones)

I understand that the assert below needs to be removed because this function can be called
as a part of the reset of all zones. Still, maybe add a debug message saying "zone %d is
not open" here?

> +		return;
>  
> -	assert(open_zone_idx < f->zbd_info->num_open_zones);
> -	zone_idx = f->zbd_info->open_zones[open_zone_idx];
> +	dprint(FD_ZBD, "%s: closing zone %d\n", f->file_name, zone_idx);
>  	memmove(f->zbd_info->open_zones + open_zone_idx,
>  		f->zbd_info->open_zones + open_zone_idx + 1,
>  		(ZBD_MAX_OPEN_ZONES - (open_zone_idx + 1)) *
> @@ -765,13 +771,8 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
>  			continue;
>  		zone_lock(td, f, z);
>  		if (all_zones) {
> -			unsigned int i;
> -
>  			pthread_mutex_lock(&f->zbd_info->mutex);
> -			for (i = 0; i < f->zbd_info->num_open_zones; i++) {
> -				if (f->zbd_info->open_zones[i] == nz)
> -					zbd_close_zone(td, f, i);
> -			}
> +			zbd_close_zone(td, f, nz);
>  			pthread_mutex_unlock(&f->zbd_info->mutex);
>  
>  			reset_wp = z->wp != z->start;
> @@ -952,8 +953,15 @@ static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
>  		return false;
>  
>  	pthread_mutex_lock(&f->zbd_info->mutex);
> -	if (is_zone_open(td, f, zone_idx))
> +	if (is_zone_open(td, f, zone_idx)) {
> +		/*
> +		 * If the zone is already open and going to be full by writes
> +		 * in-flight, handle it as a full zone instead of an open zone.
> +		 */
> +		if (z->wp >= zbd_zone_capacity_end(z))
> +			res = false;
>  		goto out;
> +	}
>  	res = false;
>  	/* Zero means no limit */
>  	if (td->o.job_max_open_zones > 0 &&
> @@ -995,6 +1003,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
>  	unsigned int open_zone_idx = -1;
>  	uint32_t zone_idx, new_zone_idx;
>  	int i;
> +	bool wait_zone_close;
>  
>  	assert(is_valid_offset(f, io_u->offset));
>  
> @@ -1030,11 +1039,9 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
>  		if (td->o.max_open_zones == 0 && td->o.job_max_open_zones == 0)
>  			goto examine_zone;
>  		if (f->zbd_info->num_open_zones == 0) {
> -			pthread_mutex_unlock(&f->zbd_info->mutex);
> -			pthread_mutex_unlock(&z->mutex);
>  			dprint(FD_ZBD, "%s(%s): no zones are open\n",
>  			       __func__, f->file_name);

I was not able to trigger this message by running t/zbd/test-zbd-support script
with -z option. Do you know an easy way to trigger this situation?
I don't see anything wrong with the code here, my concern is mainly about
test coverage...

> -			return NULL;
> +			goto open_other_zone;
>  		}
>  
>  		/*
> @@ -1081,14 +1088,30 @@ examine_zone:
>  		pthread_mutex_unlock(&f->zbd_info->mutex);
>  		goto out;
>  	}
> -	dprint(FD_ZBD, "%s(%s): closing zone %d\n", __func__, f->file_name,
> -	       zone_idx);
> -	if (td->o.max_open_zones || td->o.job_max_open_zones)
> -		zbd_close_zone(td, f, open_zone_idx);
> +
> +open_other_zone:
> +	/* Check if number of open zones reaches one of limits. */
> +	wait_zone_close =
> +		f->zbd_info->num_open_zones == f->max_zone - f->min_zone ||
> +		(td->o.max_open_zones &&
> +		 f->zbd_info->num_open_zones == td->o.max_open_zones) ||
> +		(td->o.job_max_open_zones &&
> +		 td->num_open_zones == td->o.job_max_open_zones);
> +
>  	pthread_mutex_unlock(&f->zbd_info->mutex);
>  
>  	/* Only z->mutex is held. */
>  
> +	/*
> +	 * When number of open zones reaches to one of limits, wait for
> +	 * zone close before opening a new zone.
> +	 */
> +	if (wait_zone_close) {
> +		dprint(FD_ZBD, "%s(%s): quiesce to allow open zones to close\n",
> +		       __func__, f->file_name);
> +		io_u_quiesce(td);
> +	}
> +
>  	/* Zone 'z' is full, so try to open a new zone. */
>  	for (i = f->io_size / f->zbd_info->zone_size; i > 0; i--) {
>  		zone_idx++;
> @@ -1203,6 +1226,29 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
>  	return NULL;
>  }
>  
> +/**
> + * zbd_end_zone_io - update zone status at command completion
> + * @io_u: I/O unit
> + * @z: zone info pointer
> + * @success: Whether or not the I/O unit has been queued successfully
> + *
> + * If the write command made the zone full, close it.
> + *
> + * The caller must hold z->mutex.
> + */
> +static void zbd_end_zone_io(struct thread_data *td, const struct io_u *io_u,
> +			    struct fio_zone_info *z, bool success)
> +{
> +	const struct fio_file *f = io_u->file;
> +
> +	if (io_u->ddir == DDIR_WRITE && success &&
> +	    io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
> +		pthread_mutex_lock(&f->zbd_info->mutex);
> +		zbd_close_zone(td, f, z - f->zbd_info->zone_info);
> +		pthread_mutex_unlock(&f->zbd_info->mutex);
> +	}
> +}
> +
>  /**
>   * zbd_queue_io - update the write pointer of a sequential zone
>   * @io_u: I/O unit
> @@ -1212,7 +1258,8 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
>   * For write and trim operations, update the write pointer of the I/O unit
>   * target zone.
>   */
> -static void zbd_queue_io(struct io_u *io_u, int q, bool success)
> +static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
> +			 bool success)
>  {
>  	const struct fio_file *f = io_u->file;
>  	struct zoned_block_device_info *zbd_info = f->zbd_info;
> @@ -1258,6 +1305,9 @@ static void zbd_queue_io(struct io_u *io_u, int q, bool success)
>  		break;
>  	}
>  
> +	if (q == FIO_Q_COMPLETED)
> +		zbd_end_zone_io(td, io_u, z, !io_u->error);

This could be changed to

+	if (q == FIO_Q_COMPLETED && !io_u->error)
+		zbd_end_zone_io(td, io_u, z);

making the last parameter of zbd_end_zone_io(), "success", unnecessary.

> +
>  unlock:
>  	if (!success || q != FIO_Q_QUEUED) {
>  		/* BUSY or COMPLETED: unlock the zone */
> @@ -1270,7 +1320,7 @@ unlock:
>   * zbd_put_io - Unlock an I/O unit target zone lock
>   * @io_u: I/O unit
>   */
> -static void zbd_put_io(const struct io_u *io_u)
> +static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
>  {
>  	const struct fio_file *f = io_u->file;
>  	struct zoned_block_device_info *zbd_info = f->zbd_info;
> @@ -1292,6 +1342,8 @@ static void zbd_put_io(const struct io_u *io_u)
>  	       "%s: terminate I/O (%lld, %llu) for zone %u\n",
>  	       f->file_name, io_u->offset, io_u->buflen, zone_idx);
>  
> +	zbd_end_zone_io(td, io_u, z, true);

The last parameter in this call above can be removed with the change above.

> +
>  	ret = pthread_mutex_unlock(&z->mutex);
>  	assert(ret == 0);
>  	zbd_check_swd(f);
> diff --git a/zbd.h b/zbd.h
> index 021174c1..bff55f99 100644
> --- a/zbd.h
> +++ b/zbd.h
> @@ -98,18 +98,19 @@ static inline void zbd_close_file(struct fio_file *f)
>  		zbd_free_zone_info(f);
>  }
>  
> -static inline void zbd_queue_io_u(struct io_u *io_u, enum fio_q_status status)
> +static inline void zbd_queue_io_u(struct thread_data *td, struct io_u *io_u,
> +				  enum fio_q_status status)
>  {
>  	if (io_u->zbd_queue_io) {
> -		io_u->zbd_queue_io(io_u, status, io_u->error == 0);
> +		io_u->zbd_queue_io(td, io_u, status, io_u->error == 0);
>  		io_u->zbd_queue_io = NULL;
>  	}
>  }
>  
> -static inline void zbd_put_io_u(struct io_u *io_u)
> +static inline void zbd_put_io_u(struct thread_data *td, struct io_u *io_u)
>  {
>  	if (io_u->zbd_put_io) {
> -		io_u->zbd_put_io(io_u);
> +		io_u->zbd_put_io(td, io_u);
>  		io_u->zbd_queue_io = NULL;
>  		io_u->zbd_put_io = NULL;
>  	}

Overall approach looks fine to me.

Dmitry

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] zbd: Initialize open zones list referring zone status at fio start
  2020-08-13  4:57 ` [PATCH 2/6] zbd: Initialize open zones list referring zone status at fio start Shin'ichiro Kawasaki
@ 2020-08-15  0:06   ` Dmitry Fomichev
  2020-08-19 12:51     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Fomichev @ 2020-08-15  0:06 UTC (permalink / raw)
  To: fio, axboe, Shinichiro Kawasaki; +Cc: Damien Le Moal

On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> When fio starts write workloads to zones with open status, fio does not
> reflect the zone status to open zone list. This results in inconsistent
> open zone accounting.
> 
> To avoid this inconsistency, initialize the open zone list referring the
> zone status by calling zbd_open_zone() function before workload start.
> Since io_u is not available at that timing, make the function
> independent from io_u.
> 
> Of note is that fio counts open zones within the write target range.
> Open zones out of the range are not counted or checked.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  zbd.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/zbd.c b/zbd.c
> index 20f64b58..123d5530 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -627,6 +627,9 @@ static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
>  	return ret;
>  }
>  
> +static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
> +			  uint32_t zone_idx);
> +
>  int zbd_setup_files(struct thread_data *td)
>  {
>  	struct fio_file *f;
> @@ -650,6 +653,8 @@ int zbd_setup_files(struct thread_data *td)
>  
>  	for_each_file(td, f, i) {
>  		struct zoned_block_device_info *zbd = f->zbd_info;
> +		struct fio_zone_info *z;
> +		int zi;
>  
>  		if (!zbd)
>  			continue;
> @@ -665,6 +670,13 @@ int zbd_setup_files(struct thread_data *td)
>  			log_err("'max_open_zones' value is limited by %u\n", ZBD_MAX_OPEN_ZONES);
>  			return 1;
>  		}
> +
> +		for (zi = f->min_zone; zi < f->max_zone; zi++) {
> +			z = &zbd->zone_info[zi];
> +			if (z->cond == ZBD_ZONE_COND_IMP_OPEN ||
> +			    z->cond == ZBD_ZONE_COND_EXP_OPEN)
> +				zbd_open_zone(td, f, zi);
> +		}

What if zbd->max_open_zones is set less than the number of the currently
open zones in [max_zone:min_zone] range? In this case, the loop above will
open more zones than needed. Maybe this part should be changed to something
like the code below?

+		int zi, noz;
...
+		for (zi = f->min_zone, noz = 0; zi < f->max_zone; zi++) {
+			z = &zbd->zone_info[zi];
+			if (z->cond == ZBD_ZONE_COND_IMP_OPEN ||
+			    z->cond == ZBD_ZONE_COND_EXP_OPEN) {
+				if (noz < zbd->max_open_zones)
+					zbd_open_zone(td, f, zi);
+				else
+					zbd_reset_zone(td, f, zi);
+				noz++;
+			}
+		}

>  	}
>  
>  	return 0;
> @@ -934,11 +946,10 @@ static bool is_zone_open(const struct thread_data *td, const struct fio_file *f,
>   * was not yet open and opening a new zone would cause the zone limit to be
>   * exceeded.
>   */
> -static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
> +static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
>  			  uint32_t zone_idx)
>  {
>  	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
> -	const struct fio_file *f = io_u->file;
>  	struct fio_zone_info *z = &f->zbd_info->zone_info[zone_idx];
>  	bool res = true;
>  
> @@ -1126,7 +1137,7 @@ open_other_zone:
>  		zone_lock(td, f, z);
>  		if (z->open)
>  			continue;
> -		if (zbd_open_zone(td, io_u, zone_idx))
> +		if (zbd_open_zone(td, f, zone_idx))
>  			goto out;
>  	}
>  
> @@ -1169,7 +1180,7 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
>  	const struct fio_file *f = io_u->file;
>  	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
>  
> -	if (!zbd_open_zone(td, io_u, z - f->zbd_info->zone_info)) {
> +	if (!zbd_open_zone(td, f, z - f->zbd_info->zone_info)) {
>  		pthread_mutex_unlock(&z->mutex);
>  		z = zbd_convert_to_open_zone(td, io_u);
>  		assert(z);
> @@ -1581,7 +1592,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
>  	case DDIR_WRITE:
>  		if (io_u->buflen > f->zbd_info->zone_size)
>  			goto eof;
> -		if (!zbd_open_zone(td, io_u, zone_idx_b)) {
> +		if (!zbd_open_zone(td, f, zone_idx_b)) {
>  			pthread_mutex_unlock(&zb->mutex);
>  			zb = zbd_convert_to_open_zone(td, io_u);
>  			if (!zb)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/6] t/zbd: Improve usage message of test-zbd-support script
  2020-08-13  4:57 ` [PATCH 3/6] t/zbd: Improve usage message of test-zbd-support script Shin'ichiro Kawasaki
@ 2020-08-15  0:06   ` Dmitry Fomichev
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Fomichev @ 2020-08-15  0:06 UTC (permalink / raw)
  To: fio, axboe, Shinichiro Kawasaki; +Cc: Damien Le Moal

On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> The usage message of t/zbd/test-zbd-support does not explain meaning of
> each option. Elaborate meaning of the options.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>


Looks good.
Reviewed-by: Dmitry Fomichev: <dmitry.fomichev@wdc.com>

> ---
>  t/zbd/test-zbd-support | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
> index 139495d3..9e398cab 100755
> --- a/t/zbd/test-zbd-support
> +++ b/t/zbd/test-zbd-support
> @@ -5,7 +5,15 @@
>  # This file is released under the GPL.
>  
>  usage() {
> -    echo "Usage: $(basename "$0") [-d] [-e] [-l] [-r] [-v] [-t <test>] [-z] <SMR drive device node>"
> +	echo "Usage: $(basename "$0") [OPTIONS] <test target device file>"
> +	echo "Options:"
> +	echo -e "\t-d Run fio with valgrind using DRD tool"
> +	echo -e "\t-e Run fio with valgrind using helgrind tool"
> +	echo -e "\t-v Run fio with valgrind --read-var-info option"
> +	echo -e "\t-l Test with libzbc ioengine"
> +	echo -e "\t-r Reset all zones before test start"
> +	echo -e "\t-t <test #> Run only a single test case with specified number"
> +	echo -e "\t-z Run fio with debug=zbd option"
>  }
>  
>  max() {

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] t/zbd: Add -o option to t/zbd/test-zoned-support
  2020-08-13  4:57 ` [PATCH 4/6] t/zbd: Add -o option to t/zbd/test-zoned-support Shin'ichiro Kawasaki
@ 2020-08-15  0:07   ` Dmitry Fomichev
  2020-08-19 13:13     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Fomichev @ 2020-08-15  0:07 UTC (permalink / raw)
  To: fio, axboe, Shinichiro Kawasaki; +Cc: Damien Le Moal

On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> To specify maximum open zones of the test target device, add -o option
> to t/zbd/test-zoned-support. When this option is specified, add
> max_open_zones option to all fio commands in the test script. The option
> ensures the number of zones opened by fio is within the limit. This is
> useful when the test target device has maximum open zones limit.
> 
> When the fio command does not have multiple jobs and the test case does
> not specify max_open_zones, add single max_open_zones option to the fio
> command line.
> 
> When the fio command takes multiple jobs, add max_open_zones option to
> each job. Introduce job_var_opts variable to keep options to be added to
> each job. To distinguish it from global options for all jobs, rename
> var_opts variable to global_var_opts. When a test case with multiple jobs
> always specifies max_open_zones option, exclude the max_open_zones option
> from job_var_opts, using job_var_opts_exclude() helper function.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
 t/zbd/test-zbd-support | 70 ++++++++++++++++++++++++++++++++++--------
>  1 file changed, 57 insertions(+), 13 deletions(-)
> 
> diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
> index 9e398cab..c21d6aad 100755
> --- a/t/zbd/test-zbd-support
> +++ b/t/zbd/test-zbd-support
> @@ -12,6 +12,7 @@ usage() {
>  	echo -e "\t-v Run fio with valgrind --read-var-info option"
>  	echo -e "\t-l Test with libzbc ioengine"
>  	echo -e "\t-r Reset all zones before test start"
> +	echo -e "\t-o <max_open_zones> Run fio with max_open_zones limit"
>  	echo -e "\t-t <test #> Run only a single test case with specified number"
>  	echo -e "\t-z Run fio with debug=zbd option"
>  }
> @@ -103,14 +104,41 @@ is_scsi_device() {
>      return 1
>  }
>  
> +job_var_opts_exclude() {
> +	local o
> +	local ex_key="${1}"
> +
> +	for o in "${job_var_opts[@]}"; do
> +		if [[ ${o} =~ "${ex_key}" ]]; then
> +			continue
> +		fi
> +		echo -n "${o}"
> +	done
> +}
> +
> +has_max_open_zones() {
> +	while (($# > 1)); do
> +		if [[ ${1} =~ "--max_open_zones" ]]; then
> +			return 0
> +		fi
> +		shift
> +	done
> +	return 1
> +}
> +
>  run_fio() {
>      local fio opts
>  
>      fio=$(dirname "$0")/../../fio
>  
> -    opts=("--max-jobs=16" "--aux-path=/tmp" "--allow_file_create=0" \
> -	  "--significant_figures=10" "$@")
> -    opts+=(${var_opts[@]})
> +    opts=(${global_var_opts[@]})
> +    opts+=("--max-jobs=16" "--aux-path=/tmp" "--allow_file_create=0" \
> +			   "--significant_figures=10" "$@")
> +    # When max_open_zones option is specified to this test script, add
> +    # max_open_zones option to fio command unless the test case already add it.
> +    if [[ -n ${max_open_zones_opt} ]] && ! has_max_open_zones "${opts[@]}"; then
> +	    opts+=("--max_open_zones=${max_open_zones_opt}")
> +    fi
>      { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
>  
>      "${dynamic_analyzer[@]}" "$fio" "${opts[@]}"
> @@ -128,13 +156,16 @@ write_and_run_one_fio_job() {
>      local r
>      local write_offset="${1}"
>      local write_size="${2}"
> +    local -a write_opts
>  
>      shift 2
>      r=$(((RANDOM << 16) | RANDOM))
> -    run_fio --filename="$dev" --randseed="$r"  --name="write_job" --rw=write \
> -	    "$(ioengine "psync")" --bs="${logical_block_size}" \
> -	    --zonemode=zbd --zonesize="${zone_size}" --thread=1 --direct=1 \
> -	    --offset="${write_offset}" --size="${write_size}" \
> +    write_opts=(--name="write_job" --rw=write "$(ioengine "psync")" \
> +		      --bs="${logical_block_size}" --zonemode=zbd \
> +		      --zonesize="${zone_size}" --thread=1 --direct=1 \
> +		      --offset="${write_offset}" --size="${write_size}")
> +    write_opts+=("${job_var_opts[@]}")
> +    run_fio --filename="$dev" --randseed="$r" "${write_opts[@]}" \
>  	    --name="$dev" --wait_for="write_job" "$@" --thread=1 --direct=1
>  }
>  
> @@ -512,7 +543,7 @@ test25() {
>  	opts+=("--offset=$((first_sequential_zone_sector*512 + zone_size*i))")
>  	opts+=("--size=$zone_size" "$(ioengine "psync")" "--rw=write" "--bs=16K")
>  	opts+=("--zonemode=zbd" "--zonesize=${zone_size}" "--group_reporting=1")
> -	opts+=(${var_opts[@]})
> +	opts+=(${job_var_opts[@]})
>      done
>      run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
>  }
> @@ -557,7 +588,7 @@ test28() {
>  	opts+=("--size=$zone_size" "--io_size=$capacity" "$(ioengine "psync")" "--rw=randwrite")
>  	opts+=("--thread=1" "--direct=1" "--zonemode=zbd")
>  	opts+=("--zonesize=${zone_size}" "--group_reporting=1")
> -	opts+=(${var_opts[@]})
> +	opts+=(${job_var_opts[@]})
>      done
>      run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
>      check_written $((jobs * $capacity)) || return $?
> @@ -581,7 +612,8 @@ test29() {
>  	opts+=("$(ioengine "psync")" "--rw=randwrite" "--direct=1")
>  	opts+=("--max_open_zones=4" "--group_reporting=1")
>  	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
> -	opts+=(${var_opts[@]})
> +	# max_open_zones is already specified
> +	opts+=($(job_var_opts_exclude "--max_open_zones"))
>      done
>      run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
>      check_written $((jobs * zone_size)) || return $?
> @@ -617,7 +649,7 @@ test31() {
>  	opts+=("--bs=$bs" "--size=$zone_size" "$(ioengine "libaio")")
>  	opts+=("--rw=write" "--direct=1" "--thread=1" "--stats=0")
>  	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
> -	opts+=(${var_opts[@]})
> +	opts+=(${job_var_opts[@]})
>      done
>      "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
>      # Next, run the test.
> @@ -627,6 +659,7 @@ test31() {
>      opts+=("--bs=$bs" "$(ioengine "psync")" "--rw=randread" "--direct=1")
>      opts+=("--thread=1" "--time_based" "--runtime=30" "--zonemode=zbd")
>      opts+=("--zonesize=${zone_size}")
> +    opts+=(${job_var_opts[@]})
>      run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
>  }
>  
> @@ -843,6 +876,8 @@ test48() {
>  	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
>  	opts+=("--io_size=$zone_size" "--iodepth=256" "--thread=1")
>  	opts+=("--group_reporting=1")
> +	# max_open_zones is already specified
> +	opts+=($(job_var_opts_exclude "--max_open_zones"))
>      done
>  
>      fio=$(dirname "$0")/../../fio
> @@ -880,6 +915,7 @@ dynamic_analyzer=()
>  reset_all_zones=
>  use_libzbc=
>  zbd_debug=
> +max_open_zones_opt=
>  
>  while [ "${1#-}" != "$1" ]; do
>    case "$1" in
> @@ -891,6 +927,7 @@ while [ "${1#-}" != "$1" ]; do
>      -l) use_libzbc=1; shift;;
>      -r) reset_all_zones=1; shift;;
>      -t) tests+=("$2"); shift; shift;;
> +    -o) max_open_zones_opt="${2}"; shift; shift;;
>      -v) dynamic_analyzer=(valgrind "--read-var-info=yes");
>  	shift;;
>      -z) zbd_debug=1; shift;;
> @@ -906,9 +943,10 @@ fi
>  # shellcheck source=functions
>  source "$(dirname "$0")/functions" || exit $?
>  
> -var_opts=()
> +global_var_opts=()
> +job_var_opts=()
>  if [ -n "$zbd_debug" ]; then
> -    var_opts+=("--debug=zbd")
> +    global_var_opts+=("--debug=zbd")
>  fi
>  dev=$1
>  realdev=$(readlink -f "$dev")
> @@ -994,6 +1032,12 @@ elif [[ -c "$realdev" ]]; then
>  	fi
>  fi
>  
> +if [[ -n ${max_open_zones_opt} ]]; then
> +	# Override max_open_zones with the script option value
> +	max_open_zones="${max_open_zones_opt}"
> +	job_var_opts+=("--max_open_zones=${max_open_zones_opt}")
> +fi
> +
>  echo -n "First sequential zone starts at sector $first_sequential_zone_sector;"
>  echo " zone size: $((zone_size >> 20)) MB"
>  

The script changes look good. The only nit is... I tried to run

t/zbd/test-zbd-support with -o 512 <device>

which is larger than 128 max open zones supported by the drive expecting
the test to fail, but it didn't. It seems, the script defaults to the maximum
number of open zones in this case, but it is not very obvious. Maybe log
a message saying that the script has capped the value specified by the user
because it was too large?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] t/zbd: Remove unnecessary option for zbc_reset_zone
  2020-08-13  4:57 ` [PATCH 6/6] t/zbd: Remove unnecessary option for zbc_reset_zone Shin'ichiro Kawasaki
@ 2020-08-15  0:07   ` Dmitry Fomichev
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Fomichev @ 2020-08-15  0:07 UTC (permalink / raw)
  To: fio, axboe, Shinichiro Kawasaki; +Cc: Damien Le Moal

On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> When -all option is specified, zbc_reset_zone command does not take
> offset option. Remove it.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  t/zbd/functions | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/zbd/functions b/t/zbd/functions
> index 81b6f3f7..1a64a215 100644
> --- a/t/zbd/functions
> +++ b/t/zbd/functions
> @@ -185,7 +185,7 @@ reset_zone() {
>  	fi
>      else
>  	if [ "$offset" -lt 0 ]; then
> -	    ${zbc_reset_zone} -all "$dev" "${offset}" >/dev/null
> +	    ${zbc_reset_zone} -all "$dev" >/dev/null
>  	else
>  	    ${zbc_reset_zone} -sector "$dev" "${offset}" >/dev/null
>  	fi

Nice find. Is seems to fix -r option in test-zbd-support script too :)
Reviewed-by: Dmitry Fomichev: <dmitry.fomichev@wdc.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] t/zbd: Reset all zones before test when max open zones is specified
  2020-08-13  4:57 ` [PATCH 5/6] t/zbd: Reset all zones before test when max open zones is specified Shin'ichiro Kawasaki
@ 2020-08-15  0:07   ` Dmitry Fomichev
  2020-08-21  6:06     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Fomichev @ 2020-08-15  0:07 UTC (permalink / raw)
  To: fio, axboe, Shinichiro Kawasaki; +Cc: Damien Le Moal

On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> When the test target device has maximum open zones limit, the zones in
> test target region may not be opened up to the limit, because the zones
> out of the test target region may have open zones. To ensure that the
> test target zones can be opened up to the limit, reset all zones of the
> test target device before the test cases with write work load starts.
> 
> Introduce the helper function prep_write() to check if all zone reset is
> required and do the reset. For test cases which reset all zones
> regardless of the maximum open zones option., the helper function takes
> "reset_all" argument.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

The patch looks ok, but I wonder if some of the reset-all cases can be
unnecessary if you reset excessive open zones in zbd_setup_files() as
I suggested in the comment to the previous patch. Anyway,

Reviewed-by: Dmitry Fomichev: <dmitry.fomichev@wdc.com>

> ---
>  t/zbd/test-zbd-support | 44 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
> index c21d6aad..9ec406c3 100755
> --- a/t/zbd/test-zbd-support
> +++ b/t/zbd/test-zbd-support
> @@ -181,6 +181,16 @@ run_fio_on_seq() {
>      run_one_fio_job "${opts[@]}" "$@"
>  }
>  
> +# Prepare for write test by resetting zones. When "reset_all" is specified,
> +# reset all sequential write required zones. When max_open_zones option is
> +# specified, reset all zones of the test target to ensure that zones out of the
> +# test target range do not have open zones. This allows the write test to the
> +# target range to be able to open zones up to max_open_zones.
> +prep_write() {
> +	[[ ${1} == "reset_all" || -n "${max_open_zones_opt}" ]] &&
> +		[[ -n "${is_zbd}" ]] && reset_zone "${dev}" -1
> +}
> +
>  # Check whether buffered writes are refused.
>  test1() {
>      run_fio --name=job1 --filename="$dev" --rw=write --direct=0 --bs=4K	\
> @@ -252,6 +262,7 @@ test4() {
>  test5() {
>      local size off capacity
>  
> +    prep_write
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 4 $off $dev)
>      size=$((4 * zone_size))
> @@ -267,6 +278,7 @@ test5() {
>  test6() {
>      local size off capacity
>  
> +    prep_write
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 4 $off $dev)
>      size=$((4 * zone_size))
> @@ -285,6 +297,7 @@ test7() {
>      local size=$((zone_size))
>      local off capacity
>  
> +    prep_write
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 1 $off $dev)
>      run_fio_on_seq "$(ioengine "libaio")" --iodepth=1 --rw=randwrite	\
> @@ -299,6 +312,7 @@ test7() {
>  test8() {
>      local size off capacity
>  
> +    prep_write
>      size=$((4 * zone_size))
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 4 $off $dev)
> @@ -319,6 +333,7 @@ test9() {
>  	return 0
>      fi
>  
> +    prep_write
>      size=$((4 * zone_size))
>      run_fio_on_seq --ioengine=sg					\
>  		   --iodepth=1 --rw=randwrite --bs=16K			\
> @@ -337,6 +352,7 @@ test10() {
>  	return 0
>      fi
>  
> +    prep_write
>      size=$((4 * zone_size))
>      run_fio_on_seq --ioengine=sg 					\
>  		   --iodepth=64 --rw=randwrite --bs=16K			\
> @@ -350,6 +366,7 @@ test10() {
>  test11() {
>      local size off capacity
>  
> +    prep_write
>      size=$((4 * zone_size))
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 4 $off $dev)
> @@ -364,6 +381,7 @@ test11() {
>  test12() {
>      local size off capacity
>  
> +    prep_write
>      size=$((8 * zone_size))
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 8 $off $dev)
> @@ -378,6 +396,7 @@ test12() {
>  test13() {
>      local size off capacity
>  
> +    prep_write
>      size=$((8 * zone_size))
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 8 $off $dev)
> @@ -393,6 +412,7 @@ test13() {
>  test14() {
>      local size
>  
> +    prep_write
>      size=$((16 * 2**20)) # 20 MB
>      if [ $size -gt $((first_sequential_zone_sector * 512)) ]; then
>  	echo "$dev does not have enough sequential zones" \
> @@ -417,6 +437,7 @@ test15() {
>  	    reset_zone "$dev" $((first_sequential_zone_sector +
>  				 i*sectors_per_zone))
>      done
> +    prep_write
>      w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
>      w_size=$((2 * zone_size))
>      w_capacity=$(total_zone_capacity 2 $w_off $dev)
> @@ -441,6 +462,7 @@ test16() {
>  	    reset_zone "$dev" $((first_sequential_zone_sector +
>  				 i*sectors_per_zone))
>      done
> +    prep_write
>      w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
>      w_size=$((2 * zone_size))
>      w_capacity=$(total_zone_capacity 2 $w_off $dev)
> @@ -463,6 +485,7 @@ test17() {
>      if [ -n "$is_zbd" ]; then
>  	reset_zone "$dev" $((off / 512)) || return $?
>      fi
> +    prep_write
>      run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw --bs=4K \
>  		    --zonemode=zbd --zonesize="${zone_size}"		\
>  		    --offset=$off --loops=2 --norandommap=1\
> @@ -516,6 +539,7 @@ test24() {
>      local bs loops=9 size=$((zone_size))
>      local off capacity
>  
> +    prep_write
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 1 $off $dev)
>  
> @@ -538,6 +562,7 @@ test25() {
>          [ -n "$is_zbd" ] &&
>  	    reset_zone "$dev" $((first_sequential_zone_sector + i*sectors_per_zone))
>      done
> +    prep_write
>      for ((i=0;i<16;i++)); do
>  	opts+=("--name=job$i" "--filename=$dev" "--thread=1" "--direct=1")
>  	opts+=("--offset=$((first_sequential_zone_sector*512 + zone_size*i))")
> @@ -552,6 +577,7 @@ write_to_first_seq_zone() {
>      local loops=4 r
>      local off capacity
>  
> +    prep_write
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 1 $off $dev)
>  
> @@ -580,7 +606,7 @@ test28() {
>      local i jobs=16 off opts
>  
>      off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
> -    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
> +    prep_write reset_all
>      opts=("--debug=zbd")
>      capacity=$(total_zone_capacity 1 $off $dev)
>      for ((i=0;i<jobs;i++)); do
> @@ -604,7 +630,7 @@ test29() {
>  
>      off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
>      size=$((16*zone_size))
> -    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
> +    prep_write reset_all
>      opts=("--debug=zbd")
>      for ((i=0;i<jobs;i++)); do
>  	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
> @@ -623,6 +649,7 @@ test29() {
>  test30() {
>      local off
>  
> +    prep_write
>      off=$((first_sequential_zone_sector * 512))
>      run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw	\
>  		    --bs="$(max $((zone_size / 128)) "$logical_block_size")"\
> @@ -636,6 +663,7 @@ test30() {
>  test31() {
>      local bs inc nz off opts size
>  
> +    prep_write
>      # Start with writing 128 KB to 128 sequential zones.
>      bs=128K
>      nz=128
> @@ -668,6 +696,7 @@ test31() {
>  test32() {
>      local off opts=() size
>  
> +    prep_write
>      off=$((first_sequential_zone_sector * 512))
>      size=$((disk_size - off))
>      opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
> @@ -684,6 +713,7 @@ test33() {
>      local bs io_size size
>      local off capacity=0;
>  
> +    prep_write
>      off=$((first_sequential_zone_sector * 512))
>      capacity=$(total_zone_capacity 1 $off $dev)
>      size=$((2 * zone_size))
> @@ -700,6 +730,7 @@ test33() {
>  test34() {
>      local size
>  
> +    prep_write
>      size=$((2 * zone_size))
>      run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write --size=$size \
>  		   --do_verify=1 --verify=md5 --bs=$((3 * zone_size / 4)) \
> @@ -711,6 +742,7 @@ test34() {
>  test35() {
>      local bs off io_size size
>  
> +    prep_write
>      off=$(((first_sequential_zone_sector + 1) * 512))
>      size=$((zone_size - 2 * 512))
>      bs=$((zone_size / 4))
> @@ -725,6 +757,7 @@ test35() {
>  test36() {
>      local bs off io_size size
>  
> +    prep_write
>      off=$(((first_sequential_zone_sector) * 512))
>      size=$((zone_size - 512))
>      bs=$((zone_size / 4))
> @@ -739,6 +772,7 @@ test36() {
>  test37() {
>      local bs off size capacity
>  
> +    prep_write
>      capacity=$(total_zone_capacity 1 $first_sequential_zone_sector $dev)
>      if [ "$first_sequential_zone_sector" = 0 ]; then
>  	off=0
> @@ -758,6 +792,7 @@ test37() {
>  test38() {
>      local bs off size
>  
> +    prep_write
>      size=$((logical_block_size))
>      off=$((disk_size - logical_block_size))
>      bs=$((logical_block_size))
> @@ -828,6 +863,7 @@ test45() {
>      local bs i
>  
>      [ -z "$is_zbd" ] && return 0
> +    prep_write
>      bs=$((logical_block_size))
>      run_one_fio_job "$(ioengine "psync")" --iodepth=1 --rw=randwrite --bs=$bs\
>  		    --offset=$((first_sequential_zone_sector * 512)) \
> @@ -840,6 +876,7 @@ test45() {
>  test46() {
>      local size
>  
> +    prep_write
>      size=$((4 * zone_size))
>      run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=4K \
>  		   --group_reporting=1 --numjobs=8 \
> @@ -851,6 +888,7 @@ test46() {
>  test47() {
>      local bs
>  
> +    prep_write
>      bs=$((logical_block_size))
>      run_fio_on_seq "$(ioengine "psync")" --rw=write --bs=$bs --zoneskip=1 \
>  		    >> "${logfile}.${test_number}" 2>&1 && return 1
> @@ -865,7 +903,7 @@ test48() {
>  
>      off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
>      size=$((16*zone_size))
> -    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
> +    prep_write reset_all
>      opts=("--aux-path=/tmp" "--allow_file_create=0" "--significant_figures=10")
>      opts+=("--debug=zbd")
>      opts+=("$(ioengine "libaio")" "--rw=randwrite" "--direct=1")

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] zbd: Decrement open zones count at write command completion
  2020-08-15  0:05   ` Dmitry Fomichev
@ 2020-08-19 12:35     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 17+ messages in thread
From: Shinichiro Kawasaki @ 2020-08-19 12:35 UTC (permalink / raw)
  To: Dmitry Fomichev; +Cc: fio, axboe, Damien Le Moal

Hi Dmitry, thank you for your review comments and sorry about my slow response.

On Aug 15, 2020 / 00:05, Dmitry Fomichev wrote:
> On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> > When max_open_zones or job_max_open_zones option is provided, fio counts
> > number of open zones. This open zone accounting is done during submission
> > of write commands. When write command is submitted to a zone for the
> > first time, the zone is counted as open. When a write command which will
> > fill the zone gets submitted, the zone is counted as closed. However,
> > this count at write command submission may open zones more than the
> > limit. When writes to zones more than the limit are submitted with high
> > queue depth, those writes in-flight open zones more than the limit.
> > 
> > To avoid such writes beyond max open zones limit, decrement number of
> > open zones count not at write command submission but at write command
> > completion. By doing this, the number of zones with in-flight write
> > commands are kept within the limit with accuracy. Introduce the helper
> > function zbd_end_zone_io() for this decrement and zbd_close_zone() call.
> > 
> > The zbd_close_zone() function requires thread_data argument. To refer
> > thread_data at write command completion, add the argument to zbd_put_io()
> > and zbd_queue_io() callbacks. Also add a loop to the zbd_close_zone()
> > function which converts zone_info array index to open_zones array index
> > to avoid loop code duplication.
> > 
> > Even when io_u points to an open zone, the zone may not be valid for
> > write since in-flight write commands may make the zone full. Check this
> > in zbd_open_zone() to handle such zones as in full status.
> > 
> > Because of the zone close timing change, there might be no open zone when
> > zbd_convert_to_open_zone() is called. Do not handle such case as an
> > error and open other zones.
> > 
> > When zbd_convert_to_open_zone() finds all open zones can not be used for
> > next write, it opens other zones. This zone open fails when the number of
> > open zones hits one of the limits: 1) number of zones in the fio write
> > target region, 2) max_open_zones option or 3) job_max_open_zones option.
> > To avoid the zone open failure, wait for writes in-flight completes and
> > open zones get closed before opening other zones.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  io_u.c      |  4 +--
> >  io_u.h      |  5 +--
> >  ioengines.c |  4 +--
> >  zbd.c       | 92 +++++++++++++++++++++++++++++++++++++++++------------
> >  zbd.h       |  9 +++---
> >  5 files changed, 84 insertions(+), 30 deletions(-)
> > 
> > diff --git a/io_u.c b/io_u.c
> > index 2ef5acec..62bfec47 100644
> > --- a/io_u.c
> > +++ b/io_u.c
> > @@ -795,7 +795,7 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
> >  {
> >  	const bool needs_lock = td_async_processing(td);
> >  
> > -	zbd_put_io_u(io_u);
> > +	zbd_put_io_u(td, io_u);
> >  
> >  	if (td->parent)
> >  		td = td->parent;
> > @@ -1364,7 +1364,7 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
> >  		if (!fill_io_u(td, io_u))
> >  			break;
> >  
> > -		zbd_put_io_u(io_u);
> > +		zbd_put_io_u(td, io_u);
> >  
> >  		put_file_log(td, f);
> >  		td_io_close_file(td, f);
> > diff --git a/io_u.h b/io_u.h
> > index 31100928..5a28689c 100644
> > --- a/io_u.h
> > +++ b/io_u.h
> > @@ -101,13 +101,14 @@ struct io_u {
> >  	 * @success == true means that the I/O operation has been queued or
> >  	 * completed successfully.
> >  	 */
> > -	void (*zbd_queue_io)(struct io_u *, int q, bool success);
> > +	void (*zbd_queue_io)(struct thread_data *td, struct io_u *, int q,
> > +			     bool success);
> >  
> >  	/*
> >  	 * ZBD mode zbd_put_io callback: called in after completion of an I/O
> >  	 * or commit of an async I/O to unlock the I/O target zone.
> >  	 */
> > -	void (*zbd_put_io)(const struct io_u *);
> > +	void (*zbd_put_io)(struct thread_data *td, const struct io_u *);
> >  
> >  	/*
> >  	 * Callback for io completion
> > diff --git a/ioengines.c b/ioengines.c
> > index 1c5970a4..476df58d 100644
> > --- a/ioengines.c
> > +++ b/ioengines.c
> > @@ -352,7 +352,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
> >  	}
> >  
> >  	ret = td->io_ops->queue(td, io_u);
> > -	zbd_queue_io_u(io_u, ret);
> > +	zbd_queue_io_u(td, io_u, ret);
> >  
> >  	unlock_file(td, io_u->file);
> >  
> > @@ -394,7 +394,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
> >  	if (!td->io_ops->commit) {
> >  		io_u_mark_submit(td, 1);
> >  		io_u_mark_complete(td, 1);
> > -		zbd_put_io_u(io_u);
> > +		zbd_put_io_u(td, io_u);
> >  	}
> >  
> >  	if (ret == FIO_Q_COMPLETED) {
> > diff --git a/zbd.c b/zbd.c
> > index e4a480b7..20f64b58 100644
> > --- a/zbd.c
> > +++ b/zbd.c
> > @@ -721,12 +721,18 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
> >  
> >  /* The caller must hold f->zbd_info->mutex */
> >  static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
> > -			   unsigned int open_zone_idx)
> > +			   unsigned int zone_idx)
> >  {
> > -	uint32_t zone_idx;
> > +	uint32_t open_zone_idx = 0;
> > +
> > +	for (; open_zone_idx < f->zbd_info->num_open_zones; open_zone_idx++) {
> > +		if (f->zbd_info->open_zones[open_zone_idx] == zone_idx)
> > +			break;
> > +	}
> > +	if (open_zone_idx == f->zbd_info->num_open_zones)
> 
> I understand that the assert below needs to be removed because this function can be called
> as a part of the reset of all zones. Still, maybe add a debug message saying "zone %d is
> not open" here?

Okay, will add the debug message in the v2.

> 
> > +		return;
> >  
> > -	assert(open_zone_idx < f->zbd_info->num_open_zones);
> > -	zone_idx = f->zbd_info->open_zones[open_zone_idx];
> > +	dprint(FD_ZBD, "%s: closing zone %d\n", f->file_name, zone_idx);
> >  	memmove(f->zbd_info->open_zones + open_zone_idx,
> >  		f->zbd_info->open_zones + open_zone_idx + 1,
> >  		(ZBD_MAX_OPEN_ZONES - (open_zone_idx + 1)) *
> > @@ -765,13 +771,8 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
> >  			continue;
> >  		zone_lock(td, f, z);
> >  		if (all_zones) {
> > -			unsigned int i;
> > -
> >  			pthread_mutex_lock(&f->zbd_info->mutex);
> > -			for (i = 0; i < f->zbd_info->num_open_zones; i++) {
> > -				if (f->zbd_info->open_zones[i] == nz)
> > -					zbd_close_zone(td, f, i);
> > -			}
> > +			zbd_close_zone(td, f, nz);
> >  			pthread_mutex_unlock(&f->zbd_info->mutex);
> >  
> >  			reset_wp = z->wp != z->start;
> > @@ -952,8 +953,15 @@ static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
> >  		return false;
> >  
> >  	pthread_mutex_lock(&f->zbd_info->mutex);
> > -	if (is_zone_open(td, f, zone_idx))
> > +	if (is_zone_open(td, f, zone_idx)) {
> > +		/*
> > +		 * If the zone is already open and going to be full by writes
> > +		 * in-flight, handle it as a full zone instead of an open zone.
> > +		 */
> > +		if (z->wp >= zbd_zone_capacity_end(z))
> > +			res = false;
> >  		goto out;
> > +	}
> >  	res = false;
> >  	/* Zero means no limit */
> >  	if (td->o.job_max_open_zones > 0 &&
> > @@ -995,6 +1003,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
> >  	unsigned int open_zone_idx = -1;
> >  	uint32_t zone_idx, new_zone_idx;
> >  	int i;
> > +	bool wait_zone_close;
> >  
> >  	assert(is_valid_offset(f, io_u->offset));
> >  
> > @@ -1030,11 +1039,9 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
> >  		if (td->o.max_open_zones == 0 && td->o.job_max_open_zones == 0)
> >  			goto examine_zone;
> >  		if (f->zbd_info->num_open_zones == 0) {
> > -			pthread_mutex_unlock(&f->zbd_info->mutex);
> > -			pthread_mutex_unlock(&z->mutex);
> >  			dprint(FD_ZBD, "%s(%s): no zones are open\n",
> >  			       __func__, f->file_name);
> 
> I was not able to trigger this message by running t/zbd/test-zbd-support script
> with -z option. Do you know an easy way to trigger this situation?
> I don't see anything wrong with the code here, my concern is mainly about
> test coverage...

The situation is recreated with the test case #27 and a zoned block device
which has zone capacity smaller than zone size. The command is as follows.

# ./run-tests-against-zoned-nullb -zone-cap -t 27 -z -o 12

> 
> > -			return NULL;
> > +			goto open_other_zone;
> >  		}
> >  
> >  		/*
> > @@ -1081,14 +1088,30 @@ examine_zone:
> >  		pthread_mutex_unlock(&f->zbd_info->mutex);
> >  		goto out;
> >  	}
> > -	dprint(FD_ZBD, "%s(%s): closing zone %d\n", __func__, f->file_name,
> > -	       zone_idx);
> > -	if (td->o.max_open_zones || td->o.job_max_open_zones)
> > -		zbd_close_zone(td, f, open_zone_idx);
> > +
> > +open_other_zone:
> > +	/* Check if number of open zones reaches one of limits. */
> > +	wait_zone_close =
> > +		f->zbd_info->num_open_zones == f->max_zone - f->min_zone ||
> > +		(td->o.max_open_zones &&
> > +		 f->zbd_info->num_open_zones == td->o.max_open_zones) ||
> > +		(td->o.job_max_open_zones &&
> > +		 td->num_open_zones == td->o.job_max_open_zones);
> > +
> >  	pthread_mutex_unlock(&f->zbd_info->mutex);
> >  
> >  	/* Only z->mutex is held. */
> >  
> > +	/*
> > +	 * When number of open zones reaches to one of limits, wait for
> > +	 * zone close before opening a new zone.
> > +	 */
> > +	if (wait_zone_close) {
> > +		dprint(FD_ZBD, "%s(%s): quiesce to allow open zones to close\n",
> > +		       __func__, f->file_name);
> > +		io_u_quiesce(td);
> > +	}
> > +
> >  	/* Zone 'z' is full, so try to open a new zone. */
> >  	for (i = f->io_size / f->zbd_info->zone_size; i > 0; i--) {
> >  		zone_idx++;
> > @@ -1203,6 +1226,29 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
> >  	return NULL;
> >  }
> >  
> > +/**
> > + * zbd_end_zone_io - update zone status at command completion
> > + * @io_u: I/O unit
> > + * @z: zone info pointer
> > + * @success: Whether or not the I/O unit has been queued successfully
> > + *
> > + * If the write command made the zone full, close it.
> > + *
> > + * The caller must hold z->mutex.
> > + */
> > +static void zbd_end_zone_io(struct thread_data *td, const struct io_u *io_u,
> > +			    struct fio_zone_info *z, bool success)
> > +{
> > +	const struct fio_file *f = io_u->file;
> > +
> > +	if (io_u->ddir == DDIR_WRITE && success &&
> > +	    io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
> > +		pthread_mutex_lock(&f->zbd_info->mutex);
> > +		zbd_close_zone(td, f, z - f->zbd_info->zone_info);
> > +		pthread_mutex_unlock(&f->zbd_info->mutex);
> > +	}
> > +}
> > +
> >  /**
> >   * zbd_queue_io - update the write pointer of a sequential zone
> >   * @io_u: I/O unit
> > @@ -1212,7 +1258,8 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
> >   * For write and trim operations, update the write pointer of the I/O unit
> >   * target zone.
> >   */
> > -static void zbd_queue_io(struct io_u *io_u, int q, bool success)
> > +static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
> > +			 bool success)
> >  {
> >  	const struct fio_file *f = io_u->file;
> >  	struct zoned_block_device_info *zbd_info = f->zbd_info;
> > @@ -1258,6 +1305,9 @@ static void zbd_queue_io(struct io_u *io_u, int q, bool success)
> >  		break;
> >  	}
> >  
> > +	if (q == FIO_Q_COMPLETED)
> > +		zbd_end_zone_io(td, io_u, z, !io_u->error);
> 
> This could be changed to
> 
> +	if (q == FIO_Q_COMPLETED && !io_u->error)
> +		zbd_end_zone_io(td, io_u, z);
> 
> making the last parameter of zbd_end_zone_io(), "success", unnecessary.
> 
> > +
> >  unlock:
> >  	if (!success || q != FIO_Q_QUEUED) {
> >  		/* BUSY or COMPLETED: unlock the zone */
> > @@ -1270,7 +1320,7 @@ unlock:
> >   * zbd_put_io - Unlock an I/O unit target zone lock
> >   * @io_u: I/O unit
> >   */
> > -static void zbd_put_io(const struct io_u *io_u)
> > +static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
> >  {
> >  	const struct fio_file *f = io_u->file;
> >  	struct zoned_block_device_info *zbd_info = f->zbd_info;
> > @@ -1292,6 +1342,8 @@ static void zbd_put_io(const struct io_u *io_u)
> >  	       "%s: terminate I/O (%lld, %llu) for zone %u\n",
> >  	       f->file_name, io_u->offset, io_u->buflen, zone_idx);
> >  
> > +	zbd_end_zone_io(td, io_u, z, true);
> 
> The last parameter in this call above can be removed with the change above.

Thanks! This idea will make the code simpler. Will reflect to v2.

> 
> > +
> >  	ret = pthread_mutex_unlock(&z->mutex);
> >  	assert(ret == 0);
> >  	zbd_check_swd(f);
> > diff --git a/zbd.h b/zbd.h
> > index 021174c1..bff55f99 100644
> > --- a/zbd.h
> > +++ b/zbd.h
> > @@ -98,18 +98,19 @@ static inline void zbd_close_file(struct fio_file *f)
> >  		zbd_free_zone_info(f);
> >  }
> >  
> > -static inline void zbd_queue_io_u(struct io_u *io_u, enum fio_q_status status)
> > +static inline void zbd_queue_io_u(struct thread_data *td, struct io_u *io_u,
> > +				  enum fio_q_status status)
> >  {
> >  	if (io_u->zbd_queue_io) {
> > -		io_u->zbd_queue_io(io_u, status, io_u->error == 0);
> > +		io_u->zbd_queue_io(td, io_u, status, io_u->error == 0);
> >  		io_u->zbd_queue_io = NULL;
> >  	}
> >  }
> >  
> > -static inline void zbd_put_io_u(struct io_u *io_u)
> > +static inline void zbd_put_io_u(struct thread_data *td, struct io_u *io_u)
> >  {
> >  	if (io_u->zbd_put_io) {
> > -		io_u->zbd_put_io(io_u);
> > +		io_u->zbd_put_io(td, io_u);
> >  		io_u->zbd_queue_io = NULL;
> >  		io_u->zbd_put_io = NULL;
> >  	}
> 
> Overall approach looks fine to me.
> 
> Dmitry

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] zbd: Initialize open zones list referring zone status at fio start
  2020-08-15  0:06   ` Dmitry Fomichev
@ 2020-08-19 12:51     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 17+ messages in thread
From: Shinichiro Kawasaki @ 2020-08-19 12:51 UTC (permalink / raw)
  To: Dmitry Fomichev; +Cc: fio, axboe, Damien Le Moal

On Aug 15, 2020 / 00:06, Dmitry Fomichev wrote:
> On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> > When fio starts write workloads to zones with open status, fio does not
> > reflect the zone status to open zone list. This results in inconsistent
> > open zone accounting.
> > 
> > To avoid this inconsistency, initialize the open zone list referring the
> > zone status by calling zbd_open_zone() function before workload start.
> > Since io_u is not available at that timing, make the function
> > independent from io_u.
> > 
> > Of note is that fio counts open zones within the write target range.
> > Open zones out of the range are not counted or checked.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  zbd.c | 21 ++++++++++++++++-----
> >  1 file changed, 16 insertions(+), 5 deletions(-)
> > 
> > diff --git a/zbd.c b/zbd.c
> > index 20f64b58..123d5530 100644
> > --- a/zbd.c
> > +++ b/zbd.c
> > @@ -627,6 +627,9 @@ static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
> >  	return ret;
> >  }
> >  
> > +static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
> > +			  uint32_t zone_idx);
> > +
> >  int zbd_setup_files(struct thread_data *td)
> >  {
> >  	struct fio_file *f;
> > @@ -650,6 +653,8 @@ int zbd_setup_files(struct thread_data *td)
> >  
> >  	for_each_file(td, f, i) {
> >  		struct zoned_block_device_info *zbd = f->zbd_info;
> > +		struct fio_zone_info *z;
> > +		int zi;
> >  
> >  		if (!zbd)
> >  			continue;
> > @@ -665,6 +670,13 @@ int zbd_setup_files(struct thread_data *td)
> >  			log_err("'max_open_zones' value is limited by %u\n", ZBD_MAX_OPEN_ZONES);
> >  			return 1;
> >  		}
> > +
> > +		for (zi = f->min_zone; zi < f->max_zone; zi++) {
> > +			z = &zbd->zone_info[zi];
> > +			if (z->cond == ZBD_ZONE_COND_IMP_OPEN ||
> > +			    z->cond == ZBD_ZONE_COND_EXP_OPEN)
> > +				zbd_open_zone(td, f, zi);
> > +		}
> 
> What if zbd->max_open_zones is set less than the number of the currently
> open zones in [max_zone:min_zone] range? In this case, the loop above will
> open more zones than needed. Maybe this part should be changed to something
> like the code below?
> 
> +		int zi, noz;
> ...
> +		for (zi = f->min_zone, noz = 0; zi < f->max_zone; zi++) {
> +			z = &zbd->zone_info[zi];
> +			if (z->cond == ZBD_ZONE_COND_IMP_OPEN ||
> +			    z->cond == ZBD_ZONE_COND_EXP_OPEN) {
> +				if (noz < zbd->max_open_zones)
> +					zbd_open_zone(td, f, zi);
> +				else
> +					zbd_reset_zone(td, f, zi);
> +				noz++;
> +			}
> +		}
>

Thank you for catching this. As you suggest, I will add the code to reset zones
when number of open zones exceeds the limits. The code will be little bit more
complicated to cover both max_open_zones and job_max_open_zones.

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] t/zbd: Add -o option to t/zbd/test-zoned-support
  2020-08-15  0:07   ` Dmitry Fomichev
@ 2020-08-19 13:13     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 17+ messages in thread
From: Shinichiro Kawasaki @ 2020-08-19 13:13 UTC (permalink / raw)
  To: Dmitry Fomichev; +Cc: fio, axboe, Damien Le Moal

On Aug 15, 2020 / 00:07, Dmitry Fomichev wrote:
> On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> > To specify maximum open zones of the test target device, add -o option
> > to t/zbd/test-zoned-support. When this option is specified, add
> > max_open_zones option to all fio commands in the test script. The option
> > ensures the number of zones opened by fio is within the limit. This is
> > useful when the test target device has maximum open zones limit.
> > 
> > When the fio command does not have multiple jobs and the test case does
> > not specify max_open_zones, add single max_open_zones option to the fio
> > command line.
> > 
> > When the fio command takes multiple jobs, add max_open_zones option to
> > each job. Introduce job_var_opts variable to keep options to be added to
> > each job. To distinguish it from global options for all jobs, rename
> > var_opts variable to global_var_opts. When a test case with multiple jobs
> > always specifies max_open_zones option, exclude the max_open_zones option
> > from job_var_opts, using job_var_opts_exclude() helper function.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
>  t/zbd/test-zbd-support | 70 ++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 57 insertions(+), 13 deletions(-)
> > 
> > diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
> > index 9e398cab..c21d6aad 100755
> > --- a/t/zbd/test-zbd-support
> > +++ b/t/zbd/test-zbd-support
> > @@ -12,6 +12,7 @@ usage() {
> >  	echo -e "\t-v Run fio with valgrind --read-var-info option"
> >  	echo -e "\t-l Test with libzbc ioengine"
> >  	echo -e "\t-r Reset all zones before test start"
> > +	echo -e "\t-o <max_open_zones> Run fio with max_open_zones limit"
> >  	echo -e "\t-t <test #> Run only a single test case with specified number"
> >  	echo -e "\t-z Run fio with debug=zbd option"
> >  }
> > @@ -103,14 +104,41 @@ is_scsi_device() {
> >      return 1
> >  }
> >  
> > +job_var_opts_exclude() {
> > +	local o
> > +	local ex_key="${1}"
> > +
> > +	for o in "${job_var_opts[@]}"; do
> > +		if [[ ${o} =~ "${ex_key}" ]]; then
> > +			continue
> > +		fi
> > +		echo -n "${o}"
> > +	done
> > +}
> > +
> > +has_max_open_zones() {
> > +	while (($# > 1)); do
> > +		if [[ ${1} =~ "--max_open_zones" ]]; then
> > +			return 0
> > +		fi
> > +		shift
> > +	done
> > +	return 1
> > +}
> > +
> >  run_fio() {
> >      local fio opts
> >  
> >      fio=$(dirname "$0")/../../fio
> >  
> > -    opts=("--max-jobs=16" "--aux-path=/tmp" "--allow_file_create=0" \
> > -	  "--significant_figures=10" "$@")
> > -    opts+=(${var_opts[@]})
> > +    opts=(${global_var_opts[@]})
> > +    opts+=("--max-jobs=16" "--aux-path=/tmp" "--allow_file_create=0" \
> > +			   "--significant_figures=10" "$@")
> > +    # When max_open_zones option is specified to this test script, add
> > +    # max_open_zones option to fio command unless the test case already add it.
> > +    if [[ -n ${max_open_zones_opt} ]] && ! has_max_open_zones "${opts[@]}"; then
> > +	    opts+=("--max_open_zones=${max_open_zones_opt}")
> > +    fi
> >      { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
> >  
> >      "${dynamic_analyzer[@]}" "$fio" "${opts[@]}"
> > @@ -128,13 +156,16 @@ write_and_run_one_fio_job() {
> >      local r
> >      local write_offset="${1}"
> >      local write_size="${2}"
> > +    local -a write_opts
> >  
> >      shift 2
> >      r=$(((RANDOM << 16) | RANDOM))
> > -    run_fio --filename="$dev" --randseed="$r"  --name="write_job" --rw=write \
> > -	    "$(ioengine "psync")" --bs="${logical_block_size}" \
> > -	    --zonemode=zbd --zonesize="${zone_size}" --thread=1 --direct=1 \
> > -	    --offset="${write_offset}" --size="${write_size}" \
> > +    write_opts=(--name="write_job" --rw=write "$(ioengine "psync")" \
> > +		      --bs="${logical_block_size}" --zonemode=zbd \
> > +		      --zonesize="${zone_size}" --thread=1 --direct=1 \
> > +		      --offset="${write_offset}" --size="${write_size}")
> > +    write_opts+=("${job_var_opts[@]}")
> > +    run_fio --filename="$dev" --randseed="$r" "${write_opts[@]}" \
> >  	    --name="$dev" --wait_for="write_job" "$@" --thread=1 --direct=1
> >  }
> >  
> > @@ -512,7 +543,7 @@ test25() {
> >  	opts+=("--offset=$((first_sequential_zone_sector*512 + zone_size*i))")
> >  	opts+=("--size=$zone_size" "$(ioengine "psync")" "--rw=write" "--bs=16K")
> >  	opts+=("--zonemode=zbd" "--zonesize=${zone_size}" "--group_reporting=1")
> > -	opts+=(${var_opts[@]})
> > +	opts+=(${job_var_opts[@]})
> >      done
> >      run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
> >  }
> > @@ -557,7 +588,7 @@ test28() {
> >  	opts+=("--size=$zone_size" "--io_size=$capacity" "$(ioengine "psync")" "--rw=randwrite")
> >  	opts+=("--thread=1" "--direct=1" "--zonemode=zbd")
> >  	opts+=("--zonesize=${zone_size}" "--group_reporting=1")
> > -	opts+=(${var_opts[@]})
> > +	opts+=(${job_var_opts[@]})
> >      done
> >      run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
> >      check_written $((jobs * $capacity)) || return $?
> > @@ -581,7 +612,8 @@ test29() {
> >  	opts+=("$(ioengine "psync")" "--rw=randwrite" "--direct=1")
> >  	opts+=("--max_open_zones=4" "--group_reporting=1")
> >  	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
> > -	opts+=(${var_opts[@]})
> > +	# max_open_zones is already specified
> > +	opts+=($(job_var_opts_exclude "--max_open_zones"))
> >      done
> >      run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
> >      check_written $((jobs * zone_size)) || return $?
> > @@ -617,7 +649,7 @@ test31() {
> >  	opts+=("--bs=$bs" "--size=$zone_size" "$(ioengine "libaio")")
> >  	opts+=("--rw=write" "--direct=1" "--thread=1" "--stats=0")
> >  	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
> > -	opts+=(${var_opts[@]})
> > +	opts+=(${job_var_opts[@]})
> >      done
> >      "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
> >      # Next, run the test.
> > @@ -627,6 +659,7 @@ test31() {
> >      opts+=("--bs=$bs" "$(ioengine "psync")" "--rw=randread" "--direct=1")
> >      opts+=("--thread=1" "--time_based" "--runtime=30" "--zonemode=zbd")
> >      opts+=("--zonesize=${zone_size}")
> > +    opts+=(${job_var_opts[@]})
> >      run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
> >  }
> >  
> > @@ -843,6 +876,8 @@ test48() {
> >  	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
> >  	opts+=("--io_size=$zone_size" "--iodepth=256" "--thread=1")
> >  	opts+=("--group_reporting=1")
> > +	# max_open_zones is already specified
> > +	opts+=($(job_var_opts_exclude "--max_open_zones"))
> >      done
> >  
> >      fio=$(dirname "$0")/../../fio
> > @@ -880,6 +915,7 @@ dynamic_analyzer=()
> >  reset_all_zones=
> >  use_libzbc=
> >  zbd_debug=
> > +max_open_zones_opt=
> >  
> >  while [ "${1#-}" != "$1" ]; do
> >    case "$1" in
> > @@ -891,6 +927,7 @@ while [ "${1#-}" != "$1" ]; do
> >      -l) use_libzbc=1; shift;;
> >      -r) reset_all_zones=1; shift;;
> >      -t) tests+=("$2"); shift; shift;;
> > +    -o) max_open_zones_opt="${2}"; shift; shift;;
> >      -v) dynamic_analyzer=(valgrind "--read-var-info=yes");
> >  	shift;;
> >      -z) zbd_debug=1; shift;;
> > @@ -906,9 +943,10 @@ fi
> >  # shellcheck source=functions
> >  source "$(dirname "$0")/functions" || exit $?
> >  
> > -var_opts=()
> > +global_var_opts=()
> > +job_var_opts=()
> >  if [ -n "$zbd_debug" ]; then
> > -    var_opts+=("--debug=zbd")
> > +    global_var_opts+=("--debug=zbd")
> >  fi
> >  dev=$1
> >  realdev=$(readlink -f "$dev")
> > @@ -994,6 +1032,12 @@ elif [[ -c "$realdev" ]]; then
> >  	fi
> >  fi
> >  
> > +if [[ -n ${max_open_zones_opt} ]]; then
> > +	# Override max_open_zones with the script option value
> > +	max_open_zones="${max_open_zones_opt}"
> > +	job_var_opts+=("--max_open_zones=${max_open_zones_opt}")
> > +fi
> > +
> >  echo -n "First sequential zone starts at sector $first_sequential_zone_sector;"
> >  echo " zone size: $((zone_size >> 20)) MB"
> >  
> 
> The script changes look good. The only nit is... I tried to run
> 
> t/zbd/test-zbd-support with -o 512 <device>
> 
> which is larger than 128 max open zones supported by the drive expecting
> the test to fail, but it didn't. It seems, the script defaults to the maximum
> number of open zones in this case, but it is not very obvious. Maybe log
> a message saying that the script has capped the value specified by the user
> because it was too large?

Do you use SMR drives with limit 128 for testing? The SMR drives conformant to
ZBC/ZAC spec implicitly close zones when the number of open zones exceeds the
limit 128. So the script reports no failure, but actually fio opens zones beyond
the limit.

ZNS devices which has "max active resources" limit do not close zones when
the number of zones exceeds that limit. The test script reports failures for
these devices and the -o option is useful to avoid the failures.

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] t/zbd: Reset all zones before test when max open zones is specified
  2020-08-15  0:07   ` Dmitry Fomichev
@ 2020-08-21  6:06     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 17+ messages in thread
From: Shinichiro Kawasaki @ 2020-08-21  6:06 UTC (permalink / raw)
  To: Dmitry Fomichev; +Cc: fio, axboe, Damien Le Moal

On Aug 15, 2020 / 00:07, Dmitry Fomichev wrote:
> On Thu, 2020-08-13 at 13:57 +0900, Shin'ichiro Kawasaki wrote:
> > When the test target device has maximum open zones limit, the zones in
> > test target region may not be opened up to the limit, because the zones
> > out of the test target region may have open zones. To ensure that the
> > test target zones can be opened up to the limit, reset all zones of the
> > test target device before the test cases with write work load starts.
> > 
> > Introduce the helper function prep_write() to check if all zone reset is
> > required and do the reset. For test cases which reset all zones
> > regardless of the maximum open zones option., the helper function takes
> > "reset_all" argument.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> 
> The patch looks ok, but I wonder if some of the reset-all cases can be
> unnecessary if you reset excessive open zones in zbd_setup_files() as
> I suggested in the comment to the previous patch. Anyway,

That's right. With you suggestion on zbd_setup_files(), I confirmed that
reset-all is no longer required. Will remove in the v2. Thank you.

> 
> Reviewed-by: Dmitry Fomichev: <dmitry.fomichev@wdc.com>
> 
> > ---
> >  t/zbd/test-zbd-support | 44 +++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 41 insertions(+), 3 deletions(-)
> > 
> > diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
> > index c21d6aad..9ec406c3 100755
> > --- a/t/zbd/test-zbd-support
> > +++ b/t/zbd/test-zbd-support
> > @@ -181,6 +181,16 @@ run_fio_on_seq() {
> >      run_one_fio_job "${opts[@]}" "$@"
> >  }
> >  
> > +# Prepare for write test by resetting zones. When "reset_all" is specified,
> > +# reset all sequential write required zones. When max_open_zones option is
> > +# specified, reset all zones of the test target to ensure that zones out of the
> > +# test target range do not have open zones. This allows the write test to the
> > +# target range to be able to open zones up to max_open_zones.
> > +prep_write() {
> > +	[[ ${1} == "reset_all" || -n "${max_open_zones_opt}" ]] &&
> > +		[[ -n "${is_zbd}" ]] && reset_zone "${dev}" -1
> > +}
> > +
> >  # Check whether buffered writes are refused.
> >  test1() {
> >      run_fio --name=job1 --filename="$dev" --rw=write --direct=0 --bs=4K	\
> > @@ -252,6 +262,7 @@ test4() {
> >  test5() {
> >      local size off capacity
> >  
> > +    prep_write
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 4 $off $dev)
> >      size=$((4 * zone_size))
> > @@ -267,6 +278,7 @@ test5() {
> >  test6() {
> >      local size off capacity
> >  
> > +    prep_write
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 4 $off $dev)
> >      size=$((4 * zone_size))
> > @@ -285,6 +297,7 @@ test7() {
> >      local size=$((zone_size))
> >      local off capacity
> >  
> > +    prep_write
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 1 $off $dev)
> >      run_fio_on_seq "$(ioengine "libaio")" --iodepth=1 --rw=randwrite	\
> > @@ -299,6 +312,7 @@ test7() {
> >  test8() {
> >      local size off capacity
> >  
> > +    prep_write
> >      size=$((4 * zone_size))
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 4 $off $dev)
> > @@ -319,6 +333,7 @@ test9() {
> >  	return 0
> >      fi
> >  
> > +    prep_write
> >      size=$((4 * zone_size))
> >      run_fio_on_seq --ioengine=sg					\
> >  		   --iodepth=1 --rw=randwrite --bs=16K			\
> > @@ -337,6 +352,7 @@ test10() {
> >  	return 0
> >      fi
> >  
> > +    prep_write
> >      size=$((4 * zone_size))
> >      run_fio_on_seq --ioengine=sg 					\
> >  		   --iodepth=64 --rw=randwrite --bs=16K			\
> > @@ -350,6 +366,7 @@ test10() {
> >  test11() {
> >      local size off capacity
> >  
> > +    prep_write
> >      size=$((4 * zone_size))
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 4 $off $dev)
> > @@ -364,6 +381,7 @@ test11() {
> >  test12() {
> >      local size off capacity
> >  
> > +    prep_write
> >      size=$((8 * zone_size))
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 8 $off $dev)
> > @@ -378,6 +396,7 @@ test12() {
> >  test13() {
> >      local size off capacity
> >  
> > +    prep_write
> >      size=$((8 * zone_size))
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 8 $off $dev)
> > @@ -393,6 +412,7 @@ test13() {
> >  test14() {
> >      local size
> >  
> > +    prep_write
> >      size=$((16 * 2**20)) # 20 MB
> >      if [ $size -gt $((first_sequential_zone_sector * 512)) ]; then
> >  	echo "$dev does not have enough sequential zones" \
> > @@ -417,6 +437,7 @@ test15() {
> >  	    reset_zone "$dev" $((first_sequential_zone_sector +
> >  				 i*sectors_per_zone))
> >      done
> > +    prep_write
> >      w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
> >      w_size=$((2 * zone_size))
> >      w_capacity=$(total_zone_capacity 2 $w_off $dev)
> > @@ -441,6 +462,7 @@ test16() {
> >  	    reset_zone "$dev" $((first_sequential_zone_sector +
> >  				 i*sectors_per_zone))
> >      done
> > +    prep_write
> >      w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
> >      w_size=$((2 * zone_size))
> >      w_capacity=$(total_zone_capacity 2 $w_off $dev)
> > @@ -463,6 +485,7 @@ test17() {
> >      if [ -n "$is_zbd" ]; then
> >  	reset_zone "$dev" $((off / 512)) || return $?
> >      fi
> > +    prep_write
> >      run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw --bs=4K \
> >  		    --zonemode=zbd --zonesize="${zone_size}"		\
> >  		    --offset=$off --loops=2 --norandommap=1\
> > @@ -516,6 +539,7 @@ test24() {
> >      local bs loops=9 size=$((zone_size))
> >      local off capacity
> >  
> > +    prep_write
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 1 $off $dev)
> >  
> > @@ -538,6 +562,7 @@ test25() {
> >          [ -n "$is_zbd" ] &&
> >  	    reset_zone "$dev" $((first_sequential_zone_sector + i*sectors_per_zone))
> >      done
> > +    prep_write
> >      for ((i=0;i<16;i++)); do
> >  	opts+=("--name=job$i" "--filename=$dev" "--thread=1" "--direct=1")
> >  	opts+=("--offset=$((first_sequential_zone_sector*512 + zone_size*i))")
> > @@ -552,6 +577,7 @@ write_to_first_seq_zone() {
> >      local loops=4 r
> >      local off capacity
> >  
> > +    prep_write
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 1 $off $dev)
> >  
> > @@ -580,7 +606,7 @@ test28() {
> >      local i jobs=16 off opts
> >  
> >      off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
> > -    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
> > +    prep_write reset_all
> >      opts=("--debug=zbd")
> >      capacity=$(total_zone_capacity 1 $off $dev)
> >      for ((i=0;i<jobs;i++)); do
> > @@ -604,7 +630,7 @@ test29() {
> >  
> >      off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
> >      size=$((16*zone_size))
> > -    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
> > +    prep_write reset_all
> >      opts=("--debug=zbd")
> >      for ((i=0;i<jobs;i++)); do
> >  	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
> > @@ -623,6 +649,7 @@ test29() {
> >  test30() {
> >      local off
> >  
> > +    prep_write
> >      off=$((first_sequential_zone_sector * 512))
> >      run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw	\
> >  		    --bs="$(max $((zone_size / 128)) "$logical_block_size")"\
> > @@ -636,6 +663,7 @@ test30() {
> >  test31() {
> >      local bs inc nz off opts size
> >  
> > +    prep_write
> >      # Start with writing 128 KB to 128 sequential zones.
> >      bs=128K
> >      nz=128
> > @@ -668,6 +696,7 @@ test31() {
> >  test32() {
> >      local off opts=() size
> >  
> > +    prep_write
> >      off=$((first_sequential_zone_sector * 512))
> >      size=$((disk_size - off))
> >      opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
> > @@ -684,6 +713,7 @@ test33() {
> >      local bs io_size size
> >      local off capacity=0;
> >  
> > +    prep_write
> >      off=$((first_sequential_zone_sector * 512))
> >      capacity=$(total_zone_capacity 1 $off $dev)
> >      size=$((2 * zone_size))
> > @@ -700,6 +730,7 @@ test33() {
> >  test34() {
> >      local size
> >  
> > +    prep_write
> >      size=$((2 * zone_size))
> >      run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write --size=$size \
> >  		   --do_verify=1 --verify=md5 --bs=$((3 * zone_size / 4)) \
> > @@ -711,6 +742,7 @@ test34() {
> >  test35() {
> >      local bs off io_size size
> >  
> > +    prep_write
> >      off=$(((first_sequential_zone_sector + 1) * 512))
> >      size=$((zone_size - 2 * 512))
> >      bs=$((zone_size / 4))
> > @@ -725,6 +757,7 @@ test35() {
> >  test36() {
> >      local bs off io_size size
> >  
> > +    prep_write
> >      off=$(((first_sequential_zone_sector) * 512))
> >      size=$((zone_size - 512))
> >      bs=$((zone_size / 4))
> > @@ -739,6 +772,7 @@ test36() {
> >  test37() {
> >      local bs off size capacity
> >  
> > +    prep_write
> >      capacity=$(total_zone_capacity 1 $first_sequential_zone_sector $dev)
> >      if [ "$first_sequential_zone_sector" = 0 ]; then
> >  	off=0
> > @@ -758,6 +792,7 @@ test37() {
> >  test38() {
> >      local bs off size
> >  
> > +    prep_write
> >      size=$((logical_block_size))
> >      off=$((disk_size - logical_block_size))
> >      bs=$((logical_block_size))
> > @@ -828,6 +863,7 @@ test45() {
> >      local bs i
> >  
> >      [ -z "$is_zbd" ] && return 0
> > +    prep_write
> >      bs=$((logical_block_size))
> >      run_one_fio_job "$(ioengine "psync")" --iodepth=1 --rw=randwrite --bs=$bs\
> >  		    --offset=$((first_sequential_zone_sector * 512)) \
> > @@ -840,6 +876,7 @@ test45() {
> >  test46() {
> >      local size
> >  
> > +    prep_write
> >      size=$((4 * zone_size))
> >      run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=4K \
> >  		   --group_reporting=1 --numjobs=8 \
> > @@ -851,6 +888,7 @@ test46() {
> >  test47() {
> >      local bs
> >  
> > +    prep_write
> >      bs=$((logical_block_size))
> >      run_fio_on_seq "$(ioengine "psync")" --rw=write --bs=$bs --zoneskip=1 \
> >  		    >> "${logfile}.${test_number}" 2>&1 && return 1
> > @@ -865,7 +903,7 @@ test48() {
> >  
> >      off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
> >      size=$((16*zone_size))
> > -    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
> > +    prep_write reset_all
> >      opts=("--aux-path=/tmp" "--allow_file_create=0" "--significant_figures=10")
> >      opts+=("--debug=zbd")
> >      opts+=("$(ioengine "libaio")" "--rw=randwrite" "--direct=1")

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-08-21  6:06 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-13  4:57 [PATCH 0/6] Improve open zones accounting Shin'ichiro Kawasaki
2020-08-13  4:57 ` [PATCH 1/6] zbd: Decrement open zones count at write command completion Shin'ichiro Kawasaki
2020-08-15  0:05   ` Dmitry Fomichev
2020-08-19 12:35     ` Shinichiro Kawasaki
2020-08-13  4:57 ` [PATCH 2/6] zbd: Initialize open zones list referring zone status at fio start Shin'ichiro Kawasaki
2020-08-15  0:06   ` Dmitry Fomichev
2020-08-19 12:51     ` Shinichiro Kawasaki
2020-08-13  4:57 ` [PATCH 3/6] t/zbd: Improve usage message of test-zbd-support script Shin'ichiro Kawasaki
2020-08-15  0:06   ` Dmitry Fomichev
2020-08-13  4:57 ` [PATCH 4/6] t/zbd: Add -o option to t/zbd/test-zoned-support Shin'ichiro Kawasaki
2020-08-15  0:07   ` Dmitry Fomichev
2020-08-19 13:13     ` Shinichiro Kawasaki
2020-08-13  4:57 ` [PATCH 5/6] t/zbd: Reset all zones before test when max open zones is specified Shin'ichiro Kawasaki
2020-08-15  0:07   ` Dmitry Fomichev
2020-08-21  6:06     ` Shinichiro Kawasaki
2020-08-13  4:57 ` [PATCH 6/6] t/zbd: Remove unnecessary option for zbc_reset_zone Shin'ichiro Kawasaki
2020-08-15  0:07   ` Dmitry Fomichev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.