All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] zbd: clean up code and fix bugs for open zones accounting
@ 2023-06-07  8:32 Shin'ichiro Kawasaki
  2023-06-07  8:32 ` [PATCH 1/7] zbd: rename 'open zones' to 'write zones' Shin'ichiro Kawasaki
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Shin'ichiro Kawasaki @ 2023-06-07  8:32 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shin'ichiro Kawasaki

Fio with zonemode=zbd does accounting of 'open zones'. However, the meaning of
'open zone' is confusing because its definition is similar but different
between fio and zoned block devices. The first patch in this series avoids the
confusion by renaming the word 'open zone' to 'write zone'.

Also, recently three bugs were found related to the open zones accounting.
Following three patches fix the bugs. The last three patches improve test cases
corresponding to the fixes.

Shin'ichiro Kawasaki (7):
  zbd: rename 'open zones' to 'write zones'
  zbd: do not reset extra zones in open conditions
  zbd: fix write zone accounting of almost full zones
  zbd: fix write zone accounting of trim workload
  t/zbd: reset zones before tests with max_open_zones option
  t/zbd: test write zone accounting of almost full zones
  t/zbd: test write zone accounting of trim workload

 engines/io_uring.c     |   2 +-
 fio.h                  |   2 +-
 io_u.c                 |   2 +-
 io_u.h                 |   2 +-
 options.c              |   4 +-
 t/zbd/test-zbd-support |  64 ++++++++-
 zbd.c                  | 293 ++++++++++++++++++++++++-----------------
 zbd.h                  |  25 ++--
 zbd_types.h            |   2 +-
 9 files changed, 249 insertions(+), 147 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/7] zbd: rename 'open zones' to 'write zones'
  2023-06-07  8:32 [PATCH 0/7] zbd: clean up code and fix bugs for open zones accounting Shin'ichiro Kawasaki
@ 2023-06-07  8:32 ` Shin'ichiro Kawasaki
  2023-06-07 13:15   ` Niklas Cassel
  2023-06-07  8:32 ` [PATCH 2/7] zbd: do not reset extra zones in open conditions Shin'ichiro Kawasaki
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Shin'ichiro Kawasaki @ 2023-06-07  8:32 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shin'ichiro Kawasaki

Current fio code for zonemode=zbd uses the word 'open zone' to mean the
zones that fio jobs write to. Before fio starts writing to a zone, it
calls zbd_open_zone(). When fio completes writing to a zone, it calls
zbd_close_zone(). This wording is good for zoned block devices with
max_open_zones limit, such as ZBC and ZAC devices. The devices use same
word 'open' to express the zone condition that the devices assign
resources for data write to zones. However, the word 'open' gets
confusing to support zoned block devices which has max_active_zones
limit, such as ZNS devices. These devices have both 'open' and 'active'
keywords to mean two different kinds of resources on the device. This
'active' status does not fit with the 'open zone' wording in the fio
code. Also, the word 'open' zone in fio code does not always match with
the 'open' condition of zones on the device (e.g. when
--ignore_zone_limits option is specified).

To avoid the confusion, stop using the word 'open zone' in the fio code.
Instead, use the word 'write zone' to mean that the zone is the write
target. When fio starts a write to a zone, it adds the zone to
write_zones array. When fio completes writing to a zone, it removes the
zone from the write_zones array. For this purpose, rename struct fields,
functions and a macro:

  ZBD_MAX_OPEN_ZONES -> ZBD_MAX_WRITE_ZONES
  struct fio_zone_info
    open -> write
  struct thread_data
    num_open_zones -> num_write_zones
  struct zoned_block_device_info:
    max_open_zones -> max_write_zones
    num_open_zones -> num_write_zones
    open_zones[] -> write_zones[]
  zbd_open_zone() -> zbd_write_zone_get()
  zbd_close_zone() -> zbd_write_zone_put()
  zbd_convert_to_open_zone() -> zbd_convert_to_write_zone()

To match up these changes, rename local variables and goto labels. Also
rephrase code comments.

Of note is that this rename is only for the fio code. The fio options
max_open_zones and job_max_open_zones are not renamed to not confuse
users.

Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 fio.h       |   2 +-
 options.c   |   4 +-
 zbd.c       | 218 ++++++++++++++++++++++++++--------------------------
 zbd.h       |  23 +++---
 zbd_types.h |   2 +-
 5 files changed, 126 insertions(+), 123 deletions(-)

diff --git a/fio.h b/fio.h
index 6fc7fb9c..c5453d13 100644
--- a/fio.h
+++ b/fio.h
@@ -275,7 +275,7 @@ struct thread_data {
 	unsigned long long num_unique_pages;
 
 	struct zone_split_index **zone_state_index;
-	unsigned int num_open_zones;
+	unsigned int num_write_zones;
 
 	unsigned int verify_batch;
 	unsigned int trim_batch;
diff --git a/options.c b/options.c
index 8193fb29..a7c4ef6e 100644
--- a/options.c
+++ b/options.c
@@ -3618,7 +3618,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Per device/file maximum number of open zones",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct thread_options, max_open_zones),
-		.maxval	= ZBD_MAX_OPEN_ZONES,
+		.maxval	= ZBD_MAX_WRITE_ZONES,
 		.help	= "Limit on the number of simultaneously opened sequential write zones with zonemode=zbd",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
@@ -3629,7 +3629,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Job maximum number of open zones",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct thread_options, job_max_open_zones),
-		.maxval	= ZBD_MAX_OPEN_ZONES,
+		.maxval	= ZBD_MAX_WRITE_ZONES,
 		.help	= "Limit on the number of simultaneously opened sequential write zones with zonemode=zbd by one thread/process",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
diff --git a/zbd.c b/zbd.c
index 5f1a7d7f..7529e68b 100644
--- a/zbd.c
+++ b/zbd.c
@@ -304,39 +304,39 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 }
 
 /**
- * zbd_close_zone - Remove a zone from the open zones array.
+ * zbd_write_zone_put - Remove a zone from the write target zones array.
  * @td: FIO thread data.
- * @f: FIO file associated with the disk for which to reset a write pointer.
+ * @f: FIO file that has the write zones array to remove.
  * @zone_idx: Index of the zone to remove.
  *
  * The caller must hold f->zbd_info->mutex.
  */
-static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
-			   struct fio_zone_info *z)
+static void zbd_write_zone_put(struct thread_data *td, const struct fio_file *f,
+			       struct fio_zone_info *z)
 {
-	uint32_t ozi;
+	uint32_t zi;
 
-	if (!z->open)
+	if (!z->write)
 		return;
 
-	for (ozi = 0; ozi < f->zbd_info->num_open_zones; ozi++) {
-		if (zbd_get_zone(f, f->zbd_info->open_zones[ozi]) == z)
+	for (zi = 0; zi < f->zbd_info->num_write_zones; zi++) {
+		if (zbd_get_zone(f, f->zbd_info->write_zones[zi]) == z)
 			break;
 	}
-	if (ozi == f->zbd_info->num_open_zones)
+	if (zi == f->zbd_info->num_write_zones)
 		return;
 
-	dprint(FD_ZBD, "%s: closing zone %u\n",
+	dprint(FD_ZBD, "%s: removing zone %u from write zone array\n",
 	       f->file_name, zbd_zone_idx(f, z));
 
-	memmove(f->zbd_info->open_zones + ozi,
-		f->zbd_info->open_zones + ozi + 1,
-		(ZBD_MAX_OPEN_ZONES - (ozi + 1)) *
-		sizeof(f->zbd_info->open_zones[0]));
+	memmove(f->zbd_info->write_zones + zi,
+		f->zbd_info->write_zones + zi + 1,
+		(ZBD_MAX_WRITE_ZONES - (zi + 1)) *
+		sizeof(f->zbd_info->write_zones[0]));
 
-	f->zbd_info->num_open_zones--;
-	td->num_open_zones--;
-	z->open = 0;
+	f->zbd_info->num_write_zones--;
+	td->num_write_zones--;
+	z->write = 0;
 }
 
 /**
@@ -405,7 +405,7 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 
 		zone_lock(td, f, z);
 		pthread_mutex_lock(&f->zbd_info->mutex);
-		zbd_close_zone(td, f, z);
+		zbd_write_zone_put(td, f, z);
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 
 		if (z->wp != z->start) {
@@ -450,19 +450,19 @@ static int zbd_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 }
 
 /**
- * zbd_open_zone - Add a zone to the array of open zones.
+ * zbd_write_zone_get - Add a zone to the array of write zones.
  * @td: fio thread data.
- * @f: fio file that has the open zones to add.
+ * @f: fio file that has the write zones array to add.
  * @zone_idx: Index of the zone to add.
  *
- * Open a ZBD zone if it is not already open. Returns true if either the zone
- * was already open or if the zone was successfully added to the array of open
- * zones without exceeding the maximum number of open zones. Returns false if
- * the zone was not already open and opening the zone would cause the zone limit
- * to be exceeded.
+ * Add a ZBD zone to write target zones array, if it is not yet added. Returns
+ * true if either the zone was already added or if the zone was successfully
+ * added to the array without exceeding the maximum number of write zones.
+ * Returns false if the zone was not already added and addition of the zone
+ * would cause the zone limit to be exceeded.
  */
-static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
-			  struct fio_zone_info *z)
+static bool zbd_write_zone_get(struct thread_data *td, const struct fio_file *f,
+			       struct fio_zone_info *z)
 {
 	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
 	struct zoned_block_device_info *zbdi = f->zbd_info;
@@ -480,20 +480,20 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 		return false;
 
 	/*
-	 * zbdi->max_open_zones == 0 means that there is no limit on the maximum
-	 * number of open zones. In this case, do no track open zones in
-	 * zbdi->open_zones array.
+	 * zbdi->max_write_zones == 0 means that there is no limit on the
+	 * maximum number of write target zones. In this case, do no track write
+	 * target zones in zbdi->write_zones array.
 	 */
-	if (!zbdi->max_open_zones)
+	if (!zbdi->max_write_zones)
 		return true;
 
 	pthread_mutex_lock(&zbdi->mutex);
 
-	if (z->open) {
+	if (z->write) {
 		/*
 		 * If the zone is going to be completely filled by writes
-		 * already in-flight, handle it as a full zone instead of an
-		 * open zone.
+		 * already in-flight, handle it as a full zone instead of a
+		 * write target zone.
 		 */
 		if (!zbd_zone_remainder(z))
 			res = false;
@@ -503,17 +503,17 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 	res = false;
 	/* Zero means no limit */
 	if (td->o.job_max_open_zones > 0 &&
-	    td->num_open_zones >= td->o.job_max_open_zones)
+	    td->num_write_zones >= td->o.job_max_open_zones)
 		goto out;
-	if (zbdi->num_open_zones >= zbdi->max_open_zones)
+	if (zbdi->num_write_zones >= zbdi->max_write_zones)
 		goto out;
 
-	dprint(FD_ZBD, "%s: opening zone %u\n",
+	dprint(FD_ZBD, "%s: adding zone %u to write zone array\n",
 	       f->file_name, zone_idx);
 
-	zbdi->open_zones[zbdi->num_open_zones++] = zone_idx;
-	td->num_open_zones++;
-	z->open = 1;
+	zbdi->write_zones[zbdi->num_write_zones++] = zone_idx;
+	td->num_write_zones++;
+	z->write = 1;
 	res = true;
 
 out:
@@ -894,7 +894,7 @@ out:
 	return ret;
 }
 
-static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
+static int zbd_set_max_write_zones(struct thread_data *td, struct fio_file *f)
 {
 	struct zoned_block_device_info *zbd = f->zbd_info;
 	unsigned int max_open_zones;
@@ -902,7 +902,7 @@ static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
 
 	if (zbd->model != ZBD_HOST_MANAGED || td->o.ignore_zone_limits) {
 		/* Only host-managed devices have a max open limit */
-		zbd->max_open_zones = td->o.max_open_zones;
+		zbd->max_write_zones = td->o.max_open_zones;
 		goto out;
 	}
 
@@ -913,13 +913,13 @@ static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
 
 	if (!max_open_zones) {
 		/* No device limit */
-		zbd->max_open_zones = td->o.max_open_zones;
+		zbd->max_write_zones = td->o.max_open_zones;
 	} else if (!td->o.max_open_zones) {
 		/* No user limit. Set limit to device limit */
-		zbd->max_open_zones = max_open_zones;
+		zbd->max_write_zones = max_open_zones;
 	} else if (td->o.max_open_zones <= max_open_zones) {
 		/* Both user limit and dev limit. User limit not too large */
-		zbd->max_open_zones = td->o.max_open_zones;
+		zbd->max_write_zones = td->o.max_open_zones;
 	} else {
 		/* Both user limit and dev limit. User limit too large */
 		td_verror(td, EINVAL,
@@ -931,15 +931,15 @@ static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
 
 out:
 	/* Ensure that the limit is not larger than FIO's internal limit */
-	if (zbd->max_open_zones > ZBD_MAX_OPEN_ZONES) {
+	if (zbd->max_write_zones > ZBD_MAX_WRITE_ZONES) {
 		td_verror(td, EINVAL, "'max_open_zones' value is too large");
 		log_err("'max_open_zones' value is larger than %u\n",
-			ZBD_MAX_OPEN_ZONES);
+			ZBD_MAX_WRITE_ZONES);
 		return -EINVAL;
 	}
 
-	dprint(FD_ZBD, "%s: using max open zones limit: %"PRIu32"\n",
-	       f->file_name, zbd->max_open_zones);
+	dprint(FD_ZBD, "%s: using max write zones limit: %"PRIu32"\n",
+	       f->file_name, zbd->max_write_zones);
 
 	return 0;
 }
@@ -981,7 +981,7 @@ static int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
 	assert(f->zbd_info);
 	f->zbd_info->model = zbd_model;
 
-	ret = zbd_set_max_open_zones(td, f);
+	ret = zbd_set_max_write_zones(td, f);
 	if (ret) {
 		zbd_free_zone_info(f);
 		return ret;
@@ -1174,7 +1174,7 @@ int zbd_setup_files(struct thread_data *td)
 			assert(f->min_zone < f->max_zone);
 
 		if (td->o.max_open_zones > 0 &&
-		    zbd->max_open_zones != td->o.max_open_zones) {
+		    zbd->max_write_zones != td->o.max_open_zones) {
 			log_err("Different 'max_open_zones' values\n");
 			return 1;
 		}
@@ -1184,25 +1184,25 @@ int zbd_setup_files(struct thread_data *td)
 		 * global max open zones limit. (As the tracking of open zones
 		 * is disabled when there is no global max open zones limit.)
 		 */
-		if (td->o.job_max_open_zones && !zbd->max_open_zones) {
+		if (td->o.job_max_open_zones && !zbd->max_write_zones) {
 			log_err("'job_max_open_zones' cannot be used without a global open zones limit\n");
 			return 1;
 		}
 
 		/*
-		 * zbd->max_open_zones is the global limit shared for all jobs
+		 * zbd->max_write_zones is the global limit shared for all jobs
 		 * that target the same zoned block device. Force sync the per
 		 * thread global limit with the actual global limit. (The real
 		 * per thread/job limit is stored in td->o.job_max_open_zones).
 		 */
-		td->o.max_open_zones = zbd->max_open_zones;
+		td->o.max_open_zones = zbd->max_write_zones;
 
 		for (zi = f->min_zone; zi < f->max_zone; zi++) {
 			z = &zbd->zone_info[zi];
 			if (z->cond != ZBD_ZONE_COND_IMP_OPEN &&
 			    z->cond != ZBD_ZONE_COND_EXP_OPEN)
 				continue;
-			if (zbd_open_zone(td, f, z))
+			if (zbd_write_zone_get(td, f, z))
 				continue;
 			/*
 			 * If the number of open zones exceeds specified limits,
@@ -1284,12 +1284,12 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 	zbd_reset_write_cnt(td, f);
 }
 
-/* Return random zone index for one of the open zones. */
+/* Return random zone index for one of the write target zones. */
 static uint32_t pick_random_zone_idx(const struct fio_file *f,
 				     const struct io_u *io_u)
 {
 	return (io_u->offset - f->file_offset) *
-		f->zbd_info->num_open_zones / f->io_size;
+		f->zbd_info->num_write_zones / f->io_size;
 }
 
 static bool any_io_in_flight(void)
@@ -1303,35 +1303,35 @@ static bool any_io_in_flight(void)
 }
 
 /*
- * Modify the offset of an I/O unit that does not refer to an open zone such
- * that it refers to an open zone. Close an open zone and open a new zone if
- * necessary. The open zone is searched across sequential zones.
+ * Modify the offset of an I/O unit that does not refer to a zone such that
+ * in write target zones array. Add a zone to or remove a zone from the lsit if
+ * necessary. The write target zone is searched across sequential zones.
  * This algorithm can only work correctly if all write pointers are
  * a multiple of the fio block size. The caller must neither hold z->mutex
  * nor f->zbd_info->mutex. Returns with z->mutex held upon success.
  */
-static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
-						      struct io_u *io_u)
+static struct fio_zone_info *zbd_convert_to_write_zone(struct thread_data *td,
+						       struct io_u *io_u)
 {
 	const uint64_t min_bs = td->o.min_bs[io_u->ddir];
 	struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbdi = f->zbd_info;
 	struct fio_zone_info *z;
-	unsigned int open_zone_idx = -1;
+	unsigned int write_zone_idx = -1;
 	uint32_t zone_idx, new_zone_idx;
 	int i;
-	bool wait_zone_close;
+	bool wait_zone_write;
 	bool in_flight;
 	bool should_retry = true;
 
 	assert(is_valid_offset(f, io_u->offset));
 
-	if (zbdi->max_open_zones || td->o.job_max_open_zones) {
+	if (zbdi->max_write_zones || td->o.job_max_open_zones) {
 		/*
-		 * This statement accesses zbdi->open_zones[] on purpose
+		 * This statement accesses zbdi->write_zones[] on purpose
 		 * without locking.
 		 */
-		zone_idx = zbdi->open_zones[pick_random_zone_idx(f, io_u)];
+		zone_idx = zbdi->write_zones[pick_random_zone_idx(f, io_u)];
 	} else {
 		zone_idx = zbd_offset_to_zone_idx(f, io_u->offset);
 	}
@@ -1361,34 +1361,34 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 		if (z->has_wp) {
 			if (z->cond != ZBD_ZONE_COND_OFFLINE &&
-			    zbdi->max_open_zones == 0 &&
+			    zbdi->max_write_zones == 0 &&
 			    td->o.job_max_open_zones == 0)
 				goto examine_zone;
-			if (zbdi->num_open_zones == 0) {
-				dprint(FD_ZBD, "%s(%s): no zones are open\n",
+			if (zbdi->num_write_zones == 0) {
+				dprint(FD_ZBD, "%s(%s): no zone is write target\n",
 				       __func__, f->file_name);
-				goto open_other_zone;
+				goto choose_other_zone;
 			}
 		}
 
 		/*
-		 * List of opened zones is per-device, shared across all
+		 * Array of write target zones is per-device, shared across all
 		 * threads. Start with quasi-random candidate zone. Ignore
 		 * zones which don't belong to thread's offset/size area.
 		 */
-		open_zone_idx = pick_random_zone_idx(f, io_u);
-		assert(!open_zone_idx ||
-		       open_zone_idx < zbdi->num_open_zones);
-		tmp_idx = open_zone_idx;
+		write_zone_idx = pick_random_zone_idx(f, io_u);
+		assert(!write_zone_idx ||
+		       write_zone_idx < zbdi->num_write_zones);
+		tmp_idx = write_zone_idx;
 
-		for (i = 0; i < zbdi->num_open_zones; i++) {
+		for (i = 0; i < zbdi->num_write_zones; i++) {
 			uint32_t tmpz;
 
-			if (tmp_idx >= zbdi->num_open_zones)
+			if (tmp_idx >= zbdi->num_write_zones)
 				tmp_idx = 0;
-			tmpz = zbdi->open_zones[tmp_idx];
+			tmpz = zbdi->write_zones[tmp_idx];
 			if (f->min_zone <= tmpz && tmpz < f->max_zone) {
-				open_zone_idx = tmp_idx;
+				write_zone_idx = tmp_idx;
 				goto found_candidate_zone;
 			}
 
@@ -1406,7 +1406,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		return NULL;
 
 found_candidate_zone:
-		new_zone_idx = zbdi->open_zones[open_zone_idx];
+		new_zone_idx = zbdi->write_zones[write_zone_idx];
 		if (new_zone_idx == zone_idx)
 			break;
 		zone_idx = new_zone_idx;
@@ -1425,32 +1425,32 @@ examine_zone:
 		goto out;
 	}
 
-open_other_zone:
-	/* Check if number of open zones reaches one of limits. */
-	wait_zone_close =
-		zbdi->num_open_zones == f->max_zone - f->min_zone ||
-		(zbdi->max_open_zones &&
-		 zbdi->num_open_zones == zbdi->max_open_zones) ||
+choose_other_zone:
+	/* Check if number of write target zones reaches one of limits. */
+	wait_zone_write =
+		zbdi->num_write_zones == f->max_zone - f->min_zone ||
+		(zbdi->max_write_zones &&
+		 zbdi->num_write_zones == zbdi->max_write_zones) ||
 		(td->o.job_max_open_zones &&
-		 td->num_open_zones == td->o.job_max_open_zones);
+		 td->num_write_zones == td->o.job_max_open_zones);
 
 	pthread_mutex_unlock(&zbdi->mutex);
 
 	/* Only z->mutex is held. */
 
 	/*
-	 * When number of open zones reaches to one of limits, wait for
-	 * zone close before opening a new zone.
+	 * When number of write target zones reaches to one of limits, wait for
+	 * zone write completion to one of them before trying a new zone.
 	 */
-	if (wait_zone_close) {
+	if (wait_zone_write) {
 		dprint(FD_ZBD,
-		       "%s(%s): quiesce to allow open zones to close\n",
+		       "%s(%s): quiesce to remove a zone from write target zones array\n",
 		       __func__, f->file_name);
 		io_u_quiesce(td);
 	}
 
 retry:
-	/* Zone 'z' is full, so try to open a new zone. */
+	/* Zone 'z' is full, so try to choose a new zone. */
 	for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
 		zone_idx++;
 		if (z->has_wp)
@@ -1465,18 +1465,18 @@ retry:
 		if (!z->has_wp)
 			continue;
 		zone_lock(td, f, z);
-		if (z->open)
+		if (z->write)
 			continue;
-		if (zbd_open_zone(td, f, z))
+		if (zbd_write_zone_get(td, f, z))
 			goto out;
 	}
 
 	/* Only z->mutex is held. */
 
-	/* Check whether the write fits in any of the already opened zones. */
+	/* Check whether the write fits in any of the write target zones. */
 	pthread_mutex_lock(&zbdi->mutex);
-	for (i = 0; i < zbdi->num_open_zones; i++) {
-		zone_idx = zbdi->open_zones[i];
+	for (i = 0; i < zbdi->num_write_zones; i++) {
+		zone_idx = zbdi->write_zones[i];
 		if (zone_idx < f->min_zone || zone_idx >= f->max_zone)
 			continue;
 		pthread_mutex_unlock(&zbdi->mutex);
@@ -1492,13 +1492,14 @@ retry:
 
 	/*
 	 * When any I/O is in-flight or when all I/Os in-flight get completed,
-	 * the I/Os might have closed zones then retry the steps to open a zone.
-	 * Before retry, call io_u_quiesce() to complete in-flight writes.
+	 * the I/Os might have removed zones from the write target array then
+	 * retry the steps to choose a zone. Before retry, call io_u_quiesce()
+	 * to complete in-flight writes.
 	 */
 	in_flight = any_io_in_flight();
 	if (in_flight || should_retry) {
 		dprint(FD_ZBD,
-		       "%s(%s): wait zone close and retry open zones\n",
+		       "%s(%s): wait zone write and retry write target zone selection\n",
 		       __func__, f->file_name);
 		pthread_mutex_unlock(&zbdi->mutex);
 		zone_unlock(z);
@@ -1512,7 +1513,7 @@ retry:
 
 	zone_unlock(z);
 
-	dprint(FD_ZBD, "%s(%s): did not open another zone\n",
+	dprint(FD_ZBD, "%s(%s): did not choose another write zone\n",
 	       __func__, f->file_name);
 
 	return NULL;
@@ -1582,7 +1583,8 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u, uint64_t min_bytes,
  * @io_u: I/O unit
  * @z: zone info pointer
  *
- * If the write command made the zone full, close it.
+ * If the write command made the zone full, remove it from the write target
+ * zones array.
  *
  * The caller must hold z->mutex.
  */
@@ -1594,7 +1596,7 @@ static void zbd_end_zone_io(struct thread_data *td, const struct io_u *io_u,
 	if (io_u->ddir == DDIR_WRITE &&
 	    io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
 		pthread_mutex_lock(&f->zbd_info->mutex);
-		zbd_close_zone(td, f, z);
+		zbd_write_zone_put(td, f, z);
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 	}
 }
@@ -1954,7 +1956,7 @@ retry:
 		if (zbd_zone_remainder(zb) > 0 &&
 		    zbd_zone_remainder(zb) < min_bs) {
 			pthread_mutex_lock(&f->zbd_info->mutex);
-			zbd_close_zone(td, f, zb);
+			zbd_write_zone_put(td, f, zb);
 			pthread_mutex_unlock(&f->zbd_info->mutex);
 			dprint(FD_ZBD,
 			       "%s: finish zone %d\n",
@@ -1977,11 +1979,11 @@ retry:
 			zone_lock(td, f, zb);
 		}
 
-		if (!zbd_open_zone(td, f, zb)) {
+		if (!zbd_write_zone_get(td, f, zb)) {
 			zone_unlock(zb);
-			zb = zbd_convert_to_open_zone(td, io_u);
+			zb = zbd_convert_to_write_zone(td, io_u);
 			if (!zb) {
-				dprint(FD_IO, "%s: can't convert to open zone",
+				dprint(FD_IO, "%s: can't convert to write target zone",
 				       f->file_name);
 				goto eof;
 			}
diff --git a/zbd.h b/zbd.h
index 05189555..25a3e0f1 100644
--- a/zbd.h
+++ b/zbd.h
@@ -29,8 +29,8 @@ enum io_u_action {
  * @type: zone type (BLK_ZONE_TYPE_*)
  * @cond: zone state (BLK_ZONE_COND_*)
  * @has_wp: whether or not this zone can have a valid write pointer
- * @open: whether or not this zone is currently open. Only relevant if
- *		max_open_zones > 0.
+ * @write: whether or not this zone is the write target at this moment. Only
+ *              relevant if zbd->max_open_zones > 0.
  * @reset_zone: whether or not this zone should be reset before writing to it
  */
 struct fio_zone_info {
@@ -41,16 +41,17 @@ struct fio_zone_info {
 	enum zbd_zone_type	type:2;
 	enum zbd_zone_cond	cond:4;
 	unsigned int		has_wp:1;
-	unsigned int		open:1;
+	unsigned int		write:1;
 	unsigned int		reset_zone:1;
 };
 
 /**
  * zoned_block_device_info - zoned block device characteristics
  * @model: Device model.
- * @max_open_zones: global limit on the number of simultaneously opened
- *	sequential write zones. A zero value means unlimited open zones,
- *	and that open zones will not be tracked in the open_zones array.
+ * @max_write_zones: global limit on the number of sequential write zones which
+ *      are simultaneously written. A zero value means unlimited zones of
+ *      simultaneous writes and that write target zones will not be tracked in
+ *      the write_zones array.
  * @mutex: Protects the modifiable members in this structure (refcount and
  *		num_open_zones).
  * @zone_size: size of a single zone in bytes.
@@ -61,10 +62,10 @@ struct fio_zone_info {
  *		if the zone size is not a power of 2.
  * @nr_zones: number of zones
  * @refcount: number of fio files that share this structure
- * @num_open_zones: number of open zones
+ * @num_write_zones: number of write target zones
  * @write_cnt: Number of writes since the latest zone reset triggered by
  *	       the zone_reset_frequency fio job parameter.
- * @open_zones: zone numbers of open zones
+ * @write_zones: zone numbers of write target zones
  * @zone_info: description of the individual zones
  *
  * Only devices for which all zones have the same size are supported.
@@ -73,7 +74,7 @@ struct fio_zone_info {
  */
 struct zoned_block_device_info {
 	enum zbd_zoned_model	model;
-	uint32_t		max_open_zones;
+	uint32_t		max_write_zones;
 	pthread_mutex_t		mutex;
 	uint64_t		zone_size;
 	uint64_t		wp_valid_data_bytes;
@@ -82,9 +83,9 @@ struct zoned_block_device_info {
 	uint32_t		zone_size_log2;
 	uint32_t		nr_zones;
 	uint32_t		refcount;
-	uint32_t		num_open_zones;
+	uint32_t		num_write_zones;
 	uint32_t		write_cnt;
-	uint32_t		open_zones[ZBD_MAX_OPEN_ZONES];
+	uint32_t		write_zones[ZBD_MAX_WRITE_ZONES];
 	struct fio_zone_info	zone_info[0];
 };
 
diff --git a/zbd_types.h b/zbd_types.h
index 0a8630cb..5f44f308 100644
--- a/zbd_types.h
+++ b/zbd_types.h
@@ -8,7 +8,7 @@
 
 #include <inttypes.h>
 
-#define ZBD_MAX_OPEN_ZONES	4096
+#define ZBD_MAX_WRITE_ZONES	4096
 
 /*
  * Zoned block device models.
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/7] zbd: do not reset extra zones in open conditions
  2023-06-07  8:32 [PATCH 0/7] zbd: clean up code and fix bugs for open zones accounting Shin'ichiro Kawasaki
  2023-06-07  8:32 ` [PATCH 1/7] zbd: rename 'open zones' to 'write zones' Shin'ichiro Kawasaki
@ 2023-06-07  8:32 ` Shin'ichiro Kawasaki
  2023-06-07 13:15   ` Niklas Cassel
  2023-06-07  8:32 ` [PATCH 3/7] zbd: fix write zone accounting of almost full zones Shin'ichiro Kawasaki
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Shin'ichiro Kawasaki @ 2023-06-07  8:32 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shin'ichiro Kawasaki

The commit 954217b90191 ("zbd: Initialize open zones list referring zone
status at fio start") introduced zone resets for zones in open condition
which exceeds the limit of max_open_zones. However, this zone reset may
break data in the zones even when fio does no write to them. Avoid the
zone reset and report it as an error.

Fixes: 954217b90191 ("zbd: Initialize open zones list referring zone status at fio start")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 zbd.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/zbd.c b/zbd.c
index 7529e68b..832868cb 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1206,12 +1206,10 @@ int zbd_setup_files(struct thread_data *td)
 				continue;
 			/*
 			 * If the number of open zones exceeds specified limits,
-			 * reset all extra open zones.
+			 * error out.
 			 */
-			if (zbd_reset_zone(td, f, z) < 0) {
-				log_err("Failed to reest zone %d\n", zi);
-				return 1;
-			}
+			log_err("Number of open zones exceeds max_open_zones limit\n");
+			return 1;
 		}
 	}
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/7] zbd: fix write zone accounting of almost full zones
  2023-06-07  8:32 [PATCH 0/7] zbd: clean up code and fix bugs for open zones accounting Shin'ichiro Kawasaki
  2023-06-07  8:32 ` [PATCH 1/7] zbd: rename 'open zones' to 'write zones' Shin'ichiro Kawasaki
  2023-06-07  8:32 ` [PATCH 2/7] zbd: do not reset extra zones in open conditions Shin'ichiro Kawasaki
@ 2023-06-07  8:32 ` Shin'ichiro Kawasaki
  2023-06-07 13:15   ` Niklas Cassel
  2023-06-07  8:32 ` [PATCH 4/7] zbd: fix write zone accounting of trim workload Shin'ichiro Kawasaki
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Shin'ichiro Kawasaki @ 2023-06-07  8:32 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shin'ichiro Kawasaki

For zonemode=zbd, fio checks condition of each zone and account it as
write target zone if it has open condition. However, when such a zone in
open condition is almost full and its remainder area for write is
smaller than the block size, fio does not handle it as a write target
zone. This causes difference between open zones accounting on the device
and write target zones accounting by fio. It results in unexpected
max_open_zones limit failure.

Avoid the zone accounting difference by handling the almost full zones
as write target zones at fio start. Introduce the helper function
__zbd_write_zone_get() which does same operation as zbd_write_zone_get()
except the check for the almost full zones. At fio start, call
__zbd_write_zone_get() so that almost full zones are added to write
target zones. During fio workload run, call zbd_write_zone_get() so that
the almost full zones are not chosen for write target.

Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 zbd.c | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/zbd.c b/zbd.c
index 832868cb..36b68d62 100644
--- a/zbd.c
+++ b/zbd.c
@@ -450,21 +450,19 @@ static int zbd_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 }
 
 /**
- * zbd_write_zone_get - Add a zone to the array of write zones.
+ * __zbd_write_zone_get - Add a zone to the array of write zones.
  * @td: fio thread data.
  * @f: fio file that has the write zones array to add.
  * @zone_idx: Index of the zone to add.
  *
- * Add a ZBD zone to write target zones array, if it is not yet added. Returns
- * true if either the zone was already added or if the zone was successfully
- * added to the array without exceeding the maximum number of write zones.
- * Returns false if the zone was not already added and addition of the zone
- * would cause the zone limit to be exceeded.
+ * Do same operation as @zbd_write_zone_get, except it adds the zone at
+ * @zone_idx to write target zones array even when it does not have remainder
+ * space to write one block.
  */
-static bool zbd_write_zone_get(struct thread_data *td, const struct fio_file *f,
-			       struct fio_zone_info *z)
+static bool __zbd_write_zone_get(struct thread_data *td,
+				 const struct fio_file *f,
+				 struct fio_zone_info *z)
 {
-	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
 	struct zoned_block_device_info *zbdi = f->zbd_info;
 	uint32_t zone_idx = zbd_zone_idx(f, z);
 	bool res = true;
@@ -476,7 +474,7 @@ static bool zbd_write_zone_get(struct thread_data *td, const struct fio_file *f,
 	 * Skip full zones with data verification enabled because resetting a
 	 * zone causes data loss and hence causes verification to fail.
 	 */
-	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
+	if (td->o.verify != VERIFY_NONE && zbd_zone_remainder(z) == 0)
 		return false;
 
 	/*
@@ -521,6 +519,33 @@ out:
 	return res;
 }
 
+/**
+ * zbd_write_zone_get - Add a zone to the array of write zones.
+ * @td: fio thread data.
+ * @f: fio file that has the open zones to add.
+ * @zone_idx: Index of the zone to add.
+ *
+ * Add a ZBD zone to write target zones array, if it is not yet added. Returns
+ * true if either the zone was already added or if the zone was successfully
+ * added to the array without exceeding the maximum number of write zones.
+ * Returns false if the zone was not already added and addition of the zone
+ * would cause the zone limit to be exceeded.
+ */
+static bool zbd_write_zone_get(struct thread_data *td, const struct fio_file *f,
+			       struct fio_zone_info *z)
+{
+	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
+
+	/*
+	 * Skip full zones with data verification enabled because resetting a
+	 * zone causes data loss and hence causes verification to fail.
+	 */
+	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
+		return false;
+
+	return __zbd_write_zone_get(td, f, z);
+}
+
 /* Verify whether direct I/O is used for all host-managed zoned block drives. */
 static bool zbd_using_direct_io(void)
 {
@@ -1202,7 +1227,7 @@ int zbd_setup_files(struct thread_data *td)
 			if (z->cond != ZBD_ZONE_COND_IMP_OPEN &&
 			    z->cond != ZBD_ZONE_COND_EXP_OPEN)
 				continue;
-			if (zbd_write_zone_get(td, f, z))
+			if (__zbd_write_zone_get(td, f, z))
 				continue;
 			/*
 			 * If the number of open zones exceeds specified limits,
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/7] zbd: fix write zone accounting of trim workload
  2023-06-07  8:32 [PATCH 0/7] zbd: clean up code and fix bugs for open zones accounting Shin'ichiro Kawasaki
                   ` (2 preceding siblings ...)
  2023-06-07  8:32 ` [PATCH 3/7] zbd: fix write zone accounting of almost full zones Shin'ichiro Kawasaki
@ 2023-06-07  8:32 ` Shin'ichiro Kawasaki
  2023-06-07 13:15   ` Niklas Cassel
  2023-06-07 15:32   ` Jens Axboe
  2023-06-07  8:32 ` [PATCH 5/7] t/zbd: reset zones before tests with max_open_zones option Shin'ichiro Kawasaki
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 16+ messages in thread
From: Shin'ichiro Kawasaki @ 2023-06-07  8:32 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shin'ichiro Kawasaki

The commit e3be810bf0fd ("zbd: Support zone reset by trim") supported
trim for zonemode=zbd by introducing the function zbd_do_io_u_trim(),
which calls zbd_reset_zone(). However, it did not call
zbd_write_zone_put() to the trim target zone, then trim operation
resulted in wrong accounting of write zones.

To fix the issue, call zbd_write_zone_put() from zbd_reset_zone(). To
cover the case to reset zones without a zbd_write_zone_put() call,
prepare another function __zbd_reset_zone(). While at it, simplify
zbd_reset_zones() by calling the modified zbd_reset_zone().

Of note is that the constifier of the argument td of do_io_u_trim() is
removed since zbd_write_zone_put() requires changes in that argument.

Fixes: e3be810bf0fd ("zbd: Support zone reset by trim")
Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 engines/io_uring.c |  2 +-
 io_u.c             |  2 +-
 io_u.h             |  2 +-
 zbd.c              | 40 ++++++++++++++++++++++++++++++++--------
 zbd.h              |  2 +-
 5 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/engines/io_uring.c b/engines/io_uring.c
index ff64fc9f..73e4a27a 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -561,7 +561,7 @@ static inline void fio_ioring_cmdprio_prep(struct thread_data *td,
 		ld->sqes[io_u->index].ioprio = io_u->ioprio;
 }
 
-static int fio_ioring_cmd_io_u_trim(const struct thread_data *td,
+static int fio_ioring_cmd_io_u_trim(struct thread_data *td,
 				    struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
diff --git a/io_u.c b/io_u.c
index 6f5fc94d..faf512e5 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2379,7 +2379,7 @@ int do_io_u_sync(const struct thread_data *td, struct io_u *io_u)
 	return ret;
 }
 
-int do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
+int do_io_u_trim(struct thread_data *td, struct io_u *io_u)
 {
 #ifndef FIO_HAVE_TRIM
 	io_u->error = EINVAL;
diff --git a/io_u.h b/io_u.h
index 55b4d083..b432a540 100644
--- a/io_u.h
+++ b/io_u.h
@@ -162,7 +162,7 @@ void io_u_mark_submit(struct thread_data *, unsigned int);
 bool queue_full(const struct thread_data *);
 
 int do_io_u_sync(const struct thread_data *, struct io_u *);
-int do_io_u_trim(const struct thread_data *, struct io_u *);
+int do_io_u_trim(struct thread_data *, struct io_u *);
 
 #ifdef FIO_INC_DEBUG
 static inline void dprint_io_u(struct io_u *io_u, const char *p)
diff --git a/zbd.c b/zbd.c
index 36b68d62..7eef49c6 100644
--- a/zbd.c
+++ b/zbd.c
@@ -254,7 +254,7 @@ static int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
 }
 
 /**
- * zbd_reset_zone - reset the write pointer of a single zone
+ * __zbd_reset_zone - reset the write pointer of a single zone
  * @td: FIO thread data.
  * @f: FIO file associated with the disk for which to reset a write pointer.
  * @z: Zone to reset.
@@ -263,8 +263,8 @@ static int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
  *
  * The caller must hold z->mutex.
  */
-static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
-			  struct fio_zone_info *z)
+static int __zbd_reset_zone(struct thread_data *td, struct fio_file *f,
+			    struct fio_zone_info *z)
 {
 	uint64_t offset = z->start;
 	uint64_t length = (z+1)->start - offset;
@@ -339,6 +339,33 @@ static void zbd_write_zone_put(struct thread_data *td, const struct fio_file *f,
 	z->write = 0;
 }
 
+/**
+ * zbd_reset_zone - reset the write pointer of a single zone and remove the zone
+ *                  from the array of write zones.
+ * @td: FIO thread data.
+ * @f: FIO file associated with the disk for which to reset a write pointer.
+ * @z: Zone to reset.
+ *
+ * Returns 0 upon success and a negative error code upon failure.
+ * The caller must hold z->mutex.
+ */
+static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
+			  struct fio_zone_info *z)
+{
+	int ret;
+
+	ret = __zbd_reset_zone(td, f, z);
+	if (ret)
+		goto done;
+
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	zbd_write_zone_put(td, f, z);
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+
+done:
+	return ret;
+}
+
 /**
  * zbd_finish_zone - finish the specified zone
  * @td: FIO thread data.
@@ -404,9 +431,6 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 			continue;
 
 		zone_lock(td, f, z);
-		pthread_mutex_lock(&f->zbd_info->mutex);
-		zbd_write_zone_put(td, f, z);
-		pthread_mutex_unlock(&f->zbd_info->mutex);
 
 		if (z->wp != z->start) {
 			dprint(FD_ZBD, "%s: resetting zone %u\n",
@@ -2048,7 +2072,7 @@ retry:
 			 */
 			io_u_quiesce(td);
 			zb->reset_zone = 0;
-			if (zbd_reset_zone(td, f, zb) < 0)
+			if (__zbd_reset_zone(td, f, zb) < 0)
 				goto eof;
 
 			if (zb->capacity < min_bs) {
@@ -2167,7 +2191,7 @@ char *zbd_write_status(const struct thread_stat *ts)
  * Return io_u_completed when reset zone succeeds. Return 0 when the target zone
  * does not have write pointer. On error, return negative errno.
  */
-int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
+int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	struct fio_zone_info *z;
diff --git a/zbd.h b/zbd.h
index 25a3e0f1..f0ac9876 100644
--- a/zbd.h
+++ b/zbd.h
@@ -100,7 +100,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 			      enum fio_ddir ddir);
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
-int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u);
+int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u);
 
 static inline void zbd_close_file(struct fio_file *f)
 {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/7] t/zbd: reset zones before tests with max_open_zones option
  2023-06-07  8:32 [PATCH 0/7] zbd: clean up code and fix bugs for open zones accounting Shin'ichiro Kawasaki
                   ` (3 preceding siblings ...)
  2023-06-07  8:32 ` [PATCH 4/7] zbd: fix write zone accounting of trim workload Shin'ichiro Kawasaki
@ 2023-06-07  8:32 ` Shin'ichiro Kawasaki
  2023-06-07  8:32 ` [PATCH 6/7] t/zbd: test write zone accounting of almost full zones Shin'ichiro Kawasaki
  2023-06-07  8:32 ` [PATCH 7/7] t/zbd: test write zone accounting of trim workload Shin'ichiro Kawasaki
  6 siblings, 0 replies; 16+ messages in thread
From: Shin'ichiro Kawasaki @ 2023-06-07  8:32 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shin'ichiro Kawasaki

After the recent fix, fio no longer resets zones when it finds more
zones in open condition than the max_open_zones option. This results in
failure of test cases 12, 13, 29, 32, 48 and 51. To avoid the failures,
reset zones at the beginning of the test cases.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/test-zbd-support | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 996160e7..86577952 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -460,7 +460,8 @@ test11() {
 test12() {
     local size off capacity
 
-    prep_write
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     size=$((8 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 8 $off $dev)
@@ -477,7 +478,8 @@ test13() {
 
     require_max_open_zones 4 || return $SKIP_TESTCASE
 
-    prep_write
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     size=$((8 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 8 $off $dev)
@@ -726,7 +728,9 @@ test29() {
     require_seq_zones 80 || return $SKIP_TESTCASE
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
-    prep_write
+
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     opts=("--debug=zbd")
     for ((i=0;i<jobs;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
@@ -796,7 +800,8 @@ test32() {
 
     require_zbd || return $SKIP_TESTCASE
 
-    prep_write
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     off=$((first_sequential_zone_sector * 512))
     size=$((disk_size - off))
     opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
@@ -1024,7 +1029,9 @@ test48() {
 
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
-    prep_write
+
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     opts=("--aux-path=/tmp" "--allow_file_create=0" "--significant_figures=10")
     opts+=("--debug=zbd")
     opts+=("$(ioengine "libaio")" "--rw=randwrite" "--direct=1")
@@ -1094,7 +1101,7 @@ test51() {
 	require_conv_zones 8 || return $SKIP_TESTCASE
 	require_seq_zones 8 || return $SKIP_TESTCASE
 
-	prep_write
+	reset_zone "$dev" -1
 
 	off=$((first_sequential_zone_sector * 512 - 8 * zone_size))
 	opts+=("--size=$((16 * zone_size))" "$(ioengine "libaio")")
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/7] t/zbd: test write zone accounting of almost full zones
  2023-06-07  8:32 [PATCH 0/7] zbd: clean up code and fix bugs for open zones accounting Shin'ichiro Kawasaki
                   ` (4 preceding siblings ...)
  2023-06-07  8:32 ` [PATCH 5/7] t/zbd: reset zones before tests with max_open_zones option Shin'ichiro Kawasaki
@ 2023-06-07  8:32 ` Shin'ichiro Kawasaki
  2023-06-07  8:32 ` [PATCH 7/7] t/zbd: test write zone accounting of trim workload Shin'ichiro Kawasaki
  6 siblings, 0 replies; 16+ messages in thread
From: Shin'ichiro Kawasaki @ 2023-06-07  8:32 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shin'ichiro Kawasaki

Recent commit fixed the bug of the write zone accounting for almost full
zones. Add a test case which confirms the fix.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/test-zbd-support | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 86577952..cdaa0574 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1368,6 +1368,27 @@ test63() {
 	check_reset_count -eq 3 || return $?
 }
 
+# Test write zone accounting handles almost full zones correctly. Prepare an
+# almost full, but not full zone. Write to the zone with verify using larger
+# block size. Then confirm fio does not report write zone accounting failure.
+test64() {
+	local bs cap
+
+	[ -n "$is_zbd" ] && reset_zone "$dev" -1
+
+	bs=$((zone_size / 8))
+	cap=$(total_zone_capacity 1 $((first_sequential_zone_sector*512)) $dev)
+	run_fio_on_seq "$(ioengine "psync")" --rw=write --bs="$bs" \
+		       --size=$((zone_size)) \
+		       --io_size=$((cap - bs)) \
+		       >> "${logfile}.${test_number}" 2>&1 || return $?
+
+	bs=$((zone_size / 2))
+	run_fio_on_seq "$(ioengine "psync")" --rw=write --bs="$bs" \
+		       --size=$((zone_size)) --do_verify=1 --verify=md5 \
+		       >> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/7] t/zbd: test write zone accounting of trim workload
  2023-06-07  8:32 [PATCH 0/7] zbd: clean up code and fix bugs for open zones accounting Shin'ichiro Kawasaki
                   ` (5 preceding siblings ...)
  2023-06-07  8:32 ` [PATCH 6/7] t/zbd: test write zone accounting of almost full zones Shin'ichiro Kawasaki
@ 2023-06-07  8:32 ` Shin'ichiro Kawasaki
  6 siblings, 0 replies; 16+ messages in thread
From: Shin'ichiro Kawasaki @ 2023-06-07  8:32 UTC (permalink / raw)
  To: fio, Jens Axboe, Vincent Fu
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shin'ichiro Kawasaki

Recent commit fixed the bug of the write zone accounting of trim
workload. Add a test case which confirms the fix.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 t/zbd/test-zbd-support | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index cdaa0574..a3d37a7d 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1389,6 +1389,30 @@ test64() {
 		       >> "${logfile}.${test_number}" 2>&1 || return $?
 }
 
+# Test open zone accounting handles trim workload correctly. Prepare open zones
+# as many as max_open_zones=4. Trim one of the 4 zones. Then write to another
+# zone and check the write amount is expected size.
+test65() {
+	local off capacity
+
+	[ -n "$is_zbd" ] && reset_zone "$dev" -1
+
+	off=$((first_sequential_zone_sector * 512))
+	capacity=$(total_zone_capacity 1 $off "$dev")
+	run_fio --zonemode=zbd --direct=1 --zonesize="$zone_size" --thread=1 \
+		--filename="$dev" --group_reporting=1 --max_open_zones=4 \
+		"$(ioengine "psync")" \
+		--name="prep_open_zones" --rw=randwrite --offset="$off" \
+		--size="$((zone_size * 4))" --bs=4096 --io_size="$zone_size" \
+		--name=trimjob --wait_for="prep_open_zones" --rw=trim \
+		--bs="$zone_size" --offset="$off" --size="$zone_size" \
+		--name=write --wait_for="trimjob" --rw=write --bs=4096 \
+		--offset="$((off + zone_size * 4))" --size="$zone_size" \
+		>> "${logfile}.${test_number}" 2>&1
+
+	check_written $((zone_size + capacity))
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] zbd: rename 'open zones' to 'write zones'
  2023-06-07  8:32 ` [PATCH 1/7] zbd: rename 'open zones' to 'write zones' Shin'ichiro Kawasaki
@ 2023-06-07 13:15   ` Niklas Cassel
  0 siblings, 0 replies; 16+ messages in thread
From: Niklas Cassel @ 2023-06-07 13:15 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: fio, Jens Axboe, Vincent Fu, Damien Le Moal, Dmitry Fomichev

On Wed, Jun 07, 2023 at 05:32:24PM +0900, Shin'ichiro Kawasaki wrote:
> Current fio code for zonemode=zbd uses the word 'open zone' to mean the
> zones that fio jobs write to. Before fio starts writing to a zone, it
> calls zbd_open_zone(). When fio completes writing to a zone, it calls
> zbd_close_zone(). This wording is good for zoned block devices with
> max_open_zones limit, such as ZBC and ZAC devices. The devices use same
> word 'open' to express the zone condition that the devices assign
> resources for data write to zones. However, the word 'open' gets
> confusing to support zoned block devices which has max_active_zones
> limit, such as ZNS devices. These devices have both 'open' and 'active'
> keywords to mean two different kinds of resources on the device. This
> 'active' status does not fit with the 'open zone' wording in the fio
> code. Also, the word 'open' zone in fio code does not always match with
> the 'open' condition of zones on the device (e.g. when
> --ignore_zone_limits option is specified).
> 
> To avoid the confusion, stop using the word 'open zone' in the fio code.
> Instead, use the word 'write zone' to mean that the zone is the write
> target. When fio starts a write to a zone, it adds the zone to
> write_zones array. When fio completes writing to a zone, it removes the
> zone from the write_zones array. For this purpose, rename struct fields,
> functions and a macro:
> 
>   ZBD_MAX_OPEN_ZONES -> ZBD_MAX_WRITE_ZONES
>   struct fio_zone_info
>     open -> write
>   struct thread_data
>     num_open_zones -> num_write_zones
>   struct zoned_block_device_info:
>     max_open_zones -> max_write_zones
>     num_open_zones -> num_write_zones
>     open_zones[] -> write_zones[]
>   zbd_open_zone() -> zbd_write_zone_get()
>   zbd_close_zone() -> zbd_write_zone_put()
>   zbd_convert_to_open_zone() -> zbd_convert_to_write_zone()
> 
> To match up these changes, rename local variables and goto labels. Also
> rephrase code comments.
> 
> Of note is that this rename is only for the fio code. The fio options
> max_open_zones and job_max_open_zones are not renamed to not confuse
> users.
> 
> Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---

Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/7] zbd: do not reset extra zones in open conditions
  2023-06-07  8:32 ` [PATCH 2/7] zbd: do not reset extra zones in open conditions Shin'ichiro Kawasaki
@ 2023-06-07 13:15   ` Niklas Cassel
  0 siblings, 0 replies; 16+ messages in thread
From: Niklas Cassel @ 2023-06-07 13:15 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: fio, Jens Axboe, Vincent Fu, Damien Le Moal, Dmitry Fomichev

On Wed, Jun 07, 2023 at 05:32:25PM +0900, Shin'ichiro Kawasaki wrote:
> The commit 954217b90191 ("zbd: Initialize open zones list referring zone
> status at fio start") introduced zone resets for zones in open condition
> which exceeds the limit of max_open_zones. However, this zone reset may
> break data in the zones even when fio does no write to them. Avoid the
> zone reset and report it as an error.
> 
> Fixes: 954217b90191 ("zbd: Initialize open zones list referring zone status at fio start")
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  zbd.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/zbd.c b/zbd.c
> index 7529e68b..832868cb 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -1206,12 +1206,10 @@ int zbd_setup_files(struct thread_data *td)
>  				continue;
>  			/*
>  			 * If the number of open zones exceeds specified limits,
> -			 * reset all extra open zones.
> +			 * error out.
>  			 */
> -			if (zbd_reset_zone(td, f, z) < 0) {
> -				log_err("Failed to reest zone %d\n", zi);
> -				return 1;
> -			}
> +			log_err("Number of open zones exceeds max_open_zones limit\n");
> +			return 1;
>  		}
>  	}
>  
> -- 
> 2.40.1
> 

Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/7] zbd: fix write zone accounting of almost full zones
  2023-06-07  8:32 ` [PATCH 3/7] zbd: fix write zone accounting of almost full zones Shin'ichiro Kawasaki
@ 2023-06-07 13:15   ` Niklas Cassel
  0 siblings, 0 replies; 16+ messages in thread
From: Niklas Cassel @ 2023-06-07 13:15 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: fio, Jens Axboe, Vincent Fu, Damien Le Moal, Dmitry Fomichev

On Wed, Jun 07, 2023 at 05:32:26PM +0900, Shin'ichiro Kawasaki wrote:
> For zonemode=zbd, fio checks condition of each zone and account it as
> write target zone if it has open condition. However, when such a zone in
> open condition is almost full and its remainder area for write is
> smaller than the block size, fio does not handle it as a write target
> zone. This causes difference between open zones accounting on the device
> and write target zones accounting by fio. It results in unexpected
> max_open_zones limit failure.
> 
> Avoid the zone accounting difference by handling the almost full zones
> as write target zones at fio start. Introduce the helper function
> __zbd_write_zone_get() which does same operation as zbd_write_zone_get()
> except the check for the almost full zones. At fio start, call
> __zbd_write_zone_get() so that almost full zones are added to write
> target zones. During fio workload run, call zbd_write_zone_get() so that
> the almost full zones are not chosen for write target.
> 
> Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---

Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/7] zbd: fix write zone accounting of trim workload
  2023-06-07  8:32 ` [PATCH 4/7] zbd: fix write zone accounting of trim workload Shin'ichiro Kawasaki
@ 2023-06-07 13:15   ` Niklas Cassel
  2023-06-08  5:48     ` Shinichiro Kawasaki
  2023-06-07 15:32   ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread
From: Niklas Cassel @ 2023-06-07 13:15 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: fio, Jens Axboe, Vincent Fu, Damien Le Moal, Dmitry Fomichev

On Wed, Jun 07, 2023 at 05:32:27PM +0900, Shin'ichiro Kawasaki wrote:
> The commit e3be810bf0fd ("zbd: Support zone reset by trim") supported
> trim for zonemode=zbd by introducing the function zbd_do_io_u_trim(),
> which calls zbd_reset_zone(). However, it did not call
> zbd_write_zone_put() to the trim target zone, then trim operation
> resulted in wrong accounting of write zones.
> 
> To fix the issue, call zbd_write_zone_put() from zbd_reset_zone(). To
> cover the case to reset zones without a zbd_write_zone_put() call,
> prepare another function __zbd_reset_zone(). While at it, simplify
> zbd_reset_zones() by calling the modified zbd_reset_zone().
> 
> Of note is that the constifier of the argument td of do_io_u_trim() is
> removed since zbd_write_zone_put() requires changes in that argument.
> 
> Fixes: e3be810bf0fd ("zbd: Support zone reset by trim")
> Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  engines/io_uring.c |  2 +-
>  io_u.c             |  2 +-
>  io_u.h             |  2 +-
>  zbd.c              | 40 ++++++++++++++++++++++++++++++++--------
>  zbd.h              |  2 +-
>  5 files changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/engines/io_uring.c b/engines/io_uring.c
> index ff64fc9f..73e4a27a 100644
> --- a/engines/io_uring.c
> +++ b/engines/io_uring.c
> @@ -561,7 +561,7 @@ static inline void fio_ioring_cmdprio_prep(struct thread_data *td,
>  		ld->sqes[io_u->index].ioprio = io_u->ioprio;
>  }
>  
> -static int fio_ioring_cmd_io_u_trim(const struct thread_data *td,
> +static int fio_ioring_cmd_io_u_trim(struct thread_data *td,
>  				    struct io_u *io_u)
>  {
>  	struct fio_file *f = io_u->file;
> diff --git a/io_u.c b/io_u.c
> index 6f5fc94d..faf512e5 100644
> --- a/io_u.c
> +++ b/io_u.c
> @@ -2379,7 +2379,7 @@ int do_io_u_sync(const struct thread_data *td, struct io_u *io_u)
>  	return ret;
>  }
>  
> -int do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
> +int do_io_u_trim(struct thread_data *td, struct io_u *io_u)
>  {
>  #ifndef FIO_HAVE_TRIM
>  	io_u->error = EINVAL;
> diff --git a/io_u.h b/io_u.h
> index 55b4d083..b432a540 100644
> --- a/io_u.h
> +++ b/io_u.h
> @@ -162,7 +162,7 @@ void io_u_mark_submit(struct thread_data *, unsigned int);
>  bool queue_full(const struct thread_data *);
>  
>  int do_io_u_sync(const struct thread_data *, struct io_u *);
> -int do_io_u_trim(const struct thread_data *, struct io_u *);
> +int do_io_u_trim(struct thread_data *, struct io_u *);
>  
>  #ifdef FIO_INC_DEBUG
>  static inline void dprint_io_u(struct io_u *io_u, const char *p)
> diff --git a/zbd.c b/zbd.c
> index 36b68d62..7eef49c6 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -254,7 +254,7 @@ static int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
>  }
>  
>  /**
> - * zbd_reset_zone - reset the write pointer of a single zone
> + * __zbd_reset_zone - reset the write pointer of a single zone
>   * @td: FIO thread data.
>   * @f: FIO file associated with the disk for which to reset a write pointer.
>   * @z: Zone to reset.
> @@ -263,8 +263,8 @@ static int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
>   *
>   * The caller must hold z->mutex.
>   */
> -static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
> -			  struct fio_zone_info *z)
> +static int __zbd_reset_zone(struct thread_data *td, struct fio_file *f,
> +			    struct fio_zone_info *z)
>  {
>  	uint64_t offset = z->start;
>  	uint64_t length = (z+1)->start - offset;
> @@ -339,6 +339,33 @@ static void zbd_write_zone_put(struct thread_data *td, const struct fio_file *f,
>  	z->write = 0;
>  }
>  
> +/**
> + * zbd_reset_zone - reset the write pointer of a single zone and remove the zone
> + *                  from the array of write zones.
> + * @td: FIO thread data.
> + * @f: FIO file associated with the disk for which to reset a write pointer.
> + * @z: Zone to reset.
> + *
> + * Returns 0 upon success and a negative error code upon failure.

Please add an empty line before the "The caller must hold z->mutex" line,
such that we are consistent with other functions, e.g. __zbd_reset_zone().

> + * The caller must hold z->mutex.
> + */
> +static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
> +			  struct fio_zone_info *z)
> +{
> +	int ret;
> +
> +	ret = __zbd_reset_zone(td, f, z);
> +	if (ret)
> +		goto done;
> +
> +	pthread_mutex_lock(&f->zbd_info->mutex);
> +	zbd_write_zone_put(td, f, z);
> +	pthread_mutex_unlock(&f->zbd_info->mutex);
> +
> +done:
> +	return ret;
> +}
> +
>  /**
>   * zbd_finish_zone - finish the specified zone
>   * @td: FIO thread data.
> @@ -404,9 +431,6 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
>  			continue;
>  
>  		zone_lock(td, f, z);
> -		pthread_mutex_lock(&f->zbd_info->mutex);
> -		zbd_write_zone_put(td, f, z);
> -		pthread_mutex_unlock(&f->zbd_info->mutex);
>  
>  		if (z->wp != z->start) {
>  			dprint(FD_ZBD, "%s: resetting zone %u\n",
> @@ -2048,7 +2072,7 @@ retry:
>  			 */
>  			io_u_quiesce(td);
>  			zb->reset_zone = 0;
> -			if (zbd_reset_zone(td, f, zb) < 0)
> +			if (__zbd_reset_zone(td, f, zb) < 0)

Since you remove the zbd_write_zone_put() call, which I think is good,
I think that you should continue to call zbd_reset_zone() here.

>  				goto eof;
>  
>  			if (zb->capacity < min_bs) {
> @@ -2167,7 +2191,7 @@ char *zbd_write_status(const struct thread_stat *ts)
>   * Return io_u_completed when reset zone succeeds. Return 0 when the target zone
>   * does not have write pointer. On error, return negative errno.
>   */
> -int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
> +int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u)
>  {
>  	struct fio_file *f = io_u->file;
>  	struct fio_zone_info *z;
> diff --git a/zbd.h b/zbd.h
> index 25a3e0f1..f0ac9876 100644
> --- a/zbd.h
> +++ b/zbd.h
> @@ -100,7 +100,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
>  			      enum fio_ddir ddir);
>  enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
>  char *zbd_write_status(const struct thread_stat *ts);
> -int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u);
> +int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u);
>  
>  static inline void zbd_close_file(struct fio_file *f)
>  {
> -- 
> 2.40.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/7] zbd: fix write zone accounting of trim workload
  2023-06-07  8:32 ` [PATCH 4/7] zbd: fix write zone accounting of trim workload Shin'ichiro Kawasaki
  2023-06-07 13:15   ` Niklas Cassel
@ 2023-06-07 15:32   ` Jens Axboe
  2023-06-08  5:49     ` Shinichiro Kawasaki
  1 sibling, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2023-06-07 15:32 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, fio, Vincent Fu
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel

On 6/7/23 2:32?AM, Shin'ichiro Kawasaki wrote:
> @@ -339,6 +339,33 @@ static void zbd_write_zone_put(struct thread_data *td, const struct fio_file *f,
>  	z->write = 0;
>  }
>  
> +/**
> + * zbd_reset_zone - reset the write pointer of a single zone and remove the zone
> + *                  from the array of write zones.
> + * @td: FIO thread data.
> + * @f: FIO file associated with the disk for which to reset a write pointer.
> + * @z: Zone to reset.
> + *
> + * Returns 0 upon success and a negative error code upon failure.
> + * The caller must hold z->mutex.
> + */
> +static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
> +			  struct fio_zone_info *z)
> +{
> +	int ret;
> +
> +	ret = __zbd_reset_zone(td, f, z);
> +	if (ret)
> +		goto done;
> +
> +	pthread_mutex_lock(&f->zbd_info->mutex);
> +	zbd_write_zone_put(td, f, z);
> +	pthread_mutex_unlock(&f->zbd_info->mutex);
> +
> +done:
> +	return ret;
> +}

This would look better as:

{
	int ret;

	ret = __zbd_reset_zone(td, f, z);
	if (ret)
		return ret;

	pthread_mutex_lock(&f->zbd_info->mutex);
	zbd_write_zone_put(td, f, z);
	pthread_mutex_unlock(&f->zbd_info->mutex);
	return 0;
}

to avoid that goto. Goto only makes sense if there's common cleanup to
do for multiple error cases, not for a single failure where you can just
return the error.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/7] zbd: fix write zone accounting of trim workload
  2023-06-07 13:15   ` Niklas Cassel
@ 2023-06-08  5:48     ` Shinichiro Kawasaki
  2023-06-08  8:10       ` Niklas Cassel
  0 siblings, 1 reply; 16+ messages in thread
From: Shinichiro Kawasaki @ 2023-06-08  5:48 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: fio, Jens Axboe, Vincent Fu, Damien Le Moal, Dmitry Fomichev

On Jun 07, 2023 / 13:15, Niklas Cassel wrote:
> On Wed, Jun 07, 2023 at 05:32:27PM +0900, Shin'ichiro Kawasaki wrote:
> > The commit e3be810bf0fd ("zbd: Support zone reset by trim") supported
> > trim for zonemode=zbd by introducing the function zbd_do_io_u_trim(),
> > which calls zbd_reset_zone(). However, it did not call
> > zbd_write_zone_put() to the trim target zone, then trim operation
> > resulted in wrong accounting of write zones.
> > 
> > To fix the issue, call zbd_write_zone_put() from zbd_reset_zone(). To
> > cover the case to reset zones without a zbd_write_zone_put() call,
> > prepare another function __zbd_reset_zone(). While at it, simplify
> > zbd_reset_zones() by calling the modified zbd_reset_zone().
> > 
> > Of note is that the constifier of the argument td of do_io_u_trim() is
> > removed since zbd_write_zone_put() requires changes in that argument.
> > 
> > Fixes: e3be810bf0fd ("zbd: Support zone reset by trim")
> > Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  engines/io_uring.c |  2 +-
> >  io_u.c             |  2 +-
> >  io_u.h             |  2 +-
> >  zbd.c              | 40 ++++++++++++++++++++++++++++++++--------
> >  zbd.h              |  2 +-
> >  5 files changed, 36 insertions(+), 12 deletions(-)
> > 
> > diff --git a/engines/io_uring.c b/engines/io_uring.c
> > index ff64fc9f..73e4a27a 100644
> > --- a/engines/io_uring.c
> > +++ b/engines/io_uring.c
> > @@ -561,7 +561,7 @@ static inline void fio_ioring_cmdprio_prep(struct thread_data *td,
> >  		ld->sqes[io_u->index].ioprio = io_u->ioprio;
> >  }
> >  
> > -static int fio_ioring_cmd_io_u_trim(const struct thread_data *td,
> > +static int fio_ioring_cmd_io_u_trim(struct thread_data *td,
> >  				    struct io_u *io_u)
> >  {
> >  	struct fio_file *f = io_u->file;
> > diff --git a/io_u.c b/io_u.c
> > index 6f5fc94d..faf512e5 100644
> > --- a/io_u.c
> > +++ b/io_u.c
> > @@ -2379,7 +2379,7 @@ int do_io_u_sync(const struct thread_data *td, struct io_u *io_u)
> >  	return ret;
> >  }
> >  
> > -int do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
> > +int do_io_u_trim(struct thread_data *td, struct io_u *io_u)
> >  {
> >  #ifndef FIO_HAVE_TRIM
> >  	io_u->error = EINVAL;
> > diff --git a/io_u.h b/io_u.h
> > index 55b4d083..b432a540 100644
> > --- a/io_u.h
> > +++ b/io_u.h
> > @@ -162,7 +162,7 @@ void io_u_mark_submit(struct thread_data *, unsigned int);
> >  bool queue_full(const struct thread_data *);
> >  
> >  int do_io_u_sync(const struct thread_data *, struct io_u *);
> > -int do_io_u_trim(const struct thread_data *, struct io_u *);
> > +int do_io_u_trim(struct thread_data *, struct io_u *);
> >  
> >  #ifdef FIO_INC_DEBUG
> >  static inline void dprint_io_u(struct io_u *io_u, const char *p)
> > diff --git a/zbd.c b/zbd.c
> > index 36b68d62..7eef49c6 100644
> > --- a/zbd.c
> > +++ b/zbd.c
> > @@ -254,7 +254,7 @@ static int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
> >  }
> >  
> >  /**
> > - * zbd_reset_zone - reset the write pointer of a single zone
> > + * __zbd_reset_zone - reset the write pointer of a single zone
> >   * @td: FIO thread data.
> >   * @f: FIO file associated with the disk for which to reset a write pointer.
> >   * @z: Zone to reset.
> > @@ -263,8 +263,8 @@ static int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
> >   *
> >   * The caller must hold z->mutex.
> >   */
> > -static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
> > -			  struct fio_zone_info *z)
> > +static int __zbd_reset_zone(struct thread_data *td, struct fio_file *f,
> > +			    struct fio_zone_info *z)
> >  {
> >  	uint64_t offset = z->start;
> >  	uint64_t length = (z+1)->start - offset;
> > @@ -339,6 +339,33 @@ static void zbd_write_zone_put(struct thread_data *td, const struct fio_file *f,
> >  	z->write = 0;
> >  }
> >  
> > +/**
> > + * zbd_reset_zone - reset the write pointer of a single zone and remove the zone
> > + *                  from the array of write zones.
> > + * @td: FIO thread data.
> > + * @f: FIO file associated with the disk for which to reset a write pointer.
> > + * @z: Zone to reset.
> > + *
> > + * Returns 0 upon success and a negative error code upon failure.
> 
> Please add an empty line before the "The caller must hold z->mutex" line,
> such that we are consistent with other functions, e.g. __zbd_reset_zone().

Thanks, will add in v2.

> 
> > + * The caller must hold z->mutex.
> > + */
> > +static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
> > +			  struct fio_zone_info *z)
> > +{
> > +	int ret;
> > +
> > +	ret = __zbd_reset_zone(td, f, z);
> > +	if (ret)
> > +		goto done;
> > +
> > +	pthread_mutex_lock(&f->zbd_info->mutex);
> > +	zbd_write_zone_put(td, f, z);
> > +	pthread_mutex_unlock(&f->zbd_info->mutex);
> > +
> > +done:
> > +	return ret;
> > +}
> > +
> >  /**
> >   * zbd_finish_zone - finish the specified zone
> >   * @td: FIO thread data.
> > @@ -404,9 +431,6 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
> >  			continue;
> >  
> >  		zone_lock(td, f, z);
> > -		pthread_mutex_lock(&f->zbd_info->mutex);
> > -		zbd_write_zone_put(td, f, z);
> > -		pthread_mutex_unlock(&f->zbd_info->mutex);
> >  
> >  		if (z->wp != z->start) {
> >  			dprint(FD_ZBD, "%s: resetting zone %u\n",
> > @@ -2048,7 +2072,7 @@ retry:
> >  			 */
> >  			io_u_quiesce(td);
> >  			zb->reset_zone = 0;
> > -			if (zbd_reset_zone(td, f, zb) < 0)
> > +			if (__zbd_reset_zone(td, f, zb) < 0)
> 
> Since you remove the zbd_write_zone_put() call, which I think is good,
> I think that you should continue to call zbd_reset_zone() here.

The two hunks above are difficult for review. The 1st hunk which removes the
zbd_write_zone_put() is a change for zbd_reset_zones(). zbd_reset_zones() no
longer needs to call zbd_write_zone_put(), since zbd_reset_zone() (no s) calls
zbd_write_zone_put().

The 2nd hunk is a change in zbd_adjust_block(). The zone reset here is done
just before the write command issue to the zone, so zbd_write_zone_put() should
not be called. Then zbd_reset_zone() should be replaced with __zbd_reset_zone().

> 
> >  				goto eof;
> >  
> >  			if (zb->capacity < min_bs) {
> > @@ -2167,7 +2191,7 @@ char *zbd_write_status(const struct thread_stat *ts)
> >   * Return io_u_completed when reset zone succeeds. Return 0 when the target zone
> >   * does not have write pointer. On error, return negative errno.
> >   */
> > -int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
> > +int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u)
> >  {
> >  	struct fio_file *f = io_u->file;
> >  	struct fio_zone_info *z;
> > diff --git a/zbd.h b/zbd.h
> > index 25a3e0f1..f0ac9876 100644
> > --- a/zbd.h
> > +++ b/zbd.h
> > @@ -100,7 +100,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
> >  			      enum fio_ddir ddir);
> >  enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
> >  char *zbd_write_status(const struct thread_stat *ts);
> > -int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u);
> > +int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u);
> >  
> >  static inline void zbd_close_file(struct fio_file *f)
> >  {
> > -- 
> > 2.40.1
> > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/7] zbd: fix write zone accounting of trim workload
  2023-06-07 15:32   ` Jens Axboe
@ 2023-06-08  5:49     ` Shinichiro Kawasaki
  0 siblings, 0 replies; 16+ messages in thread
From: Shinichiro Kawasaki @ 2023-06-08  5:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: fio, Vincent Fu, Damien Le Moal, Dmitry Fomichev, Niklas Cassel

On Jun 07, 2023 / 09:32, Jens Axboe wrote:
> On 6/7/23 2:32?AM, Shin'ichiro Kawasaki wrote:
> > @@ -339,6 +339,33 @@ static void zbd_write_zone_put(struct thread_data *td, const struct fio_file *f,
> >  	z->write = 0;
> >  }
> >  
> > +/**
> > + * zbd_reset_zone - reset the write pointer of a single zone and remove the zone
> > + *                  from the array of write zones.
> > + * @td: FIO thread data.
> > + * @f: FIO file associated with the disk for which to reset a write pointer.
> > + * @z: Zone to reset.
> > + *
> > + * Returns 0 upon success and a negative error code upon failure.
> > + * The caller must hold z->mutex.
> > + */
> > +static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
> > +			  struct fio_zone_info *z)
> > +{
> > +	int ret;
> > +
> > +	ret = __zbd_reset_zone(td, f, z);
> > +	if (ret)
> > +		goto done;
> > +
> > +	pthread_mutex_lock(&f->zbd_info->mutex);
> > +	zbd_write_zone_put(td, f, z);
> > +	pthread_mutex_unlock(&f->zbd_info->mutex);
> > +
> > +done:
> > +	return ret;
> > +}
> 
> This would look better as:
> 
> {
> 	int ret;
> 
> 	ret = __zbd_reset_zone(td, f, z);
> 	if (ret)
> 		return ret;
> 
> 	pthread_mutex_lock(&f->zbd_info->mutex);
> 	zbd_write_zone_put(td, f, z);
> 	pthread_mutex_unlock(&f->zbd_info->mutex);
> 	return 0;
> }
> 
> to avoid that goto. Goto only makes sense if there's common cleanup to
> do for multiple error cases, not for a single failure where you can just
> return the error.

Thanks for the comment. Will reflect in v2.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/7] zbd: fix write zone accounting of trim workload
  2023-06-08  5:48     ` Shinichiro Kawasaki
@ 2023-06-08  8:10       ` Niklas Cassel
  0 siblings, 0 replies; 16+ messages in thread
From: Niklas Cassel @ 2023-06-08  8:10 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: fio, Jens Axboe, Vincent Fu, Damien Le Moal, Dmitry Fomichev

On Thu, Jun 08, 2023 at 05:48:41AM +0000, Shinichiro Kawasaki wrote:
> On Jun 07, 2023 / 13:15, Niklas Cassel wrote:
> > On Wed, Jun 07, 2023 at 05:32:27PM +0900, Shin'ichiro Kawasaki wrote:

(snip)

> > > @@ -404,9 +431,6 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
> > >  			continue;
> > >  
> > >  		zone_lock(td, f, z);
> > > -		pthread_mutex_lock(&f->zbd_info->mutex);
> > > -		zbd_write_zone_put(td, f, z);
> > > -		pthread_mutex_unlock(&f->zbd_info->mutex);
> > >  
> > >  		if (z->wp != z->start) {
> > >  			dprint(FD_ZBD, "%s: resetting zone %u\n",
> > > @@ -2048,7 +2072,7 @@ retry:
> > >  			 */
> > >  			io_u_quiesce(td);
> > >  			zb->reset_zone = 0;
> > > -			if (zbd_reset_zone(td, f, zb) < 0)
> > > +			if (__zbd_reset_zone(td, f, zb) < 0)
> > 
> > Since you remove the zbd_write_zone_put() call, which I think is good,
> > I think that you should continue to call zbd_reset_zone() here.
> 
> The two hunks above are difficult for review. The 1st hunk which removes the
> zbd_write_zone_put() is a change for zbd_reset_zones(). zbd_reset_zones() no
> longer needs to call zbd_write_zone_put(), since zbd_reset_zone() (no s) calls
> zbd_write_zone_put().
> 
> The 2nd hunk is a change in zbd_adjust_block(). The zone reset here is done
> just before the write command issue to the zone, so zbd_write_zone_put() should
> not be called. Then zbd_reset_zone() should be replaced with __zbd_reset_zone().

You are absolutely correct!

I confused the hunk at:
@@ retry:

For being for zbd_reset_zones(), but it really is for zbd_adjust_block().
(And zbd_reset_zones() still calls zbd_reset_zone() as it should.)

I really hate the patch format for using labels instead the actual function
name... Especially since you can have the same label in different functions.

Sorry for the noise.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-06-08  8:10 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-07  8:32 [PATCH 0/7] zbd: clean up code and fix bugs for open zones accounting Shin'ichiro Kawasaki
2023-06-07  8:32 ` [PATCH 1/7] zbd: rename 'open zones' to 'write zones' Shin'ichiro Kawasaki
2023-06-07 13:15   ` Niklas Cassel
2023-06-07  8:32 ` [PATCH 2/7] zbd: do not reset extra zones in open conditions Shin'ichiro Kawasaki
2023-06-07 13:15   ` Niklas Cassel
2023-06-07  8:32 ` [PATCH 3/7] zbd: fix write zone accounting of almost full zones Shin'ichiro Kawasaki
2023-06-07 13:15   ` Niklas Cassel
2023-06-07  8:32 ` [PATCH 4/7] zbd: fix write zone accounting of trim workload Shin'ichiro Kawasaki
2023-06-07 13:15   ` Niklas Cassel
2023-06-08  5:48     ` Shinichiro Kawasaki
2023-06-08  8:10       ` Niklas Cassel
2023-06-07 15:32   ` Jens Axboe
2023-06-08  5:49     ` Shinichiro Kawasaki
2023-06-07  8:32 ` [PATCH 5/7] t/zbd: reset zones before tests with max_open_zones option Shin'ichiro Kawasaki
2023-06-07  8:32 ` [PATCH 6/7] t/zbd: test write zone accounting of almost full zones Shin'ichiro Kawasaki
2023-06-07  8:32 ` [PATCH 7/7] t/zbd: test write zone accounting of trim workload Shin'ichiro Kawasaki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.