* [PATCH] zbd: Fix unexpected job termination by open zone search failure
@ 2021-09-30 0:02 Shin'ichiro Kawasaki
From: Shin'ichiro Kawasaki @ 2021-09-30 0:02 UTC (permalink / raw)
To: fio, Jens Axboe
Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shinichiro Kawasaki
Test case #46 in t/zbd/test-zbd-support fails when it is repeated
hundreds of times on null_blk zoned devices. The test case uses the
libaio IO engine to run 8 random write jobs on 4 sequential write
required zones. When all 4 zones are almost full but still open for
in-flight writes, the helper function zbd_convert_to_open_zone() fails
to find an open zone for the next write. This results in unexpected job
termination.

To avoid the unexpected job termination, retry the steps in
zbd_convert_to_open_zone(). Before each retry, call io_u_quiesce() to
ensure that the in-flight writes have completed.

To prevent the retry from looping forever, retry only while any IOs are
in flight, or once more right after in-flight IOs have completed. To
check the in-flight IO count of all jobs, add a new helper function
any_io_in_flight().

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
zbd.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/zbd.c b/zbd.c
index 64415d2b..c0b0b81c 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1204,6 +1204,19 @@ static uint32_t pick_random_zone_idx(const struct fio_file *f,
 		f->io_size;
 }

+static bool any_io_in_flight(void)
+{
+	struct thread_data *td;
+	int i;
+
+	for_each_td(td, i) {
+		if (td->io_u_in_flight)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Modify the offset of an I/O unit that does not refer to an open zone such
  * that it refers to an open zone. Close an open zone and open a new zone if
@@ -1223,6 +1236,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	uint32_t zone_idx, new_zone_idx;
 	int i;
 	bool wait_zone_close;
+	bool in_flight;
+	bool should_retry = true;

 	assert(is_valid_offset(f, io_u->offset));

@@ -1337,6 +1352,7 @@ open_other_zone:
 		io_u_quiesce(td);
 	}

+retry:
 	/* Zone 'z' is full, so try to open a new zone. */
 	for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
 		zone_idx++;
@@ -1376,6 +1392,24 @@ open_other_zone:
 			goto out;
 		pthread_mutex_lock(&zbdi->mutex);
 	}
+
+	/*
+	 * When any I/O is in-flight or when all I/Os in-flight get completed,
+	 * the I/Os might have closed zones then retry the steps to open a zone.
+	 * Before retry, call io_u_quiesce() to complete in-flight writes.
+	 */
+	in_flight = any_io_in_flight();
+	if (in_flight || should_retry) {
+		dprint(FD_ZBD, "%s(%s): wait zone close and retry open zones\n",
+			__func__, f->file_name);
+		pthread_mutex_unlock(&zbdi->mutex);
+		zone_unlock(z);
+		io_u_quiesce(td);
+		zone_lock(td, f, z);
+		should_retry = in_flight;
+		goto retry;
+	}
+
 	pthread_mutex_unlock(&zbdi->mutex);
 	zone_unlock(z);
 	dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,
--
2.31.1
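
The termination argument for the retry (retry while IOs are in flight, then at most once more after they drain) may be easier to follow in isolation. The following is a minimal standalone sketch of that control flow, not fio code: struct model, model_any_io_in_flight(), model_quiesce() and open_zone_with_retry() are hypothetical names, the zone scan is reduced to a single flag, and the model assumes that quiescing completes every pending write, whereas in fio io_u_quiesce() acts on the calling job only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; these are NOT fio's real data structures. */
struct model {
	int pending_ios;         /* in-flight writes across all jobs */
	bool zone_openable;      /* would a scan for an open zone succeed? */
	bool quiesce_frees_zone; /* do completing writes close a full zone? */
};

/* Stand-in for any_io_in_flight(): true while any write is pending. */
static bool model_any_io_in_flight(const struct model *m)
{
	return m->pending_ios > 0;
}

/*
 * Stand-in for io_u_quiesce(). Assumption: every pending write completes,
 * and completion may close a full zone so that a later scan can succeed.
 */
static void model_quiesce(struct model *m)
{
	if (m->pending_ios > 0 && m->quiesce_frees_zone)
		m->zone_openable = true;
	m->pending_ios = 0;
}

/*
 * Mirrors the patch's retry logic: always retry once, keep retrying while
 * IOs were seen in flight, and stop after one scan that follows an idle
 * observation. Returns the number of scans; *opened reports the outcome.
 */
static int open_zone_with_retry(struct model *m, bool *opened)
{
	bool should_retry = true;
	int scans = 0;

	assert(m->pending_ios >= 0);

	for (;;) {
		bool in_flight;

		scans++;
		if (m->zone_openable) {
			*opened = true;
			return scans;
		}

		in_flight = model_any_io_in_flight(m);
		if (!in_flight && !should_retry) {
			*opened = false;
			return scans;
		}

		model_quiesce(m);
		should_retry = in_flight;
	}
}
```

In this model, a call that starts with pending writes and never finds a zone gives up after three scans, and a call that starts idle gives up after two, so the loop is bounded once no job has IO in flight.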
* Re: [PATCH] zbd: Fix unexpected job termination by open zone search failure
From: Niklas Cassel @ 2021-09-30 8:46 UTC (permalink / raw)
To: Shinichiro Kawasaki; +Cc: fio, Jens Axboe, Damien Le Moal, Dmitry Fomichev
On Thu, Sep 30, 2021 at 09:02:36AM +0900, Shin'ichiro Kawasaki wrote:
> Test case #46 in t/zbd/test-zbd-support fails when it is repeated
> hundreds of times on null_blk zoned devices. The test case uses libaio
> IO engine to run 8 random write jobs on 4 sequential write required
> zones. When all of the 4 zones get almost full but still open for
> in-flight writes, the helper function zbd_convert_to_open_zone() fails
> to get an opened zone for next write. This results in unexpected job
> termination.
>
> To avoid the unexpected job termination, retry the steps in
> zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
> that the in-flight writes get completed.
>
> To prevent infinite loop by the retry, retry only when any IOs are
> in-flight or in-flight IOs get completed. To check in-flight IO count of
> all jobs, add a new helper function any_io_in_flight().
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
* Re: [PATCH] zbd: Fix unexpected job termination by open zone search failure
From: Dmitry Fomichev @ 2021-09-30 15:28 UTC (permalink / raw)
To: fio, axboe, Shinichiro Kawasaki; +Cc: Damien Le Moal, Niklas Cassel
On Thu, 2021-09-30 at 09:02 +0900, Shin'ichiro Kawasaki wrote:
> Test case #46 in t/zbd/test-zbd-support fails when it is repeated
> hundreds of times on null_blk zoned devices. The test case uses libaio
> IO engine to run 8 random write jobs on 4 sequential write required
> zones. When all of the 4 zones get almost full but still open for
> in-flight writes, the helper function zbd_convert_to_open_zone() fails
> to get an opened zone for next write. This results in unexpected job
> termination.
>
> To avoid the unexpected job termination, retry the steps in
> zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
> that the in-flight writes get completed.
>
> To prevent infinite loop by the retry, retry only when any IOs are
> in-flight or in-flight IOs get completed. To check in-flight IO count of
> all jobs, add a new helper function any_io_in_flight().
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Looks good,
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
* Re: [PATCH] zbd: Fix unexpected job termination by open zone search failure
From: Jens Axboe @ 2021-09-30 16:05 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, fio
Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel
On 9/29/21 6:02 PM, Shin'ichiro Kawasaki wrote:
> Test case #46 in t/zbd/test-zbd-support fails when it is repeated
> hundreds of times on null_blk zoned devices. The test case uses libaio
> IO engine to run 8 random write jobs on 4 sequential write required
> zones. When all of the 4 zones get almost full but still open for
> in-flight writes, the helper function zbd_convert_to_open_zone() fails
> to get an opened zone for next write. This results in unexpected job
> termination.
>
> To avoid the unexpected job termination, retry the steps in
> zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
> that the in-flight writes get completed.
>
> To prevent infinite loop by the retry, retry only when any IOs are
> in-flight or in-flight IOs get completed. To check in-flight IO count of
> all jobs, add a new helper function any_io_in_flight().
Applied, thanks.
--
Jens Axboe