* [PATCH] zbd: Fix unexpected job termination by open zone search failure
@ 2021-09-30  0:02 Shin'ichiro Kawasaki
  2021-09-30  8:46 ` Niklas Cassel
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Shin'ichiro Kawasaki @ 2021-09-30  0:02 UTC (permalink / raw)
  To: fio, Jens Axboe
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel, Shinichiro Kawasaki

Test case #46 in t/zbd/test-zbd-support fails when it is repeated
hundreds of times on null_blk zoned devices. The test case uses the
libaio I/O engine to run 8 random write jobs on 4 sequential write
required zones. When all 4 zones are almost full but still open for
in-flight writes, the helper function zbd_convert_to_open_zone() fails
to find an open zone for the next write. This results in unexpected job
termination.

To avoid the unexpected job termination, retry the steps in
zbd_convert_to_open_zone(). Before each retry, call io_u_quiesce() to
ensure that the in-flight writes complete.

To keep the retry from looping forever, retry only when I/Os are in
flight or when in-flight I/Os have just completed. To check the
in-flight I/O count of all jobs, add a new helper function
any_io_in_flight().

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 zbd.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/zbd.c b/zbd.c
index 64415d2b..c0b0b81c 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1204,6 +1204,19 @@ static uint32_t pick_random_zone_idx(const struct fio_file *f,
 		f->io_size;
 }
 
+static bool any_io_in_flight(void)
+{
+	struct thread_data *td;
+	int i;
+
+	for_each_td(td, i) {
+		if (td->io_u_in_flight)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Modify the offset of an I/O unit that does not refer to an open zone such
  * that it refers to an open zone. Close an open zone and open a new zone if
@@ -1223,6 +1236,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	uint32_t zone_idx, new_zone_idx;
 	int i;
 	bool wait_zone_close;
+	bool in_flight;
+	bool should_retry = true;
 
 	assert(is_valid_offset(f, io_u->offset));
 
@@ -1337,6 +1352,7 @@ open_other_zone:
 		io_u_quiesce(td);
 	}
 
+retry:
 	/* Zone 'z' is full, so try to open a new zone. */
 	for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
 		zone_idx++;
@@ -1376,6 +1392,24 @@ open_other_zone:
 			goto out;
 		pthread_mutex_lock(&zbdi->mutex);
 	}
+
+	/*
+	 * When any I/O is in-flight or when all in-flight I/Os have completed,
+	 * the I/Os might have closed zones, so retry the steps to open a zone.
+	 * Before retrying, call io_u_quiesce() to complete in-flight writes.
+	 */
+	in_flight = any_io_in_flight();
+	if (in_flight || should_retry) {
+		dprint(FD_ZBD, "%s(%s): wait zone close and retry open zones\n",
+		       __func__, f->file_name);
+		pthread_mutex_unlock(&zbdi->mutex);
+		zone_unlock(z);
+		io_u_quiesce(td);
+		zone_lock(td, f, z);
+		should_retry = in_flight;
+		goto retry;
+	}
+
 	pthread_mutex_unlock(&zbdi->mutex);
 	zone_unlock(z);
 	dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,
-- 
2.31.1
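
The termination argument for the retry in this patch can be checked with a small stand-alone model. The sketch below is not fio code: struct sim, sim_try_open(), sim_quiesce() and open_zone_with_retry() are hypothetical names, and the model is single-threaded, so it assumes no other job submits new writes between attempts. It only mirrors the in_flight / should_retry bookkeeping from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device state; not a fio structure. */
struct sim {
	unsigned int in_flight;     /* writes submitted but not yet completed */
	unsigned int free_zones;    /* zones a search could open right now */
	unsigned int pending_zones; /* zones that become usable once writes complete */
};

/* Stand-in for the zone search loop: succeeds only if a zone is available. */
static bool sim_try_open(struct sim *s)
{
	if (s->free_zones == 0)
		return false;
	s->free_zones--;
	return true;
}

/* Stand-in for io_u_quiesce(): completes all in-flight writes in the model. */
static void sim_quiesce(struct sim *s)
{
	if (s->in_flight) {
		s->in_flight = 0;
		s->free_zones += s->pending_zones;
		s->pending_zones = 0;
	}
}

/*
 * Same retry condition as the patch: retry while I/O is in flight, and
 * retry one more time after the last in-flight I/O was seen.
 */
static bool open_zone_with_retry(struct sim *s, unsigned int *attempts)
{
	bool should_retry = true;
	bool in_flight;

	assert(s && attempts);
	for (;;) {
		(*attempts)++;
		if (sim_try_open(s))
			return true;
		in_flight = s->in_flight != 0;
		if (!in_flight && !should_retry)
			return false;	/* nothing left that could free a zone */
		sim_quiesce(s);
		should_retry = in_flight;
	}
}
```

In this model, a search that fails with no I/O in flight is repeated once and then gives up, so the loop cannot spin forever on an idle device. A search that fails while writes are in flight is repeated after the quiesce, plus one final time once no I/O remains.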

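The any_io_in_flight() helper in the patch walks every job with fio's for_each_td() macro and reads td->io_u_in_flight. As a rough illustration outside fio, the same scan over a plain array might look like the sketch below. struct job and any_job_io_in_flight() are hypothetical stand-ins, and only the one field the patch reads is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for fio's struct thread_data. */
struct job {
	unsigned int io_u_in_flight;	/* I/O units submitted, not yet completed */
};

/* Return true if at least one job in the array still has I/O in flight. */
static bool any_job_io_in_flight(const struct job *jobs, size_t nr_jobs)
{
	assert(jobs != NULL || nr_jobs == 0);

	for (size_t i = 0; i < nr_jobs; i++) {
		if (jobs[i].io_u_in_flight)
			return true;	/* one busy job is enough */
	}
	return false;
}
```

The scan stops at the first busy job because the caller only needs a yes/no answer, not a total count.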


* Re: [PATCH] zbd: Fix unexpected job termination by open zone search failure
  2021-09-30  0:02 [PATCH] zbd: Fix unexpected job termination by open zone search failure Shin'ichiro Kawasaki
@ 2021-09-30  8:46 ` Niklas Cassel
  2021-09-30 15:28 ` Dmitry Fomichev
  2021-09-30 16:05 ` Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Niklas Cassel @ 2021-09-30  8:46 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: fio, Jens Axboe, Damien Le Moal, Dmitry Fomichev

On Thu, Sep 30, 2021 at 09:02:36AM +0900, Shin'ichiro Kawasaki wrote:
> Test case #46 in t/zbd/test-zbd-support fails when it is repeated
> hundreds of times on null_blk zoned devices. The test case uses libaio
> IO engine to run 8 random write jobs on 4 sequential write required
> zones. When all of the 4 zones get almost full but still open for
> in-flight writes, the helper function zbd_convert_to_open_zone() fails
> to get an opened zone for next write. This results in unexpected job
> termination.
> 
> To avoid the unexpected job termination, retry the steps in
> zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
> that the in-flight writes get completed.
> 
> To prevent infinite loop by the retry, retry only when any IOs are
> in-flight or in-flight IOs get completed. To check in-flight IO count of
> all jobs, add a new helper function any_io_in_flight().
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>


* Re: [PATCH] zbd: Fix unexpected job termination by open zone search failure
  2021-09-30  0:02 [PATCH] zbd: Fix unexpected job termination by open zone search failure Shin'ichiro Kawasaki
  2021-09-30  8:46 ` Niklas Cassel
@ 2021-09-30 15:28 ` Dmitry Fomichev
  2021-09-30 16:05 ` Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Dmitry Fomichev @ 2021-09-30 15:28 UTC (permalink / raw)
  To: fio, axboe, Shinichiro Kawasaki; +Cc: Damien Le Moal, Niklas Cassel

On Thu, 2021-09-30 at 09:02 +0900, Shin'ichiro Kawasaki wrote:
> Test case #46 in t/zbd/test-zbd-support fails when it is repeated
> hundreds of times on null_blk zoned devices. The test case uses libaio
> IO engine to run 8 random write jobs on 4 sequential write required
> zones. When all of the 4 zones get almost full but still open for
> in-flight writes, the helper function zbd_convert_to_open_zone() fails
> to get an opened zone for next write. This results in unexpected job
> termination.
> 
> To avoid the unexpected job termination, retry the steps in
> zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
> that the in-flight writes get completed.
> 
> To prevent infinite loop by the retry, retry only when any IOs are
> in-flight or in-flight IOs get completed. To check in-flight IO count of
> all jobs, add a new helper function any_io_in_flight().
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

Looks good,
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>



* Re: [PATCH] zbd: Fix unexpected job termination by open zone search failure
  2021-09-30  0:02 [PATCH] zbd: Fix unexpected job termination by open zone search failure Shin'ichiro Kawasaki
  2021-09-30  8:46 ` Niklas Cassel
  2021-09-30 15:28 ` Dmitry Fomichev
@ 2021-09-30 16:05 ` Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2021-09-30 16:05 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, fio
  Cc: Damien Le Moal, Dmitry Fomichev, Niklas Cassel

On 9/29/21 6:02 PM, Shin'ichiro Kawasaki wrote:
> Test case #46 in t/zbd/test-zbd-support fails when it is repeated
> hundreds of times on null_blk zoned devices. The test case uses libaio
> IO engine to run 8 random write jobs on 4 sequential write required
> zones. When all of the 4 zones get almost full but still open for
> in-flight writes, the helper function zbd_convert_to_open_zone() fails
> to get an opened zone for next write. This results in unexpected job
> termination.
> 
> To avoid the unexpected job termination, retry the steps in
> zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
> that the in-flight writes get completed.
> 
> To prevent infinite loop by the retry, retry only when any IOs are
> in-flight or in-flight IOs get completed. To check in-flight IO count of
> all jobs, add a new helper function any_io_in_flight().

Applied, thanks.

-- 
Jens Axboe


