From: Niklas Cassel <Niklas.Cassel@wdc.com>
To: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: "fio@vger.kernel.org" <fio@vger.kernel.org>,
Jens Axboe <axboe@kernel.dk>, Vincent Fu <vincentfu@gmail.com>,
Damien Le Moal <damien.lemoal@opensource.wdc.com>,
Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
Subject: Re: [PATCH v2 6/8] zbd: check write ranges for zone_reset_threshold option
Date: Tue, 7 Feb 2023 14:06:24 +0000 [thread overview]
Message-ID: <Y+Ja38Vlb49Owx0r@x1-carbon> (raw)
In-Reply-To: <20230207063739.1661191-7-shinichiro.kawasaki@wdc.com>
On Tue, Feb 07, 2023 at 03:37:37PM +0900, Shin'ichiro Kawasaki wrote:
> The valid data bytes accounting is used for zone_reset_threshold option.
> This accounting usage has two issues. The first issue is unexpected
> zone reset due to different IO ranges. The valid data bytes accounting
> is done for all IO ranges per device, and shared by all jobs. On the
> other hand, the zone_reset_threshold option is defined as the ratio to
> each job's IO range. When a job refers to the accounting value, it
> includes writes to IO ranges out of the job's IO range. Then zone reset
> is triggered earlier than expected.
>
> The second issue is accounting value initialization. The initialization
> of the accounting field is repeated for each job, then the value
> initialized by the first job is overwritten by other jobs. This works as
> expected for single job or multiple jobs with same write range. However,
> when multiple jobs have different write ranges, the overwritten value is
> wrong except for the last job.
>
> To ensure that the accounting works as expected for the option, check
> that write ranges of all jobs are same. If jobs have different write
> ranges, report it as an error. Initialize the accounting field only once
> for the first job. All jobs have same write range, then one time
> initialization is enough. Update man page to clarify this limitation of
> the option.
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
> HOWTO.rst | 3 ++-
> fio.1 | 3 ++-
> zbd.c | 22 +++++++++++++++++++---
> zbd.h | 4 ++++
> 4 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/HOWTO.rst b/HOWTO.rst
> index d08f8a18..c71c3949 100644
> --- a/HOWTO.rst
> +++ b/HOWTO.rst
> @@ -1088,7 +1088,8 @@ Target file/device
> A number between zero and one that indicates the ratio of written bytes
> to the total size of the zones with write pointers in the IO range. When
> current ratio is above this ratio, zones are reset periodically as
> - :option:`zone_reset_frequency` specifies.
> + :option:`zone_reset_frequency` specifies. If there are multiple jobs when
> + using this option, the IO range for all write jobs shall be the same.
nit: perhaps replace "shall be the same" with "has to be the same"
>
> .. option:: zone_reset_frequency=float
>
> diff --git a/fio.1 b/fio.1
> index 54d2c403..80cd23b5 100644
> --- a/fio.1
> +++ b/fio.1
> @@ -857,7 +857,8 @@ value to be larger than the device reported limit. Default: false.
> A number between zero and one that indicates the ratio of written bytes to the
> total size of the zones with write pointers in the IO range. When current ratio
> is above this ratio, zones are reset periodically as \fBzone_reset_frequency\fR
> -specifies.
> +specifies. If there are multiple jobs when using this option, the IO range for
> +all write jobs shall be the same.
> .TP
> .BI zone_reset_frequency \fR=\fPfloat
> A number between zero and one that indicates how often a zone reset should be
> diff --git a/zbd.c b/zbd.c
> index 6783acf9..ca6816b9 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -1201,10 +1201,26 @@ static uint64_t zbd_set_vdb(struct thread_data *td, const struct fio_file *f)
> {
> struct fio_zone_info *zb, *ze, *z;
> uint64_t wp_vdb = 0;
> + struct zoned_block_device_info *zbdi = f->zbd_info;
>
> if (!accounting_vdb(td, f))
> return 0;
>
> + if (zbdi->wp_write_min_zone != zbdi->wp_write_max_zone) {
> + if (zbdi->wp_write_min_zone != f->min_zone ||
> + zbdi->wp_write_max_zone != f->max_zone) {
> + td_verror(td, EINVAL,
> + "multi-jobs with different write ranges are "
> + "not supported with zone_reset_threshold");
> + log_err("multi-jobs with different write ranges are "
> + "not supported with zone_reset_threshold\n");
> + }
> + return 0;
> + }
This check looks a bit weird.
What if the first job specifies a f->file_offset and an f->io_size that is
within the same zone? Then f->min_zone and f->max_zone will have the same value.
Which should cause wp_write_min_zone and wp_write_max_zone to have the same
value.
The check will thus not be performed for the second job, after wp_write_min_zone
and wp_write_min_zone have been initialized.
> +
> + zbdi->wp_write_min_zone = f->min_zone;
> + zbdi->wp_write_max_zone = f->max_zone;
> +
> zb = zbd_get_zone(f, f->min_zone);
> ze = zbd_get_zone(f, f->max_zone);
> for (z = zb; z < ze; z++) {
> @@ -1214,9 +1230,9 @@ static uint64_t zbd_set_vdb(struct thread_data *td, const struct fio_file *f)
> }
> }
>
> - pthread_mutex_lock(&f->zbd_info->mutex);
> - f->zbd_info->wp_valid_data_bytes = wp_vdb;
> - pthread_mutex_unlock(&f->zbd_info->mutex);
> + pthread_mutex_lock(&zbdi->mutex);
> + zbdi->wp_valid_data_bytes = wp_vdb;
> + pthread_mutex_unlock(&zbdi->mutex);
>
> for (z = zb; z < ze; z++)
> if (z->has_wp)
> diff --git a/zbd.h b/zbd.h
> index 20b2fe17..d4f81325 100644
> --- a/zbd.h
> +++ b/zbd.h
> @@ -55,6 +55,8 @@ struct fio_zone_info {
> * num_open_zones).
> * @zone_size: size of a single zone in bytes.
> * @wp_valid_data_bytes: total size of data in zones with write pointers
> + * @wp_write_min_zone: Minimum zone index of all job's write ranges. Inclusive.
> + * @wp_write_max_zone: Maximum zone index of all job's write ranges. Exclusive.
> * @zone_size_log2: log2 of the zone size in bytes if it is a power of 2 or 0
> * if the zone size is not a power of 2.
> * @nr_zones: number of zones
> @@ -75,6 +77,8 @@ struct zoned_block_device_info {
> pthread_mutex_t mutex;
> uint64_t zone_size;
> uint64_t wp_valid_data_bytes;
> + uint32_t wp_write_min_zone;
> + uint32_t wp_write_max_zone;
> uint32_t zone_size_log2;
> uint32_t nr_zones;
> uint32_t refcount;
> --
> 2.38.1
>
next prev parent reply other threads:[~2023-02-07 14:06 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-07 6:37 [PATCH v2 0/8] zbd: fix 'sectors with data' and zone_reset_threshold accounting issues Shin'ichiro Kawasaki
2023-02-07 6:37 ` [PATCH v2 1/8] zbd: refer file->last_start[] instead of sectors with data accounting Shin'ichiro Kawasaki
2023-02-07 14:05 ` Niklas Cassel
2023-02-07 6:37 ` [PATCH v2 2/8] zbd: remove CHECK_SWD feature Shin'ichiro Kawasaki
2023-02-07 14:05 ` Niklas Cassel
2023-02-07 6:37 ` [PATCH v2 3/8] zbd: rename the accounting 'sectors with data' to 'valid data bytes' Shin'ichiro Kawasaki
2023-02-07 14:06 ` Niklas Cassel
2023-02-07 6:37 ` [PATCH v2 4/8] doc: fix unit of zone_reset_threshold and relation to other option Shin'ichiro Kawasaki
2023-02-07 14:06 ` Niklas Cassel
2023-02-07 6:37 ` [PATCH v2 5/8] zbd: account valid data bytes only for zone_reset_threshold option Shin'ichiro Kawasaki
2023-02-07 14:06 ` Niklas Cassel
2023-02-07 6:37 ` [PATCH v2 6/8] zbd: check write ranges " Shin'ichiro Kawasaki
2023-02-07 14:06 ` Niklas Cassel [this message]
2023-02-08 4:59 ` Shinichiro Kawasaki
2023-02-07 6:37 ` [PATCH v2 7/8] zbd: initialize valid data bytes accounting at file set up Shin'ichiro Kawasaki
2023-02-07 14:06 ` Niklas Cassel
2023-02-08 5:03 ` Shinichiro Kawasaki
2023-02-08 8:30 ` Niklas Cassel
2023-02-07 6:37 ` [PATCH v2 8/8] t/zbd: add test cases for zone_reset_threshold option Shin'ichiro Kawasaki
2023-02-07 14:06 ` Niklas Cassel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y+Ja38Vlb49Owx0r@x1-carbon \
--to=niklas.cassel@wdc.com \
--cc=Dmitry.Fomichev@wdc.com \
--cc=axboe@kernel.dk \
--cc=damien.lemoal@opensource.wdc.com \
--cc=fio@vger.kernel.org \
--cc=shinichiro.kawasaki@wdc.com \
--cc=vincentfu@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).