fio.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: "fio@vger.kernel.org" <fio@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Vincent Fu <vincentfu@gmail.com>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
Subject: Re: [PATCH v2 6/8] zbd: check write ranges for zone_reset_threshold option
Date: Wed, 8 Feb 2023 04:59:20 +0000	[thread overview]
Message-ID: <20230208045919.s2s76gpdcp7n5mog@shindev> (raw)
In-Reply-To: <Y+Ja38Vlb49Owx0r@x1-carbon>

On Feb 07, 2023 / 14:06, Niklas Cassel wrote:
> On Tue, Feb 07, 2023 at 03:37:37PM +0900, Shin'ichiro Kawasaki wrote:
> > The valid data bytes accounting is used for zone_reset_threshold option.
> > This accounting usage has two issues. The first issue is unexpected
> > zone reset due to different IO ranges. The valid data bytes accounting
> > is done for all IO ranges per device, and shared by all jobs. On the
> > other hand, the zone_reset_threshold option is defined as the ratio to
> > each job's IO range. When a job refers to the accounting value, it
> > includes writes to IO ranges out of the job's IO range. Then zone reset
> > is triggered earlier than expected.
> > 
> > The second issue is accounting value initialization. The initialization
> > of the accounting field is repeated for each job, then the value
> > initialized by the first job is overwritten by other jobs. This works as
> > expected for single job or multiple jobs with same write range. However,
> > when multiple jobs have different write ranges, the overwritten value is
> > wrong except for the last job.
> > 
> > To ensure that the accounting works as expected for the option, check
> > that write ranges of all jobs are same. If jobs have different write
> > ranges, report it as an error. Initialize the accounting field only once
> > for the first job. All jobs have same write range, then one time
> > initialization is enough. Update man page to clarify this limitation of
> > the option.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  HOWTO.rst |  3 ++-
> >  fio.1     |  3 ++-
> >  zbd.c     | 22 +++++++++++++++++++---
> >  zbd.h     |  4 ++++
> >  4 files changed, 27 insertions(+), 5 deletions(-)
> > 
> > diff --git a/HOWTO.rst b/HOWTO.rst
> > index d08f8a18..c71c3949 100644
> > --- a/HOWTO.rst
> > +++ b/HOWTO.rst
> > @@ -1088,7 +1088,8 @@ Target file/device
> >  	A number between zero and one that indicates the ratio of written bytes
> >  	to the total size of the zones with write pointers in the IO range. When
> >  	current ratio is above this ratio, zones are reset periodically as
> > -	:option:`zone_reset_frequency` specifies.
> > +	:option:`zone_reset_frequency` specifies. If there are multiple jobs when
> > +	using this option, the IO range for all write jobs shall be the same.
> 
> nit: perhaps replace "shall be the same" with "has to be the same"

Will replace. I find "has to" is more consitent in the document than "shall".

> 
> >  
> >  .. option:: zone_reset_frequency=float
> >  
> > diff --git a/fio.1 b/fio.1
> > index 54d2c403..80cd23b5 100644
> > --- a/fio.1
> > +++ b/fio.1
> > @@ -857,7 +857,8 @@ value to be larger than the device reported limit. Default: false.
> >  A number between zero and one that indicates the ratio of written bytes to the
> >  total size of the zones with write pointers in the IO range. When current ratio
> >  is above this ratio, zones are reset periodically as \fBzone_reset_frequency\fR
> > -specifies.
> > +specifies. If there are multiple jobs when using this option, the IO range for
> > +all write jobs shall be the same.
> >  .TP
> >  .BI zone_reset_frequency \fR=\fPfloat
> >  A number between zero and one that indicates how often a zone reset should be
> > diff --git a/zbd.c b/zbd.c
> > index 6783acf9..ca6816b9 100644
> > --- a/zbd.c
> > +++ b/zbd.c
> > @@ -1201,10 +1201,26 @@ static uint64_t zbd_set_vdb(struct thread_data *td, const struct fio_file *f)
> >  {
> >  	struct fio_zone_info *zb, *ze, *z;
> >  	uint64_t wp_vdb = 0;
> > +	struct zoned_block_device_info *zbdi = f->zbd_info;
> >  
> >  	if (!accounting_vdb(td, f))
> >  		return 0;
> >  
> > +	if (zbdi->wp_write_min_zone != zbdi->wp_write_max_zone) {
> > +		if (zbdi->wp_write_min_zone != f->min_zone ||
> > +		    zbdi->wp_write_max_zone != f->max_zone) {
> > +			td_verror(td, EINVAL,
> > +				  "multi-jobs with different write ranges are "
> > +				  "not supported with zone_reset_threshold");
> > +			log_err("multi-jobs with different write ranges are "
> > +				"not supported with zone_reset_threshold\n");
> > +		}
> > +		return 0;
> > +	}
> 
> This check looks a bit weird.
> What if the first job specifies a f->file_offset and an f->io_size that is
> within the same zone? Then f->min_zone and f->max_zone will have the same value.
> 
> Which should cause wp_write_min_zone and wp_write_max_zone to have the same
> value.
> 
> The check will thus not be performed for the second job, after wp_write_min_zone
> and wp_write_min_zone have been initialized.

Thanks for catching this. In most cases, f->min_zone and f->max_zone have
different values, since f->io_size must be equal to or larger than zone size.
However, still f->min_zone and f->max_zone can be same when the IO range fits
within a conventional zone. The commit f952800a0b1a is a reference.

I suggest following 3 more changes below:

1) Before IO range check in zbd_set_vdb(), ensure that the IO range includes
   zones with write pointer. This also ensures that f->min_zone and f->max_zone
   are different. The code diff will be as follows:

diff --git a/zbd.c b/zbd.c
index a464d6df..f0a0856b 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1086,6 +1086,13 @@ static uint64_t zbd_set_vdb(struct thread_data *td, const struct fio_file *f)
        if (!accounting_vdb(td, f))
                return 0;

+       /*
+        * Ensure that all zones in the I/O range are not conventional so that
+        * f->min_zone and f->max_zone have different values.
+        */
+       if (!zbd_is_seq_job(f))
+               return 0;
+
        if (zbdi->wp_write_min_zone != zbdi->wp_write_max_zone) {
                if (zbdi->wp_write_min_zone != f->min_zone ||
                    zbdi->wp_write_max_zone != f->max_zone) {

2) The names of new fields wp_write_min_zone and wp_write_max_zone are not
   accurate, since the IO range may cover both coventional zones and zones
   with write pointers. I will rename them to write_min_zone and write_max_zone
   without the wp_ prefix.

3) The 4th patch needs modification also. The description of the
   zone_reset_threshold option should cover the case that IO range has both
   conventional zones and zones with write pointers. It will be like this:

  .BI zone_reset_threshold \fR=\fPfloat
  A number between zero and one that indicates the ratio of written bytes in the
  zones with write pointers in the IO range to the size of the IO range.

-- 
Shin'ichiro Kawasaki

  reply	other threads:[~2023-02-08  4:59 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-07  6:37 [PATCH v2 0/8] zbd: fix 'sectors with data' and zone_reset_threshold accounting issues Shin'ichiro Kawasaki
2023-02-07  6:37 ` [PATCH v2 1/8] zbd: refer file->last_start[] instead of sectors with data accounting Shin'ichiro Kawasaki
2023-02-07 14:05   ` Niklas Cassel
2023-02-07  6:37 ` [PATCH v2 2/8] zbd: remove CHECK_SWD feature Shin'ichiro Kawasaki
2023-02-07 14:05   ` Niklas Cassel
2023-02-07  6:37 ` [PATCH v2 3/8] zbd: rename the accounting 'sectors with data' to 'valid data bytes' Shin'ichiro Kawasaki
2023-02-07 14:06   ` Niklas Cassel
2023-02-07  6:37 ` [PATCH v2 4/8] doc: fix unit of zone_reset_threshold and relation to other option Shin'ichiro Kawasaki
2023-02-07 14:06   ` Niklas Cassel
2023-02-07  6:37 ` [PATCH v2 5/8] zbd: account valid data bytes only for zone_reset_threshold option Shin'ichiro Kawasaki
2023-02-07 14:06   ` Niklas Cassel
2023-02-07  6:37 ` [PATCH v2 6/8] zbd: check write ranges " Shin'ichiro Kawasaki
2023-02-07 14:06   ` Niklas Cassel
2023-02-08  4:59     ` Shinichiro Kawasaki [this message]
2023-02-07  6:37 ` [PATCH v2 7/8] zbd: initialize valid data bytes accounting at file set up Shin'ichiro Kawasaki
2023-02-07 14:06   ` Niklas Cassel
2023-02-08  5:03     ` Shinichiro Kawasaki
2023-02-08  8:30       ` Niklas Cassel
2023-02-07  6:37 ` [PATCH v2 8/8] t/zbd: add test cases for zone_reset_threshold option Shin'ichiro Kawasaki
2023-02-07 14:06   ` Niklas Cassel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230208045919.s2s76gpdcp7n5mog@shindev \
    --to=shinichiro.kawasaki@wdc.com \
    --cc=Dmitry.Fomichev@wdc.com \
    --cc=Niklas.Cassel@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=fio@vger.kernel.org \
    --cc=vincentfu@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).