* [PATCH] ZBD support improvement and cleanup
@ 2018-08-28  5:34 Damien Le Moal
  2018-08-28  5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal
  2018-08-28  5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal
  0 siblings, 2 replies; 9+ messages in thread
From: Damien Le Moal @ 2018-08-28  5:34 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Bart Van Assche

Two small patches: the first improves read randomness to avoid excessive
device read cache hits on a device with very few written zones. With this
patch applied, on a 14TB SMR disk with only 100 zones written, random read
performance drops from several thousand IOPS to a more disk-like 100 IOPS
or so.

The second patch cleans up zbd-related declarations.
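
For reference, a minimal fio job along these lines exercises the case the
first patch targets (the device path, block size, and runtime here are
illustrative assumptions, not the exact options behind the numbers above):

	[global]
	filename=/dev/sdX
	direct=1
	zonemode=zbd
	read_beyond_wp=0

	[randread]
	rw=randread
	bs=128k
	time_based=1
	runtime=60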

Damien Le Moal (2):
  zbd: Improve read randomness
  zbd: Remove inexitent function declaration

 zbd.c | 10 +++++++++-
 zbd.h | 12 ------------
 2 files changed, 9 insertions(+), 13 deletions(-)

-- 
2.17.1




* [PATCH] zbd: Improve read randomness
  2018-08-28  5:34 [PATCH] ZBD support improvement and cleanup Damien Le Moal
@ 2018-08-28  5:34 ` Damien Le Moal
  2018-08-28 14:38   ` Bart Van Assche
  2018-08-28  5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal
  1 sibling, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2018-08-28  5:34 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Bart Van Assche

With zonemode=zbd, for random read operations with read_beyond_wp=0,
the zbd code always adjusts an I/O offset so that the I/O falls in a
non-empty zone. The adjustment, however, always sets the I/O offset to
the start of the zone, resulting in a high device read cache hit rate
if the device has very few zones written.

Fix this by improving the randomness of the I/Os: adjust the I/O
offset to a random value within the range of written data in the zone.
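
As an illustration (numbers assumed, not taken from the original posting):
for a zone starting at sector 0 with the write pointer at sector 16, i.e.
8192 bytes written, and min_bs = buflen = 4096, the range computed below is
8192 - 4096 = 4096, so the adjusted offset becomes either 0 or 4096 and the
read always falls entirely within written data instead of always hitting
the zone start.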

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 zbd.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/zbd.c b/zbd.c
index 56197693..a985e022 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 				       io_u->buflen);
 				goto eof;
 			}
-			io_u->offset = zb->start << 9;
+			range = ((zb->wp - zb->start) << 9) - io_u->buflen;
+			if (td_random(td) && range >= 0)
+				io_u->offset = (zb->start << 9) +
+					((io_u->offset - (zb->start << 9)) %
+					 (range + 1)) / min_bs * min_bs;
+			else
+				io_u->offset = zb->start << 9;
+			assert(zb->start << 9 <= io_u->offset);
+			assert(io_u->offset + io_u->buflen <= zb->wp << 9);
 		}
 		if ((io_u->offset + io_u->buflen) >> 9 > zb->wp) {
 			dprint(FD_ZBD, "%s: %lld + %lld > %" PRIu64 "\n",
-- 
2.17.1




* [PATCH] zbd: Remove inexitent function declaration
  2018-08-28  5:34 [PATCH] ZBD support improvement and cleanup Damien Le Moal
  2018-08-28  5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal
@ 2018-08-28  5:34 ` Damien Le Moal
  2018-08-28 14:34   ` Bart Van Assche
  1 sibling, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2018-08-28  5:34 UTC (permalink / raw)
  To: fio, Jens Axboe; +Cc: Bart Van Assche

zbd_do_trim() and zbd_update_wp() are not implemented.
Remove their declarations from zbd.h.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 zbd.h | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/zbd.h b/zbd.h
index 08751fd5..d750b67e 100644
--- a/zbd.h
+++ b/zbd.h
@@ -95,8 +95,6 @@ int zbd_init(struct thread_data *td);
 void zbd_file_reset(struct thread_data *td, struct fio_file *f);
 bool zbd_unaligned_write(int error_code);
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
-int zbd_do_trim(struct thread_data *td, const struct io_u *io_u);
-void zbd_update_wp(struct thread_data *td, const struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
 #else
 static inline void zbd_free_zone_info(struct fio_file *f)
@@ -123,16 +121,6 @@ static inline enum io_u_action zbd_adjust_block(struct thread_data *td,
 	return io_u_accept;
 }
 
-static inline int zbd_do_trim(struct thread_data *td, const struct io_u *io_u)
-{
-	return 1;
-}
-
-static inline void zbd_update_wp(struct thread_data *td,
-				 const struct io_u *io_u)
-{
-}
-
 static inline char *zbd_write_status(const struct thread_stat *ts)
 {
 	return NULL;
-- 
2.17.1




* Re: [PATCH] zbd: Remove inexitent function declaration
  2018-08-28  5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal
@ 2018-08-28 14:34   ` Bart Van Assche
  0 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2018-08-28 14:34 UTC (permalink / raw)
  To: Damien Le Moal, fio, Jens Axboe

On 08/27/18 22:34, Damien Le Moal wrote:
> zbd_do_trim() and zbd_update_wp() are not implemented.
> Remove their declarations from zbd.h.

Regarding the patch subject: did you perhaps mean "inexistent"?
Otherwise this patch looks fine to me.

Bart.




* Re: [PATCH] zbd: Improve read randomness
  2018-08-28  5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal
@ 2018-08-28 14:38   ` Bart Van Assche
  2018-08-29  0:57     ` Damien Le Moal
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2018-08-28 14:38 UTC (permalink / raw)
  To: Damien Le Moal, fio, Jens Axboe

On 08/27/18 22:34, Damien Le Moal wrote:
> With zonemode=zbd, for random read operations with read_beyond_wp=0,
> the zbd code always adjusts an I/O offset so that the I/O falls in a
> non-empty zone. The adjustment, however, always sets the I/O offset to
> the start of the zone, resulting in a high device read cache hit rate
> if the device has very few zones written.
> 
> Fix this by improving the randomness of the I/Os: adjust the I/O
> offset to a random value within the range of written data in the zone.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  zbd.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/zbd.c b/zbd.c
> index 56197693..a985e022 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
>  				       io_u->buflen);
>  				goto eof;
>  			}
> -			io_u->offset = zb->start << 9;
> +			range = ((zb->wp - zb->start) << 9) - io_u->buflen;
> +			if (td_random(td) && range >= 0)
> +				io_u->offset = (zb->start << 9) +
> +					((io_u->offset - (zb->start << 9)) %
> +					 (range + 1)) / min_bs * min_bs;

This looks wrong to me: it is not guaranteed that io_u->offset >=
(zb->start << 9). Please use the original zb in that expression.
Additionally, since that expression is only evaluated if range >= 0,
there is no need to add "+ 1" to "range".
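
The offset concern can be seen with a small standalone program (the values
and unsigned types are assumptions for illustration; with signed types the
remainder would instead be negative, pushing the adjusted offset below the
zone start and tripping the first assert):

	#include <stdio.h>

	int main(void)
	{
		/* Assumed example: the original offset lies in a zone below zb. */
		unsigned long long zone_start = 524288ULL << 9; /* zb->start << 9 */
		unsigned long long off = 4096;        /* original io_u->offset */
		unsigned long long range = 65536;

		/* The subtraction wraps instead of going negative... */
		unsigned long long delta = off - zone_start;
		/* ...so the modulo yields an in-range but meaningless value. */
		printf("adjusted = %llu\n", zone_start + delta % (range + 1));
		return 0;
	}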

Thanks,

Bart.




* Re: [PATCH] zbd: Improve read randomness
  2018-08-28 14:38   ` Bart Van Assche
@ 2018-08-29  0:57     ` Damien Le Moal
  2018-08-29  3:23       ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2018-08-29  0:57 UTC (permalink / raw)
  To: fio, bvanassche, axboe

On Tue, 2018-08-28 at 07:38 -0700, Bart Van Assche wrote:
> On 08/27/18 22:34, Damien Le Moal wrote:
> > With zonemode=zbd, for random read operations with read_beyond_wp=0,
> > the zbd code always adjusts an I/O offset so that the I/O falls in a
> > non-empty zone. The adjustment, however, always sets the I/O offset to
> > the start of the zone, resulting in a high device read cache hit rate
> > if the device has very few zones written.
> > 
> > Fix this by improving the randomness of the I/Os: adjust the I/O
> > offset to a random value within the range of written data in the zone.
> > 
> > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > ---
> >  zbd.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/zbd.c b/zbd.c
> > index 56197693..a985e022 100644
> > --- a/zbd.c
> > +++ b/zbd.c
> > @@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
> >  				       io_u->buflen);
> >  				goto eof;
> >  			}
> > -			io_u->offset = zb->start << 9;
> > +			range = ((zb->wp - zb->start) << 9) - io_u->buflen;
> > +			if (td_random(td) && range >= 0)
> > +				io_u->offset = (zb->start << 9) +
> > +					((io_u->offset - (zb->start << 9)) %
> > +					 (range + 1)) / min_bs * min_bs;
> 
> This looks wrong to me: it is not guaranteed that io_u->offset >=
> (zb->start << 9). Please use the original zb in that expression.

Yes indeed. Will do.

> Additionally, since that expression is only evaluated if range >= 0,
> there is no need to add "+ 1" to "range".

Thinking more about this, since zbd_find_zone() only guarantees that the zone
returned has min_bs data in it, range can in fact still be <= 0 if
io_u->buflen is larger than min_bs. Is it assumed here that min_bs and
io_u->buflen are always equal? If yes, then range < 0 is not possible, but
range == 0 is definitely still possible and so the +1 is needed.
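
Concretely (values assumed): with min_bs = buflen = 4096 and exactly 4096
bytes written in the zone, range = 4096 - 4096 = 0; the expression then
computes x % (0 + 1) == 0, placing the I/O at the zone start, whereas
without the +1 it would compute x % 0, a division by zero.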

Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So
either we need to modify zbd_find_zone() to return a zone with at least
io_u->buflen bytes in it, or we need to handle the range < 0 case...

Am I getting this wrong?

> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital


* Re: [PATCH] zbd: Improve read randomness
  2018-08-29  0:57     ` Damien Le Moal
@ 2018-08-29  3:23       ` Bart Van Assche
  2018-08-29  3:56         ` Damien Le Moal
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2018-08-29  3:23 UTC (permalink / raw)
  To: Damien Le Moal, fio, axboe

On 08/28/18 17:57, Damien Le Moal wrote:
> Thinking more about this, since zbd_find_zone() only guarantees that the zone
> returned has min_bs data in it, range can in fact still be <= 0 if
> io_u->buflen is larger than min_bs. Is it assumed here that min_bs and
> io_u->buflen are always equal?

There is no such assumption. io_u->buflen can be larger than min_bs.
E.g. test11 in test-zbd-support triggers that case because it uses the
bsrange option.

> If yes, then range < 0 is not possible, but range == 0 is definitely still
> possible and so the +1 is needed.

Have you considered leaving out the "+ 1" and handling range == 0
separately?

> Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So
> either we need to modify zbd_find_zone() to return a zone with at least
> io_u->buflen bytes in it, or we need to handle the range < 0 case...

The range < 0 case is already handled by the following code at the end
of the DDIR_WRITE case:

	/*
	 * Make sure that the buflen is a multiple of the minimal
	 * block size. Give up if shrinking would make the request too
	 * small.
	 */
	new_len = min((unsigned long long)io_u->buflen,
		      ((zb + 1)->start << 9) - io_u->offset);
	new_len = new_len / min_bs * min_bs;
	if (new_len == io_u->buflen)
		goto accept;
	if (new_len >= min_bs) {
		io_u->buflen = new_len;
		dprint(FD_IO, "Changed length from %u into %llu\n",
		       orig_len, io_u->buflen);
		goto accept;
	}
	log_err("Zone remainder %lld smaller than minimum block size %d\n",
		(((zb + 1)->start << 9) - io_u->offset),
		min_bs);
	goto eof;

Bart.



* Re: [PATCH] zbd: Improve read randomness
  2018-08-29  3:23       ` Bart Van Assche
@ 2018-08-29  3:56         ` Damien Le Moal
  2018-08-29 15:42           ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2018-08-29  3:56 UTC (permalink / raw)
  To: fio, bvanassche, axboe

Bart,

On Tue, 2018-08-28 at 20:23 -0700, Bart Van Assche wrote:
> On 08/28/18 17:57, Damien Le Moal wrote:
> > Thinking more about this, since zbd_find_zone() only guarantees that the
> > zone returned has min_bs data in it, range can in fact still be <= 0 if
> > io_u->buflen is larger than min_bs. Is it assumed here that min_bs and
> > io_u->buflen are always equal?
> 
> There is no such assumption. io_u->buflen can be larger than min_bs.
> E.g. test11 in test-zbd-support triggers that case because it uses the
> bsrange option.

OK. We are in sync. And this means that range can be negative after
zbd_find_zone() runs and finds a new zone, since the search returns once a
zone with min_bs bytes is found. That may be less than io_u->buflen.

> 
> > If yes, then range < 0 is not possible, but range == 0 is definitely still
> > possible and so the +1 is needed.
> 
> Have you considered leaving out the "+ 1" and handling range == 0
> separately?

Sure, I can do that. But again, what about zbd_find_zone()? Shouldn't it be
changed to look for a zone with at least io_u->buflen bytes rather than only
min_bs? Otherwise, range can be negative and the test "if (io_u->offset +
io_u->buflen > zb->wp)" at the end of the DDIR_READ case could terminate
reads with eof prematurely.

> > Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So
> > either we need to modify zbd_find_zone() to return a zone with at least
> > io_u->buflen bytes in it, or we need to handle the range < 0 case...
> 
> The range < 0 case is already handled by the following code at the end
> of the DDIR_WRITE case:

That is fine, but that is for the DDIR_WRITE case. There is no such code in
the DDIR_READ case. Read I/O buflens are not modified based on the zone wp
position... Do you think it is better to add similar I/O size reduction code
for reads, or to change zbd_find_zone() as suggested above? I think the
latter is better, to ensure consistent sizes for reads.

> 
> 	/*
> 	 * Make sure that the buflen is a multiple of the minimal
> 	 * block size. Give up if shrinking would make the request too
> 	 * small.
> 	 */
> 	new_len = min((unsigned long long)io_u->buflen,
> 		      ((zb + 1)->start << 9) - io_u->offset);
> 	new_len = new_len / min_bs * min_bs;
> 	if (new_len == io_u->buflen)
> 		goto accept;
> 	if (new_len >= min_bs) {
> 		io_u->buflen = new_len;
> 		dprint(FD_IO, "Changed length from %u into %llu\n",
> 		       orig_len, io_u->buflen);
> 		goto accept;
> 	}
> 	log_err("Zone remainder %lld smaller than minimum block size %d\n",
> 		(((zb + 1)->start << 9) - io_u->offset),
> 		min_bs);
> 	goto eof;
> 
> Bart.

-- 
Damien Le Moal
Western Digital


* Re: [PATCH] zbd: Improve read randomness
  2018-08-29  3:56         ` Damien Le Moal
@ 2018-08-29 15:42           ` Bart Van Assche
  0 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2018-08-29 15:42 UTC (permalink / raw)
  To: Damien Le Moal, fio, axboe

On 08/28/18 20:56, Damien Le Moal wrote:
> On Tue, 2018-08-28 at 20:23 -0700, Bart Van Assche wrote:
>> On 08/28/18 17:57, Damien Le Moal wrote:

>>> If yes, then range < 0 is not possible, but range == 0 is definitely still
>>> possible and so the +1 is needed.
>>
>> Have you considered leaving out the "+ 1" and handling range == 0
>> separately?
> 
> Sure, I can do that. But again, what about zbd_find_zone()? Shouldn't it be
> changed to look for a zone with at least io_u->buflen bytes rather than only
> min_bs? Otherwise, range can be negative and the test "if (io_u->offset +
> io_u->buflen > zb->wp)" at the end of the DDIR_READ case could terminate
> reads with eof prematurely.

I think it's better to shrink a read request than to modify
zbd_find_zone(). Making zbd_find_zone() look for a zone that is large
enough would confuse random reads when the random map is enabled,
because the random generator would repeatedly propose to submit I/O to
the remainder of a zone once the random map is almost full.

>>> Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So
>>> either we need to modify zbd_find_zone() to return a zone with at least
>>> io_u->buflen bytes in it, or we need to handle the range < 0 case...
>>
>> The range < 0 case is already handled by the following code at the end
>> of the DDIR_WRITE case:
> 
> That is fine, but that is for the DDIR_WRITE case. There is no such code in
> the DDIR_READ case. Read I/O buflens are not modified based on the zone wp
> position... Do you think it is better to add similar I/O size reduction code
> for reads, or to change zbd_find_zone() as suggested above? I think the
> latter is better, to ensure consistent sizes for reads.

Modifying the DDIR_READ case such that read requests for a zone with
less data than io_u->buflen are shrunk sounds fine to me.
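
A sketch of that shrinking, modeled on the DDIR_WRITE code quoted earlier
in the thread (an outline assuming the same locals are in scope, not the
final patch):

	/* DDIR_READ: shrink the request so it ends at or before the write
	 * pointer, keeping the length a multiple of the minimal block size. */
	new_len = min((unsigned long long)io_u->buflen,
		      (zb->wp << 9) - io_u->offset);
	new_len = new_len / min_bs * min_bs;
	if (new_len < min_bs)
		goto eof;
	if (new_len < io_u->buflen) {
		io_u->buflen = new_len;
		dprint(FD_IO, "Changed length from %u into %llu\n",
		       orig_len, io_u->buflen);
	}
	goto accept;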

Bart.

