* [PATCH] ZBD support improvement and cleanup
@ 2018-08-28 5:34 Damien Le Moal
2018-08-28 5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal
2018-08-28 5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal
0 siblings, 2 replies; 9+ messages in thread
From: Damien Le Moal @ 2018-08-28 5:34 UTC (permalink / raw)
To: fio, Jens Axboe; +Cc: Bart Van Assche
Two small patches: the first improves read randomness to avoid excessive
device read cache hits on a device with very few written zones. With this patch
applied, on a 14TB SMR disk with only 100 zones written, random read performance
drops from several thousand IOPS to a more disk-like 100 IOPS or so.
The second patch cleans up zbd-related declarations.
Damien Le Moal (2):
zbd: Improve read randomness
zbd: Remove inexitent function declaration
zbd.c | 10 +++++++++-
zbd.h | 12 ------------
2 files changed, 9 insertions(+), 13 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] zbd: Improve read randomness
2018-08-28 5:34 [PATCH] ZBD support improvement and cleanup Damien Le Moal
@ 2018-08-28 5:34 ` Damien Le Moal
2018-08-28 14:38 ` Bart Van Assche
2018-08-28 5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal
1 sibling, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2018-08-28 5:34 UTC (permalink / raw)
To: fio, Jens Axboe; +Cc: Bart Van Assche
With zonemode=zbd, for random read operations with read_beyond_wp=0,
the zbd code will always adjust an I/O offset so that the I/O falls in
a non-empty zone. The adjustment however always sets the I/O offset to
the start of the zone, resulting in a high device read cache hit rate
if the device has very few zones written.
Fix this by improving the randomness of the I/Os by adjusting the I/O
offset to a random value within the range of written data in the zone.
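The patch's arithmetic can be sketched as a hypothetical Python helper (not fio code; zone positions are in 512-byte sectors as in zbd.c, byte quantities in bytes). Note that the review below observes io_u->offset is not guaranteed to be at or above the zone start; Python's % always yields a non-negative result, but C's does not, so this sketch does not exhibit that issue:

```python
def remap_read_offset(zone_start_sec, zone_wp_sec, raw_offset, buflen, min_bs):
    """Map a raw random offset into the written range of a zone.

    zone_start_sec/zone_wp_sec are in 512-byte sectors, as in zbd.c;
    raw_offset, buflen and min_bs are in bytes.
    """
    zone_start = zone_start_sec << 9
    # Largest in-zone byte offset at which a full buflen still fits
    # below the write pointer.
    rng = ((zone_wp_sec - zone_start_sec) << 9) - buflen
    if rng < 0:
        # Not enough written data for this buflen; fall back to the zone start.
        return zone_start
    # Fold the raw offset into [0, rng], then align down to min_bs.
    return zone_start + ((raw_offset - zone_start) % (rng + 1)) // min_bs * min_bs
```

For example, a 100-sector written zone starting at sector 1000 maps a raw offset of 520000 to an aligned offset inside the written data, while a buflen larger than the written data falls back to the zone start.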
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
zbd.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/zbd.c b/zbd.c
index 56197693..a985e022 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
io_u->buflen);
goto eof;
}
- io_u->offset = zb->start << 9;
+ range = ((zb->wp - zb->start) << 9) - io_u->buflen;
+ if (td_random(td) && range >= 0)
+ io_u->offset = (zb->start << 9) +
+ ((io_u->offset - (zb->start << 9)) %
+ (range + 1)) / min_bs * min_bs;
+ else
+ io_u->offset = zb->start << 9;
+ assert(zb->start << 9 <= io_u->offset);
+ assert(io_u->offset + io_u->buflen <= zb->wp << 9);
}
if ((io_u->offset + io_u->buflen) >> 9 > zb->wp) {
dprint(FD_ZBD, "%s: %lld + %lld > %" PRIu64 "\n",
--
2.17.1
* [PATCH] zbd: Remove inexitent function declaration
2018-08-28 5:34 [PATCH] ZBD support improvement and cleanup Damien Le Moal
2018-08-28 5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal
@ 2018-08-28 5:34 ` Damien Le Moal
2018-08-28 14:34 ` Bart Van Assche
1 sibling, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2018-08-28 5:34 UTC (permalink / raw)
To: fio, Jens Axboe; +Cc: Bart Van Assche
zbd_do_trim() and zbd_update_wp() are not implemented.
Remove their declarations from zbd.h.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
zbd.h | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/zbd.h b/zbd.h
index 08751fd5..d750b67e 100644
--- a/zbd.h
+++ b/zbd.h
@@ -95,8 +95,6 @@ int zbd_init(struct thread_data *td);
void zbd_file_reset(struct thread_data *td, struct fio_file *f);
bool zbd_unaligned_write(int error_code);
enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
-int zbd_do_trim(struct thread_data *td, const struct io_u *io_u);
-void zbd_update_wp(struct thread_data *td, const struct io_u *io_u);
char *zbd_write_status(const struct thread_stat *ts);
#else
static inline void zbd_free_zone_info(struct fio_file *f)
@@ -123,16 +121,6 @@ static inline enum io_u_action zbd_adjust_block(struct thread_data *td,
return io_u_accept;
}
-static inline int zbd_do_trim(struct thread_data *td, const struct io_u *io_u)
-{
- return 1;
-}
-
-static inline void zbd_update_wp(struct thread_data *td,
- const struct io_u *io_u)
-{
-}
-
static inline char *zbd_write_status(const struct thread_stat *ts)
{
return NULL;
--
2.17.1
* Re: [PATCH] zbd: Remove inexitent function declaration
2018-08-28 5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal
@ 2018-08-28 14:34 ` Bart Van Assche
0 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2018-08-28 14:34 UTC (permalink / raw)
To: Damien Le Moal, fio, Jens Axboe
On 08/27/18 22:34, Damien Le Moal wrote:
> zbd_do_trim() and zbd_update_wp() are not implemented.
> Remove their declarations from zbd.h.
Regarding the patch subject: did you perhaps mean "inexistent"?
Otherwise this patch looks fine to me.
Bart.
* Re: [PATCH] zbd: Improve read randomness
2018-08-28 5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal
@ 2018-08-28 14:38 ` Bart Van Assche
2018-08-29 0:57 ` Damien Le Moal
0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2018-08-28 14:38 UTC (permalink / raw)
To: Damien Le Moal, fio, Jens Axboe
On 08/27/18 22:34, Damien Le Moal wrote:
> With zonemode=zbd, for random read operations with read_beyond_wp=0,
> the zbd code will always adjust an I/O offset so that the I/O falls in
> a non-empty zone. The adjustment however always sets the I/O offset to
> the start of the zone, resulting in a high device read cache hit rate
> if the device has very few zones written.
>
> Fix this by improving the randomness of the I/Os by adjusting the I/O
> offset to a random value within the range of written data in the zone.
>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
> zbd.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/zbd.c b/zbd.c
> index 56197693..a985e022 100644
> --- a/zbd.c
> +++ b/zbd.c
> @@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
> io_u->buflen);
> goto eof;
> }
> - io_u->offset = zb->start << 9;
> + range = ((zb->wp - zb->start) << 9) - io_u->buflen;
> + if (td_random(td) && range >= 0)
> + io_u->offset = (zb->start << 9) +
> + ((io_u->offset - (zb->start << 9)) %
> + (range + 1)) / min_bs * min_bs;
This looks wrong to me: it is not guaranteed that io_u->offset >=
(zb->start << 9). Please use the original zb in that expression.
Additionally, since that expression is only evaluated if range >= 0,
there is no need to add "+ 1" to "range".
Thanks,
Bart.
* Re: [PATCH] zbd: Improve read randomness
2018-08-28 14:38 ` Bart Van Assche
@ 2018-08-29 0:57 ` Damien Le Moal
2018-08-29 3:23 ` Bart Van Assche
0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2018-08-29 0:57 UTC (permalink / raw)
To: fio, bvanassche, axboe
On Tue, 2018-08-28 at 07:38 -0700, Bart Van Assche wrote:
> On 08/27/18 22:34, Damien Le Moal wrote:
> > With zonemode=zbd, for random read operations with read_beyond_wp=0,
> > the zbd code will always adjust an I/O offset so that the I/O falls in
> > a non-empty zone. The adjustment however always sets the I/O offset to
> > the start of the zone, resulting in a high device read cache hit rate
> > if the device has very few zones written.
> >
> > Fix this by improving the randomness of the I/Os by adjusting the I/O
> > offset to a random value within the range of written data in the zone.
> >
> > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > ---
> > zbd.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/zbd.c b/zbd.c
> > index 56197693..a985e022 100644
> > --- a/zbd.c
> > +++ b/zbd.c
> > @@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct
> > thread_data *td, struct io_u *io_u)
> > io_u->buflen);
> > goto eof;
> > }
> > - io_u->offset = zb->start << 9;
> > + range = ((zb->wp - zb->start) << 9) - io_u->buflen;
> > + if (td_random(td) && range >= 0)
> > + io_u->offset = (zb->start << 9) +
> > + ((io_u->offset - (zb->start << 9)) %
> > + (range + 1)) / min_bs * min_bs;
>
> This looks wrong to me: it is not guaranteed that io_u->offset >=
> (zb->start << 9). Please use the original zb in that expression.
Yes indeed. Will do.
> Additionally, since that expression is only evaluated if range >= 0,
> there is no need to add "+ 1" to "range".
Thinking more about this, since zbd_find_zone() only guarantees that the zone
returned has min_bs data in it, range can in fact still be <= 0 if
io_u->buflen is larger than min_bs. Is it assumed here that min_bs and
io_u->buflen are always equal? If yes, then range < 0 is not possible, but
range == 0 is definitely still possible and so the +1 is needed.
Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So
either we need to modify zbd_find_zone() to return a zone with at least
io_u->buflen bytes in it, or we need to handle the range < 0 case...
Am I getting this wrong?
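To make the concern concrete, a tiny illustration (values are hypothetical): with only min_bs bytes of written data in the zone found by zbd_find_zone(), range hits zero at buflen == min_bs and goes negative as soon as buflen exceeds it:

```python
min_bs = 4096
written = min_bs              # zbd_find_zone() only guarantees this much data
for buflen in (min_bs, 2 * min_bs):
    rng = written - buflen    # mirrors: range = ((wp - start) << 9) - buflen
    print(buflen, rng)        # 4096 -> 0 (why the +1 matters), 8192 -> -4096
```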
> Thanks,
>
> Bart.
>
--
Damien Le Moal
Western Digital
* Re: [PATCH] zbd: Improve read randomness
2018-08-29 0:57 ` Damien Le Moal
@ 2018-08-29 3:23 ` Bart Van Assche
2018-08-29 3:56 ` Damien Le Moal
0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2018-08-29 3:23 UTC (permalink / raw)
To: Damien Le Moal, fio, axboe
On 08/28/18 17:57, Damien Le Moal wrote:
> Thinking more about this, since zbd_find_zone() only guarantees that the zone
> returned has min_bs data in it, range can in fact still be <= 0 if
> io_u->buflen is larger than min_bs. Is it assumed here that min_bs and
> io_u->buflen are always equal ?
There is no such assumption. io_u->buflen can be larger than min_bs.
E.g. test11 in test-zbd-support triggers that case because it uses the
bsrange option.
> If yes, then range < 0 is not possible, but range == 0 is definitely still
> possible and so the +1 is needed.
Have you considered leaving out the "+ 1" and handling range == 0
separately?
> Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So
> either we need to modify zbd_find_zone() to return a zone with at least
> io_u->buflen bytes in it, or we need to handle the range < 0 case...
The range < 0 case is already handled by the following code at the end
of the DDIR_WRITE case:
/*
* Make sure that the buflen is a multiple of the minimal
* block size. Give up if shrinking would make the request too
* small.
*/
new_len = min((unsigned long long)io_u->buflen,
((zb + 1)->start << 9) - io_u->offset);
new_len = new_len / min_bs * min_bs;
if (new_len == io_u->buflen)
goto accept;
if (new_len >= min_bs) {
io_u->buflen = new_len;
dprint(FD_IO, "Changed length from %u into %llu\n",
orig_len, io_u->buflen);
goto accept;
}
log_err("Zone remainder %lld smaller than minimum block size %d\n",
(((zb + 1)->start << 9) - io_u->offset),
min_bs);
goto eof;
Bart.
* Re: [PATCH] zbd: Improve read randomness
2018-08-29 3:23 ` Bart Van Assche
@ 2018-08-29 3:56 ` Damien Le Moal
2018-08-29 15:42 ` Bart Van Assche
0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2018-08-29 3:56 UTC (permalink / raw)
To: fio, bvanassche, axboe
Bart,
On Tue, 2018-08-28 at 20:23 -0700, Bart Van Assche wrote:
> On 08/28/18 17:57, Damien Le Moal wrote:
> > Thinking more about this, since zbd_find_zone() only guarantees that the
> > zone
> > returned has min_bs data in it, range can in fact still be <= 0 if
> > io_u->buflen is larger than min_bs. Is it assumed here that min_bs and
> > io_u->buflen are always equal ?
>
> There is no such assumption. io_u->buflen can be larger than min_bs.
> E.g. test11 in test-zbd-support triggers that case because it uses the
> bsrange option.
OK. We are in sync. And this means that range can be negative after
zbd_find_zone() runs and finds a new zone, since the search returns once a
zone with min_bs bytes is found. That may be less than io_u->buflen.
>
> > If yes, then range < 0 is not possible, but range == 0 is definitely still
> > possible and so the +1 is needed.
>
> Have you considered to leave out the "+ 1" and to handle range == 0
> separately?
Sure, I can do that. But again, what about zbd_find_zone()? Shouldn't it be
changed to look for a zone with at least io_u->buflen bytes rather than only
min_bs? Otherwise, range can be negative and the test "if (io_u->offset +
io_u->buflen > zb->wp)" at the end of the DDIR_READ case could terminate reads
with eof prematurely.
> > Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So
> > either we need to modify zbd_find_zone() to return a zone with at least
> > io_u->buflen bytes in it, or we need to handle the range < 0 case...
>
> The range < 0 case is already handled by the following code at the end
> of the DDIR_WRITE case:
That is fine, but that is for the DDIR_WRITE case. There is no such code in
the DDIR_READ case. Read I/O buflen values are not modified based on the zone
wp position... Do you think it is better to add similar I/O size reduction
code for reads, or to change zbd_find_zone() as suggested above? I think the
latter is better to ensure consistent sizes for reads.
>
> /*
> * Make sure that the buflen is a multiple of the minimal
> * block size. Give up if shrinking would make the request too
> * small.
> */
> new_len = min((unsigned long long)io_u->buflen,
> ((zb + 1)->start << 9) - io_u->offset);
> new_len = new_len / min_bs * min_bs;
> if (new_len == io_u->buflen)
> goto accept;
> if (new_len >= min_bs) {
> io_u->buflen = new_len;
> dprint(FD_IO, "Changed length from %u into %llu\n",
> orig_len, io_u->buflen);
> goto accept;
> }
> log_err("Zone remainder %lld smaller than minimum block size %d\n",
> (((zb + 1)->start << 9) - io_u->offset),
> min_bs);
> goto eof;
>
> Bart.
--
Damien Le Moal
Western Digital
* Re: [PATCH] zbd: Improve read randomness
2018-08-29 3:56 ` Damien Le Moal
@ 2018-08-29 15:42 ` Bart Van Assche
0 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2018-08-29 15:42 UTC (permalink / raw)
To: Damien Le Moal, fio, axboe
On 08/28/18 20:56, Damien Le Moal wrote:
> On Tue, 2018-08-28 at 20:23 -0700, Bart Van Assche wrote:
>> On 08/28/18 17:57, Damien Le Moal wrote:
>>> If yes, then range < 0 is not possible, but range == 0 is definitely still
>>> possible and so the +1 is needed.
>>
>> Have you considered leaving out the "+ 1" and handling range == 0
>> separately?
>
> Sure I can do that. But again what about zbd_find_zone() ? Shouldn't it be
> changed to look for a zone with at least io_u->buflen bytes rather than only
> min_bs ? Otherwise, range can be negative and the test "if (io_u->offset +
> io_u->buflen > zb->wp)" at the end of the DDIR_READ case could terminate reads
> with eof prematuraly.
I think that it's better to shrink a read request than to modify
zbd_find_zone(). I think that making zbd_find_zone() look for a zone
that is large enough would confuse random reads if the random map is
enabled because the random generator would repeatedly propose to submit
I/O to the remainder of a zone if the random map is almost full.
>>> Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So
>>> either we need to modify zbd_find_zone() to return a zone with at least
>>> io_u->buflen bytes in it, or we need to handle the range < 0 case...
>>
>> The range < 0 case is already handled by the following code at the end
>> of the DDIR_WRITE case:
>
> That is fine, but that is for the DDIR_WRITE case. There is no such code in
> the DDIR_READ case. Reads I/Os buflen are not modified based on the zone wp
> position... Do you think it is better to add a similar I/O size reduction code
> for reads or change zbd_find_zone() as suggested above ? I think the latter is
> better to ensure consistent sizes for reads.
Modifying the DDIR_READ case such that read requests for a zone with
less data than io_u->buflen are shrunk sounds fine to me.
Bart.
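The read-shrinking approach the thread converges on could be sketched like this (a hypothetical helper under stated assumptions, not fio's actual DDIR_READ code; byte quantities in bytes, the write pointer in 512-byte sectors as in zbd.c):

```python
def shrink_read(offset, buflen, zone_wp_sec, min_bs):
    """Shrink a read so it ends at or before the zone write pointer.

    offset, buflen and min_bs are in bytes; zone_wp_sec is in 512-byte
    sectors. Returns the adjusted buflen, or None when even a min_bs read
    starting at offset would cross the write pointer (the caller should
    then give up or pick another zone).
    """
    avail = (zone_wp_sec << 9) - offset        # written bytes from offset to wp
    new_len = min(buflen, avail) // min_bs * min_bs
    return new_len if new_len >= min_bs else None
```

This mirrors the existing DDIR_WRITE shrinking logic quoted earlier in the thread: cap the length at the available data, round down to a multiple of min_bs, and refuse requests that become smaller than min_bs.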