* [PATCH] ZBD support improvment and cleanup @ 2018-08-28 5:34 Damien Le Moal 2018-08-28 5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal 2018-08-28 5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal 0 siblings, 2 replies; 9+ messages in thread From: Damien Le Moal @ 2018-08-28 5:34 UTC (permalink / raw) To: fio, Jens Axboe; +Cc: Bart Van Assche Two small patches: the first one improves read randomness to avoid excessive device read cache hit for a device with very few written zones. With this patch applied, on a 14TB SMR disk with only 100 zones written, random read performance result change from several thousands IOPS down to a more disk-like 100 IOPS or so. The second patch cleans up zbd related declarations. Damien Le Moal (2): zbd: Improve read randomness zbd: Remove inexitent function declaration zbd.c | 10 +++++++++- zbd.h | 12 ------------ 2 files changed, 9 insertions(+), 13 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] zbd: Improve read randomness 2018-08-28 5:34 [PATCH] ZBD support improvment and cleanup Damien Le Moal @ 2018-08-28 5:34 ` Damien Le Moal 2018-08-28 14:38 ` Bart Van Assche 2018-08-28 5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal 1 sibling, 1 reply; 9+ messages in thread From: Damien Le Moal @ 2018-08-28 5:34 UTC (permalink / raw) To: fio, Jens Axboe; +Cc: Bart Van Assche With zonemode=zbd, for random read operations with read_beyond_wp=0, the zbd code will always adjust an I/O offset so that the I/O falls in a non empty zone. The adjustment however always sets the I/O offset to the start of the zone, resulting in a high device read cache hit rate if the device has very few zones written. Fix this by improving the randomness of the I/Os by adjusting the I/O offset to a random value with the range of written data in the zone. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> --- zbd.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/zbd.c b/zbd.c index 56197693..a985e022 100644 --- a/zbd.c +++ b/zbd.c @@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u) io_u->buflen); goto eof; } - io_u->offset = zb->start << 9; + range = ((zb->wp - zb->start) << 9) - io_u->buflen; + if (td_random(td) && range >= 0) + io_u->offset = (zb->start << 9) + + ((io_u->offset - (zb->start << 9)) % + (range + 1)) / min_bs * min_bs; + else + io_u->offset = zb->start << 9; + assert(zb->start << 9 <= io_u->offset); + assert(io_u->offset + io_u->buflen <= zb->wp << 9); } if ((io_u->offset + io_u->buflen) >> 9 > zb->wp) { dprint(FD_ZBD, "%s: %lld + %lld > %" PRIu64 "\n", -- 2.17.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] zbd: Improve read randomness 2018-08-28 5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal @ 2018-08-28 14:38 ` Bart Van Assche 2018-08-29 0:57 ` Damien Le Moal 0 siblings, 1 reply; 9+ messages in thread From: Bart Van Assche @ 2018-08-28 14:38 UTC (permalink / raw) To: Damien Le Moal, fio, Jens Axboe On 08/27/18 22:34, Damien Le Moal wrote: > With zonemode=zbd, for random read operations with read_beyond_wp=0, > the zbd code will always adjust an I/O offset so that the I/O falls in > a non empty zone. The adjustment however always sets the I/O offset to > the start of the zone, resulting in a high device read cache hit rate > if the device has very few zones written. > > Fix this by improving the randomness of the I/Os by adjusting the I/O > offset to a random value with the range of written data in the zone. > > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> > --- > zbd.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/zbd.c b/zbd.c > index 56197693..a985e022 100644 > --- a/zbd.c > +++ b/zbd.c > @@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u) > io_u->buflen); > goto eof; > } > - io_u->offset = zb->start << 9; > + range = ((zb->wp - zb->start) << 9) - io_u->buflen; > + if (td_random(td) && range >= 0) > + io_u->offset = (zb->start << 9) + > + ((io_u->offset - (zb->start << 9)) % > + (range + 1)) / min_bs * min_bs; This looks wrong to me: it is not guaranteed that io_u->offset >= (zb->start << 9). Please use the original zb in that expression. Additionally, since that expression is only evaluated if range >=0, there is no need to add "+ 1" to "range". Thanks, Bart. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] zbd: Improve read randomness 2018-08-28 14:38 ` Bart Van Assche @ 2018-08-29 0:57 ` Damien Le Moal 2018-08-29 3:23 ` Bart Van Assche 0 siblings, 1 reply; 9+ messages in thread From: Damien Le Moal @ 2018-08-29 0:57 UTC (permalink / raw) To: fio, bvanassche, axboe On Tue, 2018-08-28 at 07:38 -0700, Bart Van Assche wrote: > On 08/27/18 22:34, Damien Le Moal wrote: > > With zonemode=zbd, for random read operations with read_beyond_wp=0, > > the zbd code will always adjust an I/O offset so that the I/O falls in > > a non empty zone. The adjustment however always sets the I/O offset to > > the start of the zone, resulting in a high device read cache hit rate > > if the device has very few zones written. > > > > Fix this by improving the randomness of the I/Os by adjusting the I/O > > offset to a random value with the range of written data in the zone. > > > > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> > > --- > > zbd.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/zbd.c b/zbd.c > > index 56197693..a985e022 100644 > > --- a/zbd.c > > +++ b/zbd.c > > @@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct > > thread_data *td, struct io_u *io_u) > > io_u->buflen); > > goto eof; > > } > > - io_u->offset = zb->start << 9; > > + range = ((zb->wp - zb->start) << 9) - io_u->buflen; > > + if (td_random(td) && range >= 0) > > + io_u->offset = (zb->start << 9) + > > + ((io_u->offset - (zb->start << 9)) % > > + (range + 1)) / min_bs * min_bs; > > This looks wrong to me: it is not guaranteed that io_u->offset >= > (zb->start << 9). Please use the original zb in that expression. Yes indeed. Will do. > Additionally, since that expression is only evaluated if range >=0, > there is no need to add "+ 1" to "range". Thinking more about this, since zbd_find_zone() only guarantees that the zone returned has min_bs data in it, range can in fact still be <= 0 if io_u->buflen is larger than min_bs. Is it assumed here that min_bs and io_u->buflen are always equal ? If yes, then range < 0 is not possible, but range == 0 is definitely still possible and so the +1 is needed. Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So either we need to modify zbd_find_zone() to return a zone with at least io_u->buflen bytes in it, or we need to handle the range < 0 case... Am I getting this wrong ? > Thanks, > > Bart. > -- Damien Le Moal Western Digital ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] zbd: Improve read randomness 2018-08-29 0:57 ` Damien Le Moal @ 2018-08-29 3:23 ` Bart Van Assche 2018-08-29 3:56 ` Damien Le Moal 0 siblings, 1 reply; 9+ messages in thread From: Bart Van Assche @ 2018-08-29 3:23 UTC (permalink / raw) To: Damien Le Moal, fio, axboe On 08/28/18 17:57, Damien Le Moal wrote: > Thinking more about this, since zbd_find_zone() only guarantees that the zone > returned has min_bs data in it, range can in fact still be <= 0 if > io_u->buflen is larger than min_bs. Is it assumed here that min_bs and > io_u->buflen are always equal ? There is no such assumption. io_u->buflen can be larger than min_bs. E.g. test11 in test-zbd-support triggers that case because it uses the bsrange option. > If yes, then range < 0 is not possible, but range == 0 is definitely still > possible and so the +1 is needed. Have you considered to leave out the "+ 1" and to handle range == 0 separately? > Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So > either we need to modify zbd_find_zone() to return a zone with at least > io_u->buflen bytes in it, or we need to handle the range < 0 case... The range < 0 case is already handled by the following code at the end of the DDIR_WRITE case: /* * Make sure that the buflen is a multiple of the minimal * block size. Give up if shrinking would make the request too * small. */ new_len = min((unsigned long long)io_u->buflen, ((zb + 1)->start << 9) - io_u->offset); new_len = new_len / min_bs * min_bs; if (new_len == io_u->buflen) goto accept; if (new_len >= min_bs) { io_u->buflen = new_len; dprint(FD_IO, "Changed length from %u into %llu\n", orig_len, io_u->buflen); goto accept; } log_err("Zone remainder %lld smaller than minimum block size %d\n", (((zb + 1)->start << 9) - io_u->offset), min_bs); goto eof; Bart. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] zbd: Improve read randomness 2018-08-29 3:23 ` Bart Van Assche @ 2018-08-29 3:56 ` Damien Le Moal 2018-08-29 15:42 ` Bart Van Assche 0 siblings, 1 reply; 9+ messages in thread From: Damien Le Moal @ 2018-08-29 3:56 UTC (permalink / raw) To: fio, bvanassche, axboe Bart, On Tue, 2018-08-28 at 20:23 -0700, Bart Van Assche wrote: > On 08/28/18 17:57, Damien Le Moal wrote: > > Thinking more about this, since zbd_find_zone() only guarantees that the > > zone > > returned has min_bs data in it, range can in fact still be <= 0 if > > io_u->buflen is larger than min_bs. Is it assumed here that min_bs and > > io_u->buflen are always equal ? > > There is no such assumption. io_u->buflen can be larger than min_bs. > E.g. test11 in test-zbd-support triggers that case because it uses the > bsrange option. OK. We are in sync. And this means that range can be negative after zbc_find_zone() runs and finds a new zone, since the search returns once a zone with min_bs bytes is found. That may be less than io_u->buflen. > > > If yes, then range < 0 is not possible, but range == 0 is definitely still > > possible and so the +1 is needed. > > Have you considered to leave out the "+ 1" and to handle range == 0 > separately? Sure I can do that. But again what about zbd_find_zone() ? Shouldn't it be changed to look for a zone with at least io_u->buflen bytes rather than only min_bs ? Otherwise, range can be negative and the test "if (io_u->offset + io_u->buflen > zb->wp)" at the end of the DDIR_READ case could terminate reads with eof prematuraly. > > Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So > > either we need to modify zbd_find_zone() to return a zone with at least > > io_u->buflen bytes in it, or we need to handle the range < 0 case... > > The range < 0 case is already handled by the following code at the end > of the DDIR_WRITE case: That is fine, but that is for the DDIR_WRITE case. There is no such code in the DDIR_READ case. Reads I/Os buflen are not modified based on the zone wp position... Do you think it is better to add a similar I/O size reduction code for reads or change zbd_find_zone() as suggested above ? I think the latter is better to ensure consistent sizes for reads. > > /* > * Make sure that the buflen is a multiple of the minimal > * block size. Give up if shrinking would make the request too > * small. > */ > new_len = min((unsigned long long)io_u->buflen, > ((zb + 1)->start << 9) - io_u->offset); > new_len = new_len / min_bs * min_bs; > if (new_len == io_u->buflen) > goto accept; > if (new_len >= min_bs) { > io_u->buflen = new_len; > dprint(FD_IO, "Changed length from %u into %llu\n", > orig_len, io_u->buflen); > goto accept; > } > log_err("Zone remainder %lld smaller than minimum block size %d\n", > (((zb + 1)->start << 9) - io_u->offset), > min_bs); > goto eof; > > Bart. -- Damien Le Moal Western Digital ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] zbd: Improve read randomness 2018-08-29 3:56 ` Damien Le Moal @ 2018-08-29 15:42 ` Bart Van Assche 0 siblings, 0 replies; 9+ messages in thread From: Bart Van Assche @ 2018-08-29 15:42 UTC (permalink / raw) To: Damien Le Moal, fio, axboe On 08/28/18 20:56, Damien Le Moal wrote: > On Tue, 2018-08-28 at 20:23 -0700, Bart Van Assche wrote: >> On 08/28/18 17:57, Damien Le Moal wrote: >>> If yes, then range < 0 is not possible, but range == 0 is definitely still >>> possible and so the +1 is needed. >> >> Have you considered to leave out the "+ 1" and to handle range == 0 >> separately? > > Sure I can do that. But again what about zbd_find_zone() ? Shouldn't it be > changed to look for a zone with at least io_u->buflen bytes rather than only > min_bs ? Otherwise, range can be negative and the test "if (io_u->offset + > io_u->buflen > zb->wp)" at the end of the DDIR_READ case could terminate reads > with eof prematuraly. I think that it's better to shrink a read request than to modify zbd_find_zone(). I think that making zbd_find_zone() look for a zone that is large enough would confuse random reads if the random map is enabled because the random generator would repeatedly propose to submit I/O to the remainder of a zone if the random map is almost full. >>> Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So >>> either we need to modify zbd_find_zone() to return a zone with at least >>> io_u->buflen bytes in it, or we need to handle the range < 0 case... >> >> The range < 0 case is already handled by the following code at the end >> of the DDIR_WRITE case: > > That is fine, but that is for the DDIR_WRITE case. There is no such code in > the DDIR_READ case. Reads I/Os buflen are not modified based on the zone wp > position... Do you think it is better to add a similar I/O size reduction code > for reads or change zbd_find_zone() as suggested above ? I think the latter is > better to ensure consistent sizes for reads. Modifying the DDIR_READ case such that read requests for a zone with less data than io_u->buflen are shrunk sounds fine to me. Bart. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] zbd: Remove inexitent function declaration 2018-08-28 5:34 [PATCH] ZBD support improvment and cleanup Damien Le Moal 2018-08-28 5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal @ 2018-08-28 5:34 ` Damien Le Moal 2018-08-28 14:34 ` Bart Van Assche 1 sibling, 1 reply; 9+ messages in thread From: Damien Le Moal @ 2018-08-28 5:34 UTC (permalink / raw) To: fio, Jens Axboe; +Cc: Bart Van Assche zbd_do_trim() and zbd_update_wp() are not implemented. Remove there declaration from zbd.h. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> --- zbd.h | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/zbd.h b/zbd.h index 08751fd5..d750b67e 100644 --- a/zbd.h +++ b/zbd.h @@ -95,8 +95,6 @@ int zbd_init(struct thread_data *td); void zbd_file_reset(struct thread_data *td, struct fio_file *f); bool zbd_unaligned_write(int error_code); enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u); -int zbd_do_trim(struct thread_data *td, const struct io_u *io_u); -void zbd_update_wp(struct thread_data *td, const struct io_u *io_u); char *zbd_write_status(const struct thread_stat *ts); #else static inline void zbd_free_zone_info(struct fio_file *f) @@ -123,16 +121,6 @@ static inline enum io_u_action zbd_adjust_block(struct thread_data *td, return io_u_accept; } -static inline int zbd_do_trim(struct thread_data *td, const struct io_u *io_u) -{ - return 1; -} - -static inline void zbd_update_wp(struct thread_data *td, - const struct io_u *io_u) -{ -} - static inline char *zbd_write_status(const struct thread_stat *ts) { return NULL; -- 2.17.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] zbd: Remove inexitent function declaration 2018-08-28 5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal @ 2018-08-28 14:34 ` Bart Van Assche 0 siblings, 0 replies; 9+ messages in thread From: Bart Van Assche @ 2018-08-28 14:34 UTC (permalink / raw) To: Damien Le Moal, fio, Jens Axboe On 08/27/18 22:34, Damien Le Moal wrote: > zbd_do_trim() and zbd_update_wp() are not implemented. > Remove there declaration from zbd.h. Regarding the patch subject: did you perhaps mean "inexistent"? Otherwise this patch looks fine to me. Bart. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-08-29 15:42 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-08-28 5:34 [PATCH] ZBD support improvment and cleanup Damien Le Moal 2018-08-28 5:34 ` [PATCH] zbd: Improve read randomness Damien Le Moal 2018-08-28 14:38 ` Bart Van Assche 2018-08-29 0:57 ` Damien Le Moal 2018-08-29 3:23 ` Bart Van Assche 2018-08-29 3:56 ` Damien Le Moal 2018-08-29 15:42 ` Bart Van Assche 2018-08-28 5:34 ` [PATCH] zbd: Remove inexitent function declaration Damien Le Moal 2018-08-28 14:34 ` Bart Van Assche
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.