From: Ming Lei <ming.lei@redhat.com>
To: Yu Kuai <yukuai3@huawei.com>
Cc: tj@kernel.org, axboe@kernel.dk, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yi.zhang@huawei.com
Subject: Re: [PATCH -next] blk-throttle: delay the setting of 'BIO_THROTTLED' to when throttle is done
Date: Wed, 18 May 2022 12:07:12 +0800
Message-ID: <YoRw8J1Y/bzxVsSR@T590>
In-Reply-To: <20220517134909.2910251-1-yukuai3@huawei.com>

On Tue, May 17, 2022 at 09:49:09PM +0800, Yu Kuai wrote:
> commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
> introduced a new problem, for example:
>
> [root@localhost ~]# echo "8:0 1024" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
> [root@localhost ~]# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
> [root@localhost ~]# dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
> [1] 620
> [root@localhost ~]# dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
> [2] 626
> [root@localhost ~]# 1+0 records in
> 1+0 records out
> 10240 bytes (10 kB, 10 KiB) copied, 10.0038 s, 1.0 kB/s
> 1+0 records in
> 1+0 records out
> 10240 bytes (10 kB, 10 KiB) copied, 9.23076 s, 1.1 kB/s
>
> -> the second bio is issued after 10s instead of 20s.
>
> This is because if some bios are already queued, the current bio is
> queued directly and the flag 'BIO_THROTTLED' is set. Later, when the
> former bios are dispatched, this bio is dispatched without waiting at
> all, because tg_with_in_bps_limit() returns a zero wait time when the
> flag is set.
>
> Instead of setting the flag when the bio starts being throttled, delay
> it until throttling is done, to fix the problem.
>
> Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  block/blk-throttle.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 447e1b8722f7..f952f2d942ff 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
>  	unsigned int bio_size = throtl_bio_data_size(bio);
>
>  	/* no need to throttle if this bio's bytes have been accounted */
> -	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
> +	if (bps_limit == U64_MAX) {

This way may double account the bio size for a re-entered split bio.

>  		if (wait)
>  			*wait = 0;
>  		return true;
> @@ -1226,8 +1226,10 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
>
>  	spin_lock_irq(&q->queue_lock);
>  	for (rw = READ; rw <= WRITE; rw++)
> -		while ((bio = throtl_pop_queued(&td_sq->queued[rw], NULL)))
> +		while ((bio = throtl_pop_queued(&td_sq->queued[rw], NULL))) {
> +			bio_set_flag(bio, BIO_THROTTLED);
>  			bio_list_add(&bio_list_on_stack, bio);
> +		}
>  	spin_unlock_irq(&q->queue_lock);
>
>  	if (!bio_list_empty(&bio_list_on_stack)) {
> @@ -2134,7 +2136,8 @@ bool __blk_throtl_bio(struct bio *bio)
>  		}
>  		break;
>  	}
> -
> +	/* this bio will be issued directly */
> +	bio_set_flag(bio, BIO_THROTTLED);
>  	/* within limits, let's charge and dispatch directly */
>  	throtl_charge_bio(tg, bio);

Marking BIO_THROTTLED before throtl_charge_bio() causes the bio's bytes
not to be charged.
Another simple way is to compensate for the previous extra bytes
accounting, something like the following patch:

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 139b2d7a99e2..44773d2ba257 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -810,8 +810,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
-	/* no need to throttle if this bio's bytes have been accounted */
-	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
+	if (bps_limit == U64_MAX) {
 		if (wait)
 			*wait = 0;
 		return true;
@@ -921,10 +920,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	if (!bio_flagged(bio, BIO_THROTTLED)) {
-		tg->bytes_disp[rw] += bio_size;
-		tg->last_bytes_disp[rw] += bio_size;
-	}
+	tg->bytes_disp[rw] += bio_size;
+	tg->last_bytes_disp[rw] += bio_size;
 	tg->io_disp[rw]++;
 	tg->last_io_disp[rw]++;
 
@@ -2125,6 +2122,20 @@ bool __blk_throtl_bio(struct bio *bio)
 		if (sq->nr_queued[rw])
 			break;
 
+		/*
+		 * A re-entered bio has accounted its bytes already, so try
+		 * to compensate for the previous over-accounting. However,
+		 * if a new slice has started, just forget it.
+		 */
+		if (bio_flagged(bio, BIO_THROTTLED)) {
+			unsigned int bio_size = throtl_bio_data_size(bio);
+
+			if (tg->bytes_disp[rw] >= bio_size)
+				tg->bytes_disp[rw] -= bio_size;
+			if (tg->last_bytes_disp[rw] >= bio_size)
+				tg->last_bytes_disp[rw] -= bio_size;
+		}
+
 		/* if above limits, break to queue */
 		if (!tg_may_dispatch(tg, bio, NULL)) {
 			tg->last_low_overflow_time[rw] = jiffies;

Thanks,
Ming