linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 0/4] blk-throttle bugfix
@ 2022-08-29  2:22 Yu Kuai
  2022-08-29  2:22 ` [PATCH v9 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Yu Kuai @ 2022-08-29  2:22 UTC (permalink / raw)
  To: axboe, tj, mkoutny, ming.lei
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Changes in v9:
 - renaming the flag BIO_THROTTLED to BIO_BPS_THROTTLED, and always
 apply iops limit in path 1;
 - add tag for patch 4
Changes in v8:
 - use a new solution in patch 1
 - move cleanups to a separate patchset
 - rename bytes/io_skipped to carryover_bytes/ios in patch 4
Changes in v7:
 - add patch 5 to improve handling of re-entered bio for bps limit
 - as suggested by Tejun, add some comments
 - sdd some Acked tag by Tejun
Changes in v6:
 - rename parameter in patch 3
 - add comments and reviewed tag for patch 4
Changes in v5:
 - add comments in patch 4
 - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
 patch 4
 - and cleanup patches 5-8
Changes in v4:
 - add reviewed-by tag for patch 1
 - add patch 2,3
 - use a different way to fix io hung in patch 4
Changes in v3:
 - fix a check in patch 1
 - fix link err in patch 2 on 32-bit platform
 - handle overflow in patch 2
Changes in v2:
 - use a new solution suggested by Ming
 - change the title of patch 1
 - add patch 2

Patch 1 fix that blk-throttle can't work if multiple bios are throttle.
Patch 2 fix overflow while calculating wait time.
Patch 3,4 fix io hung due to configuration updates.

Previous version:
v1: https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
v2: https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
v3: https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/
v4: https://lore.kernel.org/all/20220523082633.2324980-1-yukuai3@huawei.com/
v5: https://lore.kernel.org/all/20220528064330.3471000-1-yukuai3@huawei.com/
v6: https://lore.kernel.org/all/20220701093441.885741-1-yukuai1@huaweicloud.com/
v7: https://lore.kernel.org/all/20220802140415.2960284-1-yukuai1@huaweicloud.com/
v8: https://lore.kernel.org/all/20220823033130.874230-1-yukuai1@huaweicloud.com/

Yu Kuai (4):
  blk-throttle: fix that io throttle can only work for single bio
  blk-throttle: prevent overflow while calculating wait time
  blk-throttle: factor out code to calculate ios/bytes_allowed
  blk-throttle: fix io hung due to configuration updates

 block/bio.c               |   2 -
 block/blk-throttle.c      | 137 +++++++++++++++++++++++++-------------
 block/blk-throttle.h      |  11 ++-
 include/linux/bio.h       |   2 +-
 include/linux/blk_types.h |   2 +-
 5 files changed, 104 insertions(+), 50 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v9 1/4] blk-throttle: fix that io throttle can only work for single bio
  2022-08-29  2:22 [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
@ 2022-08-29  2:22 ` Yu Kuai
  2022-08-29  4:03   ` Tejun Heo
  2022-08-29  2:22 ` [PATCH v9 2/4] blk-throttle: prevent overflow while calculating wait time Yu Kuai
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Yu Kuai @ 2022-08-29  2:22 UTC (permalink / raw)
  To: axboe, tj, mkoutny, ming.lei
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Test scripts:
cd /sys/fs/cgroup/blkio/
echo "8:0 1024" > blkio.throttle.write_bps_device
echo $$ > cgroup.procs
dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &

Test result:
10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s
10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s

The problem is that the second bio is finished after 10s instead of 20s.

Root cause:
1) second bio will be flagged:

__blk_throtl_bio
 while (true) {
  ...
  if (sq->nr_queued[rw]) -> some bio is throttled already
   break
 };
 bio_set_flag(bio, BIO_THROTTLED); -> flag the bio

2) flagged bio will be dispatched without waiting:

throtl_dispatch_tg
 tg_may_dispatch
  tg_with_in_bps_limit
   if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED))
    *wait = 0; -> wait time is zero
    return true;

commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
support to count split bios for iops limit, thus it adds flagged bio
checking in tg_with_in_bps_limit() so that split bios will only count
once for bps limit, however, it introduce a new problem that io throttle
won't work if multiple bios are throttled.

In order to fix the problem, handle iops/bps limit in different ways:

1) for iops limit, there is no flag to record if the bio is throttled,
   and iops is always applied.
2) for bps limit, original bio will be flagged with BIO_BPS_THROTTLED,
   and io throttle will ignore bio with the flag.

Noted this patch also remove the code to set flag in __bio_clone(), it's
introduced in commit 111be8839817 ("block-throttle: avoid double
charge"), and author thinks split bio can be resubmited and throttled
again, which is wrong because split bio will continue to dispatch from
caller.

Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bio.c               |  2 --
 block/blk-throttle.c      | 20 ++++++--------------
 block/blk-throttle.h      |  2 +-
 include/linux/bio.h       |  2 +-
 include/linux/blk_types.h |  2 +-
 5 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3d3a2678fea2..77e3b764a078 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -760,8 +760,6 @@ EXPORT_SYMBOL(bio_put);
 static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
-	if (bio_flagged(bio_src, BIO_THROTTLED))
-		bio_set_flag(bio, BIO_THROTTLED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_iter = bio_src->bi_iter;
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 47142a1dd102..7a4c815e0f23 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -814,7 +814,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* no need to throttle if this bio's bytes have been accounted */
-	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
+	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_BPS_THROTTLED)) {
 		if (wait)
 			*wait = 0;
 		return true;
@@ -924,22 +924,13 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	if (!bio_flagged(bio, BIO_THROTTLED)) {
+	if (!bio_flagged(bio, BIO_BPS_THROTTLED)) {
 		tg->bytes_disp[rw] += bio_size;
 		tg->last_bytes_disp[rw] += bio_size;
 	}
 
 	tg->io_disp[rw]++;
 	tg->last_io_disp[rw]++;
-
-	/*
-	 * BIO_THROTTLED is used to prevent the same bio to be throttled
-	 * more than once as a throttled bio will go through blk-throtl the
-	 * second time when it eventually gets issued.  Set it when a bio
-	 * is being charged to a tg.
-	 */
-	if (!bio_flagged(bio, BIO_THROTTLED))
-		bio_set_flag(bio, BIO_THROTTLED);
 }
 
 /**
@@ -1029,6 +1020,7 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 	sq->nr_queued[rw]--;
 
 	throtl_charge_bio(tg, bio);
+	bio_set_flag(bio, BIO_BPS_THROTTLED);
 
 	/*
 	 * If our parent is another tg, we just need to transfer @bio to
@@ -2162,8 +2154,10 @@ bool __blk_throtl_bio(struct bio *bio)
 		qn = &tg->qnode_on_parent[rw];
 		sq = sq->parent_sq;
 		tg = sq_to_tg(sq);
-		if (!tg)
+		if (!tg) {
+			bio_set_flag(bio, BIO_BPS_THROTTLED);
 			goto out_unlock;
+		}
 	}
 
 	/* out-of-limit, queue to @tg */
@@ -2192,8 +2186,6 @@ bool __blk_throtl_bio(struct bio *bio)
 	}
 
 out_unlock:
-	bio_set_flag(bio, BIO_THROTTLED);
-
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	if (throttled || !td->track_bio_latency)
 		bio->bi_issue.value |= BIO_ISSUE_THROTL_SKIP_LATENCY;
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index c1b602996127..ee7299e6dea9 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -175,7 +175,7 @@ static inline bool blk_throtl_bio(struct bio *bio)
 	struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);
 
 	/* no need to throttle bps any more if the bio has been throttled */
-	if (bio_flagged(bio, BIO_THROTTLED) &&
+	if (bio_flagged(bio, BIO_BPS_THROTTLED) &&
 	    !(tg->flags & THROTL_TG_HAS_IOPS_LIMIT))
 		return false;
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ca22b06700a9..2c5806997bbf 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -509,7 +509,7 @@ static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
 {
 	bio_clear_flag(bio, BIO_REMAPPED);
 	if (bio->bi_bdev != bdev)
-		bio_clear_flag(bio, BIO_THROTTLED);
+		bio_clear_flag(bio, BIO_BPS_THROTTLED);
 	bio->bi_bdev = bdev;
 	bio_associate_blkg(bio);
 }
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 1ef99790f6ed..41afb4cfb9b0 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -325,7 +325,7 @@ enum {
 	BIO_QUIET,		/* Make BIO Quiet */
 	BIO_CHAIN,		/* chained bio, ->bi_remaining in effect */
 	BIO_REFFED,		/* bio has elevated ->bi_cnt */
-	BIO_THROTTLED,		/* This bio has already been subjected to
+	BIO_BPS_THROTTLED,	/* This bio has already been subjected to
 				 * throttling rules. Don't do it again. */
 	BIO_TRACE_COMPLETION,	/* bio_endio() should trace the final completion
 				 * of this bio. */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v9 2/4] blk-throttle: prevent overflow while calculating wait time
  2022-08-29  2:22 [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
  2022-08-29  2:22 ` [PATCH v9 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
@ 2022-08-29  2:22 ` Yu Kuai
  2022-08-29  2:22 ` [PATCH v9 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed Yu Kuai
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Yu Kuai @ 2022-08-29  2:22 UTC (permalink / raw)
  To: axboe, tj, mkoutny, ming.lei
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

There is a problem found by code review in tg_with_in_bps_limit() that
'bps_limit * jiffy_elapsed_rnd' might overflow. Fix the problem by
calling mul_u64_u64_div_u64() instead.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/blk-throttle.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7a4c815e0f23..d193a738a966 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -809,7 +809,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 				 u64 bps_limit, unsigned long *wait)
 {
 	bool rw = bio_data_dir(bio);
-	u64 bytes_allowed, extra_bytes, tmp;
+	u64 bytes_allowed, extra_bytes;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
@@ -827,10 +827,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-
-	tmp = bps_limit * jiffy_elapsed_rnd;
-	do_div(tmp, HZ);
-	bytes_allowed = tmp;
+	bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
+					    (u64)HZ);
 
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v9 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed
  2022-08-29  2:22 [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
  2022-08-29  2:22 ` [PATCH v9 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
  2022-08-29  2:22 ` [PATCH v9 2/4] blk-throttle: prevent overflow while calculating wait time Yu Kuai
@ 2022-08-29  2:22 ` Yu Kuai
  2022-08-29  2:22 ` [PATCH v9 4/4] blk-throttle: fix io hung due to configuration updates Yu Kuai
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Yu Kuai @ 2022-08-29  2:22 UTC (permalink / raw)
  To: axboe, tj, mkoutny, ming.lei
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

No functional changes, new apis will be used in later patches to
calculate wait time for throttled bios when new configuration is
submitted.

Noted this patch also rename tg_with_in_iops/bps_limit() to
tg_within_iops/bps_limit().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/blk-throttle.c | 59 ++++++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index d193a738a966..04c72d3283a5 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -757,33 +757,20 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 		   tg->slice_start[rw], tg->slice_end[rw], jiffies);
 }
 
-static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
-				  u32 iops_limit, unsigned long *wait)
+static unsigned int calculate_io_allowed(u32 iops_limit,
+					 unsigned long jiffy_elapsed)
 {
-	bool rw = bio_data_dir(bio);
 	unsigned int io_allowed;
-	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	u64 tmp;
 
-	if (iops_limit == UINT_MAX) {
-		if (wait)
-			*wait = 0;
-		return true;
-	}
-
-	jiffy_elapsed = jiffies - tg->slice_start[rw];
-
-	/* Round up to the next throttle slice, wait time must be nonzero */
-	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
-
 	/*
-	 * jiffy_elapsed_rnd should not be a big value as minimum iops can be
+	 * jiffy_elapsed should not be a big value as minimum iops can be
 	 * 1 then at max jiffy elapsed should be equivalent of 1 second as we
 	 * will allow dispatch after 1 second and after that slice should
 	 * have been trimmed.
 	 */
 
-	tmp = (u64)iops_limit * jiffy_elapsed_rnd;
+	tmp = (u64)iops_limit * jiffy_elapsed;
 	do_div(tmp, HZ);
 
 	if (tmp > UINT_MAX)
@@ -791,6 +778,32 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	else
 		io_allowed = tmp;
 
+	return io_allowed;
+}
+
+static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
+{
+	return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
+}
+
+static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
+				 u32 iops_limit, unsigned long *wait)
+{
+	bool rw = bio_data_dir(bio);
+	unsigned int io_allowed;
+	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
+
+	if (iops_limit == UINT_MAX) {
+		if (wait)
+			*wait = 0;
+		return true;
+	}
+
+	jiffy_elapsed = jiffies - tg->slice_start[rw];
+
+	/* Round up to the next throttle slice, wait time must be nonzero */
+	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
+	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
 	if (tg->io_disp[rw] + 1 <= io_allowed) {
 		if (wait)
 			*wait = 0;
@@ -805,8 +818,8 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	return false;
 }
 
-static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
-				 u64 bps_limit, unsigned long *wait)
+static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
+				u64 bps_limit, unsigned long *wait)
 {
 	bool rw = bio_data_dir(bio);
 	u64 bytes_allowed, extra_bytes;
@@ -827,9 +840,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-	bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
-					    (u64)HZ);
-
+	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
@@ -898,8 +909,8 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 				jiffies + tg->td->throtl_slice);
 	}
 
-	if (tg_with_in_bps_limit(tg, bio, bps_limit, &bps_wait) &&
-	    tg_with_in_iops_limit(tg, bio, iops_limit, &iops_wait)) {
+	if (tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) &&
+	    tg_within_iops_limit(tg, bio, iops_limit, &iops_wait)) {
 		if (wait)
 			*wait = 0;
 		return true;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v9 4/4] blk-throttle: fix io hung due to configuration updates
  2022-08-29  2:22 [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
                   ` (2 preceding siblings ...)
  2022-08-29  2:22 ` [PATCH v9 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed Yu Kuai
@ 2022-08-29  2:22 ` Yu Kuai
  2022-08-31 11:31 ` [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
  2022-09-12  6:20 ` Jens Axboe
  5 siblings, 0 replies; 9+ messages in thread
From: Yu Kuai @ 2022-08-29  2:22 UTC (permalink / raw)
  To: axboe, tj, mkoutny, ming.lei
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

If new configuration is submitted while a bio is throttled, then new
waiting time is recalculated regardless that the bio might already wait
for some time:

tg_conf_updated
 throtl_start_new_slice
  tg_update_disptime
  throtl_schedule_next_dispatch

Then io hung can be triggered by always submmiting new configuration
before the throttled bio is dispatched.

Fix the problem by respecting the time that throttled bio already waited.
In order to do that, add new fields to record how many bytes/io are
waited, and use it to calculate wait time for throttled bio under new
configuration.

Some simple test:
1)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 2048" > blkio.throttle.write_bps_device
{
        sleep 2
        echo "8:0 1024" > blkio.throttle.write_bps_device
} &
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

2)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 1024" > blkio.throttle.write_bps_device
{
        sleep 4
        echo "8:0 2048" > blkio.throttle.write_bps_device
} &
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

test results: io finish time
	before this patch	with this patch
1)	10s			6s
2)	8s			6s

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/blk-throttle.c | 58 +++++++++++++++++++++++++++++++++++++++-----
 block/blk-throttle.h |  9 +++++++
 2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 04c72d3283a5..7179b5cd42dc 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -642,6 +642,8 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 {
 	tg->bytes_disp[rw] = 0;
 	tg->io_disp[rw] = 0;
+	tg->carryover_bytes[rw] = 0;
+	tg->carryover_ios[rw] = 0;
 
 	/*
 	 * Previous slice has expired. We must have trimmed it after last
@@ -659,12 +661,17 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 		   tg->slice_end[rw], jiffies);
 }
 
-static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
+static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw,
+					  bool clear_carryover)
 {
 	tg->bytes_disp[rw] = 0;
 	tg->io_disp[rw] = 0;
 	tg->slice_start[rw] = jiffies;
 	tg->slice_end[rw] = jiffies + tg->td->throtl_slice;
+	if (clear_carryover) {
+		tg->carryover_bytes[rw] = 0;
+		tg->carryover_ios[rw] = 0;
+	}
 
 	throtl_log(&tg->service_queue,
 		   "[%c] new slice start=%lu end=%lu jiffies=%lu",
@@ -786,6 +793,41 @@ static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
 	return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
 }
 
+static void __tg_update_carryover(struct throtl_grp *tg, bool rw)
+{
+	unsigned long jiffy_elapsed = jiffies - tg->slice_start[rw];
+	u64 bps_limit = tg_bps_limit(tg, rw);
+	u32 iops_limit = tg_iops_limit(tg, rw);
+
+	/*
+	 * If config is updated while bios are still throttled, calculate and
+	 * accumulate how many bytes/ios are waited across changes. And
+	 * carryover_bytes/ios will be used to calculate new wait time under new
+	 * configuration.
+	 */
+	if (bps_limit != U64_MAX)
+		tg->carryover_bytes[rw] +=
+			calculate_bytes_allowed(bps_limit, jiffy_elapsed) -
+			tg->bytes_disp[rw];
+	if (iops_limit != UINT_MAX)
+		tg->carryover_ios[rw] +=
+			calculate_io_allowed(iops_limit, jiffy_elapsed) -
+			tg->io_disp[rw];
+}
+
+static void tg_update_carryover(struct throtl_grp *tg)
+{
+	if (tg->service_queue.nr_queued[READ])
+		__tg_update_carryover(tg, READ);
+	if (tg->service_queue.nr_queued[WRITE])
+		__tg_update_carryover(tg, WRITE);
+
+	/* see comments in struct throtl_grp for meaning of these fields. */
+	throtl_log(&tg->service_queue, "%s: %llu %llu %u %u\n", __func__,
+		   tg->carryover_bytes[READ], tg->carryover_bytes[WRITE],
+		   tg->carryover_ios[READ], tg->carryover_ios[WRITE]);
+}
+
 static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
 				 u32 iops_limit, unsigned long *wait)
 {
@@ -803,7 +845,8 @@ static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* Round up to the next throttle slice, wait time must be nonzero */
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
-	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
+	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd) +
+		     tg->carryover_ios[rw];
 	if (tg->io_disp[rw] + 1 <= io_allowed) {
 		if (wait)
 			*wait = 0;
@@ -840,7 +883,8 @@ static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
+	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd) +
+			tg->carryover_bytes[rw];
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
@@ -901,7 +945,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	 * slice and it should be extended instead.
 	 */
 	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
-		throtl_start_new_slice(tg, rw);
+		throtl_start_new_slice(tg, rw, true);
 	else {
 		if (time_before(tg->slice_end[rw],
 		    jiffies + tg->td->throtl_slice))
@@ -1325,8 +1369,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
 	 * that a group's limit are dropped suddenly and we don't want to
 	 * account recently dispatched IO with new low rate.
 	 */
-	throtl_start_new_slice(tg, READ);
-	throtl_start_new_slice(tg, WRITE);
+	throtl_start_new_slice(tg, READ, false);
+	throtl_start_new_slice(tg, WRITE, false);
 
 	if (tg->flags & THROTL_TG_PENDING) {
 		tg_update_disptime(tg);
@@ -1354,6 +1398,7 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
 		v = U64_MAX;
 
 	tg = blkg_to_tg(ctx.blkg);
+	tg_update_carryover(tg);
 
 	if (is_u64)
 		*(u64 *)((void *)tg + of_cft(of)->private) = v;
@@ -1540,6 +1585,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 		return ret;
 
 	tg = blkg_to_tg(ctx.blkg);
+	tg_update_carryover(tg);
 
 	v[0] = tg->bps_conf[READ][index];
 	v[1] = tg->bps_conf[WRITE][index];
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index ee7299e6dea9..66b4292b1b92 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -121,6 +121,15 @@ struct throtl_grp {
 	uint64_t last_bytes_disp[2];
 	unsigned int last_io_disp[2];
 
+	/*
+	 * The following two fields are updated when new configuration is
+	 * submitted while some bios are still throttled, they record how many
+	 * bytes/ios are waited already in previous configuration, and they will
+	 * be used to calculate wait time under new configuration.
+	 */
+	uint64_t carryover_bytes[2];
+	unsigned int carryover_ios[2];
+
 	unsigned long last_check_time;
 
 	unsigned long latency_target; /* us */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v9 1/4] blk-throttle: fix that io throttle can only work for single bio
  2022-08-29  2:22 ` [PATCH v9 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
@ 2022-08-29  4:03   ` Tejun Heo
  0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2022-08-29  4:03 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, mkoutny, ming.lei, linux-block, linux-kernel, cgroups,
	yukuai3, yi.zhang

On Mon, Aug 29, 2022 at 10:22:37AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Test scripts:
> cd /sys/fs/cgroup/blkio/
> echo "8:0 1024" > blkio.throttle.write_bps_device
> echo $$ > cgroup.procs
> dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
> dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
> 
> Test result:
> 10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s
> 10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s
> 
> The problem is that the second bio is finished after 10s instead of 20s.
> 
> Root cause:
> 1) second bio will be flagged:
> 
> __blk_throtl_bio
>  while (true) {
>   ...
>   if (sq->nr_queued[rw]) -> some bio is throttled already
>    break
>  };
>  bio_set_flag(bio, BIO_THROTTLED); -> flag the bio
> 
> 2) flagged bio will be dispatched without waiting:
> 
> throtl_dispatch_tg
>  tg_may_dispatch
>   tg_with_in_bps_limit
>    if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED))
>     *wait = 0; -> wait time is zero
>     return true;
> 
> commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
> support to count split bios for iops limit, thus it adds flagged bio
> checking in tg_with_in_bps_limit() so that split bios will only count
> once for bps limit, however, it introduce a new problem that io throttle
> won't work if multiple bios are throttled.
> 
> In order to fix the problem, handle iops/bps limit in different ways:
> 
> 1) for iops limit, there is no flag to record if the bio is throttled,
>    and iops is always applied.
> 2) for bps limit, original bio will be flagged with BIO_BPS_THROTTLED,
>    and io throttle will ignore bio with the flag.
> 
> Noted this patch also remove the code to set flag in __bio_clone(), it's
> introduced in commit 111be8839817 ("block-throttle: avoid double
> charge"), and author thinks split bio can be resubmited and throttled
> again, which is wrong because split bio will continue to dispatch from
> caller.
> 
> Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v9 0/4] blk-throttle bugfix
  2022-08-29  2:22 [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
                   ` (3 preceding siblings ...)
  2022-08-29  2:22 ` [PATCH v9 4/4] blk-throttle: fix io hung due to configuration updates Yu Kuai
@ 2022-08-31 11:31 ` Yu Kuai
  2022-09-12  6:13   ` Yu Kuai
  2022-09-12  6:20 ` Jens Axboe
  5 siblings, 1 reply; 9+ messages in thread
From: Yu Kuai @ 2022-08-31 11:31 UTC (permalink / raw)
  To: Yu Kuai, axboe, tj, mkoutny, ming.lei
  Cc: linux-block, linux-kernel, cgroups, yi.zhang, yukuai (C)

Hi, Jens!

在 2022/08/29 10:22, Yu Kuai 写道:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Changes in v9:
>   - renaming the flag BIO_THROTTLED to BIO_BPS_THROTTLED, and always
>   apply iops limit in path 1;
>   - add tag for patch 4
> Changes in v8:
>   - use a new solution in patch 1
>   - move cleanups to a separate patchset
>   - rename bytes/io_skipped to carryover_bytes/ios in patch 4
> Changes in v7:
>   - add patch 5 to improve handling of re-entered bio for bps limit
>   - as suggested by Tejun, add some comments
>   - sdd some Acked tag by Tejun
> Changes in v6:
>   - rename parameter in patch 3
>   - add comments and reviewed tag for patch 4
> Changes in v5:
>   - add comments in patch 4
>   - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
>   patch 4
>   - and cleanup patches 5-8
> Changes in v4:
>   - add reviewed-by tag for patch 1
>   - add patch 2,3
>   - use a different way to fix io hung in patch 4
> Changes in v3:
>   - fix a check in patch 1
>   - fix link err in patch 2 on 32-bit platform
>   - handle overflow in patch 2
> Changes in v2:
>   - use a new solution suggested by Ming
>   - change the title of patch 1
>   - add patch 2
> 
> Patch 1 fix that blk-throttle can't work if multiple bios are throttle.
> Patch 2 fix overflow while calculating wait time.
> Patch 3,4 fix io hung due to configuration updates.
> 
> Previous version:
> v1: https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
> v2: https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
> v3: https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/
> v4: https://lore.kernel.org/all/20220523082633.2324980-1-yukuai3@huawei.com/
> v5: https://lore.kernel.org/all/20220528064330.3471000-1-yukuai3@huawei.com/
> v6: https://lore.kernel.org/all/20220701093441.885741-1-yukuai1@huaweicloud.com/
> v7: https://lore.kernel.org/all/20220802140415.2960284-1-yukuai1@huaweicloud.com/
> v8: https://lore.kernel.org/all/20220823033130.874230-1-yukuai1@huaweicloud.com/
> 
> Yu Kuai (4):
>    blk-throttle: fix that io throttle can only work for single bio
>    blk-throttle: prevent overflow while calculating wait time
>    blk-throttle: factor out code to calculate ios/bytes_allowed
>    blk-throttle: fix io hung due to configuration updates

Can you apply this patchset now?

Thanks,
Kuai
> 
>   block/bio.c               |   2 -
>   block/blk-throttle.c      | 137 +++++++++++++++++++++++++-------------
>   block/blk-throttle.h      |  11 ++-
>   include/linux/bio.h       |   2 +-
>   include/linux/blk_types.h |   2 +-
>   5 files changed, 104 insertions(+), 50 deletions(-)
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v9 0/4] blk-throttle bugfix
  2022-08-31 11:31 ` [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
@ 2022-09-12  6:13   ` Yu Kuai
  0 siblings, 0 replies; 9+ messages in thread
From: Yu Kuai @ 2022-09-12  6:13 UTC (permalink / raw)
  To: Yu Kuai, axboe, tj, mkoutny, ming.lei
  Cc: linux-block, linux-kernel, cgroups, yi.zhang, yukuai (C)



在 2022/08/31 19:31, Yu Kuai 写道:
> Hi, Jens!
> 
> 在 2022/08/29 10:22, Yu Kuai 写道:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Changes in v9:
>>   - renaming the flag BIO_THROTTLED to BIO_BPS_THROTTLED, and always
>>   apply iops limit in path 1;
>>   - add tag for patch 4
>> Changes in v8:
>>   - use a new solution in patch 1
>>   - move cleanups to a separate patchset
>>   - rename bytes/io_skipped to carryover_bytes/ios in patch 4
>> Changes in v7:
>>   - add patch 5 to improve handling of re-entered bio for bps limit
>>   - as suggested by Tejun, add some comments
>>   - sdd some Acked tag by Tejun
>> Changes in v6:
>>   - rename parameter in patch 3
>>   - add comments and reviewed tag for patch 4
>> Changes in v5:
>>   - add comments in patch 4
>>   - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
>>   patch 4
>>   - and cleanup patches 5-8
>> Changes in v4:
>>   - add reviewed-by tag for patch 1
>>   - add patch 2,3
>>   - use a different way to fix io hung in patch 4
>> Changes in v3:
>>   - fix a check in patch 1
>>   - fix link err in patch 2 on 32-bit platform
>>   - handle overflow in patch 2
>> Changes in v2:
>>   - use a new solution suggested by Ming
>>   - change the title of patch 1
>>   - add patch 2
>>
>> Patch 1 fix that blk-throttle can't work if multiple bios are throttle.
>> Patch 2 fix overflow while calculating wait time.
>> Patch 3,4 fix io hung due to configuration updates.
>>
>> Previous version:
>> v1: 
>> https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
>> v2: 
>> https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
>> v3: 
>> https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/
>> v4: 
>> https://lore.kernel.org/all/20220523082633.2324980-1-yukuai3@huawei.com/
>> v5: 
>> https://lore.kernel.org/all/20220528064330.3471000-1-yukuai3@huawei.com/
>> v6: 
>> https://lore.kernel.org/all/20220701093441.885741-1-yukuai1@huaweicloud.com/ 
>>
>> v7: 
>> https://lore.kernel.org/all/20220802140415.2960284-1-yukuai1@huaweicloud.com/ 
>>
>> v8: 
>> https://lore.kernel.org/all/20220823033130.874230-1-yukuai1@huaweicloud.com/ 
>>
>>
>> Yu Kuai (4):
>>    blk-throttle: fix that io throttle can only work for single bio
>>    blk-throttle: prevent overflow while calculating wait time
>>    blk-throttle: factor out code to calculate ios/bytes_allowed
>>    blk-throttle: fix io hung due to configuration updates
> 
> Can you apply this patchset now?

friendly ping...
> 
> Thanks,
> Kuai
>>
>>   block/bio.c               |   2 -
>>   block/blk-throttle.c      | 137 +++++++++++++++++++++++++-------------
>>   block/blk-throttle.h      |  11 ++-
>>   include/linux/bio.h       |   2 +-
>>   include/linux/blk_types.h |   2 +-
>>   5 files changed, 104 insertions(+), 50 deletions(-)
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v9 0/4] blk-throttle bugfix
  2022-08-29  2:22 [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
                   ` (4 preceding siblings ...)
  2022-08-31 11:31 ` [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
@ 2022-09-12  6:20 ` Jens Axboe
  5 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2022-09-12  6:20 UTC (permalink / raw)
  To: mkoutny, tj, ming.lei, Yu Kuai
  Cc: cgroups, linux-block, yukuai3, linux-kernel, yi.zhang

On Mon, 29 Aug 2022 10:22:36 +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Changes in v9:
>  - renaming the flag BIO_THROTTLED to BIO_BPS_THROTTLED, and always
>  apply iops limit in path 1;
>  - add tag for patch 4
> Changes in v8:
>  - use a new solution in patch 1
>  - move cleanups to a separate patchset
>  - rename bytes/io_skipped to carryover_bytes/ios in patch 4
> Changes in v7:
>  - add patch 5 to improve handling of re-entered bio for bps limit
>  - as suggested by Tejun, add some comments
>  - sdd some Acked tag by Tejun
> Changes in v6:
>  - rename parameter in patch 3
>  - add comments and reviewed tag for patch 4
> Changes in v5:
>  - add comments in patch 4
>  - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
>  patch 4
>  - and cleanup patches 5-8
> Changes in v4:
>  - add reviewed-by tag for patch 1
>  - add patch 2,3
>  - use a different way to fix io hung in patch 4
> Changes in v3:
>  - fix a check in patch 1
>  - fix link err in patch 2 on 32-bit platform
>  - handle overflow in patch 2
> Changes in v2:
>  - use a new solution suggested by Ming
>  - change the title of patch 1
>  - add patch 2
> 
> [...]

Applied, thanks!

[1/4] blk-throttle: fix that io throttle can only work for single bio
      commit: 320fb0f91e55ba248d4bad106b408e59099cfa89
[2/4] blk-throttle: prevent overflow while calculating wait time
      commit: 8d6bbaada2e0a65f9012ac4c2506460160e7237a
[3/4] blk-throttle: factor out code to calculate ios/bytes_allowed
      commit: 681cd46fff8cd81e387747c7850f2e730d3e0b74
[4/4] blk-throttle: fix io hung due to configuration updates
      commit: a880ae93e5b5bb5d8d5500077a391e3f5ec7715c

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-09-12  6:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-29  2:22 [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
2022-08-29  2:22 ` [PATCH v9 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
2022-08-29  4:03   ` Tejun Heo
2022-08-29  2:22 ` [PATCH v9 2/4] blk-throttle: prevent overflow while calculating wait time Yu Kuai
2022-08-29  2:22 ` [PATCH v9 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed Yu Kuai
2022-08-29  2:22 ` [PATCH v9 4/4] blk-throttle: fix io hung due to configuration updates Yu Kuai
2022-08-31 11:31 ` [PATCH v9 0/4] blk-throttle bugfix Yu Kuai
2022-09-12  6:13   ` Yu Kuai
2022-09-12  6:20 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).