linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/4] blk-throttle bugfix
@ 2022-08-23  3:31 Yu Kuai
  2022-08-23  3:31 ` [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Yu Kuai @ 2022-08-23  3:31 UTC (permalink / raw)
  To: axboe, tj, ming.lei, mkoutny
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Changes in v8:
 - use a new solution in patch 1
 - move cleanups to a separate patchset
 - rename bytes/io_skipped to carryover_bytes/ios in patch 4
Changes in v7:
 - add patch 5 to improve handling of re-entered bio for bps limit
 - as suggested by Tejun, add some comments
 - sdd some Acked tag by Tejun
Changes in v6:
 - rename parameter in patch 3
 - add comments and reviewed tag for patch 4
Changes in v5:
 - add comments in patch 4
 - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
 patch 4
 - and cleanup patches 5-8
Changes in v4:
 - add reviewed-by tag for patch 1
 - add patch 2,3
 - use a different way to fix io hung in patch 4
Changes in v3:
 - fix a check in patch 1
 - fix link err in patch 2 on 32-bit platform
 - handle overflow in patch 2
Changes in v2:
 - use a new solution suggested by Ming
 - change the title of patch 1
 - add patch 2

Patch 1 fix that blk-throttle can't work if multiple bios are throttle.
Patch 2 fix overflow while calculating wait time.
Patch 3,4 fix io hung due to configuration updates.

Previous version:
v1: https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
v2: https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
v3: https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/
v4: https://lore.kernel.org/all/20220523082633.2324980-1-yukuai3@huawei.com/
v5: https://lore.kernel.org/all/20220528064330.3471000-1-yukuai3@huawei.com/
v6: https://lore.kernel.org/all/20220701093441.885741-1-yukuai1@huaweicloud.com/
v7: https://lore.kernel.org/all/20220802140415.2960284-1-yukuai1@huaweicloud.com/
Yu Kuai (4):
  blk-throttle: fix that io throttle can only work for single bio
  blk-throttle: prevent overflow while calculating wait time
  blk-throttle: factor out code to calculate ios/bytes_allowed
  blk-throttle: fix io hung due to configuration updates

 block/bio.c               |   2 -
 block/blk-merge.c         |   7 ++
 block/blk-throttle.c      | 138 +++++++++++++++++++++++++++-----------
 block/blk-throttle.h      |  15 ++++-
 include/linux/bio.h       |   6 +-
 include/linux/blk_types.h |   6 +-
 6 files changed, 125 insertions(+), 49 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio
  2022-08-23  3:31 [PATCH v8 0/4] blk-throttle bugfix Yu Kuai
@ 2022-08-23  3:31 ` Yu Kuai
  2022-08-23 18:07   ` Tejun Heo
  2022-08-23  3:31 ` [PATCH v8 2/4] blk-throttle: prevent overflow while calculating wait time Yu Kuai
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2022-08-23  3:31 UTC (permalink / raw)
  To: axboe, tj, ming.lei, mkoutny
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Test scripts:
cd /sys/fs/cgroup/blkio/
echo "8:0 1024" > blkio.throttle.write_bps_device
echo $$ > cgroup.procs
dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &

Test result:
10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s
10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s

The problem is that the second bio is finished after 10s instead of 20s.

Root cause:
1) second bio will be flagged:

__blk_throtl_bio
 while (true) {
  ...
  if (sq->nr_queued[rw]) -> some bio is throttled already
   break
 };
 bio_set_flag(bio, BIO_THROTTLED); -> flag the bio

2) flagged bio will be dispatched without waiting:

throtl_dispatch_tg
 tg_may_dispatch
  tg_with_in_bps_limit
   if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED))
    *wait = 0; -> wait time is zero
    return true;

commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
support to count split bios for iops limit, thus it adds flagged bio
checking in tg_with_in_bps_limit() so that split bios will only count
once for bps limit, however, it introduce a new problem that io throttle
won't work if multiple bios are throttled.

In order to fix the problem, two flags BIO_IOPS/BPS_THROTTLED is used,
thus iops and bps can be treated separately for split bio:

1) for bps limit, original bio already flagged, nothing need to be done.
2) for iops limit, original bio already flagged, and the flag should be
   cleared before resubmitting the original bio.

Noted this patch also remove the code to set flag in __bio_clone(), it's
introduced in commit 111be8839817 ("block-throttle: avoid double
charge"), and author thinks split bio can be resubmited and throttled
again, which is wrong because split bio will continue to dispatch from
caller.

Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bio.c               |  2 --
 block/blk-merge.c         |  7 +++++++
 block/blk-throttle.c      | 23 +++++++++++++----------
 block/blk-throttle.h      |  6 +++---
 include/linux/bio.h       |  6 ++++--
 include/linux/blk_types.h |  6 ++++--
 6 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3d3a2678fea2..77e3b764a078 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -760,8 +760,6 @@ EXPORT_SYMBOL(bio_put);
 static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
-	if (bio_flagged(bio_src, BIO_THROTTLED))
-		bio_set_flag(bio, BIO_THROTTLED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_iter = bio_src->bi_iter;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index ff04e9290715..10330bb038ca 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -358,6 +358,13 @@ struct bio *__bio_split_to_limits(struct bio *bio, struct queue_limits *lim,
 		blkcg_bio_issue_init(split);
 		bio_chain(split, bio);
 		trace_block_split(split, bio->bi_iter.bi_sector);
+
+		/*
+		 * original bio will be resubmited and throttled again, clear
+		 * the iops flag so that it can be count again for iops limit.
+		 */
+		if (bio_flagged(bio, BIO_IOPS_THROTTLED))
+			bio_clear_flag(bio, BIO_IOPS_THROTTLED);
 		submit_bio_noacct(bio);
 		return split;
 	}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index e47506a8ef47..f3c9dcab63e2 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -762,7 +762,7 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	u64 tmp;
 
-	if (iops_limit == UINT_MAX) {
+	if (iops_limit == UINT_MAX || bio_flagged(bio, BIO_IOPS_THROTTLED)) {
 		if (wait)
 			*wait = 0;
 		return true;
@@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* no need to throttle if this bio's bytes have been accounted */
-	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
+	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_BPS_THROTTLED)) {
 		if (wait)
 			*wait = 0;
 		return true;
@@ -921,13 +921,11 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	if (!bio_flagged(bio, BIO_THROTTLED)) {
+	if (!bio_flagged(bio, BIO_BPS_THROTTLED)) {
 		tg->bytes_disp[rw] += bio_size;
 		tg->last_bytes_disp[rw] += bio_size;
 	}
 
-	tg->io_disp[rw]++;
-	tg->last_io_disp[rw]++;
 
 	/*
 	 * BIO_THROTTLED is used to prevent the same bio to be throttled
@@ -935,8 +933,10 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 	 * second time when it eventually gets issued.  Set it when a bio
 	 * is being charged to a tg.
 	 */
-	if (!bio_flagged(bio, BIO_THROTTLED))
-		bio_set_flag(bio, BIO_THROTTLED);
+	if (!bio_flagged(bio, BIO_IOPS_THROTTLED)) {
+		tg->io_disp[rw]++;
+		tg->last_io_disp[rw]++;
+	}
 }
 
 /**
@@ -1026,6 +1026,8 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 	sq->nr_queued[rw]--;
 
 	throtl_charge_bio(tg, bio);
+	bio_set_flag(bio, BIO_BPS_THROTTLED);
+	bio_set_flag(bio, BIO_IOPS_THROTTLED);
 
 	/*
 	 * If our parent is another tg, we just need to transfer @bio to
@@ -2159,8 +2161,11 @@ bool __blk_throtl_bio(struct bio *bio)
 		qn = &tg->qnode_on_parent[rw];
 		sq = sq->parent_sq;
 		tg = sq_to_tg(sq);
-		if (!tg)
+		if (!tg) {
+			bio_set_flag(bio, BIO_BPS_THROTTLED);
+			bio_set_flag(bio, BIO_IOPS_THROTTLED);
 			goto out_unlock;
+		}
 	}
 
 	/* out-of-limit, queue to @tg */
@@ -2189,8 +2194,6 @@ bool __blk_throtl_bio(struct bio *bio)
 	}
 
 out_unlock:
-	bio_set_flag(bio, BIO_THROTTLED);
-
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	if (throttled || !td->track_bio_latency)
 		bio->bi_issue.value |= BIO_ISSUE_THROTL_SKIP_LATENCY;
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index c1b602996127..45c6f3c1dfe0 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -174,9 +174,9 @@ static inline bool blk_throtl_bio(struct bio *bio)
 {
 	struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);
 
-	/* no need to throttle bps any more if the bio has been throttled */
-	if (bio_flagged(bio, BIO_THROTTLED) &&
-	    !(tg->flags & THROTL_TG_HAS_IOPS_LIMIT))
+	/* no need to throttle any more if the bio has been throttled */
+	if (bio_flagged(bio, BIO_BPS_THROTTLED) &&
+	    bio_flagged(bio, BIO_IOPS_THROTTLED))
 		return false;
 
 	if (!tg->has_rules[bio_data_dir(bio)])
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ca22b06700a9..8ab014af163d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -508,8 +508,10 @@ static inline void bio_clone_blkg_association(struct bio *dst,
 static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
 {
 	bio_clear_flag(bio, BIO_REMAPPED);
-	if (bio->bi_bdev != bdev)
-		bio_clear_flag(bio, BIO_THROTTLED);
+	if (bio->bi_bdev != bdev) {
+		bio_clear_flag(bio, BIO_BPS_THROTTLED);
+		bio_clear_flag(bio, BIO_IOPS_THROTTLED);
+	}
 	bio->bi_bdev = bdev;
 	bio_associate_blkg(bio);
 }
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 1ef99790f6ed..d37cdafa2a07 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -325,8 +325,10 @@ enum {
 	BIO_QUIET,		/* Make BIO Quiet */
 	BIO_CHAIN,		/* chained bio, ->bi_remaining in effect */
 	BIO_REFFED,		/* bio has elevated ->bi_cnt */
-	BIO_THROTTLED,		/* This bio has already been subjected to
-				 * throttling rules. Don't do it again. */
+	BIO_BPS_THROTTLED,	/* This bio has already been subjected to
+				 * bps throttling rules. Don't do it again. */
+	BIO_IOPS_THROTTLED,	/* This bio has already been subjected to
+				 * iops throttling rules. Don't do it again. */
 	BIO_TRACE_COMPLETION,	/* bio_endio() should trace the final completion
 				 * of this bio. */
 	BIO_CGROUP_ACCT,	/* has been accounted to a cgroup */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 2/4] blk-throttle: prevent overflow while calculating wait time
  2022-08-23  3:31 [PATCH v8 0/4] blk-throttle bugfix Yu Kuai
  2022-08-23  3:31 ` [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
@ 2022-08-23  3:31 ` Yu Kuai
  2022-08-23  3:31 ` [PATCH v8 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed Yu Kuai
  2022-08-23  3:31 ` [PATCH v8 4/4] blk-throttle: fix io hung due to configuration updates Yu Kuai
  3 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2022-08-23  3:31 UTC (permalink / raw)
  To: axboe, tj, ming.lei, mkoutny
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

There is a problem found by code review in tg_with_in_bps_limit() that
'bps_limit * jiffy_elapsed_rnd' might overflow. Fix the problem by
calling mul_u64_u64_div_u64() instead.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/blk-throttle.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index f3c9dcab63e2..773a355d3838 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -806,7 +806,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 				 u64 bps_limit, unsigned long *wait)
 {
 	bool rw = bio_data_dir(bio);
-	u64 bytes_allowed, extra_bytes, tmp;
+	u64 bytes_allowed, extra_bytes;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
@@ -824,10 +824,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-
-	tmp = bps_limit * jiffy_elapsed_rnd;
-	do_div(tmp, HZ);
-	bytes_allowed = tmp;
+	bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
+					    (u64)HZ);
 
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed
  2022-08-23  3:31 [PATCH v8 0/4] blk-throttle bugfix Yu Kuai
  2022-08-23  3:31 ` [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
  2022-08-23  3:31 ` [PATCH v8 2/4] blk-throttle: prevent overflow while calculating wait time Yu Kuai
@ 2022-08-23  3:31 ` Yu Kuai
  2022-08-23  3:31 ` [PATCH v8 4/4] blk-throttle: fix io hung due to configuration updates Yu Kuai
  3 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2022-08-23  3:31 UTC (permalink / raw)
  To: axboe, tj, ming.lei, mkoutny
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

No functional changes, new apis will be used in later patches to
calculate wait time for throttled bios when new configuration is
submitted.

Noted this patch also rename tg_with_in_iops/bps_limit() to
tg_within_iops/bps_limit().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/blk-throttle.c | 59 ++++++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 773a355d3838..757b620f0f2d 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -754,33 +754,20 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 		   tg->slice_start[rw], tg->slice_end[rw], jiffies);
 }
 
-static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
-				  u32 iops_limit, unsigned long *wait)
+static unsigned int calculate_io_allowed(u32 iops_limit,
+					 unsigned long jiffy_elapsed)
 {
-	bool rw = bio_data_dir(bio);
 	unsigned int io_allowed;
-	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	u64 tmp;
 
-	if (iops_limit == UINT_MAX || bio_flagged(bio, BIO_IOPS_THROTTLED)) {
-		if (wait)
-			*wait = 0;
-		return true;
-	}
-
-	jiffy_elapsed = jiffies - tg->slice_start[rw];
-
-	/* Round up to the next throttle slice, wait time must be nonzero */
-	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
-
 	/*
-	 * jiffy_elapsed_rnd should not be a big value as minimum iops can be
+	 * jiffy_elapsed should not be a big value as minimum iops can be
 	 * 1 then at max jiffy elapsed should be equivalent of 1 second as we
 	 * will allow dispatch after 1 second and after that slice should
 	 * have been trimmed.
 	 */
 
-	tmp = (u64)iops_limit * jiffy_elapsed_rnd;
+	tmp = (u64)iops_limit * jiffy_elapsed;
 	do_div(tmp, HZ);
 
 	if (tmp > UINT_MAX)
@@ -788,6 +775,32 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	else
 		io_allowed = tmp;
 
+	return io_allowed;
+}
+
+static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
+{
+	return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
+}
+
+static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
+				 u32 iops_limit, unsigned long *wait)
+{
+	bool rw = bio_data_dir(bio);
+	unsigned int io_allowed;
+	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
+
+	if (iops_limit == UINT_MAX || bio_flagged(bio, BIO_IOPS_THROTTLED)) {
+		if (wait)
+			*wait = 0;
+		return true;
+	}
+
+	jiffy_elapsed = jiffies - tg->slice_start[rw];
+
+	/* Round up to the next throttle slice, wait time must be nonzero */
+	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
+	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
 	if (tg->io_disp[rw] + 1 <= io_allowed) {
 		if (wait)
 			*wait = 0;
@@ -802,8 +815,8 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	return false;
 }
 
-static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
-				 u64 bps_limit, unsigned long *wait)
+static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
+				u64 bps_limit, unsigned long *wait)
 {
 	bool rw = bio_data_dir(bio);
 	u64 bytes_allowed, extra_bytes;
@@ -824,9 +837,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-	bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
-					    (u64)HZ);
-
+	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
@@ -895,8 +906,8 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 				jiffies + tg->td->throtl_slice);
 	}
 
-	if (tg_with_in_bps_limit(tg, bio, bps_limit, &bps_wait) &&
-	    tg_with_in_iops_limit(tg, bio, iops_limit, &iops_wait)) {
+	if (tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) &&
+	    tg_within_iops_limit(tg, bio, iops_limit, &iops_wait)) {
 		if (wait)
 			*wait = 0;
 		return true;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v8 4/4] blk-throttle: fix io hung due to configuration updates
  2022-08-23  3:31 [PATCH v8 0/4] blk-throttle bugfix Yu Kuai
                   ` (2 preceding siblings ...)
  2022-08-23  3:31 ` [PATCH v8 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed Yu Kuai
@ 2022-08-23  3:31 ` Yu Kuai
  2022-08-23  9:41   ` Michal Koutný
  2022-08-23 18:08   ` Tejun Heo
  3 siblings, 2 replies; 11+ messages in thread
From: Yu Kuai @ 2022-08-23  3:31 UTC (permalink / raw)
  To: axboe, tj, ming.lei, mkoutny
  Cc: linux-block, linux-kernel, cgroups, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

If new configuration is submitted while a bio is throttled, then new
waiting time is recalculated regardless that the bio might already wait
for some time:

tg_conf_updated
 throtl_start_new_slice
  tg_update_disptime
  throtl_schedule_next_dispatch

Then io hung can be triggered by always submmiting new configuration
before the throttled bio is dispatched.

Fix the problem by respecting the time that throttled bio already waited.
In order to do that, add new fields to record how many bytes/io are
waited, and use it to calculate wait time for throttled bio under new
configuration.

Some simple test:
1)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 2048" > blkio.throttle.write_bps_device
{
        sleep 2
        echo "8:0 1024" > blkio.throttle.write_bps_device
} &
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

2)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 1024" > blkio.throttle.write_bps_device
{
        sleep 4
        echo "8:0 2048" > blkio.throttle.write_bps_device
} &
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

test results: io finish time
	before this patch	with this patch
1)	10s			6s
2)	8s			6s

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 58 +++++++++++++++++++++++++++++++++++++++-----
 block/blk-throttle.h |  9 +++++++
 2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 757b620f0f2d..679e08c0714c 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -639,6 +639,8 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 {
 	tg->bytes_disp[rw] = 0;
 	tg->io_disp[rw] = 0;
+	tg->carryover_bytes[rw] = 0;
+	tg->carryover_ios[rw] = 0;
 
 	/*
 	 * Previous slice has expired. We must have trimmed it after last
@@ -656,12 +658,17 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 		   tg->slice_end[rw], jiffies);
 }
 
-static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
+static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw,
+					  bool clear_carryover)
 {
 	tg->bytes_disp[rw] = 0;
 	tg->io_disp[rw] = 0;
 	tg->slice_start[rw] = jiffies;
 	tg->slice_end[rw] = jiffies + tg->td->throtl_slice;
+	if (clear_carryover) {
+		tg->carryover_bytes[rw] = 0;
+		tg->carryover_ios[rw] = 0;
+	}
 
 	throtl_log(&tg->service_queue,
 		   "[%c] new slice start=%lu end=%lu jiffies=%lu",
@@ -783,6 +790,41 @@ static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
 	return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
 }
 
+static void __tg_update_carryover(struct throtl_grp *tg, bool rw)
+{
+	unsigned long jiffy_elapsed = jiffies - tg->slice_start[rw];
+	u64 bps_limit = tg_bps_limit(tg, rw);
+	u32 iops_limit = tg_iops_limit(tg, rw);
+
+	/*
+	 * If config is updated while bios are still throttled, calculate and
+	 * accumulate how many bytes/ios are waited across changes. And
+	 * carryover_bytes/ios will be used to calculate new wait time under new
+	 * configuration.
+	 */
+	if (bps_limit != U64_MAX)
+		tg->carryover_bytes[rw] +=
+			calculate_bytes_allowed(bps_limit, jiffy_elapsed) -
+			tg->bytes_disp[rw];
+	if (iops_limit != UINT_MAX)
+		tg->carryover_ios[rw] +=
+			calculate_io_allowed(iops_limit, jiffy_elapsed) -
+			tg->io_disp[rw];
+}
+
+static void tg_update_carryover(struct throtl_grp *tg)
+{
+	if (tg->service_queue.nr_queued[READ])
+		__tg_update_carryover(tg, READ);
+	if (tg->service_queue.nr_queued[WRITE])
+		__tg_update_carryover(tg, WRITE);
+
+	/* see comments in struct throtl_grp for meaning of these fields. */
+	throtl_log(&tg->service_queue, "%s: %llu %llu %u %u\n", __func__,
+		   tg->carryover_bytes[READ], tg->carryover_bytes[WRITE],
+		   tg->carryover_ios[READ], tg->carryover_ios[WRITE]);
+}
+
 static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
 				 u32 iops_limit, unsigned long *wait)
 {
@@ -800,7 +842,8 @@ static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* Round up to the next throttle slice, wait time must be nonzero */
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
-	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
+	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd) +
+		     tg->carryover_ios[rw];
 	if (tg->io_disp[rw] + 1 <= io_allowed) {
 		if (wait)
 			*wait = 0;
@@ -837,7 +880,8 @@ static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
+	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd) +
+			tg->carryover_bytes[rw];
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
@@ -898,7 +942,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	 * slice and it should be extended instead.
 	 */
 	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
-		throtl_start_new_slice(tg, rw);
+		throtl_start_new_slice(tg, rw, true);
 	else {
 		if (time_before(tg->slice_end[rw],
 		    jiffies + tg->td->throtl_slice))
@@ -1332,8 +1376,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
 	 * that a group's limit are dropped suddenly and we don't want to
 	 * account recently dispatched IO with new low rate.
 	 */
-	throtl_start_new_slice(tg, READ);
-	throtl_start_new_slice(tg, WRITE);
+	throtl_start_new_slice(tg, READ, false);
+	throtl_start_new_slice(tg, WRITE, false);
 
 	if (tg->flags & THROTL_TG_PENDING) {
 		tg_update_disptime(tg);
@@ -1361,6 +1405,7 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
 		v = U64_MAX;
 
 	tg = blkg_to_tg(ctx.blkg);
+	tg_update_carryover(tg);
 
 	if (is_u64)
 		*(u64 *)((void *)tg + of_cft(of)->private) = v;
@@ -1547,6 +1592,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 		return ret;
 
 	tg = blkg_to_tg(ctx.blkg);
+	tg_update_carryover(tg);
 
 	v[0] = tg->bps_conf[READ][index];
 	v[1] = tg->bps_conf[WRITE][index];
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index 45c6f3c1dfe0..a25e9e356d17 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -121,6 +121,15 @@ struct throtl_grp {
 	uint64_t last_bytes_disp[2];
 	unsigned int last_io_disp[2];
 
+	/*
+	 * The following two fields are updated when new configuration is
+	 * submitted while some bios are still throttled, they record how many
+	 * bytes/ios are waited already in previous configuration, and they will
+	 * be used to calculate wait time under new configuration.
+	 */
+	uint64_t carryover_bytes[2];
+	unsigned int carryover_ios[2];
+
 	unsigned long last_check_time;
 
 	unsigned long latency_target; /* us */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 4/4] blk-throttle: fix io hung due to configuration updates
  2022-08-23  3:31 ` [PATCH v8 4/4] blk-throttle: fix io hung due to configuration updates Yu Kuai
@ 2022-08-23  9:41   ` Michal Koutný
  2022-08-23 18:08   ` Tejun Heo
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Koutný @ 2022-08-23  9:41 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, tj, ming.lei, linux-block, linux-kernel, cgroups, yukuai3,
	yi.zhang

Hello.

On Tue, Aug 23, 2022 at 11:31:30AM +0800, Yu Kuai <yukuai1@huaweicloud.com> wrote:
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  block/blk-throttle.c | 58 +++++++++++++++++++++++++++++++++++++++-----
>  block/blk-throttle.h |  9 +++++++
>  2 files changed, 61 insertions(+), 6 deletions(-)

I see v8 is just naming+comments [1] change, calculations remain, so it
can have

Reviewed-by: Michal Koutný <mkoutny@suse.com>

[1] I assume dropping of the overflow/signedness is intentional after
previous debate.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio
  2022-08-23  3:31 ` [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
@ 2022-08-23 18:07   ` Tejun Heo
  2022-08-24  1:15     ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2022-08-23 18:07 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, ming.lei, mkoutny, linux-block, linux-kernel, cgroups,
	yukuai3, yi.zhang

Hello,

Should have asked you earlier but with the BIO_THROTTLED flag setting from
clone removed, with single BIO_THROTTLED flag, does the fix still require
bytes subtraction? If we can do single flag and we don't need the bytes
subtraction, might as well just stay with single flag?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 4/4] blk-throttle: fix io hung due to configuration updates
  2022-08-23  3:31 ` [PATCH v8 4/4] blk-throttle: fix io hung due to configuration updates Yu Kuai
  2022-08-23  9:41   ` Michal Koutný
@ 2022-08-23 18:08   ` Tejun Heo
  1 sibling, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2022-08-23 18:08 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, ming.lei, mkoutny, linux-block, linux-kernel, cgroups,
	yukuai3, yi.zhang

On Tue, Aug 23, 2022 at 11:31:30AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> If new configuration is submitted while a bio is throttled, then new
> waiting time is recalculated regardless that the bio might already wait
> for some time:
> 
> tg_conf_updated
>  throtl_start_new_slice
>   tg_update_disptime
>   throtl_schedule_next_dispatch
> 
> Then io hung can be triggered by always submmiting new configuration
> before the throttled bio is dispatched.
> 
> Fix the problem by respecting the time that throttled bio already waited.
> In order to do that, add new fields to record how many bytes/io are
> waited, and use it to calculate wait time for throttled bio under new
> configuration.
> 
> Some simple test:
> 1)
> cd /sys/fs/cgroup/blkio/
> echo $$ > cgroup.procs
> echo "8:0 2048" > blkio.throttle.write_bps_device
> {
>         sleep 2
>         echo "8:0 1024" > blkio.throttle.write_bps_device
> } &
> dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct
> 
> 2)
> cd /sys/fs/cgroup/blkio/
> echo $$ > cgroup.procs
> echo "8:0 1024" > blkio.throttle.write_bps_device
> {
>         sleep 4
>         echo "8:0 2048" > blkio.throttle.write_bps_device
> } &
> dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct
> 
> test results: io finish time
> 	before this patch	with this patch
> 1)	10s			6s
> 2)	8s			6s
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

For 2-4,

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio
  2022-08-23 18:07   ` Tejun Heo
@ 2022-08-24  1:15     ` Yu Kuai
  2022-08-25 18:15       ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2022-08-24  1:15 UTC (permalink / raw)
  To: Tejun Heo, Yu Kuai
  Cc: axboe, ming.lei, mkoutny, linux-block, linux-kernel, cgroups,
	yi.zhang, yukuai (C)

Hi, Tejun

在 2022/08/24 2:07, Tejun Heo 写道:
> Hello,
> 
> Should have asked you earlier but with the BIO_THROTTLED flag setting from
> clone removed, with single BIO_THROTTLED flag, does the fix still require
> bytes subtraction? If we can do single flag and we don't need the bytes
> subtraction, might as well just stay with single flag?

Do you mean 'compensate the over-accounting' for bytes subtraction? If
so, yes, it's not required.

This patch actually set two flags when bio is throttled and
dispatched, and only iops flag is cleared after the original bio is
split. If only one flag can be used, the way that I come up with is
that let iops limit become default, which means bio is always counted
for iops limit each time blk_throtl_bio() is called. I'm not quite
sure yet if iops limit can be counted excessively this way in some
special scenario...

Thanks,
Kuai
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio
  2022-08-24  1:15     ` Yu Kuai
@ 2022-08-25 18:15       ` Tejun Heo
  2022-08-26  1:07         ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2022-08-25 18:15 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, ming.lei, mkoutny, linux-block, linux-kernel, cgroups,
	yi.zhang, yukuai (C)

On Wed, Aug 24, 2022 at 09:15:32AM +0800, Yu Kuai wrote:
> This patch actually set two flags when bio is throttled and
> dispatched, and only iops flag is cleared after the original bio is
> split. If only one flag can be used, the way that I come up with is
> that let iops limit become default, which means bio is always counted
> for iops limit each time blk_throtl_bio() is called. I'm not quite
> sure yet if iops limit can be counted excessively this way in some
> special scenario...

I don't think we have a path where we clone and re-submit other than
splitting. What do you think about renaming the flag to BIO_BPS_THROTTLED
and just assuming that IOPS is always applied?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio
  2022-08-25 18:15       ` Tejun Heo
@ 2022-08-26  1:07         ` Yu Kuai
  0 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2022-08-26  1:07 UTC (permalink / raw)
  To: Tejun Heo, Yu Kuai
  Cc: axboe, ming.lei, mkoutny, linux-block, linux-kernel, cgroups,
	yi.zhang, yukuai (C)

Hi, Tejun

在 2022/08/26 2:15, Tejun Heo 写道:
> On Wed, Aug 24, 2022 at 09:15:32AM +0800, Yu Kuai wrote:
>> This patch actually set two flags when bio is throttled and
>> dispatched, and only iops flag is cleared after the original bio is
>> split. If only one flag can be used, the way that I come up with is
>> that let iops limit become default, which means bio is always counted
>> for iops limit each time blk_throtl_bio() is called. I'm not quite
>> sure yet if iops limit can be counted excessively this way in some
>> special scenario...
> 
> I don't think we have a path where we clone and re-submit other than
> splitting. What do you think about renaming the flag to BIO_BPS_THROTTLED
> and just assuming that IOPS is always applied?

Yes, I didn't found such path myself, this way sounds good. I'll send a
new version soon.

Thanks,
Kuai
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-08-26  1:07 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-23  3:31 [PATCH v8 0/4] blk-throttle bugfix Yu Kuai
2022-08-23  3:31 ` [PATCH v8 1/4] blk-throttle: fix that io throttle can only work for single bio Yu Kuai
2022-08-23 18:07   ` Tejun Heo
2022-08-24  1:15     ` Yu Kuai
2022-08-25 18:15       ` Tejun Heo
2022-08-26  1:07         ` Yu Kuai
2022-08-23  3:31 ` [PATCH v8 2/4] blk-throttle: prevent overflow while calculating wait time Yu Kuai
2022-08-23  3:31 ` [PATCH v8 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed Yu Kuai
2022-08-23  3:31 ` [PATCH v8 4/4] blk-throttle: fix io hung due to configuration updates Yu Kuai
2022-08-23  9:41   ` Michal Koutný
2022-08-23 18:08   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).