[PATCH v3 0/9] A few bugfix and cleanup patches for blk-throttle


* [PATCH v3 0/9] A few bugfix and cleanup patches for blk-throttle
@ 2022-12-05 11:57 Kemeng Shi
  2022-12-05 11:57 ` [PATCH v3 1/9] blk-throttle: correct stale comment in throtl_pd_init Kemeng Shi
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

Hi, this series contains a few patches that fix problems on the default
hierarchy, correct comments, and so on. More details can be found in the
respective changelogs. Thanks.

---
V2->V3:
 -Thanks for the review and advice from Tejun. Now all patches are acked
  by Tejun.
 -remove quotes around parent in corrected comment in patch "blk-throttle:
  correct stale comment in throtl_pd_init"
 -improve log message and rename throtl_tg_reach_low_limit to
  throtl_low_limit_reached.
 -drop patch "blk-throttle: avoid dead code in
  throtl_hierarchy_can_upgrade"
---
V1->V2:
 -Thanks for the review and advice from Tejun. The corrected comment of
  "blk-throttle: correct stale comment in throtl_pd_init" and the
  solution of "blk-throttle: Fix that bps of child could exceed bps
  limited in parent" are from reply of Tejun.
 -Collect Ack from Tejun.
 -Fix the compile problem when CONFIG_BLK_DEV_THROTTLING_LOW is set.
 -Drop "blk-throttle: Limit whole system if root group is configured
  when on the default hierarchy", "blk-throttle: remove unnecessary check
  for validation of limit index" and "blk-throttle: remove unused variable
  td in tg_update_has_rules"
 -Add "blk-throttle: correct stale comment in throtl_pd_init" and
  "blk-throttle: avoid dead code in throtl_hierarchy_can_upgrade"
 -Use the solution that sets the BIO_BPS_THROTTLED flag only when the bio
  has traversed the entire tree, to fix the case where the bps of a child
  could exceed the bps limit of its parent, in patch 2/10.
 -Improve the description and comment of most commits.
---

Kemeng Shi (9):
  blk-throttle: correct stale comment in throtl_pd_init
  blk-throttle: Fix that bps of child could exceed bps limited in parent
  blk-throttle: ignore cgroup without io queued in
    blk_throtl_cancel_bios
  blk-throttle: correct calculation of wait time in tg_may_dispatch
  blk-throttle: simpfy low limit reached check in throtl_tg_can_upgrade
  blk-throttle: fix typo in comment of throtl_adjusted_limit
  blk-throttle: remove incorrect comment for tg_last_low_overflow_time
  blk-throttle: remove repeat check of elapsed time from last upgrade in
    throtl_hierarchy_can_downgrade
  blk-throttle: Use more siutable time_after check for update of
    slice_start

 block/blk-throttle.c | 102 +++++++++++++++++++++++--------------------
 1 file changed, 55 insertions(+), 47 deletions(-)

-- 
2.30.0



* [PATCH v3 1/9] blk-throttle: correct stale comment in throtl_pd_init
  2022-12-05 11:57 [PATCH v3 0/9] A few bugfix and cleanup patches for blk-throttle Kemeng Shi
@ 2022-12-05 11:57 ` Kemeng Shi
  2022-12-05 20:43     ` Jens Axboe
  2022-12-05 11:57   ` Kemeng Shi
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

From: Kemeng Shi <shikemeng@huawei.com>

On the default hierarchy (cgroup2), the throttle interface files don't
exist in the root cgroup, so the ability to limit the whole system by
configuring the root group no longer exists. In general, cgroup doesn't
want to be in the business of restricting resources at the system
level, so correct the stale comment, which says we can limit the whole
system, to say that we can only limit a subtree.

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-throttle.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 847721dc2b2b..8e2349b17936 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -395,8 +395,9 @@ static void throtl_pd_init(struct blkg_policy_data *pd)
 	 * If on the default hierarchy, we switch to properly hierarchical
 	 * behavior where limits on a given throtl_grp are applied to the
 	 * whole subtree rather than just the group itself.  e.g. If 16M
-	 * read_bps limit is set on the root group, the whole system can't
-	 * exceed 16M for the device.
+	 * read_bps limit is set on a parent group, summary bps of
+	 * parent group and its subtree groups can't exceed 16M for the
+	 * device.
 	 *
 	 * If not on the default hierarchy, the broken flat hierarchy
 	 * behavior is retained where all throtl_grps are treated as if
-- 
2.30.0



* [PATCH v3 2/9] blk-throttle: Fix that bps of child could exceed bps limited in parent
@ 2022-12-05 11:57   ` Kemeng Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

From: Kemeng Shi <shikemeng@huawei.com>

Consider the following situation (on the default hierarchy):
 HDD
  |
root (bps limit: 4k)
  |
child (bps limit: 8k)
  |
fio bs=8k
The rate of fio is supposed to be 4k, but the result is 8k. The reason
is as follows:
The size of a single IO from fio is larger than the bytes allowed in one
throtl_slice in child, so IOs are always queued in the child group first.
When queued IOs in child are dispatched to the parent group,
BIO_BPS_THROTTLED is set and these IOs are no longer limited by
tg_within_bps_limit. Fix this by setting BIO_BPS_THROTTLED only when the
bio has traversed the entire tree.

This patch has no influence when not on the default hierarchy, as each
group is a single root group without a parent.
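
For illustration only (not part of the patch), the effect can be modelled
with a small self-contained C sketch. The names toy_group, toy_bio and
bps_checks_applied are hypothetical, and the model ignores slices, queues
and timers. It only counts how many groups get to apply their bps check
to one bio as it moves from the leaf to the top of the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not the kernel's struct throtl_grp. */
struct toy_group {
	struct toy_group *parent;	/* NULL for the top-level group */
};

/* Hypothetical stand-in for a bio carrying BIO_BPS_THROTTLED. */
struct toy_bio {
	bool bps_throttled;
};

/*
 * Walk one bio from @leaf up to the top of the tree and count how many
 * groups apply their bps check to it. A group skips the check once the
 * flag is set, mirroring the bio_flagged() test in tg_within_bps_limit.
 */
int bps_checks_applied(struct toy_group *leaf, bool flag_only_at_top)
{
	struct toy_bio bio = { .bps_throttled = false };
	int checks = 0;

	for (struct toy_group *g = leaf; g != NULL; g = g->parent) {
		if (!bio.bps_throttled)
			checks++;

		/* Dispatch from g: decide whether to mark the bio. */
		if (!flag_only_at_top || g->parent == NULL)
			bio.bps_throttled = true;
	}
	return checks;
}

/* Two-level chain from the commit message: child -> root. */
int checks_for_child_under_root(bool flag_only_at_top)
{
	struct toy_group root = { .parent = NULL };
	struct toy_group child = { .parent = &root };

	return bps_checks_applied(&child, flag_only_at_top);
}

/* Usage example: flag set on every dispatch vs. only at the top. */
void example_check(void)
{
	assert(checks_for_child_under_root(false) == 1);
	assert(checks_for_child_under_root(true) == 2);
}
```

In this model, setting the flag on every dispatch lets only the child
check the bio, while setting it only at the top of the tree lets both the
child and the root apply their limits.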

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-throttle.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8e2349b17936..2444ebf5f11d 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1067,7 +1067,6 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 	sq->nr_queued[rw]--;

 	throtl_charge_bio(tg, bio);
-	bio_set_flag(bio, BIO_BPS_THROTTLED);

 	/*
 	 * If our parent is another tg, we just need to transfer @bio to
@@ -1080,6 +1079,7 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 		throtl_add_bio_tg(bio, &tg->qnode_on_parent[rw], parent_tg);
 		start_parent_slice_with_credit(tg, parent_tg, rw);
 	} else {
+		bio_set_flag(bio, BIO_BPS_THROTTLED);
 		throtl_qnode_add_bio(bio, &tg->qnode_on_parent[rw],
 				     &parent_sq->queued[rw]);
 		BUG_ON(tg->td->nr_queued[rw] <= 0);
-- 
2.30.0

* [PATCH v3 3/9] blk-throttle: ignore cgroup without io queued in blk_throtl_cancel_bios
@ 2022-12-05 11:57   ` Kemeng Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

From: Kemeng Shi <shikemeng@huawei.com>

Ignore cgroups without IO queued in blk_throtl_cancel_bios for two
reasons:
1. Save the CPU cycles otherwise spent trying to dispatch a cgroup that
has no IO queued.
2. Avoid the inconsistent state in which a cgroup is inserted into the
service queue without THROTL_TG_PENDING set, as tg_update_disptime will
unconditionally re-insert the cgroup into the service queue. If we are on
the default hierarchy, IO dispatched from a child in tg_dispatch_one_bio
will then trigger insertion of the cgroup into the service queue without
erasing it first, and corrupt the tree.
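
For illustration only (not part of the patch), the double insertion in
reason 2 can be modelled with a toy counter. Everything below is a
hypothetical sketch: toy_tg, toy_update_disptime and toy_enqueue are
made-up names, and the assumption that the erase step runs only when the
pending flag is set is a simplification, not a quote of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy group: counts how often it sits in the service queue. */
struct toy_tg {
	bool pending;		/* models THROTL_TG_PENDING */
	int times_in_tree;	/* a consistent tree never holds it twice */
};

/*
 * Models the unconditional re-insert described in the commit message.
 * Assumption: the erase step only happens when the group is pending.
 */
static void toy_update_disptime(struct toy_tg *tg)
{
	if (tg->pending)
		tg->times_in_tree--;
	tg->times_in_tree++;
}

/* Models an enqueue triggered by IO arriving from a child group. */
static void toy_enqueue(struct toy_tg *tg)
{
	if (!tg->pending) {
		tg->times_in_tree++;
		tg->pending = true;
	}
}

/* Cancel path for a group with nothing queued, then IO from a child. */
int insertions_after_cancel(bool skip_non_pending)
{
	struct toy_tg tg = { .pending = false, .times_in_tree = 0 };

	if (!(skip_non_pending && !tg.pending))
		toy_update_disptime(&tg);
	toy_enqueue(&tg);
	return tg.times_in_tree;
}

/* Usage example: without the skip the group ends up inserted twice. */
void example_check(void)
{
	assert(insertions_after_cancel(false) == 2);
	assert(insertions_after_cancel(true) == 1);
}
```

Under these assumptions, skipping groups that are not pending keeps the
group in the queue at most once.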

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-throttle.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 2444ebf5f11d..75010110d481 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1738,7 +1738,18 @@ void blk_throtl_cancel_bios(struct gendisk *disk)
 		 * Set the flag to make sure throtl_pending_timer_fn() won't
 		 * stop until all throttled bios are dispatched.
 		 */
-		blkg_to_tg(blkg)->flags |= THROTL_TG_CANCELING;
+		tg->flags |= THROTL_TG_CANCELING;
+
+		/*
+		 * Do not dispatch cgroup without THROTL_TG_PENDING or cgroup
+		 * will be inserted to service queue without THROTL_TG_PENDING
+		 * set in tg_update_disptime below. Then IO dispatched from
+		 * child in tg_dispatch_one_bio will trigger double insertion
+		 * and corrupt the tree.
+		 */
+		if (!(tg->flags & THROTL_TG_PENDING))
+			continue;
+
 		/*
 		 * Update disptime after setting the above flag to make sure
 		 * throtl_select_dispatch() won't exit without dispatching.
-- 
2.30.0


* [PATCH v3 4/9] blk-throttle: correct calculation of wait time in tg_may_dispatch
@ 2022-12-05 11:57   ` Kemeng Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

From: Kemeng Shi <shikemeng@huawei.com>

In C, when evaluating "if (expression1 && expression2)" and expression1
is false, expression2 is not evaluated.
For "tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) &&
tg_within_iops_limit(tg, bio, iops_limit, &iops_wait))", if the bps limit
is exceeded, tg_within_bps_limit returns false and tg_within_iops_limit
is not called. So even when bps and iops are both over their limits,
iops_wait is not calculated and stays zero, and the wait time of iops is
ignored.

Fix this by always calling both tg_within_bps_limit and
tg_within_iops_limit to get the wait time for both bps and iops.

Observe that:
1. The wait time in tg_within_iops_limit/tg_within_bps_limit always needs
to be stored, as the wait argument is always passed.
2. The stored wait time is zero if the bio is within the iops/bps limit;
otherwise it is non-zero.
Simplify tg_within_iops_limit/tg_within_bps_limit by removing the wait
argument and returning the wait time directly. The caller tg_may_dispatch
checks whether the wait time is zero to find out whether iops/bps is
limited.
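
For illustration only (not part of the patch), the short-circuit problem
can be reproduced in a few lines of standalone C. The toy_* helpers and
the wait_* functions are hypothetical stand-ins, and taking the larger of
the two waits at the end is an assumption about what the caller does with
them.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the pre-patch helpers: each returns true
 * when the limit is respected and stores the wait time through @wait.
 */
static bool toy_within_bps(unsigned long need, unsigned long *wait)
{
	*wait = need;
	return need == 0;
}

static bool toy_within_iops(unsigned long need, unsigned long *wait)
{
	*wait = need;
	return need == 0;
}

/* Pre-patch shape: && skips toy_within_iops() when the bps check fails. */
unsigned long wait_short_circuit(unsigned long bps_need,
				 unsigned long iops_need)
{
	unsigned long bps_wait = 0, iops_wait = 0;

	if (toy_within_bps(bps_need, &bps_wait) &&
	    toy_within_iops(iops_need, &iops_wait))
		return 0;

	/* Assumption: the caller waits for the larger of the two values. */
	return bps_wait > iops_wait ? bps_wait : iops_wait;
}

/* Post-patch shape: both waits are always known before the test. */
unsigned long wait_both(unsigned long bps_need, unsigned long iops_need)
{
	unsigned long bps_wait = bps_need;	/* helper returns the wait */
	unsigned long iops_wait = iops_need;	/* helper returns the wait */

	if (bps_wait + iops_wait == 0)
		return 0;

	return bps_wait > iops_wait ? bps_wait : iops_wait;
}

/* Usage example: bps needs 3 jiffies, iops needs 10. */
void example_check(void)
{
	assert(wait_short_circuit(3, 10) == 3);	/* iops wait lost */
	assert(wait_both(3, 10) == 10);		/* iops wait honoured */
}
```

When both limits are exceeded, the short-circuit version reports only the
bps wait, while the version that evaluates both reports the longer one.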

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-throttle.c | 38 +++++++++++++-------------------------
 1 file changed, 13 insertions(+), 25 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 75010110d481..d5b7a2354ad7 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -822,17 +822,15 @@ static void tg_update_carryover(struct throtl_grp *tg)
 		   tg->carryover_ios[READ], tg->carryover_ios[WRITE]);
 }
 
-static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
-				 u32 iops_limit, unsigned long *wait)
+static unsigned long tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
+				 u32 iops_limit)
 {
 	bool rw = bio_data_dir(bio);
 	unsigned int io_allowed;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 
 	if (iops_limit == UINT_MAX) {
-		if (wait)
-			*wait = 0;
-		return true;
+		return 0;
 	}
 
 	jiffy_elapsed = jiffies - tg->slice_start[rw];
@@ -842,21 +840,16 @@ static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd) +
 		     tg->carryover_ios[rw];
 	if (tg->io_disp[rw] + 1 <= io_allowed) {
-		if (wait)
-			*wait = 0;
-		return true;
+		return 0;
 	}
 
 	/* Calc approx time to dispatch */
 	jiffy_wait = jiffy_elapsed_rnd - jiffy_elapsed;
-
-	if (wait)
-		*wait = jiffy_wait;
-	return false;
+	return jiffy_wait;
 }
 
-static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
-				u64 bps_limit, unsigned long *wait)
+static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
+				u64 bps_limit)
 {
 	bool rw = bio_data_dir(bio);
 	u64 bytes_allowed, extra_bytes;
@@ -865,9 +858,7 @@ static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* no need to throttle if this bio's bytes have been accounted */
 	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_BPS_THROTTLED)) {
-		if (wait)
-			*wait = 0;
-		return true;
+		return 0;
 	}
 
 	jiffy_elapsed = jiffy_elapsed_rnd = jiffies - tg->slice_start[rw];
@@ -880,9 +871,7 @@ static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd) +
 			tg->carryover_bytes[rw];
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
-		if (wait)
-			*wait = 0;
-		return true;
+		return 0;
 	}
 
 	/* Calc approx time to dispatch */
@@ -897,9 +886,7 @@ static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	 * up we did. Add that time also.
 	 */
 	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
-	if (wait)
-		*wait = jiffy_wait;
-	return false;
+	return jiffy_wait;
 }
 
 /*
@@ -947,8 +934,9 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 				jiffies + tg->td->throtl_slice);
 	}
 
-	if (tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) &&
-	    tg_within_iops_limit(tg, bio, iops_limit, &iops_wait)) {
+	bps_wait = tg_within_bps_limit(tg, bio, bps_limit);
+	iops_wait = tg_within_iops_limit(tg, bio, iops_limit);
+	if (bps_wait + iops_wait == 0) {
 		if (wait)
 			*wait = 0;
 		return true;
-- 
2.30.0


* [PATCH v3 5/9] blk-throttle: simpfy low limit reached check in throtl_tg_can_upgrade
@ 2022-12-05 11:57   ` Kemeng Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

From: Kemeng Shi <shikemeng@huawei.com>

Commit c79892c557616 ("blk-throttle: add upgrade logic for LIMIT_LOW
state") added upgrade logic for low limit and mentioned that
1. "To determine if a cgroup exceeds its limitation, we check if the cgroup
has pending request. Since cgroup is throttled according to the limit,
pending request means the cgroup reaches the limit."
2. "If a cgroup has limit set for both read and write, we consider the
combination of them for upgrade. The reason is read IO and write IO can
interfere with each other. If we do the upgrade based in one direction IO,
the other direction IO could be severly harmed."
Besides, we also determine that cgroup reaches low limit if low limit is 0,
see comment in throtl_tg_can_upgrade.

Collecting the information above, the design of the upgrade check is as
follows:
1. The low limit is reached if the limit is zero or IO is already queued.
2. A cgroup passes the upgrade check if the low limits of READ and WRITE
are both reached.

Simplify the check code described above to remove the repeated check and
improve readability. There is no functional change.

The detailed equivalence proof is as follows:
All replaced conditions that return true are:
condition 1
  (!read_limit && !write_limit)
condition 2
  read_limit && sq->nr_queued[READ] &&
  (!write_limit || sq->nr_queued[WRITE])
condition 3
  write_limit && sq->nr_queued[WRITE] &&
  (!read_limit || sq->nr_queued[READ])

Transforming condition 2 as follows:
  (read_limit && sq->nr_queued[READ]) &&
  (!write_limit || sq->nr_queued[WRITE])
is equivalent to
  (read_limit && sq->nr_queued[READ]) &&
  (!write_limit || (write_limit && sq->nr_queued[WRITE]))
is equivalent to
condition 2.1
  (read_limit && sq->nr_queued[READ] &&
  !write_limit) ||
condition 2.2
  (read_limit && sq->nr_queued[READ] &&
  (write_limit && sq->nr_queued[WRITE]))

Transforming condition 3 as follows:
  write_limit && sq->nr_queued[WRITE] &&
  (!read_limit || sq->nr_queued[READ])
is equivalent to
  (write_limit && sq->nr_queued[WRITE]) &&
  (!read_limit || (read_limit && sq->nr_queued[READ]))
is equivalent to
condition 3.1
  ((write_limit && sq->nr_queued[WRITE]) &&
  !read_limit) ||
condition 3.2
  ((write_limit && sq->nr_queued[WRITE]) &&
  (read_limit && sq->nr_queued[READ]))

Condition 3.2 is the same as condition 2.2, so all conditions that
return true are as follows:
  (!read_limit && !write_limit) (1)
  (!read_limit && (write_limit && sq->nr_queued[WRITE])) (3.1)
  ((read_limit && sq->nr_queued[READ]) && !write_limit) (2.1)
  ((write_limit && sq->nr_queued[WRITE]) &&
  (read_limit && sq->nr_queued[READ])) (2.2)

As we can expand the condition "(a1 || a2) && (b1 || b2)" to:
a1 && b1
a1 && b2
a2 && b1
a2 && b2

Considering that:
a1 = !read_limit
a2 = read_limit && sq->nr_queued[READ]
b1 = !write_limit
b2 = write_limit && sq->nr_queued[WRITE]

We can pack the replaced conditions into
  (!read_limit || (read_limit && sq->nr_queued[READ])) &&
  (!write_limit || (write_limit && sq->nr_queued[WRITE]))
which is equivalent to
  (!read_limit || sq->nr_queued[READ]) &&
  (!write_limit || sq->nr_queued[WRITE])
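
For illustration only (not part of the patch), the equivalence can also
be checked exhaustively, since only four boolean inputs are involved. The
function names below are hypothetical; rq and wq stand for
sq->nr_queued[READ] and sq->nr_queued[WRITE] being non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* The three replaced conditions (1, 2 and 3 above). */
bool old_low_limit_check(bool read_limit, bool write_limit,
			 bool rq, bool wq)
{
	if (!read_limit && !write_limit)
		return true;
	if (read_limit && rq && (!write_limit || wq))
		return true;
	if (write_limit && wq && (!read_limit || rq))
		return true;
	return false;
}

/* The packed form derived at the end of the proof. */
bool new_low_limit_check(bool read_limit, bool write_limit,
			 bool rq, bool wq)
{
	return (!read_limit || rq) && (!write_limit || wq);
}

/* Try all 16 input combinations and count disagreements. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 16; bits++) {
		bool read_limit = (bits & 1) != 0;
		bool write_limit = (bits & 2) != 0;
		bool rq = (bits & 4) != 0;
		bool wq = (bits & 8) != 0;

		if (old_low_limit_check(read_limit, write_limit, rq, wq) !=
		    new_low_limit_check(read_limit, write_limit, rq, wq))
			mismatches++;
	}
	return mismatches;
}

/* Usage example: the two forms agree on every input. */
void example_check(void)
{
	assert(count_mismatches() == 0);
}
```

The sketch covers only the low-limit part of the check; the time-based
test that follows it in throtl_tg_can_upgrade is left out.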

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-throttle.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index d5b7a2354ad7..1623507ed56e 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1816,24 +1816,29 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
 	return ret;
 }
 
-static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
+static bool throtl_low_limit_reached(struct throtl_grp *tg, int rw)
 {
 	struct throtl_service_queue *sq = &tg->service_queue;
-	bool read_limit, write_limit;
+	bool limit = tg->bps[rw][LIMIT_LOW] || tg->iops[rw][LIMIT_LOW];
 
 	/*
-	 * if cgroup reaches low limit (if low limit is 0, the cgroup always
-	 * reaches), it's ok to upgrade to next limit
+	 * if low limit is zero, low limit is always reached.
+	 * if low limit is non-zero, we can check if there is any request
+	 * is queued to determine if low limit is reached as we throttle
+	 * request according to limit.
 	 */
-	read_limit = tg->bps[READ][LIMIT_LOW] || tg->iops[READ][LIMIT_LOW];
-	write_limit = tg->bps[WRITE][LIMIT_LOW] || tg->iops[WRITE][LIMIT_LOW];
-	if (!read_limit && !write_limit)
-		return true;
-	if (read_limit && sq->nr_queued[READ] &&
-	    (!write_limit || sq->nr_queued[WRITE]))
-		return true;
-	if (write_limit && sq->nr_queued[WRITE] &&
-	    (!read_limit || sq->nr_queued[READ]))
+	return !limit || sq->nr_queued[rw];
+}
+
+static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
+{
+	/*
+	 * cgroup reaches low limit when low limit of READ and WRITE are
+	 * both reached, it's ok to upgrade to next limit if cgroup reaches
+	 * low limit
+	 */
+	if (throtl_low_limit_reached(tg, READ) &&
+	    throtl_low_limit_reached(tg, WRITE))
 		return true;
 
 	if (time_after_eq(jiffies,
-- 
2.30.0



* [PATCH v3 5/9] blk-throttle: simpfy low limit reached check in throtl_tg_can_upgrade
@ 2022-12-05 11:57   ` Kemeng Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj-DgEjT+Ai2ygdnm+yROfE0A, josef-DigfWCa+lFGyeJad7bwFQA,
	axboe-tSWWG44O7X1aa/9Udqfwiw
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	shikemeng-hv44wF8Li93QT0dZR+AlfA,
	linfeilong-hv44wF8Li93QT0dZR+AlfA,
	liuzhiqiang26-hv44wF8Li93QT0dZR+AlfA

From: Kemeng Shi <shikemeng-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Commit c79892c557616 ("blk-throttle: add upgrade logic for LIMIT_LOW
state") added upgrade logic for low limit and methioned that
1. "To determine if a cgroup exceeds its limitation, we check if the cgroup
has pending request. Since cgroup is throttled according to the limit,
pending request means the cgroup reaches the limit."
2. "If a cgroup has limit set for both read and write, we consider the
combination of them for upgrade. The reason is read IO and write IO can
interfere with each other. If we do the upgrade based in one direction IO,
the other direction IO could be severly harmed."
Besides, we also determine that cgroup reaches low limit if low limit is 0,
see comment in throtl_tg_can_upgrade.

Collect the information above, the desgin of upgrade check is as following:
1.The low limit is reached if limit is zero or io is already queued.
2.Cgroup will pass upgrade check if low limits of READ and WRITE are both
reached.

Simpfy the check code described above to removce repeat check and improve
readability. There is no functional change.

Detail equivalence proof is as following:
All replaced conditions to return true are as following:
condition 1
  (!read_limit && !write_limit)
condition 2
  read_limit && sq->nr_queued[READ] &&
  (!write_limit || sq->nr_queued[WRITE])
condition 3
  write_limit && sq->nr_queued[WRITE] &&
  (!read_limit || sq->nr_queued[READ])

Transferring condition 2 as following:
  (read_limit && sq->nr_queued[READ]) &&
  (!write_limit || sq->nr_queued[WRITE])
is equivalent to
  (read_limit && sq->nr_queued[READ]) &&
  (!write_limit || (write_limit && sq->nr_queued[WRITE]))
is equivalent to
condition 2.1
  (read_limit && sq->nr_queued[READ] &&
  !write_limit) ||
condition 2.2
  (read_limit && sq->nr_queued[READ] &&
  (write_limit && sq->nr_queued[WRITE]))

Transferring condition 3 as following:
  write_limit && sq->nr_queued[WRITE] &&
  (!read_limit || sq->nr_queued[READ])
is equivalent to
  (write_limit && sq->nr_queued[WRITE]) &&
  (!read_limit || (read_limit && sq->nr_queued[READ]))
is equivalent to
condition 3.1
  ((write_limit && sq->nr_queued[WRITE]) &&
  !read_limit) ||
condition 3.2
  ((write_limit && sq->nr_queued[WRITE]) &&
  (read_limit && sq->nr_queued[READ]))

Condition 3.2 is the same as condition 2.2, so the full set of conditions
that return true is:
  (!read_limit && !write_limit) (1)
  (!read_limit && (write_limit && sq->nr_queued[WRITE])) (3.1)
  ((read_limit && sq->nr_queued[READ]) && !write_limit) (2.1)
  ((write_limit && sq->nr_queued[WRITE]) &&
  (read_limit && sq->nr_queued[READ])) (2.2)

Note that "(a1 || a2) && (b1 || b2)" expands to:
a1 && b1
a1 && b2
a2 && b1
a2 && b2

Considering that:
a1 = !read_limit
a2 = read_limit && sq->nr_queued[READ]
b1 = !write_limit
b2 = write_limit && sq->nr_queued[WRITE]

We can fold the replaced conditions into
  (!read_limit || (read_limit && sq->nr_queued[READ])) &&
  (!write_limit || (write_limit && sq->nr_queued[WRITE]))
which is equivalent to
  (!read_limit || sq->nr_queued[READ]) &&
  (!write_limit || sq->nr_queued[WRITE])
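
The equivalence can also be checked exhaustively. The following is a
minimal userspace sketch and not part of the patch: the helper names
old_check, new_check, count_mismatches and assert_equivalent are
hypothetical, and the limits and nr_queued counters are modelled as plain
booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* The three replaced branches; rq/wq stand for nr_queued[READ/WRITE] != 0. */
bool old_check(bool read_limit, bool write_limit, bool rq, bool wq)
{
	if (!read_limit && !write_limit)
		return true;
	if (read_limit && rq && (!write_limit || wq))
		return true;
	if (write_limit && wq && (!read_limit || rq))
		return true;
	return false;
}

/* The folded form: each direction has no limit or has IO queued. */
bool new_check(bool read_limit, bool write_limit, bool rq, bool wq)
{
	return (!read_limit || rq) && (!write_limit || wq);
}

/* Count the input combinations (16 in total) where the two forms differ. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int i = 0; i < 16; i++) {
		bool rl = i & 1, wl = i & 2, rq = i & 4, wq = i & 8;

		if (old_check(rl, wl, rq, wq) != new_check(rl, wl, rq, wq))
			mismatches++;
	}
	return mismatches;
}

/* Abort if any combination disagrees. */
void assert_equivalent(void)
{
	assert(count_mismatches() == 0);
}
```

Calling assert_equivalent() from a small test program should pass
silently, since the two forms agree on every combination of inputs.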

Signed-off-by: Kemeng Shi <shikemeng-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Reported-by: kernel test robot <lkp-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Acked-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Signed-off-by: Kemeng Shi <shikemeng-XF6JlduFytWkHkcT6e4Xnw@public.gmane.org>
---
 block/blk-throttle.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index d5b7a2354ad7..1623507ed56e 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1816,24 +1816,29 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
 	return ret;
 }
 
-static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
+static bool throtl_low_limit_reached(struct throtl_grp *tg, int rw)
 {
 	struct throtl_service_queue *sq = &tg->service_queue;
-	bool read_limit, write_limit;
+	bool limit = tg->bps[rw][LIMIT_LOW] || tg->iops[rw][LIMIT_LOW];
 
 	/*
-	 * if cgroup reaches low limit (if low limit is 0, the cgroup always
-	 * reaches), it's ok to upgrade to next limit
+	 * if low limit is zero, low limit is always reached.
+	 * if low limit is non-zero, we can check if there is any request
+	 * is queued to determine if low limit is reached as we throttle
+	 * request according to limit.
 	 */
-	read_limit = tg->bps[READ][LIMIT_LOW] || tg->iops[READ][LIMIT_LOW];
-	write_limit = tg->bps[WRITE][LIMIT_LOW] || tg->iops[WRITE][LIMIT_LOW];
-	if (!read_limit && !write_limit)
-		return true;
-	if (read_limit && sq->nr_queued[READ] &&
-	    (!write_limit || sq->nr_queued[WRITE]))
-		return true;
-	if (write_limit && sq->nr_queued[WRITE] &&
-	    (!read_limit || sq->nr_queued[READ]))
+	return !limit || sq->nr_queued[rw];
+}
+
+static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
+{
+	/*
+	 * cgroup reaches low limit when low limit of READ and WRITE are
+	 * both reached, it's ok to upgrade to next limit if cgroup reaches
+	 * low limit
+	 */
+	if (throtl_low_limit_reached(tg, READ) &&
+	    throtl_low_limit_reached(tg, WRITE))
 		return true;
 
 	if (time_after_eq(jiffies,
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 6/9] blk-throttle: fix typo in comment of throtl_adjusted_limit
@ 2022-12-05 11:57   ` Kemeng Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

From: Kemeng Shi <shikemeng@huawei.com>

lapsed time -> elapsed time

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-throttle.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 1623507ed56e..7db8592dae38 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -129,7 +129,7 @@ static struct throtl_data *sq_to_td(struct throtl_service_queue *sq)
 /*
  * cgroup's limit in LIMIT_MAX is scaled if low limit is set. This scale is to
  * make the IO dispatch more smooth.
- * Scale up: linearly scale up according to lapsed time since upgrade. For
+ * Scale up: linearly scale up according to elapsed time since upgrade. For
  *           every throtl_slice, the limit scales up 1/2 .low limit till the
  *           limit hits .max limit
  * Scale down: exponentially scale down if a cgroup doesn't hit its .low limit
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 7/9] blk-throttle: remove incorrect comment for tg_last_low_overflow_time
@ 2022-12-05 11:57   ` Kemeng Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

From: Kemeng Shi <shikemeng@huawei.com>

tg_last_low_overflow_time is called with intermediate nodes through the
following call chains:
throtl_hierarchy_can_downgrade
  throtl_tg_can_downgrade
    tg_last_low_overflow_time

throtl_hierarchy_can_upgrade
  throtl_tg_can_upgrade
    tg_last_low_overflow_time

throtl_hierarchy_can_downgrade and throtl_hierarchy_can_upgrade traverse
from the leaf node up to the sub-root node and pass each traversed
intermediate node to tg_last_low_overflow_time.

Neither the context nor the implementation of tg_last_low_overflow_time
imposes the restriction stated in the comment, so remove the comment.

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-throttle.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7db8592dae38..e6a087de414d 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1762,7 +1762,6 @@ static unsigned long __tg_last_low_overflow_time(struct throtl_grp *tg)
 	return min(rtime, wtime);
 }
 
-/* tg should not be an intermediate node */
 static unsigned long tg_last_low_overflow_time(struct throtl_grp *tg)
 {
 	struct throtl_service_queue *parent_sq;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 8/9] blk-throttle: remove repeat check of elapsed time from last upgrade in throtl_hierarchy_can_downgrade
@ 2022-12-05 11:57   ` Kemeng Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

From: Kemeng Shi <shikemeng@huawei.com>

There is no need to check the time elapsed since the last upgrade for each
node in the hierarchy. Move this check before the traversal, as
throtl_can_upgrade does, to remove the repeated check.
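
The hoisting can be illustrated with a small userspace sketch. It is not
kernel code: model_time_after_eq and model_time_before are stand-ins
written here to mirror the wraparound-safe jiffies comparisons, and
can_downgrade_old, can_downgrade_new and the node_ok array (which stands
for the remaining per-node conditions) are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the wraparound-safe jiffies comparisons. */
bool model_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Old shape: the device-wide elapsed-time test is repeated per node. */
bool can_downgrade_old(unsigned long now, unsigned long upgrade_time,
		       unsigned long slice, const bool *node_ok, size_t n)
{
	assert(n >= 1);	/* the walk always visits the starting group */
	for (size_t i = 0; i < n; i++) {
		if (!model_time_after_eq(now, upgrade_time + slice) ||
		    !node_ok[i])
			return false;
	}
	return true;
}

/* New shape: test once, then walk the hierarchy. */
bool can_downgrade_new(unsigned long now, unsigned long upgrade_time,
		       unsigned long slice, const bool *node_ok, size_t n)
{
	assert(n >= 1);
	if (model_time_before(now, upgrade_time + slice))
		return false;
	for (size_t i = 0; i < n; i++) {
		if (!node_ok[i])
			return false;
	}
	return true;
}
```

Because the elapsed-time test does not depend on the node, both shapes
return the same result for any walk that visits at least one node. The
sketch takes "now" as a parameter, so it does not model jiffies advancing
during the walk.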

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-throttle.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index e6a087de414d..413e668249cf 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1955,8 +1955,7 @@ static bool throtl_tg_can_downgrade(struct throtl_grp *tg)
 	 * If cgroup is below low limit, consider downgrade and throttle other
 	 * cgroups
 	 */
-	if (time_after_eq(now, td->low_upgrade_time + td->throtl_slice) &&
-	    time_after_eq(now, tg_last_low_overflow_time(tg) +
+	if (time_after_eq(now, tg_last_low_overflow_time(tg) +
 					td->throtl_slice) &&
 	    (!throtl_tg_is_idle(tg) ||
 	     !list_empty(&tg_to_blkg(tg)->blkcg->css.children)))
@@ -1966,6 +1965,11 @@ static bool throtl_tg_can_downgrade(struct throtl_grp *tg)
 
 static bool throtl_hierarchy_can_downgrade(struct throtl_grp *tg)
 {
+	struct throtl_data *td = tg->td;
+
+	if (time_before(jiffies, td->low_upgrade_time + td->throtl_slice))
+		return false;
+
 	while (true) {
 		if (!throtl_tg_can_downgrade(tg))
 			return false;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 9/9] blk-throttle: Use more siutable time_after check for update of slice_start
  2022-12-05 11:57 [PATCH v3 0/9] A few bugfix and cleanup patches for blk-throttle Kemeng Shi
                   ` (7 preceding siblings ...)
  2022-12-05 11:57   ` Kemeng Shi
@ 2022-12-05 11:57 ` Kemeng Shi
  2022-12-05 20:46   ` Jens Axboe
  9 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-05 11:57 UTC (permalink / raw)
  To: tj, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

From: Kemeng Shi <shikemeng@huawei.com>

There is no need to update tg->slice_start[rw] to start when they are
already equal, so drop the "eq" part of the check before updating
slice_start.
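
As a rough userspace illustration (not kernel code), the two forms of the
check can be compared on sample values. jiffies_after and jiffies_after_eq
are stand-ins written here to mirror the wraparound-safe comparisons, and
update_with_eq, update_strict and check_samples are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the wraparound-safe jiffies comparisons. */
bool jiffies_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

bool jiffies_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Old check: also assigns when start already equals slice_start. */
unsigned long update_with_eq(unsigned long slice_start, unsigned long start)
{
	if (jiffies_after_eq(start, slice_start))
		slice_start = start;
	return slice_start;
}

/* New check: skips the assignment when the values are equal. */
unsigned long update_strict(unsigned long slice_start, unsigned long start)
{
	if (jiffies_after(start, slice_start))
		slice_start = start;
	return slice_start;
}

/* Compare both forms on an equal, a later and an earlier start. */
void check_samples(void)
{
	assert(update_with_eq(100, 100) == update_strict(100, 100));
	assert(update_with_eq(100, 150) == update_strict(100, 150));
	assert(update_with_eq(150, 100) == update_strict(150, 100));
}
```

In the equal case the old form stores a value that is already there, so
the resulting slice_start is the same with either check for these samples.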

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-throttle.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 413e668249cf..6fb5a2f9e1ee 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -645,7 +645,7 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 	 * that bandwidth. Do try to make use of that bandwidth while giving
 	 * credit.
 	 */
-	if (time_after_eq(start, tg->slice_start[rw]))
+	if (time_after(start, tg->slice_start[rw]))
 		tg->slice_start[rw] = start;
 
 	tg->slice_end[rw] = jiffies + tg->td->throtl_slice;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/9] blk-throttle: correct stale comment in throtl_pd_init
@ 2022-12-05 20:43     ` Jens Axboe
  0 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2022-12-05 20:43 UTC (permalink / raw)
  To: Kemeng Shi, tj, josef
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26

On 12/5/22 4:57 AM, Kemeng Shi wrote:
> From: Kemeng Shi <shikemeng@huawei.com>
> 
> On the default hierarchy (cgroup2), the throttle interface files don't
> exist in the root cgroup, so the ablity to limit the whole system
> by configuring root group is not existing anymore. In general, cgroup
> doesn't wanna be in the business of restricting resources at the
> system level, so correct the stale comment that we can limit whole
> system to we can only limit subtree.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Which one should be used? You have duplicate SOBs in each of the
commits. It's marked as being from Kemeng Shi <shikemeng@huawei.com> so
that is what I'll use.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 0/9] A few bugfix and cleanup patches for blk-throttle
@ 2022-12-05 20:46   ` Jens Axboe
  0 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2022-12-05 20:46 UTC (permalink / raw)
  To: tj, josef, Kemeng Shi
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26


On Mon, 05 Dec 2022 19:57:00 +0800, Kemeng Shi wrote:
> hierarchy, corret comment and so on. More details can be found in
> respective changelogs. Thanks.
> 

Applied, thanks!

[1/9] blk-throttle: correct stale comment in throtl_pd_init
      commit: f56019aef353576f43f945fdd065858145090582
[2/9] blk-throttle: Fix that bps of child could exceed bps limited in parent
      commit: 84aca0a7e039c8735abc0f89f3f48e9006c0dfc7
[3/9] blk-throttle: ignore cgroup without io queued in blk_throtl_cancel_bios
      commit: eb184791821409c37fef4f67638bb56bdaa82900
[4/9] blk-throttle: correct calculation of wait time in tg_may_dispatch
      commit: 183daeb11de871b073515d14ec1e3bc0da79e038
[5/9] blk-throttle: simpfy low limit reached check in throtl_tg_can_upgrade
      commit: a4d508e333829a8394e59efa06ce56e51f3e2b29
[6/9] blk-throttle: fix typo in comment of throtl_adjusted_limit
      commit: 009df341714c6c20a44dd9268681a8bff10bb050
[7/9] blk-throttle: remove incorrect comment for tg_last_low_overflow_time
      commit: e3031d4c7d2c5bff6b5944d61d4e31319739d216
[8/9] blk-throttle: remove repeat check of elapsed time from last upgrade in throtl_hierarchy_can_downgrade
      commit: 9c9f209d9d81ea67cd84f53f470a592c252d845d
[9/9] blk-throttle: Use more siutable time_after check for update of slice_start
      commit: eea3e8b74aa1648fc96b739458d067a6e498c302

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/9] blk-throttle: correct stale comment in throtl_pd_init
  2022-12-05 20:43     ` Jens Axboe
  (?)
@ 2022-12-06  1:17     ` Kemeng Shi
  -1 siblings, 0 replies; 22+ messages in thread
From: Kemeng Shi @ 2022-12-06  1:17 UTC (permalink / raw)
  To: Jens Axboe, tj, josef
  Cc: cgroups, linux-block, linux-kernel, shikemeng, linfeilong, liuzhiqiang26



on 12/6/2022 4:43 AM, Jens Axboe wrote:
> On 12/5/22 4:57 AM, Kemeng Shi wrote:
>> From: Kemeng Shi <shikemeng@huawei.com>
>>
>> On the default hierarchy (cgroup2), the throttle interface files don't
>> exist in the root cgroup, so the ablity to limit the whole system
>> by configuring root group is not existing anymore. In general, cgroup
>> doesn't wanna be in the business of restricting resources at the
>> system level, so correct the stale comment that we can limit whole
>> system to we can only limit subtree.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
>> Acked-by: Tejun Heo <tj@kernel.org>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> 
> Which one should be used? You have duplicate SOBs in each of the
> commits. It's marked as being from Kemeng Shi <shikemeng@huawei.com> so
> that is what I'll use.

Either one is great, thanks.

-- 
Best wishes
Kemeng Shi


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-12-06  1:18 UTC | newest]

Thread overview: 22+ messages
2022-12-05 11:57 [PATCH v3 0/9] A few bugfix and cleanup patches for blk-throttle Kemeng Shi
2022-12-05 11:57 ` [PATCH v3 1/9] blk-throttle: correct stale comment in throtl_pd_init Kemeng Shi
2022-12-05 20:43   ` Jens Axboe
2022-12-05 20:43     ` Jens Axboe
2022-12-06  1:17     ` Kemeng Shi
2022-12-05 11:57 ` [PATCH v3 2/9] blk-throttle: Fix that bps of child could exceed bps limited in parent Kemeng Shi
2022-12-05 11:57   ` Kemeng Shi
2022-12-05 11:57 ` [PATCH v3 3/9] blk-throttle: ignore cgroup without io queued in blk_throtl_cancel_bios Kemeng Shi
2022-12-05 11:57   ` Kemeng Shi
2022-12-05 11:57 ` [PATCH v3 4/9] blk-throttle: correct calculation of wait time in tg_may_dispatch Kemeng Shi
2022-12-05 11:57   ` Kemeng Shi
2022-12-05 11:57 ` [PATCH v3 5/9] blk-throttle: simpfy low limit reached check in throtl_tg_can_upgrade Kemeng Shi
2022-12-05 11:57   ` Kemeng Shi
2022-12-05 11:57 ` [PATCH v3 6/9] blk-throttle: fix typo in comment of throtl_adjusted_limit Kemeng Shi
2022-12-05 11:57   ` Kemeng Shi
2022-12-05 11:57 ` [PATCH v3 7/9] blk-throttle: remove incorrect comment for tg_last_low_overflow_time Kemeng Shi
2022-12-05 11:57   ` Kemeng Shi
2022-12-05 11:57 ` [PATCH v3 8/9] blk-throttle: remove repeat check of elapsed time from last upgrade in throtl_hierarchy_can_downgrade Kemeng Shi
2022-12-05 11:57   ` Kemeng Shi
2022-12-05 11:57 ` [PATCH v3 9/9] blk-throttle: Use more siutable time_after check for update of slice_start Kemeng Shi
2022-12-05 20:46 ` [PATCH v3 0/9] A few bugfix and cleanup patches for blk-throttle Jens Axboe
2022-12-05 20:46   ` Jens Axboe
