* [PATCH RESEND v6 0/8] bugfix and cleanup for blk-throttle
@ 2022-07-01  9:34 ` Yu Kuai
  0 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-01  9:34 UTC (permalink / raw)
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Resend v5 from a new mail address (huaweicloud.com) because the old
address (huawei.com) has a problem that causes emails to end up in spam.
Please let me know if anyone still sees this patchset end up in spam.

Changes in v6:
 - rename parameter in patch 3
 - add comments and reviewed tag for patch 4
Changes in v5:
 - add comments in patch 4
 - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
 patch 4
 - and cleanup patches 5-8
Changes in v4:
 - add reviewed-by tag for patch 1
 - add patch 2,3
 - use a different way to fix io hung in patch 4
Changes in v3:
 - fix a check in patch 1
 - fix link err in patch 2 on 32-bit platform
 - handle overflow in patch 2
Changes in v2:
 - use a new solution suggested by Ming
 - change the title of patch 1
 - add patch 2

Patch 1 fixes that blk-throttle can't work if multiple bios are throttled.
Patch 2 fixes an overflow while calculating the wait time.
Patches 3 and 4 fix an io hang caused by configuration updates.
Patches 5-8 are cleanup patches; there are no functional changes, just
some places that I think can be optimized, found during code review.

Previous version:
v1: https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
v2: https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
v3: https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/
v4: https://lore.kernel.org/all/20220523082633.2324980-1-yukuai3@huawei.com/
v5: https://lore.kernel.org/all/20220528064330.3471000-1-yukuai3@huawei.com/

Yu Kuai (8):
  blk-throttle: fix that io throttle can only work for single bio
  blk-throttle: prevent overflow while calculating wait time
  blk-throttle: factor out code to calculate ios/bytes_allowed
  blk-throttle: fix io hung due to config updates
  blk-throttle: use 'READ/WRITE' instead of '0/1'
  blk-throttle: calling throtl_dequeue/enqueue_tg in pairs
  blk-throttle: cleanup tg_update_disptime()
  blk-throttle: clean up flag 'THROTL_TG_PENDING'

 block/blk-throttle.c | 168 +++++++++++++++++++++++++++++--------------
 block/blk-throttle.h |  16 +++--
 2 files changed, 128 insertions(+), 56 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH RESEND v6 1/8] blk-throttle: fix that io throttle can only work for single bio
  2022-07-01  9:34 ` Yu Kuai
  (?)
@ 2022-07-01  9:34 ` Yu Kuai
  2022-07-27 18:27     ` Tejun Heo
  -1 siblings, 1 reply; 44+ messages in thread
From: Yu Kuai @ 2022-07-01  9:34 UTC (permalink / raw)
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
introduced a new problem, for example:

Test scripts:
cd /sys/fs/cgroup/blkio/
echo "8:0 1024" > blkio.throttle.write_bps_device
echo $$ > cgroup.procs
dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &

Test result:
10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s
10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s

The problem is that the second bio finishes after 10s instead of 20s.
This is because if some bios are already queued, the current bio is queued
directly and the flag 'BIO_THROTTLED' is set. Later, when the former
bios are dispatched, this bio is dispatched without waiting at all,
because tg_with_in_bps_limit() reports a zero wait for it.

In order to fix the problem, don't skip flagged bios in
tg_with_in_bps_limit(); and for the problem that a split bio can be
double accounted, compensate for the over-accounting in __blk_throtl_bio().
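
As an illustration only (not part of the patch), a rough timeline for the
test above, assuming the 1024 bytes/s write limit, the two 10 KiB direct
writes, and ignoring slice rounding:

t=0s   bio A (10240 bytes) is charged and has to wait ~10s; bio B arrives
       while A is queued, so B is queued behind A and BIO_THROTTLED is set
       on it without charging it
t=10s  bio A is dispatched; when B is re-evaluated, tg_with_in_bps_limit()
       sees BIO_THROTTLED and reports a zero wait, so B is dispatched
       immediately
       expected behaviour: B should be charged as well and wait another
       ~10s, finishing at ~20s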

Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-throttle.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 139b2d7a99e2..5c1d1c4d8188 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* no need to throttle if this bio's bytes have been accounted */
-	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
+	if (bps_limit == U64_MAX) {
 		if (wait)
 			*wait = 0;
 		return true;
@@ -921,11 +921,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	if (!bio_flagged(bio, BIO_THROTTLED)) {
-		tg->bytes_disp[rw] += bio_size;
-		tg->last_bytes_disp[rw] += bio_size;
-	}
-
+	tg->bytes_disp[rw] += bio_size;
+	tg->last_bytes_disp[rw] += bio_size;
 	tg->io_disp[rw]++;
 	tg->last_io_disp[rw]++;
 
@@ -2121,6 +2118,21 @@ bool __blk_throtl_bio(struct bio *bio)
 			tg->last_low_overflow_time[rw] = jiffies;
 		throtl_downgrade_check(tg);
 		throtl_upgrade_check(tg);
+
+		/*
+		 * re-entered bio has accounted bytes already, so try to
+		 * compensate previous over-accounting. However, if new
+		 * slice is started, just forget it.
+		 */
+		if (bio_flagged(bio, BIO_THROTTLED)) {
+			unsigned int bio_size = throtl_bio_data_size(bio);
+
+			if (tg->bytes_disp[rw] >= bio_size)
+				tg->bytes_disp[rw] -= bio_size;
+			if (tg->last_bytes_disp[rw] >= bio_size)
+				tg->last_bytes_disp[rw] -= bio_size;
+		}
+
 		/* throtl is FIFO - if bios are already queued, should queue */
 		if (sq->nr_queued[rw])
 			break;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RESEND v6 2/8] blk-throttle: prevent overflow while calculating wait time
@ 2022-07-01  9:34   ` Yu Kuai
  0 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-01  9:34 UTC (permalink / raw)
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

In tg_with_in_bps_limit(), 'bps_limit * jiffy_elapsed_rnd' might
overflow. Fix the problem by calling mul_u64_u64_div_u64() instead.
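
As an illustration only (not part of the patch), a minimal user-space
sketch of the overflow; it models the kernel's mul_u64_u64_div_u64() with
a 128-bit intermediate, assumes a 64-bit compiler that provides __int128,
and uses made-up numbers:

#include <stdio.h>
#include <stdint.h>

/* user-space model of mul_u64_u64_div_u64(): a * b / c with a 128-bit product */
static uint64_t mul_u64_u64_div_u64(uint64_t a, uint64_t b, uint64_t c)
{
	return (uint64_t)((unsigned __int128)a * b / c);
}

int main(void)
{
	uint64_t bps_limit = 1ULL << 62;	/* huge but valid bps limit */
	uint64_t jiffy_elapsed_rnd = 100;	/* 100 jiffies */
	uint64_t hz = 1000;			/* assumes CONFIG_HZ=1000 */

	/* the 64-bit product wraps around before the division */
	uint64_t wrapped = bps_limit * jiffy_elapsed_rnd / hz;
	/* the 128-bit intermediate keeps the full product */
	uint64_t exact = mul_u64_u64_div_u64(bps_limit, jiffy_elapsed_rnd, hz);

	printf("wrapped=%llu exact=%llu\n",
	       (unsigned long long)wrapped, (unsigned long long)exact);
	return 0;
}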

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 5c1d1c4d8188..a89c62bef2fb 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -806,7 +806,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 				 u64 bps_limit, unsigned long *wait)
 {
 	bool rw = bio_data_dir(bio);
-	u64 bytes_allowed, extra_bytes, tmp;
+	u64 bytes_allowed, extra_bytes;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
@@ -824,10 +824,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-
-	tmp = bps_limit * jiffy_elapsed_rnd;
-	do_div(tmp, HZ);
-	bytes_allowed = tmp;
+	bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
+					    (u64)HZ);
 
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RESEND v6 3/8] blk-throttle: factor out code to calculate ios/bytes_allowed
  2022-07-01  9:34 ` Yu Kuai
                   ` (2 preceding siblings ...)
  (?)
@ 2022-07-01  9:34 ` Yu Kuai
  -1 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-01  9:34 UTC (permalink / raw)
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

No functional changes. The new APIs will be used in later patches to
calculate the wait time for throttled bios while updating the config.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 51 +++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index a89c62bef2fb..8612a071305e 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -754,33 +754,20 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 		   tg->slice_start[rw], tg->slice_end[rw], jiffies);
 }
 
-static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
-				  u32 iops_limit, unsigned long *wait)
+static unsigned int calculate_io_allowed(u32 iops_limit,
+					 unsigned long jiffy_elapsed)
 {
-	bool rw = bio_data_dir(bio);
 	unsigned int io_allowed;
-	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	u64 tmp;
 
-	if (iops_limit == UINT_MAX) {
-		if (wait)
-			*wait = 0;
-		return true;
-	}
-
-	jiffy_elapsed = jiffies - tg->slice_start[rw];
-
-	/* Round up to the next throttle slice, wait time must be nonzero */
-	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
-
 	/*
-	 * jiffy_elapsed_rnd should not be a big value as minimum iops can be
+	 * jiffy_elapsed should not be a big value as minimum iops can be
 	 * 1 then at max jiffy elapsed should be equivalent of 1 second as we
 	 * will allow dispatch after 1 second and after that slice should
 	 * have been trimmed.
 	 */
 
-	tmp = (u64)iops_limit * jiffy_elapsed_rnd;
+	tmp = (u64)iops_limit * jiffy_elapsed;
 	do_div(tmp, HZ);
 
 	if (tmp > UINT_MAX)
@@ -788,6 +775,32 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	else
 		io_allowed = tmp;
 
+	return io_allowed;
+}
+
+static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
+{
+	return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
+}
+
+static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
+				  u32 iops_limit, unsigned long *wait)
+{
+	bool rw = bio_data_dir(bio);
+	unsigned int io_allowed;
+	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
+
+	if (iops_limit == UINT_MAX) {
+		if (wait)
+			*wait = 0;
+		return true;
+	}
+
+	jiffy_elapsed = jiffies - tg->slice_start[rw];
+
+	/* Round up to the next throttle slice, wait time must be nonzero */
+	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
+	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
 	if (tg->io_disp[rw] + 1 <= io_allowed) {
 		if (wait)
 			*wait = 0;
@@ -824,9 +837,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-	bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
-					    (u64)HZ);
-
+	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RESEND v6 4/8] blk-throttle: fix io hung due to config updates
  2022-07-01  9:34 ` Yu Kuai
@ 2022-07-01  9:34   ` Yu Kuai
  -1 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-01  9:34 UTC (permalink / raw)
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

If a new configuration is submitted while a bio is throttled, then the new
waiting time is recalculated regardless of the time that the bio might have
already waited:

tg_conf_updated
 throtl_start_new_slice
  tg_update_disptime
  throtl_schedule_next_dispatch

Then an io hang can be triggered by repeatedly submitting a new
configuration before the throttled bio is dispatched.

Fix the problem by respecting the time that the throttled bio has already
waited. In order to do that, add new fields to record how many bytes/ios
have already been waited for, and use them to calculate the wait time for
the throttled bio under the new configuration.

Some simple tests:
1)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 2048" > blkio.throttle.write_bps_device
{
        sleep 2
        echo "8:0 1024" > blkio.throttle.write_bps_device
} &
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

2)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 1024" > blkio.throttle.write_bps_device
{
        sleep 4
        echo "8:0 2048" > blkio.throttle.write_bps_device
} &
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

Test results (io finish time):
	before this patch	with this patch
1)	10s			6s
2)	8s			6s
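
A rough back-of-the-envelope for these numbers (an illustration only; it
assumes the bio is throttled immediately and ignores slice rounding):

1) an 8 KiB bio at 2048 bytes/s needs ~4s in total. After 2s the limit
   drops to 1024 bytes/s:
   - without the patch the slice is restarted, so the full 8192 bytes are
     charged again at 1024 bytes/s -> ~8s more -> ~10s in total
   - with the patch, 2s * 2048 bytes/s = 4096 bytes are credited as
     bytes_skipped, so ~4096 bytes remain at 1024 bytes/s -> ~4s more
     -> ~6s in total
2) an 8 KiB bio at 1024 bytes/s needs ~8s in total. After 4s the limit
   rises to 2048 bytes/s:
   - without the patch ~8192 bytes are charged again at 2048 bytes/s
     -> ~4s more -> ~8s in total
   - with the patch, 4s * 1024 bytes/s = 4096 bytes are credited, so
     ~4096 bytes remain at 2048 bytes/s -> ~2s more -> ~6s in total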

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
---
 block/blk-throttle.c | 58 +++++++++++++++++++++++++++++++++++++++-----
 block/blk-throttle.h |  9 +++++++
 2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8612a071305e..7b09b48577ba 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -639,6 +639,8 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 {
 	tg->bytes_disp[rw] = 0;
 	tg->io_disp[rw] = 0;
+	tg->bytes_skipped[rw] = 0;
+	tg->io_skipped[rw] = 0;
 
 	/*
 	 * Previous slice has expired. We must have trimmed it after last
@@ -656,12 +658,17 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 		   tg->slice_end[rw], jiffies);
 }
 
-static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
+static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw,
+					  bool clear_skipped)
 {
 	tg->bytes_disp[rw] = 0;
 	tg->io_disp[rw] = 0;
 	tg->slice_start[rw] = jiffies;
 	tg->slice_end[rw] = jiffies + tg->td->throtl_slice;
+	if (clear_skipped) {
+		tg->bytes_skipped[rw] = 0;
+		tg->io_skipped[rw] = 0;
+	}
 
 	throtl_log(&tg->service_queue,
 		   "[%c] new slice start=%lu end=%lu jiffies=%lu",
@@ -783,6 +790,41 @@ static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
 	return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
 }
 
+static void __tg_update_skipped(struct throtl_grp *tg, bool rw)
+{
+	unsigned long jiffy_elapsed = jiffies - tg->slice_start[rw];
+	u64 bps_limit = tg_bps_limit(tg, rw);
+	u32 iops_limit = tg_iops_limit(tg, rw);
+
+	/*
+	 * Following calculation won't overflow as long as bios that are
+	 * dispatched later won't preempt already throttled bios. Even if such
+	 * overflow do happen, there should be no problem because we are using
+	 * unsigned here, and bytes_skipped/io_skipped will be updated
+	 * correctly.
+	 */
+	if (bps_limit != U64_MAX)
+		tg->bytes_skipped[rw] +=
+			calculate_bytes_allowed(bps_limit, jiffy_elapsed) -
+			tg->bytes_disp[rw];
+	if (iops_limit != UINT_MAX)
+		tg->io_skipped[rw] +=
+			calculate_io_allowed(iops_limit, jiffy_elapsed) -
+			tg->io_disp[rw];
+}
+
+static void tg_update_skipped(struct throtl_grp *tg)
+{
+	if (tg->service_queue.nr_queued[READ])
+		__tg_update_skipped(tg, READ);
+	if (tg->service_queue.nr_queued[WRITE])
+		__tg_update_skipped(tg, WRITE);
+
+	throtl_log(&tg->service_queue, "%s: %llu %llu %u %u\n", __func__,
+		   tg->bytes_skipped[READ], tg->bytes_skipped[WRITE],
+		   tg->io_skipped[READ], tg->io_skipped[WRITE]);
+}
+
 static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 				  u32 iops_limit, unsigned long *wait)
 {
@@ -800,7 +842,8 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* Round up to the next throttle slice, wait time must be nonzero */
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
-	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
+	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd) +
+		     tg->io_skipped[rw];
 	if (tg->io_disp[rw] + 1 <= io_allowed) {
 		if (wait)
 			*wait = 0;
@@ -837,7 +880,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
+	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd) +
+			tg->bytes_skipped[rw];
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
@@ -898,7 +942,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	 * slice and it should be extended instead.
 	 */
 	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
-		throtl_start_new_slice(tg, rw);
+		throtl_start_new_slice(tg, rw, true);
 	else {
 		if (time_before(tg->slice_end[rw],
 		    jiffies + tg->td->throtl_slice))
@@ -1327,8 +1371,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
 	 * that a group's limit are dropped suddenly and we don't want to
 	 * account recently dispatched IO with new low rate.
 	 */
-	throtl_start_new_slice(tg, READ);
-	throtl_start_new_slice(tg, WRITE);
+	throtl_start_new_slice(tg, READ, false);
+	throtl_start_new_slice(tg, WRITE, false);
 
 	if (tg->flags & THROTL_TG_PENDING) {
 		tg_update_disptime(tg);
@@ -1356,6 +1400,7 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
 		v = U64_MAX;
 
 	tg = blkg_to_tg(ctx.blkg);
+	tg_update_skipped(tg);
 
 	if (is_u64)
 		*(u64 *)((void *)tg + of_cft(of)->private) = v;
@@ -1542,6 +1587,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 		return ret;
 
 	tg = blkg_to_tg(ctx.blkg);
+	tg_update_skipped(tg);
 
 	v[0] = tg->bps_conf[READ][index];
 	v[1] = tg->bps_conf[WRITE][index];
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index c1b602996127..371d624af845 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -115,6 +115,15 @@ struct throtl_grp {
 	uint64_t bytes_disp[2];
 	/* Number of bio's dispatched in current slice */
 	unsigned int io_disp[2];
+	/*
+	 * The following two fields are used to calculate the new wait time for
+	 * a throttled bio when a new configuration is submitted.
+	 *
+	 * Number of bytes that will be skipped in the current slice
+	 */
+	uint64_t bytes_skipped[2];
+	/* Number of bios that will be skipped in the current slice */
+	unsigned int io_skipped[2];
 
 	unsigned long last_low_overflow_time[2];
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RESEND v6 5/8] blk-throttle: use 'READ/WRITE' instead of '0/1'
  2022-07-01  9:34 ` Yu Kuai
                   ` (4 preceding siblings ...)
  (?)
@ 2022-07-01  9:34 ` Yu Kuai
  2022-07-27 18:39   ` Tejun Heo
  -1 siblings, 1 reply; 44+ messages in thread
From: Yu Kuai @ 2022-07-01  9:34 UTC (permalink / raw)
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Make the code easier to read, like everywhere else.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7b09b48577ba..e690dc1c1cde 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -329,8 +329,8 @@ static struct bio *throtl_pop_queued(struct list_head *queued,
 /* init a service_queue, assumes the caller zeroed it */
 static void throtl_service_queue_init(struct throtl_service_queue *sq)
 {
-	INIT_LIST_HEAD(&sq->queued[0]);
-	INIT_LIST_HEAD(&sq->queued[1]);
+	INIT_LIST_HEAD(&sq->queued[READ]);
+	INIT_LIST_HEAD(&sq->queued[WRITE]);
 	sq->pending_tree = RB_ROOT_CACHED;
 	timer_setup(&sq->pending_timer, throtl_pending_timer_fn, 0);
 }
@@ -1156,7 +1156,7 @@ static int throtl_select_dispatch(struct throtl_service_queue *parent_sq)
 		nr_disp += throtl_dispatch_tg(tg);
 
 		sq = &tg->service_queue;
-		if (sq->nr_queued[0] || sq->nr_queued[1])
+		if (sq->nr_queued[READ] || sq->nr_queued[WRITE])
 			tg_update_disptime(tg);
 
 		if (nr_disp >= THROTL_QUANTUM)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RESEND v6 6/8] blk-throttle: calling throtl_dequeue/enqueue_tg in pairs
  2022-07-01  9:34 ` Yu Kuai
                   ` (5 preceding siblings ...)
  (?)
@ 2022-07-01  9:34 ` Yu Kuai
  2022-07-27 18:40   ` Tejun Heo
  -1 siblings, 1 reply; 44+ messages in thread
From: Yu Kuai @ 2022-07-01  9:34 UTC (permalink / raw)
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

It's a little weird to call throtl_dequeue_tg() unconditionally in
throtl_select_dispatch(), since it will be called again in
tg_update_disptime() if some bio is still throttled.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index e690dc1c1cde..ab30efedff4e 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1151,13 +1151,13 @@ static int throtl_select_dispatch(struct throtl_service_queue *parent_sq)
 		if (time_before(jiffies, tg->disptime))
 			break;
 
-		throtl_dequeue_tg(tg);
-
 		nr_disp += throtl_dispatch_tg(tg);
 
 		sq = &tg->service_queue;
 		if (sq->nr_queued[READ] || sq->nr_queued[WRITE])
 			tg_update_disptime(tg);
+		else
+			throtl_dequeue_tg(tg);
 
 		if (nr_disp >= THROTL_QUANTUM)
 			break;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RESEND v6 7/8] blk-throttle: cleanup tg_update_disptime()
  2022-07-01  9:34 ` Yu Kuai
                   ` (6 preceding siblings ...)
  (?)
@ 2022-07-01  9:34 ` Yu Kuai
  2022-07-27 18:42     ` Tejun Heo
  -1 siblings, 1 reply; 44+ messages in thread
From: Yu Kuai @ 2022-07-01  9:34 UTC (permalink / raw)
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

tg_update_disptime() only needs to adjust the position of 'tg' in
'parent_sq'; there is no need to call throtl_enqueue/dequeue_tg().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index ab30efedff4e..473f0b651ef0 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -520,7 +520,6 @@ static void throtl_rb_erase(struct rb_node *n,
 {
 	rb_erase_cached(n, &parent_sq->pending_tree);
 	RB_CLEAR_NODE(n);
-	--parent_sq->nr_pending;
 }
 
 static void update_min_dispatch_time(struct throtl_service_queue *parent_sq)
@@ -572,7 +571,11 @@ static void throtl_enqueue_tg(struct throtl_grp *tg)
 static void throtl_dequeue_tg(struct throtl_grp *tg)
 {
 	if (tg->flags & THROTL_TG_PENDING) {
-		throtl_rb_erase(&tg->rb_node, tg->service_queue.parent_sq);
+		struct throtl_service_queue *parent_sq =
+			tg->service_queue.parent_sq;
+
+		throtl_rb_erase(&tg->rb_node, parent_sq);
+		--parent_sq->nr_pending;
 		tg->flags &= ~THROTL_TG_PENDING;
 	}
 }
@@ -1040,9 +1043,9 @@ static void tg_update_disptime(struct throtl_grp *tg)
 	disptime = jiffies + min_wait;
 
 	/* Update dispatch time */
-	throtl_dequeue_tg(tg);
+	throtl_rb_erase(&tg->rb_node, tg->service_queue.parent_sq);
 	tg->disptime = disptime;
-	throtl_enqueue_tg(tg);
+	tg_service_queue_add(tg);
 
 	/* see throtl_add_bio_tg() */
 	tg->flags &= ~THROTL_TG_WAS_EMPTY;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RESEND v6 8/8] blk-throttle: clean up flag 'THROTL_TG_PENDING'
  2022-07-01  9:34 ` Yu Kuai
                   ` (7 preceding siblings ...)
  (?)
@ 2022-07-01  9:34 ` Yu Kuai
  2022-07-27 18:44     ` Tejun Heo
  -1 siblings, 1 reply; 44+ messages in thread
From: Yu Kuai @ 2022-07-01  9:34 UTC (permalink / raw)
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

All related operations are done under 'queue_lock', so there is no need to
use the flag. We only need to make sure that throtl_enqueue_tg() is called
when the first bio is throttled, and throtl_dequeue_tg() is called when the
last throttled bio is dispatched.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 22 ++++++++--------------
 block/blk-throttle.h |  7 +++----
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 473f0b651ef0..29e9f7f6573c 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -561,23 +561,16 @@ static void tg_service_queue_add(struct throtl_grp *tg)
 
 static void throtl_enqueue_tg(struct throtl_grp *tg)
 {
-	if (!(tg->flags & THROTL_TG_PENDING)) {
-		tg_service_queue_add(tg);
-		tg->flags |= THROTL_TG_PENDING;
-		tg->service_queue.parent_sq->nr_pending++;
-	}
+	tg_service_queue_add(tg);
+	tg->service_queue.parent_sq->nr_pending++;
 }
 
 static void throtl_dequeue_tg(struct throtl_grp *tg)
 {
-	if (tg->flags & THROTL_TG_PENDING) {
-		struct throtl_service_queue *parent_sq =
-			tg->service_queue.parent_sq;
+	struct throtl_service_queue *parent_sq = tg->service_queue.parent_sq;
 
-		throtl_rb_erase(&tg->rb_node, parent_sq);
-		--parent_sq->nr_pending;
-		tg->flags &= ~THROTL_TG_PENDING;
-	}
+	throtl_rb_erase(&tg->rb_node, parent_sq);
+	--parent_sq->nr_pending;
 }
 
 /* Call with queue lock held */
@@ -1021,8 +1014,9 @@ static void throtl_add_bio_tg(struct bio *bio, struct throtl_qnode *qn,
 
 	throtl_qnode_add_bio(bio, qn, &sq->queued[rw]);
 
+	if (!sq->nr_queued[READ] && !sq->nr_queued[WRITE])
+		throtl_enqueue_tg(tg);
 	sq->nr_queued[rw]++;
-	throtl_enqueue_tg(tg);
 }
 
 static void tg_update_disptime(struct throtl_grp *tg)
@@ -1377,7 +1371,7 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
 	throtl_start_new_slice(tg, READ, false);
 	throtl_start_new_slice(tg, WRITE, false);
 
-	if (tg->flags & THROTL_TG_PENDING) {
+	if (sq->nr_queued[READ] || sq->nr_queued[WRITE]) {
 		tg_update_disptime(tg);
 		throtl_schedule_next_dispatch(sq->parent_sq, true);
 	}
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index 371d624af845..fba48afbcff3 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -53,10 +53,9 @@ struct throtl_service_queue {
 };
 
 enum tg_state_flags {
-	THROTL_TG_PENDING	= 1 << 0,	/* on parent's pending tree */
-	THROTL_TG_WAS_EMPTY	= 1 << 1,	/* bio_lists[] became non-empty */
-	THROTL_TG_HAS_IOPS_LIMIT = 1 << 2,	/* tg has iops limit */
-	THROTL_TG_CANCELING	= 1 << 3,	/* starts to cancel bio */
+	THROTL_TG_WAS_EMPTY	= 1 << 0,	/* bio_lists[] became non-empty */
+	THROTL_TG_HAS_IOPS_LIMIT = 1 << 1,	/* tg has iops limit */
+	THROTL_TG_CANCELING	= 1 << 2,	/* starts to cancel bio */
 };
 
 enum {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 0/8] bugfix and cleanup for blk-throttle
  2022-07-01  9:34 ` Yu Kuai
@ 2022-07-10  2:39   ` Yu Kuai
  -1 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-10  2:39 UTC (permalink / raw)
  To: Yu Kuai, tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yi.zhang

Hi!

On 2022/07/01 17:34, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Resend v5 by a new mail address(huaweicloud.com) because old
> address(huawei.com)has some problem that emails can end up in spam.
> Please let me know if anyone still see this patchset end up in spam.
> 
> Changes in v6:
>   - rename parameter in patch 3
>   - add comments and reviewed tag for patch 4
> Changes in v5:
>   - add comments in patch 4
>   - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
>   patch 4
>   - and cleanup patches 5-8
> Changes in v4:
>   - add reviewed-by tag for patch 1
>   - add patch 2,3
>   - use a different way to fix io hung in patch 4
> Changes in v3:
>   - fix a check in patch 1
>   - fix link err in patch 2 on 32-bit platform
>   - handle overflow in patch 2
> Changes in v2:
>   - use a new solution suggested by Ming
>   - change the title of patch 1
>   - add patch 2
> 
> Patch 1 fix that blk-throttle can't work if multiple bios are throttle,
> Patch 2 fix overflow while calculating wait time
> Patch 3,4 fix io hung due to configuration updates.
> Patch 5-8 are cleanup patches, there are no functional changes, just
> some places that I think can be optimized during code review.

Jens and Michal,

Can you receive this patchset normally (not ending up in spam)?

If so, Tejun, can you take a look? This patchset does fix some problems in
blk-throttle.

BTW, Michal and Ming, it would be great if you could take a look at the
other patches as well.

Thanks,
Kuai
> 
> Previous version:
> v1: https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
> v2: https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
> v3: https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/
> v4: https://lore.kernel.org/all/20220523082633.2324980-1-yukuai3@huawei.com/
> v5: https://lore.kernel.org/all/20220528064330.3471000-1-yukuai3@huawei.com/
> 
> Yu Kuai (8):
>    blk-throttle: fix that io throttle can only work for single bio
>    blk-throttle: prevent overflow while calculating wait time
>    blk-throttle: factor out code to calculate ios/bytes_allowed
>    blk-throttle: fix io hung due to config updates
>    blk-throttle: use 'READ/WRITE' instead of '0/1'
>    blk-throttle: calling throtl_dequeue/enqueue_tg in pairs
>    blk-throttle: cleanup tg_update_disptime()
>    blk-throttle: clean up flag 'THROTL_TG_PENDING'
> 
>   block/blk-throttle.c | 168 +++++++++++++++++++++++++++++--------------
>   block/blk-throttle.h |  16 +++--
>   2 files changed, 128 insertions(+), 56 deletions(-)
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 0/8] bugfix and cleanup for blk-throttle
  2022-07-01  9:34 ` Yu Kuai
@ 2022-07-10  2:40   ` Yu Kuai
  -1 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-10  2:40 UTC (permalink / raw)
  To: Yu Kuai, tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yi.zhang

Hi!

On 2022/07/01 17:34, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Resend v5 by a new mail address(huaweicloud.com) because old
> address(huawei.com)has some problem that emails can end up in spam.
> Please let me know if anyone still see this patchset end up in spam.
> 
> Changes in v6:
>   - rename parameter in patch 3
>   - add comments and reviewed tag for patch 4
> Changes in v5:
>   - add comments in patch 4
>   - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
>   patch 4
>   - and cleanup patches 5-8
> Changes in v4:
>   - add reviewed-by tag for patch 1
>   - add patch 2,3
>   - use a different way to fix io hung in patch 4
> Changes in v3:
>   - fix a check in patch 1
>   - fix link err in patch 2 on 32-bit platform
>   - handle overflow in patch 2
> Changes in v2:
>   - use a new solution suggested by Ming
>   - change the title of patch 1
>   - add patch 2
> 
> Patch 1 fix that blk-throttle can't work if multiple bios are throttle,
> Patch 2 fix overflow while calculating wait time
> Patch 3,4 fix io hung due to configuration updates.
> Patch 5-8 are cleanup patches, there are no functional changes, just
> some places that I think can be optimized during code review.
> 
Jens and Michal,

Can you receive this patchset normally (not ending up in spam)?

If so, Tejun, can you take a look? This patchset does fix some problems in
blk-throttle.

BTW, Michal and Ming, it would be great if you could take a look at the
other patches as well.

Thanks,
Kuai
> Previous version:
> v1: https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
> v2: https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
> v3: https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/
> v4: https://lore.kernel.org/all/20220523082633.2324980-1-yukuai3@huawei.com/
> v5: https://lore.kernel.org/all/20220528064330.3471000-1-yukuai3@huawei.com/
> 
> Yu Kuai (8):
>    blk-throttle: fix that io throttle can only work for single bio
>    blk-throttle: prevent overflow while calculating wait time
>    blk-throttle: factor out code to calculate ios/bytes_allowed
>    blk-throttle: fix io hung due to config updates
>    blk-throttle: use 'READ/WRITE' instead of '0/1'
>    blk-throttle: calling throtl_dequeue/enqueue_tg in pairs
>    blk-throttle: cleanup tg_update_disptime()
>    blk-throttle: clean up flag 'THROTL_TG_PENDING'
> 
>   block/blk-throttle.c | 168 +++++++++++++++++++++++++++++--------------
>   block/blk-throttle.h |  16 +++--
>   2 files changed, 128 insertions(+), 56 deletions(-)
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 0/8] bugfix and cleanup for blk-throttle
  2022-07-10  2:40   ` Yu Kuai
  (?)
@ 2022-07-20 11:45   ` Yu Kuai
  -1 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-20 11:45 UTC (permalink / raw)
  To: Yu Kuai, tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yi.zhang

On 2022/07/10 10:40, Yu Kuai wrote:
> Hi!
> 
> On 2022/07/01 17:34, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Resend v5 by a new mail address(huaweicloud.com) because old
>> address(huawei.com)has some problem that emails can end up in spam.
>> Please let me know if anyone still see this patchset end up in spam.
>>
>> Changes in v6:
>>   - rename parameter in patch 3
>>   - add comments and reviewed tag for patch 4
>> Changes in v5:
>>   - add comments in patch 4
>>   - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
>>   patch 4
>>   - and cleanup patches 5-8
>> Changes in v4:
>>   - add reviewed-by tag for patch 1
>>   - add patch 2,3
>>   - use a different way to fix io hung in patch 4
>> Changes in v3:
>>   - fix a check in patch 1
>>   - fix link err in patch 2 on 32-bit platform
>>   - handle overflow in patch 2
>> Changes in v2:
>>   - use a new solution suggested by Ming
>>   - change the title of patch 1
>>   - add patch 2
>>
>> Patch 1 fix that blk-throttle can't work if multiple bios are throttle,
>> Patch 2 fix overflow while calculating wait time
>> Patch 3,4 fix io hung due to configuration updates.
>> Patch 5-8 are cleanup patches, there are no functional changes, just
>> some places that I think can be optimized during code review.
>>
> Jens and Michal,
> 
> Can you receive this patchset normally(not end up in spam)?
> 
> If so, Tejun, can you take a look? This patchset do fix some problems in
> blk-throttle.

friendly ping ...
> 
> BTW, Michal and Ming, it'll be great if you can take a look at other
> patches as well.
> 
> Thansk,
> Kuai
>> Previous version:
>> v1: 
>> https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
>> v2: 
>> https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
>> v3: 
>> https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/
>> v4: 
>> https://lore.kernel.org/all/20220523082633.2324980-1-yukuai3@huawei.com/
>> v5: 
>> https://lore.kernel.org/all/20220528064330.3471000-1-yukuai3@huawei.com/
>>
>> Yu Kuai (8):
>>    blk-throttle: fix that io throttle can only work for single bio
>>    blk-throttle: prevent overflow while calculating wait time
>>    blk-throttle: factor out code to calculate ios/bytes_allowed
>>    blk-throttle: fix io hung due to config updates
>>    blk-throttle: use 'READ/WRITE' instead of '0/1'
>>    blk-throttle: calling throtl_dequeue/enqueue_tg in pairs
>>    blk-throttle: cleanup tg_update_disptime()
>>    blk-throttle: clean up flag 'THROTL_TG_PENDING'
>>
>>   block/blk-throttle.c | 168 +++++++++++++++++++++++++++++--------------
>>   block/blk-throttle.h |  16 +++--
>>   2 files changed, 128 insertions(+), 56 deletions(-)
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 0/8] bugfix and cleanup for blk-throttle
  2022-07-01  9:34 ` Yu Kuai
@ 2022-07-27 12:12   ` Yu Kuai
  -1 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-27 12:12 UTC (permalink / raw)
  To: Yu Kuai, tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yi.zhang, yukuai (C)

Hi, Tejun

Are you still interested in this patchset?

On 2022/07/01 17:34, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Resend v5 by a new mail address(huaweicloud.com) because old
> address(huawei.com)has some problem that emails can end up in spam.
> Please let me know if anyone still see this patchset end up in spam.
> 
> Changes in v6:
>   - rename parameter in patch 3
>   - add comments and reviewed tag for patch 4
> Changes in v5:
>   - add comments in patch 4
>   - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
>   patch 4
>   - and cleanup patches 5-8
> Changes in v4:
>   - add reviewed-by tag for patch 1
>   - add patch 2,3
>   - use a different way to fix io hung in patch 4
> Changes in v3:
>   - fix a check in patch 1
>   - fix link err in patch 2 on 32-bit platform
>   - handle overflow in patch 2
> Changes in v2:
>   - use a new solution suggested by Ming
>   - change the title of patch 1
>   - add patch 2
> 
> Patch 1 fix that blk-throttle can't work if multiple bios are throttle,
> Patch 2 fix overflow while calculating wait time
> Patch 3,4 fix io hung due to configuration updates.
> Patch 5-8 are cleanup patches, there are no functional changes, just
> some places that I think can be optimized during code review.
> 
> Previous version:
> v1: https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
> v2: https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
> v3: https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/
> v4: https://lore.kernel.org/all/20220523082633.2324980-1-yukuai3@huawei.com/
> v5: https://lore.kernel.org/all/20220528064330.3471000-1-yukuai3@huawei.com/
> 
> Yu Kuai (8):
>    blk-throttle: fix that io throttle can only work for single bio
>    blk-throttle: prevent overflow while calculating wait time
>    blk-throttle: factor out code to calculate ios/bytes_allowed
>    blk-throttle: fix io hung due to config updates
>    blk-throttle: use 'READ/WRITE' instead of '0/1'
>    blk-throttle: calling throtl_dequeue/enqueue_tg in pairs
>    blk-throttle: cleanup tg_update_disptime()
>    blk-throttle: clean up flag 'THROTL_TG_PENDING'
> 
>   block/blk-throttle.c | 168 +++++++++++++++++++++++++++++--------------
>   block/blk-throttle.h |  16 +++--
>   2 files changed, 128 insertions(+), 56 deletions(-)
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 1/8] blk-throttle: fix that io throttle can only work for single bio
@ 2022-07-27 18:27     ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2022-07-27 18:27 UTC (permalink / raw)
  To: Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yukuai3, yi.zhang

Sorry about the long delay.

So, the code looks nice but I have a difficult time following the logic.

On Fri, Jul 01, 2022 at 05:34:34PM +0800, Yu Kuai wrote:
> @@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
>  	unsigned int bio_size = throtl_bio_data_size(bio);
>  
>  	/* no need to throttle if this bio's bytes have been accounted */
> -	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
> +	if (bps_limit == U64_MAX) {
>  		if (wait)
>  			*wait = 0;
>  		return true;
> @@ -921,11 +921,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
>  	unsigned int bio_size = throtl_bio_data_size(bio);
>  
>  	/* Charge the bio to the group */
> -	if (!bio_flagged(bio, BIO_THROTTLED)) {
> -		tg->bytes_disp[rw] += bio_size;
> -		tg->last_bytes_disp[rw] += bio_size;
> -	}
> -
> +	tg->bytes_disp[rw] += bio_size;
> +	tg->last_bytes_disp[rw] += bio_size;
>  	tg->io_disp[rw]++;
>  	tg->last_io_disp[rw]++;

So, we're charging and controlling whether it has already been throttled or
not.

> @@ -2121,6 +2118,21 @@ bool __blk_throtl_bio(struct bio *bio)
>  			tg->last_low_overflow_time[rw] = jiffies;
>  		throtl_downgrade_check(tg);
>  		throtl_upgrade_check(tg);
> +
> +		/*
> +		 * re-entered bio has accounted bytes already, so try to
> +		 * compensate previous over-accounting. However, if new
> +		 * slice is started, just forget it.
> +		 */
> +		if (bio_flagged(bio, BIO_THROTTLED)) {
> +			unsigned int bio_size = throtl_bio_data_size(bio);
> +
> +			if (tg->bytes_disp[rw] >= bio_size)
> +				tg->bytes_disp[rw] -= bio_size;
> +			if (tg->last_bytes_disp[rw] >= bio_size)
> +				tg->last_bytes_disp[rw] -= bio_size;
> +		}

and trying to restore the overaccounting. However, it's not clear why this
helps with the problem you're describing. The comment should be clearly
spelling out why it's done this way and how this works.

Also, blk_throttl_bio() doesn't call into __blk_throtl_bio() at all if
THROTTLED is set and HAS_IOPS_LIMIT is not, so if there are only bw limits,
we end up accounting these IOs twice?
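
The check in question is the fast path in blk-throttle.h; roughly, and
paraphrased rather than copied verbatim:

	static inline bool blk_throtl_bio(struct bio *bio)
	{
		struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);

		/* split bios bypass __blk_throtl_bio() when only bps limits exist */
		if (bio_flagged(bio, BIO_THROTTLED) &&
		    !(tg->flags & THROTL_TG_HAS_IOPS_LIMIT))
			return false;
		...
		return __blk_throtl_bio(bio);
	}

so a re-entered bio with only bps limits configured never reaches the
compensation hunk above.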

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 2/8] blk-throttle: prevent overflow while calculating wait time
@ 2022-07-27 18:28     ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2022-07-27 18:28 UTC (permalink / raw)
  To: Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yukuai3, yi.zhang

On Fri, Jul 01, 2022 at 05:34:35PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> In tg_with_in_bps_limit(), 'bps_limit * jiffy_elapsed_rnd' might
> overflow. FIx the problem by calling mul_u64_u64_div_u64() instead.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Acked-by: Tejun Heo <tj@kernel.org>

BTW, have you observed this happening or is it from just reviewing the code?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 4/8] blk-throttle: fix io hung due to config updates
  2022-07-01  9:34   ` Yu Kuai
  (?)
@ 2022-07-27 18:39   ` Tejun Heo
  2022-07-28  9:33     ` Michal Koutný
  -1 siblings, 1 reply; 44+ messages in thread
From: Tejun Heo @ 2022-07-27 18:39 UTC (permalink / raw)
  To: Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yukuai3, yi.zhang

Hello,

On Fri, Jul 01, 2022 at 05:34:37PM +0800, Yu Kuai wrote:
> +static void __tg_update_skipped(struct throtl_grp *tg, bool rw)
> +{
> +	unsigned long jiffy_elapsed = jiffies - tg->slice_start[rw];
> +	u64 bps_limit = tg_bps_limit(tg, rw);
> +	u32 iops_limit = tg_iops_limit(tg, rw);
> +
> +	/*
> +	 * Following calculation won't overflow as long as bios that are
> +	 * dispatched later won't preempt already throttled bios. Even if such
> +	 * overflow do happen, there should be no problem because we are using
> +	 * unsigned here, and bytes_skipped/io_skipped will be updated
> +	 * correctly.
> +	 */
> +	if (bps_limit != U64_MAX)
> +		tg->bytes_skipped[rw] +=
> +			calculate_bytes_allowed(bps_limit, jiffy_elapsed) -
> +			tg->bytes_disp[rw];
> +	if (iops_limit != UINT_MAX)
> +		tg->io_skipped[rw] +=
> +			calculate_io_allowed(iops_limit, jiffy_elapsed) -
> +			tg->io_disp[rw];

I'm not quite sure this is correct. What if the limit keeps changing across
different values? Then we'd be calculating the skipped amount based on the
last configuration only, which would be incorrect.

It's probably more straightforward if the code keeps track of the total
budget allowed in the period somewhere and keeps adding to it whenever it
wants to calculate the current budget - sth like:

  tg->bytes_budget[rw] += calculate_bytes_allowed(limit, jiffies - tg->last_budget_at);
  tg->last_budget_at = jiffies;

then, you'd always know the correct budget.
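
Spelled out, a minimal sketch of that bookkeeping could be a helper like the
one below (illustration only: bytes_budget[] and last_budget_at[] are
hypothetical fields that do not exist in struct throtl_grp, and
calculate_bytes_allowed() is the helper factored out by patch 3):

	static void tg_update_budget(struct throtl_grp *tg, bool rw)
	{
		u64 bps_limit = tg_bps_limit(tg, rw);

		/* fold budget earned since the last update into a running total */
		if (bps_limit != U64_MAX)
			tg->bytes_budget[rw] +=
				calculate_bytes_allowed(bps_limit,
						jiffies - tg->last_budget_at[rw]);
		tg->last_budget_at[rw] = jiffies;
	}

A config update would then only need to call this helper before switching
limits, and the dispatch check could compare bytes_disp[rw] against
bytes_budget[rw] directly.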

> +}
> +
> +static void tg_update_skipped(struct throtl_grp *tg)
> +{
> +	if (tg->service_queue.nr_queued[READ])
> +		__tg_update_skipped(tg, READ);
> +	if (tg->service_queue.nr_queued[WRITE])
> +		__tg_update_skipped(tg, WRITE);
> +
> +	throtl_log(&tg->service_queue, "%s: %llu %llu %u %u\n", __func__,
> +		   tg->bytes_skipped[READ], tg->bytes_skipped[WRITE],
> +		   tg->io_skipped[READ], tg->io_skipped[WRITE]);
> +}

Also, please add a comment explaining what this is all about. What is the
code trying to achieve, why and how?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 5/8] blk-throttle: use 'READ/WRITE' instead of '0/1'
  2022-07-01  9:34 ` [PATCH RESEND v6 5/8] blk-throttle: use 'READ/WRITE' instead of '0/1' Yu Kuai
@ 2022-07-27 18:39   ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2022-07-27 18:39 UTC (permalink / raw)
  To: Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yukuai3, yi.zhang

On Fri, Jul 01, 2022 at 05:34:38PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Make the code easier to read, like everywhere else.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 6/8] blk-throttle: calling throtl_dequeue/enqueue_tg in pairs
  2022-07-01  9:34 ` [PATCH RESEND v6 6/8] blk-throttle: calling throtl_dequeue/enqueue_tg in pairs Yu Kuai
@ 2022-07-27 18:40   ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2022-07-27 18:40 UTC (permalink / raw)
  To: Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yukuai3, yi.zhang

On Fri, Jul 01, 2022 at 05:34:39PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> It's a litter weird to call throtl_dequeue_tg() unconditionally in
         ^
         little

> throtl_select_dispatch(), since it will be called in
> tg_update_disptime() again if some bio is still throttled.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Maybe note that this doesn't create any functional differences in the
description?

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 7/8] blk-throttle: cleanup tg_update_disptime()
@ 2022-07-27 18:42     ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2022-07-27 18:42 UTC (permalink / raw)
  To: Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yukuai3, yi.zhang

On Fri, Jul 01, 2022 at 05:34:40PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> th_update_disptime() only need to adjust postion for 'tg' in
  ^                         ^
  tg                        needs

> 'parent_sq', there is no need to call throtl_enqueue/dequeue_tg().

What are we gaining / losing by changing this? Why is this better?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 8/8] blk-throttle: clean up flag 'THROTL_TG_PENDING'
@ 2022-07-27 18:44     ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2022-07-27 18:44 UTC (permalink / raw)
  To: Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yukuai3, yi.zhang

On Fri, Jul 01, 2022 at 05:34:41PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> All related operations are inside 'queue_lock', there is no need to use
> the flag, we only need to make sure throtl_enqueue_tg() is called when
> the first bio is throttled, and throtl_dequeue_tg() is called when the
> last throttled bio is dispatched.

Ah, okay, so the prev patch was to enable this cleanup. Can you please note
so in the previous patch and also that this doesn't cause any functional
changes?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 4/8] blk-throttle: fix io hung due to config updates
  2022-07-27 18:39   ` Tejun Heo
@ 2022-07-28  9:33     ` Michal Koutný
  2022-07-28 10:34         ` Yu Kuai
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Koutný @ 2022-07-28  9:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yu Kuai, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yukuai3, yi.zhang

On Wed, Jul 27, 2022 at 08:39:19AM -1000, Tejun Heo <tj@kernel.org> wrote:
> I'm not quite sure this is correct. What if the limit keeps changing across
> different values? Then we'd be calculating the skipped amount based on the
> last configuration only, which would be incorrect.

When one change of configuration is correct, then all changes must be
correct by induction. It's sufficient to take into account only the one
old config and the new one.

This __tg_update_skipped() calculates bytes_skipped with the limit
before the change and bytes_skipped are used (divided by) the new limit
in tg_with_in_bps_limit().
The accumulation of bytes_skipped across multiple changes (until slice
properly ends) is proportional to how bytes_allowed would grow over
time.
That's why I find this correct (I admit I had to look back into my
notes when this was first discussed).
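
A worked illustration with invented numbers: a slice starts at t=0 with a
1M/s bps limit and a 2M bio is throttled, so it has to wait until t=2s. If
the limit is changed to 2M/s at t=1s, __tg_update_skipped() records
bytes_skipped += 1M/s * 1s - 0 = 1M before tg_conf_updated() restarts the
slice, so the budget afterwards is 2M/s * elapsed + 1M and the bio is
dispatched at t=1.5s instead of having its wait restarted from scratch. A
further change at t=1.2s would add 2M/s * 0.2s = 0.4M on top, which is the
accumulation described above.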

HTH,
Michal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 2/8] blk-throttle: prevent overflow while calculating wait time
@ 2022-07-28 10:23       ` Yu Kuai
  0 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-28 10:23 UTC (permalink / raw)
  To: Tejun Heo, Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yi.zhang, yukuai (C)

On 2022/07/28 2:28, Tejun Heo wrote:
> On Fri, Jul 01, 2022 at 05:34:35PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> In tg_with_in_bps_limit(), 'bps_limit * jiffy_elapsed_rnd' might
>> overflow. FIx the problem by calling mul_u64_u64_div_u64() instead.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> 
> Acked-by: Tejun Heo <tj@kernel.org>
> 
> BTW, have you observed this happening or is it from just reviewing the code?

It's just from code review.
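
For reference, a simplified before/after sketch of the calculation in
tg_with_in_bps_limit() (not the exact patch hunk):

	u64 bytes_allowed;

	/* before: 64-bit product can wrap for huge limits and long elapsed times */
	bytes_allowed = bps_limit * jiffy_elapsed_rnd;
	do_div(bytes_allowed, HZ);

	/* after: mul_u64_u64_div_u64() avoids the 64-bit overflow */
	bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
					    (u64)HZ);

Triggering the overflow needs a very large configured bps limit combined
with a long elapsed time, so it is unlikely to be hit in practice.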

Thanks.
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 4/8] blk-throttle: fix io hung due to config updates
@ 2022-07-28 10:34         ` Yu Kuai
  0 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-28 10:34 UTC (permalink / raw)
  To: Michal Koutný, Tejun Heo
  Cc: Yu Kuai, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yi.zhang, yukuai (C)

Hi

On 2022/07/28 17:33, Michal Koutný wrote:
> On Wed, Jul 27, 2022 at 08:39:19AM -1000, Tejun Heo <tj@kernel.org> wrote:
>> I'm not quite sure this is correct. What if the limit keeps changing across
>> different values? Then we'd be calculating the skipped amount based on the
>> last configuration only, which would be incorrect.
> 
> When one change of configuration is correct, then all changes must be
> correct by induction. It's sufficient to take into account only the one
> old config and the new one.
> 
> This __tg_update_skipped() calculates bytes_skipped with the limit
> before the change and bytes_skipped are used (divided by) the new limit
> in tg_with_in_bps_limit().
> The accumulation of bytes_skipped across multiple changes (until slice
> properly ends) is proportional to how bytes_allowed would grow over
> time.
> That's why I find this correct (I admit I had to look back into my
> notes when this was first discussed).
> 
> HTH,
> Michal
> 

Hi, Tejun

Michal already explained it very well, please let me know if you still
think there are better ways.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 8/8] blk-throttle: clean up flag 'THROTL_TG_PENDING'
  2022-07-27 18:44     ` Tejun Heo
@ 2022-07-28 11:03       ` Yu Kuai
  -1 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-28 11:03 UTC (permalink / raw)
  To: Tejun Heo, Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yi.zhang, yukuai (C)



On 2022/07/28 2:44, Tejun Heo wrote:
> On Fri, Jul 01, 2022 at 05:34:41PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> All related operations are inside 'queue_lock', there is no need to use
>> the flag, we only need to make sure throtl_enqueue_tg() is called when
>> the first bio is throttled, and throtl_dequeue_tg() is called when the
>> last throttled bio is dispatched.
> 
> Ah, okay, so the prev patch was to enable this cleanup. Can you please note
> so in the previous patch and also that this doesn't cause any functional
> changes?
> 

Of course, I'll do that in the next iteration.

Thanks,
Kuai
> Thanks.
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 4/8] blk-throttle: fix io hung due to config updates
  2022-07-28 10:34         ` Yu Kuai
  (?)
@ 2022-07-28 16:55         ` Tejun Heo
  -1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2022-07-28 16:55 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Michal Koutný,
	axboe, ming.lei, cgroups, linux-block, linux-kernel, yi.zhang,
	yukuai (C)

On Thu, Jul 28, 2022 at 06:34:44PM +0800, Yu Kuai wrote:
> Hi
> 
> On 2022/07/28 17:33, Michal Koutný wrote:
> > On Wed, Jul 27, 2022 at 08:39:19AM -1000, Tejun Heo <tj@kernel.org> wrote:
> > > I'm not quite sure this is correct. What if the limit keeps changing across
> > > different values? Then we'd be calculating the skipped amount based on the
> > > last configuration only, which would be incorrect.
> > 
> > When one change of configuration is correct, then all changes must be
> > correct by induction. It's sufficient to take into account only the one
> > old config and the new one.
> > 
> > This __tg_update_skipped() calculates bytes_skipped with the limit
> > before the change and bytes_skipped are used (divided by) the new limit
> > in tg_with_in_bps_limit().
> > The accumulation of bytes_skipped across multiple changes (until slice
> > properly ends) is proportional to how bytes_allowed would grow over
> > time.
> > That's why I find this correct (I admit I had to look back into my
> > notes when this was first discussed).
> > 
> > HTH,
> > Michal
> > 
> 
> Hi, Tejun
> 
> Michal already explained it very well, please let me know if you still
> think there are better ways.

Ah, I see, so it's integrating into the skipped counters across multiple
updates. I think it can definitely use comments explaining how it's working
but that looks okay.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 1/8] blk-throttle: fix that io throttle can only work for single bio
@ 2022-07-29  6:32       ` Yu Kuai
  0 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-29  6:32 UTC (permalink / raw)
  To: Tejun Heo, Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yi.zhang, yukuai (C)

Hi, Tejun!

On 2022/07/28 2:27, Tejun Heo wrote:
> Sorry about the long delay.
> 
> So, the code looks nice but I have a difficult time following the logic.
> 
> On Fri, Jul 01, 2022 at 05:34:34PM +0800, Yu Kuai wrote:
>> @@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
>>   	unsigned int bio_size = throtl_bio_data_size(bio);
>>   
>>   	/* no need to throttle if this bio's bytes have been accounted */
>> -	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
>> +	if (bps_limit == U64_MAX) {
>>   		if (wait)
>>   			*wait = 0;
>>   		return true;
>> @@ -921,11 +921,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
>>   	unsigned int bio_size = throtl_bio_data_size(bio);
>>   
>>   	/* Charge the bio to the group */
>> -	if (!bio_flagged(bio, BIO_THROTTLED)) {
>> -		tg->bytes_disp[rw] += bio_size;
>> -		tg->last_bytes_disp[rw] += bio_size;
>> -	}
>> -
>> +	tg->bytes_disp[rw] += bio_size;
>> +	tg->last_bytes_disp[rw] += bio_size;
>>   	tg->io_disp[rw]++;
>>   	tg->last_io_disp[rw]++;
> 
> So, we're charging and controlling whether it has already been throttled or
> not.
> 
>> @@ -2121,6 +2118,21 @@ bool __blk_throtl_bio(struct bio *bio)
>>   			tg->last_low_overflow_time[rw] = jiffies;
>>   		throtl_downgrade_check(tg);
>>   		throtl_upgrade_check(tg);
>> +
>> +		/*
>> +		 * re-entered bio has accounted bytes already, so try to
>> +		 * compensate previous over-accounting. However, if new
>> +		 * slice is started, just forget it.
>> +		 */
>> +		if (bio_flagged(bio, BIO_THROTTLED)) {
>> +			unsigned int bio_size = throtl_bio_data_size(bio);
>> +
>> +			if (tg->bytes_disp[rw] >= bio_size)
>> +				tg->bytes_disp[rw] -= bio_size;
>> +			if (tg->last_bytes_disp[rw] >= bio_size)
>> +				tg->last_bytes_disp[rw] -= bio_size;
>> +		}
> 
> and trying to restore the overaccounting. However, it's not clear why this
> helps with the problem you're describing. The comment should be clearly
> spelling out why it's done this way and how this works.
> 
> Also, blk_throttl_bio() doesn't call into __blk_throtl_bio() at all if
> THROTTLED is set and HAS_IOPS_LIMIT is not, so if there are only bw limits,
> we end up accounting these IOs twice?
> 

We need to make sure the following conditions always hold:

1) If a bio is split, iops limits should count multiple times, while
bps limits should only count once.
2) If a bio is issued while some bios are already throttled, bps limits
should not be ignored.

commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
fixes the case where 1) does not hold, but it breaks 2). Root cause is that
such a bio will be flagged in __blk_throtl_bio(), and later
tg_with_in_bps_limit() will skip the flagged bio.

In order to fix this problem, I first change it so that a flagged bio won't
be skipped in tg_with_in_bps_limit():

-	if (!bio_flagged(bio, BIO_THROTTLED)) {
-		tg->bytes_disp[rw] += bio_size;
-		tg->last_bytes_disp[rw] += bio_size;
-	}
-
+	tg->bytes_disp[rw] += bio_size;
+	tg->last_bytes_disp[rw] += bio_size;

However, this will break the rule that bps limits should only count once.
Thus I try to restore the overaccounting in __blk_throtl_bio() in that case:

+		if (bio_flagged(bio, BIO_THROTTLED)) {
+			unsigned int bio_size = throtl_bio_data_size(bio);
+
+			if (tg->bytes_disp[rw] >= bio_size)
+				tg->bytes_disp[rw] -= bio_size;
+			if (tg->last_bytes_disp[rw] >= bio_size)
+				tg->last_bytes_disp[rw] -= bio_size;
+		}

If a new slice is not started, then the decrement should make sure this
bio won't be counted again. However, if a new slice is started and the
condition 'bytes_disp >= bio_size' doesn't hold, this bio will end up
being accounted twice.
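
As a concrete illustration (sizes invented): a 1M bio is charged 1M and
flagged, and later a 512K split of it re-enters __blk_throtl_bio(). The
hunk above first subtracts 512K from bytes_disp, throtl_charge_bio() then
adds 512K back, so the group is still charged 1M in total. But if a new
slice started in between, bytes_disp can already be smaller than 512K, the
subtraction is skipped, and that 512K ends up being charged on top of the
original 1M.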

Please let me know if you think this situation is problematic, and I'll try
to figure out a new way...

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 1/8] blk-throttle: fix that io throttle can only work for single bio
@ 2022-07-29 18:04         ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2022-07-29 18:04 UTC (permalink / raw)
  To: Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yi.zhang, yukuai (C)

Hello,

On Fri, Jul 29, 2022 at 02:32:36PM +0800, Yu Kuai wrote:
> We need to make sure the following conditions always hold:
> 
> 1) If a bio is split, iops limits should count multiple times, while
> bps limits should only count once.
> 2) If a bio is issued while some bios are already throttled, bps limits
> should not be ignored.
> 
> commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
> fixes the case where 1) does not hold, but it breaks 2). Root cause is that
> such a bio will be flagged in __blk_throtl_bio(), and later
> tg_with_in_bps_limit() will skip the flagged bio.
> 
> In order to fix this problem, I first change it so that a flagged bio won't
> be skipped in tg_with_in_bps_limit():
> 
> -	if (!bio_flagged(bio, BIO_THROTTLED)) {
> -		tg->bytes_disp[rw] += bio_size;
> -		tg->last_bytes_disp[rw] += bio_size;
> -	}
> -
> +	tg->bytes_disp[rw] += bio_size;
> +	tg->last_bytes_disp[rw] += bio_size;
> 
> However, this will break the rule that bps limits should only count once.
> Thus I try to restore the overaccounting in __blk_throtl_bio() in that case:
> 
> +		if (bio_flagged(bio, BIO_THROTTLED)) {
> +			unsigned int bio_size = throtl_bio_data_size(bio);
> +
> +			if (tg->bytes_disp[rw] >= bio_size)
> +				tg->bytes_disp[rw] -= bio_size;
> +			if (tg->last_bytes_disp[rw] >= bio_size)
> +				tg->last_bytes_disp[rw] -= bio_size;
> +		}
> 
> If a new slice is not started, then the decrement should make sure this
> bio won't be counted again. However, if a new slice is started and the
> condition 'bytes_disp >= bio_size' doesn't hold, this bio will end up
> being accounted twice.
> 
> Please let me know if you think this situation is problematic, and I'll try
> to figure out a new way...

While a bit tricky, I think it's fine but please add comments in the code
explaining what's going on and why. Also, can you please explain why
__blk_throtl_bio() being skipped when iops limit is not set doesn't skew the
result?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RESEND v6 1/8] blk-throttle: fix that io throttle can only work for single bio
@ 2022-07-30  1:06           ` Yu Kuai
  0 siblings, 0 replies; 44+ messages in thread
From: Yu Kuai @ 2022-07-30  1:06 UTC (permalink / raw)
  To: Tejun Heo, Yu Kuai
  Cc: mkoutny, axboe, ming.lei, cgroups, linux-block, linux-kernel,
	yi.zhang, yukuai (C)



On 2022/07/30 2:04, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jul 29, 2022 at 02:32:36PM +0800, Yu Kuai wrote:
>> We need to make sure the following conditions always hold:
>>
>> 1) If a bio is split, iops limits should count multiple times, while
>> bps limits should only count once.
>> 2) If a bio is issued while some bios are already throttled, bps limits
>> should not be ignored.
>>
>> commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
>> fixes the case where 1) does not hold, but it breaks 2). Root cause is that
>> such a bio will be flagged in __blk_throtl_bio(), and later
>> tg_with_in_bps_limit() will skip the flagged bio.
>>
>> In order to fix this problem, I first change it so that a flagged bio won't
>> be skipped in tg_with_in_bps_limit():
>>
>> -	if (!bio_flagged(bio, BIO_THROTTLED)) {
>> -		tg->bytes_disp[rw] += bio_size;
>> -		tg->last_bytes_disp[rw] += bio_size;
>> -	}
>> -
>> +	tg->bytes_disp[rw] += bio_size;
>> +	tg->last_bytes_disp[rw] += bio_size;
>>
>> However, this will break the rule that bps limits should only count once.
>> Thus I try to restore the overaccounting in __blk_throtl_bio() in that case:
>>
>> +		if (bio_flagged(bio, BIO_THROTTLED)) {
>> +			unsigned int bio_size = throtl_bio_data_size(bio);
>> +
>> +			if (tg->bytes_disp[rw] >= bio_size)
>> +				tg->bytes_disp[rw] -= bio_size;
>> +			if (tg->last_bytes_disp[rw] >= bio_size)
>> +				tg->last_bytes_disp[rw] -= bio_size;
>> +		}
>>
>> If a new slice is not started, then the decrement should make sure this
>> bio won't be counted again. However, if a new slice is started and the
>> condition 'bytes_disp >= bio_size' doesn't hold, this bio will end up
>> being accounted twice.
>>
>> Please let me know if you think this situation is problematic, and I'll try
>> to figure out a new way...
> 
> While a bit tricky, I think it's fine but please add comments in the code
> explaining what's going on and why. Also, can you please explain why
> __blk_throtl_bio() being skipped when iops limit is not set doesn't skew the
> result?

Because the bps limit is already counted the first time __blk_throtl_bio()
is called for the original bio. When the split bio is re-entered, we only
need to throttle it again if an iops limit is set.

By the way, I found that this way is better after patch 4:

in __blk_throtl_bio():

if (bio_flagged(bio, BIO_THROTTLED)) {
	tg->bytes_skipped[rw] += bio_size;
	if (tg->last_bytes_disp[rw] >= bio_size)
		tg->last_bytes_disp[rw] -= bio_size;
}

The overaccounting can be restored even if a new slice is started.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 44+ messages in thread
