* [PATCH -next v4 0/4] bugfix for blk-throttle
From: Yu Kuai @ 2022-05-23  8:26 UTC
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

Changes in v4:
 - add reviewed-by tag for patch 1
 - add patch 2,3
 - use a different way to fix io hung in patch 4
Changes in v3:
 - fix a check in patch 1
 - fix link err in patch 2 on 32-bit platform
 - handle overflow in patch 2
Changes in v2:
 - use a new solution suggested by Ming
 - change the title of patch 1
 - add patch 2

Patch 1 fixes that blk-throttle can't work if multiple bios are throttled.
Patch 2 fixes an overflow while calculating the wait time.
Patches 3 and 4 fix io hangs caused by configuration updates.

Previous version:
v1: https://lore.kernel.org/all/20220517134909.2910251-1-yukuai3@huawei.com/
v2: https://lore.kernel.org/all/20220518072751.1188163-1-yukuai3@huawei.com/
v3: https://lore.kernel.org/all/20220519085811.879097-1-yukuai3@huawei.com/

Yu Kuai (4):
  blk-throttle: fix that io throttle can only work for single bio
  blk-throttle: prevent overflow while calculating wait time
  blk-throttle: factor out code to calculate ios/bytes_allowed
  blk-throttle: fix io hung due to config updates

 block/blk-throttle.c | 121 ++++++++++++++++++++++++++++++++-----------
 block/blk-throttle.h |   4 ++
 2 files changed, 94 insertions(+), 31 deletions(-)

-- 
2.31.1


* [PATCH -next v4 1/4] blk-throttle: fix that io throttle can only work for single bio
From: Yu Kuai @ 2022-05-23  8:26 UTC
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
introduced a new problem, for example:

[root@localhost ~]# echo "8:0 1024" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
[root@localhost ~]# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
[root@localhost ~]# dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
[1] 620
[root@localhost ~]# dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
[2] 626
[root@localhost ~]# 1+0 records in
1+0 records out
10240 bytes (10 kB, 10 KiB) copied, 10.0038 s, 1.0 kB/s1+0 records in
1+0 records out

10240 bytes (10 kB, 10 KiB) copied, 9.23076 s, 1.1 kB/s
-> the second bio finishes after 10s instead of the expected 20s (at
1024 bytes/s each 10 KiB bio needs about 10s, and throttled bios are
dispatched FIFO).

This happens because if some bios are already queued, the current bio
is queued directly and the flag 'BIO_THROTTLED' is set. Later, when the
former bios are dispatched, this bio is dispatched without any waiting,
because tg_with_in_bps_limit() returns a zero wait time for a flagged
bio.

To fix the problem, don't skip flagged bios in tg_with_in_bps_limit();
and since a split bio can then be accounted twice, compensate for the
over-accounting in __blk_throtl_bio().
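
For reference, the pre-patch fast path in tg_with_in_bps_limit() that
waived the wait (taken from the hunk below, not new code):

	/* pre-patch: an already-accounted bio was treated as within limit */
	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
		if (wait)
			*wait = 0;	/* so a queued flagged bio dispatches at once */
		return true;
	}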

Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-throttle.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 447e1b8722f7..0c37be08ff28 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* no need to throttle if this bio's bytes have been accounted */
-	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
+	if (bps_limit == U64_MAX) {
 		if (wait)
 			*wait = 0;
 		return true;
@@ -921,11 +921,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	if (!bio_flagged(bio, BIO_THROTTLED)) {
-		tg->bytes_disp[rw] += bio_size;
-		tg->last_bytes_disp[rw] += bio_size;
-	}
-
+	tg->bytes_disp[rw] += bio_size;
+	tg->last_bytes_disp[rw] += bio_size;
 	tg->io_disp[rw]++;
 	tg->last_io_disp[rw]++;
 
@@ -2121,6 +2118,21 @@ bool __blk_throtl_bio(struct bio *bio)
 			tg->last_low_overflow_time[rw] = jiffies;
 		throtl_downgrade_check(tg);
 		throtl_upgrade_check(tg);
+
+		/*
+		 * re-entered bio has accounted bytes already, so try to
+		 * compensate previous over-accounting. However, if new
+		 * slice is started, just forget it.
+		 */
+		if (bio_flagged(bio, BIO_THROTTLED)) {
+			unsigned int bio_size = throtl_bio_data_size(bio);
+
+			if (tg->bytes_disp[rw] >= bio_size)
+				tg->bytes_disp[rw] -= bio_size;
+			if (tg->last_bytes_disp[rw] >= bio_size)
+				tg->last_bytes_disp[rw] -= bio_size;
+		}
+
 		/* throtl is FIFO - if bios are already queued, should queue */
 		if (sq->nr_queued[rw])
 			break;
-- 
2.31.1


* [PATCH -next v4 2/4] blk-throttle: prevent overflow while calculating wait time
From: Yu Kuai @ 2022-05-23  8:26 UTC
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

In tg_with_in_bps_limit(), 'bps_limit * jiffy_elapsed_rnd' might
overflow. Handle the case by calling mul_u64_u64_div_u64() instead,
which keeps the intermediate product in 128 bits.
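
To illustrate the failure mode, a minimal sketch (the limit value is
hypothetical, and HZ is assumed to be 1000):

	u64 bps_limit = 1ULL << 60;		/* huge, but configurable */
	unsigned long jiffy_elapsed_rnd = 100;	/* 100ms worth of jiffies */

	/* the 64-bit product wraps before the divide, giving a garbage result */
	u64 wrong = bps_limit * jiffy_elapsed_rnd / HZ;

	/* mul_u64_u64_div_u64() computes (a * b) / c with a 128-bit product */
	u64 right = mul_u64_u64_div_u64(bps_limit, jiffy_elapsed_rnd, HZ);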

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 0c37be08ff28..7e0c31e920dd 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -806,7 +806,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 				 u64 bps_limit, unsigned long *wait)
 {
 	bool rw = bio_data_dir(bio);
-	u64 bytes_allowed, extra_bytes, tmp;
+	u64 bytes_allowed, extra_bytes;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
@@ -824,10 +824,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-
-	tmp = bps_limit * jiffy_elapsed_rnd;
-	do_div(tmp, HZ);
-	bytes_allowed = tmp;
+	bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
+					    (u64)HZ);
 
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
-- 
2.31.1


* [PATCH -next v4 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed
From: Yu Kuai @ 2022-05-23  8:26 UTC
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

No functional changes; the new APIs will be used in later patches to
handle throttled bios while the config is being updated.
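
For reference, the signatures of the extracted helpers (exactly as
introduced by the diff below):

	static unsigned int calculate_io_allowed(u32 iops_limit,
						 unsigned long jiffy_elapsed_rnd);
	static u64 calculate_bytes_allowed(u64 bps_limit,
					   unsigned long jiffy_elapsed_rnd);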

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 48 +++++++++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7e0c31e920dd..ded0d30ef49e 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -754,25 +754,12 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 		   tg->slice_start[rw], tg->slice_end[rw], jiffies);
 }
 
-static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
-				  u32 iops_limit, unsigned long *wait)
+static unsigned int calculate_io_allowed(u32 iops_limit,
+					 unsigned long jiffy_elapsed_rnd)
 {
-	bool rw = bio_data_dir(bio);
 	unsigned int io_allowed;
-	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	u64 tmp;
 
-	if (iops_limit == UINT_MAX) {
-		if (wait)
-			*wait = 0;
-		return true;
-	}
-
-	jiffy_elapsed = jiffies - tg->slice_start[rw];
-
-	/* Round up to the next throttle slice, wait time must be nonzero */
-	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
-
 	/*
 	 * jiffy_elapsed_rnd should not be a big value as minimum iops can be
 	 * 1 then at max jiffy elapsed should be equivalent of 1 second as we
@@ -788,6 +775,33 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	else
 		io_allowed = tmp;
 
+	return io_allowed;
+}
+
+static u64 calculate_bytes_allowed(u64 bps_limit,
+				   unsigned long jiffy_elapsed_rnd)
+{
+	return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd, (u64)HZ);
+}
+
+static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
+				  u32 iops_limit, unsigned long *wait)
+{
+	bool rw = bio_data_dir(bio);
+	unsigned int io_allowed;
+	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
+
+	if (iops_limit == UINT_MAX) {
+		if (wait)
+			*wait = 0;
+		return true;
+	}
+
+	jiffy_elapsed = jiffies - tg->slice_start[rw];
+
+	/* Round up to the next throttle slice, wait time must be nonzero */
+	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
+	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
 	if (tg->io_disp[rw] + 1 <= io_allowed) {
 		if (wait)
 			*wait = 0;
@@ -824,9 +838,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-	bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
-					    (u64)HZ);
-
+	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
-- 
2.31.1


* [PATCH -next v4 4/4] blk-throttle: fix io hung due to config updates
From: Yu Kuai @ 2022-05-23  8:26 UTC
  To: tj, mkoutny, axboe, ming.lei
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

If a new configuration is submitted while a bio is throttled, the
waiting time is recalculated from scratch, ignoring that the bio may
already have waited for some time:

tg_conf_updated
 throtl_start_new_slice
  tg_update_disptime
  throtl_schedule_next_dispatch

Thus an io hang can be triggered by repeatedly submitting a new
configuration before the throttled bio is dispatched.

Fix the problem by respecting the time that the throttled bio has
already waited. To do that, add new fields to record how many bytes/ios
were already waited for, and use them to calculate the wait time for
the throttled bio under the new configuration.
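
The mechanism, condensed into a minimal sketch (names follow the diff
below; old_bps_limit/new_bps_limit are illustrative labels for the
limit before and after the update, and the equivalent iops accounting
is elided):

	/* on config update: credit what the queued bio was already allowed */
	tg->bytes_skipped[rw] +=
		calculate_bytes_allowed(old_bps_limit, jiffy_elapsed) -
		tg->bytes_disp[rw];

	/* when re-evaluating the wait: count that credit as already allowed */
	bytes_allowed = calculate_bytes_allowed(new_bps_limit, jiffy_elapsed_rnd) +
			tg->bytes_skipped[rw];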

Some simple test:
1)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 2048" > blkio.throttle.write_bps_device
{
        sleep 3
        echo "8:0 1024" > blkio.throttle.write_bps_device
} &
sleep 1
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

2)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 1024" > blkio.throttle.write_bps_device
{
        sleep 5
        echo "8:0 2048" > blkio.throttle.write_bps_device
} &
sleep 1
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

test results: io finish time
	before this patch	with this patch
1)	10s			6s
2)	8s			6s

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 49 ++++++++++++++++++++++++++++++++++++++------
 block/blk-throttle.h |  4 ++++
 2 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index ded0d30ef49e..612bd221783c 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -656,12 +656,17 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 		   tg->slice_end[rw], jiffies);
 }
 
-static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
+static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw,
+					  bool clear_skipped)
 {
 	tg->bytes_disp[rw] = 0;
 	tg->io_disp[rw] = 0;
 	tg->slice_start[rw] = jiffies;
 	tg->slice_end[rw] = jiffies + tg->td->throtl_slice;
+	if (clear_skipped) {
+		tg->bytes_skipped[rw] = 0;
+		tg->io_skipped[rw] = 0;
+	}
 
 	throtl_log(&tg->service_queue,
 		   "[%c] new slice start=%lu end=%lu jiffies=%lu",
@@ -784,6 +789,34 @@ static u64 calculate_bytes_allowed(u64 bps_limit,
 	return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd, (u64)HZ);
 }
 
+static void __tg_update_skipped(struct throtl_grp *tg, bool rw)
+{
+	unsigned long jiffy_elapsed = jiffies - tg->slice_start[rw];
+	u64 bps_limit = tg_bps_limit(tg, rw);
+	u32 iops_limit = tg_iops_limit(tg, rw);
+
+	if (bps_limit != U64_MAX)
+		tg->bytes_skipped[rw] +=
+			calculate_bytes_allowed(bps_limit, jiffy_elapsed) -
+			tg->bytes_disp[rw];
+	if (iops_limit != UINT_MAX)
+		tg->io_skipped[rw] +=
+			calculate_io_allowed(iops_limit, jiffy_elapsed) -
+			tg->io_disp[rw];
+}
+
+static void tg_update_skipped(struct throtl_grp *tg)
+{
+	if (tg->service_queue.nr_queued[READ])
+		__tg_update_skipped(tg, READ);
+	if (tg->service_queue.nr_queued[WRITE])
+		__tg_update_skipped(tg, WRITE);
+
+	throtl_log(&tg->service_queue, "%s: %llu %llu %u %u\n", __func__,
+		   tg->bytes_skipped[READ], tg->bytes_skipped[WRITE],
+		   tg->io_skipped[READ], tg->io_skipped[WRITE]);
+}
+
 static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 				  u32 iops_limit, unsigned long *wait)
 {
@@ -801,7 +834,8 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* Round up to the next throttle slice, wait time must be nonzero */
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
-	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
+	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd) +
+		     tg->io_skipped[rw];
 	if (tg->io_disp[rw] + 1 <= io_allowed) {
 		if (wait)
 			*wait = 0;
@@ -838,7 +872,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 		jiffy_elapsed_rnd = tg->td->throtl_slice;
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
+	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd) +
+			tg->bytes_skipped[rw];
 	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
@@ -899,7 +934,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	 * slice and it should be extended instead.
 	 */
 	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
-		throtl_start_new_slice(tg, rw);
+		throtl_start_new_slice(tg, rw, true);
 	else {
 		if (time_before(tg->slice_end[rw],
 		    jiffies + tg->td->throtl_slice))
@@ -1328,8 +1363,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
 	 * that a group's limit are dropped suddenly and we don't want to
 	 * account recently dispatched IO with new low rate.
 	 */
-	throtl_start_new_slice(tg, READ);
-	throtl_start_new_slice(tg, WRITE);
+	throtl_start_new_slice(tg, READ, false);
+	throtl_start_new_slice(tg, WRITE, false);
 
 	if (tg->flags & THROTL_TG_PENDING) {
 		tg_update_disptime(tg);
@@ -1357,6 +1392,7 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
 		v = U64_MAX;
 
 	tg = blkg_to_tg(ctx.blkg);
+	tg_update_skipped(tg);
 
 	if (is_u64)
 		*(u64 *)((void *)tg + of_cft(of)->private) = v;
@@ -1543,6 +1579,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 		return ret;
 
 	tg = blkg_to_tg(ctx.blkg);
+	tg_update_skipped(tg);
 
 	v[0] = tg->bps_conf[READ][index];
 	v[1] = tg->bps_conf[WRITE][index];
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index c1b602996127..845909c72f86 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -115,6 +115,10 @@ struct throtl_grp {
 	uint64_t bytes_disp[2];
 	/* Number of bio's dispatched in current slice */
 	unsigned int io_disp[2];
+	/* Number of bytes will be skipped in current slice */
+	uint64_t bytes_skipped[2];
+	/* Number of bio's will be skipped in current slice */
+	unsigned int io_skipped[2];
 
 	unsigned long last_low_overflow_time[2];
 
-- 
2.31.1


* Re: [PATCH -next v4 4/4] blk-throttle: fix io hung due to config updates
From: Michal Koutný @ 2022-05-24  9:59 UTC
  To: Yu Kuai; +Cc: tj, axboe, ming.lei, cgroups, linux-block, linux-kernel, yi.zhang

On Mon, May 23, 2022 at 04:26:33PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
> Fix the problem by respecting the time that the throttled bio has
> already waited. To do that, add new fields to record how many bytes/ios
> were already waited for, and use them to calculate the wait time for
> the throttled bio under the new configuration.

This new approach correctly conserves the bandwidth across changes.
(Looking at the BPS paths.)

> 
> Some simple test:
> 1)
> cd /sys/fs/cgroup/blkio/
> echo $$ > cgroup.procs
> echo "8:0 2048" > blkio.throttle.write_bps_device
> {
>         sleep 3
>         echo "8:0 1024" > blkio.throttle.write_bps_device
> } &
> sleep 1
> dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct
> 
> 2)
> cd /sys/fs/cgroup/blkio/
> echo $$ > cgroup.procs
> echo "8:0 1024" > blkio.throttle.write_bps_device
> {
>         sleep 5
>         echo "8:0 2048" > blkio.throttle.write_bps_device
> } &
> sleep 1
> dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct
> 

It's interesting that you're getting these numbers (w/patch)

> test results: io finish time
> 	before this patch	with this patch
> 1)	10s			6s
> 2)	8s			6s

wait := (disp + bio - Δt*l_old) / l_new
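(disp = bytes already dispatched in the slice, bio = the bio's size,
Δt = time elapsed when the config is updated, l_old/l_new = the old
and new bps limits)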

1)
wait = (0k + 8k - 3s*2k/s) / 1k/s = 2s -> i.e. 5s absolute

2)
wait = (0k + 8k - 5s*1k/s) / 2k/s = 2.5s -> i.e. 6.5s absolute

Are your numbers noisy+rounded or do I still miss anything?

(Also, isn't it worth having this more permanently in tools/testing/selftests?)

> +static void tg_update_skipped(struct throtl_grp *tg)
> +{
> +	if (tg->service_queue.nr_queued[READ])
> +		__tg_update_skipped(tg, READ);
> +	if (tg->service_queue.nr_queued[WRITE])
> +		__tg_update_skipped(tg, WRITE);

On one hand, the callers of tg_update_skipped() know whether R/W limit
is changed, so only the respective variant could be called.
On the other hand, these conditions look implied by tg->flags &
THROTL_TG_PENDING.
(Just noting, it's likely still not possible to pass the skipped value
only via the stack.)


> @@ -115,6 +115,10 @@ struct throtl_grp {
>  	uint64_t bytes_disp[2];
>  	/* Number of bio's dispatched in current slice */
>  	unsigned int io_disp[2];
> +	/* Number of bytes will be skipped in current slice */
> +	uint64_t bytes_skipped[2];
> +	/* Number of bio's will be skipped in current slice */
> +	unsigned int io_skipped[2];

Please add a comment that these fields exist to facilitate config
updates (that the bytes are to be skipped is sort of obvious from the
name :-).

Thanks,
Michal


* Re: [PATCH -next v4 4/4] blk-throttle: fix io hung due to config updates
From: Yu Kuai @ 2022-05-24 11:47 UTC
  To: Michal Koutný
  Cc: tj, axboe, ming.lei, cgroups, linux-block, linux-kernel, yi.zhang

On 2022/05/24 17:59, Michal Koutný wrote:
> On Mon, May 23, 2022 at 04:26:33PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
>> Fix the problem by respecting the time that the throttled bio has
>> already waited. To do that, add new fields to record how many bytes/ios
>> were already waited for, and use them to calculate the wait time for
>> the throttled bio under the new configuration.
> 
> This new approach correctly conserves the bandwidth across changes.
> (Looking at the BPS paths.)
> 
>>
>> Some simple test:
>> 1)
>> cd /sys/fs/cgroup/blkio/
>> echo $$ > cgroup.procs
>> echo "8:0 2048" > blkio.throttle.write_bps_device
>> {
>>          sleep 3
>>          echo "8:0 1024" > blkio.throttle.write_bps_device
>> } &
>> sleep 1
>> dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct
>>
>> 2)
>> cd /sys/fs/cgroup/blkio/
>> echo $$ > cgroup.procs
>> echo "8:0 1024" > blkio.throttle.write_bps_device
>> {
>>          sleep 5
>>          echo "8:0 2048" > blkio.throttle.write_bps_device
>> } &
>> sleep 1
>> dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct
>>
> 
> It's interesting that you're getting these numbers (w/patch)
> 
>> test results: io finish time
>> 	before this patch	with this patch
>> 1)	10s			6s
>> 2)	8s			6s
> 
> wait := (disp + bio - Δt*l_old) / l_new
> 
> 1)
> wait = (0k + 8k - 3s*2k/s) / 1k/s = 2s -> i.e. 5s absolute
> 
> 2)
> wait = (0k + 8k - 5s*1k/s) / 2k/s = 2.5s -> i.e. 6.5s absolute
> 
> Are your numbers noisy+rounded or do I still miss anything?
Hi, Michal

Your calculation is right; however, it seems you missed that the io is
only dispatched after 1s, so the finish times are measured from when dd
starts:

sleep 1  -> here
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct
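
Redoing your arithmetic with Δt measured from when dd starts:

1)
wait = (0k + 8k - 2s*2k/s) / 1k/s = 4s -> i.e. 2s + 4s = 6s after dd starts

2)
wait = (0k + 8k - 4s*1k/s) / 2k/s = 2s -> i.e. 4s + 2s = 6s after dd starts

which matches the measured 6s in both cases.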
> 
> (Also, isn't it worth having this more permanently in tools/testing/selftests?)
> 
>> +static void tg_update_skipped(struct throtl_grp *tg)
>> +{
>> +	if (tg->service_queue.nr_queued[READ])
>> +		__tg_update_skipped(tg, READ);
>> +	if (tg->service_queue.nr_queued[WRITE])
>> +		__tg_update_skipped(tg, WRITE);
> 
> On one hand, the callers of tg_update_skipped() know whether R/W limit
> is changed, so only the respective variant could be called.
> On the other hand, these conditions look implied by tg->flags &
> THROTL_TG_PENDING.
> (Just noting, it's likely still not possible to pass the skipped value
> only via the stack.)
> 
> 
>> @@ -115,6 +115,10 @@ struct throtl_grp {
>>   	uint64_t bytes_disp[2];
>>   	/* Number of bio's dispatched in current slice */
>>   	unsigned int io_disp[2];
>> +	/* Number of bytes will be skipped in current slice */
>> +	uint64_t bytes_skipped[2];
>> +	/* Number of bio's will be skipped in current slice */
>> +	unsigned int io_skipped[2];
> 
> Please add a comment that these fields exist to facilitate config
> updates (that the bytes are to be skipped is sort of obvious from the
> name :-).
Ok, will do that in next iteration.

Thanks,
Kuai
