* [PATCH -next v7 0/3] support concurrent sync io for bfq on a special occasion
@ 2022-05-28  9:50 ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-28  9:50 UTC (permalink / raw)
  To: tj, axboe, paolo.valente
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

Changes in v7:
 - fix the mismatch between bfq_inc/dec_busy_queues() and
 bfq_add/del_bfqq_busy(); also retested this patchset on v5.18 to make
 sure functionality is correct.
 - move the updating of 'bfqd->busy_queues' into the new APIs

Changes in v6:
 - add reviewed-by tag for patch 1

Changes in v5:
 - rename bfq_add_busy_queues() to bfq_inc_busy_queues() in patch 1
 - fix wrong definition in patch 1
 - fix spelling mistake in patch 2: leaset -> least
 - update comments in patch 3
 - add reviewed-by tag in patch 2,3

Changes in v4:
 - split bfq_update_busy_queues() into bfq_add/dec_busy_queues(),
   suggested by Jan Kara.
 - remove the unused 'in_groups_with_pending_reqs'.

Changes in v3:
 - remove the cleanup patch that is irrelevant now (I'll post it
   separately).
 - instead of hacking wr queues and using weights tree insertion/removal,
   use bfq_add/del_bfqq_busy() to count the number of groups
   (suggested by Jan Kara).

Changes in v2:
 - Use a different approach to count the root group, which is much simpler.

Currently, bfq can't handle sync io concurrently as long as it is
not issued from the root group. This is because
'bfqd->num_groups_with_pending_reqs > 0' is always true in
bfq_asymmetric_scenario().

The rules for counting a bfqg into 'num_groups_with_pending_reqs':

Before this patchset:
 1) The root group is never counted.
 2) Count if the bfqg or its child bfqgs have pending requests.
 3) Don't count if the bfqg and its child bfqgs have completed all
    their requests.

After this patchset:
 1) The root group is counted.
 2) Count if the bfqg has at least one bfqq that is marked busy.
 3) Don't count if the bfqg doesn't have any busy bfqqs.

The main reason to use the busy state of a bfqq instead of 'pending
requests' is that a bfqq can stay busy after dispatching its last
request if idling is needed for service guarantees.

With the above changes, concurrent sync io can be supported if only
one group is activated.

fio test script (startdelay is used to avoid queue merging):
[global]
filename=/dev/sda
allow_mounted_write=0
ioengine=psync
direct=1
ioscheduler=bfq
offset_increment=10g
group_reporting
rw=randwrite
bs=4k

[test1]
numjobs=1

[test2]
startdelay=1
numjobs=1

[test3]
startdelay=2
numjobs=1

[test4]
startdelay=3
numjobs=1

[test5]
startdelay=4
numjobs=1

[test6]
startdelay=5
numjobs=1

[test7]
startdelay=6
numjobs=1

[test8]
startdelay=7
numjobs=1
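
A possible way to drive the jobs above from both cgroups (the cgroup v2
mount point, the group name 'bfqtest' and the job-file name are
assumptions; running against /dev/sda requires root):

```shell
# run the jobs from the root cgroup
fio bfq-test.fio

# run the same jobs from a non-root cgroup (cgroup v2 assumed)
mkdir /sys/fs/cgroup/bfqtest
echo $$ > /sys/fs/cgroup/bfqtest/cgroup.procs
fio bfq-test.fio
```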

test result:
running fio on root cgroup
v5.18:         112 MiB/s
v5.18-patched: 112 MiB/s

running fio on non-root cgroup
v5.18:         51.2 MiB/s
v5.18-patched: 112 MiB/s

Note that I also tested null_blk with "irqmode=2
completion_nsec=100000000 (100ms) hw_queue_depth=1", and the tests show
that service guarantees are still preserved.
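
For reference, the quoted null_blk setup corresponds to loading the
module with those parameters and switching the emulated device to bfq
(the 'nullb0' device name and the scheduler switch are assumptions):

```shell
modprobe null_blk irqmode=2 completion_nsec=100000000 hw_queue_depth=1
echo bfq > /sys/block/nullb0/queue/scheduler
```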

Previous versions:
RFC: https://lore.kernel.org/all/20211127101132.486806-1-yukuai3@huawei.com/
v1: https://lore.kernel.org/all/20220305091205.4188398-1-yukuai3@huawei.com/
v2: https://lore.kernel.org/all/20220416093753.3054696-1-yukuai3@huawei.com/
v3: https://lore.kernel.org/all/20220427124722.48465-1-yukuai3@huawei.com/
v4: https://lore.kernel.org/all/20220428111907.3635820-1-yukuai3@huawei.com/
v5: https://lore.kernel.org/all/20220428120837.3737765-1-yukuai3@huawei.com/
v6: https://lore.kernel.org/all/20220523131818.2798712-1-yukuai3@huawei.com/

Yu Kuai (3):
  block, bfq: record how many queues are busy in bfq_group
  block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
  block, bfq: do not idle if only one group is activated

 block/bfq-cgroup.c  |  1 +
 block/bfq-iosched.c | 48 +++-----------------------------------
 block/bfq-iosched.h | 57 +++++++--------------------------------------
 block/bfq-wf2q.c    | 41 ++++++++++++++++++++------------
 4 files changed, 39 insertions(+), 108 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH -next v7 1/3] block, bfq: record how many queues are busy in bfq_group
  2022-05-28  9:50 ` Yu Kuai
@ 2022-05-28  9:50   ` Yu Kuai
  -1 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-28  9:50 UTC (permalink / raw)
  To: tj, axboe, paolo.valente
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

Prepare to refactor the counting of 'num_groups_with_pending_reqs'.

Add a counter 'busy_queues' in bfq_group, and update it in
bfq_add/del_bfqq_busy().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bfq-cgroup.c  |  1 +
 block/bfq-iosched.h |  2 ++
 block/bfq-wf2q.c    | 26 ++++++++++++++++++++++++--
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 09574af83566..4d516879d9fa 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -557,6 +557,7 @@ static void bfq_pd_init(struct blkg_policy_data *pd)
 				   */
 	bfqg->bfqd = bfqd;
 	bfqg->active_entities = 0;
+	bfqg->busy_queues = 0;
 	bfqg->online = true;
 	bfqg->rq_pos_tree = RB_ROOT;
 }
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index ca8177d7bf7c..d92adbdd70ee 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -907,6 +907,7 @@ struct bfq_group_data {
  *                   are groups with more than one active @bfq_entity
  *                   (see the comments to the function
  *                   bfq_bfqq_may_idle()).
+ * @busy_queues: number of busy bfqqs.
  * @rq_pos_tree: rbtree sorted by next_request position, used when
  *               determining if two or more queues have interleaving
  *               requests (see bfq_find_close_cooperator()).
@@ -943,6 +944,7 @@ struct bfq_group {
 	struct bfq_entity *my_entity;
 
 	int active_entities;
+	int busy_queues;
 
 	struct rb_root rq_pos_tree;
 
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index f8eb340381cf..b97e33688335 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -218,6 +218,18 @@ static bool bfq_no_longer_next_in_service(struct bfq_entity *entity)
 	return false;
 }
 
+static void bfq_inc_busy_queues(struct bfq_queue *bfqq)
+{
+	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]++;
+	bfqq_group(bfqq)->busy_queues++;
+}
+
+static void bfq_dec_busy_queues(struct bfq_queue *bfqq)
+{
+	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]--;
+	bfqq_group(bfqq)->busy_queues--;
+}
+
 #else /* CONFIG_BFQ_GROUP_IOSCHED */
 
 static bool bfq_update_parent_budget(struct bfq_entity *next_in_service)
@@ -230,6 +242,16 @@ static bool bfq_no_longer_next_in_service(struct bfq_entity *entity)
 	return true;
 }
 
+static void bfq_inc_busy_queues(struct bfq_queue *bfqq)
+{
+	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]++;
+}
+
+static void bfq_dec_busy_queues(struct bfq_queue *bfqq)
+{
+	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]--;
+}
+
 #endif /* CONFIG_BFQ_GROUP_IOSCHED */
 
 /*
@@ -1659,7 +1681,7 @@ void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 
 	bfq_clear_bfqq_busy(bfqq);
 
-	bfqd->busy_queues[bfqq->ioprio_class - 1]--;
+	bfq_dec_busy_queues(bfqq);
 
 	if (bfqq->wr_coeff > 1)
 		bfqd->wr_busy_queues--;
@@ -1682,7 +1704,7 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 	bfq_activate_bfqq(bfqd, bfqq);
 
 	bfq_mark_bfqq_busy(bfqq);
-	bfqd->busy_queues[bfqq->ioprio_class - 1]++;
+	bfq_inc_busy_queues(bfqq);
 
 	if (!bfqq->dispatched)
 		if (bfqq->wr_coeff == 1)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
  2022-05-28  9:50 ` Yu Kuai
@ 2022-05-28  9:50   ` Yu Kuai
  -1 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-28  9:50 UTC (permalink / raw)
  To: tj, axboe, paolo.valente
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

Currently, bfq can't handle sync io concurrently as long as it is
not issued from the root group. This is because
'bfqd->num_groups_with_pending_reqs > 0' is always true in
bfq_asymmetric_scenario().

The rules for counting a bfqg into 'num_groups_with_pending_reqs':

Before this patch:
 1) The root group is never counted.
 2) Count if the bfqg or its child bfqgs have pending requests.
 3) Don't count if the bfqg and its child bfqgs have completed all
    their requests.

After this patch:
 1) The root group is counted.
 2) Count if the bfqg has at least one bfqq that is marked busy.
 3) Don't count if the bfqg doesn't have any busy bfqqs.

The main reason to use the busy state of a bfqq instead of 'pending
requests' is that a bfqq can stay busy after dispatching its last
request if idling is needed for service guarantees.

With this change, the case where only one group is activated can be
detected, and the next patch will support concurrent sync io in that
case.

This patch also renames 'num_groups_with_pending_reqs' to
'num_groups_with_busy_queues'.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 46 ++-----------------------------------
 block/bfq-iosched.h | 55 ++++++---------------------------------------
 block/bfq-wf2q.c    | 19 ++++------------
 3 files changed, 13 insertions(+), 107 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0d46cb728bbf..eb1da1bd5eb4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -852,7 +852,7 @@ static bool bfq_asymmetric_scenario(struct bfq_data *bfqd,
 
 	return varied_queue_weights || multiple_classes_busy
 #ifdef CONFIG_BFQ_GROUP_IOSCHED
-	       || bfqd->num_groups_with_pending_reqs > 0
+	       || bfqd->num_groups_with_busy_queues > 0
 #endif
 		;
 }
@@ -970,48 +970,6 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd,
 void bfq_weights_tree_remove(struct bfq_data *bfqd,
 			     struct bfq_queue *bfqq)
 {
-	struct bfq_entity *entity = bfqq->entity.parent;
-
-	for_each_entity(entity) {
-		struct bfq_sched_data *sd = entity->my_sched_data;
-
-		if (sd->next_in_service || sd->in_service_entity) {
-			/*
-			 * entity is still active, because either
-			 * next_in_service or in_service_entity is not
-			 * NULL (see the comments on the definition of
-			 * next_in_service for details on why
-			 * in_service_entity must be checked too).
-			 *
-			 * As a consequence, its parent entities are
-			 * active as well, and thus this loop must
-			 * stop here.
-			 */
-			break;
-		}
-
-		/*
-		 * The decrement of num_groups_with_pending_reqs is
-		 * not performed immediately upon the deactivation of
-		 * entity, but it is delayed to when it also happens
-		 * that the first leaf descendant bfqq of entity gets
-		 * all its pending requests completed. The following
-		 * instructions perform this delayed decrement, if
-		 * needed. See the comments on
-		 * num_groups_with_pending_reqs for details.
-		 */
-		if (entity->in_groups_with_pending_reqs) {
-			entity->in_groups_with_pending_reqs = false;
-			bfqd->num_groups_with_pending_reqs--;
-		}
-	}
-
-	/*
-	 * Next function is invoked last, because it causes bfqq to be
-	 * freed if the following holds: bfqq is not in service and
-	 * has no dispatched request. DO NOT use bfqq after the next
-	 * function invocation.
-	 */
 	__bfq_weights_tree_remove(bfqd, bfqq,
 				  &bfqd->queue_weights_tree);
 }
@@ -7118,7 +7076,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
 
 	bfqd->queue_weights_tree = RB_ROOT_CACHED;
-	bfqd->num_groups_with_pending_reqs = 0;
+	bfqd->num_groups_with_busy_queues = 0;
 
 	INIT_LIST_HEAD(&bfqd->active_list);
 	INIT_LIST_HEAD(&bfqd->idle_list);
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index d92adbdd70ee..6c6cd984d769 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -197,9 +197,6 @@ struct bfq_entity {
 	/* flag, set to request a weight, ioprio or ioprio_class change  */
 	int prio_changed;
 
-	/* flag, set if the entity is counted in groups_with_pending_reqs */
-	bool in_groups_with_pending_reqs;
-
 	/* last child queue of entity created (for non-leaf entities) */
 	struct bfq_queue *last_bfqq_created;
 };
@@ -496,52 +493,14 @@ struct bfq_data {
 	struct rb_root_cached queue_weights_tree;
 
 	/*
-	 * Number of groups with at least one descendant process that
-	 * has at least one request waiting for completion. Note that
-	 * this accounts for also requests already dispatched, but not
-	 * yet completed. Therefore this number of groups may differ
-	 * (be larger) than the number of active groups, as a group is
-	 * considered active only if its corresponding entity has
-	 * descendant queues with at least one request queued. This
-	 * number is used to decide whether a scenario is symmetric.
-	 * For a detailed explanation see comments on the computation
-	 * of the variable asymmetric_scenario in the function
-	 * bfq_better_to_idle().
-	 *
-	 * However, it is hard to compute this number exactly, for
-	 * groups with multiple descendant processes. Consider a group
-	 * that is inactive, i.e., that has no descendant process with
-	 * pending I/O inside BFQ queues. Then suppose that
-	 * num_groups_with_pending_reqs is still accounting for this
-	 * group, because the group has descendant processes with some
-	 * I/O request still in flight. num_groups_with_pending_reqs
-	 * should be decremented when the in-flight request of the
-	 * last descendant process is finally completed (assuming that
-	 * nothing else has changed for the group in the meantime, in
-	 * terms of composition of the group and active/inactive state of child
-	 * groups and processes). To accomplish this, an additional
-	 * pending-request counter must be added to entities, and must
-	 * be updated correctly. To avoid this additional field and operations,
-	 * we resort to the following tradeoff between simplicity and
-	 * accuracy: for an inactive group that is still counted in
-	 * num_groups_with_pending_reqs, we decrement
-	 * num_groups_with_pending_reqs when the first descendant
-	 * process of the group remains with no request waiting for
-	 * completion.
-	 *
-	 * Even this simpler decrement strategy requires a little
-	 * carefulness: to avoid multiple decrements, we flag a group,
-	 * more precisely an entity representing a group, as still
-	 * counted in num_groups_with_pending_reqs when it becomes
-	 * inactive. Then, when the first descendant queue of the
-	 * entity remains with no request waiting for completion,
-	 * num_groups_with_pending_reqs is decremented, and this flag
-	 * is reset. After this flag is reset for the entity,
-	 * num_groups_with_pending_reqs won't be decremented any
-	 * longer in case a new descendant queue of the entity remains
-	 * with no request waiting for completion.
+	 * Number of groups with at least one bfqq that is marked busy.
+	 * This number is used to decide whether a scenario is symmetric.
+	 * Note that a busy bfqq doesn't necessarily contain requests:
+	 * if idling is needed for service guarantees, a bfqq stays busy
+	 * after dispatching its last request; see the details in
+	 * __bfq_bfqq_expire().
 	 */
-	unsigned int num_groups_with_pending_reqs;
+	unsigned int num_groups_with_busy_queues;
 
 	/*
 	 * Per-class (RT, BE, IDLE) number of bfq_queues containing
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index b97e33688335..48ca7922035c 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -221,13 +221,15 @@ static bool bfq_no_longer_next_in_service(struct bfq_entity *entity)
 static void bfq_inc_busy_queues(struct bfq_queue *bfqq)
 {
 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]++;
-	bfqq_group(bfqq)->busy_queues++;
+	if (!(bfqq_group(bfqq)->busy_queues++))
+		bfqq->bfqd->num_groups_with_busy_queues++;
 }
 
 static void bfq_dec_busy_queues(struct bfq_queue *bfqq)
 {
 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]--;
-	bfqq_group(bfqq)->busy_queues--;
+	if (!(--bfqq_group(bfqq)->busy_queues))
+		bfqq->bfqd->num_groups_with_busy_queues--;
 }
 
 #else /* CONFIG_BFQ_GROUP_IOSCHED */
@@ -1006,19 +1008,6 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
 		entity->on_st_or_in_serv = true;
 	}
 
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
-	if (!bfq_entity_to_bfqq(entity)) { /* bfq_group */
-		struct bfq_group *bfqg =
-			container_of(entity, struct bfq_group, entity);
-		struct bfq_data *bfqd = bfqg->bfqd;
-
-		if (!entity->in_groups_with_pending_reqs) {
-			entity->in_groups_with_pending_reqs = true;
-			bfqd->num_groups_with_pending_reqs++;
-		}
-	}
-#endif
-
 	bfq_update_fin_time_enqueue(entity, st, backshifted);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
@ 2022-05-28  9:50   ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-28  9:50 UTC (permalink / raw)
  To: tj, axboe, paolo.valente
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

Currently, bfq can't handle sync io concurrently as long as they
are not issued from root group. This is because
'bfqd->num_groups_with_pending_reqs > 0' is always true in
bfq_asymmetric_scenario().

The way that bfqg is counted into 'num_groups_with_pending_reqs':

Before this patch:
 1) root group will never be counted.
 2) Count if bfqg or it's child bfqgs have pending requests.
 3) Don't count if bfqg and it's child bfqgs complete all the requests.

After this patch:
 1) root group is counted.
 2) Count if bfqg have at least one bfqq that is marked busy.
 3) Don't count if bfqg doesn't have any busy bfqqs.

The main reason to use busy state of bfqq instead of 'pending requests'
is that bfqq can stay busy after dispatching the last request if idling
is needed for service guarantees.

With this change, the occasion that only one group is activated can be
detected, and next patch will support concurrent sync io in the
occasion.

This patch also rename 'num_groups_with_pending_reqs' to
'num_groups_with_busy_queues'.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 46 ++-----------------------------------
 block/bfq-iosched.h | 55 ++++++---------------------------------------
 block/bfq-wf2q.c    | 19 ++++------------
 3 files changed, 13 insertions(+), 107 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0d46cb728bbf..eb1da1bd5eb4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -852,7 +852,7 @@ static bool bfq_asymmetric_scenario(struct bfq_data *bfqd,
 
 	return varied_queue_weights || multiple_classes_busy
 #ifdef CONFIG_BFQ_GROUP_IOSCHED
-	       || bfqd->num_groups_with_pending_reqs > 0
+	       || bfqd->num_groups_with_busy_queues > 0
 #endif
 		;
 }
@@ -970,48 +970,6 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd,
 void bfq_weights_tree_remove(struct bfq_data *bfqd,
 			     struct bfq_queue *bfqq)
 {
-	struct bfq_entity *entity = bfqq->entity.parent;
-
-	for_each_entity(entity) {
-		struct bfq_sched_data *sd = entity->my_sched_data;
-
-		if (sd->next_in_service || sd->in_service_entity) {
-			/*
-			 * entity is still active, because either
-			 * next_in_service or in_service_entity is not
-			 * NULL (see the comments on the definition of
-			 * next_in_service for details on why
-			 * in_service_entity must be checked too).
-			 *
-			 * As a consequence, its parent entities are
-			 * active as well, and thus this loop must
-			 * stop here.
-			 */
-			break;
-		}
-
-		/*
-		 * The decrement of num_groups_with_pending_reqs is
-		 * not performed immediately upon the deactivation of
-		 * entity, but it is delayed to when it also happens
-		 * that the first leaf descendant bfqq of entity gets
-		 * all its pending requests completed. The following
-		 * instructions perform this delayed decrement, if
-		 * needed. See the comments on
-		 * num_groups_with_pending_reqs for details.
-		 */
-		if (entity->in_groups_with_pending_reqs) {
-			entity->in_groups_with_pending_reqs = false;
-			bfqd->num_groups_with_pending_reqs--;
-		}
-	}
-
-	/*
-	 * Next function is invoked last, because it causes bfqq to be
-	 * freed if the following holds: bfqq is not in service and
-	 * has no dispatched request. DO NOT use bfqq after the next
-	 * function invocation.
-	 */
 	__bfq_weights_tree_remove(bfqd, bfqq,
 				  &bfqd->queue_weights_tree);
 }
@@ -7118,7 +7076,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
 
 	bfqd->queue_weights_tree = RB_ROOT_CACHED;
-	bfqd->num_groups_with_pending_reqs = 0;
+	bfqd->num_groups_with_busy_queues = 0;
 
 	INIT_LIST_HEAD(&bfqd->active_list);
 	INIT_LIST_HEAD(&bfqd->idle_list);
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index d92adbdd70ee..6c6cd984d769 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -197,9 +197,6 @@ struct bfq_entity {
 	/* flag, set to request a weight, ioprio or ioprio_class change  */
 	int prio_changed;
 
-	/* flag, set if the entity is counted in groups_with_pending_reqs */
-	bool in_groups_with_pending_reqs;
-
 	/* last child queue of entity created (for non-leaf entities) */
 	struct bfq_queue *last_bfqq_created;
 };
@@ -496,52 +493,14 @@ struct bfq_data {
 	struct rb_root_cached queue_weights_tree;
 
 	/*
-	 * Number of groups with at least one descendant process that
-	 * has at least one request waiting for completion. Note that
-	 * this accounts for also requests already dispatched, but not
-	 * yet completed. Therefore this number of groups may differ
-	 * (be larger) than the number of active groups, as a group is
-	 * considered active only if its corresponding entity has
-	 * descendant queues with at least one request queued. This
-	 * number is used to decide whether a scenario is symmetric.
-	 * For a detailed explanation see comments on the computation
-	 * of the variable asymmetric_scenario in the function
-	 * bfq_better_to_idle().
-	 *
-	 * However, it is hard to compute this number exactly, for
-	 * groups with multiple descendant processes. Consider a group
-	 * that is inactive, i.e., that has no descendant process with
-	 * pending I/O inside BFQ queues. Then suppose that
-	 * num_groups_with_pending_reqs is still accounting for this
-	 * group, because the group has descendant processes with some
-	 * I/O request still in flight. num_groups_with_pending_reqs
-	 * should be decremented when the in-flight request of the
-	 * last descendant process is finally completed (assuming that
-	 * nothing else has changed for the group in the meantime, in
-	 * terms of composition of the group and active/inactive state of child
-	 * groups and processes). To accomplish this, an additional
-	 * pending-request counter must be added to entities, and must
-	 * be updated correctly. To avoid this additional field and operations,
-	 * we resort to the following tradeoff between simplicity and
-	 * accuracy: for an inactive group that is still counted in
-	 * num_groups_with_pending_reqs, we decrement
-	 * num_groups_with_pending_reqs when the first descendant
-	 * process of the group remains with no request waiting for
-	 * completion.
-	 *
-	 * Even this simpler decrement strategy requires a little
-	 * carefulness: to avoid multiple decrements, we flag a group,
-	 * more precisely an entity representing a group, as still
-	 * counted in num_groups_with_pending_reqs when it becomes
-	 * inactive. Then, when the first descendant queue of the
-	 * entity remains with no request waiting for completion,
-	 * num_groups_with_pending_reqs is decremented, and this flag
-	 * is reset. After this flag is reset for the entity,
-	 * num_groups_with_pending_reqs won't be decremented any
-	 * longer in case a new descendant queue of the entity remains
-	 * with no request waiting for completion.
+	 * Number of groups with at least one bfqq that is marked busy.
+	 * This number is used to decide whether a scenario is symmetric.
+	 * Note that a busy bfqq doesn't necessarily contain requests:
+	 * If idling is needed for service guarantees, bfqq will stay busy
+	 * after dispatching the last request, see details in
+	 * __bfq_bfqq_expire().
 	 */
-	unsigned int num_groups_with_pending_reqs;
+	unsigned int num_groups_with_busy_queues;
 
 	/*
 	 * Per-class (RT, BE, IDLE) number of bfq_queues containing
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index b97e33688335..48ca7922035c 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -221,13 +221,15 @@ static bool bfq_no_longer_next_in_service(struct bfq_entity *entity)
 static void bfq_inc_busy_queues(struct bfq_queue *bfqq)
 {
 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]++;
-	bfqq_group(bfqq)->busy_queues++;
+	if (!(bfqq_group(bfqq)->busy_queues++))
+		bfqq->bfqd->num_groups_with_busy_queues++;
 }
 
 static void bfq_dec_busy_queues(struct bfq_queue *bfqq)
 {
 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]--;
-	bfqq_group(bfqq)->busy_queues--;
+	if (!(--bfqq_group(bfqq)->busy_queues))
+		bfqq->bfqd->num_groups_with_busy_queues--;
 }
 
 #else /* CONFIG_BFQ_GROUP_IOSCHED */
@@ -1006,19 +1008,6 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
 		entity->on_st_or_in_serv = true;
 	}
 
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
-	if (!bfq_entity_to_bfqq(entity)) { /* bfq_group */
-		struct bfq_group *bfqg =
-			container_of(entity, struct bfq_group, entity);
-		struct bfq_data *bfqd = bfqg->bfqd;
-
-		if (!entity->in_groups_with_pending_reqs) {
-			entity->in_groups_with_pending_reqs = true;
-			bfqd->num_groups_with_pending_reqs++;
-		}
-	}
-#endif
-
 	bfq_update_fin_time_enqueue(entity, st, backshifted);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -next v7 3/3] block, bfq: do not idle if only one group is activated
  2022-05-28  9:50 ` Yu Kuai
@ 2022-05-28  9:50   ` Yu Kuai
  -1 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-28  9:50 UTC (permalink / raw)
  To: tj, axboe, paolo.valente
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yi.zhang

Now that the root group is counted into 'num_groups_with_busy_queues',
'num_groups_with_busy_queues > 0' is always true in
bfq_asymmetric_scenario(). Thus change the condition to '> 1'.

As a side benefit, this change enables concurrent sync I/O when only
one group is activated.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index eb1da1bd5eb4..ffbc2d1593af 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -820,7 +820,7 @@ bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
  * much easier to maintain the needed state:
  * 1) all active queues have the same weight,
  * 2) all active queues belong to the same I/O-priority class,
- * 3) there are no active groups.
+ * 3) there is at most one active group.
  * In particular, the last condition is always true if hierarchical
  * support or the cgroups interface are not enabled, thus no state
  * needs to be maintained in this case.
@@ -852,7 +852,7 @@ static bool bfq_asymmetric_scenario(struct bfq_data *bfqd,
 
 	return varied_queue_weights || multiple_classes_busy
 #ifdef CONFIG_BFQ_GROUP_IOSCHED
-	       || bfqd->num_groups_with_busy_queues > 0
+	       || bfqd->num_groups_with_busy_queues > 1
 #endif
 		;
 }
-- 
2.31.1



* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
  2022-05-28  9:50   ` Yu Kuai
  (?)
@ 2022-05-30  8:10   ` Paolo Valente
  2022-05-30  8:34       ` Yu Kuai
  -1 siblings, 1 reply; 28+ messages in thread
From: Paolo Valente @ 2022-05-30  8:10 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Tejun Heo, Jens Axboe, cgroups, linux-block, LKML, yi.zhang



> Il giorno 28 mag 2022, alle ore 11:50, Yu Kuai <yukuai3@huawei.com> ha scritto:
> 
> Currently, bfq can't handle sync io concurrently as long as they
> are not issued from root group. This is because
> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
> bfq_asymmetric_scenario().
> 
> The way that bfqg is counted into 'num_groups_with_pending_reqs':
> 
> Before this patch:
> 1) root group will never be counted.
> 2) Count if bfqg or it's child bfqgs have pending requests.
> 3) Don't count if bfqg and it's child bfqgs complete all the requests.
> 
> After this patch:
> 1) root group is counted.
> 2) Count if bfqg have at least one bfqq that is marked busy.
> 3) Don't count if bfqg doesn't have any busy bfqqs.

Unfortunately, I see one last problem here, a double change:
(1) a bfqg is now counted only as a function of the state of its child
    queues, and not of also its child bfqgs
(2) the state considered for counting a bfqg moves from having pending
    requests to having busy queues

I'm ok with (1), which is a good catch (you already explained
the idea to me some time ago IIRC).

Yet I fear that (2) is not ok.  A bfqq can become non-busy even if it
still has in-flight I/O, i.e.  I/O being served in the drive.  The
weight of such a bfqq must still be considered in the weights_tree,
and the group containing such a queue must still be counted when
checking whether the scenario is asymmetric.  Otherwise service
guarantees are broken.  The reason is that, if a scenario is deemed as
symmetric because in-flight I/O is not taken into account, then idling
will not be performed to protect some bfqq, and in-flight I/O may
steal bandwidth from that bfqq in an uncontrolled way.

I verified this also experimentally a few years ago, when I added this
weights_tree stuff.  That's the rationale behind the part of
bfq_weights_tree_remove that this patch eliminates.  IOW,
for a bfqq and its parent bfqg to be out of the count for symmetry,
all bfqq's requests must also be completed.

Thanks,
Paolo

> 
> The main reason to use busy state of bfqq instead of 'pending requests'
> is that bfqq can stay busy after dispatching the last request if idling
> is needed for service guarantees.
> 
> With this change, the occasion that only one group is activated can be
> detected, and next patch will support concurrent sync io in the
> occasion.
> 
> This patch also rename 'num_groups_with_pending_reqs' to
> 'num_groups_with_busy_queues'.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> ---
> block/bfq-iosched.c | 46 ++-----------------------------------
> block/bfq-iosched.h | 55 ++++++---------------------------------------
> block/bfq-wf2q.c    | 19 ++++------------
> 3 files changed, 13 insertions(+), 107 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 0d46cb728bbf..eb1da1bd5eb4 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -852,7 +852,7 @@ static bool bfq_asymmetric_scenario(struct bfq_data *bfqd,
> 
> 	return varied_queue_weights || multiple_classes_busy
> #ifdef CONFIG_BFQ_GROUP_IOSCHED
> -	       || bfqd->num_groups_with_pending_reqs > 0
> +	       || bfqd->num_groups_with_busy_queues > 0
> #endif
> 		;
> }
> @@ -970,48 +970,6 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd,
> void bfq_weights_tree_remove(struct bfq_data *bfqd,
> 			     struct bfq_queue *bfqq)
> {
> -	struct bfq_entity *entity = bfqq->entity.parent;
> -
> -	for_each_entity(entity) {
> -		struct bfq_sched_data *sd = entity->my_sched_data;
> -
> -		if (sd->next_in_service || sd->in_service_entity) {
> -			/*
> -			 * entity is still active, because either
> -			 * next_in_service or in_service_entity is not
> -			 * NULL (see the comments on the definition of
> -			 * next_in_service for details on why
> -			 * in_service_entity must be checked too).
> -			 *
> -			 * As a consequence, its parent entities are
> -			 * active as well, and thus this loop must
> -			 * stop here.
> -			 */
> -			break;
> -		}
> -
> -		/*
> -		 * The decrement of num_groups_with_pending_reqs is
> -		 * not performed immediately upon the deactivation of
> -		 * entity, but it is delayed to when it also happens
> -		 * that the first leaf descendant bfqq of entity gets
> -		 * all its pending requests completed. The following
> -		 * instructions perform this delayed decrement, if
> -		 * needed. See the comments on
> -		 * num_groups_with_pending_reqs for details.
> -		 */
> -		if (entity->in_groups_with_pending_reqs) {
> -			entity->in_groups_with_pending_reqs = false;
> -			bfqd->num_groups_with_pending_reqs--;
> -		}
> -	}
> -
> -	/*
> -	 * Next function is invoked last, because it causes bfqq to be
> -	 * freed if the following holds: bfqq is not in service and
> -	 * has no dispatched request. DO NOT use bfqq after the next
> -	 * function invocation.
> -	 */
> 	__bfq_weights_tree_remove(bfqd, bfqq,
> 				  &bfqd->queue_weights_tree);
> }
> @@ -7118,7 +7076,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
> 	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
> 
> 	bfqd->queue_weights_tree = RB_ROOT_CACHED;
> -	bfqd->num_groups_with_pending_reqs = 0;
> +	bfqd->num_groups_with_busy_queues = 0;
> 
> 	INIT_LIST_HEAD(&bfqd->active_list);
> 	INIT_LIST_HEAD(&bfqd->idle_list);
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index d92adbdd70ee..6c6cd984d769 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -197,9 +197,6 @@ struct bfq_entity {
> 	/* flag, set to request a weight, ioprio or ioprio_class change  */
> 	int prio_changed;
> 
> -	/* flag, set if the entity is counted in groups_with_pending_reqs */
> -	bool in_groups_with_pending_reqs;
> -
> 	/* last child queue of entity created (for non-leaf entities) */
> 	struct bfq_queue *last_bfqq_created;
> };
> @@ -496,52 +493,14 @@ struct bfq_data {
> 	struct rb_root_cached queue_weights_tree;
> 
> 	/*
> -	 * Number of groups with at least one descendant process that
> -	 * has at least one request waiting for completion. Note that
> -	 * this accounts for also requests already dispatched, but not
> -	 * yet completed. Therefore this number of groups may differ
> -	 * (be larger) than the number of active groups, as a group is
> -	 * considered active only if its corresponding entity has
> -	 * descendant queues with at least one request queued. This
> -	 * number is used to decide whether a scenario is symmetric.
> -	 * For a detailed explanation see comments on the computation
> -	 * of the variable asymmetric_scenario in the function
> -	 * bfq_better_to_idle().
> -	 *
> -	 * However, it is hard to compute this number exactly, for
> -	 * groups with multiple descendant processes. Consider a group
> -	 * that is inactive, i.e., that has no descendant process with
> -	 * pending I/O inside BFQ queues. Then suppose that
> -	 * num_groups_with_pending_reqs is still accounting for this
> -	 * group, because the group has descendant processes with some
> -	 * I/O request still in flight. num_groups_with_pending_reqs
> -	 * should be decremented when the in-flight request of the
> -	 * last descendant process is finally completed (assuming that
> -	 * nothing else has changed for the group in the meantime, in
> -	 * terms of composition of the group and active/inactive state of child
> -	 * groups and processes). To accomplish this, an additional
> -	 * pending-request counter must be added to entities, and must
> -	 * be updated correctly. To avoid this additional field and operations,
> -	 * we resort to the following tradeoff between simplicity and
> -	 * accuracy: for an inactive group that is still counted in
> -	 * num_groups_with_pending_reqs, we decrement
> -	 * num_groups_with_pending_reqs when the first descendant
> -	 * process of the group remains with no request waiting for
> -	 * completion.
> -	 *
> -	 * Even this simpler decrement strategy requires a little
> -	 * carefulness: to avoid multiple decrements, we flag a group,
> -	 * more precisely an entity representing a group, as still
> -	 * counted in num_groups_with_pending_reqs when it becomes
> -	 * inactive. Then, when the first descendant queue of the
> -	 * entity remains with no request waiting for completion,
> -	 * num_groups_with_pending_reqs is decremented, and this flag
> -	 * is reset. After this flag is reset for the entity,
> -	 * num_groups_with_pending_reqs won't be decremented any
> -	 * longer in case a new descendant queue of the entity remains
> -	 * with no request waiting for completion.
> +	 * Number of groups with at least one bfqq that is marked busy,
> +	 * and this number is used to decide whether a scenario is symmetric.
> +	 * Note that bfqq is busy doesn't mean that the bfqq contains requests.
> +	 * If idling is needed for service guarantees, bfqq will stay busy
> +	 * after dispatching the last request, see details in
> +	 * __bfq_bfqq_expire().
> 	 */
> -	unsigned int num_groups_with_pending_reqs;
> +	unsigned int num_groups_with_busy_queues;
> 
> 	/*
> 	 * Per-class (RT, BE, IDLE) number of bfq_queues containing
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index b97e33688335..48ca7922035c 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -221,13 +221,15 @@ static bool bfq_no_longer_next_in_service(struct bfq_entity *entity)
> static void bfq_inc_busy_queues(struct bfq_queue *bfqq)
> {
> 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]++;
> -	bfqq_group(bfqq)->busy_queues++;
> +	if (!(bfqq_group(bfqq)->busy_queues++))
> +		bfqq->bfqd->num_groups_with_busy_queues++;
> }
> 
> static void bfq_dec_busy_queues(struct bfq_queue *bfqq)
> {
> 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]--;
> -	bfqq_group(bfqq)->busy_queues--;
> +	if (!(--bfqq_group(bfqq)->busy_queues))
> +		bfqq->bfqd->num_groups_with_busy_queues--;
> }
> 
> #else /* CONFIG_BFQ_GROUP_IOSCHED */
> @@ -1006,19 +1008,6 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
> 		entity->on_st_or_in_serv = true;
> 	}
> 
> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
> -	if (!bfq_entity_to_bfqq(entity)) { /* bfq_group */
> -		struct bfq_group *bfqg =
> -			container_of(entity, struct bfq_group, entity);
> -		struct bfq_data *bfqd = bfqg->bfqd;
> -
> -		if (!entity->in_groups_with_pending_reqs) {
> -			entity->in_groups_with_pending_reqs = true;
> -			bfqd->num_groups_with_pending_reqs++;
> -		}
> -	}
> -#endif
> -
> 	bfq_update_fin_time_enqueue(entity, st, backshifted);
> }
> 
> -- 
> 2.31.1
> 



* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
  2022-05-30  8:10   ` Paolo Valente
@ 2022-05-30  8:34       ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-30  8:34 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Tejun Heo, Jens Axboe, cgroups, linux-block, LKML, yi.zhang

On 2022/05/30 16:10, Paolo Valente wrote:
> 
> 
>> Il giorno 28 mag 2022, alle ore 11:50, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>
>> Currently, bfq can't handle sync io concurrently as long as they
>> are not issued from root group. This is because
>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>> bfq_asymmetric_scenario().
>>
>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>
>> Before this patch:
>> 1) root group will never be counted.
>> 2) Count if bfqg or it's child bfqgs have pending requests.
>> 3) Don't count if bfqg and it's child bfqgs complete all the requests.
>>
>> After this patch:
>> 1) root group is counted.
>> 2) Count if bfqg have at least one bfqq that is marked busy.
>> 3) Don't count if bfqg doesn't have any busy bfqqs.
> 
> Unfortunately, I see a last problem here. I see a double change:
> (1) a bfqg is now counted only as a function of the state of its child
>      queues, and not of also its child bfqgs
> (2) the state considered for counting a bfqg moves from having pending
>      requests to having busy queues
> 
> I'm ok with (1), which is a good catch (you already explained
> the idea to me some time ago IIRC).
> 
> Yet I fear that (2) is not ok.  A bfqq can become non busy even if it
> still has in-flight I/O, i.e.  I/O being served in the drive.  The
> weight of such a bfqq must still be considered in the weights_tree,
> and the group containing such a queue must still be counted when
> checking whether the scenario is asymmetric.  Otherwise service
> guarantees are broken.  The reason is that, if a scenario is deemed as
> symmetric because in-flight I/O is not taken into account, then idling
> will not be performed to protect some bfqq, and in-flight I/O may
> steal bandwidth from that bfqq in an uncontrolled way.
Hi, Paolo

Thanks for your explanation.

My original thought was to use weights_tree insertion/removal;
however, Jan convinced me that using bfq_add/del_bfqq_busy() is ok.

From what I see, when a bfqq dispatches its last request,
bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
idling is needed; it is delayed until such a bfqq gets scheduled as
the in-service queue again. This means the weight of such a bfqq is
still considered in the weights_tree.

I also ran some tests on null_blk with "irqmode=2
completion_nsec=100000000 (100ms) hw_queue_depth=1", and the tests
show that service guarantees are still preserved on a slow device.
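For reference, a null_blk setup with these parameters can be created roughly as follows (a sketch: the module parameters are taken from this message, while the device name `nullb0` and the sysfs path are the usual defaults and may differ on a given system):

```shell
# Emulate a slow device: timer-based completions (irqmode=2) taking
# 100ms each (completion_nsec is in nanoseconds), hardware queue depth 1.
modprobe null_blk irqmode=2 completion_nsec=100000000 hw_queue_depth=1

# Switch the emulated device to the bfq I/O scheduler.
echo bfq > /sys/block/nullb0/queue/scheduler
```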

Do you think this is strong enough to cover your concern?

Thanks,
Kuai
> 
> I verified this also experimentally a few years ago, when I added this
> weights_tree stuff.  That's the rationale behind the part of
> bfq_weights_tree_remove that this patch eliminates.  IOW,
> for a bfqq and its parent bfqg to be out of the count for symmetry,
> all bfqq's requests must also be completed.
> 
> Thanks,
> Paolo
> 
>>
>> The main reason to use busy state of bfqq instead of 'pending requests'
>> is that bfqq can stay busy after dispatching the last request if idling
>> is needed for service guarantees.
>>
>> With this change, the occasion that only one group is activated can be
>> detected, and next patch will support concurrent sync io in the
>> occasion.
>>
>> This patch also rename 'num_groups_with_pending_reqs' to
>> 'num_groups_with_busy_queues'.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> Reviewed-by: Jan Kara <jack@suse.cz>
>> ---
>> block/bfq-iosched.c | 46 ++-----------------------------------
>> block/bfq-iosched.h | 55 ++++++---------------------------------------
>> block/bfq-wf2q.c    | 19 ++++------------
>> 3 files changed, 13 insertions(+), 107 deletions(-)
>>
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 0d46cb728bbf..eb1da1bd5eb4 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -852,7 +852,7 @@ static bool bfq_asymmetric_scenario(struct bfq_data *bfqd,
>>
>> 	return varied_queue_weights || multiple_classes_busy
>> #ifdef CONFIG_BFQ_GROUP_IOSCHED
>> -	       || bfqd->num_groups_with_pending_reqs > 0
>> +	       || bfqd->num_groups_with_busy_queues > 0
>> #endif
>> 		;
>> }
>> @@ -970,48 +970,6 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd,
>> void bfq_weights_tree_remove(struct bfq_data *bfqd,
>> 			     struct bfq_queue *bfqq)
>> {
>> -	struct bfq_entity *entity = bfqq->entity.parent;
>> -
>> -	for_each_entity(entity) {
>> -		struct bfq_sched_data *sd = entity->my_sched_data;
>> -
>> -		if (sd->next_in_service || sd->in_service_entity) {
>> -			/*
>> -			 * entity is still active, because either
>> -			 * next_in_service or in_service_entity is not
>> -			 * NULL (see the comments on the definition of
>> -			 * next_in_service for details on why
>> -			 * in_service_entity must be checked too).
>> -			 *
>> -			 * As a consequence, its parent entities are
>> -			 * active as well, and thus this loop must
>> -			 * stop here.
>> -			 */
>> -			break;
>> -		}
>> -
>> -		/*
>> -		 * The decrement of num_groups_with_pending_reqs is
>> -		 * not performed immediately upon the deactivation of
>> -		 * entity, but it is delayed to when it also happens
>> -		 * that the first leaf descendant bfqq of entity gets
>> -		 * all its pending requests completed. The following
>> -		 * instructions perform this delayed decrement, if
>> -		 * needed. See the comments on
>> -		 * num_groups_with_pending_reqs for details.
>> -		 */
>> -		if (entity->in_groups_with_pending_reqs) {
>> -			entity->in_groups_with_pending_reqs = false;
>> -			bfqd->num_groups_with_pending_reqs--;
>> -		}
>> -	}
>> -
>> -	/*
>> -	 * Next function is invoked last, because it causes bfqq to be
>> -	 * freed if the following holds: bfqq is not in service and
>> -	 * has no dispatched request. DO NOT use bfqq after the next
>> -	 * function invocation.
>> -	 */
>> 	__bfq_weights_tree_remove(bfqd, bfqq,
>> 				  &bfqd->queue_weights_tree);
>> }
>> @@ -7118,7 +7076,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>> 	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
>>
>> 	bfqd->queue_weights_tree = RB_ROOT_CACHED;
>> -	bfqd->num_groups_with_pending_reqs = 0;
>> +	bfqd->num_groups_with_busy_queues = 0;
>>
>> 	INIT_LIST_HEAD(&bfqd->active_list);
>> 	INIT_LIST_HEAD(&bfqd->idle_list);
>> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
>> index d92adbdd70ee..6c6cd984d769 100644
>> --- a/block/bfq-iosched.h
>> +++ b/block/bfq-iosched.h
>> @@ -197,9 +197,6 @@ struct bfq_entity {
>> 	/* flag, set to request a weight, ioprio or ioprio_class change  */
>> 	int prio_changed;
>>
>> -	/* flag, set if the entity is counted in groups_with_pending_reqs */
>> -	bool in_groups_with_pending_reqs;
>> -
>> 	/* last child queue of entity created (for non-leaf entities) */
>> 	struct bfq_queue *last_bfqq_created;
>> };
>> @@ -496,52 +493,14 @@ struct bfq_data {
>> 	struct rb_root_cached queue_weights_tree;
>>
>> 	/*
>> -	 * Number of groups with at least one descendant process that
>> -	 * has at least one request waiting for completion. Note that
>> -	 * this accounts for also requests already dispatched, but not
>> -	 * yet completed. Therefore this number of groups may differ
>> -	 * (be larger) than the number of active groups, as a group is
>> -	 * considered active only if its corresponding entity has
>> -	 * descendant queues with at least one request queued. This
>> -	 * number is used to decide whether a scenario is symmetric.
>> -	 * For a detailed explanation see comments on the computation
>> -	 * of the variable asymmetric_scenario in the function
>> -	 * bfq_better_to_idle().
>> -	 *
>> -	 * However, it is hard to compute this number exactly, for
>> -	 * groups with multiple descendant processes. Consider a group
>> -	 * that is inactive, i.e., that has no descendant process with
>> -	 * pending I/O inside BFQ queues. Then suppose that
>> -	 * num_groups_with_pending_reqs is still accounting for this
>> -	 * group, because the group has descendant processes with some
>> -	 * I/O request still in flight. num_groups_with_pending_reqs
>> -	 * should be decremented when the in-flight request of the
>> -	 * last descendant process is finally completed (assuming that
>> -	 * nothing else has changed for the group in the meantime, in
>> -	 * terms of composition of the group and active/inactive state of child
>> -	 * groups and processes). To accomplish this, an additional
>> -	 * pending-request counter must be added to entities, and must
>> -	 * be updated correctly. To avoid this additional field and operations,
>> -	 * we resort to the following tradeoff between simplicity and
>> -	 * accuracy: for an inactive group that is still counted in
>> -	 * num_groups_with_pending_reqs, we decrement
>> -	 * num_groups_with_pending_reqs when the first descendant
>> -	 * process of the group remains with no request waiting for
>> -	 * completion.
>> -	 *
>> -	 * Even this simpler decrement strategy requires a little
>> -	 * carefulness: to avoid multiple decrements, we flag a group,
>> -	 * more precisely an entity representing a group, as still
>> -	 * counted in num_groups_with_pending_reqs when it becomes
>> -	 * inactive. Then, when the first descendant queue of the
>> -	 * entity remains with no request waiting for completion,
>> -	 * num_groups_with_pending_reqs is decremented, and this flag
>> -	 * is reset. After this flag is reset for the entity,
>> -	 * num_groups_with_pending_reqs won't be decremented any
>> -	 * longer in case a new descendant queue of the entity remains
>> -	 * with no request waiting for completion.
>> +	 * Number of groups with at least one bfqq that is marked busy,
>> +	 * and this number is used to decide whether a scenario is symmetric.
>> +	 * Note that bfqq is busy doesn't mean that the bfqq contains requests.
>> +	 * If idling is needed for service guarantees, bfqq will stay busy
>> +	 * after dispatching the last request, see details in
>> +	 * __bfq_bfqq_expire().
>> 	 */
>> -	unsigned int num_groups_with_pending_reqs;
>> +	unsigned int num_groups_with_busy_queues;
>>
>> 	/*
>> 	 * Per-class (RT, BE, IDLE) number of bfq_queues containing
>> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
>> index b97e33688335..48ca7922035c 100644
>> --- a/block/bfq-wf2q.c
>> +++ b/block/bfq-wf2q.c
>> @@ -221,13 +221,15 @@ static bool bfq_no_longer_next_in_service(struct bfq_entity *entity)
>> static void bfq_inc_busy_queues(struct bfq_queue *bfqq)
>> {
>> 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]++;
>> -	bfqq_group(bfqq)->busy_queues++;
>> +	if (!(bfqq_group(bfqq)->busy_queues++))
>> +		bfqq->bfqd->num_groups_with_busy_queues++;
>> }
>>
>> static void bfq_dec_busy_queues(struct bfq_queue *bfqq)
>> {
>> 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]--;
>> -	bfqq_group(bfqq)->busy_queues--;
>> +	if (!(--bfqq_group(bfqq)->busy_queues))
>> +		bfqq->bfqd->num_groups_with_busy_queues--;
>> }
>>
>> #else /* CONFIG_BFQ_GROUP_IOSCHED */
>> @@ -1006,19 +1008,6 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
>> 		entity->on_st_or_in_serv = true;
>> 	}
>>
>> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
>> -	if (!bfq_entity_to_bfqq(entity)) { /* bfq_group */
>> -		struct bfq_group *bfqg =
>> -			container_of(entity, struct bfq_group, entity);
>> -		struct bfq_data *bfqd = bfqg->bfqd;
>> -
>> -		if (!entity->in_groups_with_pending_reqs) {
>> -			entity->in_groups_with_pending_reqs = true;
>> -			bfqd->num_groups_with_pending_reqs++;
>> -		}
>> -	}
>> -#endif
>> -
>> 	bfq_update_fin_time_enqueue(entity, st, backshifted);
>> }
>>
>> -- 
>> 2.31.1
>>
> 
> .
> 


>> -		 * needed. See the comments on
>> -		 * num_groups_with_pending_reqs for details.
>> -		 */
>> -		if (entity->in_groups_with_pending_reqs) {
>> -			entity->in_groups_with_pending_reqs = false;
>> -			bfqd->num_groups_with_pending_reqs--;
>> -		}
>> -	}
>> -
>> -	/*
>> -	 * Next function is invoked last, because it causes bfqq to be
>> -	 * freed if the following holds: bfqq is not in service and
>> -	 * has no dispatched request. DO NOT use bfqq after the next
>> -	 * function invocation.
>> -	 */
>> 	__bfq_weights_tree_remove(bfqd, bfqq,
>> 				  &bfqd->queue_weights_tree);
>> }
>> @@ -7118,7 +7076,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>> 	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
>>
>> 	bfqd->queue_weights_tree = RB_ROOT_CACHED;
>> -	bfqd->num_groups_with_pending_reqs = 0;
>> +	bfqd->num_groups_with_busy_queues = 0;
>>
>> 	INIT_LIST_HEAD(&bfqd->active_list);
>> 	INIT_LIST_HEAD(&bfqd->idle_list);
>> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
>> index d92adbdd70ee..6c6cd984d769 100644
>> --- a/block/bfq-iosched.h
>> +++ b/block/bfq-iosched.h
>> @@ -197,9 +197,6 @@ struct bfq_entity {
>> 	/* flag, set to request a weight, ioprio or ioprio_class change  */
>> 	int prio_changed;
>>
>> -	/* flag, set if the entity is counted in groups_with_pending_reqs */
>> -	bool in_groups_with_pending_reqs;
>> -
>> 	/* last child queue of entity created (for non-leaf entities) */
>> 	struct bfq_queue *last_bfqq_created;
>> };
>> @@ -496,52 +493,14 @@ struct bfq_data {
>> 	struct rb_root_cached queue_weights_tree;
>>
>> 	/*
>> -	 * Number of groups with at least one descendant process that
>> -	 * has at least one request waiting for completion. Note that
>> -	 * this accounts for also requests already dispatched, but not
>> -	 * yet completed. Therefore this number of groups may differ
>> -	 * (be larger) than the number of active groups, as a group is
>> -	 * considered active only if its corresponding entity has
>> -	 * descendant queues with at least one request queued. This
>> -	 * number is used to decide whether a scenario is symmetric.
>> -	 * For a detailed explanation see comments on the computation
>> -	 * of the variable asymmetric_scenario in the function
>> -	 * bfq_better_to_idle().
>> -	 *
>> -	 * However, it is hard to compute this number exactly, for
>> -	 * groups with multiple descendant processes. Consider a group
>> -	 * that is inactive, i.e., that has no descendant process with
>> -	 * pending I/O inside BFQ queues. Then suppose that
>> -	 * num_groups_with_pending_reqs is still accounting for this
>> -	 * group, because the group has descendant processes with some
>> -	 * I/O request still in flight. num_groups_with_pending_reqs
>> -	 * should be decremented when the in-flight request of the
>> -	 * last descendant process is finally completed (assuming that
>> -	 * nothing else has changed for the group in the meantime, in
>> -	 * terms of composition of the group and active/inactive state of child
>> -	 * groups and processes). To accomplish this, an additional
>> -	 * pending-request counter must be added to entities, and must
>> -	 * be updated correctly. To avoid this additional field and operations,
>> -	 * we resort to the following tradeoff between simplicity and
>> -	 * accuracy: for an inactive group that is still counted in
>> -	 * num_groups_with_pending_reqs, we decrement
>> -	 * num_groups_with_pending_reqs when the first descendant
>> -	 * process of the group remains with no request waiting for
>> -	 * completion.
>> -	 *
>> -	 * Even this simpler decrement strategy requires a little
>> -	 * carefulness: to avoid multiple decrements, we flag a group,
>> -	 * more precisely an entity representing a group, as still
>> -	 * counted in num_groups_with_pending_reqs when it becomes
>> -	 * inactive. Then, when the first descendant queue of the
>> -	 * entity remains with no request waiting for completion,
>> -	 * num_groups_with_pending_reqs is decremented, and this flag
>> -	 * is reset. After this flag is reset for the entity,
>> -	 * num_groups_with_pending_reqs won't be decremented any
>> -	 * longer in case a new descendant queue of the entity remains
>> -	 * with no request waiting for completion.
>> +	 * Number of groups with at least one bfqq that is marked busy,
>> +	 * and this number is used to decide whether a scenario is symmetric.
>> +	 * Note that bfqq is busy doesn't mean that the bfqq contains requests.
>> +	 * If idling is needed for service guarantees, bfqq will stay busy
>> +	 * after dispatching the last request, see details in
>> +	 * __bfq_bfqq_expire().
>> 	 */
>> -	unsigned int num_groups_with_pending_reqs;
>> +	unsigned int num_groups_with_busy_queues;
>>
>> 	/*
>> 	 * Per-class (RT, BE, IDLE) number of bfq_queues containing
>> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
>> index b97e33688335..48ca7922035c 100644
>> --- a/block/bfq-wf2q.c
>> +++ b/block/bfq-wf2q.c
>> @@ -221,13 +221,15 @@ static bool bfq_no_longer_next_in_service(struct bfq_entity *entity)
>> static void bfq_inc_busy_queues(struct bfq_queue *bfqq)
>> {
>> 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]++;
>> -	bfqq_group(bfqq)->busy_queues++;
>> +	if (!(bfqq_group(bfqq)->busy_queues++))
>> +		bfqq->bfqd->num_groups_with_busy_queues++;
>> }
>>
>> static void bfq_dec_busy_queues(struct bfq_queue *bfqq)
>> {
>> 	bfqq->bfqd->busy_queues[bfqq->ioprio_class - 1]--;
>> -	bfqq_group(bfqq)->busy_queues--;
>> +	if (!(--bfqq_group(bfqq)->busy_queues))
>> +		bfqq->bfqd->num_groups_with_busy_queues--;
>> }
>>
>> #else /* CONFIG_BFQ_GROUP_IOSCHED */
>> @@ -1006,19 +1008,6 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
>> 		entity->on_st_or_in_serv = true;
>> 	}
>>
>> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
>> -	if (!bfq_entity_to_bfqq(entity)) { /* bfq_group */
>> -		struct bfq_group *bfqg =
>> -			container_of(entity, struct bfq_group, entity);
>> -		struct bfq_data *bfqd = bfqg->bfqd;
>> -
>> -		if (!entity->in_groups_with_pending_reqs) {
>> -			entity->in_groups_with_pending_reqs = true;
>> -			bfqd->num_groups_with_pending_reqs++;
>> -		}
>> -	}
>> -#endif
>> -
>> 	bfq_update_fin_time_enqueue(entity, st, backshifted);
>> }
>>
>> -- 
>> 2.31.1
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
       [not found]       ` <efe01dd1-0f99-dadf-956d-b0e80e1e602c@huawei.com>
@ 2022-05-31  8:36         ` Paolo VALENTE
  2022-05-31  9:06             ` Yu Kuai
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo VALENTE @ 2022-05-31  8:36 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jan Kara, Jens Axboe, Tejun Heo, cgroups, linux-block, LKML, yi.zhang



> Il giorno 30 mag 2022, alle ore 10:40, Yu Kuai <yukuai3@huawei.com> ha scritto:
> 
> 在 2022/05/30 16:34, Yu Kuai 写道:
>> 在 2022/05/30 16:10, Paolo Valente 写道:
>>> 
>>> 
>>>> Il giorno 28 mag 2022, alle ore 11:50, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>> 
>>>> Currently, bfq can't handle sync io concurrently as long as they
>>>> are not issued from root group. This is because
>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>> bfq_asymmetric_scenario().
>>>> 
>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>> 
>>>> Before this patch:
>>>> 1) root group will never be counted.
>>>> 2) Count if bfqg or it's child bfqgs have pending requests.
>>>> 3) Don't count if bfqg and it's child bfqgs complete all the requests.
>>>> 
>>>> After this patch:
>>>> 1) root group is counted.
>>>> 2) Count if bfqg have at least one bfqq that is marked busy.
>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>> 
>>> Unfortunately, I see a last problem here. I see a double change:
>>> (1) a bfqg is now counted only as a function of the state of its child
>>>      queues, and not of also its child bfqgs
>>> (2) the state considered for counting a bfqg moves from having pending
>>>      requests to having busy queues
>>> 
>>> I'm ok with (1), which is a good catch (you already explained
>>> the idea to me some time ago IIRC).
>>> 
>>> Yet I fear that (2) is not ok.  A bfqq can become non busy even if it
>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>> weight of such a bfqq must still be considered in the weights_tree,
>>> and the group containing such a queue must still be counted when
>>> checking whether the scenario is asymmetric.  Otherwise service
>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>> symmetric because in-flight I/O is not taken into account, then idling
>>> will not be performed to protect some bfqq, and in-flight I/O may
>>> steal bandwidth to that bfqq in an uncontrolled way.
>> Hi, Paolo
>> Thanks for your explanation.
>> My orginal thoughts was using weights_tree insertion/removal, however,
>> Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>> From what I see, when bfqq dispatch the last request,
>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>> idling is needed, and it will delayed to when such bfqq get scheduled as
>> in-service queue again. Which means the weight of such bfqq should still
>> be considered in the weights_tree.
>> I also run some tests on null_blk with "irqmode=2
>> completion_nsec=100000000(100ms) hw_queue_depth=1", and tests show
>> that service guarantees are still preserved on slow device.
>> Do you this is strong enough to cover your concern?

Unfortunately it is not.  Your very argument is what made me believe
that considering busy queues was enough, in the first place.  But, as
I found out, the problem is caused by the queues that do not enjoy
idling.  With your patch (as well as in my initial version) they are
not counted when they remain without requests queued.  And this makes
asymmetric scenarios be considered erroneously as symmetric.  The
consequence is that idling gets switched off when it had to be kept
on, and control on bandwidth is lost for the victim in-service queues.

Thanks,
Paolo

>> Thanks,
>> Kuai
>>> 
>>> I verified this also experimentally a few years ago, when I added this
>>> weights_tree stuff.  That's the rationale behind the part of
>>> bfq_weights_tree_remove that this patch eliminates.  IOW,
>>> for a bfqq and its parent bfqg to be out of the count for symmetry,
>>> all bfqq's requests must also be completed.
>>> 
>>> Thanks,
>>> Paolo
> 
> I forgot to cc Jan for this patchset... This is a reply for Jan.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
  2022-05-31  8:36         ` Paolo VALENTE
@ 2022-05-31  9:06             ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-31  9:06 UTC (permalink / raw)
  To: Paolo VALENTE
  Cc: Jan Kara, Jens Axboe, Tejun Heo, cgroups, linux-block, LKML, yi.zhang

在 2022/05/31 16:36, Paolo VALENTE 写道:
> 
> 
>> Il giorno 30 mag 2022, alle ore 10:40, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>
>> 在 2022/05/30 16:34, Yu Kuai 写道:
>>> 在 2022/05/30 16:10, Paolo Valente 写道:
>>>>
>>>>
>>>>> Il giorno 28 mag 2022, alle ore 11:50, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>>>
>>>>> Currently, bfq can't handle sync io concurrently as long as they
>>>>> are not issued from root group. This is because
>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>> bfq_asymmetric_scenario().
>>>>>
>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>
>>>>> Before this patch:
>>>>> 1) root group will never be counted.
>>>>> 2) Count if bfqg or it's child bfqgs have pending requests.
>>>>> 3) Don't count if bfqg and it's child bfqgs complete all the requests.
>>>>>
>>>>> After this patch:
>>>>> 1) root group is counted.
>>>>> 2) Count if bfqg have at least one bfqq that is marked busy.
>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>
>>>> Unfortunately, I see a last problem here. I see a double change:
>>>> (1) a bfqg is now counted only as a function of the state of its child
>>>>       queues, and not of also its child bfqgs
>>>> (2) the state considered for counting a bfqg moves from having pending
>>>>       requests to having busy queues
>>>>
>>>> I'm ok with with (1), which is a good catch (you are lady explained
>>>> the idea to me some time ago IIRC).
>>>>
>>>> Yet I fear that (2) is not ok.  A bfqq can become non busy even if it
>>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>>> weight of such a bfqq must still be considered in the weights_tree,
>>>> and the group containing such a queue must still be counted when
>>>> checking whether the scenario is asymmetric.  Otherwise service
>>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>>> symmetric because in-flight I/O is not taken into account, then idling
>>>> will not be performed to protect some bfqq, and in-flight I/O may
>>>> steal bandwidth to that bfqq in an uncontrolled way.
>>> Hi, Paolo
>>> Thanks for your explanation.
>>> My orginal thoughts was using weights_tree insertion/removal, however,
>>> Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>>>  From what I see, when bfqq dispatch the last request,
>>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>>> idling is needed, and it will delayed to when such bfqq get scheduled as
>>> in-service queue again. Which means the weight of such bfqq should still
>>> be considered in the weights_tree.
>>> I also run some tests on null_blk with "irqmode=2
>>> completion_nsec=100000000(100ms) hw_queue_depth=1", and tests show
>>> that service guarantees are still preserved on slow device.
>>> Do you this is strong enough to cover your concern?
> 
> Unfortunately it is not.  Your very argument is what made be believe
> that considering busy queues was enough, in the first place.  But, as
> I found out, the problem is caused by the queues that do not enjoy
> idling.  With your patch (as well as in my initial version) they are
> not counted when they remain without requests queued.  And this makes
> asymmetric scenarios be considered erroneously as symmetric.  The
> consequence is that idling gets switched off when it had to be kept
> on, and control on bandwidth is lost for the victim in-service queues.

Hi, Paolo

Thanks for your explanation. Are you thinking that if a bfqq doesn't
enjoy idling, then such a bfqq will be cleared from the busy state after
dispatching its last request?

Please kindly correct me if I'm wrong about the following process:

If there is more than one activated bfqg, then bfqqs that do not enjoy
idling are still left busy after dispatching their last request.

Details in __bfq_bfqq_expire:

         if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
             !(reason == BFQQE_PREEMPTED &&
               idling_needed_for_service_guarantees(bfqd, bfqq))) {
-> idling_needed_for_service_guarantees will always return true, so
bfqq (whether or not it enjoys idling) will stay busy.
                 if (bfqq->dispatched == 0)
                         /*
                          * Overloading budget_timeout field to store
                          * the time at which the queue remains with no
                          * backlog and no outstanding request; used by
                          * the weight-raising mechanism.
                          */
                         bfqq->budget_timeout = jiffies;

                 bfq_del_bfqq_busy(bfqd, bfqq, true);

Thanks,
Kuai

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
  2022-05-31  9:06             ` Yu Kuai
  (?)
@ 2022-05-31  9:19             ` Paolo Valente
  2022-05-31  9:24                 ` Yu Kuai
  2022-05-31  9:33                 ` Yu Kuai
  -1 siblings, 2 replies; 28+ messages in thread
From: Paolo Valente @ 2022-05-31  9:19 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jan Kara, Jens Axboe, Tejun Heo, cgroups, linux-block, LKML, yi.zhang



> Il giorno 31 mag 2022, alle ore 11:06, Yu Kuai <yukuai3@huawei.com> ha scritto:
> 
> 在 2022/05/31 16:36, Paolo VALENTE 写道:
>>> Il giorno 30 mag 2022, alle ore 10:40, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>> 
>>> 在 2022/05/30 16:34, Yu Kuai 写道:
>>>> 在 2022/05/30 16:10, Paolo Valente 写道:
>>>>> 
>>>>> 
>>>>>> Il giorno 28 mag 2022, alle ore 11:50, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>>>> 
>>>>>> Currently, bfq can't handle sync io concurrently as long as they
>>>>>> are not issued from root group. This is because
>>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>>> bfq_asymmetric_scenario().
>>>>>> 
>>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>> 
>>>>>> Before this patch:
>>>>>> 1) root group will never be counted.
>>>>>> 2) Count if bfqg or it's child bfqgs have pending requests.
>>>>>> 3) Don't count if bfqg and it's child bfqgs complete all the requests.
>>>>>> 
>>>>>> After this patch:
>>>>>> 1) root group is counted.
>>>>>> 2) Count if bfqg have at least one bfqq that is marked busy.
>>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>> 
>>>>> Unfortunately, I see a last problem here. I see a double change:
>>>>> (1) a bfqg is now counted only as a function of the state of its child
>>>>>      queues, and not of also its child bfqgs
>>>>> (2) the state considered for counting a bfqg moves from having pending
>>>>>      requests to having busy queues
>>>>> 
>>>>> I'm ok with with (1), which is a good catch (you are lady explained
>>>>> the idea to me some time ago IIRC).
>>>>> 
>>>>> Yet I fear that (2) is not ok.  A bfqq can become non busy even if it
>>>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>>>> weight of such a bfqq must still be considered in the weights_tree,
>>>>> and the group containing such a queue must still be counted when
>>>>> checking whether the scenario is asymmetric.  Otherwise service
>>>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>>>> symmetric because in-flight I/O is not taken into account, then idling
>>>>> will not be performed to protect some bfqq, and in-flight I/O may
>>>>> steal bandwidth to that bfqq in an uncontrolled way.
>>>> Hi, Paolo
>>>> Thanks for your explanation.
>>>> My orginal thoughts was using weights_tree insertion/removal, however,
>>>> Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>>>> From what I see, when bfqq dispatch the last request,
>>>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>>>> idling is needed, and it will delayed to when such bfqq get scheduled as
>>>> in-service queue again. Which means the weight of such bfqq should still
>>>> be considered in the weights_tree.
>>>> I also run some tests on null_blk with "irqmode=2
>>>> completion_nsec=100000000(100ms) hw_queue_depth=1", and tests show
>>>> that service guarantees are still preserved on slow device.
>>>> Do you this is strong enough to cover your concern?
>> Unfortunately it is not.  Your very argument is what made be believe
>> that considering busy queues was enough, in the first place.  But, as
>> I found out, the problem is caused by the queues that do not enjoy
>> idling.  With your patch (as well as in my initial version) they are
>> not counted when they remain without requests queued.  And this makes
>> asymmetric scenarios be considered erroneously as symmetric.  The
>> consequence is that idling gets switched off when it had to be kept
>> on, and control on bandwidth is lost for the victim in-service queues.
> 
> Hi,Paolo
> 
> Thanks for your explanation, are you thinking that if bfqq doesn't enjoy
> idling, then such bfqq will clear busy after dispatching the last
> request?
> 
> Please kindly correct me if I'm wrong in the following process:
> 
> If there are more than one bfqg that is activatied, then bfqqs that are
> not enjoying idle are still left busy after dispatching the last
> request.
> 
> details in __bfq_bfqq_expire:
> 
>        if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
>        ┊   !(reason == BFQQE_PREEMPTED &&
>        ┊     idling_needed_for_service_guarantees(bfqd, bfqq))) {
> -> idling_needed_for_service_guarantees will always return true,

It returns true only if the scenario is asymmetric.  Not counting bfqqs
with in-flight requests makes an asymmetric scenario be considered
wrongly symmetric.  See function bfq_asymmetric_scenario().

Paolo

> bfqq(whether or not enjoy idling) will stay busy.
>                if (bfqq->dispatched == 0)
>                        /*
>                        ┊* Overloading budget_timeout field to store
>                        ┊* the time at which the queue remains with no
>                        ┊* backlog and no outstanding request; used by
>                        ┊* the weight-raising mechanism.
>                        ┊*/
>                        bfqq->budget_timeout = jiffies;
> 
>                bfq_del_bfqq_busy(bfqd, bfqq, true);
> 
> Thanks,
> Kuai


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
  2022-05-31  9:19             ` Paolo Valente
@ 2022-05-31  9:24                 ` Yu Kuai
  2022-05-31  9:33                 ` Yu Kuai
  1 sibling, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-31  9:24 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jan Kara, Jens Axboe, Tejun Heo, cgroups, linux-block, LKML, yi.zhang

在 2022/05/31 17:19, Paolo Valente 写道:
> 
> 
>> Il giorno 31 mag 2022, alle ore 11:06, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>
>> 在 2022/05/31 16:36, Paolo VALENTE 写道:
>>>> Il giorno 30 mag 2022, alle ore 10:40, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>>
>>>> 在 2022/05/30 16:34, Yu Kuai 写道:
>>>>> 在 2022/05/30 16:10, Paolo Valente 写道:
>>>>>>
>>>>>>
>>>>>>> Il giorno 28 mag 2022, alle ore 11:50, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>>>>>
>>>>>>> Currently, bfq can't handle sync io concurrently as long as they
>>>>>>> are not issued from root group. This is because
>>>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>>>> bfq_asymmetric_scenario().
>>>>>>>
>>>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>>>
>>>>>>> Before this patch:
>>>>>>> 1) root group will never be counted.
>>>>>>> 2) Count if bfqg or it's child bfqgs have pending requests.
>>>>>>> 3) Don't count if bfqg and it's child bfqgs complete all the requests.
>>>>>>>
>>>>>>> After this patch:
>>>>>>> 1) root group is counted.
>>>>>>> 2) Count if bfqg have at least one bfqq that is marked busy.
>>>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>>>
>>>>>> Unfortunately, I see a last problem here. I see a double change:
>>>>>> (1) a bfqg is now counted only as a function of the state of its child
* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
@ 2022-05-31  9:24                 ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-31  9:24 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jan Kara, Jens Axboe, Tejun Heo, cgroups, linux-block, LKML, yi.zhang

On 2022/05/31 17:19, Paolo Valente wrote:
> 
> 
>> On 31 May 2022, at 11:06, Yu Kuai <yukuai3@huawei.com> wrote:
>>
>> On 2022/05/31 16:36, Paolo VALENTE wrote:
>>>> On 30 May 2022, at 10:40, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>
>>>> On 2022/05/30 16:34, Yu Kuai wrote:
>>>>> On 2022/05/30 16:10, Paolo Valente wrote:
>>>>>>
>>>>>>
>>>>>>> On 28 May 2022, at 11:50, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>>>>
>>>>>>> Currently, bfq can't handle sync IOs concurrently as long as they
>>>>>>> are not issued from the root group. This is because
>>>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>>>> bfq_asymmetric_scenario().
>>>>>>>
>>>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>>>
>>>>>>> Before this patch:
>>>>>>> 1) root group will never be counted.
>>>>>>> 2) Count if bfqg or its child bfqgs have pending requests.
>>>>>>> 3) Don't count if bfqg and its child bfqgs complete all the requests.
>>>>>>>
>>>>>>> After this patch:
>>>>>>> 1) root group is counted.
>>>>>>> 2) Count if bfqg has at least one bfqq that is marked busy.
>>>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>>>
>>>>>> Unfortunately, I see one last problem here. I see a double change:
>>>>>> (1) a bfqg is now counted only as a function of the state of its child
>>>>>>       queues, and not also of its child bfqgs
>>>>>> (2) the state considered for counting a bfqg moves from having pending
>>>>>>       requests to having busy queues
>>>>>>
>>>>>> I'm ok with (1), which is a good catch (you already explained
>>>>>> the idea to me some time ago IIRC).
>>>>>>
>>>>>> Yet I fear that (2) is not ok.  A bfqq can become non-busy even if it
>>>>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>>>>> weight of such a bfqq must still be considered in the weights_tree,
>>>>>> and the group containing such a queue must still be counted when
>>>>>> checking whether the scenario is asymmetric.  Otherwise service
>>>>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>>>>> symmetric because in-flight I/O is not taken into account, then idling
>>>>>> will not be performed to protect some bfqq, and in-flight I/O may
>>>>>> steal bandwidth from that bfqq in an uncontrolled way.
>>>>> Hi, Paolo
>>>>> Thanks for your explanation.
>>>>> My original thought was to use weights_tree insertion/removal; however,
>>>>> Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>>>>>  From what I see, when a bfqq dispatches its last request,
>>>>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>>>>> idling is needed, and it will be delayed until such bfqq gets scheduled
>>>>> as the in-service queue again, which means the weight of such bfqq
>>>>> should still be considered in the weights_tree.
>>>>> I also ran some tests on null_blk with "irqmode=2
>>>>> completion_nsec=100000000(100ms) hw_queue_depth=1", and the tests show
>>>>> that service guarantees are still preserved on a slow device.
>>>>> Do you think this is strong enough to cover your concern?
>>> Unfortunately it is not.  Your very argument is what made me believe
>>> that considering busy queues was enough, in the first place.  But, as
>>> I found out, the problem is caused by the queues that do not enjoy
>>> idling.  With your patch (as well as in my initial version) they are
>>> not counted when they remain without requests queued.  And this makes
>>> asymmetric scenarios be considered erroneously as symmetric.  The
>>> consequence is that idling gets switched off when it had to be kept
>>> on, and control on bandwidth is lost for the victim in-service queues.
>>
>> Hi, Paolo
>>
>> Thanks for your explanation. Are you thinking that if a bfqq doesn't
>> enjoy idling, then such bfqq will have its busy state cleared after
>> dispatching the last request?
>>
>> Please kindly correct me if I'm wrong about the following process:
>>
>> If more than one bfqg is activated, then bfqqs that do not enjoy
>> idling are still left busy after dispatching the last request.
>>
>> details in __bfq_bfqq_expire:
>>
>>         if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
>>             !(reason == BFQQE_PREEMPTED &&
>>               idling_needed_for_service_guarantees(bfqd, bfqq))) {
>> -> idling_needed_for_service_guarantees will always return true,
> 
> It returns true only if the scenario is asymmetric.  Not counting bfqqs
> with in-flight requests makes an asymmetric scenario be considered
> wrongly symmetric.  See function bfq_asymmetric_scenario().
Hi,

Yes, with this patchset, if there is more than one bfqg that is
activated (i.e. contains a busy bfqq), bfq_asymmetric_scenario() will
return true:

bfq_asymmetric_scenario()
  	return varied_queue_weights || multiple_classes_busy
  #ifdef CONFIG_BFQ_GROUP_IOSCHED
	       || bfqd->num_groups_with_busy_queues > 1
  #endif

 From what I see, bfqd->num_groups_with_busy_queues > 1 is always true...
> 
> Paolo
> 
>> bfqq (whether or not it enjoys idling) will stay busy.
>>                 if (bfqq->dispatched == 0)
>>                         /*
>>                          * Overloading budget_timeout field to store
>>                          * the time at which the queue remains with no
>>                          * backlog and no outstanding request; used by
>>                          * the weight-raising mechanism.
>>                          */
>>                         bfqq->budget_timeout = jiffies;
>>
>>                 bfq_del_bfqq_busy(bfqd, bfqq, true);
>>
>> Thanks,
>> Kuai
> 
> .
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
@ 2022-05-31  9:33                 ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-31  9:33 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jan Kara, Jens Axboe, Tejun Heo, cgroups, linux-block, LKML, yi.zhang

On 2022/05/31 17:19, Paolo Valente wrote:
> 
> 
>> On 31 May 2022, at 11:06, Yu Kuai <yukuai3@huawei.com> wrote:
>>
>> On 2022/05/31 16:36, Paolo VALENTE wrote:
>>>> On 30 May 2022, at 10:40, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>
>>>> On 2022/05/30 16:34, Yu Kuai wrote:
>>>>> On 2022/05/30 16:10, Paolo Valente wrote:
>>>>>>
>>>>>>
>>>>>>> On 28 May 2022, at 11:50, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>>>>
>>>>>>> Currently, bfq can't handle sync IOs concurrently as long as they
>>>>>>> are not issued from the root group. This is because
>>>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>>>> bfq_asymmetric_scenario().
>>>>>>>
>>>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>>>
>>>>>>> Before this patch:
>>>>>>> 1) root group will never be counted.
>>>>>>> 2) Count if bfqg or its child bfqgs have pending requests.
>>>>>>> 3) Don't count if bfqg and its child bfqgs complete all the requests.
>>>>>>>
>>>>>>> After this patch:
>>>>>>> 1) root group is counted.
>>>>>>> 2) Count if bfqg has at least one bfqq that is marked busy.
>>>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>>>
>>>>>> Unfortunately, I see one last problem here. I see a double change:
>>>>>> (1) a bfqg is now counted only as a function of the state of its child
>>>>>>       queues, and not also of its child bfqgs
>>>>>> (2) the state considered for counting a bfqg moves from having pending
>>>>>>       requests to having busy queues
>>>>>>
>>>>>> I'm ok with (1), which is a good catch (you already explained
>>>>>> the idea to me some time ago IIRC).
>>>>>>
>>>>>> Yet I fear that (2) is not ok.  A bfqq can become non-busy even if it
>>>>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>>>>> weight of such a bfqq must still be considered in the weights_tree,
>>>>>> and the group containing such a queue must still be counted when
>>>>>> checking whether the scenario is asymmetric.  Otherwise service
>>>>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>>>>> symmetric because in-flight I/O is not taken into account, then idling
>>>>>> will not be performed to protect some bfqq, and in-flight I/O may
>>>>>> steal bandwidth from that bfqq in an uncontrolled way.
>>>>> Hi, Paolo
>>>>> Thanks for your explanation.
>>>>> My original thought was to use weights_tree insertion/removal; however,
>>>>> Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>>>>>  From what I see, when a bfqq dispatches its last request,
>>>>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>>>>> idling is needed, and it will be delayed until such bfqq gets scheduled
>>>>> as the in-service queue again, which means the weight of such bfqq
>>>>> should still be considered in the weights_tree.
>>>>> I also ran some tests on null_blk with "irqmode=2
>>>>> completion_nsec=100000000(100ms) hw_queue_depth=1", and the tests show
>>>>> that service guarantees are still preserved on a slow device.
>>>>> Do you think this is strong enough to cover your concern?
>>> Unfortunately it is not.  Your very argument is what made me believe
>>> that considering busy queues was enough, in the first place.  But, as
>>> I found out, the problem is caused by the queues that do not enjoy
>>> idling.  With your patch (as well as in my initial version) they are
>>> not counted when they remain without requests queued.  And this makes
>>> asymmetric scenarios be considered erroneously as symmetric.  The
>>> consequence is that idling gets switched off when it had to be kept
>>> on, and control on bandwidth is lost for the victim in-service queues.
>>
>> Hi, Paolo
>>
>> Thanks for your explanation. Are you thinking that if a bfqq doesn't
>> enjoy idling, then such bfqq will have its busy state cleared after
>> dispatching the last request?
>>
>> Please kindly correct me if I'm wrong about the following process:
>>
>> If more than one bfqg is activated, then bfqqs that do not enjoy
>> idling are still left busy after dispatching the last request.
>>
>> details in __bfq_bfqq_expire:
>>
>>         if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
>>             !(reason == BFQQE_PREEMPTED &&
>>               idling_needed_for_service_guarantees(bfqd, bfqq))) {
>> -> idling_needed_for_service_guarantees will always return true,
> 
> It returns true only if the scenario is asymmetric.  Not counting bfqqs
> with in-flight requests makes an asymmetric scenario be considered
> wrongly symmetric.  See function bfq_asymmetric_scenario().

Hi, Paolo

Do you mean this gap?

1. io1 is issued from bfqq1 (from bfqg1)
2. bfqq1 dispatches this io, and its busy state is cleared
3. *before io1 is completed*, io2 is issued from bfqq2 (from bfqg2)
4. with this patchset, while dispatching io2 from bfqq2, the scenario
should be symmetric while it's wrongly considered asymmetric.
> 
> Paolo
> 
>> bfqq (whether or not it enjoys idling) will stay busy.
>>                 if (bfqq->dispatched == 0)
>>                         /*
>>                          * Overloading budget_timeout field to store
>>                          * the time at which the queue remains with no
>>                          * backlog and no outstanding request; used by
>>                          * the weight-raising mechanism.
>>                          */
>>                         bfqq->budget_timeout = jiffies;
>>
>>                 bfq_del_bfqq_busy(bfqd, bfqq, true);
>>
>> Thanks,
>> Kuai
> 
> .
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
  2022-05-31  9:33                 ` Yu Kuai
  (?)
@ 2022-05-31 10:01                 ` Jan Kara
  2022-05-31 10:59                     ` Yu Kuai
  2022-05-31 12:57                     ` Paolo Valente
  -1 siblings, 2 replies; 28+ messages in thread
From: Jan Kara @ 2022-05-31 10:01 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Paolo Valente, Jan Kara, Jens Axboe, Tejun Heo, cgroups,
	linux-block, LKML, yi.zhang

On Tue 31-05-22 17:33:25, Yu Kuai wrote:
> On 2022/05/31 17:19, Paolo Valente wrote:
> > > On 31 May 2022, at 11:06, Yu Kuai <yukuai3@huawei.com> wrote:
> > > 
> > > On 2022/05/31 16:36, Paolo VALENTE wrote:
> > > > > On 30 May 2022, at 10:40, Yu Kuai <yukuai3@huawei.com> wrote:
> > > > > 
> > > > > On 2022/05/30 16:34, Yu Kuai wrote:
> > > > > > On 2022/05/30 16:10, Paolo Valente wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > > On 28 May 2022, at 11:50, Yu Kuai <yukuai3@huawei.com> wrote:
> > > > > > > > 
> > > > > > > > Currently, bfq can't handle sync IOs concurrently as long as they
> > > > > > > > are not issued from the root group. This is because
> > > > > > > > 'bfqd->num_groups_with_pending_reqs > 0' is always true in
> > > > > > > > bfq_asymmetric_scenario().
> > > > > > > > 
> > > > > > > > The way that bfqg is counted into 'num_groups_with_pending_reqs':
> > > > > > > > 
> > > > > > > > Before this patch:
> > > > > > > > 1) root group will never be counted.
> > > > > > > > 2) Count if bfqg or its child bfqgs have pending requests.
> > > > > > > > 3) Don't count if bfqg and its child bfqgs complete all the requests.
> > > > > > > > 
> > > > > > > > After this patch:
> > > > > > > > 1) root group is counted.
> > > > > > > > 2) Count if bfqg has at least one bfqq that is marked busy.
> > > > > > > > 3) Don't count if bfqg doesn't have any busy bfqqs.
> > > > > > > 
> > > > > > > Unfortunately, I see one last problem here. I see a double change:
> > > > > > > (1) a bfqg is now counted only as a function of the state of its child
> > > > > > >       queues, and not also of its child bfqgs
> > > > > > > (2) the state considered for counting a bfqg moves from having pending
> > > > > > >       requests to having busy queues
> > > > > > > 
> > > > > > > I'm ok with (1), which is a good catch (you already explained
> > > > > > > the idea to me some time ago IIRC).
> > > > > > > 
> > > > > > > Yet I fear that (2) is not ok.  A bfqq can become non-busy even if it
> > > > > > > still has in-flight I/O, i.e.  I/O being served in the drive.  The
> > > > > > > weight of such a bfqq must still be considered in the weights_tree,
> > > > > > > and the group containing such a queue must still be counted when
> > > > > > > checking whether the scenario is asymmetric.  Otherwise service
> > > > > > > guarantees are broken.  The reason is that, if a scenario is deemed as
> > > > > > > symmetric because in-flight I/O is not taken into account, then idling
> > > > > > > will not be performed to protect some bfqq, and in-flight I/O may
> > > > > > > steal bandwidth from that bfqq in an uncontrolled way.
> > > > > > Hi, Paolo
> > > > > > Thanks for your explanation.
> > > > > > My original thought was to use weights_tree insertion/removal; however,
> > > > > > Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
> > > > > >  From what I see, when a bfqq dispatches its last request,
> > > > > > bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
> > > > > > idling is needed, and it will be delayed until such bfqq gets scheduled
> > > > > > as the in-service queue again, which means the weight of such bfqq
> > > > > > should still be considered in the weights_tree.
> > > > > > I also ran some tests on null_blk with "irqmode=2
> > > > > > completion_nsec=100000000(100ms) hw_queue_depth=1", and the tests show
> > > > > > that service guarantees are still preserved on a slow device.
> > > > > > Do you think this is strong enough to cover your concern?
> > > > Unfortunately it is not.  Your very argument is what made me believe
> > > > that considering busy queues was enough, in the first place.  But, as
> > > > I found out, the problem is caused by the queues that do not enjoy
> > > > idling.  With your patch (as well as in my initial version) they are
> > > > not counted when they remain without requests queued.  And this makes
> > > > asymmetric scenarios be considered erroneously as symmetric.  The
> > > > consequence is that idling gets switched off when it had to be kept
> > > > on, and control on bandwidth is lost for the victim in-service queues.
> > > 
> > > Hi, Paolo
> > > 
> > > Thanks for your explanation. Are you thinking that if a bfqq doesn't
> > > enjoy idling, then such bfqq will have its busy state cleared after
> > > dispatching the last request?
> > > 
> > > Please kindly correct me if I'm wrong about the following process:
> > > 
> > > If more than one bfqg is activated, then bfqqs that do not enjoy
> > > idling are still left busy after dispatching the last request.
> > > 
> > > details in __bfq_bfqq_expire:
> > > 
> > >         if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
> > >             !(reason == BFQQE_PREEMPTED &&
> > >               idling_needed_for_service_guarantees(bfqd, bfqq))) {
> > > -> idling_needed_for_service_guarantees will always return true,
> > 
> > It returns true only if the scenario is asymmetric.  Not counting bfqqs
> > with in-flight requests makes an asymmetric scenario be considered
> > wrongly symmetric.  See function bfq_asymmetric_scenario().
> 
> Hi, Paolo
> 
> Do you mean this gap?
> 
> 1. io1 is issued from bfqq1 (from bfqg1)
> 2. bfqq1 dispatches this io, and its busy state is cleared
> 3. *before io1 is completed*, io2 is issued from bfqq2 (from bfqg2)

Yes. So as far as I understand, Paolo is concerned about this scenario.

> 4. with this patchset, while dispatching io2 from bfqq2, the scenario
> should be symmetric while it's wrongly considered asymmetric.

But with this patchset, we will consider this scenario symmetric because at
any point in time there is only one busy bfqq. Before, we considered this
scenario asymmetric because two different bfq groups have bfqq in their
weights_tree. So before this patchset
idling_needed_for_service_guarantees() returned true, after this patchset
the function returns false so we won't idle anymore and Paolo argues that
bfqq1 does not get adequate protection from bfqq2 as a result.

I agree with Paolo that this seems possible. The fix is relatively simple,
though - instead of changing how weights_tree is used for weight-raised
queues as you did originally, I'd move the accounting of groups with
pending requests to bfq_add/del_bfqq_busy() and bfq_completed_request().

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
@ 2022-05-31 10:59                     ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-31 10:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: Paolo Valente, Jens Axboe, Tejun Heo, cgroups, linux-block, LKML,
	yi.zhang

On 2022/05/31 18:01, Jan Kara wrote:
> On Tue 31-05-22 17:33:25, Yu Kuai wrote:
>> On 2022/05/31 17:19, Paolo Valente wrote:
>>>> On 31 May 2022, at 11:06, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>
>>>> On 2022/05/31 16:36, Paolo VALENTE wrote:
>>>>>> On 30 May 2022, at 10:40, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>>>
>>>>>> On 2022/05/30 16:34, Yu Kuai wrote:
>>>>>>> On 2022/05/30 16:10, Paolo Valente wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 28 May 2022, at 11:50, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>>>>>>
>>>>>>>>> Currently, bfq can't handle sync IOs concurrently as long as they
>>>>>>>>> are not issued from the root group. This is because
>>>>>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>>>>>> bfq_asymmetric_scenario().
>>>>>>>>>
>>>>>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>>>>>
>>>>>>>>> Before this patch:
>>>>>>>>> 1) root group will never be counted.
>>>>>>>>> 2) Count if bfqg or its child bfqgs have pending requests.
>>>>>>>>> 3) Don't count if bfqg and its child bfqgs complete all the requests.
>>>>>>>>>
>>>>>>>>> After this patch:
>>>>>>>>> 1) root group is counted.
>>>>>>>>> 2) Count if bfqg has at least one bfqq that is marked busy.
>>>>>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>>>>>
>>>>>>>> Unfortunately, I see one last problem here. I see a double change:
>>>>>>>> (1) a bfqg is now counted only as a function of the state of its child
>>>>>>>>        queues, and not also of its child bfqgs
>>>>>>>> (2) the state considered for counting a bfqg moves from having pending
>>>>>>>>        requests to having busy queues
>>>>>>>>
>>>>>>>> I'm ok with (1), which is a good catch (you already explained
>>>>>>>> the idea to me some time ago IIRC).
>>>>>>>>
>>>>>>>> Yet I fear that (2) is not ok.  A bfqq can become non-busy even if it
>>>>>>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>>>>>>> weight of such a bfqq must still be considered in the weights_tree,
>>>>>>>> and the group containing such a queue must still be counted when
>>>>>>>> checking whether the scenario is asymmetric.  Otherwise service
>>>>>>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>>>>>>> symmetric because in-flight I/O is not taken into account, then idling
>>>>>>>> will not be performed to protect some bfqq, and in-flight I/O may
>>>>>>>> steal bandwidth from that bfqq in an uncontrolled way.
>>>>>>> Hi, Paolo
>>>>>>> Thanks for your explanation.
>>>>>>> My original thought was to use weights_tree insertion/removal; however,
>>>>>>> Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>>>>>>>   From what I see, when a bfqq dispatches its last request,
>>>>>>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>>>>>>> idling is needed, and it will be delayed until such bfqq gets scheduled
>>>>>>> as the in-service queue again, which means the weight of such bfqq
>>>>>>> should still be considered in the weights_tree.
>>>>>>> I also ran some tests on null_blk with "irqmode=2
>>>>>>> completion_nsec=100000000(100ms) hw_queue_depth=1", and the tests show
>>>>>>> that service guarantees are still preserved on a slow device.
>>>>>>> Do you think this is strong enough to cover your concern?
>>>>> Unfortunately it is not.  Your very argument is what made me believe
>>>>> that considering busy queues was enough, in the first place.  But, as
>>>>> I found out, the problem is caused by the queues that do not enjoy
>>>>> idling.  With your patch (as well as in my initial version) they are
>>>>> not counted when they remain without requests queued.  And this makes
>>>>> asymmetric scenarios be considered erroneously as symmetric.  The
>>>>> consequence is that idling gets switched off when it had to be kept
>>>>> on, and control on bandwidth is lost for the victim in-service queues.
>>>>
>>>> Hi, Paolo
>>>>
>>>> Thanks for your explanation. Are you thinking that if a bfqq doesn't
>>>> enjoy idling, then such bfqq will have its busy state cleared after
>>>> dispatching the last request?
>>>>
>>>> Please kindly correct me if I'm wrong about the following process:
>>>>
>>>> If more than one bfqg is activated, then bfqqs that do not enjoy
>>>> idling are still left busy after dispatching the last request.
>>>>
>>>> details in __bfq_bfqq_expire:
>>>>
>>>>          if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
>>>>              !(reason == BFQQE_PREEMPTED &&
>>>>                idling_needed_for_service_guarantees(bfqd, bfqq))) {
>>>> -> idling_needed_for_service_guarantees will always return true,
>>>
>>> It returns true only if the scenario is asymmetric.  Not counting bfqqs
>>> with in-flight requests makes an asymmetric scenario be considered
>>> wrongly symmetric.  See function bfq_asymmetric_scenario().
>>
>> Hi, Paolo
>>
>> Do you mean this gap?
>>
>> 1. io1 is issued from bfqq1(from bfqg1)
>> 2. bfqq1 dispatched this io, and its busy state is cleared
>> 3. *before io1 is completed*, io2 is issued from bfqq2(bfqg2)
> 
> Yes. So as far as I understand Paolo is concerned about this scenario.
> 
>> 4. with this patchset, while dispatching io2 from bfqq2, the scenario
>> should be symmetric while it's wrongly considered asymmetric.
> 
> But with this patchset, we will consider this scenario symmetric because at
> any point in time there is only one busy bfqq. Before, we considered this
> scenario asymmetric because two different bfq groups have bfqq in their
> weights_tree. So before this patchset
> idling_needed_for_service_guarantees() returned true, after this patchset
> the function returns false so we won't idle anymore and Paolo argues that
> bfqq1 does not get adequate protection from bfqq2 as a result.
> 
> I agree with Paolo this seems possible. The fix is relatively simple though
> - instead of changing how weights_tree is used for weight raised queues as
> you did originally, I'd move the accounting of groups with pending requests
> to bfq_add/del_bfqq_busy() and bfq_completed_request().
> 
> 								Honza

Thanks for your explanation, I'll send a new version.

Kuai
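The gap described in this exchange can be reduced to a toy model (hypothetical, heavily simplified structures — not the actual BFQ code): counting groups by busy queues misses the window in which a queue has only in-flight I/O, while counting by pending requests keeps that group counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for bfq_queue/bfq_group; names are illustrative only. */
struct toy_bfqq {
	int queued;    /* requests queued in the scheduler (drives "busy") */
	int in_flight; /* requests dispatched but not yet completed */
};

struct toy_bfqg {
	const struct toy_bfqq *queues;
	int nr_queues;
};

/* Policy of this patchset: a group counts only while it has a busy queue. */
static bool counts_by_busy(const struct toy_bfqg *g)
{
	for (int i = 0; i < g->nr_queues; i++)
		if (g->queues[i].queued > 0)
			return true;
	return false;
}

/* Policy Paolo argues for: in-flight I/O keeps the group counted, too. */
static bool counts_by_pending(const struct toy_bfqg *g)
{
	for (int i = 0; i < g->nr_queues; i++)
		if (g->queues[i].queued > 0 || g->queues[i].in_flight > 0)
			return true;
	return false;
}

/* Steps 1-4 from the mail: bfqq1 (bfqg1) has dispatched its last request
 * (busy cleared, io1 still in flight) when bfqq2 (bfqg2) issues io2. */
static void gap_counts(int *by_busy, int *by_pending)
{
	struct toy_bfqq q1 = { .queued = 0, .in_flight = 1 };
	struct toy_bfqq q2 = { .queued = 1, .in_flight = 0 };
	struct toy_bfqg groups[2] = {
		{ .queues = &q1, .nr_queues = 1 },
		{ .queues = &q2, .nr_queues = 1 },
	};

	*by_busy = *by_pending = 0;
	for (int i = 0; i < 2; i++) {
		*by_busy += counts_by_busy(&groups[i]);
		*by_pending += counts_by_pending(&groups[i]);
	}
}
```

Under the busy-based policy only one group is counted, so the scenario looks symmetric and idling is switched off; the pending-based view still sees two groups, which is exactly the protection Paolo says is lost.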

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
@ 2022-05-31 12:57                     ` Paolo Valente
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Valente @ 2022-05-31 12:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Yu Kuai, Jens Axboe, Tejun Heo, cgroups, linux-block, LKML, yi.zhang



> Il giorno 31 mag 2022, alle ore 12:01, Jan Kara <jack@suse.cz> ha scritto:
> 
> On Tue 31-05-22 17:33:25, Yu Kuai wrote:
>> 在 2022/05/31 17:19, Paolo Valente 写道:
>>>> Il giorno 31 mag 2022, alle ore 11:06, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>> 
>>>> 在 2022/05/31 16:36, Paolo VALENTE 写道:
>>>>>> Il giorno 30 mag 2022, alle ore 10:40, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>>>> 
>>>>>> 在 2022/05/30 16:34, Yu Kuai 写道:
>>>>>>> 在 2022/05/30 16:10, Paolo Valente 写道:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Il giorno 28 mag 2022, alle ore 11:50, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>>>>>>> 
>>>>>>>>> Currently, bfq can't handle sync io concurrently as long as they
>>>>>>>>> are not issued from root group. This is because
>>>>>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>>>>>> bfq_asymmetric_scenario().
>>>>>>>>> 
>>>>>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>>>>> 
>>>>>>>>> Before this patch:
>>>>>>>>> 1) root group will never be counted.
>>>>>>>>> 2) Count if bfqg or it's child bfqgs have pending requests.
>>>>>>>>> 3) Don't count if bfqg and it's child bfqgs complete all the requests.
>>>>>>>>> 
>>>>>>>>> After this patch:
>>>>>>>>> 1) root group is counted.
>>>>>>>>> 2) Count if bfqg have at least one bfqq that is marked busy.
>>>>>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>>>>> 
>>>>>>>> Unfortunately, I see a last problem here. I see a double change:
>>>>>>>> (1) a bfqg is now counted only as a function of the state of its child
>>>>>>>>      queues, and not of also its child bfqgs
>>>>>>>> (2) the state considered for counting a bfqg moves from having pending
>>>>>>>>      requests to having busy queues
>>>>>>>> 
>>>>>>>> I'm ok with (1), which is a good catch (you already explained
>>>>>>>> the idea to me some time ago IIRC).
>>>>>>>> 
>>>>>>>> Yet I fear that (2) is not ok.  A bfqq can become non busy even if it
>>>>>>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>>>>>>> weight of such a bfqq must still be considered in the weights_tree,
>>>>>>>> and the group containing such a queue must still be counted when
>>>>>>>> checking whether the scenario is asymmetric.  Otherwise service
>>>>>>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>>>>>>> symmetric because in-flight I/O is not taken into account, then idling
>>>>>>>> will not be performed to protect some bfqq, and in-flight I/O may
>>>>>>>> steal bandwidth from that bfqq in an uncontrolled way.
>>>>>>> Hi, Paolo
>>>>>>> Thanks for your explanation.
>>>>>>> My original thoughts were using weights_tree insertion/removal; however,
>>>>>>> Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>>>>>>> From what I see, when bfqq dispatches the last request,
>>>>>>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>>>>>>> idling is needed; it will be delayed until such a bfqq gets scheduled as
>>>>>>> the in-service queue again. This means the weight of such a bfqq should
>>>>>>> still be considered in the weights_tree.
>>>>>>> I also ran some tests on null_blk with "irqmode=2
>>>>>>> completion_nsec=100000000(100ms) hw_queue_depth=1", and the tests show
>>>>>>> that service guarantees are still preserved on a slow device.
>>>>>>> Do you think this is strong enough to cover your concern?
>>>>> Unfortunately it is not.  Your very argument is what made me believe
>>>>> that considering busy queues was enough, in the first place.  But, as
>>>>> I found out, the problem is caused by the queues that do not enjoy
>>>>> idling.  With your patch (as well as in my initial version) they are
>>>>> not counted when they remain without requests queued.  And this makes
>>>>> asymmetric scenarios be considered erroneously as symmetric.  The
>>>>> consequence is that idling gets switched off when it had to be kept
>>>>> on, and control on bandwidth is lost for the victim in-service queues.
>>>> 
>>>> Hi, Paolo
>>>> 
>>>> Thanks for your explanation. Are you thinking that if a bfqq doesn't
>>>> enjoy idling, then such a bfqq will clear its busy state after
>>>> dispatching the last request?
>>>> 
>>>> Please kindly correct me if I'm wrong in the following process:
>>>> 
>>>> If there is more than one activated bfqg, then bfqqs that do not
>>>> enjoy idling are still left busy after dispatching the last
>>>> request.
>>>> 
>>>> details in __bfq_bfqq_expire:
>>>> 
>>>>        if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
>>>>            !(reason == BFQQE_PREEMPTED &&
>>>>              idling_needed_for_service_guarantees(bfqd, bfqq))) {
>>>> -> idling_needed_for_service_guarantees will always return true,
>>> 
>>> It returns true only if the scenario is symmetric.  Not counting bfqqs
>>> with in-flight requests makes an asymmetric scenario be considered
>>> wrongly symmetric.  See function bfq_asymmetric_scenario().
>> 
>> Hi, Paolo
>> 
>> Do you mean this gap?
>> 
>> 1. io1 is issued from bfqq1(from bfqg1)
>> 2. bfqq1 dispatched this io, and its busy state is cleared
>> 3. *before io1 is completed*, io2 is issued from bfqq2(bfqg2)
> 
> Yes. So as far as I understand Paolo is concerned about this scenario.
> 
>> 4. with this patchset, while dispatching io2 from bfqq2, the scenario
>> should be symmetric while it's wrongly considered asymmetric.
> 
> But with this patchset, we will consider this scenario symmetric because at
> any point in time there is only one busy bfqq. Before, we considered this
> scenario asymmetric because two different bfq groups have bfqq in their
> weights_tree. So before this patchset
> idling_needed_for_service_guarantees() returned true, after this patchset
> the function returns false so we won't idle anymore and Paolo argues that
> bfqq1 does not get adequate protection from bfqq2 as a result.
> 
> I agree with Paolo this seems possible. The fix is relatively simple though
> - instead of changing how weights_tree is used for weight raised queues as
> you did originally, I'd move the accounting of groups with pending requests
> to bfq_add/del_bfqq_busy() and bfq_completed_request().
> 

Why don't we simply use the existing logic? I mean, as for the changes made by this patch, we could simply turn the loop:

void bfq_weights_tree_remove(struct bfq_data *bfqd,
			     struct bfq_queue *bfqq)
{
	...
	for_each_entity(entity) {
		struct bfq_sched_data *sd = entity->my_sched_data;

		...
		if (entity->in_groups_with_pending_reqs) {
			entity->in_groups_with_pending_reqs = false;
			bfqd->num_groups_with_pending_reqs--;
		}
	}
	...
}

into a single:

	bfqd->num_groups_with_pending_reqs--;

so that only the parent group is concerned.

Thanks,
Paolo
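In sketch form, Paolo's suggestion reduces the per-ancestor loop to a single, flag-guarded decrement on the queue's parent group. The structures below are hypothetical, heavily reduced stand-ins; the real bfq_weights_tree_remove() also manipulates the weights tree itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct toy_entity {
	bool in_groups_with_pending_reqs;
};

struct toy_bfqd {
	int num_groups_with_pending_reqs;
};

/* Only the parent group is concerned: no for_each_entity() walk up the
 * hierarchy, just one guarded decrement. The flag keeps the operation
 * idempotent, like entity->in_groups_with_pending_reqs does today. */
static void weights_tree_remove_parent_only(struct toy_bfqd *bfqd,
					    struct toy_entity *parent)
{
	if (parent->in_groups_with_pending_reqs) {
		parent->in_groups_with_pending_reqs = false;
		bfqd->num_groups_with_pending_reqs--;
	}
}
```

The flag guard is what makes repeated removal safe; the open question raised in the thread is where the matching increment would live so that the root group gets counted as well.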



> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
@ 2022-05-31 13:28                       ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-31 13:28 UTC (permalink / raw)
  To: Paolo Valente, Jan Kara
  Cc: Jens Axboe, Tejun Heo, cgroups, linux-block, LKML, yi.zhang

在 2022/05/31 20:57, Paolo Valente 写道:
> 
> 
>> Il giorno 31 mag 2022, alle ore 12:01, Jan Kara <jack@suse.cz> ha scritto:
>>
>> On Tue 31-05-22 17:33:25, Yu Kuai wrote:
>>> 在 2022/05/31 17:19, Paolo Valente 写道:
>>>>> Il giorno 31 mag 2022, alle ore 11:06, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>>>
>>>>> 在 2022/05/31 16:36, Paolo VALENTE 写道:
>>>>>>> Il giorno 30 mag 2022, alle ore 10:40, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>>>>>
>>>>>>> 在 2022/05/30 16:34, Yu Kuai 写道:
>>>>>>>> 在 2022/05/30 16:10, Paolo Valente 写道:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Il giorno 28 mag 2022, alle ore 11:50, Yu Kuai <yukuai3@huawei.com> ha scritto:
>>>>>>>>>>
>>>>>>>>>> Currently, bfq can't handle sync io concurrently as long as they
>>>>>>>>>> are not issued from root group. This is because
>>>>>>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>>>>>>> bfq_asymmetric_scenario().
>>>>>>>>>>
>>>>>>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>>>>>>
>>>>>>>>>> Before this patch:
>>>>>>>>>> 1) root group will never be counted.
>>>>>>>>>> 2) Count if bfqg or it's child bfqgs have pending requests.
>>>>>>>>>> 3) Don't count if bfqg and it's child bfqgs complete all the requests.
>>>>>>>>>>
>>>>>>>>>> After this patch:
>>>>>>>>>> 1) root group is counted.
>>>>>>>>>> 2) Count if bfqg have at least one bfqq that is marked busy.
>>>>>>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>>>>>>
>>>>>>>>> Unfortunately, I see a last problem here. I see a double change:
>>>>>>>>> (1) a bfqg is now counted only as a function of the state of its child
>>>>>>>>>       queues, and not of also its child bfqgs
>>>>>>>>> (2) the state considered for counting a bfqg moves from having pending
>>>>>>>>>       requests to having busy queues
>>>>>>>>>
>>>>>>>>> I'm ok with (1), which is a good catch (you already explained
>>>>>>>>> the idea to me some time ago IIRC).
>>>>>>>>>
>>>>>>>>> Yet I fear that (2) is not ok.  A bfqq can become non busy even if it
>>>>>>>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>>>>>>>> weight of such a bfqq must still be considered in the weights_tree,
>>>>>>>>> and the group containing such a queue must still be counted when
>>>>>>>>> checking whether the scenario is asymmetric.  Otherwise service
>>>>>>>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>>>>>>>> symmetric because in-flight I/O is not taken into account, then idling
>>>>>>>>> will not be performed to protect some bfqq, and in-flight I/O may
>>>>>>>>> steal bandwidth from that bfqq in an uncontrolled way.
>>>>>>>> Hi, Paolo
>>>>>>>> Thanks for your explanation.
>>>>>>>> My original thoughts were using weights_tree insertion/removal; however,
>>>>>>>> Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>>>>>>>> From what I see, when bfqq dispatches the last request,
>>>>>>>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>>>>>>>> idling is needed; it will be delayed until such a bfqq gets scheduled as
>>>>>>>> the in-service queue again. This means the weight of such a bfqq should
>>>>>>>> still be considered in the weights_tree.
>>>>>>>> I also ran some tests on null_blk with "irqmode=2
>>>>>>>> completion_nsec=100000000(100ms) hw_queue_depth=1", and the tests show
>>>>>>>> that service guarantees are still preserved on a slow device.
>>>>>>>> Do you think this is strong enough to cover your concern?
>>>>>> Unfortunately it is not.  Your very argument is what made me believe
>>>>>> that considering busy queues was enough, in the first place.  But, as
>>>>>> I found out, the problem is caused by the queues that do not enjoy
>>>>>> idling.  With your patch (as well as in my initial version) they are
>>>>>> not counted when they remain without requests queued.  And this makes
>>>>>> asymmetric scenarios be considered erroneously as symmetric.  The
>>>>>> consequence is that idling gets switched off when it had to be kept
>>>>>> on, and control on bandwidth is lost for the victim in-service queues.
>>>>>
>>>>> Hi, Paolo
>>>>>
>>>>> Thanks for your explanation. Are you thinking that if a bfqq doesn't
>>>>> enjoy idling, then such a bfqq will clear its busy state after
>>>>> dispatching the last request?
>>>>>
>>>>> Please kindly correct me if I'm wrong in the following process:
>>>>>
>>>>> If there is more than one activated bfqg, then bfqqs that do not
>>>>> enjoy idling are still left busy after dispatching the last
>>>>> request.
>>>>>
>>>>> details in __bfq_bfqq_expire:
>>>>>
>>>>>         if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
>>>>>             !(reason == BFQQE_PREEMPTED &&
>>>>>               idling_needed_for_service_guarantees(bfqd, bfqq))) {
>>>>> -> idling_needed_for_service_guarantees will always return true,
>>>>
>>>> It returns true only if the scenario is symmetric.  Not counting bfqqs
>>>> with in-flight requests makes an asymmetric scenario be considered
>>>> wrongly symmetric.  See function bfq_asymmetric_scenario().
>>>
>>> Hi, Paolo
>>>
>>> Do you mean this gap?
>>>
>>> 1. io1 is issued from bfqq1(from bfqg1)
>>> 2. bfqq1 dispatched this io, and its busy state is cleared
>>> 3. *before io1 is completed*, io2 is issued from bfqq2(bfqg2)
>>
>> Yes. So as far as I understand Paolo is concerned about this scenario.
>>
>>> 4. with this patchset, while dispatching io2 from bfqq2, the scenario
>>> should be symmetric while it's wrongly considered asymmetric.
>>
>> But with this patchset, we will consider this scenario symmetric because at
>> any point in time there is only one busy bfqq. Before, we considered this
>> scenario asymmetric because two different bfq groups have bfqq in their
>> weights_tree. So before this patchset
>> idling_needed_for_service_guarantees() returned true, after this patchset
>> the function returns false so we won't idle anymore and Paolo argues that
>> bfqq1 does not get adequate protection from bfqq2 as a result.
>>
>> I agree with Paolo this seems possible. The fix is relatively simple though
>> - instead of changing how weights_tree is used for weight raised queues as
>> you did originally, I'd move the accounting of groups with pending requests
>> to bfq_add/del_bfqq_busy() and bfq_completed_request().
>>
> 
> Why don't we use simply the existing logic? I mean, as for the changes made by this patch, we could simply turn the loop:
> 
> void bfq_weights_tree_remove(struct bfq_data *bfqd,
> 			     struct bfq_queue *bfqq)
> {
> 	...
> 	for_each_entity(entity) {
> 		struct bfq_sched_data *sd = entity->my_sched_data;
> 
> 		...
> 		if (entity->in_groups_with_pending_reqs) {
> 			entity->in_groups_with_pending_reqs = false;
> 			bfqd->num_groups_with_pending_reqs--;
> 		}
> 	}
> 	...
> }
> 
> into a single:
> 
> 	bfqd->num_groups_with_pending_reqs--;
> 
> so that only the parent group is concerned.

It's ok to decrease it here; however, we need another place to increase
it in order to count the root group... And bfq_weights_tree_add() is not
a good fit because it bypasses wr queues.

Thanks,
Kuai
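The direction the thread converges on — tie the increment to a group's first pending request and the decrement to its last completion, so the root group is counted and in-flight I/O keeps a group counted — might look roughly like this toy sketch (hypothetical names and hooks, not the actual v8 code):

```c
#include <assert.h>

/* Hypothetical per-group counter; the real code would hang state off
 * bfq_group/bfq_data and hook bfq_add/del_bfqq_busy() and
 * bfq_completed_request(). */
struct toy_group {
	int pending_reqs; /* queued + in-flight requests of this group */
};

struct toy_bfqd {
	int num_groups_with_pending_reqs;
};

/* A request enters the group (any group, root included). */
static void insert_request(struct toy_bfqd *d, struct toy_group *g)
{
	if (g->pending_reqs++ == 0)
		d->num_groups_with_pending_reqs++;
}

/* A request of the group completes; the group stays counted while its
 * I/O is merely in flight, closing the gap discussed in the thread. */
static void complete_request(struct toy_bfqd *d, struct toy_group *g)
{
	if (--g->pending_reqs == 0)
		d->num_groups_with_pending_reqs--;
}
```

Because the decrement happens only at completion time, a group whose last request has been dispatched but not completed still contributes to the count, so the symmetry check cannot be fooled by in-flight I/O.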

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
@ 2022-05-31 13:28                       ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-05-31 13:28 UTC (permalink / raw)
  To: Paolo Valente, Jan Kara
  Cc: Jens Axboe, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-block, LKML, yi.zhang-hv44wF8Li93QT0dZR+AlfA

在 2022/05/31 20:57, Paolo Valente 写道:
> 
> 
>> Il giorno 31 mag 2022, alle ore 12:01, Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> ha scritto:
>>
>> On Tue 31-05-22 17:33:25, Yu Kuai wrote:
>>> 在 2022/05/31 17:19, Paolo Valente 写道:
>>>>> Il giorno 31 mag 2022, alle ore 11:06, Yu Kuai <yukuai3-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> ha scritto:
>>>>>
>>>>> 在 2022/05/31 16:36, Paolo VALENTE 写道:
>>>>>>> Il giorno 30 mag 2022, alle ore 10:40, Yu Kuai <yukuai3-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> ha scritto:
>>>>>>>
>>>>>>> 在 2022/05/30 16:34, Yu Kuai 写道:
>>>>>>>> 在 2022/05/30 16:10, Paolo Valente 写道:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Il giorno 28 mag 2022, alle ore 11:50, Yu Kuai <yukuai3-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> ha scritto:
>>>>>>>>>>
>>>>>>>>>> Currently, bfq can't handle sync io concurrently as long as they
>>>>>>>>>> are not issued from root group. This is because
>>>>>>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>>>>>>> bfq_asymmetric_scenario().
>>>>>>>>>>
>>>>>>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>>>>>>
>>>>>>>>>> Before this patch:
>>>>>>>>>> 1) root group will never be counted.
>>>>>>>>>> 2) Count if bfqg or it's child bfqgs have pending requests.
>>>>>>>>>> 3) Don't count if bfqg and it's child bfqgs complete all the requests.
>>>>>>>>>>
>>>>>>>>>> After this patch:
>>>>>>>>>> 1) root group is counted.
>>>>>>>>>> 2) Count if bfqg have at least one bfqq that is marked busy.
>>>>>>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>>>>>>
>>>>>>>>> Unfortunately, I see a last problem here. I see a double change:
>>>>>>>>> (1) a bfqg is now counted only as a function of the state of its child
>>>>>>>>>       queues, and not of also its child bfqgs
>>>>>>>>> (2) the state considered for counting a bfqg moves from having pending
>>>>>>>>>       requests to having busy queues
>>>>>>>>>
>>>>>>>>> I'm ok with (1), which is a good catch (you already explained
>>>>>>>>> the idea to me some time ago IIRC).
>>>>>>>>>
>>>>>>>>> Yet I fear that (2) is not ok.  A bfqq can become non busy even if it
>>>>>>>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>>>>>>>> weight of such a bfqq must still be considered in the weights_tree,
>>>>>>>>> and the group containing such a queue must still be counted when
>>>>>>>>> checking whether the scenario is asymmetric.  Otherwise service
>>>>>>>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>>>>>>>> symmetric because in-flight I/O is not taken into account, then idling
>>>>>>>>> will not be performed to protect some bfqq, and in-flight I/O may
>>>>>>>>> steal bandwidth to that bfqq in an uncontrolled way.
>>>>>>>> Hi, Paolo
>>>>>>>> Thanks for your explanation.
>>>>>>>> My original thought was to use weights_tree insertion/removal;
>>>>>>>> however, Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>>>>>>>> From what I see, when bfqq dispatches the last request,
>>>>>>>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>>>>>>>> idling is needed; it is delayed until such bfqq gets scheduled as
>>>>>>>> the in-service queue again. Which means the weight of such bfqq
>>>>>>>> should still be considered in the weights_tree.
>>>>>>>> I also ran some tests on null_blk with "irqmode=2
>>>>>>>> completion_nsec=100000000(100ms) hw_queue_depth=1", and the tests
>>>>>>>> show that service guarantees are still preserved on a slow device.
>>>>>>>> Do you think this is strong enough to cover your concern?
>>>>>> Unfortunately it is not.  Your very argument is what made me believe
>>>>>> that considering busy queues was enough, in the first place.  But, as
>>>>>> I found out, the problem is caused by the queues that do not enjoy
>>>>>> idling.  With your patch (as well as in my initial version) they are
>>>>>> not counted when they remain without requests queued.  And this makes
>>>>>> asymmetric scenarios be considered erroneously as symmetric.  The
>>>>>> consequence is that idling gets switched off when it had to be kept
>>>>>> on, and control on bandwidth is lost for the victim in-service queues.
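The window Paolo describes can be replayed with a toy model in plain C (hypothetical counters, not the real bfq structures): a queue that leaves the busy state while its request is still in the drive makes a busy-queue-based group count disagree with a pending-request-based one.

```c
#include <assert.h>

/* Toy model of the accounting gap (hypothetical counters, not bfq code).
 * busy_groups follows the patch's criterion: groups owning a busy bfqq.
 * pending_groups follows the criterion defended here: groups owning
 * requests that have not yet completed. */
int busy_groups;
int pending_groups;

void issue_io(void)         { busy_groups++; pending_groups++; }
void dispatch_last_io(void) { busy_groups--; /* request still in flight */ }
void complete_io(void)      { pending_groups--; }

/* Idling is needed only when more than one group competes. */
int asymmetric(int groups)  { return groups > 1; }

/* Replays the numbered steps from the mail below:
 * 1. io1 issued from bfqq1 (bfqg1)
 * 2. io1 dispatched; bfqq1 leaves the busy state
 * 3. before io1 completes, io2 issued from bfqq2 (bfqg2)
 * Returns 1 when the two criteria disagree at that point. */
int criteria_disagree(void)
{
    busy_groups = pending_groups = 0;
    issue_io();          /* step 1 */
    dispatch_last_io();  /* step 2 */
    issue_io();          /* step 3 */
    return asymmetric(busy_groups) != asymmetric(pending_groups);
}
```

At step 3 the busy-based view sees one group (symmetric, idling off) while the pending-based view sees two (asymmetric, idling required); that disagreement is exactly the unprotected window.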
>>>>>
>>>>> Hi,Paolo
>>>>>
>>>>> Thanks for your explanation, are you thinking that if bfqq doesn't enjoy
>>>>> idling, then such bfqq will clear busy after dispatching the last
>>>>> request?
>>>>>
>>>>> Please kindly correct me if I'm wrong in the following process:
>>>>>
>>>>> If there is more than one activated bfqg, then bfqqs that are
>>>>> not enjoying idling are still left busy after dispatching the last
>>>>> request.
>>>>>
>>>>> details in __bfq_bfqq_expire:
>>>>>
>>>>>         if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
>>>>>             !(reason == BFQQE_PREEMPTED &&
>>>>>               idling_needed_for_service_guarantees(bfqd, bfqq))) {
>>>>> -> idling_needed_for_service_guarantees will always return true,
>>>>
>>>> It returns true only if the scenario is asymmetric.  Not counting bfqqs
>>>> with in-flight requests makes an asymmetric scenario be considered
>>>> wrongly symmetric.  See function bfq_asymmetric_scenario().
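For reference, the check under discussion looks roughly like this (a simplified paraphrase of bfq_asymmetric_scenario() in block/bfq-iosched.c, not the verbatim kernel code):

```c
/* Simplified paraphrase of bfq_asymmetric_scenario(): the scenario is
 * asymmetric -- and idling must be kept on -- when queue weights differ,
 * when more than one I/O class is busy, or (with group scheduling) when
 * groups with pending requests exist.  The whole thread is about what the
 * last operand should count. */
int asymmetric_scenario_sketch(int varied_queue_weights,
                               int multiple_classes_busy,
                               int num_groups_with_pending_reqs)
{
    return varied_queue_weights ||
           multiple_classes_busy ||
           num_groups_with_pending_reqs > 0;
}
```

Before the patchset the root group is never counted, so `> 0` already means "some non-root group is active"; once the root group is counted as well, the threshold has to become `> 1` (the subject of patch 3, "do not idle if only one group is activated").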
>>>
>>> Hi, Paolo
>>>
>>> Do you mean this gap?
>>>
>>> 1. io1 is issued from bfqq1(from bfqg1)
>>> 2. bfqq1 dispatched this io, its busy state is cleared
>>> 3. *before io1 is completed*, io2 is issued from bfqq2(bfqg2)
>>
>> Yes. So as far as I understand Paolo is concerned about this scenario.
>>
>>> 4. with this patchset, while dispatching io2 from bfqq2, the scenario
>>> should be asymmetric while it's wrongly considered symmetric.
>>
>> But with this patchset, we will consider this scenario symmetric because at
>> any point in time there is only one busy bfqq. Before, we considered this
>> scenario asymmetric because two different bfq groups have bfqq in their
>> weights_tree. So before this patchset
>> idling_needed_for_service_guarantees() returned true, after this patchset
>> the function returns false so we won't idle anymore and Paolo argues that
>> bfqq1 does not get adequate protection from bfqq2 as a result.
>>
>> I agree with Paolo this seems possible. The fix is relatively simple though
>> - instead of changing how weights_tree is used for weight raised queues as
>> you did originally, I'd move the accounting of groups with pending requests
>> to bfq_add/del_bfqq_busy() and bfq_completed_request().
>>
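Jan's suggestion can be sketched like this (hypothetical helper names and toy types; the real patch would wire the updates into bfq_add/del_bfqq_busy() and bfq_completed_request()): a group stays counted while it has busy queues or dispatched-but-uncompleted requests.

```c
#include <assert.h>

/* Sketch of the suggested accounting (toy structs, not the kernel ones).
 * A group contributes to num_groups_with_pending_reqs while it has busy
 * queues *or* requests still in flight, so a queue whose last request is
 * being served keeps its group counted until the request completes. */
struct group_sketch {
    int busy_queues;  /* queues with requests queued in bfq   */
    int inflight;     /* dispatched, not yet completed        */
    int counted;      /* currently counted in the global stat */
};

int num_groups_with_pending_reqs;

void update_group_count(struct group_sketch *g)
{
    int should_count = g->busy_queues > 0 || g->inflight > 0;

    if (should_count && !g->counted) {
        g->counted = 1;
        num_groups_with_pending_reqs++;
    } else if (!should_count && g->counted) {
        g->counted = 0;
        num_groups_with_pending_reqs--;
    }
}

/* Would be driven from bfq_add_bfqq_busy() */
void group_add_busy(struct group_sketch *g)
{
    g->busy_queues++;
    update_group_count(g);
}

/* Would be driven when a queue's last queued request is dispatched */
void group_dispatch_last(struct group_sketch *g)
{
    g->busy_queues--;
    g->inflight++;
    update_group_count(g);
}

/* Would be driven from bfq_completed_request() */
void group_complete(struct group_sketch *g)
{
    g->inflight--;
    update_group_count(g);
}
```

With this shape, the scenario from the thread stays asymmetric while io1 is in flight and only becomes symmetric once the completion path runs.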
> 
> Why don't we use simply the existing logic? I mean, as for the changes made by this patch, we could simply turn the loop:
> 
> void bfq_weights_tree_remove(struct bfq_data *bfqd,
> 			     struct bfq_queue *bfqq)
> {
> 	...
> 	for_each_entity(entity) {
> 		struct bfq_sched_data *sd = entity->my_sched_data;
> 
> 		...
> 		if (entity->in_groups_with_pending_reqs) {
> 			entity->in_groups_with_pending_reqs = false;
> 			bfqd->num_groups_with_pending_reqs--;
> 		}
> 	}
> 	...
> }
> 
> into a single:
> 
> 	bfqd->num_groups_with_pending_reqs--;
> 
> so that only the parent group is concerned.

It's ok to decrease it here; however, we need another place to increase
it in order to count the root group... And bfq_weights_tree_add() is not
a good fit because it bypasses wr queues.
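The wr-queue problem can be made concrete with a toy check (hypothetical function name, mirroring only the early return that bfq's weights-tree insertion takes for weight-raised queues): tying the increment to weights_tree insertion would skip such queues entirely.

```c
#include <assert.h>

/* Toy illustration of why weights-tree insertion is a poor increment site
 * (hypothetical function, not kernel code): the insertion path bails out
 * early for weight-raised (wr) queues, so a group whose only active queue
 * is weight-raised would never get its counter incremented there. */
int counted_via_weights_tree_add(int queue_is_weight_raised)
{
    if (queue_is_weight_raised)
        return 0;  /* insertion returns early for wr queues  */
    return 1;      /* only here could the group be counted   */
}
```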

Thanks,
Kuai


* Re: [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
@ 2022-06-02  1:05                       ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2022-06-02  1:05 UTC (permalink / raw)
  To: Paolo Valente, Jan Kara
  Cc: Jens Axboe, Tejun Heo, cgroups, linux-block, LKML, yi.zhang

On 2022/05/31 20:57, Paolo Valente wrote:
> 
> 
>> On 31 May 2022, at 12:01, Jan Kara <jack@suse.cz> wrote:
>>
>> On Tue 31-05-22 17:33:25, Yu Kuai wrote:
>>> On 2022/05/31 17:19, Paolo Valente wrote:
>>>>> On 31 May 2022, at 11:06, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>>
>>>>> On 2022/05/31 16:36, Paolo VALENTE wrote:
>>>>>>> On 30 May 2022, at 10:40, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>>>>
>>>>>>> On 2022/05/30 16:34, Yu Kuai wrote:
>>>>>>>> On 2022/05/30 16:10, Paolo Valente wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 28 May 2022, at 11:50, Yu Kuai <yukuai3@huawei.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Currently, bfq can't handle sync io concurrently as long as they
>>>>>>>>>> are not issued from root group. This is because
>>>>>>>>>> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
>>>>>>>>>> bfq_asymmetric_scenario().
>>>>>>>>>>
>>>>>>>>>> The way that bfqg is counted into 'num_groups_with_pending_reqs':
>>>>>>>>>>
>>>>>>>>>> Before this patch:
>>>>>>>>>> 1) root group will never be counted.
>>>>>>>>>> 2) Count if bfqg or its child bfqgs have pending requests.
>>>>>>>>>> 3) Don't count if bfqg and its child bfqgs complete all the requests.
>>>>>>>>>>
>>>>>>>>>> After this patch:
>>>>>>>>>> 1) root group is counted.
>>>>>>>>>> 2) Count if bfqg has at least one bfqq that is marked busy.
>>>>>>>>>> 3) Don't count if bfqg doesn't have any busy bfqqs.
>>>>>>>>>
>>>>>>>>> Unfortunately, I see a last problem here. I see a double change:
>>>>>>>>> (1) a bfqg is now counted only as a function of the state of its child
>>>>>>>>>       queues, and not of also its child bfqgs
>>>>>>>>> (2) the state considered for counting a bfqg moves from having pending
>>>>>>>>>       requests to having busy queues
>>>>>>>>>
>>>>>>>>> I'm ok with (1), which is a good catch (you already explained
>>>>>>>>> the idea to me some time ago IIRC).
>>>>>>>>>
>>>>>>>>> Yet I fear that (2) is not ok.  A bfqq can become non busy even if it
>>>>>>>>> still has in-flight I/O, i.e.  I/O being served in the drive.  The
>>>>>>>>> weight of such a bfqq must still be considered in the weights_tree,
>>>>>>>>> and the group containing such a queue must still be counted when
>>>>>>>>> checking whether the scenario is asymmetric.  Otherwise service
>>>>>>>>> guarantees are broken.  The reason is that, if a scenario is deemed as
>>>>>>>>> symmetric because in-flight I/O is not taken into account, then idling
>>>>>>>>> will not be performed to protect some bfqq, and in-flight I/O may
>>>>>>>>> steal bandwidth to that bfqq in an uncontrolled way.
>>>>>>>> Hi, Paolo
>>>>>>>> Thanks for your explanation.
>>>>>>>> My original thought was to use weights_tree insertion/removal;
>>>>>>>> however, Jan convinced me that using bfq_add/del_bfqq_busy() is ok.
>>>>>>>> From what I see, when bfqq dispatches the last request,
>>>>>>>> bfq_del_bfqq_busy() will not be called from __bfq_bfqq_expire() if
>>>>>>>> idling is needed; it is delayed until such bfqq gets scheduled as
>>>>>>>> the in-service queue again. Which means the weight of such bfqq
>>>>>>>> should still be considered in the weights_tree.
>>>>>>>> I also ran some tests on null_blk with "irqmode=2
>>>>>>>> completion_nsec=100000000(100ms) hw_queue_depth=1", and the tests
>>>>>>>> show that service guarantees are still preserved on a slow device.
>>>>>>>> Do you think this is strong enough to cover your concern?
>>>>>> Unfortunately it is not.  Your very argument is what made me believe
>>>>>> that considering busy queues was enough, in the first place.  But, as
>>>>>> I found out, the problem is caused by the queues that do not enjoy
>>>>>> idling.  With your patch (as well as in my initial version) they are
>>>>>> not counted when they remain without requests queued.  And this makes
>>>>>> asymmetric scenarios be considered erroneously as symmetric.  The
>>>>>> consequence is that idling gets switched off when it had to be kept
>>>>>> on, and control on bandwidth is lost for the victim in-service queues.
>>>>>
>>>>> Hi,Paolo
>>>>>
>>>>> Thanks for your explanation, are you thinking that if bfqq doesn't enjoy
>>>>> idling, then such bfqq will clear busy after dispatching the last
>>>>> request?
>>>>>
>>>>> Please kindly correct me if I'm wrong in the following process:
>>>>>
>>>>> If there is more than one activated bfqg, then bfqqs that are
>>>>> not enjoying idling are still left busy after dispatching the last
>>>>> request.
>>>>>
>>>>> details in __bfq_bfqq_expire:
>>>>>
>>>>>         if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
>>>>>             !(reason == BFQQE_PREEMPTED &&
>>>>>               idling_needed_for_service_guarantees(bfqd, bfqq))) {
>>>>> -> idling_needed_for_service_guarantees will always return true,
>>>>
>>>> It returns true only if the scenario is asymmetric.  Not counting bfqqs
>>>> with in-flight requests makes an asymmetric scenario be considered
>>>> wrongly symmetric.  See function bfq_asymmetric_scenario().
>>>
>>> Hi, Paolo
>>>
>>> Do you mean this gap?
>>>
>>> 1. io1 is issued from bfqq1(from bfqg1)
>>> 2. bfqq1 dispatched this io, its busy state is cleared
>>> 3. *before io1 is completed*, io2 is issued from bfqq2(bfqg2)
>>
>> Yes. So as far as I understand Paolo is concerned about this scenario.
>>
>>> 4. with this patchset, while dispatching io2 from bfqq2, the scenario
>>> should be asymmetric while it's wrongly considered symmetric.
>>
>> But with this patchset, we will consider this scenario symmetric because at
>> any point in time there is only one busy bfqq. Before, we considered this
>> scenario asymmetric because two different bfq groups have bfqq in their
>> weights_tree. So before this patchset
>> idling_needed_for_service_guarantees() returned true, after this patchset
>> the function returns false so we won't idle anymore and Paolo argues that
>> bfqq1 does not get adequate protection from bfqq2 as a result.
>>
>> I agree with Paolo this seems possible. The fix is relatively simple though
>> - instead of changing how weights_tree is used for weight raised queues as
>> you did originally, I'd move the accounting of groups with pending requests
>> to bfq_add/del_bfqq_busy() and bfq_completed_request().
>>
> 
> Why don't we use simply the existing logic? I mean, as for the changes made by this patch, we could simply turn the loop:
> 
> void bfq_weights_tree_remove(struct bfq_data *bfqd,
> 			     struct bfq_queue *bfqq)
> {
> 	...
> 	for_each_entity(entity) {
> 		struct bfq_sched_data *sd = entity->my_sched_data;
> 
> 		...
> 		if (entity->in_groups_with_pending_reqs) {
> 			entity->in_groups_with_pending_reqs = false;
> 			bfqd->num_groups_with_pending_reqs--;
> 		}
> 	}
> 	...
> }
> 
> into a single:
> 
> 	bfqd->num_groups_with_pending_reqs--;
> 
> so that only the parent group is concerned.
> 
> Thanks,
> Paolo

Hi, Paolo

Can you please check whether this patchset (v9) was delivered to you? There
are still some problems with our mail server...

https://patchwork.kernel.org/project/linux-block/cover/20220601114340.949953-1-yukuai3@huawei.com/

Thanks,
Kuai


end of thread, other threads:[~2022-06-02  1:05 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-28  9:50 [PATCH -next v7 0/3] support concurrent sync io for bfq on a specail occasion Yu Kuai
2022-05-28  9:50 ` [PATCH -next v7 1/3] block, bfq: record how many queues are busy in bfq_group Yu Kuai
2022-05-28  9:50 ` [PATCH -next v7 2/3] block, bfq: refactor the counting of 'num_groups_with_pending_reqs' Yu Kuai
2022-05-30  8:10   ` Paolo Valente
2022-05-30  8:34     ` Yu Kuai
     [not found]       ` <efe01dd1-0f99-dadf-956d-b0e80e1e602c@huawei.com>
2022-05-31  8:36         ` Paolo VALENTE
2022-05-31  9:06           ` Yu Kuai
2022-05-31  9:19             ` Paolo Valente
2022-05-31  9:24               ` Yu Kuai
2022-05-31  9:33               ` Yu Kuai
2022-05-31 10:01                 ` Jan Kara
2022-05-31 10:59                   ` Yu Kuai
2022-05-31 12:57                   ` Paolo Valente
2022-05-31 13:28                     ` Yu Kuai
2022-06-02  1:05                     ` Yu Kuai
2022-05-28  9:50 ` [PATCH -next v7 3/3] block, bfq: do not idle if only one group is activated Yu Kuai
