* [PATCH v3 0/2] sched/fair: optimize and simplify rq
From: Chengming Zhou @ 2022-05-26 10:39 UTC
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, odin
  Cc: linux-kernel, duanxiongchun, songmuchun, Chengming Zhou

Hi,

This version is based on [1], which tries not to put cfs_rqs in a
throttled hierarchy on the rq leaf_cfs_rq_list, simplifying the
leaf_cfs_rq_list maintenance and optimizing the periodic load decay.

Vincent suggested moving throttled_hierarchy() outside list_add_leaf_cfs_rq(),
because the cfs_rq will not be added to the list in that case, which is
quite misleading.

While making that change in propagate_entity_cfs_rq(), I found a problem
with upward propagation. The fix is in the first patch, kept separate so
it is easier to backport; please see the details in its commit message.

[1] https://lore.kernel.org/all/20220427160544.40309-1-zhouchengming@bytedance.com/

Chengming Zhou (2):
  sched/fair: fix propagate during synchronous attach/detach
  sched/fair: optimize and simplify rq leaf_cfs_rq_list

 kernel/sched/fair.c | 87 ++++++++++++---------------------------------
 1 file changed, 22 insertions(+), 65 deletions(-)

-- 
2.36.1



* [PATCH v3 1/2] sched/fair: fix propagate during synchronous attach/detach
From: Chengming Zhou @ 2022-05-26 10:39 UTC
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, odin
  Cc: linux-kernel, duanxiongchun, songmuchun, Chengming Zhou

When a task moves from/to a cfs_rq, we first detach/attach the load_avg
of se from/to that cfs_rq, then propagate the changes across the tg tree
to make them visible to the root, which is done in update_load_avg().

But the current code breaks out of the loop when it encounters an
on_list cfs_rq, so the changes can't be propagated up to the root
cfs_rq. That also mismatches the comment of propagate_entity_cfs_rq(),
which says "Propagate the changes of the sched_entity across the tg tree
to make it visible to the root".

The second problem is that it doesn't call update_load_avg() for a
throttled cfs_rq, so the load changes can't be propagated upwards.

    A
    |
    B  --> throttled cfs_rq
   /
  C

The prop_runnable_sum of C won't be propagated to B, and so won't be
propagated to A.
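
For illustration, here is an annotated sketch of the pre-fix loop
(simplified from propagate_entity_cfs_rq(); the helper names are the real
ones, the comments are editorial annotations, not kernel comments):

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		if (!cfs_rq_throttled(cfs_rq)) {
			update_load_avg(cfs_rq, se, UPDATE_TG);
			list_add_leaf_cfs_rq(cfs_rq);
			continue;
		}

		/*
		 * For a throttled cfs_rq (B above), update_load_avg() is
		 * skipped, so C's prop_runnable_sum never reaches B or A;
		 * and when list_add_leaf_cfs_rq() reports the branch is
		 * already complete, the loop breaks and propagation stops
		 * before reaching the root.
		 */
		if (list_add_leaf_cfs_rq(cfs_rq))
			break;
	}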

Fixes: 0258bdfaff5b ("sched/fair: Fix unfairness caused by missing load decay")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 906b2c7c48d1..c6da204f3068 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11267,14 +11267,8 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 
-		if (!cfs_rq_throttled(cfs_rq)){
-			update_load_avg(cfs_rq, se, UPDATE_TG);
-			list_add_leaf_cfs_rq(cfs_rq);
-			continue;
-		}
-
-		if (list_add_leaf_cfs_rq(cfs_rq))
-			break;
+		update_load_avg(cfs_rq, se, UPDATE_TG);
+		list_add_leaf_cfs_rq(cfs_rq);
 	}
 }
 #else
-- 
2.36.1



* [PATCH v3 2/2] sched/fair: optimize and simplify rq leaf_cfs_rq_list
From: Chengming Zhou @ 2022-05-26 10:39 UTC
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, odin
  Cc: linux-kernel, duanxiongchun, songmuchun, Chengming Zhou

We noticed that the rq leaf_cfs_rq_list has two problems while doing
bugfix backports and some test profiling.

1. cfs_rqs under a throttled subtree could be added to the list, pulling
   their fully decayed ancestors onto the list too, even though they are
   not needed.

2. #1 also makes the leaf_cfs_rq_list management complex and error-prone;
   this is the list of related bugfixes so far:

   commit 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
   commit fe61468b2cbc ("sched/fair: Fix enqueue_task_fair warning")
   commit b34cb07dde7c ("sched/fair: Fix enqueue_task_fair() warning some more")
   commit 39f23ce07b93 ("sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list")
   commit 0258bdfaff5b ("sched/fair: Fix unfairness caused by missing load decay")
   commit a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
   commit fdaba61ef8a2 ("sched/fair: Ensure that the CFS parent is added after unthrottling")
   commit 2630cde26711 ("sched/fair: Add ancestors of unthrottled undecayed cfs_rq")

commit 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
deleted every cfs_rq under a throttled subtree from rq->leaf_cfs_rq_list
and removed the throttled_hierarchy() test in update_blocked_averages(),
which optimized update_blocked_averages().

But the later bugfixes added cfs_rqs under a throttled subtree back to
rq->leaf_cfs_rq_list, together with their fully decayed ancestors, to
preserve the integrity of rq->leaf_cfs_rq_list.

This patch takes another approach: skip all cfs_rqs under a throttled
hierarchy when calling list_add_leaf_cfs_rq(), keeping cfs_rqs under a
throttled subtree off the leaf_cfs_rq_list entirely.

So we no longer need to handle throttling in enqueue_entity(),
unthrottle_cfs_rq() and enqueue_task_fair(), which simplifies the code a
lot. It also optimizes update_blocked_averages(), since cfs_rqs under a
throttled hierarchy and their fully decayed ancestors won't be on the
leaf_cfs_rq_list.
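
For context, here is a heavily simplified sketch of the periodic decay
walk that this shortens (loosely modeled on __update_blocked_fair() in
kernel/sched/fair.c; most of the loop body is elided and the comments are
editorial):

	for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) {
		/* Decay the blocked load of this cfs_rq. */
		if (update_cfs_rq_load_avg(cfs_rq_clock_pelt(cfs_rq), cfs_rq))
			update_tg_load_avg(cfs_rq);

		/* Fully decayed cfs_rqs are dropped from the list. */
		if (cfs_rq_is_decayed(cfs_rq))
			list_del_leaf_cfs_rq(cfs_rq);
	}

With throttled subtrees (and the fully decayed ancestors they used to
drag onto the list) never on the list, each walk visits fewer cfs_rqs.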

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
v3:
 - fix !CONFIG_FAIR_GROUP_SCHED build error, reported by
   kernel test robot <lkp@intel.com>

v2:
 - move throttled_hierarchy() outside list_add_leaf_cfs_rq(),
   suggested by Vincent.
---
 kernel/sched/fair.c | 81 ++++++++++++---------------------------------
 1 file changed, 22 insertions(+), 59 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c6da204f3068..48ff4330d68a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3114,6 +3114,8 @@ void reweight_task(struct task_struct *p, int prio)
 	load->inv_weight = sched_prio_to_wmult[prio];
 }
 
+static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_SMP
 /*
@@ -3224,8 +3226,6 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_SMP */
 
-static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
-
 /*
  * Recomputes the group entity based on the current state of its group
  * runqueue.
@@ -4338,16 +4338,11 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		__enqueue_entity(cfs_rq, se);
 	se->on_rq = 1;
 
-	/*
-	 * When bandwidth control is enabled, cfs might have been removed
-	 * because of a parent been throttled but cfs->nr_running > 1. Try to
-	 * add it unconditionally.
-	 */
-	if (cfs_rq->nr_running == 1 || cfs_bandwidth_used())
-		list_add_leaf_cfs_rq(cfs_rq);
-
-	if (cfs_rq->nr_running == 1)
+	if (cfs_rq->nr_running == 1) {
 		check_enqueue_throttle(cfs_rq);
+		if (!throttled_hierarchy(cfs_rq))
+			list_add_leaf_cfs_rq(cfs_rq);
+	}
 }
 
 static void __clear_buddies_last(struct sched_entity *se)
@@ -4962,11 +4957,18 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	/* update hierarchical throttle state */
 	walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
 
-	/* Nothing to run but something to decay (on_list)? Complete the branch */
 	if (!cfs_rq->load.weight) {
-		if (cfs_rq->on_list)
-			goto unthrottle_throttle;
-		return;
+		if (!cfs_rq->on_list)
+			return;
+		/*
+		 * Nothing to run but something to decay (on_list)?
+		 * Complete the branch.
+		 */
+		for_each_sched_entity(se) {
+			if (list_add_leaf_cfs_rq(cfs_rq_of(se)))
+				break;
+		}
+		goto unthrottle_throttle;
 	}
 
 	task_delta = cfs_rq->h_nr_running;
@@ -5004,31 +5006,12 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(qcfs_rq))
 			goto unthrottle_throttle;
-
-		/*
-		 * One parent has been throttled and cfs_rq removed from the
-		 * list. Add it back to not break the leaf list.
-		 */
-		if (throttled_hierarchy(qcfs_rq))
-			list_add_leaf_cfs_rq(qcfs_rq);
 	}
 
 	/* At this point se is NULL and we are at root level*/
 	add_nr_running(rq, task_delta);
 
 unthrottle_throttle:
-	/*
-	 * The cfs_rq_throttled() breaks in the above iteration can result in
-	 * incomplete leaf list maintenance, resulting in triggering the
-	 * assertion below.
-	 */
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
-		if (list_add_leaf_cfs_rq(qcfs_rq))
-			break;
-	}
-
 	assert_list_leaf_cfs_rq(rq);
 
 	/* Determine whether we need to wake up potentially idle CPU: */
@@ -5683,13 +5666,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
 			goto enqueue_throttle;
-
-               /*
-                * One parent has been throttled and cfs_rq removed from the
-                * list. Add it back to not break the leaf list.
-                */
-               if (throttled_hierarchy(cfs_rq))
-                       list_add_leaf_cfs_rq(cfs_rq);
 	}
 
 	/* At this point se is NULL and we are at root level*/
@@ -5713,21 +5689,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_overutilized_status(rq);
 
 enqueue_throttle:
-	if (cfs_bandwidth_used()) {
-		/*
-		 * When bandwidth control is enabled; the cfs_rq_throttled()
-		 * breaks in the above iteration can result in incomplete
-		 * leaf list maintenance, resulting in triggering the assertion
-		 * below.
-		 */
-		for_each_sched_entity(se) {
-			cfs_rq = cfs_rq_of(se);
-
-			if (list_add_leaf_cfs_rq(cfs_rq))
-				break;
-		}
-	}
-
 	assert_list_leaf_cfs_rq(rq);
 
 	hrtick_update(rq);
@@ -11257,9 +11218,10 @@ static inline bool vruntime_normalized(struct task_struct *p)
  */
 static void propagate_entity_cfs_rq(struct sched_entity *se)
 {
-	struct cfs_rq *cfs_rq;
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-	list_add_leaf_cfs_rq(cfs_rq_of(se));
+	if (!throttled_hierarchy(cfs_rq))
+		list_add_leaf_cfs_rq(cfs_rq);
 
 	/* Start to propagate at parent */
 	se = se->parent;
@@ -11268,7 +11230,8 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
 		cfs_rq = cfs_rq_of(se);
 
 		update_load_avg(cfs_rq, se, UPDATE_TG);
-		list_add_leaf_cfs_rq(cfs_rq);
+		if (!throttled_hierarchy(cfs_rq))
+			list_add_leaf_cfs_rq(cfs_rq);
 	}
 }
 #else
-- 
2.36.1



* Re: [PATCH v3 1/2] sched/fair: fix propagate during synchronous attach/detach
From: Vincent Guittot @ 2022-05-30  7:31 UTC
  To: Chengming Zhou
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, odin, linux-kernel, duanxiongchun,
	songmuchun

On Thu, 26 May 2022 at 12:39, Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> When a task moves from/to a cfs_rq, we first detach/attach the load_avg
> of se from/to that cfs_rq, then propagate the changes across the tg tree
> to make them visible to the root, which is done in update_load_avg().
>
> But the current code breaks out of the loop when it encounters an
> on_list cfs_rq,

It breaks only when the cfs_rq is throttled and the full branch is on the
list, because in this case we only want to make sure that the branch is
correctly ordered in the list.

> so the changes can't be propagated up to the root cfs_rq. That also
> mismatches the comment of propagate_entity_cfs_rq(), which says
> "Propagate the changes of the sched_entity across the tg tree to make
> it visible to the root".
>
> The second problem is that it doesn't call update_load_avg() for a
> throttled cfs_rq, so the load changes can't be propagated upwards.

If the cfs_rq is throttled, its sched_entity has been dequeued and its
load is not accounted in the parent, so there is nothing to propagate.

>
>     A
>     |
>     B  --> throttled cfs_rq
>    /
>   C
>
> The prop_runnable_sum of C won't be propagated to B, and so won't be
> propagated to A.
>
> Fixes: 0258bdfaff5b ("sched/fair: Fix unfairness caused by missing load decay")
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  kernel/sched/fair.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 906b2c7c48d1..c6da204f3068 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11267,14 +11267,8 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>
> -               if (!cfs_rq_throttled(cfs_rq)){
> -                       update_load_avg(cfs_rq, se, UPDATE_TG);
> -                       list_add_leaf_cfs_rq(cfs_rq);
> -                       continue;
> -               }
> -
> -               if (list_add_leaf_cfs_rq(cfs_rq))
> -                       break;
> +               update_load_avg(cfs_rq, se, UPDATE_TG);
> +               list_add_leaf_cfs_rq(cfs_rq);
>         }
>  }
>  #else
> --
> 2.36.1
>


* Re: [PATCH v3 2/2] sched/fair: optimize and simplify rq leaf_cfs_rq_list
From: Vincent Guittot @ 2022-05-30  7:52 UTC
  To: Chengming Zhou
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, odin, linux-kernel, duanxiongchun,
	songmuchun

On Thu, 26 May 2022 at 12:40, Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> We noticed that the rq leaf_cfs_rq_list has two problems while doing
> bugfix backports and some test profiling.
>
> 1. cfs_rqs under a throttled subtree could be added to the list, pulling
>    their fully decayed ancestors onto the list too, even though they are
>    not needed.
>
> 2. #1 also makes the leaf_cfs_rq_list management complex and error-prone;
>    this is the list of related bugfixes so far:
>
>    commit 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
>    commit fe61468b2cbc ("sched/fair: Fix enqueue_task_fair warning")
>    commit b34cb07dde7c ("sched/fair: Fix enqueue_task_fair() warning some more")
>    commit 39f23ce07b93 ("sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list")
>    commit 0258bdfaff5b ("sched/fair: Fix unfairness caused by missing load decay")
>    commit a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
>    commit fdaba61ef8a2 ("sched/fair: Ensure that the CFS parent is added after unthrottling")
>    commit 2630cde26711 ("sched/fair: Add ancestors of unthrottled undecayed cfs_rq")
>
> commit 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
> deleted every cfs_rq under a throttled subtree from rq->leaf_cfs_rq_list
> and removed the throttled_hierarchy() test in update_blocked_averages(),
> which optimized update_blocked_averages().
>
> But the later bugfixes added cfs_rqs under a throttled subtree back to
> rq->leaf_cfs_rq_list, together with their fully decayed ancestors, to
> preserve the integrity of rq->leaf_cfs_rq_list.
>
> This patch takes another approach: skip all cfs_rqs under a throttled
> hierarchy when calling list_add_leaf_cfs_rq(), keeping cfs_rqs under a
> throttled subtree off the leaf_cfs_rq_list entirely.
>
> So we no longer need to handle throttling in enqueue_entity(),
> unthrottle_cfs_rq() and enqueue_task_fair(), which simplifies the code a
> lot. It also optimizes update_blocked_averages(), since cfs_rqs under a
> throttled hierarchy and their fully decayed ancestors won't be on the
> leaf_cfs_rq_list.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
> v3:
>  - fix !CONFIG_FAIR_GROUP_SCHED build error, reported by
>    kernel test robot <lkp@intel.com>
>
> v2:
>  - move throttled_hierarchy() outside list_add_leaf_cfs_rq(),
>    suggested by Vincent.
> ---
>  kernel/sched/fair.c | 81 ++++++++++++---------------------------------
>  1 file changed, 22 insertions(+), 59 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c6da204f3068..48ff4330d68a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3114,6 +3114,8 @@ void reweight_task(struct task_struct *p, int prio)
>         load->inv_weight = sched_prio_to_wmult[prio];
>  }
>
> +static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
> +
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  #ifdef CONFIG_SMP
>  /*
> @@ -3224,8 +3226,6 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
>  }
>  #endif /* CONFIG_SMP */
>
> -static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
> -
>  /*
>   * Recomputes the group entity based on the current state of its group
>   * runqueue.
> @@ -4338,16 +4338,11 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>                 __enqueue_entity(cfs_rq, se);
>         se->on_rq = 1;
>
> -       /*
> -        * When bandwidth control is enabled, cfs might have been removed
> -        * because of a parent been throttled but cfs->nr_running > 1. Try to
> -        * add it unconditionally.
> -        */
> -       if (cfs_rq->nr_running == 1 || cfs_bandwidth_used())
> -               list_add_leaf_cfs_rq(cfs_rq);
> -
> -       if (cfs_rq->nr_running == 1)
> +       if (cfs_rq->nr_running == 1) {
>                 check_enqueue_throttle(cfs_rq);
> +               if (!throttled_hierarchy(cfs_rq))
> +                       list_add_leaf_cfs_rq(cfs_rq);
> +       }
>  }
>
>  static void __clear_buddies_last(struct sched_entity *se)
> @@ -4962,11 +4957,18 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>         /* update hierarchical throttle state */
>         walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
>
> -       /* Nothing to run but something to decay (on_list)? Complete the branch */
>         if (!cfs_rq->load.weight) {
> -               if (cfs_rq->on_list)
> -                       goto unthrottle_throttle;
> -               return;
> +               if (!cfs_rq->on_list)
> +                       return;
> +               /*
> +                * Nothing to run but something to decay (on_list)?
> +                * Complete the branch.
> +                */
> +               for_each_sched_entity(se) {
> +                       if (list_add_leaf_cfs_rq(cfs_rq_of(se)))
> +                               break;
> +               }
> +               goto unthrottle_throttle;
>         }
>
>         task_delta = cfs_rq->h_nr_running;
> @@ -5004,31 +5006,12 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>                 /* end evaluation on encountering a throttled cfs_rq */
>                 if (cfs_rq_throttled(qcfs_rq))
>                         goto unthrottle_throttle;
> -
> -               /*
> -                * One parent has been throttled and cfs_rq removed from the
> -                * list. Add it back to not break the leaf list.
> -                */
> -               if (throttled_hierarchy(qcfs_rq))
> -                       list_add_leaf_cfs_rq(qcfs_rq);
>         }
>
>         /* At this point se is NULL and we are at root level*/
>         add_nr_running(rq, task_delta);
>
>  unthrottle_throttle:
> -       /*
> -        * The cfs_rq_throttled() breaks in the above iteration can result in
> -        * incomplete leaf list maintenance, resulting in triggering the
> -        * assertion below.
> -        */
> -       for_each_sched_entity(se) {
> -               struct cfs_rq *qcfs_rq = cfs_rq_of(se);
> -
> -               if (list_add_leaf_cfs_rq(qcfs_rq))
> -                       break;
> -       }
> -
>         assert_list_leaf_cfs_rq(rq);
>
>         /* Determine whether we need to wake up potentially idle CPU: */
> @@ -5683,13 +5666,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>                 /* end evaluation on encountering a throttled cfs_rq */
>                 if (cfs_rq_throttled(cfs_rq))
>                         goto enqueue_throttle;
> -
> -               /*
> -                * One parent has been throttled and cfs_rq removed from the
> -                * list. Add it back to not break the leaf list.
> -                */
> -               if (throttled_hierarchy(cfs_rq))
> -                       list_add_leaf_cfs_rq(cfs_rq);
>         }
>
>         /* At this point se is NULL and we are at root level*/
> @@ -5713,21 +5689,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>                 update_overutilized_status(rq);
>
>  enqueue_throttle:
> -       if (cfs_bandwidth_used()) {
> -               /*
> -                * When bandwidth control is enabled; the cfs_rq_throttled()
> -                * breaks in the above iteration can result in incomplete
> -                * leaf list maintenance, resulting in triggering the assertion
> -                * below.
> -                */
> -               for_each_sched_entity(se) {
> -                       cfs_rq = cfs_rq_of(se);
> -
> -                       if (list_add_leaf_cfs_rq(cfs_rq))
> -                               break;
> -               }
> -       }
> -
>         assert_list_leaf_cfs_rq(rq);
>
>         hrtick_update(rq);
> @@ -11257,9 +11218,10 @@ static inline bool vruntime_normalized(struct task_struct *p)
>   */
>  static void propagate_entity_cfs_rq(struct sched_entity *se)
>  {
> -       struct cfs_rq *cfs_rq;
> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>
> -       list_add_leaf_cfs_rq(cfs_rq_of(se));
> +       if (!throttled_hierarchy(cfs_rq))
> +               list_add_leaf_cfs_rq(cfs_rq);
>
>         /* Start to propagate at parent */
>         se = se->parent;
> @@ -11268,7 +11230,8 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
>                 cfs_rq = cfs_rq_of(se);
>

You can break if the cfs_rq is throttled, because its sched_entity has
been dequeued. In this case we check whether the cfs_rq itself is
throttled, not whether the hierarchy is throttled:

+               if (cfs_rq_throttled(cfs_rq))
+                       break;

>                 update_load_avg(cfs_rq, se, UPDATE_TG);
> -               list_add_leaf_cfs_rq(cfs_rq);
> +               if (!throttled_hierarchy(cfs_rq))
> +                       list_add_leaf_cfs_rq(cfs_rq);
>         }
>  }
>  #else
> --
> 2.36.1
>


* Re: [External] Re: [PATCH v3 2/2] sched/fair: optimize and simplify rq leaf_cfs_rq_list
From: Chengming Zhou @ 2022-05-31 13:55 UTC
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, odin, linux-kernel, duanxiongchun,
	songmuchun

Hi,

On 2022/5/30 15:52, Vincent Guittot wrote:
> On Thu, 26 May 2022 at 12:40, Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
>>
[...]
>> @@ -11257,9 +11218,10 @@ static inline bool vruntime_normalized(struct task_struct *p)
>>   */
>>  static void propagate_entity_cfs_rq(struct sched_entity *se)
>>  {
>> -       struct cfs_rq *cfs_rq;
>> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>
>> -       list_add_leaf_cfs_rq(cfs_rq_of(se));
>> +       if (!throttled_hierarchy(cfs_rq))
>> +               list_add_leaf_cfs_rq(cfs_rq);
>>
>>         /* Start to propagate at parent */
>>         se = se->parent;
>> @@ -11268,7 +11230,8 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
>>                 cfs_rq = cfs_rq_of(se);
>>
> 
> You can break if the cfs_rq is throttled, because its sched_entity has
> been dequeued. In this case we check whether the cfs_rq itself is
> throttled, not whether the hierarchy is throttled:
> 
> +               if (cfs_rq_throttled(cfs_rq))
> +                       break;
> 

This propagate part still makes me confused. :-)

I wonder if you think we should change it like this:

static void propagate_entity_cfs_rq(struct sched_entity *se)
{
        struct cfs_rq *cfs_rq = cfs_rq_of(se);

        if (cfs_rq_throttled(cfs_rq))			--> break if cfs is throttled
                return;

        if (!throttled_hierarchy(cfs_rq))
                list_add_leaf_cfs_rq(cfs_rq);

        /* Start to propagate at parent */
        se = se->parent;

        for_each_sched_entity(se) {
                cfs_rq = cfs_rq_of(se);

                if (cfs_rq_throttled(cfs_rq))		--> break if cfs is throttled
                        break;

                update_load_avg(cfs_rq, se, UPDATE_TG); --> throttled cfs_rq->prop not updated
                if (!throttled_hierarchy(cfs_rq))
                        list_add_leaf_cfs_rq(cfs_rq);
        }
}



If I understand right, we should call update_load_avg() up to and including
the throttled cfs_rq? So we can go on propagating when it is unthrottled.

Maybe like this?

static void propagate_entity_cfs_rq(struct sched_entity *se)
{
        struct cfs_rq *cfs_rq = cfs_rq_of(se);

        if (cfs_rq_throttled(cfs_rq))			--> break if cfs is throttled
                return;

        if (!throttled_hierarchy(cfs_rq))
                list_add_leaf_cfs_rq(cfs_rq);

        /* Start to propagate at parent */
        se = se->parent;

        for_each_sched_entity(se) {
                cfs_rq = cfs_rq_of(se);

                update_load_avg(cfs_rq, se, UPDATE_TG);	--> update throttled cfs_rq->prop

                if (cfs_rq_throttled(cfs_rq))		--> break if cfs is throttled
                        break;

                if (!throttled_hierarchy(cfs_rq))
                        list_add_leaf_cfs_rq(cfs_rq);
        }
}


Thanks!


>>                 update_load_avg(cfs_rq, se, UPDATE_TG);
>> -               list_add_leaf_cfs_rq(cfs_rq);
>> +               if (!throttled_hierarchy(cfs_rq))
>> +                       list_add_leaf_cfs_rq(cfs_rq);
>>         }
>>  }
>>  #else
>> --
>> 2.36.1
>>


* Re: [External] Re: [PATCH v3 2/2] sched/fair: optimize and simplify rq leaf_cfs_rq_list
From: Vincent Guittot @ 2022-05-31 16:32 UTC
  To: Chengming Zhou
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, odin, linux-kernel, duanxiongchun,
	songmuchun

On Tue, 31 May 2022 at 15:55, Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> Hi,
>
> On 2022/5/30 15:52, Vincent Guittot wrote:
> > On Thu, 26 May 2022 at 12:40, Chengming Zhou
> > <zhouchengming@bytedance.com> wrote:
> >>
> [...]
> >> @@ -11257,9 +11218,10 @@ static inline bool vruntime_normalized(struct task_struct *p)
> >>   */
> >>  static void propagate_entity_cfs_rq(struct sched_entity *se)
> >>  {
> >> -       struct cfs_rq *cfs_rq;
> >> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >>
> >> -       list_add_leaf_cfs_rq(cfs_rq_of(se));
> >> +       if (!throttled_hierarchy(cfs_rq))
> >> +               list_add_leaf_cfs_rq(cfs_rq);
> >>
> >>         /* Start to propagate at parent */
> >>         se = se->parent;
> >> @@ -11268,7 +11230,8 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
> >>                 cfs_rq = cfs_rq_of(se);
> >>
> >
> > You can break if the cfs_rq is throttled, because its sched_entity has
> > been dequeued. In this case we check whether the cfs_rq itself is
> > throttled, not whether the hierarchy is throttled:
> >
> > +               if (cfs_rq_throttled(cfs_rq))
> > +                       break;
> >
>
> This propagate part still makes me confused. :-)
>
> I wonder if you think we should change it like this:
>
> static void propagate_entity_cfs_rq(struct sched_entity *se)
> {
>         struct cfs_rq *cfs_rq = cfs_rq_of(se);
>
>         if (cfs_rq_throttled(cfs_rq))                   --> break if cfs is throttled
>                 return;
>
>         if (!throttled_hierarchy(cfs_rq))
>                 list_add_leaf_cfs_rq(cfs_rq);
>
>         /* Start to propagate at parent */
>         se = se->parent;
>
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>
>                 if (cfs_rq_throttled(cfs_rq))           --> break if cfs is throttled
>                         break;
>
>                 update_load_avg(cfs_rq, se, UPDATE_TG); --> throttled cfs_rq->prop not updated
>                 if (!throttled_hierarchy(cfs_rq))
>                         list_add_leaf_cfs_rq(cfs_rq);
>         }
> }
>
>
>
> If I understand right, we should call update_load_avg() up to and including
> the throttled cfs_rq? So we can go on propagating when it is unthrottled.
>
> Maybe like this?
>
> static void propagate_entity_cfs_rq(struct sched_entity *se)
> {
>         struct cfs_rq *cfs_rq = cfs_rq_of(se);
>
>         if (cfs_rq_throttled(cfs_rq))                   --> break if cfs is throttled
>                 return;
>
>         if (!throttled_hierarchy(cfs_rq))
>                 list_add_leaf_cfs_rq(cfs_rq);
>
>         /* Start to propagate at parent */
>         se = se->parent;
>
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>
>                 update_load_avg(cfs_rq, se, UPDATE_TG); --> update throttled cfs_rq->prop

Yes, that looks correct.
So we will ensure that the attach/detach is propagated up to the
throttled cfs_rq.

>
>                 if (cfs_rq_throttled(cfs_rq))           --> break if cfs is throttled
>                         break;
>
>                 if (!throttled_hierarchy(cfs_rq))
>                         list_add_leaf_cfs_rq(cfs_rq);
>         }
> }
>
>
> Thanks!
>
>
> >>                 update_load_avg(cfs_rq, se, UPDATE_TG);
> >> -               list_add_leaf_cfs_rq(cfs_rq);
> >> +               if (!throttled_hierarchy(cfs_rq))
> >> +                       list_add_leaf_cfs_rq(cfs_rq);
> >>         }
> >>  }
> >>  #else
> >> --
> >> 2.36.1
> >>
