From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Rik van Riel <riel@surriel.com>, linux-kernel@vger.kernel.org
Cc: kernel-team@fb.com, pjt@google.com, peterz@infradead.org,
mingo@redhat.com, morten.rasmussen@arm.com, tglx@linutronix.de,
mgorman@techsingularity.net, vincent.guittot@linaro.org
Subject: Re: [PATCH 11/15] sched,fair: flatten hierarchical runqueues
Date: Fri, 23 Aug 2019 20:14:41 +0200
Message-ID: <967114b2-15a7-b445-3133-074732b20e34@arm.com>
In-Reply-To: <20190822021740.15554-12-riel@surriel.com>
On 22/08/2019 04:17, Rik van Riel wrote:
> Flatten the hierarchical runqueues into just the per CPU rq.cfs runqueue.
>
> Iteration of the sched_entity hierarchy is rate limited to once per jiffy
> per sched_entity, which is a smaller change than it seems, because load
> average adjustments were already rate limited to once per jiffy before this
> patch series.
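
For readers following along, the once-per-jiffy rate limit described
above can be pictured roughly as below. This is only a userspace toy
under my own assumptions, not code from the patch: struct toy_se, its
fields and toy_should_walk() are made-up names, and the jiffies counter
is modeled as a plain unsigned long passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sched_entity; not the kernel struct. */
struct toy_se {
	unsigned long last_walk;	/* jiffy of the last hierarchy walk */
	unsigned long walks;		/* number of walks actually performed */
};

/*
 * Return true (and record the walk) only if this entity has not
 * already walked the hierarchy during the jiffy given in 'now'.
 */
static bool toy_should_walk(struct toy_se *se, unsigned long now)
{
	assert(se != NULL);

	if (se->walks && se->last_walk == now)
		return false;	/* already walked in this jiffy, skip */

	se->last_walk = now;
	se->walks++;
	return true;
}
```

Calling toy_should_walk() twice with the same 'now' performs the walk
only the first time; the next jiffy allows one more walk.
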
>
> This patch breaks CONFIG_CFS_BANDWIDTH. The plan for that is to park tasks
> from throttled cgroups onto their cgroup runqueues, and slowly (using the
> GENTLE_FAIR_SLEEPERS) wake them back up, in vruntime order, once the cgroup
> gets unthrottled, to prevent thundering herd issues.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
>
> Header from folded patch 'fix-attach-detach_enticy_cfs_rq.patch~':
>
> Subject: sched,fair: fix attach/detach_entity_cfs_rq
>
> While attach_entity_cfs_rq and detach_entity_cfs_rq should iterate over
> the hierarchy, they do not need to do that twice.
>
> Passing flags into propagate_entity_cfs_rq allows us to reuse that same
> loop from other functions.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
>
>
> Header from folded patch 'enqueue-order.patch':
>
> Subject: sched,fair: better ordering at enqueue_task_fair time
>
> In order to get useful numbers for the task's hierarchical weight,
> task priority, etc., things need to be done in a certain order at task
> enqueue time.
>
> Specifically:
> 1) static load/weight to "local" cfs_rq
> 2) propagate load/weight up the tree
> 3) add runnable load avg to root cfs_rq
>
> The reason is that each step depends on the things done by the
> step beforehand, and we can end up with nonsense numbers if we
> do not do things right.
>
> Also, make sure that we walk all the way up the hierarchy at
> enqueue_task_fair time in order to get the benefit from the ramp-up
> logic in update_cfs_group.
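
The ordering argument above (local weight first, then propagate, then
add to the root) can be illustrated with a small userspace sketch. All
names here (toy_cfs_rq, toy_group, toy_h_weight, toy_enqueue) are
hypothetical, the hierarchy is a single group below the root, and the
"hierarchical weight" is simplified to shares * weight / group load,
which is an assumption of this sketch and not the formula in the patch.
The toy also does not rescale tasks that were enqueued earlier.

```c
#include <assert.h>

/* Hypothetical, heavily simplified runqueue; not the kernel struct. */
struct toy_cfs_rq {
	unsigned long load;		/* sum of weights queued locally */
	unsigned long runnable_sum;	/* root only: sum of hierarchical weights */
};

/* One task group directly below the root. */
struct toy_group {
	unsigned long shares;		/* weight of the group as seen by the root */
	struct toy_cfs_rq rq;		/* the group's "local" runqueue */
};

/*
 * Hierarchical weight of a task with weight w in group g. This is only
 * meaningful once w has been added to g->rq.load, hence the assert.
 */
static unsigned long toy_h_weight(const struct toy_group *g, unsigned long w)
{
	assert(g->rq.load > 0 && g->rq.load >= w);
	return g->shares * w / g->rq.load;
}

/* Enqueue in the order listed in the commit message. */
static unsigned long toy_enqueue(struct toy_cfs_rq *root, struct toy_group *g,
				 unsigned long weight)
{
	unsigned long h;

	g->rq.load += weight;		/* 1) static weight to the local rq */
	h = toy_h_weight(g, weight);	/* 2) propagate up the (one level) tree */
	root->runnable_sum += h;	/* 3) add the result to the root rq */

	return h;
}
```

Swapping steps 1 and 2 would trip the assert for the first task in an
empty group, which is the "nonsense numbers" case in miniature.
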
[...]
> /*
> @@ -6953,7 +6849,6 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
> return;
>
> - find_matching_se(&se, &pse);
> update_curr(cfs_rq_of(se));
> BUG_ON(!pse);
> if (wakeup_preempt_entity(se, pse) == 1) {
> @@ -6994,100 +6889,18 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> struct task_struct *p;
> int new_tasks;
>
> + put_prev_task(rq, prev);
> again:
> if (!cfs_rq->nr_running)
> goto idle;
>
> -#ifdef CONFIG_FAIR_GROUP_SCHED
> - if (prev->sched_class != &fair_sched_class)
> - goto simple;
> -
> - /*
> - * Because of the set_next_buddy() in dequeue_task_fair() it is rather
> - * likely that a next task is from the same cgroup as the current.
> - *
> - * Therefore attempt to avoid putting and setting the entire cgroup
> - * hierarchy, only change the part that actually changes.
> - */
> -
> - do {
> - struct sched_entity *curr = cfs_rq->curr;
> -
> - /*
> - * Since we got here without doing put_prev_entity() we also
> - * have to consider cfs_rq->curr. If it is still a runnable
> - * entity, update_curr() will update its vruntime, otherwise
> - * forget we've ever seen it.
> - */
> - if (curr) {
> - if (curr->on_rq)
> - update_curr(cfs_rq);
> - else
> - curr = NULL;
> -
> - /*
> - * This call to check_cfs_rq_runtime() will do the
> - * throttle and dequeue its entity in the parent(s).
> - * Therefore the nr_running test will indeed
> - * be correct.
> - */
> - if (unlikely(check_cfs_rq_runtime(cfs_rq))) {
> - cfs_rq = &rq->cfs;
> -
> - if (!cfs_rq->nr_running)
> - goto idle;
> -
> - goto simple;
> - }
> - }
> -
> - se = pick_next_entity(cfs_rq, curr);
> - cfs_rq = group_cfs_rq(se);
> - } while (cfs_rq);
> -
> - p = task_of(se);
> -
> - /*
> - * Since we haven't yet done put_prev_entity and if the selected task
> - * is a different task than we started out with, try and touch the
> - * least amount of cfs_rqs.
> - */
> - if (prev != p) {
> - struct sched_entity *pse = &prev->se;
> -
> - while (!(cfs_rq = is_same_group(se, pse))) {
> - int se_depth = se->depth;
> - int pse_depth = pse->depth;
> -
> - if (se_depth <= pse_depth) {
> - put_prev_entity(cfs_rq_of(pse), pse);
> - pse = parent_entity(pse);
> - }
> - if (se_depth >= pse_depth) {
> - set_next_entity(cfs_rq_of(se), se);
> - se = parent_entity(se);
> - }

With the se->depth related code gone here in pick_next_task_fair(), and
the call to find_matching_se() in check_preempt_wakeup() removed as
well, it looks like you could remove se->depth entirely.

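
To show why depth was only needed for this kind of walk, here is a
simplified userspace model of what find_matching_se() does: bring both
entities to the same depth, then walk up in lockstep until they are
siblings. struct toy_ent and toy_find_matching_se() are made-up names,
and "same group" is approximated by "same parent pointer", which is an
assumption of this sketch rather than the kernel's is_same_group().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entity: parent link plus depth, nothing else. */
struct toy_ent {
	struct toy_ent *parent;	/* NULL for entities on the root runqueue */
	int depth;		/* 0 on the root runqueue, parent depth + 1 below */
};

/*
 * Walk *se and *pse up the hierarchy until both share a parent,
 * i.e. until they would sit on the same runqueue.
 */
static void toy_find_matching_se(struct toy_ent **se, struct toy_ent **pse)
{
	int se_depth, pse_depth;

	assert(*se != NULL && *pse != NULL);

	se_depth = (*se)->depth;
	pse_depth = (*pse)->depth;

	/* First bring both entities to the same depth. */
	while (se_depth > pse_depth) {
		se_depth--;
		*se = (*se)->parent;
	}
	while (pse_depth > se_depth) {
		pse_depth--;
		*pse = (*pse)->parent;
	}

	/* Then climb in lockstep until they are siblings. */
	while ((*se)->parent != (*pse)->parent) {
		*se = (*se)->parent;
		*pse = (*pse)->parent;
	}
}
```

With a flat runqueue every task is already a sibling of every other
task, so neither the walk nor the depth field has anything left to do.
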
[...]