From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Rik van Riel <riel@surriel.com>, linux-kernel@vger.kernel.org
Cc: kernel-team@fb.com, pjt@google.com, peterz@infradead.org,
	mingo@redhat.com, morten.rasmussen@arm.com, tglx@linutronix.de,
	mgorman@techsingularity.net, vincent.guittot@linaro.org
Subject: Re: [PATCH 11/15] sched,fair: flatten hierarchical runqueues
Date: Fri, 23 Aug 2019 20:14:41 +0200
Message-ID: <967114b2-15a7-b445-3133-074732b20e34@arm.com>
In-Reply-To: <20190822021740.15554-12-riel@surriel.com>

On 22/08/2019 04:17, Rik van Riel wrote:
> Flatten the hierarchical runqueues into just the per CPU rq.cfs runqueue.
> 
> Iteration of the sched_entity hierarchy is rate limited to once per jiffy
> per sched_entity, which is a smaller change than it seems, because load
> average adjustments were already rate limited to once per jiffy before this
> patch series.
> 
> This patch breaks CONFIG_CFS_BANDWIDTH. The plan for that is to park tasks
> from throttled cgroups onto their cgroup runqueues, and slowly (using the
> GENTLE_FAIR_SLEEPERS) wake them back up, in vruntime order, once the cgroup
> gets unthrottled, to prevent thundering herd issues.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> 
> Header from folded patch 'fix-attach-detach_enticy_cfs_rq.patch~':
> 
> Subject: sched,fair: fix attach/detach_entity_cfs_rq
> 
> While attach_entity_cfs_rq and detach_entity_cfs_rq should iterate over
> the hierarchy, they do not need to do that twice.
> 
> Passing flags into propagate_entity_cfs_rq allows us to reuse that same
> loop from other functions.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> 
> 
> Header from folded patch 'enqueue-order.patch':
> 
> Subject: sched,fair: better ordering at enqueue_task_fair time
> 
> In order to get useful numbers for the task's hierarchical weight,
> task priority, etc., things need to be done in a certain order at task
> enqueue time.
> 
> Specifically:
> 1) static load/weight to "local" cfs_rq
> 2) propagate load/weight up the tree
> 3) add runnable load avg to root cfs_rq
> 
> The reason is that each step depends on the things done by the
> step beforehand, and we can end up with nonsense numbers if we
> do not do things right.
> 
> Also, make sure that we walk all the way up the hierarchy at
> enqueue_task_fair time in order to get the benefit from the ramp-up
> logic in update_cfs_group.
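
To make the ordering argument above concrete, here is a small userspace
sketch of the three enqueue steps. It is a toy model under loud
assumptions, not the code from this patch: struct toy_cfs_rq,
toy_h_weight() and toy_enqueue() are hypothetical names, and plain
integer weights stand in for the PELT load averages. The point it
illustrates is only that the hierarchical weight divides by each level's
load, so the local load (step 1) and the parents' load (step 2) must be
up to date before the root sum (step 3) is touched.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-group runqueue; not the kernel struct. */
struct toy_cfs_rq {
	struct toy_cfs_rq *parent;	/* NULL for the root runqueue */
	unsigned long shares;		/* weight of this group inside its parent */
	unsigned long load;		/* sum of the weights queued on this runqueue */
	unsigned long runnable_load;	/* only maintained on the root runqueue */
	int queued;			/* has this group been added to its parent? */
};

/*
 * Scale a task weight by shares/load at every level below the root.
 * Every q->load on the path must already be non-zero, which is exactly
 * what steps 1 and 2 below guarantee.
 */
static unsigned long toy_h_weight(const struct toy_cfs_rq *rq,
				  unsigned long weight)
{
	const struct toy_cfs_rq *q;

	for (q = rq; q->parent; q = q->parent)
		weight = weight * q->shares / q->load;

	return weight;
}

/* Enqueue a task of the given weight and return its hierarchical weight. */
static unsigned long toy_enqueue(struct toy_cfs_rq *rq, unsigned long weight)
{
	struct toy_cfs_rq *q;
	unsigned long h;

	/* 1) static load/weight to the "local" runqueue */
	rq->load += weight;

	/* 2) propagate load/weight up the tree */
	for (q = rq; q->parent; q = q->parent) {
		if (!q->queued) {
			q->parent->load += q->shares;
			q->queued = 1;
		}
	}

	/* 3) add the hierarchical weight to the root; q is the root here */
	h = toy_h_weight(rq, weight);
	q->runnable_load += h;

	return h;
}
```

Calling toy_h_weight() before step 1 on an empty group would divide by a
zero load, which is the toy equivalent of the "nonsense numbers"
mentioned above. The sketch does not refresh the contribution of tasks
that were enqueued earlier when a group's load changes.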

[...]

>  /*
> @@ -6953,7 +6849,6 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
>  	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>  		return;
>  
> -	find_matching_se(&se, &pse);
>  	update_curr(cfs_rq_of(se));
>  	BUG_ON(!pse);
>  	if (wakeup_preempt_entity(se, pse) == 1) {
> @@ -6994,100 +6889,18 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
>  	struct task_struct *p;
>  	int new_tasks;
>  
> +	put_prev_task(rq, prev);
>  again:
>  	if (!cfs_rq->nr_running)
>  		goto idle;
>  
> -#ifdef CONFIG_FAIR_GROUP_SCHED
> -	if (prev->sched_class != &fair_sched_class)
> -		goto simple;
> -
> -	/*
> -	 * Because of the set_next_buddy() in dequeue_task_fair() it is rather
> -	 * likely that a next task is from the same cgroup as the current.
> -	 *
> -	 * Therefore attempt to avoid putting and setting the entire cgroup
> -	 * hierarchy, only change the part that actually changes.
> -	 */
> -
> -	do {
> -		struct sched_entity *curr = cfs_rq->curr;
> -
> -		/*
> -		 * Since we got here without doing put_prev_entity() we also
> -		 * have to consider cfs_rq->curr. If it is still a runnable
> -		 * entity, update_curr() will update its vruntime, otherwise
> -		 * forget we've ever seen it.
> -		 */
> -		if (curr) {
> -			if (curr->on_rq)
> -				update_curr(cfs_rq);
> -			else
> -				curr = NULL;
> -
> -			/*
> -			 * This call to check_cfs_rq_runtime() will do the
> -			 * throttle and dequeue its entity in the parent(s).
> -			 * Therefore the nr_running test will indeed
> -			 * be correct.
> -			 */
> -			if (unlikely(check_cfs_rq_runtime(cfs_rq))) {
> -				cfs_rq = &rq->cfs;
> -
> -				if (!cfs_rq->nr_running)
> -					goto idle;
> -
> -				goto simple;
> -			}
> -		}
> -
> -		se = pick_next_entity(cfs_rq, curr);
> -		cfs_rq = group_cfs_rq(se);
> -	} while (cfs_rq);
> -
> -	p = task_of(se);
> -
> -	/*
> -	 * Since we haven't yet done put_prev_entity and if the selected task
> -	 * is a different task than we started out with, try and touch the
> -	 * least amount of cfs_rqs.
> -	 */
> -	if (prev != p) {
> -		struct sched_entity *pse = &prev->se;
> -
> -		while (!(cfs_rq = is_same_group(se, pse))) {
> -			int se_depth = se->depth;
> -			int pse_depth = pse->depth;
> -
> -			if (se_depth <= pse_depth) {
> -				put_prev_entity(cfs_rq_of(pse), pse);
> -				pse = parent_entity(pse);
> -			}
> -			if (se_depth >= pse_depth) {
> -				set_next_entity(cfs_rq_of(se), se);
> -				se = parent_entity(se);
> -			}

With the se->depth related code gone here in pick_next_task_fair(), and
the find_matching_se() call removed from check_preempt_wakeup(), it looks
like you could remove se->depth entirely.
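
For context, a userspace sketch of what se->depth is used for in a
hierarchical setup: bringing two entities to the same level so they can
be compared on one runqueue. This is a simplified model and not the
kernel code; struct toy_se, its integer rq_id field and
toy_find_matching_se() are hypothetical names chosen for illustration.
It assumes all top-level entities share the same rq_id.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sched_entity; only what the walk needs. */
struct toy_se {
	struct toy_se *parent;	/* NULL for a top-level entity */
	int depth;		/* 0 for a top-level entity */
	int rq_id;		/* id of the runqueue this entity is queued on */
};

/*
 * Walk *se and *pse up the hierarchy until both are queued on the same
 * runqueue. The depth field is what allows the deeper entity to be
 * lifted first, so that both walks then proceed in lockstep.
 */
static void toy_find_matching_se(struct toy_se **se, struct toy_se **pse)
{
	int se_depth = (*se)->depth;
	int pse_depth = (*pse)->depth;

	/* First bring both entities to the same depth. */
	while (se_depth > pse_depth) {
		se_depth--;
		*se = (*se)->parent;
	}
	while (pse_depth > se_depth) {
		pse_depth--;
		*pse = (*pse)->parent;
	}

	/* Then climb together until they share a runqueue. */
	while ((*se)->rq_id != (*pse)->rq_id) {
		*se = (*se)->parent;
		*pse = (*pse)->parent;
	}
}
```

Once every task is queued directly on the per-CPU rq.cfs, any two tasks
already share a runqueue, so neither the walk nor the depth bookkeeping
has anything left to do in this model.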

[...]


