All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: vpillai <vpillai@digitalocean.com>
Cc: Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>, Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	aubrey.li@linux.intel.com,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Aaron Lu <aaron.lu@linux.alibaba.com>
Subject: Re: [RFC PATCH 07/13] sched: Add core wide task selection and scheduling.
Date: Thu, 21 May 2020 22:35:56 -0400	[thread overview]
Message-ID: <20200522023556.GE140701@google.com> (raw)
In-Reply-To: <20200521231426.GA246288@google.com>

On Thu, May 21, 2020 at 07:14:26PM -0400, Joel Fernandes wrote:
> On Wed, Mar 04, 2020 at 04:59:57PM +0000, vpillai wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> > 
> > Instead of only selecting a local task, select a task for all SMT
> > siblings for every reschedule on the core (irrespective which logical
> > CPU does the reschedule).
> > 
> > There could be races in core scheduler where a CPU is trying to pick
> > a task for its sibling in core scheduler, when that CPU has just been
> > offlined.  We should not schedule any tasks on the CPU in this case.
> > Return an idle task in pick_next_task for this situation.
> > 
> > NOTE: there is still potential for siblings rivalry.
> > NOTE: this is far too complicated; but thus far I've failed to
> >       simplify it further.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Julien Desfossez <jdesfossez@digitalocean.com>
> > Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
> > Signed-off-by: Aaron Lu <aaron.lu@linux.alibaba.com>
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
> >  kernel/sched/core.c  | 274 ++++++++++++++++++++++++++++++++++++++++++-
> >  kernel/sched/fair.c  |  40 +++++++
> >  kernel/sched/sched.h |   6 +-
> >  3 files changed, 318 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 445f0d519336..9a1bd236044e 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4253,7 +4253,7 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
> >   * Pick up the highest-prio task:
> >   */
> >  static inline struct task_struct *
> > -pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> > +__pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> >  {
> >  	const struct sched_class *class;
> >  	struct task_struct *p;
> > @@ -4309,6 +4309,273 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> >  	BUG();
> >  }
> >  
> > +#ifdef CONFIG_SCHED_CORE
> > +
> > +static inline bool cookie_equals(struct task_struct *a, unsigned long cookie)
> > +{
> > +	return is_idle_task(a) || (a->core_cookie == cookie);
> > +}
> > +
> > +static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
> > +{
> > +	if (is_idle_task(a) || is_idle_task(b))
> > +		return true;
> > +
> > +	return a->core_cookie == b->core_cookie;
> > +}
> > +
> > +// XXX fairness/fwd progress conditions
> > +/*
> > + * Returns
> > + * - NULL if there is no runnable task for this class.
> > + * - the highest priority task for this runqueue if it matches
> > + *   rq->core->core_cookie or its priority is greater than max.
> > + * - Else returns idle_task.
> > + */
> > +static struct task_struct *
> > +pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max)
> > +{
> > +	struct task_struct *class_pick, *cookie_pick;
> > +	unsigned long cookie = rq->core->core_cookie;
> > +
> > +	class_pick = class->pick_task(rq);
> > +	if (!class_pick)
> > +		return NULL;
> > +
> > +	if (!cookie) {
> > +		/*
> > +		 * If class_pick is tagged, return it only if it has
> > +		 * higher priority than max.
> > +		 */
> > +		if (max && class_pick->core_cookie &&
> > +		    prio_less(class_pick, max))
> > +			return idle_sched_class.pick_task(rq);
> > +
> > +		return class_pick;
> > +	}
> > +
> > +	/*
> > +	 * If class_pick is idle or matches cookie, return early.
> > +	 */
> > +	if (cookie_equals(class_pick, cookie))
> > +		return class_pick;
> > +
> > +	cookie_pick = sched_core_find(rq, cookie);
> > +
> > +	/*
> > +	 * If class > max && class > cookie, it is the highest priority task on
> > +	 * the core (so far) and it must be selected, otherwise we must go with
> > +	 * the cookie pick in order to satisfy the constraint.
> > +	 */
> > +	if (prio_less(cookie_pick, class_pick) &&
> > +	    (!max || prio_less(max, class_pick)))
> > +		return class_pick;
> > +
> > +	return cookie_pick;
> > +}
> 
> I've been hating on this pick_task() routine for a while now :-). If we add
> the task to the tag tree as Peter suggested at OSPM for that other issue
> Vineeth found, it seems it could be simpler.
> 
> This has just been near a compiler so far but how about:

Discussed a lot with Vineeth. Below is an improved version of the pick_task()
similification.

It also handles the following "bug" in the existing code as well that Vineeth
brought up in OSPM: Suppose 2 siblings of a core: rq 1 and rq 2.

In priority order (high to low), say we have the tasks:
A - untagged  (rq 1)
B - tagged    (rq 2)
C - untagged  (rq 2)

Say, B and C are in the same scheduling class.

When the pick_next_task() loop runs, it looks at rq 1 and max is A, A is
tenantively selected for rq 1. Then it looks at rq 2 and the class_pick is B.
But that's not compatible with A. So rq 2 gets forced idle.

In reality, rq 2 could have run C instead of idle. The fix is to add C to the
tag tree as Peter suggested in OSPM.

Updated diff below:

---8<-----------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 005d7f7323e2d..625377f393ed3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -182,9 +182,6 @@ static void sched_core_enqueue(struct rq *rq, struct task_struct *p)
 
 	rq->core->core_task_seq++;
 
-	if (!p->core_cookie)
-		return;
-
 	node = &rq->core_tree.rb_node;
 	parent = *node;
 
@@ -215,7 +212,7 @@ static void sched_core_dequeue(struct rq *rq, struct task_struct *p)
 
 void sched_core_add(struct rq *rq, struct task_struct *p)
 {
-	if (p->core_cookie && task_on_rq_queued(p))
+	if (task_on_rq_queued(p))
 		sched_core_enqueue(rq, p);
 }
 
@@ -4556,43 +4553,57 @@ void sched_core_irq_exit(void)
 static struct task_struct *
 pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max)
 {
-	struct task_struct *class_pick, *cookie_pick;
+	struct task_struct *class_pick, *cookie_pick, *rq_pick;
 	unsigned long cookie = rq->core->core_cookie;
 
 	class_pick = class->pick_task(rq);
 	if (!class_pick)
 		return NULL;
 
-	if (!cookie) {
-		/*
-		 * If class_pick is tagged, return it only if it has
-		 * higher priority than max.
-		 */
-		if (max && class_pick->core_cookie &&
-		    prio_less(class_pick, max))
-			return idle_sched_class.pick_task(rq);
+	if (!max)
+		return class_pick;
+
+	/* Make sure the current max's cookie is core->core_cookie */
+	WARN_ON_ONCE(max->core_cookie != cookie);
 
+	/* Try to play really nice: see if the class's cookie works. */
+	if (cookie_equals(class_pick, cookie))
 		return class_pick;
-	}
 
 	/*
-	 * If class_pick is idle or matches cookie, return early.
+	 * From here on, we must return class_pick, cookie_pick or idle.
+	 * Following are the cases:
+	 * 1 - lowest prio.
+	 * 3 - highest prio.
+	 *
+	 * max	class	cookie	outcome
+	 * 1	2	3	cookie
+	 * 1	3	2	class
+	 * 2	1	3	cookie
+	 * 2	3	1	class
+	 * 3	1	2	cookie
+	 * 3	2	1	cookie
+	 * 3	2	-	return idle (when no cookie task).
 	 */
-	if (cookie_equals(class_pick, cookie))
-		return class_pick;
 
+	/* First try to find the highest prio of (cookie, class and max). */
 	cookie_pick = sched_core_find(rq, cookie);
+	if (cookie_pick && prio_less(class_pick, cookie_pick))
+		rq_pick = cookie_pick;
+	else
+		rq_pick = class_pick;
+	if (prio_less(max, rq_pick))
+		return rq_pick;
+
+	/* If we max was greatest, then see if there was a cookie. */
+	if (cookie_pick)
+		return cookie_pick;
 
 	/*
-	 * If class > max && class > cookie, it is the highest priority task on
-	 * the core (so far) and it must be selected, otherwise we must go with
-	 * the cookie pick in order to satisfy the constraint.
+	 * We get here with if class_pick was incompatible with max
+	 * and lower prio than max. So we have nothing.
 	 */
-	if (prio_less(cookie_pick, class_pick) &&
-	    (!max || prio_less(max, class_pick)))
-		return class_pick;
-
-	return cookie_pick;
+	return idle_sched_class.pick_task(rq);
 }
 
 static struct task_struct *

  parent reply	other threads:[~2020-05-22  2:36 UTC|newest]

Thread overview: 115+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-04 16:59 [RFC PATCH 00/13] Core scheduling v5 vpillai
2020-03-04 16:59 ` [RFC PATCH 01/13] sched: Wrap rq::lock access vpillai
2020-03-04 16:59 ` [RFC PATCH 02/13] sched: Introduce sched_class::pick_task() vpillai
2020-03-04 16:59 ` [RFC PATCH 03/13] sched: Core-wide rq->lock vpillai
2020-04-01 11:42   ` [PATCH] sched/arm64: store cpu topology before notify_cpu_starting Cheng Jian
2020-04-01 13:23     ` Valentin Schneider
2020-04-01 13:23       ` Valentin Schneider
2020-04-06  8:00       ` chengjian (D)
2020-04-06  8:00         ` chengjian (D)
2020-04-09  9:59       ` Sudeep Holla
2020-04-09  9:59         ` Sudeep Holla
2020-04-09 10:32         ` Valentin Schneider
2020-04-09 10:32           ` Valentin Schneider
2020-04-09 11:08           ` Sudeep Holla
2020-04-09 11:08             ` Sudeep Holla
2020-04-09 17:54     ` Joel Fernandes
2020-04-10 13:49       ` chengjian (D)
2020-04-14 11:36   ` [RFC PATCH 03/13] sched: Core-wide rq->lock Peter Zijlstra
2020-04-14 21:35     ` Vineeth Remanan Pillai
2020-04-15 10:55       ` Peter Zijlstra
2020-04-14 14:32   ` Peter Zijlstra
2020-03-04 16:59 ` [RFC PATCH 04/13] sched/fair: Add a few assertions vpillai
2020-03-04 16:59 ` [RFC PATCH 05/13] sched: Basic tracking of matching tasks vpillai
2020-03-04 16:59 ` [RFC PATCH 06/13] sched: Update core scheduler queue when taking cpu online/offline vpillai
2020-03-04 16:59 ` [RFC PATCH 07/13] sched: Add core wide task selection and scheduling vpillai
2020-04-14 13:35   ` Peter Zijlstra
2020-04-16 23:32     ` Tim Chen
2020-04-17 10:57       ` Peter Zijlstra
2020-04-16  3:39   ` Chen Yu
2020-04-16 19:59     ` Vineeth Remanan Pillai
2020-04-17 11:18     ` Peter Zijlstra
2020-04-19 15:31       ` Chen Yu
2020-05-21 23:14   ` Joel Fernandes
2020-05-21 23:16     ` Joel Fernandes
2020-05-22  2:35     ` Joel Fernandes [this message]
2020-05-22  3:44       ` Aaron Lu
2020-05-22 20:13         ` Joel Fernandes
2020-03-04 16:59 ` [RFC PATCH 08/13] sched/fair: wrapper for cfs_rq->min_vruntime vpillai
2020-03-04 16:59 ` [RFC PATCH 09/13] sched/fair: core wide vruntime comparison vpillai
2020-04-14 13:56   ` Peter Zijlstra
2020-04-15  3:34     ` Aaron Lu
2020-04-15  4:07       ` Aaron Lu
2020-04-15 21:24         ` Vineeth Remanan Pillai
2020-04-17  9:40           ` Aaron Lu
2020-04-20  8:07             ` [PATCH updated] sched/fair: core wide cfs task priority comparison Aaron Lu
2020-04-20 22:26               ` Vineeth Remanan Pillai
2020-04-21  2:51                 ` Aaron Lu
2020-04-24 14:24                   ` [PATCH updated v2] " Aaron Lu
2020-05-06 14:35                     ` Peter Zijlstra
2020-05-08  8:44                       ` Aaron Lu
2020-05-08  9:09                         ` Peter Zijlstra
2020-05-08 12:34                           ` Aaron Lu
2020-05-14 13:02                             ` Peter Zijlstra
2020-05-14 22:51                               ` Vineeth Remanan Pillai
2020-05-15 10:38                                 ` Peter Zijlstra
2020-05-15 10:43                                   ` Peter Zijlstra
2020-05-15 14:24                                   ` Vineeth Remanan Pillai
2020-05-16  3:42                               ` Aaron Lu
2020-05-22  9:40                                 ` Aaron Lu
2020-06-08  1:41                               ` Ning, Hongyu
2020-03-04 17:00 ` [RFC PATCH 10/13] sched: Trivial forced-newidle balancer vpillai
2020-03-04 17:00 ` [RFC PATCH 11/13] sched: migration changes for core scheduling vpillai
2020-06-12 13:21   ` Joel Fernandes
2020-06-12 21:32     ` Vineeth Remanan Pillai
2020-06-13  2:25       ` Joel Fernandes
2020-06-13 18:59         ` Vineeth Remanan Pillai
2020-06-15  2:05           ` Li, Aubrey
2020-03-04 17:00 ` [RFC PATCH 12/13] sched: cgroup tagging interface " vpillai
2020-06-26 15:06   ` Vineeth Remanan Pillai
2020-03-04 17:00 ` [RFC PATCH 13/13] sched: Debug bits vpillai
2020-03-04 17:36 ` [RFC PATCH 00/13] Core scheduling v5 Tim Chen
2020-03-04 17:42   ` Vineeth Remanan Pillai
2020-04-14 14:21 ` Peter Zijlstra
2020-04-15 16:32   ` Joel Fernandes
2020-04-17 11:12     ` Peter Zijlstra
2020-04-17 12:35       ` Alexander Graf
2020-04-17 13:08         ` Peter Zijlstra
2020-04-18  2:25       ` Joel Fernandes
2020-05-09 14:35   ` Dario Faggioli
     [not found] ` <38805656-2e2f-222a-c083-692f4b113313@linux.intel.com>
2020-05-09  3:39   ` Ning, Hongyu
2020-05-14 20:51     ` FW: " Gruza, Agata
2020-05-10 23:46 ` [PATCH RFC] Add support for core-wide protection of IRQ and softirq Joel Fernandes (Google)
2020-05-11 13:49   ` Peter Zijlstra
2020-05-11 14:54     ` Joel Fernandes
2020-05-20 22:26 ` [PATCH RFC] sched: Add a per-thread core scheduling interface Joel Fernandes (Google)
2020-05-21  4:09   ` [PATCH RFC] sched: Add a per-thread core scheduling interface(Internet mail) benbjiang(蒋彪)
2020-05-21 13:49     ` Joel Fernandes
2020-05-21  8:51   ` [PATCH RFC] sched: Add a per-thread core scheduling interface Peter Zijlstra
2020-05-21 13:47     ` Joel Fernandes
2020-05-21 20:20       ` Vineeth Remanan Pillai
2020-05-22 12:59       ` Peter Zijlstra
2020-05-22 21:35         ` Joel Fernandes
2020-05-24 14:00           ` Phil Auld
2020-05-28 14:51             ` Joel Fernandes
2020-05-28 17:01             ` Peter Zijlstra
2020-05-28 18:17               ` Phil Auld
2020-05-28 18:34                 ` Phil Auld
2020-05-28 18:23               ` Joel Fernandes
2020-05-21 18:31   ` Linus Torvalds
2020-05-21 20:40     ` Joel Fernandes
2020-05-21 21:58       ` Jesse Barnes
2020-05-22 16:33         ` Linus Torvalds
2020-05-20 22:37 ` [PATCH RFC v2] Add support for core-wide protection of IRQ and softirq Joel Fernandes (Google)
2020-05-20 22:48 ` [PATCH RFC] sched: Use sched-RCU in core-scheduling balancing logic Joel Fernandes (Google)
2020-05-21 22:52   ` Paul E. McKenney
2020-05-22  1:26     ` Joel Fernandes
2020-06-25 20:12 ` [RFC PATCH 00/13] Core scheduling v5 Vineeth Remanan Pillai
2020-06-26  1:47   ` Joel Fernandes
2020-06-26 14:36     ` Vineeth Remanan Pillai
2020-06-26 15:10       ` Joel Fernandes
2020-06-26 15:12         ` Joel Fernandes
2020-06-27 16:21         ` Joel Fernandes
2020-06-30 14:11         ` Phil Auld
2020-06-29 12:33   ` Li, Aubrey
2020-06-29 19:41     ` Vineeth Remanan Pillai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200522023556.GE140701@google.com \
    --to=joel@joelfernandes.org \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=aaron.lwe@gmail.com \
    --cc=aubrey.intel@gmail.com \
    --cc=aubrey.li@linux.intel.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.