All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aaron Lu <aaron.lu@linux.alibaba.com>
To: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Cc: Phil Auld <pauld@redhat.com>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	subhra.mazumdar@oracle.com, fweisbec@gmail.com,
	keescook@chromium.org, kerrnel@google.com,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [RFC PATCH v2 00/17] Core scheduling v2
Date: Mon, 29 Apr 2019 11:53:21 +0800	[thread overview]
Message-ID: <20190429035320.GB128241@aaronlu> (raw)
In-Reply-To: <20190423184527.6230-1-vpillai@digitalocean.com>

On Tue, Apr 23, 2019 at 06:45:27PM +0000, Vineeth Remanan Pillai wrote:
> >> - Processes with different tags can still share the core
> 
> > I may have missed something... Could you explain this statement?
> 
> > This, to me, is the whole point of the patch series. If it's not
> > doing this then ... what?
> 
> What I meant was, the patch needs some more work to be accurate.
> There are some race conditions where the core violation can still
> happen. In our testing, we saw around 1 to 5% of the time being
> shared with incompatible processes. One example of this happening
> is as follows(let cpu 0 and 1 be siblings):
> - cpu 0 selects a process with a cookie
> - cpu 1 selects a higher priority process without cookie
> - Selection process restarts for cpu 0 and it might select a
>   process with cookie but with lesser priority.
> - Since it is lesser priority, the logic in pick_next_task
>   doesn't compare again for the cookie(trusts pick_task) and
>   proceeds.
> 
> This is one of the scenarios that we saw from traces, but there
> might be other race conditions as well. Fix seems a little
> involved and We are working on that.

This is what I have used to make sure no two unmatched tasks being
scheduled on the same core: (on top of v1, I thinks it's easier to just
show the diff instead of commenting on various places of the patches :-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cb24a0141e57..0cdb1c6a00a4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -186,6 +186,10 @@ struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie)
 	 */
 	match = idle_sched_class.pick_task(rq);
 
+	/* TODO: untagged tasks are not in the core tree */
+	if (!cookie)
+		goto out;
+
 	while (node) {
 		node_task = container_of(node, struct task_struct, core_node);
 
@@ -199,6 +203,7 @@ struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie)
 		}
 	}
 
+out:
 	return match;
 }
 
@@ -3634,6 +3639,8 @@ static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
 }
 
 // XXX fairness/fwd progress conditions
+// when max is unset, return class_pick;
+// when max is set, return cookie_pick unless class_pick has higher priority.
 static struct task_struct *
 pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max)
 {
@@ -3652,7 +3659,19 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
 	}
 
 	class_pick = class->pick_task(rq);
-	if (!cookie)
+	/*
+	 * we can only return class_pick here when max is not set.
+	 *
+	 * when max is set and cookie is 0, we still have to check if
+	 * class_pick's cookie matches with max, or we can end up picking
+	 * an unmacthed task. e.g. max is untagged and class_pick here
+	 * is tagged.
+	 */
+	if (!cookie && !max)
+		return class_pick;
+
+	/* in case class_pick matches with max, no need to check priority */
+	if (class_pick && cookie_match(class_pick, max))
 		return class_pick;
 
 	cookie_pick = sched_core_find(rq, cookie);
@@ -3663,8 +3682,11 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
 	 * If class > max && class > cookie, it is the highest priority task on
 	 * the core (so far) and it must be selected, otherwise we must go with
 	 * the cookie pick in order to satisfy the constraint.
+	 *
+	 * class_pick and cookie_pick are on the same cpu so use cpu_prio_less()
+	 * max and class_pick are on different cpus so use core_prio_less()
 	 */
-	if (cpu_prio_less(cookie_pick, class_pick) && cpu_prio_less(max, class_pick))
+	if (cpu_prio_less(cookie_pick, class_pick) && core_prio_less(max, class_pick))
 		return class_pick;
 
 	return cookie_pick;
@@ -3731,8 +3753,17 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 		rq_i->core_pick = NULL;
 
-		if (i != cpu)
+		if (i != cpu) {
 			update_rq_clock(rq_i);
+			/*
+			 * we are going to pick tasks for both cpus, if our
+			 * sibling is idle and we have core_cookie set, now
+			 * is the time to clear/reset it so that we can do
+			 * an unconstained pick.
+			 */
+			if (is_idle_task(rq_i->curr) && rq_i->core->core_cookie)
+				rq_i->core->core_cookie = 0;
+		}
 	}
 
 	/*
@@ -3794,20 +3825,42 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			 *
 			 * NOTE: this is a linear max-filter and is thus bounded
 			 * in execution time.
+			 *
+			 * The fact that pick_task() returns p with a different
+			 * cookie means p has higher priority and we need to
+			 * replace max with p.
 			 */
-			if (!max || core_prio_less(max, p)) {
+			if (!max || !cookie_match(max, p)) {
 				struct task_struct *old_max = max;
 
 				rq->core->core_cookie = p->core_cookie;
 				max = p;
 				trace_printk("max: %s/%d %lx\n", max->comm, max->pid, max->core_cookie);
 
-				if (old_max && !cookie_match(old_max, p)) {
+				if (old_max) {
 					for_each_cpu(j, smt_mask) {
 						if (j == i)
 							continue;
 
 						cpu_rq(j)->core_pick = NULL;
+
+						/*
+						 * if max is untagged, then core_cookie
+						 * is zero and siblig can do a wrongly
+						 * unconstained pick. avoid that by doing
+						 * pick directly here. since there is no
+						 * untagged tasks in core tree, just
+						 * use idle for our sibling.
+						 * TODO: sibling may pick an untagged task.
+						 */
+						if (max->core_cookie)
+							cpu_rq(j)->core_pick = NULL;
+						else {
+							cpu_rq(j)->core_pick = idle_sched_class.pick_task(cpu_rq(j));
+							occ = 1;
+							goto out;
+						}
+
 					}
 					occ = 1;
 					goto again;
@@ -3817,6 +3870,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 next_class:;
 	}
 
+out:
 	rq->core->core_pick_seq = rq->core->core_task_seq;
 
 	/*
@@ -3834,6 +3888,17 @@ next_class:;
 
 		rq_i->core_pick->core_occupation = occ;
 
+		/* make sure we didn't break L1TF */
+		if (!is_idle_task(rq_i->core_pick) &&
+		    rq_i->core_pick->core_cookie != rq_i->core->core_cookie) {
+			trace_printk("cpu%d: cookie mismatch. %s/%d/0x%lx/0x%lx\n",
+					rq_i->cpu, rq_i->core_pick->comm,
+					rq_i->core_pick->pid,
+					rq_i->core_pick->core_cookie,
+					rq_i->core->core_cookie);
+			WARN_ON_ONCE(1);
+		}
+
 		if (i == cpu)
 			continue;
 

  reply	other threads:[~2019-04-29  3:53 UTC|newest]

Thread overview: 109+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-23 16:18 [RFC PATCH v2 00/17] Core scheduling v2 Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 01/17] stop_machine: Fix stop_cpus_in_progress ordering Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 02/17] sched: Fix kerneldoc comment for ia64_set_curr_task Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 03/17] sched: Wrap rq::lock access Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 04/17] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 05/17] sched: Add task_struct pointer to sched_class::set_curr_task Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 06/17] sched/fair: Export newidle_balance() Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 07/17] sched: Allow put_prev_task() to drop rq->lock Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 08/17] sched: Rework pick_next_task() slow-path Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 09/17] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2019-04-26 14:02   ` Peter Zijlstra
2019-04-26 16:10     ` Vineeth Remanan Pillai
2019-04-29  5:38   ` Aaron Lu
2019-04-23 16:18 ` [RFC PATCH v2 10/17] sched: Core-wide rq->lock Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 11/17] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2019-04-24  0:08   ` Tim Chen
2019-04-24 20:43     ` Vineeth Remanan Pillai
2019-04-24 22:12       ` Tim Chen
2019-04-25 14:35       ` Phil Auld
2019-05-22 19:52         ` Vineeth Remanan Pillai
2019-04-24  0:17   ` Tim Chen
2019-04-24 20:43     ` Vineeth Remanan Pillai
2019-04-29  3:36   ` Aaron Lu
2019-05-10 13:06     ` Peter Zijlstra
2019-04-29  6:15   ` Aaron Lu
2019-05-01 23:27     ` Tim Chen
2019-05-03  0:06       ` Tim Chen
2019-05-08 15:49         ` Aubrey Li
2019-05-08 18:19           ` Subhra Mazumdar
2019-05-08 18:37             ` Subhra Mazumdar
2019-05-09  0:01               ` Aubrey Li
2019-05-09  0:25                 ` Subhra Mazumdar
2019-05-09  1:38                   ` Aubrey Li
2019-05-09  2:14                     ` Subhra Mazumdar
2019-05-09 15:10                       ` Aubrey Li
2019-05-09 17:50                         ` Subhra Mazumdar
2019-05-10  0:09                           ` Tim Chen
2019-04-23 16:18 ` [RFC PATCH v2 12/17] sched: A quick and dirty cgroup tagging interface Vineeth Remanan Pillai
2019-04-25 14:26   ` Phil Auld
2019-04-26 14:13     ` Peter Zijlstra
2019-04-26 14:19       ` Phil Auld
2019-05-10 15:12   ` Julien Desfossez
2019-04-23 16:18 ` [RFC PATCH v2 13/17] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2019-04-29  7:13   ` Aaron Lu
2019-05-18 15:37   ` Aubrey Li
2019-05-20 13:04     ` Phil Auld
2019-05-20 14:04       ` Vineeth Pillai
2019-05-21  8:19         ` Aubrey Li
2019-05-21 13:24           ` Vineeth Pillai
2019-04-23 16:18 ` [RFC PATCH v2 14/17] sched/fair: Add a few assertions Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 15/17] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2019-04-23 23:46   ` Aubrey Li
2019-04-24 14:03     ` Vineeth Remanan Pillai
2019-04-24 14:05     ` Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 16/17] sched: Wake up sibling if it has something to run Vineeth Remanan Pillai
2019-04-26 15:03   ` Peter Zijlstra
2019-04-29 12:36     ` Julien Desfossez
2019-04-23 16:18 ` [RFC PATCH v2 17/17] sched: Debug bits Vineeth Remanan Pillai
2019-05-17 17:18   ` Aubrey Li
2019-04-23 18:02 ` [RFC PATCH v2 00/17] Core scheduling v2 Phil Auld
2019-04-23 18:45   ` Vineeth Remanan Pillai
2019-04-29  3:53     ` Aaron Lu [this message]
2019-05-06 19:39       ` Julien Desfossez
2019-05-08  2:30         ` Aaron Lu
2019-05-08 17:49           ` Julien Desfossez
2019-05-09  2:11             ` Aaron Lu
2019-05-15 21:36               ` Vineeth Remanan Pillai
2019-04-23 23:25 ` Aubrey Li
2019-04-24 11:19   ` Vineeth Remanan Pillai
2019-05-15 21:39     ` Vineeth Remanan Pillai
2019-04-24 13:13 ` Aubrey Li
2019-04-24 14:00   ` Julien Desfossez
2019-04-25  3:15     ` Aubrey Li
2019-04-25  9:55       ` Ingo Molnar
2019-04-25 14:46         ` Mel Gorman
2019-04-25 18:53           ` Ingo Molnar
2019-04-25 18:59             ` Thomas Gleixner
2019-04-25 19:34               ` Ingo Molnar
2019-04-25 21:31             ` Mel Gorman
2019-04-26  8:42               ` Ingo Molnar
2019-04-26 10:43                 ` Mel Gorman
2019-04-26 18:37                   ` Subhra Mazumdar
2019-04-26 19:49                     ` Mel Gorman
2019-04-26  9:45               ` Ingo Molnar
2019-04-26 10:19                 ` Mel Gorman
2019-04-27  9:06                   ` Ingo Molnar
2019-04-26  9:51               ` Ingo Molnar
2019-04-26 14:15             ` Phil Auld
2019-04-26  2:18         ` Aubrey Li
2019-04-26  9:51           ` Ingo Molnar
2019-04-27  3:51         ` Aubrey Li
2019-04-27  9:17           ` Ingo Molnar
2019-04-27 14:04             ` Aubrey Li
2019-04-27 14:21               ` Ingo Molnar
2019-04-27 15:54                 ` Aubrey Li
2019-04-28  9:33                   ` Ingo Molnar
2019-04-28 10:29                     ` Aubrey Li
2019-04-28 12:17                       ` Ingo Molnar
2019-04-29  2:17                         ` Li, Aubrey
2019-04-29  6:14                           ` Ingo Molnar
2019-04-29 13:25                             ` Li, Aubrey
2019-04-29 15:39                               ` Phil Auld
2019-04-30  1:24                                 ` Aubrey Li
2019-04-29 16:00                               ` Ingo Molnar
2019-04-30  1:34                                 ` Aubrey Li
2019-04-30  4:42                                   ` Ingo Molnar
2019-05-18  0:58                                     ` Li, Aubrey
2019-05-18  1:08                                       ` Li, Aubrey
2019-04-25 14:36 ` Julien Desfossez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190429035320.GB128241@aaronlu \
    --to=aaron.lu@linux.alibaba.com \
    --cc=aaron.lwe@gmail.com \
    --cc=aubrey.intel@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.