All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vineeth Remanan Pillai <vpillai@digitalocean.com>
To: Aaron Lu <aaron.lu@linux.alibaba.com>
Cc: Vineeth Remanan Pillai <vpillai@digitalocean.com>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	subhra.mazumdar@oracle.com, fweisbec@gmail.com,
	keescook@chromium.org, kerrnel@google.com,
	Aubrey Li <aubrey.intel@gmail.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Phil Auld <pauld@redhat.com>
Subject: Re: [RFC PATCH v2 00/17] Core scheduling v2
Date: Wed, 15 May 2019 21:36:15 +0000	[thread overview]
Message-ID: <20190515213615.3387-1-vpillai@digitalocean.com> (raw)
In-Reply-To: <20190509021144.GA24577@aaronlu>

> It's clear now, thanks.
> I don't immediately see how my isolation fix would make your fix stop
> working, will need to check. But I'm busy with other stuffs so it will
> take a while.
>
We have identified the issue and have a fix for this. The issue is
same as before, forced idle sibling has a runnable process which
is starved due to an unconstrained pick bug.

One sample scenario is like this:
cpu0 and cpu1 are siblings. cpu0 selects an untagged process 'a'
which forces idle on cpu1 even though it had a runnable tagged
process 'b' which is determined by the code to be of lesser priority.
cpu1 can go to deep idle.

During the next schedule in cpu0, the following could happen:
 - cpu0 selects swapper as there is nothing to run and hence
   prev_cookie is 0, it does an unconstrained pick of swapper.
   So both cpu0 and 1 are idling and cpu1 might be deep idle.
 - cpu0 again goes to schedule and selects 'a' which is runnable
   now. since prev_cookie is 0, 'a' is an unconstrained pick and
   'b' on cpu1 is forgotten again.

This continues with swapper and process 'a' taking turns without
considering sibling until a tagged process becomes runnable in cpu0
and then we don't get into unconstrained pick.

The above is one of the couple of scenarios we have seen and each
have a slightly different path, which ultimately leads to an
unconstrianed pick, starving the sibling's runnable thread.

The fix is to mark if a core has gone forced idle when there was a
runnable process and then do not do uncontrained pick if a forced
idle happened in the last pick.

I am attaching here wth, the patch that fixes the above issue. Patch
is on top of Peter's fix and your correctness fix that we modified for
v2. We have a public reposiory with all the changes including this
fix as well:
https://github.com/digitalocean/linux-coresched/tree/coresched

We are working on a v3 where the last 3 commits will be squashed to
their related patches in v2. We hope to come up with a v3 next week
with all the suggestions and fixes posted in v2.

Thanks,
Vineeth

---
 kernel/sched/core.c  | 26 ++++++++++++++++++++++----
 kernel/sched/sched.h |  1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 413d46bde17d..3aba0f8fe384 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3653,8 +3653,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	struct task_struct *next, *max = NULL;
 	const struct sched_class *class;
 	const struct cpumask *smt_mask;
-	unsigned long prev_cookie;
 	int i, j, cpu, occ = 0;
+	bool need_sync = false;
 
 	if (!sched_core_enabled(rq))
 		return __pick_next_task(rq, prev, rf);
@@ -3702,7 +3702,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	 * 'Fix' this by also increasing @task_seq for every pick.
 	 */
 	rq->core->core_task_seq++;
-	prev_cookie = rq->core->core_cookie;
+	need_sync = !!rq->core->core_cookie;
 
 	/* reset state */
 	rq->core->core_cookie = 0UL;
@@ -3711,6 +3711,11 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 		rq_i->core_pick = NULL;
 
+		if (rq_i->core_forceidle) {
+			need_sync = true;
+			rq_i->core_forceidle = false;
+		}
+
 		if (i != cpu)
 			update_rq_clock(rq_i);
 	}
@@ -3743,7 +3748,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 				 * If there weren't no cookies; we don't need
 				 * to bother with the other siblings.
 				 */
-				if (i == cpu && !prev_cookie)
+				if (i == cpu && !need_sync)
 					goto next_class;
 
 				continue;
@@ -3753,7 +3758,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			 * Optimize the 'normal' case where there aren't any
 			 * cookies and we don't need to sync up.
 			 */
-			if (i == cpu && !prev_cookie && !p->core_cookie) {
+			if (i == cpu && !need_sync && !p->core_cookie) {
 				next = p;
 				rq->core_pick = NULL;
 				rq->core->core_cookie = 0UL;
@@ -3816,7 +3821,16 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 					}
 					occ = 1;
 					goto again;
+				} else {
+					/*
+					 * Once we select a task for a cpu, we
+					 * should not be doing an unconstrained
+					 * pick because it might starve a task
+					 * on a forced idle cpu.
+					 */
+					need_sync = true;
 				}
+
 			}
 		}
 next_class:;
@@ -3843,6 +3857,9 @@ next_class:;
 
 		WARN_ON_ONCE(!rq_i->core_pick);
 
+		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
+			rq->core_forceidle = true;
+
 		rq_i->core_pick->core_occupation = occ;
 
 		if (i == cpu)
@@ -6746,6 +6763,7 @@ void __init sched_init(void)
 		rq->core_pick = NULL;
 		rq->core_enabled = 0;
 		rq->core_tree = RB_ROOT;
+		rq->core_forceidle = false;
 
 		rq->core_cookie = 0UL;
 #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f38d4149443b..74c29afa0f32 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -964,6 +964,7 @@ struct rq {
 	unsigned int		core_enabled;
 	unsigned int		core_sched_seq;
 	struct rb_root		core_tree;
+	bool			core_forceidle;
 
 	/* shared state */
 	unsigned int		core_task_seq;
-- 
2.17.1

  reply	other threads:[~2019-05-15 21:36 UTC|newest]

Thread overview: 109+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-23 16:18 [RFC PATCH v2 00/17] Core scheduling v2 Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 01/17] stop_machine: Fix stop_cpus_in_progress ordering Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 02/17] sched: Fix kerneldoc comment for ia64_set_curr_task Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 03/17] sched: Wrap rq::lock access Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 04/17] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 05/17] sched: Add task_struct pointer to sched_class::set_curr_task Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 06/17] sched/fair: Export newidle_balance() Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 07/17] sched: Allow put_prev_task() to drop rq->lock Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 08/17] sched: Rework pick_next_task() slow-path Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 09/17] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2019-04-26 14:02   ` Peter Zijlstra
2019-04-26 16:10     ` Vineeth Remanan Pillai
2019-04-29  5:38   ` Aaron Lu
2019-04-23 16:18 ` [RFC PATCH v2 10/17] sched: Core-wide rq->lock Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 11/17] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2019-04-24  0:08   ` Tim Chen
2019-04-24 20:43     ` Vineeth Remanan Pillai
2019-04-24 22:12       ` Tim Chen
2019-04-25 14:35       ` Phil Auld
2019-05-22 19:52         ` Vineeth Remanan Pillai
2019-04-24  0:17   ` Tim Chen
2019-04-24 20:43     ` Vineeth Remanan Pillai
2019-04-29  3:36   ` Aaron Lu
2019-05-10 13:06     ` Peter Zijlstra
2019-04-29  6:15   ` Aaron Lu
2019-05-01 23:27     ` Tim Chen
2019-05-03  0:06       ` Tim Chen
2019-05-08 15:49         ` Aubrey Li
2019-05-08 18:19           ` Subhra Mazumdar
2019-05-08 18:37             ` Subhra Mazumdar
2019-05-09  0:01               ` Aubrey Li
2019-05-09  0:25                 ` Subhra Mazumdar
2019-05-09  1:38                   ` Aubrey Li
2019-05-09  2:14                     ` Subhra Mazumdar
2019-05-09 15:10                       ` Aubrey Li
2019-05-09 17:50                         ` Subhra Mazumdar
2019-05-10  0:09                           ` Tim Chen
2019-04-23 16:18 ` [RFC PATCH v2 12/17] sched: A quick and dirty cgroup tagging interface Vineeth Remanan Pillai
2019-04-25 14:26   ` Phil Auld
2019-04-26 14:13     ` Peter Zijlstra
2019-04-26 14:19       ` Phil Auld
2019-05-10 15:12   ` Julien Desfossez
2019-04-23 16:18 ` [RFC PATCH v2 13/17] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2019-04-29  7:13   ` Aaron Lu
2019-05-18 15:37   ` Aubrey Li
2019-05-20 13:04     ` Phil Auld
2019-05-20 14:04       ` Vineeth Pillai
2019-05-21  8:19         ` Aubrey Li
2019-05-21 13:24           ` Vineeth Pillai
2019-04-23 16:18 ` [RFC PATCH v2 14/17] sched/fair: Add a few assertions Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 15/17] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2019-04-23 23:46   ` Aubrey Li
2019-04-24 14:03     ` Vineeth Remanan Pillai
2019-04-24 14:05     ` Vineeth Remanan Pillai
2019-04-23 16:18 ` [RFC PATCH v2 16/17] sched: Wake up sibling if it has something to run Vineeth Remanan Pillai
2019-04-26 15:03   ` Peter Zijlstra
2019-04-29 12:36     ` Julien Desfossez
2019-04-23 16:18 ` [RFC PATCH v2 17/17] sched: Debug bits Vineeth Remanan Pillai
2019-05-17 17:18   ` Aubrey Li
2019-04-23 18:02 ` [RFC PATCH v2 00/17] Core scheduling v2 Phil Auld
2019-04-23 18:45   ` Vineeth Remanan Pillai
2019-04-29  3:53     ` Aaron Lu
2019-05-06 19:39       ` Julien Desfossez
2019-05-08  2:30         ` Aaron Lu
2019-05-08 17:49           ` Julien Desfossez
2019-05-09  2:11             ` Aaron Lu
2019-05-15 21:36               ` Vineeth Remanan Pillai [this message]
2019-04-23 23:25 ` Aubrey Li
2019-04-24 11:19   ` Vineeth Remanan Pillai
2019-05-15 21:39     ` Vineeth Remanan Pillai
2019-04-24 13:13 ` Aubrey Li
2019-04-24 14:00   ` Julien Desfossez
2019-04-25  3:15     ` Aubrey Li
2019-04-25  9:55       ` Ingo Molnar
2019-04-25 14:46         ` Mel Gorman
2019-04-25 18:53           ` Ingo Molnar
2019-04-25 18:59             ` Thomas Gleixner
2019-04-25 19:34               ` Ingo Molnar
2019-04-25 21:31             ` Mel Gorman
2019-04-26  8:42               ` Ingo Molnar
2019-04-26 10:43                 ` Mel Gorman
2019-04-26 18:37                   ` Subhra Mazumdar
2019-04-26 19:49                     ` Mel Gorman
2019-04-26  9:45               ` Ingo Molnar
2019-04-26 10:19                 ` Mel Gorman
2019-04-27  9:06                   ` Ingo Molnar
2019-04-26  9:51               ` Ingo Molnar
2019-04-26 14:15             ` Phil Auld
2019-04-26  2:18         ` Aubrey Li
2019-04-26  9:51           ` Ingo Molnar
2019-04-27  3:51         ` Aubrey Li
2019-04-27  9:17           ` Ingo Molnar
2019-04-27 14:04             ` Aubrey Li
2019-04-27 14:21               ` Ingo Molnar
2019-04-27 15:54                 ` Aubrey Li
2019-04-28  9:33                   ` Ingo Molnar
2019-04-28 10:29                     ` Aubrey Li
2019-04-28 12:17                       ` Ingo Molnar
2019-04-29  2:17                         ` Li, Aubrey
2019-04-29  6:14                           ` Ingo Molnar
2019-04-29 13:25                             ` Li, Aubrey
2019-04-29 15:39                               ` Phil Auld
2019-04-30  1:24                                 ` Aubrey Li
2019-04-29 16:00                               ` Ingo Molnar
2019-04-30  1:34                                 ` Aubrey Li
2019-04-30  4:42                                   ` Ingo Molnar
2019-05-18  0:58                                     ` Li, Aubrey
2019-05-18  1:08                                       ` Li, Aubrey
2019-04-25 14:36 ` Julien Desfossez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190515213615.3387-1-vpillai@digitalocean.com \
    --to=vpillai@digitalocean.com \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=aubrey.intel@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.