All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Balbir Singh <bsingharora@gmail.com>
Cc: Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Vineeth Pillai <viremana@linux.microsoft.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	tglx@linutronix.de, linux-kernel@vger.kernel.org,
	mingo@kernel.org, torvalds@linux-foundation.org,
	fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	vineeth@bitbyteword.org, Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Agata Gruza <agata.gruza@intel.com>,
	Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
	graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
	pjt@google.com, rostedt@goodmis.org, derkling@google.com,
	benbjiang@tencent.com,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	James.Bottomley@hansenpartnership.com, OWeisse@umich.edu,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Junaid Shahid <junaids@google.com>,
	jsbarnes@google.com, chris.hyser@oracle.com,
	Ben Segall <bsegall@google.com>, Josh Don <joshdon@google.com>,
	Hao Luo <haoluo@google.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Tim Chen <tim.c.chen@intel.com>
Subject: Re: [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling
Date: Tue, 24 Nov 2020 13:30:38 -0500	[thread overview]
Message-ID: <20201124183038.GG1021337@google.com> (raw)
In-Reply-To: <20201122224123.GE110669@balbir-desktop>

On Mon, Nov 23, 2020 at 09:41:23AM +1100, Balbir Singh wrote:
> On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> > 
> > The rationale is as follows. In the core-wide pick logic, even if
> > need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> > see if they could be running RT.
> > 
> > Say the RQs in a particular core look like this:
> > Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
> > 
> > rq0            rq1
> > CFS1 (tagged)  RT1 (not tag)
> > CFS2 (tagged)
> > 
> > Say schedule() runs on rq0. Now, it will enter the above loop and
> > pick_task(RT) will return NULL for 'p'. It will enter the above if() block
> > and see that need_sync == false and will skip RT entirely.
> > 
> > The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
> > rq0             rq1
> > CFS1            IDLE
> > 
> > When it should have selected:
> > rq0             r1
> > IDLE            RT
> > 
> > Joel saw this issue on real-world usecases in ChromeOS where an RT task
> > gets constantly force-idled and breaks RT. Lets cure it.
> > 
> > NOTE: This problem will be fixed differently in a later patch. It just
> >       kept here for reference purposes about this issue, and to make
> >       applying later patches easier.
> >
> 
> The changelog is hard to read, it refers to above if(), whereas there
> is no code snippet in the changelog.

Yeah sorry, it comes from this email where I described the issue:
http://lore.kernel.org/r/20201023175724.GA3563800@google.com

I corrected the changelog and appended the patch below. Also pushed it to:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched

> Also, from what I can see following
> the series, p->core_cookie is not yet set anywhere (unless I missed it),
> so fixing it in here did not make sense just reading the series.

The interface patches for core_cookie are added later, that's how it is. The
infrastructure comes first here. It would also not make sense to add
interface first as well so I think the current ordering is fine.

---8<-----------------------

From: Peter Zijlstra <peterz@infradead.org>
Subject: [PATCH] sched: Fix priority inversion of cookied task with sibling

The rationale is as follows. In the core-wide pick logic, even if
need_sync == false, we need to go look at other CPUs (non-local CPUs) to
see if they could be running RT.

Say the RQs in a particular core look like this:
Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.

rq0            rq1
CFS1 (tagged)  RT1 (not tag)
CFS2 (tagged)

The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
rq0             rq1
CFS1            IDLE

When it should have selected:
rq0             r1
IDLE            RT

Fix this issue by forcing need_sync and restarting the search if a
cookied task was discovered. This will avoid this optimization from
making incorrect picks.

Joel saw this issue on real-world usecases in ChromeOS where an RT task
gets constantly force-idled and breaks RT. Lets cure it.

NOTE: This problem will be fixed differently in a later patch. It just
      kept here for reference purposes about this issue, and to make
      applying later patches easier.

Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/sched/core.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4ee4902c2cf5..53af817740c0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5195,6 +5195,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	need_sync = !!rq->core->core_cookie;
 
 	/* reset state */
+reset:
 	rq->core->core_cookie = 0UL;
 	if (rq->core->core_forceidle) {
 		need_sync = true;
@@ -5242,14 +5243,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 				/*
 				 * If there weren't no cookies; we don't need to
 				 * bother with the other siblings.
-				 * If the rest of the core is not running a tagged
-				 * task, i.e.  need_sync == 0, and the current CPU
-				 * which called into the schedule() loop does not
-				 * have any tasks for this class, skip selecting for
-				 * other siblings since there's no point. We don't skip
-				 * for RT/DL because that could make CFS force-idle RT.
 				 */
-				if (i == cpu && !need_sync && class == &fair_sched_class)
+				if (i == cpu && !need_sync)
 					goto next_class;
 
 				continue;
@@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			 * Optimize the 'normal' case where there aren't any
 			 * cookies and we don't need to sync up.
 			 */
-			if (i == cpu && !need_sync && !p->core_cookie) {
+			if (i == cpu && !need_sync) {
+				if (p->core_cookie) {
+					/*
+					 * This optimization is only valid as
+					 * long as there are no cookies
+					 * involved. We may have skipped
+					 * non-empty higher priority classes on
+					 * siblings, which are empty on this
+					 * CPU, so start over.
+					 */
+					need_sync = true;
+					goto reset;
+				}
+
 				next = p;
 				goto done;
 			}
@@ -5299,7 +5307,6 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 					 */
 					need_sync = true;
 				}
-
 			}
 		}
 next_class:;
-- 
2.29.2.454.gaff20da3a2-goog


  reply	other threads:[~2020-11-24 18:30 UTC|newest]

Thread overview: 150+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-17 23:19 [PATCH -tip 00/32] Core scheduling (v9) Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 01/32] sched: Wrap rq::lock access Joel Fernandes (Google)
2020-11-19 23:31   ` Singh, Balbir
2020-11-20 16:55     ` Joel Fernandes
2020-11-22  8:52       ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 02/32] sched: Introduce sched_class::pick_task() Joel Fernandes (Google)
2020-11-19 23:56   ` Singh, Balbir
2020-11-20 16:58     ` Joel Fernandes
2020-11-25 23:19       ` Balbir Singh
2020-11-25 16:28   ` Vincent Guittot
2020-11-26  9:07     ` Peter Zijlstra
2020-11-26 10:17       ` Vincent Guittot
2020-11-26 12:40         ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 03/32] sched/fair: Fix pick_task_fair crashes due to empty rbtree Joel Fernandes (Google)
2020-11-20 10:15   ` Singh, Balbir
2020-11-20 18:11     ` Vineeth Pillai
2020-11-23 22:31       ` Balbir Singh
2020-11-24  8:31     ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 04/32] sched: Core-wide rq->lock Joel Fernandes (Google)
2020-11-22  9:11   ` Balbir Singh
2020-11-24  8:16     ` Peter Zijlstra
2020-11-26  0:35       ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 05/32] sched/fair: Add a few assertions Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 06/32] sched: Basic tracking of matching tasks Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 07/32] sched: Add core wide task selection and scheduling Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 08/32] sched/fair: Fix forced idle sibling starvation corner case Joel Fernandes (Google)
2020-11-22 10:35   ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 09/32] sched/fair: Snapshot the min_vruntime of CPUs on force idle Joel Fernandes (Google)
2020-11-22 11:44   ` Balbir Singh
2020-11-23 12:31     ` Vineeth Pillai
2020-11-23 23:31       ` Balbir Singh
2020-11-24  9:09         ` Peter Zijlstra
2020-11-25 23:17           ` Balbir Singh
2020-11-26  8:23             ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling Joel Fernandes (Google)
2020-11-22 22:41   ` Balbir Singh
2020-11-24 18:30     ` Joel Fernandes [this message]
2020-11-25 23:05       ` Balbir Singh
2020-11-26  8:29         ` Peter Zijlstra
2020-11-26 22:27           ` Balbir Singh
2020-12-01 17:49         ` Joel Fernandes
2020-11-17 23:19 ` [PATCH -tip 11/32] sched: Enqueue task into core queue only after vruntime is updated Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 12/32] sched: Simplify the core pick loop for optimized case Joel Fernandes (Google)
2020-11-24 12:04   ` Peter Zijlstra
2020-11-24 17:04     ` Joel Fernandes
2020-11-25  8:37       ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 13/32] sched: Trivial forced-newidle balancer Joel Fernandes (Google)
2020-11-23  4:38   ` Balbir Singh
2020-11-23 15:07     ` Li, Aubrey
2020-11-23 23:35       ` Balbir Singh
2020-11-24  0:32         ` Li, Aubrey
2020-11-25 21:28           ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 14/32] sched: migration changes for core scheduling Joel Fernandes (Google)
2020-11-22 23:54   ` Balbir Singh
2020-11-23  4:36     ` Li, Aubrey
2020-11-24 15:42       ` Peter Zijlstra
2020-11-25  3:12         ` Li, Aubrey
2020-11-25 22:57           ` Balbir Singh
2020-11-26  3:20             ` Li, Aubrey
2020-11-26  8:32               ` Balbir Singh
2020-11-26  9:26                 ` Li, Aubrey
2020-11-30  9:33                   ` Balbir Singh
2020-11-30 12:29                     ` Li, Aubrey
2020-12-02 14:09                       ` Li, Aubrey
2020-12-03  1:06                         ` Li, Aubrey
2020-11-30 10:35   ` Vincent Guittot
2020-11-30 12:32     ` Li, Aubrey
2020-11-17 23:19 ` [PATCH -tip 15/32] sched: Improve snapshotting of min_vruntime for CGroups Joel Fernandes (Google)
2020-11-24 10:27   ` Peter Zijlstra
2020-11-24 17:07     ` Joel Fernandes
2020-11-25  8:41       ` Peter Zijlstra
2020-11-24 10:41   ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 16/32] irq_work: Cleanup Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 17/32] arch/x86: Add a new TIF flag for untrusted tasks Joel Fernandes (Google)
2020-11-23  5:18   ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 18/32] kernel/entry: Add support for core-wide protection of kernel-mode Joel Fernandes (Google)
2020-11-24 16:09   ` Peter Zijlstra
2020-11-24 17:52     ` Joel Fernandes
2020-11-25  9:37   ` Peter Zijlstra
2020-12-01 17:55     ` Joel Fernandes
2020-11-26  5:37   ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 19/32] entry/idle: Enter and exit kernel protection during idle entry and exit Joel Fernandes (Google)
2020-11-24 16:13   ` Peter Zijlstra
2020-11-24 18:03     ` Joel Fernandes
2020-11-25  8:49       ` Peter Zijlstra
2020-12-01 18:24         ` Joel Fernandes
2020-11-17 23:19 ` [PATCH -tip 20/32] entry/kvm: Protect the kernel when entering from guest Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 21/32] sched: CGroup tagging interface for core scheduling Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 22/32] sched: Split the cookie and setup per-task cookie on fork Joel Fernandes (Google)
2020-11-25 11:07   ` Peter Zijlstra
2020-12-01 18:56     ` Joel Fernandes
2020-11-25 11:10   ` Peter Zijlstra
2020-12-01 19:20     ` Joel Fernandes
2020-12-01 19:34       ` Peter Zijlstra
2020-12-02  6:36         ` Josh Don
2020-12-02  7:54           ` Peter Zijlstra
2020-12-04  0:20             ` Josh Don
2020-12-06 17:49         ` Joel Fernandes
2020-11-25 11:11   ` Peter Zijlstra
2020-12-01 19:16     ` Joel Fernandes
2020-11-25 11:15   ` Peter Zijlstra
2020-12-01 19:11     ` Joel Fernandes
2020-12-01 19:20       ` Peter Zijlstra
2020-12-06 18:15         ` Joel Fernandes
2020-11-25 12:54   ` Peter Zijlstra
2020-12-01 18:38     ` Joel Fernandes
2020-11-25 13:03   ` Peter Zijlstra
2020-12-01 18:52     ` Joel Fernandes
2020-11-30 23:05   ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 23/32] sched: Add a per-thread core scheduling interface Joel Fernandes (Google)
2020-11-25 13:08   ` Peter Zijlstra
2020-12-01 19:36     ` Joel Fernandes
2020-12-02 21:47   ` Chris Hyser
2020-12-02 23:13     ` chris hyser
2020-12-06 17:34     ` Joel Fernandes
2020-12-07 21:48       ` chris hyser
2020-12-09 18:52       ` Chris Hyser
2020-12-14 19:31         ` Joel Fernandes
2020-12-14 19:44           ` chris hyser
2020-12-14 23:25             ` Joel Fernandes
2020-12-15 14:56               ` chris hyser
2020-12-15 16:23               ` chris hyser
2020-12-15 18:13               ` Dhaval Giani
2020-12-16  0:35                 ` Joel Fernandes
2020-11-17 23:19 ` [PATCH -tip 24/32] sched: Release references to the per-task cookie on exit Joel Fernandes (Google)
2020-11-25 13:03   ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 25/32] sched: Refactor core cookie into struct Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 26/32] sched: Add a second-level tag for nested CGroup usecase Joel Fernandes (Google)
2020-11-25 13:42   ` Peter Zijlstra
2020-11-30 23:10     ` Balbir Singh
2020-12-01 20:08     ` Joel Fernandes
2020-12-02  6:18     ` Josh Don
2020-12-02  8:02       ` Peter Zijlstra
2020-12-02 18:53         ` Tejun Heo
2020-12-04  0:51         ` Josh Don
2020-12-04 15:45           ` Tejun Heo
2020-11-17 23:19 ` [PATCH -tip 27/32] sched/debug: Add CGroup node for printing group cookie if SCHED_DEBUG Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 28/32] kselftest: Add tests for core-sched interface Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 29/32] sched: Move core-scheduler interfacing code to a new file Joel Fernandes (Google)
2020-11-17 23:20 ` [PATCH -tip 30/32] Documentation: Add core scheduling documentation Joel Fernandes (Google)
2020-11-17 23:20 ` [PATCH -tip 31/32] sched: Add a coresched command line option Joel Fernandes (Google)
2020-11-19 23:39   ` Randy Dunlap
2020-11-25 13:45   ` Peter Zijlstra
2020-11-26  0:11     ` Balbir Singh
2020-11-17 23:20 ` [PATCH -tip 32/32] sched: Debug bits Joel Fernandes (Google)
2020-12-01  0:21   ` Balbir Singh
2021-01-15 15:10     ` Joel Fernandes
2020-11-24 11:48 ` [PATCH -tip 00/32] Core scheduling (v9) Vincent Guittot
2020-11-24 15:08   ` Joel Fernandes
2020-12-03  6:16     ` Ning, Hongyu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201124183038.GG1021337@google.com \
    --to=joel@joelfernandes.org \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=OWeisse@umich.edu \
    --cc=aaron.lwe@gmail.com \
    --cc=agata.gruza@intel.com \
    --cc=alexandre.chartre@oracle.com \
    --cc=antonio.gomez.iglesias@intel.com \
    --cc=aubrey.intel@gmail.com \
    --cc=aubrey.li@linux.intel.com \
    --cc=benbjiang@tencent.com \
    --cc=bsegall@google.com \
    --cc=bsingharora@gmail.com \
    --cc=chris.hyser@oracle.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=derkling@google.com \
    --cc=dfaggioli@suse.com \
    --cc=dhaval.giani@oracle.com \
    --cc=fweisbec@gmail.com \
    --cc=graf@amazon.com \
    --cc=haoluo@google.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=joshdon@google.com \
    --cc=jsbarnes@google.com \
    --cc=junaids@google.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=tim.c.chen@intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vineeth@bitbyteword.org \
    --cc=viremana@linux.microsoft.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.