All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michal Koutný" <mkoutny@suse.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Phil Auld <pauld@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: [RFC PATCH] sched: Simplify leaf_cfs_rq_list maintenance
Date: Thu, 25 Feb 2021 17:27:57 +0100	[thread overview]
Message-ID: <20210225162757.48858-1-mkoutny@suse.com> (raw)

leaf_cfs_rq_list should contain cfs_rqs that have runnable entities in
them.  When we're operating under a throttled hierarchy we always update
the leaf_cfs_rq_list in order no to break list_add_leaf_cfs_rq invariant
of adding complete branches.

This caused troubles when an entity became runnable (enqueue_entity)
under a throttled hierarchy (see commit b34cb07dde7c ("sched/fair: Fix
enqueue_task_fair() warning some more")). While it proved well, this
new change ignores updating leaf_cfs_rq_list when we're operating under
the throttled hierarchy and defers the leaf_cfs_rq_list update to the
point when whole hiearchy is unthrottled (tg_unthrottle_up).

The code is now simpler and leaf_cfs_rq_list contains only cfs_rqs that
are truly runnable.

Why is this RFC?
- Primarily, I'm not sure I interpreted the purpose of leaf_cfs_rq_list
  right. The removal of throttled cfs_rqs from it would exclude them
  from __update_blocked_fair() calculation and I can't see past it now.
  If it's wrong assumption, I'd like this to help clarify what the
  proper definition of leaf_cfs_rq_list would be.
- Additionally, I didn't check thoroughly for corner cases when
  se->on_rq => cfs_rq_of(se)->on_list wouldn't hold, so the patch
  certainly isn't finished.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/fair.c  | 41 ++++++-----------------------------------
 kernel/sched/sched.h |  4 +---
 2 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04a3ce20da67..634ba6637824 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4250,10 +4250,11 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	/*
 	 * When bandwidth control is enabled, cfs might have been removed
-	 * because of a parent been throttled but cfs->nr_running > 1. Try to
-	 * add it unconditionnally.
+	 * because of a parent been throttled. We'll add it later (with
+	 * complete branch up to se->on_rq/cfs_eq->on_list) in
+	 * tg_unthrottle_up() and unthrottle_cfs_rq().
 	 */
-	if (cfs_rq->nr_running == 1 || cfs_bandwidth_used())
+	if (cfs_rq->nr_running == 1 && !throttled_hierarchy(cfs_rq))
 		list_add_leaf_cfs_rq(cfs_rq);
 
 	if (cfs_rq->nr_running == 1)
@@ -4859,6 +4860,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	for_each_sched_entity(se) {
 		if (se->on_rq)
 			break;
+		// XXX: se->on_rq implies cfs_rq_of(se)->on_list (unless throttled_hierarchy)
 		cfs_rq = cfs_rq_of(se);
 		enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
 
@@ -4896,17 +4898,6 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	add_nr_running(rq, task_delta);
 
 unthrottle_throttle:
-	/*
-	 * The cfs_rq_throttled() breaks in the above iteration can result in
-	 * incomplete leaf list maintenance, resulting in triggering the
-	 * assertion below.
-	 */
-	for_each_sched_entity(se) {
-		cfs_rq = cfs_rq_of(se);
-
-		if (list_add_leaf_cfs_rq(cfs_rq))
-			break;
-	}
 
 	assert_list_leaf_cfs_rq(rq);
 
@@ -5518,6 +5509,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	for_each_sched_entity(se) {
 		if (se->on_rq)
 			break;
+		// XXX: se->on_rq implies cfs_rq_of(se)->on_list (unless throttled_hierarchy)
 		cfs_rq = cfs_rq_of(se);
 		enqueue_entity(cfs_rq, se, flags);
 
@@ -5544,13 +5536,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
 			goto enqueue_throttle;
-
-               /*
-                * One parent has been throttled and cfs_rq removed from the
-                * list. Add it back to not break the leaf list.
-                */
-               if (throttled_hierarchy(cfs_rq))
-                       list_add_leaf_cfs_rq(cfs_rq);
 	}
 
 	/* At this point se is NULL and we are at root level*/
@@ -5574,20 +5559,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_overutilized_status(rq);
 
 enqueue_throttle:
-	if (cfs_bandwidth_used()) {
-		/*
-		 * When bandwidth control is enabled; the cfs_rq_throttled()
-		 * breaks in the above iteration can result in incomplete
-		 * leaf list maintenance, resulting in triggering the assertion
-		 * below.
-		 */
-		for_each_sched_entity(se) {
-			cfs_rq = cfs_rq_of(se);
-
-			if (list_add_leaf_cfs_rq(cfs_rq))
-				break;
-		}
-	}
 
 	assert_list_leaf_cfs_rq(rq);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bb09988451a0..f674d88920da 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -573,9 +573,7 @@ struct cfs_rq {
 	struct rq		*rq;	/* CPU runqueue to which this cfs_rq is attached */
 
 	/*
-	 * leaf cfs_rqs are those that hold tasks (lowest schedulable entity in
-	 * a hierarchy). Non-leaf lrqs hold other higher schedulable entities
-	 * (like users, containers etc.)
+	 * leaf cfs_rqs are those that hold runnable entities.
 	 *
 	 * leaf_cfs_rq_list ties together list of leaf cfs_rq's in a CPU.
 	 * This list is used during load balance.
-- 
2.30.1


             reply	other threads:[~2021-02-25 16:29 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-25 16:27 Michal Koutný [this message]
2021-02-25 17:44 ` [RFC PATCH] sched: Simplify leaf_cfs_rq_list maintenance kernel test robot
2021-03-08  8:18 ` Dietmar Eggemann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210225162757.48858-1-mkoutny@suse.com \
    --to=mkoutny@suse.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.