linux-kernel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Sargun Dhillon <sargun@sargun.me>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Xie XiuQi <xiexiuqi@huawei.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	xiezhipeng1@huawei.com, huawei.libin@huawei.com,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] sched: fix infinity loop in update_blocked_averages
Date: Thu, 27 Dec 2018 13:46:17 -0800
Message-ID: <CAHk-=wg_0v3t+mAdS2-sPWD6DTH3Y9aGoQUhx7Mk1MB8gm9xjw@mail.gmail.com>
In-Reply-To: <CAMp4zn_N9fVNmyaiH-XGJW=E8QO0OnnVmAwQM_kcXjiybmhyGw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1165 bytes --]

On Thu, Dec 27, 2018 at 1:09 PM Sargun Dhillon <sargun@sargun.me> wrote:
>
> This appears to be broken since October on 4.18.5. We've only noticed
> it recently with a workload which does ridiculously parallel compiles
> in cgroups that are rapidly churned.

Yeah, that's probably unusual enough that people will have missed it.

Because it really looks like the bug has been there since 4.13, unless
I'm mis-reading things. Other things have changed there since, so
maybe I am.

> It's also an awkward bug to catch, because none of the lockup
> detectors were catching it in our environment. The only reason we
> caught it was that it was blocking other cores, and those other cores
> were missing IPIs, resulting in catastrophic failure.

My gut feel is that we just need to revert that commit. It doesn't
revert cleanly, but it doesn't look hard to do manually.

Something like the attached?

But we do need Tejun and PeterZ to take a look, since there might be
something subtle going on.

Everybody is probably still on well-deserved vacations, so it might be
a while. But testing the attached patch is probably a good idea
regardless.

                  Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2945 bytes --]

 kernel/sched/fair.c | 41 ++++++++---------------------------------
 1 file changed, 8 insertions(+), 33 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d1907506318a..01f3cb89d188 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -353,9 +353,8 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 }
 
 /* Iterate thr' all leaf cfs_rq's on a runqueue */
-#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos)			\
-	list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list,	\
-				 leaf_cfs_rq_list)
+#define for_each_leaf_cfs_rq(rq, cfs_rq) \
+	list_for_each_entry_rcu(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
 
 /* Do the two (enqueued) entities belong to the same group ? */
 static inline struct cfs_rq *
@@ -447,8 +446,8 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 {
 }
 
-#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos)	\
-		for (cfs_rq = &rq->cfs, pos = NULL; cfs_rq; cfs_rq = pos)
+#define for_each_leaf_cfs_rq(rq, cfs_rq)	\
+		for (cfs_rq = &rq->cfs; cfs_rq; cfs_rq = NULL)
 
 static inline struct sched_entity *parent_entity(struct sched_entity *se)
 {
@@ -7647,27 +7646,10 @@ static inline bool others_have_blocked(struct rq *rq)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
-static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
-{
-	if (cfs_rq->load.weight)
-		return false;
-
-	if (cfs_rq->avg.load_sum)
-		return false;
-
-	if (cfs_rq->avg.util_sum)
-		return false;
-
-	if (cfs_rq->avg.runnable_load_sum)
-		return false;
-
-	return true;
-}
-
 static void update_blocked_averages(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	struct cfs_rq *cfs_rq, *pos;
+	struct cfs_rq *cfs_rq;
 	const struct sched_class *curr_class;
 	struct rq_flags rf;
 	bool done = true;
@@ -7679,7 +7661,7 @@ static void update_blocked_averages(int cpu)
 	 * Iterates the task_group tree in a bottom up fashion, see
 	 * list_add_leaf_cfs_rq() for details.
 	 */
-	for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) {
+	for_each_leaf_cfs_rq(rq, cfs_rq) {
 		struct sched_entity *se;
 
 		/* throttled entities do not contribute to load */
@@ -7694,13 +7676,6 @@ static void update_blocked_averages(int cpu)
 		if (se && !skip_blocked_update(se))
 			update_load_avg(cfs_rq_of(se), se, 0);
 
-		/*
-		 * There can be a lot of idle CPU cgroups.  Don't let fully
-		 * decayed cfs_rqs linger on the list.
-		 */
-		if (cfs_rq_is_decayed(cfs_rq))
-			list_del_leaf_cfs_rq(cfs_rq);
-
 		/* Don't need periodic decay once load/util_avg are null */
 		if (cfs_rq_has_blocked(cfs_rq))
 			done = false;
@@ -10570,10 +10545,10 @@ const struct sched_class fair_sched_class = {
 #ifdef CONFIG_SCHED_DEBUG
 void print_cfs_stats(struct seq_file *m, int cpu)
 {
-	struct cfs_rq *cfs_rq, *pos;
+	struct cfs_rq *cfs_rq;
 
 	rcu_read_lock();
-	for_each_leaf_cfs_rq_safe(cpu_rq(cpu), cfs_rq, pos)
+	for_each_leaf_cfs_rq(cpu_rq(cpu), cfs_rq)
 		print_cfs_rq(m, cpu, cfs_rq);
 	rcu_read_unlock();
 }
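
For readers who do not have fair.c open, the two loop shapes that the patch swaps can be sketched in userspace C. This is only an illustration under stated assumptions: `struct node`, `list_init()`, `list_add_tail()`, `list_del()`, `walk_and_prune()`, `walk_only()` and `demo_prune()` are hypothetical stand-ins, not kernel code, and the sketch does not reproduce the infinite loop itself. It shows only that the pre-revert loop caches the next pointer so it can unlink "decayed" entries while walking, whereas the post-revert loop is a plain read-only walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cfs_rq linked on a per-runqueue leaf list. */
struct node {
	struct node *prev, *next;
	int load;	/* 0 plays the role of "fully decayed" in this sketch */
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;	/* leave the node self-linked */
}

/*
 * Pre-revert shape: cache the next entry before running the body, so the
 * body may unlink the current entry. Returns the number of entries visited.
 */
static int walk_and_prune(struct node *head)
{
	struct node *pos, *tmp;
	int visited = 0;

	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		visited++;
		if (pos->load == 0)
			list_del(pos);
	}
	return visited;
}

/* Post-revert shape: walk only, never unlink during the walk. */
static int walk_only(struct node *head)
{
	struct node *pos;
	int visited = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		visited++;
	return visited;
}

/* Build a list with loads {3, 0, 5}, prune it, return what is left. */
static int demo_prune(void)
{
	struct node head, a = { .load = 3 }, b = { .load = 0 }, c = { .load = 5 };

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);

	assert(walk_and_prune(&head) == 3);	/* all three entries visited */
	return walk_only(&head);		/* the load == 0 entry is gone */
}
```

Calling `demo_prune()` from a `main()` returns the number of entries that survive pruning. The cached-next walk tolerates removal of the current entry only; it offers no protection if other entries are added or unlinked behind the walker's back, which is one reason a walk-only loop is the more conservative shape.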
