Date: Wed, 10 May 2017 13:36:20 -0400
From: Tejun Heo
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Vincent Guittot, Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
Subject: [PATCH v3 for-4.12-fixes 2/2] sched/fair: Fix O(# total cgroups) in load balance path
Message-ID: <20170510173620.GG32165@htj.duckdns.org>
References: <20170509161740.GD8609@htj.duckdns.org> <20170509161854.GE8609@htj.duckdns.org>
In-Reply-To: <20170509161854.GE8609@htj.duckdns.org>

Currently, rq->leaf_cfs_rq_list is a traversal-ordered list of all live
cfs_rqs which have ever been active on the CPU; unfortunately, this
makes update_blocked_averages() O(# total cgroups), which isn't
scalable at all.

This shows up as a small increase in CPU consumption and scheduling
latency in the load balancing path on systems with the CPU controller
enabled across most cgroups. In an edge case where temporary cgroups
were leaking, this caused the kernel to spend a good several tens of
percent of CPU cycles running update_blocked_averages(), with each run
taking multiple milliseconds.

This patch fixes the issue by taking empty and fully decayed cfs_rqs
off rq->leaf_cfs_rq_list.

v3: The debug path still uses for_each_leaf_cfs_rq(). Keep it around.
Signed-off-by: Tejun Heo
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Turner
Cc: Chris Mason
Cc: stable@vger.kernel.org
---
Hello,

The previous version caused a build error when SCHED_DEBUG is set,
because the debug code still uses for_each_leaf_cfs_rq(). This version
keeps that macro around.

Thanks.

 kernel/sched/fair.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -372,6 +372,10 @@ static inline void list_del_leaf_cfs_rq(
 #define for_each_leaf_cfs_rq(rq, cfs_rq) \
 	list_for_each_entry_rcu(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
 
+#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos)			\
+	list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list,	\
+				 leaf_cfs_rq_list)
+
 /* Do the two (enqueued) entities belong to the same group ? */
 static inline struct cfs_rq *
 is_same_group(struct sched_entity *se, struct sched_entity *pse)
@@ -466,6 +470,9 @@ static inline void list_del_leaf_cfs_rq(
 #define for_each_leaf_cfs_rq(rq, cfs_rq) \
 		for (cfs_rq = &rq->cfs; cfs_rq; cfs_rq = NULL)
 
+#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos)	\
+		for (cfs_rq = &rq->cfs; cfs_rq; cfs_rq = NULL)
+
 static inline struct sched_entity *parent_entity(struct sched_entity *se)
 {
 	return NULL;
@@ -6984,7 +6991,7 @@ static void attach_tasks(struct lb_env *
 static void update_blocked_averages(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	struct cfs_rq *cfs_rq;
+	struct cfs_rq *cfs_rq, *pos;
 	struct rq_flags rf;
 
 	rq_lock_irqsave(rq, &rf);
@@ -6994,7 +7001,7 @@ static void update_blocked_averages(int
 	 * Iterates the task_group tree in a bottom up fashion, see
 	 * list_add_leaf_cfs_rq() for details.
 	 */
-	for_each_leaf_cfs_rq(rq, cfs_rq) {
+	for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) {
 		struct sched_entity *se;
 
 		/* throttled entities do not contribute to load */
@@ -7008,6 +7015,14 @@ static void update_blocked_averages(int
 		se = cfs_rq->tg->se[cpu];
 		if (se && !skip_blocked_update(se))
 			update_load_avg(se, 0);
+
+		/*
+		 * There can be a lot of idle CPU cgroups. Don't let fully
+		 * decayed cfs_rqs linger on the list.
+		 */
+		if (!cfs_rq->load.weight && !cfs_rq->avg.load_sum &&
+		    !cfs_rq->avg.util_sum && !cfs_rq->runnable_load_sum)
+			list_del_leaf_cfs_rq(cfs_rq);
 	}
 	rq_unlock_irqrestore(rq, &rf);
 }