Date: Tue, 9 May 2017 12:17:40 -0400
From: Tejun Heo
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Vincent Guittot,
 Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
Subject: [PATCH v2 for-4.12-fixes 1/2] sched/fair: Use task_groups instead of
 leaf_cfs_rq_list to walk all cfs_rqs
Message-ID: <20170509161740.GD8609@htj.duckdns.org>

Currently, rq->leaf_cfs_rq_list is a traversal-ordered list of all
live cfs_rqs which have ever been active on the CPU; unfortunately,
this makes update_blocked_averages() O(total number of CPU cgroups),
which isn't scalable at all.

The next patch will make rq->leaf_cfs_rq_list only contain the
cfs_rqs which are currently active.  In preparation, this patch
converts users which need to traverse all cfs_rqs to use the
task_groups list instead.

The task_groups list is protected by its own lock and allows
RCU-protected traversal; the order of operations guarantees that all
online cfs_rqs will be visited, but holding rq->lock won't protect
against iterating an already unregistered cfs_rq.  However, the
operations of the two users that get converted -
update_runtime_enabled() and unthrottle_offline_cfs_rqs() - should be
safe to perform on already dead cfs_rqs, so adding RCU read
protection around them should be enough.

Note that print_cfs_stats() is not converted.  The next patch will
change its behavior to print out only active cfs_rqs, which is
intentional as there's not much point in printing out idle cfs_rqs.

v2: Dropped strong synchronization around removal and left
    print_cfs_stats() unchanged as suggested by Peterz.

Signed-off-by: Tejun Heo
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Turner
Cc: Chris Mason
Cc: stable@vger.kernel.org
---
Peter,

Updated the patch - pasted in your version and updated the
description accordingly.  Please feel free to use this one or
whichever you like.  Based on mainline, and stable is cc'd.

Thanks.
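As a side note, both conversions boil down to the same traversal
shape.  Here is an illustrative sketch (following the shape of
unthrottle_offline_cfs_rqs(); the walk_cfs_rqs_sketch() wrapper name
is made up and is not part of the patch) of how each per-CPU cfs_rq
is reached through the task_groups list under rcu_read_lock() while
rq->lock is held:

/* Illustrative sketch in the context of kernel/sched/fair.c, not a hunk. */
static void walk_cfs_rqs_sketch(struct rq *rq)
{
	struct task_group *tg;

	/* both converted functions run with rq->lock held */
	lockdep_assert_held(&rq->lock);

	/*
	 * The task_groups list may be modified concurrently; RCU keeps
	 * the walk itself safe, and the per-cfs_rq work done inside
	 * must be safe on already dead cfs_rqs.
	 */
	rcu_read_lock();
	list_for_each_entry_rcu(tg, &task_groups, list) {
		/* each task_group carries one cfs_rq per CPU */
		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];

		if (!cfs_rq->runtime_enabled)
			continue;

		/* per-cfs_rq work (e.g. unthrottling) goes here */
	}
	rcu_read_unlock();
}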
 kernel/sched/fair.c |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4644,22 +4644,32 @@ static void destroy_cfs_bandwidth(struct
 
 static void __maybe_unused update_runtime_enabled(struct rq *rq)
 {
-	struct cfs_rq *cfs_rq;
+	struct task_group *tg;
 
-	for_each_leaf_cfs_rq(rq, cfs_rq) {
-		struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
+	lockdep_assert_held(&rq->lock);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(tg, &task_groups, list) {
+		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
 
 		raw_spin_lock(&cfs_b->lock);
 		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
 		raw_spin_unlock(&cfs_b->lock);
 	}
+	rcu_read_unlock();
 }
 
 static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 {
-	struct cfs_rq *cfs_rq;
+	struct task_group *tg;
+
+	lockdep_assert_held(&rq->lock);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(tg, &task_groups, list) {
+		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
 
-	for_each_leaf_cfs_rq(rq, cfs_rq) {
 		if (!cfs_rq->runtime_enabled)
 			continue;
 
@@ -4677,6 +4687,7 @@ static void __maybe_unused unthrottle_of
 		if (cfs_rq_throttled(cfs_rq))
 			unthrottle_cfs_rq(cfs_rq);
 	}
+	rcu_read_unlock();
 }
 
 #else /* CONFIG_CFS_BANDWIDTH */