Date: Thu, 4 May 2017 15:31:22 +0200
From: Peter Zijlstra
To: Tejun Heo
Cc: Ingo Molnar, linux-kernel@vger.kernel.org, Linus Torvalds,
	Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
Subject: Re: [PATCH 1/2] sched/fair: Use task_groups instead of leaf_cfs_rq_list to walk all cfs_rqs
Message-ID: <20170504133122.a6qjlj3hlblbjxux@hirez.programming.kicks-ass.net>
References: <20170426004039.GA3222@wtj.duckdns.org>
In-Reply-To: <20170426004039.GA3222@wtj.duckdns.org>

On Tue, Apr 25, 2017 at 05:40:39PM -0700, Tejun Heo wrote:
> task_groups list is protected by its own lock. While it allows RCU
> protected traversal and the order of operations guarantees that all
> online cfs_rqs will be visited, holding rq->lock won't protect against
> iterating an already unregistered cfs_rq.

So I think the below is sufficient. Yes, we can hit an (almost) dead
cfs_rq, but poking its bandwidth variables is harmless.
unthrottle_cfs_rq() also stops doing anything much when it finds the
cfs_rq empty, which must be the case if we're removing it.

I don't know Paul's opinion on RCU GPs happening while stop_machine(),
but just in case he feels that's fair game, I did add the
rcu_read_lock() thingies. The lockdep assert is mostly documentation,
to more easily see it is indeed held when we get there.

I left print_cfs_stats() using the leaf list; there is no point in
printing stuff that's empty.

This way we can avoid taking all rq locks on cgroup destruction.
---
 kernel/sched/fair.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd6c3f957d44..c897ad6d714c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4644,22 +4644,32 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
 
 static void __maybe_unused update_runtime_enabled(struct rq *rq)
 {
-	struct cfs_rq *cfs_rq;
+	struct task_group *tg;
 
-	for_each_leaf_cfs_rq(rq, cfs_rq) {
-		struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
+	lockdep_assert_held(&rq->lock);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(tg, &task_groups, list) {
+		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
 
 		raw_spin_lock(&cfs_b->lock);
 		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
 		raw_spin_unlock(&cfs_b->lock);
 	}
+	rcu_read_unlock();
 }
 
 static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 {
-	struct cfs_rq *cfs_rq;
+	struct task_group *tg;
+
+	lockdep_assert_held(&rq->lock);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(tg, &task_groups, list) {
+		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
 
-	for_each_leaf_cfs_rq(rq, cfs_rq) {
 		if (!cfs_rq->runtime_enabled)
 			continue;
 
@@ -4677,6 +4687,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 		if (cfs_rq_throttled(cfs_rq))
 			unthrottle_cfs_rq(cfs_rq);
 	}
+	rcu_read_unlock();
 }
 
 #else /* CONFIG_CFS_BANDWIDTH */
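
For readability, this is roughly how the two functions end up looking with
the above applied. It is a sketch reconstructed from the diff, not copied
from the tree; the unchanged lines between the two hunks of
unthrottle_offline_cfs_rqs() are elided since the diff does not show them.

/* Sketch only: reconstructed from the diff above. */
static void __maybe_unused update_runtime_enabled(struct rq *rq)
{
	struct task_group *tg;

	/* Caller holds this rq's lock; the assert is mostly documentation. */
	lockdep_assert_held(&rq->lock);

	/* Walk every task group under RCU instead of rq->leaf_cfs_rq_list... */
	rcu_read_lock();
	list_for_each_entry_rcu(tg, &task_groups, list) {
		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
		/* ...and pick the group's cfs_rq for this CPU directly. */
		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];

		raw_spin_lock(&cfs_b->lock);
		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
		raw_spin_unlock(&cfs_b->lock);
	}
	rcu_read_unlock();
}

static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
{
	struct task_group *tg;

	lockdep_assert_held(&rq->lock);

	rcu_read_lock();
	list_for_each_entry_rcu(tg, &task_groups, list) {
		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];

		/* A dead or empty cfs_rq is skipped or handled harmlessly. */
		if (!cfs_rq->runtime_enabled)
			continue;

		/* ... unchanged lines between the diff's two hunks elided ... */

		if (cfs_rq_throttled(cfs_rq))
			unthrottle_cfs_rq(cfs_rq);
	}
	rcu_read_unlock();
}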