From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751450AbdEATLS (ORCPT ); Mon, 1 May 2017 15:11:18 -0400
Received: from mail-yw0-f195.google.com ([209.85.161.195]:34624 "EHLO
	mail-yw0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750778AbdEATLP (ORCPT ); Mon, 1 May 2017 15:11:15 -0400
Date: Mon, 1 May 2017 15:11:13 -0400
From: Tejun Heo
To: Peter Zijlstra
Cc: Ingo Molnar, "linux-kernel@vger.kernel.org", Linus Torvalds,
	Mike Galbraith, Paul Turner, Chris Mason, "kernel-team@fb.com"
Subject: Re: [2/2] sched/fair: Fix O(# total cgroups) in load balance path
Message-ID: <20170501191113.GD8921@htj.duckdns.org>
References: <20170426004039.GA3222@wtj.duckdns.org>
	<20170426004350.GB3222@wtj.duckdns.org>
	<20170501161158.tgw3rko72aziygjx@hirez.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170501161158.tgw3rko72aziygjx@hirez.programming.kicks-ass.net>
User-Agent: Mutt/1.8.0 (2017-02-23)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Peter.

On Mon, May 01, 2017 at 06:11:58PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 25, 2017 at 05:43:50PM -0700, Tejun Heo wrote:
> > @@ -7007,6 +7008,14 @@ static void update_blocked_averages(int
> >  		se = cfs_rq->tg->se[cpu];
> >  		if (se && !skip_blocked_update(se))
> >  			update_load_avg(se, 0);
> > +
> > +		/*
> > +		 * There can be a lot of idle CPU cgroups.  Don't let fully
> > +		 * decayed cfs_rqs linger on the list.
> > +		 */
> > +		if (!cfs_rq->load.weight && !cfs_rq->avg.load_sum &&
> > +		    !cfs_rq->avg.util_sum && !cfs_rq->runnable_load_sum)
> > +			list_del_leaf_cfs_rq(cfs_rq);
> >  	}
> >  	rq_unlock_irqrestore(rq, &rf);
> >  }
>
> Right this is a 'known' issue and we recently talked about this.
>
> I think you got the condition right, we want to wait for all the stuff
> to be decayed out before taking it off the list.
>
> The only 'problem', which Vincent mentioned in that other thread, is that
> NOHZ idle doesn't guarantee decay -- then again, you don't want to go
> wake a CPU just to decay this crud either. And if we're idle, the list
> being long doesn't matter either.

The list staying long is fine as long as nobody walks it; however, the
list can be *really* long, e.g. hundreds of thousands long, so walking
it repeatedly won't be a good idea even if the system is idle.  As
long as NOHZ decays and trims the list when it ends up walking the
list, and AFAICS it does, it should be fine.

Thanks.

-- 
tejun
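
For illustration only, the "decay, then trim while walking" behaviour
discussed in this message can be sketched as a small userspace C model.
This is a toy under stated assumptions, not the kernel code: struct
toy_cfs_rq, toy_decay(), toy_fully_decayed() and toy_update_blocked()
are hypothetical names, the list is a plain singly linked list rather
than the kernel's leaf cfs_rq list, and halving the sums is a crude
stand-in for the real load-tracking decay. Only the removal condition
mirrors the four fields tested in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cfs_rq; simplified fields, not kernel types. */
struct toy_cfs_rq {
	unsigned long weight;            /* models cfs_rq->load.weight */
	unsigned long load_sum;          /* models cfs_rq->avg.load_sum */
	unsigned long util_sum;          /* models cfs_rq->avg.util_sum */
	unsigned long runnable_load_sum; /* models cfs_rq->runnable_load_sum */
	struct toy_cfs_rq *next;         /* toy list linkage (assumption) */
};

/* Same shape as the condition in the quoted hunk: every field must be zero. */
static bool toy_fully_decayed(const struct toy_cfs_rq *rq)
{
	return !rq->weight && !rq->load_sum &&
	       !rq->util_sum && !rq->runnable_load_sum;
}

/* Assumption: halve each sum per walk as a crude model of decay. */
static void toy_decay(struct toy_cfs_rq *rq)
{
	rq->load_sum /= 2;
	rq->util_sum /= 2;
	rq->runnable_load_sum /= 2;
}

/*
 * Walk the list once, decay every entry, and unlink the ones that have
 * fully decayed. Returns the number of entries left on the list, so a
 * later walk only pays for entries that still carry load.
 */
static size_t toy_update_blocked(struct toy_cfs_rq **head)
{
	struct toy_cfs_rq **link = head;
	size_t remaining = 0;

	assert(head != NULL);

	while (*link) {
		struct toy_cfs_rq *rq = *link;

		toy_decay(rq);
		if (toy_fully_decayed(rq)) {
			*link = rq->next;   /* unlink; do not advance link */
			rq->next = NULL;
		} else {
			remaining++;
			link = &rq->next;
		}
	}
	return remaining;
}
```

In this model an entry with nonzero weight never leaves the list, and an
idle entry leaves it on the first walk after its sums reach zero, so the
cost of each walk tracks the number of entries that still carry load
rather than the total number of groups.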