From: Vincent Guittot
Date: Wed, 3 May 2017 12:37:37 +0200
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
To: Peter Zijlstra
Cc: Tejun Heo, Ingo Molnar, linux-kernel, Linus Torvalds,
	Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
In-Reply-To: <20170503093723.tnt53ppb23tn6buz@hirez.programming.kicks-ass.net>
References: <20170424201344.GA14169@wtj.duckdns.org>
	<20170424201444.GC14169@wtj.duckdns.org>
	<20170426225202.GC11348@wtj.duckdns.org>
	<20170428203347.GC19364@htj.duckdns.org>
	<20170502215054.GC5335@htj.duckdns.org>
	<20170503093723.tnt53ppb23tn6buz@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 3 May 2017 at 11:37, Peter Zijlstra wrote:
>
> On Wed, May 03, 2017 at 09:34:51AM +0200, Vincent Guittot wrote:
>
> > We use load_avg for calculating a stable share and we want to use it
> > more and more. So breaking it because it's easier doesn't seem to be
> > the right way to do it IMHO.
>
> So afaict we calculate group se->load.weight (aka shares, see
> calc_cfs_shares()), using cfs_rq->avg.load_avg, which feeds into
> tg->load_avg through cfs_rq->tg_load_avg_contrib and
> cfs_rq->load.weight.
>
> And cfs_rq->avg.load_avg is computed from cfs_rq->load.weight, which
> is \Sum se->load.weight.
>
> OTOH group se->avg.load_avg isn't used much, which is TJ's point.
>
> The only cases where group se->avg.load_avg is relevant to
> cfs_rq->avg.load are the cases I listed yesterday: group creation and
> group destruction. There we use the group se->avg.load_avg to manage
> the boundary conditions.
>
> So with the proposed change to se->avg.load_avg we can have some
> (temporary) boundary effect when you destroy a lot of (previously
> active) cgroups.
>
> Of course, it could be I overlooked something, in which case, please
> tell :-)

That's mainly based on the regression I see on my platform. I haven't
found the root cause of the regression yet, but it's there, which means
that using the group entity's load_avg to propagate the child cfs_rq's
runnable_load_avg breaks something.

> That said, I agree it would be nice to entirely get rid of
> runnable_load_avg, but that is a much larger change and would require
> a lot more work. I don't immediately see why we can't fix the thing
> now and then work on removing runnable_load_avg later.

What Tejun proposes is to change the group's load_avg so that it
follows the child cfs_rq's runnable_load_avg instead of the child
cfs_rq's load_avg. That will make it difficult, if not impossible, to
move load balancing onto load_avg and remove runnable_load_avg later
on, because load_avg will no longer behave as expected. So keeping the
group's load_avg working correctly seems to be a key point.

Then, we know that we still have wrong behavior with runnable_load_avg
when the running tasks' loads are really different, so this only fixes
part of a wider problem IMO.

> Of course, we should not regress either, I'll go read up on that part.