From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755295AbcDAW2x (ORCPT ); Fri, 1 Apr 2016 18:28:53 -0400 Received: from mail-pf0-f172.google.com ([209.85.192.172]:34899 "EHLO mail-pf0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755228AbcDAW2v (ORCPT ); Fri, 1 Apr 2016 18:28:51 -0400 Subject: Re: [PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task migration To: Peter Zijlstra , Leo Yan References: <1459528717-17339-1-git-send-email-leo.yan@linaro.org> <20160401194948.GN3448@twins.programming.kicks-ass.net> Cc: Ingo Molnar , Morten Rasmussen , Dietmar Eggemann , Vincent Guittot , linux-kernel@vger.kernel.org, eas-dev@lists.linaro.org From: Steve Muckle X-Enigmail-Draft-Status: N1110 Message-ID: <56FEF621.3070404@linaro.org> Date: Fri, 1 Apr 2016 15:28:49 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 In-Reply-To: <20160401194948.GN3448@twins.programming.kicks-ass.net> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/01/2016 12:49 PM, Peter Zijlstra wrote: > On Sat, Apr 02, 2016 at 12:38:37AM +0800, Leo Yan wrote: >> When task is migrated from CPU_A to CPU_B, scheduler will decrease >> the task's load/util from the task's cfs_rq and also add them into >> migrated cfs_rq. But if kernel enables CONFIG_FAIR_GROUP_SCHED then this >> cfs_rq is not the same one with cpu's cfs_rq. As a result, after task is >> migrated to CPU_B, then CPU_A still have task's stale value for >> load/util; on the other hand CPU_B also cannot reflect new load/util >> which introduced by the task. >> >> So this patch is to operate the task's load/util to cpu's cfs_rq, so >> finally cpu's cfs_rq can really reflect task's migration. > > Sorry but that is unintelligible. > > What is the problem? Why do we care? How did you fix it? and at what > cost? I think I follow - Leo please correct me if I mangle your intentions. It's an issue that Morten and Dietmar had mentioned to me as well. Assume CONFIG_FAIR_GROUP_SCHED is enabled and a task is running in a task group other than the root. The task migrates from one CPU to another. The cfs_rq.avg structures on the src and dest CPUs corresponding to the group containing the task are updated immediately via remove_entity_load_avg()/update_cfs_rq_load_avg() and attach_entity_load_avg(). But the cfs_rq.avg corresponding to the root on the src and dest are not immediately updated. The root cfs_rq.avg must catch up over time with PELT. As to why we care, there's at least one issue which may or may not be Leo's - the root cfs_rq is the one whose avg structure we read from to determine the frequency with schedutil. If you have a cpu-bound task in a non-root cgroup which periodically migrates among CPUs on an otherwise idle system, I believe each time it migrates the frequency would drop to fmin and have to ramp up again with PELT. Leo I noticed you did not modify detach_entity_load_average(). I think this would be needed to avoid the task's stats being double counted for a while after switched_from_fair() or task_move_group_fair().