Date: Fri, 27 Mar 2015 16:37:37 +0000
From: Morten Rasmussen
To: Sai Gurrappadi
Cc: "peterz@infradead.org", "mingo@redhat.com", "vincent.guittot@linaro.org",
    Dietmar Eggemann, "yuyang.du@intel.com", "preeti@linux.vnet.ibm.com",
    "mturquette@linaro.org", "nico@linaro.org", "rjw@rjwysocki.net",
    Juri Lelli, "linux-kernel@vger.kernel.org", Peter Boonstoppel
Subject: Re: [RFCv3 PATCH 33/48] sched: Energy-aware wake-up task placement
Message-ID: <20150327163737.GQ18994@e105550-lin.cambridge.arm.com>
In-Reply-To: <5509DCFF.7080407@nvidia.com>

On Wed, Mar 18, 2015 at 08:15:59PM +0000, Sai Gurrappadi wrote:
> On 03/16/2015 07:47 AM, Morten Rasmussen wrote:
> > Again you are right. We could make the + task_utilization(p) conditional
> > on i != task_cpu(p). One argument against doing that is that in
> > select_task_rq_fair() task_utilization(p) hasn't been decayed yet while
> > its blocked load on the previous cpu (rq) has.
> > If the task has been gone for a long time, its blocked contribution may
> > have decayed to zero and therefore be a poor estimate of the utilization
> > increase caused by putting the task back on the previous cpu.
> > Particularly if we still use the non-decayed task_utilization(p) to
> > estimate the utilization increase on other cpus (!task_cpu(p)). In the
> > interest of responsiveness and not trying to squeeze tasks back onto the
> > previous cpu which might soon run out of capacity when utilization
> > increases we could leave it as a sort of performance bias.
> >
> > In any case it deserves a comment in the code I think.
>
> I think it makes sense to use the non-decayed value of the task's
> contrib. on wake but I am not sure if we should do this 2x accounting
> all the time.

Ideally we would find a way to remove the blocked load contribution and
only use the non-decayed value. I'll have a look and see if I can do
better.

> Another slightly related issue is that NOHZ could cause blocked rq sums
> to remain stale for long periods if there aren't frequent enough
> idle/nohz-idle-balances. This would cause the above bit and
> energy_diff() to compute incorrect values.

I have looked into load tracking behaviour when cpus are in nohz idle.
It is not easy to fix properly. You either need to put the burden of
updating the blocked load of the nohz-idle cpu on one of the non-idle
cpus, thereby spending precious cycles on busy cpus, or make sure to
kick a nohz-idle cpu to do the updates on a regular basis.

I am experimenting a bit with a third option: 'pre-decay' the blocked
load/usage when a cpu enters nohz-idle, based on the predicted length of
the nohz-idle period. When the cpu exits nohz-idle, I swap the
non-decayed blocked load back in so it gets decayed properly, as if no
pre-decay had happened. If some other cpu running nohz_idle_balance()
decides to update the blocked load, the original is swapped back in as
well.
It isn't bulletproof, as nohz_idle_balance() updates from other cpus
ruin the pre-decay, and the prediction used for the pre-decay might be
wrong. So I'm not really convinced it is the right way to go. Any better
ideas?

NOHZ full (tickless busy) is a nightmare for accurate load-tracking that
I don't want to face right now.

Morten