Date: Wed, 8 Jun 2016 12:29:52 +0100
From: Morten Rasmussen
To: Peter Zijlstra
Cc: mingo@redhat.com, dietmar.eggemann@arm.com, yuyang.du@intel.com,
    vincent.guittot@linaro.org, mgalbraith@suse.de,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 09/16] sched/fair: Let asymmetric cpu configurations balance at wake-up
Message-ID: <20160608112951.GE9187@e105550-lin.cambridge.arm.com>
References: <1464001138-25063-1-git-send-email-morten.rasmussen@arm.com>
 <1464001138-25063-10-git-send-email-morten.rasmussen@arm.com>
 <20160602142105.GG28447@twins.programming.kicks-ass.net>
In-Reply-To: <20160602142105.GG28447@twins.programming.kicks-ass.net>

On Thu, Jun 02, 2016 at 04:21:05PM +0200, Peter Zijlstra wrote:
> On Mon, May 23, 2016 at 11:58:51AM +0100, Morten Rasmussen wrote:
> > Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
> > SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
> > configurations SD_WAKE_AFFINE is only desirable if the waking task's
> > compute demand (utilization) is suitable for the cpu capacities
> > available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup
> > balancing take over (find_idlest_{group, cpu}()).
> >
> > The assumption is that SD_WAKE_AFFINE is never set for a sched_domain
> > containing cpus with different capacities. This is enforced by a
> > previous patch based on the SD_ASYM_CPUCAPACITY flag.
> >
> > Ideally, we shouldn't set 'want_affine' in the first place, but we don't
> > know if SD_BALANCE_WAKE is enabled on the sched_domain(s) until we start
> > traversing them.
>
> I'm a bit confused...
>
> Lets assume a 2+2 big.little thing with shared LLC:
>
>   ---------- SD2 ----------
>   -- SD1 --      -- SD1 --
>   0       1      2       3
>
> SD1: WAKE_AFFINE, BALANCE_WAKE
> SD2: ASYM_CAPACITY, BALANCE_WAKE
>
> t0 used to run on cpu1, t1 used to run on cpu2
>
> cpu0 wakes t0:
>
>   want_affine = 1
>   SD1:
>     WAKE_AFFINE
>     cpumask_test_cpu(prev_cpu, sd_mask) == true
>     affine_sd = SD1
>     break;
>
>   affine_sd != NULL -> affine-wakeup
>
> cpu0 wakes t1:
>
>   want_affine = 1
>   SD1:
>     WAKE_AFFINE
>     cpumask_test_cpu(prev_cpu, sd_mask) == false
>   SD2:
>     BALANCE_WAKE
>     sd = SD2
>
>   affine_sd == NULL, sd == SD2 -> find_idlest_*()
>
> All without this patch...
>
> So what is this thing doing?

Not very much in those cases, but it makes one important difference in
one case. We could do fine without the patch if we could assume that all
tasks are already in the right SD according to their PELT utilization,
and that if not, they will be woken up by a cpu in the right SD (so we
do find_idlest_*()). But we can't :-(

Let's take your example above and add that t0 should really be running
on cpu2/3 due to its utilization, assuming SD1[01] are little cpus and
SD1[23] are big cpus. In that case we would still do affine-wakeup and
stick the task on cpu0 despite it being a little cpu. To avoid that,
this patch sets want_affine = 0 in that case, so we go via
find_idlest_*() to give the task a chance of being put on cpu2/3.

The patch also sets want_affine = 0 for other cases which already take
the find_idlest_*() route due to the cpumask test, as illustrated by
your example above.
We can have the following scenarios:

b = big cpu capacity/task util
l = little cpu capacity/task util
x = don't care

case    task util    prev_cpu    this_cpu    wakeup
---------------------------------------------------
1       b            b           b           affine (b)
2       b            b           l           slow (b)
3       b            l           b           slow (b)
4       b            l           l           slow (b)
5       l            b           b           affine (x)
6       l            b           l           slow (x)
7       l            l           b           slow (x)
8       l            l           l           affine (x)

Without the patch we would do affine-wakeup on little in case 4, where
we want to wake up on a big cpu. We only do affine-wakeup when both
this_cpu and prev_cpu have the same capacity and that capacity is
sufficient.

Vincent pointed out that this is overly restrictive, as it is perfectly
safe to do affine-wakeup in cases 6 and 7, where the waker and the
previous cpu have sufficient capacity but are not the same. If we made
wake_affine() consider cpu capacity, it should be possible to do
affine-wakeup even for cases 2 and 3, leaving only case 4 requiring the
find_idlest_*() route.

There are more cases for taking the slow wakeup path if you have more
than two cpu capacities to deal with, but I'm going to spare you the
full detailed table ;-)