Date: Mon, 11 Jul 2016 13:32:28 +0100
From: Morten Rasmussen
To: Peter Zijlstra
Cc: mingo@redhat.com, dietmar.eggemann@arm.com, yuyang.du@intel.com,
	vincent.guittot@linaro.org, mgalbraith@suse.de,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 07/13] sched/fair: Let asymmetric cpu configurations balance at wake-up
Message-ID: <20160711123227.GI12540@e105550-lin.cambridge.arm.com>
References: <1466615004-3503-1-git-send-email-morten.rasmussen@arm.com>
 <1466615004-3503-8-git-send-email-morten.rasmussen@arm.com>
 <20160711111344.GO30909@twins.programming.kicks-ass.net>
In-Reply-To: <20160711111344.GO30909@twins.programming.kicks-ass.net>

On Mon, Jul 11, 2016 at 01:13:44PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 22, 2016 at 06:03:18PM +0100, Morten Rasmussen wrote:
> > Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
> > SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
> > configurations SD_WAKE_AFFINE is only desirable if the waking task's
> > compute demand (utilization) is suitable for all the cpu capacities
> > available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup
> > balancing take over (find_idlest_{group, cpu}()).
>
> I think I tripped over this one the last time around, and I'm not sure
> this Changelog is any clearer.
>
> This is about the case where the waking cpu and prev_cpu are both in the
> 'wrong' cluster, right?
Almost :-) It is the cases where the waking cpu _or_ prev_cpu doesn't
'fit' the task (is in the 'wrong' cluster). select_idle_sibling() doesn't
consider capacity, so we can't let it choose between the waking cpu and
prev_cpu if one of them isn't a good choice. Bringing in the table from
the cover letter:

tu   = Task utilization big/little
pcpu = Previous cpu big/little
tcpu = This (waker) cpu big/little
dl   = New cpu is little
db   = New cpu is big
sis  = New cpu chosen by select_idle_sibling()
figc = New cpu chosen by find_idlest_*()
ww   = wake_wide(task) count for figc wakeups
bw   = sd_flag & SD_BALANCE_WAKE (non-fork/exec wake) for figc wakeups

case  tu  pcpu  tcpu    dl    db   sis  figc    ww    bw
1     l   l     l      122    68    28   162   161   161
2     l   l     b       11     4     0    15    15    15
3     l   b     l        0   252     8   244   244   244
4     l   b     b       36  1928   711  1253  1016  1016
5     b   l     l        5    19     0    24    22    24
6     b   l     b        5     1     0     6     0     6
7     b   b     l        0    31     0    31    31    31
8     b   b     b        1   194   109    86    59    59
--------------------------------------------------------
                       180  2497   856  1821

In cases 5, 6, and 7 prev_cpu or the waking cpu is little and the task
only fits on big. This is why we have zeros in the 'sis' column for those
three cases. In all other cases we don't care, from a capacity point of
view, whether select_idle_sibling() chooses prev_cpu or the waking cpu.

I will try to rewrite the commit message to make this clearer.

> > This patch makes affine wake-ups conditional on whether both the waker
> > cpu and prev_cpu has sufficient capacity for the waking task, or not.
> >
> > It is assumed that the sched_group(s) containing the waker cpu and
> > prev_cpu only contain cpu with the same capacity (homogeneous).
> >
> > Ideally, we shouldn't set 'want_affine' in the first place, but we don't
> > know if SD_BALANCE_WAKE is enabled on the sched_domain(s) until we start
> > traversing them.
>
> Is this again more fallout from that weird ASYM_CAP thing?

I think this is a left-over comment from v1 that shouldn't have survived
into v2. The flag game was more complicated in v1 as it required
disabling SD_WAKE_AFFINE.
That is all gone now thanks to Vincent's suggestions in the v1 review.

> > +static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
> > +{
> > +	long min_cap, max_cap;
> > +
> > +	min_cap = min(capacity_orig_of(prev_cpu), capacity_orig_of(cpu));
> > +	max_cap = cpu_rq(cpu)->rd->max_cpu_capacity;
> > +
> > +	/* Minimum capacity is close to max, no need to abort wake_affine */
> > +	if (max_cap - min_cap < max_cap >> 3)
> > +		return 0;
> > +
> > +	return min_cap * 1024 < task_util(p) * capacity_margin;
> > +}
>
> I'm most puzzled by these inequalities, how, why ?
>
> I would figure you'd compare task_util to the current remaining util of
> the small group, and if it fits, place it there. This seems to do
> something entirely different.

Right, I only compare task utilization to max capacity here and
completely ignore whether any of those cpus are already fully or
partially utilized. Available (remaining) capacity is considered in
find_idlest_group() when that route is taken. wake_cap() is meant as a
first check to see if we should worry about cpu types (max capacities)
at all: if not, go select_idle_sibling(); if we should, go
find_idlest_{group,cpu}() and look more into the details.

Finding the best target cpu quickly becomes messy. Comparing group
utilization to group capacity can be very misleading. The ARM Juno
platform has four little cpus and two big cpus, so the cluster group
capacities are roughly comparable. We have to iterate over all the cpus
to find the one with most spare capacity to find the best fit. So I
thought that would fit better with the existing code in
find_idlest_*().