From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756260AbbCXRND (ORCPT );
	Tue, 24 Mar 2015 13:13:03 -0400
Received: from foss.arm.com ([217.140.101.70]:48934 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754989AbbCXRM7 (ORCPT );
	Tue, 24 Mar 2015 13:12:59 -0400
Date: Tue, 24 Mar 2015 17:13:31 +0000
From: Morten Rasmussen
To: Peter Zijlstra
Cc: "mingo@redhat.com", "vincent.guittot@linaro.org",
	Dietmar Eggemann, "yuyang.du@intel.com",
	"preeti@linux.vnet.ibm.com", "mturquette@linaro.org",
	"nico@linaro.org", "rjw@rjwysocki.net", Juri Lelli,
	"linux-kernel@vger.kernel.org"
Subject: Re: [RFCv3 PATCH 36/48] sched: Count number of shallower
	idle-states in struct sched_group_energy
Message-ID: <20150324171331.GH18994@e105550-lin.cambridge.arm.com>
References: <1423074685-6336-1-git-send-email-morten.rasmussen@arm.com>
	<1423074685-6336-37-git-send-email-morten.rasmussen@arm.com>
	<20150324131439.GO23123@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150324131439.GO23123@twins.programming.kicks-ass.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 24, 2015 at 01:14:39PM +0000, Peter Zijlstra wrote:
> On Wed, Feb 04, 2015 at 06:31:13PM +0000, Morten Rasmussen wrote:
> > cpuidle associates all idle-states with each cpu while the energy model
> > associates them with the sched_group covering the cpus coordinating
> > entry to the idle-state. To get idle-state power consumption it is
> > therefore necessary to translate from cpuidle idle-state index to energy
> > model index. For this purpose it is helpful to know how many idle-states
> > that are listed in lower level sched_groups (in struct
> > sched_group_energy).
>
> I think this could use some text to describe how that number is useful.
>
> I suspect is has something to do with bigger domains having more idle
> modes (package C states etc..).

Close :)

You are not the first to be confused about the idle state representation
and numbering. Maybe I should just change it.

If we take typical ARM idle-states as an example, we have both per-cpu
and per-cluster idle-states. Unlike x86 (IIUC), cluster states are
controlled by cpuidle. All states are represented in the cpuidle state
table for each cpu regardless of whether it is a per-cpu or per-cluster
state. For the energy model we have organized them by attaching the
states to the cpumask representing the cpus that need to coordinate to
enter the state, as this is rather important to know when estimating
energy consumption.

Idle-state              cpuidle    Energy model table indices
                        index      per-cpu sg    per-cluster sg
WFI                     0          0
Core power-down         1          1
Cluster power-down      2                        0

Cluster power-down is the first (and only in this example) per-cluster
idle-state and is therefore put in the idle-state table for the
sched_group spanning the whole cluster. Since it is first it has index
0. However, the same state has index 2 in cpuidle as it only has a table
per cpu. To do an easy translation from cpuidle index to energy model
idle-state table index it is therefore quite useful to know how many
states are in the energy model tables attached to groups at lower
levels. Basically, energy_model_idx = cpuidle_idx - state_below, which
is 2 - 2 = 0 for cluster power-down.

An alternative that could avoid this translation is to have a full table
at each level (3 entries for this example) and insert dummy values on
indices not applicable to the group the table is attached to. For
example insert '0' on index=2 for the per-cpu sg energy model data.

We can't avoid index translation entirely though.
To estimate energy consumption we need to know the cluster power
consumption when all cpus are in state 0 or 1 but the cluster itself is
still up, i.e. idle yet still active. The energy model therefore has an
additional 'active idle' idle state for the cluster which sits before
the first true idle-state in the energy model idle-state table. In the
example above, active idle would be per-cluster sg energy model table
index 0 and cluster power-down index 1.
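
To make the translation concrete, here is a minimal user-space sketch in
C. It is not the code from the patch: the struct, its field names
(states_below, has_active_idle, nr_idle_states) and the helper
em_idle_idx() are all hypothetical and only model the three-state
example above, with and without the 'active idle' entry.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the per-group energy data.
 * None of these names are taken from the patch.
 */
struct sg_energy_sketch {
	int states_below;	/* idle-states listed in lower level groups */
	int has_active_idle;	/* 1 if the table starts with 'active idle' */
	int nr_idle_states;	/* true idle-states owned by this group */
};

/*
 * Translate a cpuidle state index into an index in this group's energy
 * model idle-state table. Returns -1 if the state belongs to a group at
 * another level.
 */
int em_idle_idx(const struct sg_energy_sketch *sge, int cpuidle_idx)
{
	int idx;

	assert(sge != NULL && cpuidle_idx >= 0);

	/* energy_model_idx = cpuidle_idx - state_below */
	idx = cpuidle_idx - sge->states_below;
	if (idx < 0 || idx >= sge->nr_idle_states)
		return -1;

	/* Skip over the 'active idle' entry when the table has one. */
	return idx + sge->has_active_idle;
}

/* Per-cpu group: owns WFI and core power-down. */
const struct sg_energy_sketch cpu_sg = {
	.states_below = 0, .has_active_idle = 0, .nr_idle_states = 2,
};

/* Per-cluster group as in the table above (no 'active idle'). */
const struct sg_energy_sketch cluster_sg = {
	.states_below = 2, .has_active_idle = 0, .nr_idle_states = 1,
};

/* Per-cluster group with 'active idle' at table index 0. */
const struct sg_energy_sketch cluster_sg_active_idle = {
	.states_below = 2, .has_active_idle = 1, .nr_idle_states = 1,
};
```

With these example groups, cluster power-down (cpuidle index 2) maps to
index 0 in cluster_sg and to index 1 in cluster_sg_active_idle, while
the per-cpu group rejects it with -1. Whether the real code should
reject or clamp out-of-range states is a design choice this sketch does
not try to settle.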