Date: Mon, 9 Apr 2018 14:45:11 +0100
From: Quentin Perret
To: Peter Zijlstra
Cc: Dietmar Eggemann, linux-kernel@vger.kernel.org, Thara Gopinath,
 linux-pm@vger.kernel.org, Morten Rasmussen, Chris Redpath,
 Patrick Bellasi, Valentin Schneider, "Rafael J . Wysocki",
 Greg Kroah-Hartman, Vincent Guittot, Viresh Kumar, Todd Kjos,
 Joel Fernandes
Subject: Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
Message-ID: <20180409134510.GA4577@e108498-lin.cambridge.arm.com>
References: <20180320094312.24081-1-dietmar.eggemann@arm.com>
 <20180320094312.24081-3-dietmar.eggemann@arm.com>
 <20180409120111.GA4043@hirez.programming.kicks-ass.net>
In-Reply-To: <20180409120111.GA4043@hirez.programming.kicks-ass.net>

On Monday 09 Apr 2018 at 14:01:11 (+0200), Peter Zijlstra wrote:
> On Tue, Mar 20, 2018 at 09:43:08AM +0000, Dietmar Eggemann wrote:
> > From: Quentin Perret
> >
> > The energy consumption of each CPU in the system is modeled with a list
> > of values representing its dissipated power and compute capacity at each
> > available Operating Performance Point (OPP). These values are derived
> > from existing information in the kernel (currently used by the thermal
> > subsystem) and don't require the introduction of new platform-specific
> > tunables. The energy model is also provided with a simple representation
> > of all frequency domains as cpumasks, hence enabling the scheduler to be
> > aware of dependencies between CPUs. The data required to build the energy
> > model is provided by the OPP library which enables an abstract view of
> > the platform from the scheduler. The new data structures holding these
> > models and the routines to populate them are stored in
> > kernel/sched/energy.c.
> >
> > For the sake of simplicity, it is assumed in the energy model that all
> > CPUs in a frequency domain share the same micro-architecture.
> > As long as this assumption is correct, the energy models of different
> > CPUs belonging to the same frequency domain are equal. Hence, this
> > commit builds only one energy model per frequency domain, and links all
> > relevant CPUs to it in order to save time and memory. If needed for
> > future hardware platforms, relaxing this assumption should imply
> > relatively simple modifications in the code but a significantly higher
> > algorithmic complexity.
>
> What this doesn't mention is why this isn't part of the regular topology
> bits. IIRC this is because the frequency domains don't necessarily need
> to align with the existing topology, but this completely fails to state
> any of that.

Yes, that's the main reason. Frequency domains and scheduling domains don't
necessarily align. They used to align on big.LITTLE platforms, but that is
no longer the case with DynamIQ ...

> Also, since I'm not at all familiar with DT and the OPP library stuff,
> this code is completely unreadable to me and there isn't a nice comment
> to help me along.

Right, so I can definitely fix that. Comments in the code and a better
commit message should hopefully help. Also, it has already been suggested
that a documentation file should be added alongside the code for this
patchset, so I'll make sure we add that for the next version. In the
meantime, here is a (hopefully) better explanation below.

In this specific patch, we are basically trying to figure out the
boundaries of frequency domains, and the power consumed by each CPU at each
OPP, to make them available to the scheduler. The important thing here is
that, in both cases, we rely on the OPP library to keep the code as
platform-agnostic as possible.

In the case of the frequency domains for example, the cpufreq driver is in
charge of specifying the CPUs that are sharing frequencies. That
information can come from DT, or SCPI, or SCMI, or whatever -- we probably
shouldn't have to care about that from the scheduler's standpoint.
That's why using dev_pm_opp_get_sharing_cpus() is handy: the OPP library
gives us the digested information we need.

The power values (dev_pm_opp_get_power) we use right now are those already
used by the thermal subsystem (IPA), which means we don't have to introduce
any new DT binding whatsoever. In the near future, the power values could
also come from other sources (SCMI, for example), and again it's probably
not the scheduler's job to care about those things, so the OPP library is
helping us again. As mentioned in the notes, as of today, this approach
depends on other patches relating to these things which are already on the
list [1].

The rest of the code in this patch is just about iterating over the
CPUs/freq. domains/OPPs. The algorithm is more or less the following:

 1. find a frequency domain which hasn't been visited yet;
 2. estimate the power and capacity of a CPU in this freq domain at each
    possible OPP;
 3. map all CPUs in the freq domain to this list of tuples;
 4. go to 1.

I hope that makes sense.

Thanks,
Quentin

[1] https://marc.info/?l=linux-pm&m=151635516419249&w=2
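
For illustration only, the four-step loop listed above can be modelled in
userspace roughly as follows. This is a Python sketch, not the kernel code
from the patch. The callbacks sharing_cpus(), opps() and max_capacity() are
hypothetical stand-ins for what the OPP library and the arch code would
provide. The linear scaling of capacity with frequency and all the platform
numbers are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class CapacityState:
    """One (capacity, power) tuple describing a CPU at a single OPP."""
    capacity: int  # compute capacity at this OPP (scheduler units)
    power: int     # power dissipated at this OPP (e.g. mW)


def build_energy_models(
    cpus: Iterable[int],
    sharing_cpus: Callable[[int], FrozenSet[int]],   # hypothetical callback
    opps: Callable[[int], List[Tuple[int, int]]],    # hypothetical: (kHz, mW)
    max_capacity: Callable[[int], int],              # hypothetical callback
) -> Dict[int, List[CapacityState]]:
    models: Dict[int, List[CapacityState]] = {}
    for cpu in cpus:
        # Step 1: skip CPUs whose frequency domain was already visited.
        if cpu in models:
            continue
        domain = sharing_cpus(cpu)

        # Step 2: estimate capacity and power at each OPP. Capacity is
        # assumed here to scale linearly with frequency.
        table = sorted(opps(cpu))
        max_freq = table[-1][0]
        states = [
            CapacityState(max_capacity(cpu) * freq // max_freq, power)
            for freq, power in table
        ]

        # Step 3: map every CPU of the domain to the same list of tuples.
        for sibling in domain:
            models[sibling] = states
        # Step 4: the loop moves on to the next unvisited domain.
    return models


# Made-up platform: CPUs 0-1 and CPUs 2-3 form two frequency domains.
DOMAINS = [frozenset({0, 1}), frozenset({2, 3})]
OPPS = {
    0: [(500000, 50), (1000000, 150)],
    2: [(1000000, 300), (2000000, 1000)],
}
MAX_CAP = {0: 512, 1: 512, 2: 1024, 3: 1024}


def domain_of(cpu: int) -> FrozenSet[int]:
    return next(d for d in DOMAINS if cpu in d)


models = build_energy_models(
    cpus=range(4),
    sharing_cpus=domain_of,
    opps=lambda cpu: OPPS[min(domain_of(cpu))],
    max_capacity=lambda cpu: MAX_CAP[cpu],
)
```

All CPUs of a domain end up pointing at one shared list, which mirrors the
"one energy model per frequency domain" choice in the commit message. It
only holds as long as the CPUs of a domain share the same
micro-architecture.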