Date: Mon, 23 Mar 2015 17:47:02 +0100
From: Peter Zijlstra
To: Morten Rasmussen
Cc: Sai Gurrappadi, "mingo@redhat.com", "vincent.guittot@linaro.org",
	Dietmar Eggemann, "yuyang.du@intel.com",
	"preeti@linux.vnet.ibm.com", "mturquette@linaro.org",
	"nico@linaro.org", "rjw@rjwysocki.net", Juri Lelli,
	"linux-kernel@vger.kernel.org", Peter Boonstoppel
Subject: Re: [RFCv3 PATCH 30/48] sched: Calculate energy consumption of sched_group
Message-ID: <20150323164702.GL23123@twins.programming.kicks-ass.net>
References: <1423074685-6336-1-git-send-email-morten.rasmussen@arm.com>
	<1423074685-6336-31-git-send-email-morten.rasmussen@arm.com>
	<55036AA1.7000801@nvidia.com>
	<20150316141546.GQ4081@e105550-lin.cambridge.arm.com>
In-Reply-To: <20150316141546.GQ4081@e105550-lin.cambridge.arm.com>

On Mon, Mar 16, 2015 at 02:15:46PM +0000, Morten Rasmussen wrote:
> You are absolutely right. The current code is broken for system
> topologies where all cpus share the same clock source. To be honest,
> it is actually worse than that, and you have already pointed out the
> reason: we have no way of representing top-level contributions to
> power consumption in RFCv3, since we don't have a sched_group
> spanning all cpus in a single-cluster system. For example, we can't
> represent L2 cache and interconnect power consumption on such
> systems.
>
> In RFCv2 we had a system-wide sched_group dangling by itself for
> that purpose. We chose to remove it in this rewrite because it led
> to messy code. In my opinion, a more elegant solution is to
> introduce an additional sched_domain above the current top level,
> with a single sched_group spanning all cpus in the system. That
> should fix the SD_SHARE_CAP_STATES problem and allow us to attach
> power data to the top level.

Maybe remind us why this needs to be tied to sched_groups? Why can't
we attach the energy information to the domains?

There is an additional problem with groups that you've not yet
discovered: overlapping groups. Certain NUMA topologies result in
these; there, the sum of cpus over a domain's groups is greater than
the number of cpus spanned by the domain itself.
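
Concretely, take four NUMA nodes in a line, 0-1-2-3. The 2-hop domain
of cpu 1 spans {0,1,2,3}, but its groups are built from the 1-hop
child spans, e.g. {0,1,2} and {2,3}: node 2 gets counted twice, so
summing group weights gives 5 against a domain span of 4. A quick
sketch of that accounting (sum_group_weights() is made up for
illustration; the rest are the kernel-internal helpers from
kernel/sched/sched.h as of this series):

static unsigned long sum_group_weights(struct sched_domain *sd)
{
	struct sched_group *sg = sd->groups;
	unsigned long sum = 0;

	/* A domain's groups form a circular list; walk it once. */
	do {
		sum += cpumask_weight(sched_group_cpus(sg));
		sg = sg->next;
	} while (sg != sd->groups);

	/*
	 * With SD_OVERLAP the group spans intersect, so this sum can
	 * exceed cpumask_weight(sched_domain_span(sd)); naively
	 * summing a per-group energy term over the groups would
	 * over-count.
	 */
	return sum;
}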
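
And for reference on the data itself: RFCv3 hangs roughly the
following off each sched_group (a sketch from memory of the RFC
patches; exact fields may differ). The question above is whether this
couldn't live in struct sched_domain instead:

struct capacity_state {
	unsigned long cap;	/* compute capacity at this P-state */
	unsigned long power;	/* busy power at this P-state */
};

struct idle_state {
	unsigned long power;	/* power consumed in this idle state */
};

struct sched_group_energy {
	unsigned int nr_idle_states;		/* number of idle states */
	struct idle_state *idle_states;		/* idle state array */
	unsigned int nr_cap_states;		/* number of capacity states */
	struct capacity_state *cap_states;	/* capacity state array */
};

A top-level group (or domain) spanning all cpus would then carry the
L2/interconnect contributions mentioned above.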