Date: Fri, 10 Mar 2017 15:53:32 +0100 (CET)
From: Thomas Gleixner
To: David Carrillo-Cisneros
Cc: Stephane Eranian, "Luck, Tony", Vikas Shivappa, "Shivappa, Vikas",
    x86@kernel.org, linux-kernel@vger.kernel.org, hpa@zytor.com,
    mingo@kernel.org, peterz@infradead.org, "Shankar, Ravi V",
    "Yu, Fenghua", "Kleen, Andi"
Subject: Re: [PATCH 1/1] x86/cqm: Cqm requirements
References: <1488908964-30261-1-git-send-email-vikas.shivappa@linux.intel.com>
 <3908561D78D1C84285E8C5FCA982C28F6120F0A9@ORSMSX113.amr.corp.intel.com>

On Thu, 9 Mar 2017, David Carrillo-Cisneros wrote:
> On Thu, Mar 9, 2017 at 3:01 AM, Thomas Gleixner wrote:
> > On Wed, 8 Mar 2017, David Carrillo-Cisneros wrote:
> >> On Wed, Mar 8, 2017 at 12:30 AM, Thomas Gleixner wrote:
> >> > Same applies for per CPU measurements.
> >>
> >> For CPU measurements. We need perf-like CPU filtering to support tools
> >> that perform low overhead monitoring by polling CPU events. These
> >> tools approximate per-cgroup/task events by reconciling CPU events
> >> with logs of what job run when in what CPU.
> >
> > Sorry, but for CQM that's just voodoo analysis.
>
> I'll argue that. Yet, perf-like CPU is also needed for MBM, a less
> contentious scenario, I believe.

MBM is a different playground (albeit related due to the RMID stuff).

> It does not work well for a single run (your example). But for the
> example I gave, one can just rely on Random Sampling, Law of Large
> Numbers, and Central Limit Theorem.

Fine. So we need this for ONE particular use case. And if that is not well
documented, including the underlying mechanics to analyze the data, then
this will be a nice source of confusion for Joe User.

I still think that this can be done differently while keeping the overhead
small. You look at this from the existing perf mechanics, which require
high overhead context switching machinery. But that's just wrong, because
that's not how the cache and bandwidth monitoring works.

Contrary to the other perf counters, CQM and MBM are based on a context
selectable set of counters which do not require readout and
reconfiguration when the switch happens.

Especially with CAT in play, the context switch overhead is there already
when CAT partitions need to be switched. So switching the RMID at the same
time is basically free, if we are smart enough to do an equivalent to the
CLOSID context switch mechanics and ideally combine both into a single MSR
write (rough sketch below).

With that, the low overhead periodic sampling can read N counters which
are related to the monitored set and provide N separate results. For
bandwidth the aggregation is a simple ADD, and for cache residency it's
pointless.

Just because perf was designed with the regular performance counters in
mind (way before that CQM/MBM stuff came around) does not mean that we
cannot change/extend it if it makes sense.
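Just to make that concrete, here is a rough and completely untested sketch
of what I have in mind. The MSR numbers and bit layouts are the ones the
SDM documents for IA32_PQR_ASSOC, IA32_QM_EVTSEL and IA32_QM_CTR; all the
function and constant names are made up for illustration and not taken
from any existing patch:

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/bitops.h>
#include <asm/msr.h>

/* Local names to avoid clashing with whatever the headers define */
#define PQR_ASSOC_MSR		0x0c8f	/* IA32_PQR_ASSOC: RMID bits 0-9, CLOSID bits 32-63 */
#define QM_EVTSEL_MSR		0x0c8d	/* IA32_QM_EVTSEL: event id bits 0-7, RMID bits 32-41 */
#define QM_CTR_MSR		0x0c8e	/* IA32_QM_CTR: data bits 0-61, 62 unavail, 63 error */

#define QM_EVT_L3_OCCUPANCY	0x01
#define QM_EVT_TOTAL_MEM_BW	0x02

/*
 * Context switch side: CAT needs the CLOSID update anyway, so the RMID
 * rides along in the same MSR. One wrmsr covers both.
 */
static inline void pqr_sched_in(u32 rmid, u32 closid)
{
	wrmsr(PQR_ASSOC_MSR, rmid, closid);
}

/* Sampling side: select <event, RMID>, then read the counter. */
static int qm_read(u32 rmid, u32 evtid, u64 *val)
{
	u64 ctr;

	wrmsr(QM_EVTSEL_MSR, evtid, rmid);
	rdmsrl(QM_CTR_MSR, ctr);
	if (ctr & (BIT_ULL(63) | BIT_ULL(62)))
		return -EIO;	/* error or data not yet available */
	*val = ctr & GENMASK_ULL(61, 0);
	return 0;
}

/* Bandwidth for a monitored set of N RMIDs is a simple sum. */
static u64 mbm_read_set(const u32 *rmids, int nr)
{
	u64 sum = 0, val;
	int i;

	for (i = 0; i < nr; i++) {
		if (!qm_read(rmids[i], QM_EVT_TOTAL_MEM_BW, &val))
			sum += val;
	}
	return sum;
}

The point being: the per task state is a single (RMID, CLOSID) pair, the
switch is one MSR write on top of the CAT switch we need anyway, and the
sampling side is a trivial loop over the RMIDs of the monitored set.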
And looking at the way Cache/Bandwidth allocation and monitoring works, it
makes a lot of sense. Definitely more than shoving it into the current
modus operandi with duct tape just because we can.

Thanks,

	tglx