Re: [PATCH 1/1] x86/cqm: Cqm requirements - David Carrillo-Cisneros

From: David Carrillo-Cisneros <davidcc@google.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>,
	"Shivappa, Vikas" <vikas.shivappa@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	"Yu, Fenghua" <fenghua.yu@intel.com>,
	"Kleen, Andi" <andi.kleen@intel.com>
Subject: Re: [PATCH 1/1] x86/cqm: Cqm requirements
Date: Thu, 9 Mar 2017 10:05:36 -0800
Message-ID: <CALcN6mjpuiy1mMn2KG33ororsf7e68AZYTmSjtpD9b-pitsY1g@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.20.1703091145070.3521@nanos>

On Thu, Mar 9, 2017 at 3:01 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 8 Mar 2017, David Carrillo-Cisneros wrote:
>> On Wed, Mar 8, 2017 at 12:30 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > Same applies for per CPU measurements.
>>
>> For CPU measurements, we need perf-like CPU filtering to support tools
>> that perform low-overhead monitoring by polling CPU events. These
>> tools approximate per-cgroup/task events by reconciling CPU events
>> with logs of which job ran when on which CPU.
>
> Sorry, but for CQM that's just voodoo analysis.

I'd argue against that. In any case, perf-like CPU filtering is also
needed for MBM, which I believe is a less contentious scenario.

>
> CPU default is CAT group 0      (20% of cache)
> T1 belongs to CAT group 1       (40% of cache)
> T2 belongs to CAT group 2       (40% of cache)
>
> Now you do low overhead samples of the CPU (all groups accounted) with 1
> second period.
>
> Let's assume that T1 runs 50% and T2 runs 20%; the rest of the time is
> utilized by random other things and the kernel itself (using CAT group 0).
>
> What is the accumulated value telling you?

In this single example, not much: only the sum of occupancies. But
assume I have 10000 different jobs, T1...T10000, and I randomly select
a pair of them to run together on a machine (they become the T1 and T2
in your example). Then I repeat that hundreds of thousands of times.

I can collect all the data as (tasks run, time run, occupancy) records
and build a simple regression to estimate the expected occupancy per
task (with some confidence interval). That value is approximate, not
exact, but it is very useful to feed into a job scheduler. Furthermore,
it can be correlated with the values of other events that are currently
sampled this way.
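
A minimal sketch of the regression idea above, using synthetic data in
place of real measurements. It assumes (a modeling assumption, not
something CQM guarantees) that each CPU-level reading is roughly the
run-time-weighted sum of the occupancies of whatever ran during the
polling period, plus noise. The job names, the "other" bucket for the
kernel and everything else, and the helpers `simulate_samples`,
`fit_occupancy` and `demo` are all hypothetical.

```python
import random

def simulate_samples(true_occ, n_samples, rng, noise_sd=0.5):
    """Build synthetic (tasks run, time run, occupancy) records.

    Each record models one polling period on one CPU: a random pair of
    jobs runs for random fractions of the period and "other" gets the rest.
    ASSUMPTION: the CPU-wide reading is the time-weighted sum of per-job
    occupancies plus Gaussian noise.
    """
    jobs = [name for name in true_occ if name != "other"]
    records = []
    for _ in range(n_samples):
        a, b = rng.sample(jobs, 2)
        frac_a = rng.uniform(0.1, 0.5)
        frac_b = rng.uniform(0.1, 0.4)
        weights = {a: frac_a, b: frac_b, "other": 1.0 - frac_a - frac_b}
        reading = sum(w * true_occ[name] for name, w in weights.items())
        records.append((weights, reading + rng.gauss(0.0, noise_sd)))
    return records

def fit_occupancy(records, names):
    """Ordinary least squares via the normal equations (A^T A) x = A^T y."""
    idx = {name: i for i, name in enumerate(names)}
    k = len(names)
    ata = [[0.0] * k for _ in range(k)]
    aty = [0.0] * k
    for weights, y in records:
        items = [(idx[name], w) for name, w in weights.items()]
        for i, wi in items:
            aty[i] += wi * y
            for j, wj in items:
                ata[i][j] += wi * wj
    # Solve the k x k system by Gaussian elimination with partial pivoting.
    m = [ata[i] + [aty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (m[i][k] - tail) / m[i][i]
    return dict(zip(names, x))

def demo(n_samples=20000, seed=0):
    """Return (made-up true occupancies, estimates recovered from CPU sums)."""
    rng = random.Random(seed)
    true_occ = {"T1": 8.0, "T2": 3.0, "T3": 12.0, "T4": 5.0, "T5": 1.5,
                "other": 2.0}
    records = simulate_samples(true_occ, n_samples, rng)
    return true_occ, fit_occupancy(records, list(true_occ))

if __name__ == "__main__":
    true_occ, est = demo()
    for name in true_occ:
        print(f"{name}: true={true_occ[name]:.2f} estimated={est[name]:.2f}")
```

No single record says anything about an individual job, but because
the pairings and run-time fractions vary across records, the fit can
separate the per-job contributions. With fixed run-time fractions the
columns would become collinear and the fit would be ill-posed.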

>
> How do you approximate that back to T1/T2 and the rest?

As described above, using large numbers of random samples. More
sophisticated (voodoo?) statistical techniques are employed in practice
to account for almost all the issues I could think of (selection bias,
missing values, interaction between tasks, etc.). They seem to work
fine.

>
> How do you do that when the tasks are switching between the samples several
> times?

It does not work well for a single run (your example). But for the
example I gave, one can just rely on random sampling, the law of large
numbers, and the central limit theorem.
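
To illustrate why repetition helps, here is a small sketch under the
assumption that each run yields an independent, unbiased, noisy reading
of a job's occupancy (real runs need not be independent or unbiased).
By the central limit theorem, the 95% confidence interval of the sample
mean has a half-width of about 1.96 * s / sqrt(n), so 100 times more
samples give an interval about 10 times narrower. The numbers and the
helpers `mean_with_ci`, `noisy_readings` and `ci_demo` are made up for
illustration.

```python
import math
import random
import statistics

def mean_with_ci(samples, z=1.96):
    """Sample mean and approximate 95% CI half-width (normal approximation)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width

def noisy_readings(true_value, noise_sd, n, rng):
    """ASSUMPTION: independent readings with Gaussian noise around true_value."""
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]

def ci_demo(true_value=6.0, noise_sd=3.0, seed=1):
    """Map sample count -> (estimated mean, CI half-width)."""
    rng = random.Random(seed)
    return {n: mean_with_ci(noisy_readings(true_value, noise_sd, n, rng))
            for n in (100, 10000)}

if __name__ == "__main__":
    for n, (mean, half_width) in ci_demo().items():
        print(f"n={n}: mean={mean:.3f} +/- {half_width:.3f}")
```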

Thanks,
David