From: "Luck, Tony" <tony.luck@intel.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: "Yu, Fenghua" <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	"Anvin, H Peter" <h.peter.anvin@intel.com>,
	Tejun Heo <tj@kernel.org>, Borislav Petkov <bp@suse.de>,
	Stephane Eranian <eranian@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	David Carrillo-Cisneros <davidcc@google.com>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>,
	"Prakhya, Sai Praneeth" <sai.praneeth.prakhya@intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>, x86 <x86@kernel.org>
Subject: Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management
Date: Tue, 26 Jul 2016 03:18:33 +0000
Message-ID: <EEE30627-295A-4131-A7A1-BAB8ECA1A326@intel.com>
In-Reply-To: <20160723043103.GA22015@amt.cnet>

You must specify a mask for each L3 cache. So you can achieve your 80/80 split either with one rdtgroup that has an 80% mask on each of the sockets, using affinity to make one VM run only on CPUs of one socket and the second VM only on the other.

Or with separate rdtgroups for each VM that give each one the 80% when it is on its own socket and the spare 20% if it wanders off to the other socket.
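
For illustration, a minimal C sketch of that second approach, assuming a resctrl-style schemata file and hypothetical group directories vm1/ and vm2/ (the rscctrl file layout in this patchset may differ):

#include <stdio.h>

/* Give each VM 80% of L3 on its home socket and the spare 20% on the
 * other socket, using a 20-bit CBM (e.g. Broadwell). */
static int set_schemata(const char *path, const char *schema)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    fprintf(f, "%s\n", schema);
    return fclose(f);
}

int main(void)
{
    /* VM-1: socket 0 gets bits 4-19 (80%), socket 1 bits 0-3 (20%) */
    set_schemata("/sys/fs/resctrl/vm1/schemata", "L3:0=ffff0;1=0000f");
    /* VM-2: the mirror image */
    set_schemata("/sys/fs/resctrl/vm2/schemata", "L3:0=0000f;1=ffff0");
    return 0;
}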

Sent from my iPhone

> On Jul 25, 2016, at 19:13, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
>> On Fri, Jul 22, 2016 at 02:43:23PM -0700, Luck, Tony wrote:
>>> On Fri, Jul 22, 2016 at 04:12:04AM -0300, Marcelo Tosatti wrote:
>>> How does this patchset handle the following condition:
>>> 
>>> 6) Create reservations in such a way that the sum is larger than the
>>> total amount of cache, combined with CPU pinning (example from Karen Noel):
>>> 
>>> VM-1 on socket-1 with 80% of reservation.
>>> VM-2 on socket-2 with 80% of reservation.
>>> VM-1 pinned to socket-1.
>>> VM-2 pinned to socket-2.
>> 
>> That's legal, but perhaps we need a description of
>> overlapping cache reservations.
>> 
>> Hardware tells you how finely you can divide the cache (and this
>> information is shown in /sys/fs/resctrl/info/l3/max_cbm_len to save
>> you from digging in CPUID leaves).  E.g. on Broadwell the value is
>> 20, so you can control cache allocations in 5% slices.
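
To make the slice arithmetic concrete, a small illustrative helper (not code from the patchset; max_cbm_len is hard-coded to Broadwell's 20 here):

#include <stdio.h>

/* Illustrative only: one mask bit covers 100/max_cbm_len percent of the
 * cache, so an n-percent allocation needs n*max_cbm_len/100 bits. */
static unsigned int pct_to_cbm(unsigned int max_cbm_len, unsigned int pct)
{
    unsigned int bits = max_cbm_len * pct / 100;  /* 20 * 80/100 = 16 */

    return (1u << bits) - 1;                      /* 16 bits -> 0x0ffff */
}

int main(void)
{
    printf("slice: %u%% per bit\n", 100 / 20);
    printf("80%% mask: 0x%05x\n", pct_to_cbm(20, 80));
    return 0;
}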
>> 
>> A bitmask defines which slices you can use (and h/w has the restriction
>> that you must have contiguous '1' bits in any mask).  So you can pick
>> your 80% using 0x0ffff, 0x1fffe, 0x3fffc, 0x7fff8 or 0xffff0.
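
Both rules are easy to check mechanically; a hedged sketch of the validation and the five placements (the patchset's own checks may be written differently):

#include <stdio.h>

/* Set bits of a valid CBM must form one contiguous run: filling the
 * low zeros with (x | (x - 1)) and adding 1 must clear every set bit. */
static int cbm_is_contiguous(unsigned int cbm)
{
    return cbm && !(((cbm | (cbm - 1)) + 1) & cbm);
}

int main(void)
{
    unsigned int base = (1u << 16) - 1;  /* 16 of 20 bits = 80% */
    int shift;

    /* prints 0x0ffff, 0x1fffe, 0x3fffc, 0x7fff8, 0xffff0 */
    for (shift = 0; shift <= 20 - 16; shift++)
        printf("0x%05x valid=%d\n", base << shift,
               cbm_is_contiguous(base << shift));
    return 0;
}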
>> 
>> There is no requirement that masks be exclusive of each other. So
>> you might pick the two extremes, 0x0ffff and 0xffff0, for your two
>> VMs in this example. Each would be allowed to allocate up to 80%,
>> but with a big overlap in the middle. Each has 20% exclusive, but
>> there is a 60% range in the middle that they would compete for.
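
That 20/60/20 split is just popcount arithmetic on the two masks; for example:

#include <stdio.h>

int main(void)
{
    unsigned int a = 0x0ffff, b = 0xffff0;  /* the two 80% masks */

    /* Each bit is 5% on Broadwell (20-bit CBM);
     * __builtin_popcount is a GCC/Clang builtin. */
    printf("shared:      %d%%\n", __builtin_popcount(a & b) * 5);   /* 60 */
    printf("a exclusive: %d%%\n", __builtin_popcount(a & ~b) * 5);  /* 20 */
    printf("b exclusive: %d%%\n", __builtin_popcount(b & ~a) * 5);  /* 20 */
    return 0;
}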
> 
> These are different sockets, so there is no competition for or sharing
> of the L3 cache here. The question is whether the interface lets the
> user specify that 80/80 reservation without complaint: because the
> VMs are pinned, they will never actually
> share the same L3 cache.
> 
> (haven't finished reading the patchset to be certain).
> 
>> Is this specific case useful? Possibly not.  I think the more common
>> overlap cases might be between processes that you know share
>> code/data. Another is where one rdtgroup is allowed to allocate in
>> the entire cache (mask 0xfffff on Broadwell) while other rdtgroups
>> have limited cache allocation with fewer bits in their masks.
>> 
>> -Tony
> 
> All you have to do is build the bitmask for a given processor
> from the union of the masks of the tasks that have been scheduled
> on that processor.
> 
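
A toy sketch of that suggestion (the struct and its cbm field are hypothetical stand-ins for per-task state; the patchset itself attaches tasks to rdtgroups and programs per-class masks rather than doing this):

/* Toy model of "union of the masks of tasks scheduled on this CPU". */
struct task {
    unsigned int cbm;  /* hypothetical per-task capacity bitmask */
};

static unsigned int cpu_effective_cbm(const struct task *tasks, int n)
{
    unsigned int cbm = 0;
    int i;

    for (i = 0; i < n; i++)
        cbm |= tasks[i].cbm;  /* union over scheduled tasks */
    return cbm;
}

int main(void)
{
    struct task t[2] = { { 0x0ffff }, { 0xffff0 } };

    /* Two overlapping 80% masks union to the full 20-bit cache. */
    return cpu_effective_cbm(t, 2) != 0xfffff;
}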
> 

Thread overview: 81+ messages
2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
2016-07-13  1:02 ` [PATCH 01/32] x86/intel_rdt: Cache Allocation documentation Fenghua Yu
2016-07-13  1:02 ` [PATCH 02/32] x86/intel_rdt: Add support for Cache Allocation detection Fenghua Yu
2016-07-26 19:00   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 03/32] x86/intel_rdt: Add Class of service management Fenghua Yu
2016-07-13  1:02 ` [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management Fenghua Yu
2016-07-22  7:12   ` Marcelo Tosatti
2016-07-22 21:43     ` Luck, Tony
2016-07-23  4:31       ` Marcelo Tosatti
2016-07-26  3:18         ` Luck, Tony [this message]
2016-07-26 17:10         ` Shivappa Vikas
2016-07-13  1:02 ` [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT Fenghua Yu
2016-07-25 16:25   ` Nilay Vaish
2016-07-25 16:31   ` Nilay Vaish
2016-07-25 18:05     ` Luck, Tony
2016-07-25 22:47       ` David Carrillo-Cisneros
2016-07-13  1:02 ` [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation Fenghua Yu
2016-07-13  9:19   ` Thomas Gleixner
2016-07-21 19:46     ` Shivappa Vikas
2016-07-14  0:40   ` David Carrillo-Cisneros
2016-07-14 22:58     ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 07/32] x86/intel_rdt: Intel haswell Cache Allocation enumeration Fenghua Yu
2016-07-13  1:02 ` [PATCH 08/32] Define CONFIG_INTEL_RDT Fenghua Yu
2016-07-13 10:25   ` Thomas Gleixner
2016-07-13 18:05     ` Yu, Fenghua
2016-07-13 21:09       ` Thomas Gleixner
2016-07-13 21:18         ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 09/32] x86/intel_rdt: Intel Code Data Prioritization detection Fenghua Yu
2016-07-13  1:02 ` [PATCH 10/32] x86/intel_rdt: Adds support to enable Code Data Prioritization Fenghua Yu
2016-07-26 19:23   ` Nilay Vaish
2016-07-26 20:32     ` Shivappa Vikas
2016-07-13  1:02 ` [PATCH 11/32] x86/intel_rdt: Class of service and capacity bitmask management for CDP Fenghua Yu
2016-07-13  1:02 ` [PATCH 12/32] x86/intel_rdt: Hot cpu update for code data prioritization Fenghua Yu
2016-07-13  1:02 ` [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
2016-07-13 12:47   ` Thomas Gleixner
2016-07-13 17:13     ` Luck, Tony
2016-07-14  6:53       ` Thomas Gleixner
2016-07-14 17:16         ` Luck, Tony
2016-07-19 12:32           ` Thomas Gleixner
2016-08-04 23:38             ` Yu, Fenghua
2016-07-27 16:20   ` Nilay Vaish
2016-07-27 16:57     ` Luck, Tony
2016-08-03 22:15   ` Marcelo Tosatti
2016-07-13  1:02 ` [PATCH 14/32] x86/cpufeatures: Get max closid and max cbm len and clean feature comments and code Fenghua Yu
2016-07-27 16:49   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 15/32] cacheinfo: Introduce cache id Fenghua Yu
2016-07-27 17:04   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 16/32] Documentation, ABI: Add a document entry for " Fenghua Yu
2016-07-13  1:02 ` [PATCH 17/32] x86, intel_cacheinfo: Enable cache id in x86 Fenghua Yu
2016-07-28  5:41   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 18/32] drivers/base/cacheinfo.c: Export some cacheinfo functions for others to use Fenghua Yu
2016-07-13  1:02 ` [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct Fenghua Yu
2016-07-13 12:56   ` Thomas Gleixner
2016-07-13 17:50     ` Yu, Fenghua
2016-07-28  5:53   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 20/32] magic number for rscctrl file system Fenghua Yu
2016-07-28  5:57   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 21/32] x86/intel_rdt.h: Header for inter_rdt.c Fenghua Yu
2016-07-28 14:07   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 22/32] x86/intel_rdt_rdtgroup.h: Header for user interface Fenghua Yu
2016-07-13  1:02 ` [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per resources Fenghua Yu
2016-07-13 13:07   ` Thomas Gleixner
2016-07-13 17:40     ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 24/32] Task fork and exit for rdtgroup Fenghua Yu
2016-07-13 13:14   ` Thomas Gleixner
2016-07-13 17:32     ` Yu, Fenghua
2016-07-13 21:02       ` Thomas Gleixner
2016-07-13 21:22         ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 25/32] x86/intel_rdt_rdtgroup.c: User interface for RDT Fenghua Yu
2016-07-14 12:30   ` Thomas Gleixner
2016-07-13  1:02 ` [PATCH 26/32] x86/intel_rdt_rdtgroup.c: Create info directory Fenghua Yu
2016-07-13  1:03 ` [PATCH 27/32] x86/intel_rdt_rdtgroup.c: Implement rscctrl file system commands Fenghua Yu
2016-07-13  1:03 ` [PATCH 28/32] x86/intel_rdt_rdtgroup.c: Read and write cpus Fenghua Yu
2016-07-13  1:03 ` [PATCH 29/32] x86/intel_rdt_rdtgroup.c: Tasks iterator and write Fenghua Yu
2016-07-13  1:03 ` [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface Fenghua Yu
2016-07-14  0:41   ` David Carrillo-Cisneros
2016-07-14  6:11     ` Thomas Gleixner
2016-07-14  6:16       ` Yu, Fenghua
2016-07-14  6:32     ` Yu, Fenghua
2016-07-13  1:03 ` [PATCH 31/32] MAINTAINERS: Add maintainer for Intel RDT resource allocation Fenghua Yu
2016-07-13  1:03 ` [PATCH 32/32] x86/Makefile: Build intel_rdt_rdtgroup.c Fenghua Yu
