Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface

From: Thomas Gleixner <tglx@linutronix.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>, Ingo Molnar <mingo@elte.hu>,
	"H. Peter Anvin" <h.peter.anvin@intel.com>,
	Tejun Heo <tj@kernel.org>, Borislav Petkov <bp@suse.de>,
	Stephane Eranian <eranian@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	David Carrillo-Cisneros <davidcc@google.com>,
	Ravi V Shankar <ravi.v.shankar@intel.com>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>,
	Sai Prakhya <sai.praneeth.prakhya@intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>, x86 <x86@kernel.org>
Subject: Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
Date: Thu, 14 Jul 2016 08:53:17 +0200 (CEST)
Message-ID: <alpine.DEB.2.11.1607132310310.4083@nanos>
In-Reply-To: <20160713171310.GA14521@intel.com>

On Wed, 13 Jul 2016, Luck, Tony wrote:
> On Wed, Jul 13, 2016 at 02:47:30PM +0200, Thomas Gleixner wrote:
> > On Tue, 12 Jul 2016, Fenghua Yu wrote:
> > > +3. Hierarchy in rscctrl
> > > +=======================
> > 
> > What means rscctrl?
> > 
> > You were not able to find a more cryptic acronym?
> 
> rscctrl == resource control
> 
> Intel marketing would (probably) like us to use:
> 
>    /sys/fs/Intel(R) Resource Director Technology(TM)/
> 
> Happy to take suggestions for something in between those
> extremes :-)

I'd suggest "resctrl". The abbreviation dictionaries tell me that the most
common abbreviations for resource are: R, RESORC, RES

> > > +Any tasks scheduled on the cpus will use the schemas. User can set
> > > +both "cpus" and "tasks" to share the same schema in one directory. But when
> > > +a CPU is bound to a schema, a task running on the CPU uses this schema and
> > > +kernel will ignore schema set up for the task in "tasks".
> > 
> > This does not make any sense. 
> > 
> > When a task is bound to a schema then this should have preference over the
> > schema which is associated to the CPU. The CPU association is meant for tasks
> > which are not bound to a particular partition/schema.
> > 
> > So the initial setup should be:
> > 
> >    - All CPUs are associated to the root resource partition
> > 
> >    - No thread is associated to a particular resource partition
> > 
> > When a thread is added to a 'tasks' file of a partition then this partition
> > takes preference. If it's removed, i.e. the association to a partition is
> > undone, then the CPU association is used.
> > 
> > I have no idea why you think that all threads should be in a tasks file by
> > default. Associating CPUs in the first place makes a lot more sense as it
> > represents the topology of the system nicely.
> 
> If we did it that way, it would be harder to change the default
> resources.  E.g. now we start with all processes in the root
> rdtgroup.  We can change the schema for the root group and restrict
> them to, say, 60% of L3 cache on one (or all) sockets - giving us
> 40% of cache to give out to one or more groups.

I tend to disagree.

If you start up with all resources assigned to all CPUs and all tasks are set
to use the CPU default, then you can still restrict the root CPU defaults to
60% L3 which gives you 40% of cache to hand out.

What's hard about this?

Now you can start to create new partitions and either assign CPUs or tasks to
them.

As a side effect that avoids the whole 'find all tasks' on mount machinery
simply because the CPU defaults do not change at all.
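
A minimal sketch of the lookup this scheme implies, not code from the patch
set. The names (struct task_sketch, NO_PARTITION, cpu_default_closid,
effective_closid) and the four-CPU array are hypothetical and only serve the
illustration: a task that sits in some partition's 'tasks' file uses that
partition, every other task falls back to the default of the CPU it runs on.

```c
#define NO_PARTITION  (-1)  /* hypothetical: task is in no 'tasks' file */
#define NR_CPUS_SKETCH 4    /* hypothetical CPU count for the example */

struct task_sketch {
	int closid;         /* NO_PARTITION or the CLOSID of its partition */
};

/* Initial setup: every CPU is associated to the root partition (CLOSID 0). */
static int cpu_default_closid[NR_CPUS_SKETCH] = { 0, 0, 0, 0 };

/*
 * A task bound to a partition takes preference over the CPU association.
 * The CPU default only applies to tasks without a partition of their own.
 */
static int effective_closid(const struct task_sketch *t, int cpu)
{
	if (t->closid != NO_PARTITION)
		return t->closid;
	return cpu_default_closid[cpu];
}
```

With this shape nothing has to walk the task list at mount time: an untouched
task carries NO_PARTITION and simply follows whatever the CPU default is.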

> So what we've implemented (and perhaps need to explain better here)
> is that every thread always belongs to one (and only one) rdtgroup.
> It will use the resources described in that group wherever it runs,
> except in the case where we have designated some cpus as special snowflakes.

I don't think of that case as special snowflakes. Due to the very limited number
of CLOSIDs the CPU association is going to be a very useful tool.

> When a cpu is assigned to an rdtgroup the schema for the cpu has
> precedence (i.e. we write the MSR with a CLOSID once, and then it
> never changes).
> 
> Some of this is confusing because people will very likely also use
> cpu affinity to control where their processes run. But affinity is
> orthogonal to rdtgroup membership.

Right. It's confusing and what's even more confusing is that you have no way
to figure out what a particular task is actually using. With the 'use CPU
defaults, if not assigned to a partition' scheme you can very easily figure out
what a task is using because it's either in a partition task list or not.

> I think what we have allows you to do all the things we talked about.
> But if we are missing a case, or if things can be simplified while
> still retaining the same functionality then let's discuss that.

It covers almost everything except the case I outlined before:

   Isolated CPU             Important Task runs on isolated CPU
   5% exclusive cache       10% exclusive cache

That's impossible with your scheme, but it's something which matters. You want
to make sure that the system services on that isolated CPU stay cache hot
without hurting the cache locality of your isolated task.
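
To make the numbers concrete, here is a rough sketch of how such a split could
be expressed as cache bitmasks. It assumes a 20-bit capacity bitmask (the real
length depends on the CPU) and reads "exclusive" as "the masks do not overlap".
CBM_LEN_SKETCH, pct_to_bits, cbm_range and masks_overlap are made-up helper
names, not anything from the patches.

```c
#include <stdint.h>

#define CBM_LEN_SKETCH 20	/* assumed bitmask length, CPU dependent */

/* Number of mask bits covering 'pct' percent of the cache, rounded down. */
static unsigned int pct_to_bits(unsigned int pct)
{
	return CBM_LEN_SKETCH * pct / 100;
}

/* Contiguous run of 'nbits' set bits starting at bit 'start'. */
static uint32_t cbm_range(unsigned int start, unsigned int nbits)
{
	return ((UINT32_C(1) << nbits) - 1) << start;
}

/* Two partitions are exclusive when their masks share no bit. */
static int masks_overlap(uint32_t a, uint32_t b)
{
	return (a & b) != 0;
}
```

Under these assumptions the isolated CPU default would get
cbm_range(0, pct_to_bits(5)), the important task
cbm_range(1, pct_to_bits(10)), and everything else the remaining 17 bits.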

> Otherwise we can revise the documentation to explain all this better.

That needs to be done in any case. The existing one does not really qualify as
proper documentation. It's closer to a fairy tale :)

I really have to ask why you did not take the time to put all the information
you just gave into that documentation file in the first place.

> > > +Initial value is all zeros which means there is no CPU bound to the schemas
> > > +in the root directory and tasks use the schemas.
> > 
> > As I said above this is backwards.
> 
> > > +If one resource is disabled, its line is not shown in schemas file.
> > 
> > That means:	  
> > 
> >      Resources which are not described in a schemata file are disabled for
> >      that particular partition.
> > 
> > Right?
> > 
> > Now that raises the question how this is supposed to work. Let's assume that
> > we have a partition 'foo' and thread X is in the tasks file of that
> > partition. The schema of that partition contains only an L2 entry. What's the
> > L3 association for thread X? Nothing at all?
> 
> Resources are either enabled or disabled globally. Each schema file
> must provide details for every enabled resource. So if we are on a
> processor that supports both L2 and L3, we will normally have schema
> files that specify both.
> We could boot with the "disable_cat_l2"
> kernel command line option and then every schema file would just
> specify L3 (and the MSRs for L2 would all be set to all-ones so that
> everyone had full access to the L2 on each core).

So the above should read:

   Each schema file must provide configuration for all resource controls which
   are enabled in the system.

Right?

> > > +User can create a sub-directory under the root directory by "mkdir" command.
> > > +User can remove the sub-directory by "rmdir" command.
> > 
> > User? Any user?
> 
> Well if someone did:
>  # chmod 777 /sys/fs/rscctrl
> then any user could make directories.  That would be inadvisable.
> You could use 775 and let a trusted group have control so that you
> didn't require root access to modify things.
> 
> Should we say "system administrator" rather than "user"?

Yes. Because the default should be 755 which is the obvious choice for all
root/admin controlled things. If root decides to change it to 777 then it's
not the kernel's problem. But the documentation should clearly say: It's a root
controlled resource.

Thanks,

	tglx