linux-kernel.vger.kernel.org archive mirror
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Fenghua Yu <fenghua.yu@intel.com>, H Peter Anvin <hpa@zytor.com>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, x86 <x86@kernel.org>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>,
	Karen Noel <knoel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	Clark Williams <williams@redhat.com>, Tejun Heo <tj@kernel.org>
Subject: cat cgroup interface proposal (non hierarchical) was Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support
Date: Mon, 2 Nov 2015 20:20:58 -0200
Message-ID: <20151102222057.GA30158@amt.cnet>
In-Reply-To: <20151016095022.GP3816@twins.programming.kicks-ass.net>

On Fri, Oct 16, 2015 at 11:50:22AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 15, 2015 at 11:28:52PM -0300, Marcelo Tosatti wrote:
> > On Thu, Oct 15, 2015 at 01:36:14PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 13, 2015 at 06:31:27PM -0300, Marcelo Tosatti wrote:
> > > > I am rewriting the interface with ioctls, with commands similar to the
> > > > syscall interface proposed.
> > > 
> > > Which is horrible for other use cases. I really don't see the problem
> > > with the cgroup stuff.
> > 
> > Can you detail what "horrible" means? 
> 
> Say an RT scenario; you set up your machine with cgroups. You create a
> cpuset which is disjoint from the others, you frob around with the cpu
> cgroup, etc..
> 
> So once you're all done, you start your RT app into a cgroup.
> 
> But oh, fail, now you have to go muck about with ioctl()s to get the
> cache allocation cruft to work.

Peter, what follows is your cgroup proposal (extended), but
at the end there is a point about the inability of this cgroup
interface to share cache between tasks, which IMO renders it unusable
(as it blocks any threads from sharing reserved cache).

If you have any ideas on how to circumvent this, they are appreciated.

What follows is a non-hierarchical cgroup CAT interface proposal. Thanks
to some of the CC'ed Red Hat folks for early comments.

cgroup CAT interface (non hierarchical):
---------------------------------------

0) Main directory files:

cat_hw_info
-----------
CAT HW information: CBM length, whether CDP is supported, etc.
Information is reported per socket, as sockets can have
different configurations. Perhaps this should live in
sysfs instead.

1) Sub-directories represent cache reservations (size,type).

mkdir cache_reservation_for_forwarding_app
cd cache_reservation_for_forwarding_app
echo "80KB" > size
echo "data_and_code" > type
echo "socketmask=0xfffff" > socketmask    # optional
echo "1" > enable
echo "pid-of-forwarding-main-thread pid-of-helper-thread ..." > tasks

Files:

type
----
{data_and_code, only_code, only_data}. Type of
L3 CAT cache allocation to use. only_code and only_data are
supported only on CDP-capable processors.

size
----
size of L3 cache reservation.

rounding
--------
{round_down, round_up}. Whether to round the allocation size
(in KB) down or up to a multiple of the cache-way size.

Default: round_up
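
To illustrate the rounding, here is a minimal sketch. The function name
and the 64 KB way size are made up for the example; a real way size
would be derived from the L3 size and the CBM length on each socket.

```shell
# Round a requested size (in KB) to a multiple of the cache-way size.
# $1 = requested size in KB
# $2 = way size in KB (hypothetical value, not read from hardware)
# $3 = round_up | round_down
round_to_way() {
    case "$3" in
        round_up)   echo $(( (($1 + $2 - 1) / $2) * $2 )) ;;
        round_down) echo $(( ($1 / $2) * $2 )) ;;
        *)          echo "unknown rounding mode: $3" >&2; return 1 ;;
    esac
}

round_to_way 80 64 round_up      # prints 128
round_to_way 80 64 round_down    # prints 64
```

Note that with round_down a request smaller than one way comes out as 0.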

socketmask
----------
Mask of sockets where the reservation is in effect.
A zero bit means that tasks in the cgroup get no reserved
L3 cache portion on that socket.
Default: all sockets set.
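
As a sketch of how the mask would be read, assuming bit N stands for
socket N (the numbering is not specified above) and using a made-up
helper name:

```shell
# Succeed if the reservation is in effect on the given socket.
# $1 = socketmask, $2 = socket number (bit N = socket N is an assumption)
reservation_on_socket() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

mask=0x5    # hypothetical mask: sockets 0 and 2 only
for s in 0 1 2 3; do
    if reservation_on_socket "$mask" "$s"; then
        echo "socket $s: reserved"
    else
        echo "socket $s: not reserved"
    fi
done
```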

enable
------
Allocate the reservation with the parameters set above.

When a reservation is enabled, it reserves L3 cache
space on every socket that is set in "socketmask".

After the reservation has been enabled by writing "1" to the
"enable" file, only the "tasks" file can be modified.
To change the size of a cgroup reservation, recreate the directory.

tasks
-----

Contains the list of tasks which use this cache reservation.

Error reporting
---------------

Errors are reported in response to writes as appropriate.
For example, writing 1 to "enable" when there is not enough space
on the sockets in "socketmask" would return -ENOSPC, and writing
to "enable" without size being set would return -EINVAL.

Listing
-------
To list which reservations are in place, search for subdirectories
where the "enable" file has value 1.

Semantics: A task has a guaranteed cache reservation on any CPU where it
is scheduled, for the lifetime of the cgroup, as long as that task is
not attached to another cgroup.

That is, a task belonging to cgroup-A can have its cache reservation
invalidated when attached to cgroup-B (reasoning: it might be necessary
to reallocate the CBMs to keep the bits of each mask contiguous, a
restriction of the CAT HW interface).
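
The contiguity restriction can be checked with a little bit arithmetic.
A minimal sketch (the function name is made up; the example masks are
arbitrary 20-bit values):

```shell
# Succeed if the capacity bitmask has at least one bit set and all set
# bits are adjacent. Adding the lowest set bit to a contiguous run of
# ones carries out of the run, so the sum shares no bits with the mask.
cbm_is_contiguous() {
    [ $(( $1 )) -ne 0 ] && [ $(( ($1 + ($1 & -$1)) & $1 )) -eq 0 ]
}

for m in 0x00ff0 0x0f0f0; do
    if cbm_is_contiguous "$m"; then
        echo "$m: contiguous"
    else
        echo "$m: not contiguous"
    fi
done
```

A mask with a hole in it (the second one) is what a reallocation would
have to avoid.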


-------
BLOCKER
-------

Can't use cgroups for CAT because:

"Control Groups extends the kernel as follows:

 - Each task in the system has a reference-counted pointer to a
   css_set.

 - A css_set contains a set of reference-counted pointers to
   cgroup_subsys_state objects, one for each cgroup subsystem
   registered in the system."

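That is, a task has exactly one cgroup_subsys_state per subsystem, so it
can be in only one CAT cgroup at a time. But you need a task to be part
of two cgroups at one time to support the following configuration: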

Task-A: 70% of cache reserved exclusively (reservation-0).
        20% of cache reserved (reservation-1).

Task-B: 20% of cache reserved (reservation-1).

Unless reservations are created separately, then added to cgroups:

mount -t cgroup ... /../catcgroup/
cd /../catcgroup/
# create reservations
cd reservations
mkdir reservation-1
cd reservation-1
echo "80K" > size
echo "socketmask" > ...
echo "1" > enable
cd ..
mkdir reservation-2
cd reservation-2
echo "160K" > size
echo "socketmask" > ...
echo "1" > enable
# attach reservations to cgroups
cd /../catcgroup/
mkdir cgroup-for-threaded-app
cd cgroup-for-threaded-app
echo "reservation-1 reservation-2" > reservations
echo "mainthread" > tasks
cd ..
mkdir cgroup-for-helper-thread
cd cgroup-for-helper-thread
echo "reservation-1" > reservations
echo "helperthread" > tasks
cd ..

This way mainthread and helperthread can share "reservation-1".
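
At the hardware level, CAT allows the bitmasks of different classes of
service to overlap, which is what such sharing would map to. A rough
sketch for the Task-A/Task-B configuration, assuming a 20-bit CBM so
that each bit stands for 5% of the cache (the CBM length, bit positions
and helper name are all made up for illustration):

```shell
# Print a mask of NBITS contiguous ones starting at FIRST_BIT.
# $1 = FIRST_BIT, $2 = NBITS
contiguous_mask() {
    printf '0x%05x\n' $(( ((1 << $2) - 1) << $1 ))
}

shared=$(contiguous_mask 0 4)        # 20% of cache: 4 of 20 bits
exclusive=$(contiguous_mask 4 14)    # 70% of cache: 14 of 20 bits

# Task-A gets both reservations, Task-B only the shared one.
task_a=$(printf '0x%05x' $(( $exclusive | $shared )))
task_b=$shared

echo "Task-A CBM: $task_a"
echo "Task-B CBM: $task_b"
echo "overlap:    $(printf '0x%05x' $(( $task_a & $task_b )))"
```

The two reservations are placed next to each other so that Task-A's
combined mask stays contiguous.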

But this abuses cgroups in a way they were not designed for.
Who is going to maintain the linkage between reservations and cgroups?


