Date: Tue, 4 Aug 2015 09:55:20 -0300
From: Marcelo Tosatti
To: Tejun Heo
Cc: Vikas Shivappa, linux-kernel@vger.kernel.org, vikas.shivappa@intel.com,
    x86@kernel.org, hpa@zytor.com, tglx@linutronix.de, mingo@kernel.org,
    peterz@infradead.org, matt.fleming@intel.com, will.auld@intel.com,
    glenn.p.williamson@intel.com, kanaka.d.juvva@intel.com
Subject: Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management
Message-ID: <20150804125520.GA31450@amt.cnet>
References: <1435789270-27010-1-git-send-email-vikas.shivappa@linux.intel.com>
 <1435789270-27010-6-git-send-email-vikas.shivappa@linux.intel.com>
 <20150730194458.GD3504@mtj.duckdns.org>
 <20150731151218.GC22948@amt.cnet>
 <20150802162325.GA32599@mtj.duckdns.org>
 <20150803203250.GA31668@amt.cnet>
In-Reply-To: <20150803203250.GA31668@amt.cnet>

On Mon, Aug 03, 2015 at 05:32:50PM -0300, Marcelo Tosatti wrote:
> On Sun, Aug 02, 2015 at 12:23:25PM -0400, Tejun Heo wrote:
> > Hello,
> >
> > On Fri, Jul 31, 2015 at 12:12:18PM -0300, Marcelo Tosatti wrote:
> > > > I don't really think it makes sense to implement a fully hierarchical
> > > > cgroup solution when there isn't the basic affinity-adjusting
> > > > interface
> > >
> > > What is an "affinity adjusting interface" ? Can you give an example
> > > please?
> >
> > Something similar to sched_setaffinity(). Just a syscall / prctl or
> > whatever programmable interface which sets per-task attribute.
>
> You really want to specify the cache configuration "at once":
> having process-A exclusive access to 2MB of cache at all times,
> and process-B 4MB exclusive, means you can't have process-C use 4MB of
> cache exclusively (consider 8MB cache machine).

That's not true. It's fine to set up the task set <--> cache portion
mapping in pieces. In fact, it's more natural, because you don't
necessarily know the entire cache allocation in advance (think of
"cp largefile /destination" with sequential use-once behavior).

However, there is a use-case for sharing: in scenario 1 it might be
possible (and desired) to share code between applications.

> > > > and it isn't clear whether fully hierarchical resource
> > > > distribution would be necessary especially given that the granularity
> > > > of the target resource is very coarse.
> > >
> > > As i see it, the benefit of the hierarchical structure to the CAT
> > > configuration is simply to organize sharing of cache ways in subtrees
> > > - two cgroups can share a given cache way only if they have a common
> > > parent.
> > >
> > > That is the only benefit. Vikas, please correct me if i'm wrong.
> >
> > cgroups is not a superset of a programmable interface. It has
> > distinctive disadvantages and not a substitute with hirearchy support
> > for regular systemcall-like interface. I don't think it makes sense
> > to go full-on hierarchical cgroups when we don't have basic interface
> > which is likely to cover many use cases better.
> > A syscall-like interface combined with a tool similar to taskset
> > would cover a lot in a more accessible way.
>
> How are you going to specify sharing of portions of cache by two sets
> of tasks with a syscall interface?
>
> > > > I can see that how cpuset would seem to invite this sort of usage but
> > > > cpuset itself is more of an arbitrary outgrowth (regardless of
> > > > history) in terms of resource control and most things controlled by
> > > > cpuset already have countepart interface which is readily accessible
> > > > to the normal applications.
> > >
> > > I can't parse that phrase (due to ignorance). Please educate.
> >
> > Hmmm... consider CPU affinity. cpuset definitely is useful for some
> > use cases as a management tool especially if the workloads are not
> > cooperative or delegated; however, it's no substitute for a proper
> > syscall interface and it'd be silly to try to replace that with
> > cpuset.
> >
> > > > Given that what the feature allows is restricting usage rather than
> > > > granting anything exclusively, a programmable interface wouldn't need
> > > > to worry about complications around priviledges
> > >
> > > What complications about priviledges you refer to?
> >
> > It's not granting exclusive access, so individual user applications
> > can be allowed to do whatever it wanna do as long as the issuer has
> > enough priv over the target task.
>
> Priviledge management with cgroup system: to change cache allocation
> requires priviledge over cgroups.
>
> Priviledge management with system call interface: applications
> could be allowed to reserve up to a certain percentage of the cache.
>
> > > > while being able to reap most of the benefits in an a lot easier way.
> > > > Am I missing something?
> > >
> > > The interface does allow for exclusive cache usage by an application.
> > > Please read the Intel manual, section 17, it is very instructive.
> >
> > For that, it'd have to require some CAP but I think just having
> > restrictive interface in the style of CPU or NUMA affinity would go a
> > long way.
> >
> > > The use cases we have now are the following:
> > >
> > > Scenario 1: Consider a system with 4 high performance applications
> > > running, one of which is a streaming application that manages a very
> > > large address space from which it reads and writes as it does its processing.
> > > As such the application will use all the cache it can get but does
> > > not need much if any cache. So, it spoils the cache for everyone for no
> > > gain on its own. In this case we'd like to constrain it to the
> > > smallest possible amount of cache while at the same time constraining
> > > the other 3 applications to stay out of this thrashed area of the
> > > cache.
> >
> > A tool in the style of taskset should be enough for the above
> > scenario.
> >
> > > Scenario 2: We have a numeric application that has been highly optimized
> > > to fit in the L2 cache (2M for example). We want to ensure that its
> > > cached data does not get flushed from the cache hierarchy while it is
> > > scheduled out. In this case we exclusively allocate enough L3 cache to
> > > hold all of the L2 cache.
> > >
> > > Scenario 3: Latency sensitive application executing in a shared
> > > environment, where memory to handle an event must be in L3 cache
> > > for latency requirements to be met.
> >
> > Either isolate CPUs or run other stuff with affinity restricted.
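
(For reference, restricting affinity with the existing interfaces
looks roughly like the sketch below. This is only an illustration of
the "run other stuff with affinity restricted" approach: the CPU
numbers are made up, and it only helps scenario 3 when the restricted
CPUs map to a different cache domain than the latency-sensitive task.)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t mask;
	int cpu;

	CPU_ZERO(&mask);
	/* pin this (background) task to CPUs 4-7, keeping it off CPUs 0-3 */
	for (cpu = 4; cpu <= 7; cpu++)
		CPU_SET(cpu, &mask);

	/* pid 0 means the calling thread */
	if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
		perror("sched_setaffinity");
		return 1;
	}

	/* ... run (or exec) the non-latency-sensitive work here ... */
	return 0;
}
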
> > cpuset-style allocation can be easier for things like this but that
> > should be an addition on top not the one and only interface. How is
> > it gonna handle if multiple threads of a process want to restrict
> > cache usages to avoid stepping on each other's toes? Delegate the
> > subdirectory and let the process itself open it and write to files to
> > configure when there isn't even a way to atomically access the
> > process's own directory or a way to synchronize against migration?
>
> One would preconfigure that in advance - but you are right, a
> syscall interface is more flexible in that respect.

So, systemd is responsible for locking.

> > cgroups may be an okay management interface but a horrible
> > programmable interface.
> >
> > Sure, if this turns out to be as important as cpu or numa affinity and
> > gets widely used creating management burden in many use cases, we sure
> > can add cgroups controller for it but that's a remote possibility at
> > this point and the current attempt is over-engineering solution for
> > problems which haven't been shown to exist. Let's please first
> > implement something simple and easy to use.
> >
> > Thanks.
> >
> > --
> > tejun

I don't see an easy way to fix the sharing use-case (it would require
exposing the "intersection" between two task sets).

Can't a "cacheset" helper (similar to taskset) talk to systemd to
achieve the flexibility you point out?
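
(To make the "cacheset" idea concrete, the wrapper could look
something like the sketch below, in the spirit of taskset. Everything
here is hypothetical: PR_SET_CACHE_RESERVATION is made up, no such
interface exists today, and whether the reservation is pushed to the
kernel directly or negotiated through systemd is exactly the open
question above.)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>

/* hypothetical request: reserve L3 space for the calling task, in KB */
#define PR_SET_CACHE_RESERVATION 1000

int main(int argc, char **argv)
{
	unsigned long kbytes;

	if (argc < 3) {
		fprintf(stderr, "usage: cacheset <kbytes> <command> [args...]\n");
		return 1;
	}
	kbytes = strtoul(argv[1], NULL, 0);

	/*
	 * Set the (hypothetical) reservation on ourselves, then exec the
	 * workload with it in place, exactly as taskset does for CPU
	 * affinity.
	 */
	if (prctl(PR_SET_CACHE_RESERVATION, kbytes, 0, 0, 0) == -1) {
		perror("prctl(PR_SET_CACHE_RESERVATION)");
		return 1;
	}

	execvp(argv[2], &argv[2]);
	perror("execvp");
	return 1;
}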