Re: [Documentation] State of CPU controller in cgroup v2

From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Andy Lutomirski <luto@amacapital.net>, Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paul Turner <pjt@google.com>, Li Zefan <lizefan@huawei.com>,
	Linux API <linux-api@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [Documentation] State of CPU controller in cgroup v2
Date: Sat, 20 Aug 2016 22:34:14 -0700	[thread overview]
Message-ID: <1471757654.2354.97.camel@HansenPartnership.com> (raw)
In-Reply-To: <CALCETrXvLNeds+ugZ8j3eD1Zg1RZYJSAET3Kguz5G2vqSLFCwQ@mail.gmail.com>

On Wed, 2016-08-17 at 13:18 -0700, Andy Lutomirski wrote:
> On Aug 5, 2016 7:07 PM, "Tejun Heo" <tj@kernel.org> wrote:
[...]
> > 2. Disagreements and Arguments
> > 
> > There have been several lengthy discussion threads [3][4] on LKML
> > around the structural constraints of cgroup v2.  The two that 
> > affect the CPU controller are process granularity and no internal 
> > process constraint.  Both arise primarily from the need for common 
> > resource domain definition across different resources.
> > 
> > The common resource domain is a powerful concept in cgroup v2 that
> > allows controllers to make basic assumptions about the structural
> > organization of processes and controllers inside the cgroup 
> > hierarchy, and thus solve problems spanning multiple types of 
> > resources.  The prime example for this is page cache writeback: 
> > dirty page cache is regulated through throttling buffered writers 
> > based on memory availability, and initiating batched write outs to 
> > the disk based on IO capacity.  Tracking and controlling writeback 
> > inside a cgroup thus requires the direct cooperation of the memory 
> > and the IO controller.
> > 
> > This easily extends to other areas, such as CPU cycles consumed 
> > while performing memory reclaim or IO encryption.
> > 
> > 
> > 2-1. Contentious Restrictions
> > 
> > For controllers of different resources to work together, they must
> > agree on a common organization.  This uniform model across 
> > controllers imposes two contentious restrictions on the CPU 
> > controller: process granularity and the no-internal-process
> > constraint.
> > 
> > 
> >   2-1-1. Process Granularity
> > 
> >   For memory, because an address space is shared between all
> > threads
> >   of a process, the terminal consumer is a process, not a thread.
> >   Separating the threads of a single process into different memory
> >   control domains doesn't make semantical sense.  cgroup v2 ensures
> >   that all controller can agree on the same organization by
> > requiring
> >   that threads of the same process belong to the same cgroup.
> 
> I haven't followed all of the history here, but it seems to me that
> this argument is less accurate than it appears.  Linux, for better or
> for worse, has somewhat orthogonal concepts of thread groups
> (processes), mms, and file tables.  An mm has VMAs in it, and VMAs 
> can reference things (files, etc) that hold resources.  (Two mms can
> share resources by mapping the same thing or using fork().)  File 
> tables hold files, and files can use resources.  Both of these are, 
> at best, moderately good approximations of what actually holds 
> resources. Meanwhile, threads (tasks) do syscalls, take page faults, 
> *allocate* resources, etc.
> 
> So I think it's not really true to say that the "terminal consumer" 
> of anything is a process, not a thread.
> 
> While it's certainly easier to think about assigning processes to
> cgroups, and I certainly agree that, in the common case, it's the
> right thing to do, I don't see why requiring it is a good idea.  Can
> we turn this around: what actually goes wrong if cgroup v2 were to
> allow assigning individual threads if a user specifically requests
> it?

A similar point from a different consumer: from the unprivileged
containers point of view, I'm interested in a thread based interface as
well.  The principle utility of unprivileged containers is to allow
applications that wish to to use container properties (effectively to
become self-containerising).  Some that use the producer/consumer model
do use process pools (apache springs to mind instantly) but some use
thread pools.  It is useful to the latter to preserve the concept of a
thread as being the entity inhabiting the cgroup (but only where the
granularity of the cgroup permits threads to participate) so we can
easily modify them to be self containerising without forcing them to
switch back from a thread pool model to a process pool model.

I can see that process based is conceptually easier in v2 because you
begin with a process tree, but it would really be a pity to lose the
thread based controls we have now and permanently lose the ability to
create more as we find uses for them.  I can't really see how improving
"common resource domain" is a good tradeoff for this.

James