Re: [PATCH v6 4/7] cgroup: cgroup v2 freezer

From: Roman Gushchin <guro@fb.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guroan@gmail.com>, Tejun Heo <tj@kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 4/7] cgroup: cgroup v2 freezer
Date: Mon, 11 Feb 2019 21:30:41 +0000	[thread overview]
Message-ID: <20190211213036.GA26063@tower.DHCP.thefacebook.com> (raw)
In-Reply-To: <20190130165200.GA4131@redhat.com>

On Wed, Jan 30, 2019 at 05:52:01PM +0100, Oleg Nesterov wrote:
> Hi Roman,
> 
> On 01/28, Roman Gushchin wrote:
> >
> > Yes, I think you're right: cgroup_exit() should check CGRP_FREEZE bit,
> > not CGRP_FROZEN. Like cgroup_post_fork() does (a one-liner change below).

Hi Oleg!

Sorry for the late reply, I was out of work for some time...
Now I'm fully back.

> 
> but this won't fix all problems? it seems that you missed my other concerns.
> 
> Firstly, this doesn't look consistent. Suppose a cgroup contains a single
> process sleeping in ptrace_stop(). Then it becomes CGRP_FROZEN right after
> "echo 1 > cgroup.freeze".
> 
> OTOH. if this single task sleeps in do_freezer_trap() and gets PTRACE_INTERRUPT,
> it will equally sleep ptrace_stop() but cgroup won't be CGRP_FROZEN. Never.
> 
> Worse, this looks just wrong. In the latter case, cgroup becomes CGRP_FROZEN
> right after a 2nd task migrates to this cgroup, before this new task calls
> do_freezer_trap() or cgroup_enter_stopped().

You're right. So, it looks like the problem is in the equation
nr_tasks_frozen + nr_tasks_stopped == nr_tasks_to_freeze ,
because a task can be frozen and stopped simultaneously.

So, basically it has to be
nr_tasks_frozen + nr_tasks_stopped >= nr_tasks_to_freeze instead.

I'll cover it with an unit test, and will post v7 soon.

> 
> 
> 
> > About spurious transitions (like frozen->non frozen->frozen on a task
> > being SIGKILLed):
> > in early versions of the patchset I've tried to avoid them, but then
> > following the Tejun's advice
> > switched over to expose them to a user. The logic behind is simple: if
> > the state of the cgroup has been changed (a task is gone, for
> > example), let's notify a user.
> 
> OK, I won't argue...
> 
> actually I can't argue because I do not really understand why do we want
> a "killable" freezer, let alone ptraceable ;)

So the problem with the frozen state as in cgroup v1 that it's a very
special and "non-natural" task state, which requires special handling
in many places.

Just for an example, we're using oomd (userspace OOM handling daemon),
which selects and kills one of the workloads in case of too high memory
pressure. It should be able to kill all tasks in the selected cgroup,
but we definitely don't want to let it to fiddle with cgroup controls
to unfreeze the cgroup. Etc.

In other words, the cgroup freezer has a simple task of preventing
processes in the given cgroup to consume CPU resources. Unlike the system-wide
freezer, it shouldn't try to make a "snapshot", it just a non-goal.

Thanks!