From: Li Zefan <lizefan@huawei.com>
To: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>,
<containers@lists.linux-foundation.org>,
<cgroups@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@suse.cz>,
Peter Zijlstra <peterz@infradead.org>,
Paul Turner <pjt@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Thomas Graf <tgraf@suug.ch>,
"Serge E. Hallyn" <serue@us.ibm.com>,
Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
Neil Horman <nhorman@tuxdriver.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
"Daniel P. Berrange" <berrange@redhat.com>,
Lennart Poettering <lennart@poettering.net>,
Kay Sievers <kay.sievers@vrfy.org>
Subject: Re: [RFC] cgroup TODOs
Date: Fri, 14 Sep 2012 17:12:31 +0800
Message-ID: <5052F4FF.6070508@huawei.com>
In-Reply-To: <5052E7DF.7040000@parallels.com>
>>
>> 2. memcg's __DEPRECATED_clear_css_refs
>>
>> This is a remnant of another weird design decision of requiring
>> synchronous draining of refcnts on cgroup removal and allowing
>> subsystems to veto cgroup removal - what's the userspace supposed to
>> do afterwards? Note that this also hinders co-mounting different
>> controllers.
>>
>> The behavior could be useful for development and debugging but it
>> unnecessarily interlocks userland visible behavior with in-kernel
>> implementation details. To me, it seems outright wrong (either
>> implement proper severing semantics in the controller or do full
>> refcnting) and disallows, for example, lazy drain of caching refs.
>> Also, it complicates the removal path with try / commit / revert
>> logic which has never been fully correct since the beginning.
>>
>> Currently, the only remaining user is memcg.
>>
>> Solution:
>>
>> * Update memcg->pre_destroy() such that it never fails.
>>
>> * Drop __DEPRECATED_clear_css_refs and all related logic.
>> Convert pre_destroy() to return void.
>>
>> Who:
>>
>> KAMEZAWA, Michal, PLEASE. I will make __DEPRECATED_clear_css_refs
>> trigger WARN sooner or later. Let's please get this settled.
>>
>> 3. cgroup_mutex usage outside cgroup core
>>
>> This is another thing which is simply broken. Given the way cgroup
>> is structured and used, nesting cgroup_mutex inside any other
>> commonly used lock simply doesn't work - it's held while invoking
>> controller callbacks which then interact and synchronize with
>> various core subsystems.
>>
>> There are currently three external cgroup_mutex users - cpuset,
>> memcontrol and cgroup_freezer.
>>
>> Solution:
>>
>> Well, we should just stop doing it - use a separate nested lock
>> (which seems possible for cgroup_freezer) or track and manage task
>> in/egress some other way.
>>
>> Who:
>>
>> I'll do the cgroup_freezer. I'm hoping PeterZ or someone who's
>> familiar with the code base takes care of cpuset. Michal, can you
>> please take care of memcg?
>>
>
> I think this is a pressing problem, yes, but not the only problem with
> cgroup lock. Even if we restrict its usage to cgroup core, we still can
> call cgroup functions, which will lock. And then we gain nothing.
>
Agreed. The biggest issue in cpuset is hotplug: if it leaves a cpuset's
cpulist empty, the tasks in that cpuset are moved to an ancestor cgroup,
which requires holding the cgroup lock. We have to either change cpuset's
behavior or eliminate the global lock.
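
To make the dependency concrete, here is a minimal userspace sketch of the
pattern, not the actual cpuset code. All names (toy_cpuset,
global_cgroup_lock, toy_cpu_offline) are hypothetical, a pthread mutex
stands in for the cgroup lock, and the root is assumed never to run out of
CPUs. The point is only that the hotplug path has to take the global lock
because it moves tasks between groups:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the global cgroup lock. */
static pthread_mutex_t global_cgroup_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy cpuset: a CPU bitmask, a task count, and a parent link. */
struct toy_cpuset {
    unsigned long cpus;         /* bit n set => CPU n is allowed */
    int ntasks;                 /* number of tasks attached here */
    struct toy_cpuset *parent;  /* NULL for the root */
};

/* Walk up to the nearest ancestor that still has CPUs (NULL if none). */
static struct toy_cpuset *nearest_usable_ancestor(struct toy_cpuset *cs)
{
    struct toy_cpuset *p = cs->parent;

    while (p != NULL && p->cpus == 0)
        p = p->parent;
    return p;
}

/*
 * Hotplug path: drop one CPU from the set. If the set becomes empty,
 * its tasks are migrated to an ancestor, and that migration is what
 * forces this path to hold the global lock.
 */
static void toy_cpu_offline(struct toy_cpuset *cs, unsigned int cpu)
{
    assert(cpu < 8 * sizeof(cs->cpus));

    pthread_mutex_lock(&global_cgroup_lock);
    cs->cpus &= ~(1UL << cpu);
    if (cs->cpus == 0 && cs->ntasks > 0) {
        struct toy_cpuset *dst = nearest_usable_ancestor(cs);

        if (dst != NULL) {
            dst->ntasks += cs->ntasks;
            cs->ntasks = 0;
        }
    }
    pthread_mutex_unlock(&global_cgroup_lock);
}
```

Anything that already holds another commonly used lock when it reaches a
path like toy_cpu_offline() ends up nesting the global lock inside it,
which is the problem described above.
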
> And the problem is that people need to lock. cgroup_lock is needed
> because the data you are accessing is protected by it. The way I see it,
> it is incredible how we were able to revive the BKL in the form of
> cgroup_lock after we finally managed to get rid of it!
>
> We should just start to do a more fine grained locking of data, instead
> of "stop the world, cgroup just started!". If we do that, the problem
> you are trying to address here will even cease to exist.
>
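
As a rough illustration of what finer-grained locking could look like, and
not a proposal for the real data structures, the sketch below gives each
group its own lock instead of one global one. The names (toy_group,
toy_group_charge, toy_group_transfer) are hypothetical. Operations on a
single group only take that group's lock; an operation spanning two groups
takes both in a fixed (address) order so that two concurrent transfers
cannot deadlock against each other:

```c
#include <pthread.h>
#include <stdint.h>

/* Toy group that carries its own lock instead of relying on a global one. */
struct toy_group {
    pthread_mutex_t lock;
    long usage;
};

static void toy_group_init(struct toy_group *g)
{
    pthread_mutex_init(&g->lock, NULL);
    g->usage = 0;
}

/* Single-group update: only this group's lock is taken. */
static void toy_group_charge(struct toy_group *g, long delta)
{
    pthread_mutex_lock(&g->lock);
    g->usage += delta;
    pthread_mutex_unlock(&g->lock);
}

/*
 * Two-group update: lock both groups in address order so that
 * concurrent transfers in opposite directions cannot deadlock.
 */
static void toy_group_transfer(struct toy_group *from, struct toy_group *to,
                               long amount)
{
    struct toy_group *first, *second;

    if (from == to)
        return;

    if ((uintptr_t)from < (uintptr_t)to) {
        first = from;
        second = to;
    } else {
        first = to;
        second = from;
    }

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->usage -= amount;
    to->usage += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

Whether a scheme like this fits cgroup depends on how much state really is
per-group; the sketch only shows the shape of the idea.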