Date: Sat, 27 Aug 2011 15:40:40 +0200
From: Frederic Weisbecker
To: Paul Menage
Cc: Kay Sievers, Li Zefan, Tim Hockin, Andrew Morton, LKML, Johannes Weiner, Aditya Kali, Oleg Nesterov
Subject: Re: [RFD] Task counter: cgroup core feature or cgroup subsystem? (was Re: [PATCH 0/8 v3] cgroups: Task counter subsystem)
Message-ID: <20110827134038.GH3298@somewhere>
References: <1311956010-32076-1-git-send-email-fweisbec@gmail.com> <20110801161900.1fe24b76.akpm@linux-foundation.org> <20110818143319.GC10441@somewhere> <20110824175431.GA26417@somewhere.redhat.com>
In-Reply-To:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 26, 2011 at 08:16:32AM -0700, Paul Menage wrote:
> On Wed, Aug 24, 2011 at 10:54 AM, Frederic Weisbecker wrote:
> >
> > It seems your patch doesn't handle the ->fork() and ->exit() calls.
> > We probably need a quick access to states of multi-subsystems from
> > the task, some lists available from task->cgroups, I don't know yet.
> >
>
> That state is available, but currently only while holding cgroup_mutex
> - at least, that's what task_cgroup_from_root() requires.
>
> It might be the case that we could achieve the same effect by just
> locking the task, so the pre-condition for task_cgroup_from_root()
> would be either that cgroup_mutex is held or the task lock is held.
>
> We could extend the signature of cgroup_subsys.fork to include a
> reference to the cgroup; for the singly-bindable subsystems this would
> be trivially available via task->cgroups; for the multi-bindable
> subsystems then for each hierarchy that the subsystem is mounted on
> we'd call task_cgroup_from_root() to get the cgroup for that
> hierarchy. So multi-bindable subsystems with fork/exit callbacks would
> get called once for each mounted instance of the subsystem.
>
> This would still make the task counter subsystem a bit painful - it
> would read_lock a global rwlock (css_set_lock) on every fork/exit in
> order to find the cgroup to charge/uncharge. I'm not sure how painful
> that would be on a big system. If that were a noticeable performance
> problem, we could have a variable-length extension on the end of
> css_set that contains a list of hierarchy_index/cgroup pairs for any
> hierarchies that had multi-bindable subsystems (or maybe for all
> hierarchies, for simplicity). This would make creating a css_set a
> little bit more complicated, but overall shouldn't be too painful, and
> would make the problem of finding a cgroup for a given hierarchy
> trivial.

Oh, you're right. My first idea was to reference the multi-bindable
subsystem states from cgroup_subsys_state, like it's done currently for
singleton subsystems. But that indeed requires cgroup_mutex or
task_lock, and only the latter looks sensible in the fork/exit path.
And if that becomes a scalability problem, we can still have a
dedicated lock for cgroup attach/detach on tasks. Whatever we do, we
need that lock.

So we can either pick your solution, which references the cgroups
belonging to multi-bindable subsystems for a given task in css_set, or
we can make tsk->cgroups->subsys[] a variable-size array that
references 1 * singleton subsystems and N * multi-bindable subsystems,
N being the number of hierarchies that use a given subsystem.

What do you think?