linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Tejun Heo <tj@kernel.org>
Cc: paul@paulmenage.org, rjw@sisk.pl, lizf@cn.fujitsu.com,
	linux-pm@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org,
	containers@lists.linux-foundation.org, matthltc@us.ibm.com,
	akpm@linux-foundation.org, oleg@redhat.com,
	kamezawa.hiroyu@Jp.fujitsu.com
Subject: Re: [PATCH 03/10] threadgroup: extend threadgroup_lock() to cover exit and exec
Date: Wed, 23 Nov 2011 15:02:05 +0100	[thread overview]
Message-ID: <20111123140139.GB10669@somewhere.redhat.com> (raw)
In-Reply-To: <20111121215839.GL25776@google.com>

On Mon, Nov 21, 2011 at 01:58:39PM -0800, Tejun Heo wrote:
> > but IMHO the races involved in exit and exec
> > are too different, specific and complicated on their own to be solved in a
> > single one patch. This should be split in two things.
> > 
> > The specific problems and their fix need to be described more in detail
> > in the changelog because the issues are very tricky.
> > 
> > The exec case:
> > 
> > IIUC, the race in exec is about the group leader that can be changed
> > to become the exec'ing thread, making while_each_thread() unsafe.
> 
> Not only that, pid changes, sighand struct may get swapped, and other
> weird things which aren't usually expected for a live task happen.
> It's basically semi-killed and then resurrected.

Right. I've no problem with the fact you want to make the threadgroup
more stable against subsystem attachment. Just please put more
detailed comments about what you are protecting. It took me quite
some time to precisely identify the race involved in that game in
order to consciously review your patches. And I'm thinking about
the next people who will work on this piece of code.

This:

 static inline void threadgroup_lock(struct task_struct *tsk)                                                                
 {                                                                                                                           
+       /* exec uses exit for de-threading, grab cred_guard_mutex first */                                                   
+       mutex_lock(&tsk->signal->cred_guard_mutex);                                                                          
        down_write(&tsk->signal->group_rwsem);

is really not obvious.

Just point out we want to synchronize against the leader, pid and the sighand
that may change concurrently.

> > We also have other things happening there like all the other threads
> > in the group that get killed, but that should be handled by the threadgroup_change_begin()
> > you put in the exit path.
> > The old leader is also killed but release_task() -> __unhash_process() is called
> > for it manually from the exec path. However this thread too should be covered by your
> > synchronisation in exit().
> > 
> > So after your protection in the exit path, the only thing to protect against in exec
> > is that group_leader that can change concurrently. But I may be missing something in the picture.
> 
> Hmm... an exec'ing task goes through transitions which usually aren't
> allowed to happen.  It assumes exclusive access to group-shared data
> structures and may ditch the current one and create a new one.
> 
> Sure, it's possible to make every cgroup method correct w.r.t. all
> those via explicit locking or by being more careful but I don't see
> much point in such excercise.  If the code is sitting in some hot path
> and the added exclusion is gonna add significant amount of contention,
> sure, we should trade off easiness for better performance /
> scalability but this is for cgroup modifications, a fundamentally cold
> path, and the added locking is per-task or per-threadgroup.  I don't
> see any reason not to be easy here.

Sure. I have no problem with that. I just wish we can precisely identify
what we are protecting against, and not have a foggy locking. Especially
when it's about such a very core issue.

> Please read on.
> 
> > Also note this is currently protected by the tasklist readlock. Cred
> > guard mutex is certainly better, I just don't remember if you remove
> > the tasklist lock in a further patch.
> 
> I don't remove tasklist_lock.

I believe you can do it after using the cred guard mutex. That needs to
be double checked though. I think it was mostly there to keep while_each_thread()
stable non-racy against leader change. cred_guard_mutex should handle that
now.

> 
> > The exit case:
> > 
> > There the issue is about racing against cgroup_exit() where the task can be
> > assigned to the root cgroup. Is this really a problem in fact? I don't know
> > if subsystems care about that. Perhaps some of them are buggy in that
> > regard. At least the task counter handles that and it needs a
> > ->cancel_attach_task() for this purpose in the case the migration fails due to exit.
> > Not sure about others. A real synchronization against exit is less error prone for sure.
> > In the end that's probably a win.
> 
> This is the same story.  Yes, we can narrow the locking and try to
> make sure everyone handles partially destroyed tasks properly in all
> methods, but for what?  If we can give stable threadgroups to all
> cgroup methods without introducing performance or scalability
> bottleneck, that's the right thing to do.  Please also note that bugs
> stemming from failure to handle those corner cases properly will be
> subtle, difficult to reproduce and track down.

Agreed.

> In general, for any subsystem with pluggable interface, I think it's
> best to put as much complexity as possible into the core layer to make
> things eaiser for its customers.  It becomes excruciatingly painful if
> the API invites subtle corner case bugs and the subsystem grows
> several dozen clients down the road.

Sure.

> 
> So, to me, what seems more important is how to make it easier for each
> cgroup client instead of what's the minimal that's necessary right
> now.
> 
> Heh, did I make any sense? :)

Yep, fine for me :)
Just wanted to ensure I (and others) understood and identified well the issues
and the fixes.

  reply	other threads:[~2011-11-23 14:02 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-01 23:46 [PATCHSET] cgroup: stable threadgroup during attach & subsys methods consolidation Tejun Heo
2011-11-01 23:46 ` [PATCH 01/10] cgroup: add cgroup_root_mutex Tejun Heo
2011-11-04  8:38   ` KAMEZAWA Hiroyuki
2011-11-01 23:46 ` [PATCH 02/10] threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem Tejun Heo
2011-11-04  8:40   ` KAMEZAWA Hiroyuki
2011-11-04 15:16     ` Tejun Heo
2011-11-01 23:46 ` [PATCH 03/10] threadgroup: extend threadgroup_lock() to cover exit and exec Tejun Heo
2011-11-04  8:45   ` KAMEZAWA Hiroyuki
2011-11-13 16:44   ` Frederic Weisbecker
2011-11-14 13:54     ` Frederic Weisbecker
2011-11-21 22:03       ` Tejun Heo
2011-11-23 14:34         ` Frederic Weisbecker
2011-11-21 21:58     ` Tejun Heo
2011-11-23 14:02       ` Frederic Weisbecker [this message]
2011-11-24 21:22         ` Tejun Heo
2011-11-13 18:20   ` Frederic Weisbecker
2011-11-24 22:50   ` [PATCH UPDATED " Tejun Heo
2011-11-25  4:02     ` Linus Torvalds
2011-11-27 19:21       ` Tejun Heo
2011-11-27 21:25         ` Tejun Heo
2011-12-01 19:29           ` Tejun Heo
2011-11-25 14:01     ` Frederic Weisbecker
2011-11-27 19:30       ` Tejun Heo
2011-12-02 16:28         ` Frederic Weisbecker
2011-12-05 18:43           ` Tejun Heo
2011-12-07 15:30             ` Frederic Weisbecker
2011-12-07 18:22               ` Tejun Heo
2011-12-08 20:50                 ` [PATCH UPDATED AGAIN " Tejun Heo
2011-12-09 23:42                   ` Frederic Weisbecker
2011-12-13  1:33                     ` Tejun Heo
2011-12-13  2:17                       ` Tejun Heo
2011-11-01 23:46 ` [PATCH 04/10] cgroup: always lock threadgroup during migration Tejun Heo
2011-11-04  8:54   ` KAMEZAWA Hiroyuki
2011-11-04 15:21     ` Tejun Heo
2011-11-14 18:46   ` Frederic Weisbecker
2011-11-14 18:52     ` Frederic Weisbecker
2011-11-21 22:05       ` Tejun Heo
2011-11-01 23:46 ` [PATCH 05/10] cgroup: subsys->attach_task() should be called after migration Tejun Heo
2011-11-14 20:06   ` Frederic Weisbecker
2011-11-21 22:04     ` Tejun Heo
2011-11-01 23:46 ` [PATCH 06/10] cgroup: improve old cgroup handling in cgroup_attach_proc() Tejun Heo
2011-11-14 20:37   ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 07/10] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach() Tejun Heo
2011-11-14 21:16   ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 08/10] cgroup: don't use subsys->can_attach_task() or ->attach_task() Tejun Heo
2011-11-04  9:08   ` KAMEZAWA Hiroyuki
2011-11-14 23:54   ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 09/10] cgroup, cpuset: don't use ss->pre_attach() Tejun Heo
2011-11-15  0:51   ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 10/10] cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task() Tejun Heo
2011-11-04  9:10   ` KAMEZAWA Hiroyuki
2011-11-15  0:54   ` Frederic Weisbecker
2011-11-21 22:07 ` [PATCHSET] cgroup: stable threadgroup during attach & subsys methods consolidation Tejun Heo
2011-11-22  2:27   ` Li Zefan
2011-11-22 16:20     ` Tejun Heo
2011-11-24 22:51 ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111123140139.GB10669@somewhere.redhat.com \
    --to=fweisbec@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=kamezawa.hiroyu@Jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@lists.linux-foundation.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=matthltc@us.ibm.com \
    --cc=oleg@redhat.com \
    --cc=paul@paulmenage.org \
    --cc=rjw@sisk.pl \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).