From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751696AbcDOPIU (ORCPT <rfc822;w@1wt.eu>);
	Fri, 15 Apr 2016 11:08:20 -0400
Received: from mail-wm0-f68.google.com ([74.125.82.68]:35520 "EHLO
	mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750716AbcDOPIT (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Fri, 15 Apr 2016 11:08:19 -0400
Date: Fri, 15 Apr 2016 17:08:15 +0200
From: Michal Hocko <mhocko@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Petr Mladek <pmladek@suse.com>,
        cgroups@vger.kernel.org, Cyril Hrubis <chrubis@suse.cz>,
        linux-kernel@vger.kernel.org
Subject: Re: [BUG] cgroup/workques/fork: deadlock when moving cgroups
Message-ID: <20160415150815.GM32377@dhcp22.suse.cz>
References: <20160413094216.GC5774@pathway.suse.cz>
 <20160413183309.GG3676@htj.duckdns.org>
 <20160413192313.GA30260@dhcp22.suse.cz>
 <20160414175055.GA6794@cmpxchg.org>
 <20160415070601.GA32377@dhcp22.suse.cz>
 <20160415143815.GH12583@htj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160415143815.GH12583@htj.duckdns.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 15-04-16 10:38:15, Tejun Heo wrote:
> Hello, Michal.
> 
> On Fri, Apr 15, 2016 at 09:06:01AM +0200, Michal Hocko wrote:
> > Tejun was proposing to do the migration async (move the whole
> > mem_cgroup_move_charge into the work item). This would solve the problem
> > of course. I haven't checked whether this would be safe but it at least
> > sounds doable (albeit far from trivial). It would also be a user visible
> > change because the new memcg will not contain the moved charges after we
> > return to user space. I think this would be acceptable but if somebody
> 
> Not necessarily.  The only thing necessary is flushing the work item
> after releasing locks but before returning to user.
> cpuset_post_attach_flush() does exactly the same thing.

Ah, OK, I didn't know that __cgroup_procs_write does something
controller-specific. In that case we wouldn't need a generic callback,
provided that code like the above is acceptable.

> > really relies on the previous behavior I guess we can solve it with a
> > post_move cgroup callback which would be called from a lockless context.
> > 
> > Anyway, before we go that way, can we at least consider the possibility
> > of removing the kworker creation dependency on the global rwsem? AFAIU
> > this locking was added because of the pid controller. Do we even care
> > about something as volatile as kworkers in the pid controller?
> 
> It's not just pid controller and the global percpu locking has lower

Where else would the locking matter? I have only checked the git history
to build my picture, so I might be missing something, of course.

> hotpath overhead.  We can try to exclude kworkers out of the locking
> but that can get really nasty and there are already attempts to add
> cgroup support to workqueue.  Will think more about it.  For now tho,
> do you think making charge moving async would be difficult?

Well, it is certainly not trivial because it relies on being exclusive
with the global context. I will have to look closer, of course, but I
cannot guarantee I will get to it before I get back from LSF. We can
certainly discuss it at the conference. Johannes will be there as
well.

Thanks!
-- 
Michal Hocko
SUSE Labs