From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752408AbbG2PFy (ORCPT <rfc822;w@1wt.eu>);
	Wed, 29 Jul 2015 11:05:54 -0400
Received: from mail-wi0-f176.google.com ([209.85.212.176]:33822 "EHLO
	mail-wi0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751458AbbG2PFw (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 29 Jul 2015 11:05:52 -0400
Date: Wed, 29 Jul 2015 17:05:49 +0200
From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>, Andrew Morton <akpm@linux-foundation.org>,
        Oleg Nesterov <oleg@redhat.com>,
        Vladimir Davydov <vdavydov@parallels.com>,
        Greg Thelen <gthelen@google.com>,
        KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
        KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>, linux-mm@kvack.org,
        LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 7/8] memcg: get rid of mm_struct::owner
Message-ID: <20150729150549.GL15801@dhcp22.suse.cz>
References: <1436358472-29137-1-git-send-email-mhocko@kernel.org>
 <1436358472-29137-8-git-send-email-mhocko@kernel.org>
 <20150710140533.GB29540@dhcp22.suse.cz>
 <20150714151823.GG17660@dhcp22.suse.cz>
 <20150729131454.GB10001@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150729131454.GB10001@cmpxchg.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 29-07-15 09:14:54, Johannes Weiner wrote:
> On Tue, Jul 14, 2015 at 05:18:23PM +0200, Michal Hocko wrote:
[...]
> > 3) fail mem_cgroup_can_attach if we are trying to migrate a task sharing
> > mm_struct with a process outside of the tset. If I understand the
> > tset properly this would require all the sharing tasks to be migrated
> > together and we would never end up with task_css != &task->mm->css.
> > __cgroup_procs_write doesn't seem to support multi pid move currently
> > AFAICS, though. cgroup_migrate_add_src, however, seems to be intended
> > for this purpose so this should be doable. Without that support we would
> > basically disallow migrating these tasks - I wouldn't object if you ask
> > me.
> 
> I'd prefer not adding controller-specific failure modes for attaching,

Does this mean that there is a plan to drop the return value from
can_attach? I can see that both the cpuset and cpu controllers can
currently fail an attach. Are those going to change? I remember some
discussions but no clear outcome from them.

> and this too would lead to very non-obvious behavior.

Yeah, the user will not learn the source of the error with the current
API, but that is an inherent limitation of the interface right now.
Maybe we can add a knob that exposes the error source?

If there is a clear consensus that can_attach failures are a no-go,
then what about "silently" moving the associated tasks? This would be
similar to how a thread group is moved, except that the group would be
a more generic concept.

> > Do you see other options? From the above three options the 3rd one
> > sounds the most sane to me and the 1st quite easy to implement. Both will
> > require some cgroup core work though. But maybe we would be good enough
> > with 3rd option without supporting moving schizophrenic tasks and that
> > would be reduced to memcg code.
> 
> A modified form of 1) would be to track the mms referring to a memcg
> but during offline search the process tree for a matching task.

But we might have many of those, all of them living in different
cgroups. So which one do we take? The first one encountered, or the one
with the majority? I am not sure this is much better.
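
To make the ambiguity concrete, here is a minimal user-space sketch.
It is not kernel code: struct task_model, pick_first() and
pick_majority() are hypothetical names made up for illustration, and a
task is reduced to the ids of its mm and its cgroup. It only shows that
the two policies can give different answers for the same mm.

```c
#include <stddef.h>

/* Hypothetical model: a task is just the id of its mm and of its cgroup. */
struct task_model {
	int mm_id;
	int cgroup_id;
};

/* Policy A: cgroup of the first task found using mm_id, or -1 if none. */
static int pick_first(const struct task_model *t, size_t n, int mm_id)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].mm_id == mm_id)
			return t[i].cgroup_id;
	return -1;
}

/*
 * Policy B: cgroup holding the most tasks that use mm_id, or -1 if none.
 * On a tie the cgroup seen first wins.
 */
static int pick_majority(const struct task_model *t, size_t n, int mm_id)
{
	int best = -1;
	size_t best_count = 0;

	for (size_t i = 0; i < n; i++) {
		size_t count = 0;

		if (t[i].mm_id != mm_id)
			continue;
		/* Count every task sharing both this mm and this cgroup. */
		for (size_t j = 0; j < n; j++)
			if (t[j].mm_id == mm_id &&
			    t[j].cgroup_id == t[i].cgroup_id)
				count++;
		if (count > best_count) {
			best_count = count;
			best = t[i].cgroup_id;
		}
	}
	return best;
}
```

With three tasks sharing one mm, one in cgroup 10 and two in cgroup 20,
policy A picks 10 when that task happens to be walked first, while
policy B picks 20. Neither is obviously the right answer.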

I would really prefer to get rid of the schizophrenia altogether, if
that is possible.

> This is heavy-handed, but it's a rare case and this work would be done
> in the cgroup removal path rather than during task exit.

Yes, it would be a bit lighter.

> This is stolen
> from the current mm_update_next_owner():
> 
> list_for_each_entry(mm, memcg->mms, memcg_list) {
>     for_each_process(g) {
>         if (g->flags & PF_KTHREAD)
>             continue;
>         for_each_thread(g, c) {
>             if (c->mm == mm)
>                 goto assign;
>             if (c->mm)
>                 break;
>         }
>     }
> assign:
>     memcg = mem_cgroup_from_task(c);
>     mm->memcg = memcg;
>     list_move(&mm->memcg_list, &memcg->mms);
> }
> 
> (plus appropriate synchronization, of course)
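
One thing the quoted loop leaves open is what happens when no task
uses the mm anymore: it falls through to the assign label with c not
pointing at a match. Below is a user-space sketch of the same search
with that case made explicit. It is only a model under my own
assumptions: the *_model structs, find_mm_user() and reassign_mm() are
hypothetical, ids stand in for the real memcg and cgroup objects, and
all locking is left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct, task_struct and a thread group. */
struct mm_model {
	int id;
	int memcg_id;
};

struct thread_model {
	struct mm_model *mm;	/* NULL once the thread has dropped its mm */
	int cgroup_id;
};

struct process_model {
	int is_kthread;
	size_t nr_threads;
	struct thread_model *threads;
};

/* Return the first thread that uses mm, or NULL if there is none. */
static const struct thread_model *
find_mm_user(const struct process_model *procs, size_t n,
	     const struct mm_model *mm)
{
	for (size_t i = 0; i < n; i++) {
		if (procs[i].is_kthread)
			continue;
		for (size_t j = 0; j < procs[i].nr_threads; j++) {
			const struct thread_model *c = &procs[i].threads[j];

			if (c->mm == mm)
				return c;
			/*
			 * Threads of one process share a single mm, so the
			 * first live mm that differs rules out the rest.
			 */
			if (c->mm)
				break;
		}
	}
	return NULL;
}

/* Re-home mm to its user's cgroup; return 0 and leave it alone if unused. */
static int reassign_mm(struct mm_model *mm,
		       const struct process_model *procs, size_t n)
{
	const struct thread_model *c = find_mm_user(procs, n, mm);

	if (!c)
		return 0;
	mm->memcg_id = c->cgroup_id;
	return 1;
}
```

Returning "not found" instead of assigning unconditionally keeps an mm
without users where it is. What the real code should do with such an mm
is a separate question.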

-- 
Michal Hocko
SUSE Labs