From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932289AbdJaP32 (ORCPT ); Tue, 31 Oct 2017 11:29:28 -0400
Received: from mx2.suse.de ([195.135.220.15]:60102 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752216AbdJaP30 (ORCPT ); Tue, 31 Oct 2017 11:29:26 -0400
Date: Tue, 31 Oct 2017 16:29:23 +0100
From: Michal Hocko
To: Shakeel Butt
Cc: Roman Gushchin , Linux MM , Vladimir Davydov , Tetsuo Handa ,
	David Rientjes , Andrew Morton , Tejun Heo , kernel-team@fb.com,
	Cgroups , linux-doc@vger.kernel.org, LKML
Subject: Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer
Message-ID: <20171031152923.ndyxpdmx3npyqoqf@dhcp22.suse.cz>
References: <20171019185218.12663-1-guro@fb.com>
	<20171019185218.12663-4-guro@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: NeoMutt/20170609 (1.8.3)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 31-10-17 08:04:19, Shakeel Butt wrote:
> > +
> > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> > +{
> > +	struct mem_cgroup *iter;
> > +
> > +	oc->chosen_memcg = NULL;
> > +	oc->chosen_points = 0;
> > +
> > +	/*
> > +	 * The oom_score is calculated for leaf memory cgroups (including
> > +	 * the root memcg).
> > +	 */
> > +	rcu_read_lock();
> > +	for_each_mem_cgroup_tree(iter, root) {
> > +		long score;
> > +
> > +		if (memcg_has_children(iter) && iter != root_mem_cgroup)
> > +			continue;
> > +
>
> Cgroup v2 does not support charge migration between memcgs. So, there
> can be intermediate nodes which may contain the major charge of the
> processes in their leave descendents. Skipping such intermediate nodes
> will kind of protect such processes from oom-killer (lower on the list
> to be killed). Is it ok to not handle such scenario? If yes, shouldn't
> we document it?
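
To make the scenario in the question concrete, here is a rough userspace
sketch of a leaf-only walk. It is a toy model, not kernel code: struct
toy_memcg, pick_leaf(), subtree_charge() and demo_tree() are hypothetical
names invented for illustration, the page counts are made up, and the
special case for the root memcg in the quoted loop is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
    const char *name;
    long local_charge;              /* pages charged directly to this node */
    struct toy_memcg *children[4];
    size_t nr_children;
};

/*
 * Walk the subtree and remember the leaf with the largest local charge.
 * Nodes that have children are never scored themselves, which mirrors
 * the "continue" in the quoted loop for non-root nodes.
 */
static void pick_leaf(struct toy_memcg *node, struct toy_memcg **best)
{
    assert(node != NULL);
    if (node->nr_children == 0) {
        if (*best == NULL || node->local_charge > (*best)->local_charge)
            *best = node;
        return;
    }
    for (size_t i = 0; i < node->nr_children; i++)
        pick_leaf(node->children[i], best);
}

/* Total charge of a node plus everything below it. */
static long subtree_charge(const struct toy_memcg *node)
{
    long sum = node->local_charge;
    for (size_t i = 0; i < node->nr_children; i++)
        sum += subtree_charge(node->children[i]);
    return sum;
}

/*
 * Made-up hierarchy:
 *   root
 *   |-- A       (intermediate, 900 pages left behind by a moved task)
 *   |   `-- A/leaf (10 pages)
 *   `-- B       (leaf, 100 pages)
 */
static struct toy_memcg *demo_tree(void)
{
    static struct toy_memcg a_leaf = { "A/leaf", 10, { NULL }, 0 };
    static struct toy_memcg a = { "A", 900, { &a_leaf }, 1 };
    static struct toy_memcg b = { "B", 100, { NULL }, 0 };
    static struct toy_memcg root = { "root", 0, { &a, &b }, 2 };
    return &root;
}
```

In this toy tree the walk picks B (100 pages), although the subtree under A
holds 910 pages in total. The 900 pages charged to the intermediate node A
are never looked at, which is the "protection" effect described above.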

Yes, this is a real problem, and one which is not really solvable without
charge migration. You simply have no clue _who_ owns the memory, so I
assume that admins will need to set up a hierarchy in which the subgroups
that tasks are migrated between belong to a single oom_group. Or we might
want to allow opt-in for charge migration in v2. To be honest, I wasn't
completely happy about removing this functionality altogether in v2, but
there was strong pushback at the time, on the grounds that relying on
charge migration doesn't have any sound use case.

Anyway, I agree that the documentation should be explicit about that.
-- 
Michal Hocko
SUSE Labs