Re: [PATCH v3 2/4] mm/oom: handle remote ooms

From: Michal Hocko <mhocko@suse.com>
To: Mina Almasry <almasrymina@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>, Greg Thelen <gthelen@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Muchun Song <songmuchun@bytedance.com>,
	riel@surriel.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH v3 2/4] mm/oom: handle remote ooms
Date: Tue, 16 Nov 2021 10:39:08 +0100
Message-ID: <YZN8PCK9kmmYUXSp@dhcp22.suse.cz>
In-Reply-To: <YZN5tkhHomj6HSb2@dhcp22.suse.cz>

On Tue 16-11-21 10:28:25, Michal Hocko wrote:
> On Mon 15-11-21 16:58:19, Mina Almasry wrote:
[...]
> > To be honest I think this is very workable, as is Shakeel's suggestion
> > of MEMCG_OOM_NO_VICTIM. Since this is an opt-in feature, we can
> > document the behavior and if the userspace doesn't want to get killed
> > they can catch the sigbus and handle it gracefully. If not, the
> > userspace just gets killed if we hit this edge case.
> 
> I am not sure about the MEMCG_OOM_NO_VICTIM approach. It sounds really
> hackish to me. I will get back to Shakeel's email as time permits. The
> primary problem I have with this, though, is that the kernel oom killer
> cannot really do anything sensible if the limit is reached and there
> is nothing reclaimable left in this case. The tmpfs backed memory will
> simply stay around and there are no means to recover without userspace
> intervention.

And just a small clarification. Tmpfs is fundamentally problematic from
the OOM handling POV. The nuance here is that the OOM happens in a
different memcg and thus a different resource domain. If you kill a task
in the target memcg then you effectively DoS that workload. If you kill
the allocating task then it is DoSed by anybody allowed to write to that
shmem. All that without a graceful fallback.

I still have a very hard time seeing how that can work reasonably except
for a very special case with a lot of other measures to ensure the
target memcg never hits the hard limit so the OOM simply is not a
problem.

The memory controller has always been used to enforce and balance memory
usage among resource domains and this goes against that principle.
I would be really curious what Johannes thinks about this.
-- 
Michal Hocko
SUSE Labs