Re: [RFC 0/2] opportunistic memory reclaim of a killed process

From: Joel Fernandes <joel@joelfernandes.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	yuzhoujian@didichuxing.com, jrdr.linux@gmail.com, guro@fb.com,
	Johannes Weiner <hannes@cmpxchg.org>,
	penguin-kernel@i-love.sakura.ne.jp, ebiederm@xmission.com,
	shakeelb@google.com, Christian Brauner <christian@brauner.io>,
	Minchan Kim <minchan@kernel.org>,
	Tim Murray <timmurray@google.com>,
	Daniel Colascione <dancol@google.com>,
	Jann Horn <jannh@google.com>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	lsf-pc@lists.linux-foundation.org,
	LKML <linux-kernel@vger.kernel.org>,
	"Cc: Android Kernel" <kernel-team@android.com>
Subject: Re: [RFC 0/2] opportunistic memory reclaim of a killed process
Date: Thu, 11 Apr 2019 17:11:00 -0400	[thread overview]
Message-ID: <20190411211100.GB130334@google.com> (raw)
In-Reply-To: <20190411201151.GA4743@dhcp22.suse.cz>

On Thu, Apr 11, 2019 at 10:11:51PM +0200, Michal Hocko wrote:
> On Thu 11-04-19 15:14:30, Joel Fernandes wrote:
> > On Thu, Apr 11, 2019 at 08:12:43PM +0200, Michal Hocko wrote:
> > > On Thu 11-04-19 12:18:33, Joel Fernandes wrote:
> > > > On Thu, Apr 11, 2019 at 6:51 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > > >
> > > > > On Wed 10-04-19 18:43:51, Suren Baghdasaryan wrote:
> > > > > [...]
> > > > > > Proposed solution uses existing oom-reaper thread to increase memory
> > > > > > reclaim rate of a killed process and to make this rate more deterministic.
> > > > > > By no means the proposed solution is considered the best and was chosen
> > > > > > because it was simple to implement and allowed for test data collection.
> > > > > > The downside of this solution is that it requires additional “expedite”
> > > > > > hint for something which has to be fast in all cases. Would be great to
> > > > > > find a way that does not require additional hints.
> > > > >
> > > > > I have to say I do not like this much. It is abusing an implementation
> > > > > detail of the OOM implementation and makes it an official API. Also
> > > > > there are some non trivial assumptions to be fullfilled to use the
> > > > > current oom_reaper. First of all all the process groups that share the
> > > > > address space have to be killed. How do you want to guarantee/implement
> > > > > that with a simply kill to a thread/process group?
> > > > 
> > > > Will task_will_free_mem() not bail out in such cases because of
> > > > process_shares_mm() returning true?
> > > 
> > > I am not really sure I understand your question. task_will_free_mem is
> > > just a shortcut to not kill anything if the current process or a victim
> > > is already dying and likely to free memory without killing or spamming
> > > the log. My concern is that this patch allows to invoke the reaper
> > 
> > Got it.
> > 
> > > without guaranteeing the same. So it can only be an optimistic attempt
> > > and then I am wondering how reasonable of an interface this really is.
> > > Userspace send the signal and has no way to find out whether the async
> > > reaping has been scheduled or not.
> > 
> > Could you clarify more what you're asking to guarantee? I cannot picture it.
> > If you mean guaranteeing that "a task is dying anyway and will free its
> > memory on its own", we are calling task_will_free_mem() to check that before
> > invoking the oom reaper.
> 
> No, I am talking about the API aspect. Say you kall kill with the flag
> to make the async address space tear down. Now you cannot really
> guarantee that this is safe to do because the target task might
> clone(CLONE_VM) at any time. So this will be known only once the signal
> is sent, but the calling process has no way to find out. So the caller
> has no way to know what is the actual result of the requested operation.
> That is a poor API in my book.
> 
> > Could you clarify what is the draback if OOM reaper is invoked in parallel to
> > an exiting task which will free its memory soon? It looks like the OOM reaper
> > is taking all the locks necessary (mmap_sem) in particular and is unmapping
> > pages. It seemed to me to be safe, but I am missing what are the main draw
> > backs of this - other than the intereference with core dump. One could be
> > presumably scalability since the since OOM reaper could be bottlenecked by
> > freeing memory on behalf of potentially several dying tasks.
> 
> oom_reaper or any other kernel thread doing the same is a mere
> implementation detail I think. The oom killer doesn't really need the
> oom_reaper to act swiftly because it is there to act as a last resort if
> the oom victim cannot terminate on its own. If you want to offer an
> user space API then you can assume users will like to use it and expect
> a certain behavior but what that is? E.g. what if there are thousands of
> tasks killed this way? Do we care that some of them will not get the
> async treatment? If yes why do we need an API to control that at all?
> 
> Am I more clear now?

Yes, your concerns are more clear now. We will think more about this and your
other responses, thanks a lot.

 - Joel