linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <guro@fb.com>, Rik van Riel <riel@surriel.com>,
	Minchan Kim <minchan@kernel.org>,
	Christian Brauner <christian@brauner.io>,
	Christoph Hellwig <hch@infradead.org>,
	Oleg Nesterov <oleg@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Jann Horn <jannh@google.com>, Andy Lutomirski <luto@kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Florian Weimer <fweimer@redhat.com>,
	Jan Engelhardt <jengelh@inai.de>,
	Tim Murray <timmurray@google.com>,
	Linux API <linux-api@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	kernel-team <kernel-team@android.com>
Subject: Re: [PATCH v3 1/2] mm: introduce process_mrelease system call
Date: Mon, 26 Jul 2021 09:27:04 +0200	[thread overview]
Message-ID: <YP5jyLeYsN3JtdX8@dhcp22.suse.cz> (raw)
In-Reply-To: <CALvZod7Vb2MKgCcSYtsMd8F4sFb2K7jQk3AGSECYfKvd3MNqzQ@mail.gmail.com>

On Fri 23-07-21 10:00:26, Shakeel Butt wrote:
> On Fri, Jul 23, 2021 at 9:09 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Fri, Jul 23, 2021 at 6:46 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Fri, Jul 23, 2021 at 1:53 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > [...]
> > > > > However
> > > > > retrying means issuing another syscall, so additional overhead...
> > > > > I guess such "best effort" approach would be unusual for a syscall, so
> > > > > maybe we can keep it as it is now and if such "do not block" mode is needed
> > > > > we can use flags to implement it later?
> > > >
> > > > Yeah, an explicit opt-in via flags would be an option if that turns out
> > > > to be really necessary.
> > > >
> > >
> > > I am fine with keeping it as it is but we do need the non-blocking
> > > option (via flags) to enable userspace to act more aggressively.
> >
> > I think you want to check memory conditions shortly after issuing
> > kill/reap requests irrespective of mmap_sem contention. The reason is
> > that even when memory release is not blocked, allocations from other
> > processes might consume memory faster than we release it. For example,
> > in Android we issue kill and start waiting on pidfd for its death
> > notification. As soon as the process is dead we reassess the situation
> > and possibly kill again. If the process is not dead within a
> > configurable timeout we check conditions again and might issue more
> > kill requests (IOW our wait for the process to die has a timeout). If
> > process_mrelease() is blocked on mmap_sem, we might timeout like this.
> > I imagine that a non-blocking option for process_mrelease() would not
> > really change this logic.
> 
> On a containerized system, killing a job requires killing multiple
> processes and then process_mrelease() them. Now there is cgroup.kill
> to kill all the processes in a cgroup tree but we would still need to
> process_mrelease() all the processes in that tree.

Is process_mrelease on all of them really necessary? I thought that the
primary reason for the call is to guarantee a forward progress in cases
where the userspace OOM victim cannot die on SIGKILL. That should be
more an exception than a normal case, no?

> There is a chance
> that we get stuck in reaping the early process. Making
> process_mrelease() non-blocking will enable the userspace to go to
> other processes in the list.

I do agree that allowing (guanrateed) non-blocking behavior is nice but
it is also a rather strong promise. There is some memory that cannot be
released by the oom reaper currently because there are locks involved
(e.g. mlocked memory or memory areas backed by blocking notifiers).
I can imagine some users of this api would rather block and make sure to
release the memory rather than skip over it. So if anything this has to
be an opt in with a big fat warning that the behavior of the kernel wrt
to releasable memory can vary due to all sorts of implementation
details.

> An alternative would be to have a cgroup specific interface for
> reaping similar to cgroup.kill.

Could you elaborate?

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2021-07-26  7:27 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-23  1:14 [PATCH v3 1/2] mm: introduce process_mrelease system call Suren Baghdasaryan
2021-07-23  1:14 ` [PATCH v3 2/2] mm: wire up syscall process_mrelease Suren Baghdasaryan
2021-07-23  2:03 ` [PATCH v3 1/2] mm: introduce process_mrelease system call Shakeel Butt
     [not found]   ` <CAJuCfpFZeQez77CB7odfaSpi3JcLQ_Nz0WvDTsra1VPoA-j7sg@mail.gmail.com>
2021-07-23  6:20     ` Michal Hocko
     [not found]       ` <CAJuCfpGSZwVgZ=FxhCV-uC_mzC7O-v-3k3tm-F6kOB7WM9t9tw@mail.gmail.com>
2021-07-23  8:15         ` David Hildenbrand
2021-07-23  8:18           ` Suren Baghdasaryan
2021-07-23  8:53         ` Michal Hocko
2021-07-23 13:46           ` Shakeel Butt
2021-07-23 16:08             ` Suren Baghdasaryan
2021-07-23 17:00               ` Shakeel Butt
2021-07-26  7:27                 ` Michal Hocko [this message]
2021-07-26 13:43                   ` Shakeel Butt
2021-08-02 19:53                     ` Suren Baghdasaryan
2021-08-02 20:05                       ` Shakeel Butt
2021-08-02 20:08                         ` Suren Baghdasaryan
2021-08-02 22:16                           ` Suren Baghdasaryan
2021-07-23 13:40       ` Shakeel Butt
2021-07-26  8:20   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YP5jyLeYsN3JtdX8@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=christian.brauner@ubuntu.com \
    --cc=christian@brauner.io \
    --cc=david@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=jannh@google.com \
    --cc=jengelh@inai.de \
    --cc=kernel-team@android.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=minchan@kernel.org \
    --cc=oleg@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).