linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Suren Baghdasaryan <surenb@google.com>
To: Daniel Colascione <dancol@google.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	yuzhoujian@didichuxing.com,
	Souptick Joarder <jrdr.linux@gmail.com>,
	Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Shakeel Butt <shakeelb@google.com>,
	Christian Brauner <christian@brauner.io>,
	Minchan Kim <minchan@kernel.org>,
	Tim Murray <timmurray@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Jann Horn <jannh@google.com>, linux-mm <linux-mm@kvack.org>,
	lsf-pc@lists.linux-foundation.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Android Kernel Team <kernel-team@android.com>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing
Date: Thu, 25 Apr 2019 09:09:46 -0700	[thread overview]
Message-ID: <CAJuCfpHQm5=QrOiOQ4thA5FmRUQjNGdSfis=iEaNGv7npmMVxQ@mail.gmail.com> (raw)
In-Reply-To: <CAKOZuetQH1rVtPdMNgw0sdnzWidd6v9eCWscRiOb7Y+3-JQ14Q@mail.gmail.com>

On Fri, Apr 12, 2019 at 7:14 AM Daniel Colascione <dancol@google.com> wrote:
>
> On Thu, Apr 11, 2019 at 11:53 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 11-04-19 08:33:13, Matthew Wilcox wrote:
> > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > signal and only to privileged users.
> > >
> > > What is the downside of doing expedited memory reclaim?  ie why not do it
> > > every time a process is going to die?
> >
> > Well, you are tearing down an address space which might be still in use
> > because the task not fully dead yeat. So there are two downsides AFAICS.
> > Core dumping which will not see the reaped memory so the resulting
>
> Test for SIGNAL_GROUP_COREDUMP before doing any of this then. If you
> try to start a core dump after reaping begins, too bad: you could have
> raced with process death anyway.
>
> > coredump might be incomplete. And unexpected #PF/gup on the reaped
> > memory will result in SIGBUS.
>
> It's a dying process. Why even bother returning from the fault
> handler? Just treat that situation as a thread exit. There's no need
> to make this observable to userspace at all.

I've spent some more time to investigate possible effects of reaping
on coredumps and asked Oleg Nesterov who worked on patchsets that
prioritize SIGKILLs over coredump activity
(https://lkml.org/lkml/2013/2/17/118). Current do_coredump
implementation seems to handle the case of SIGKILL interruption by
bailing out whenever dump_interrupted() returns true and that would be
the case with pending SIGKILL. So in the case of race when coredump
happens first and SIGKILL comes next interrupting the coredump seems
to result in no change in behavior and reaping memory proactively
seems to have no side effects.
An opposite race when SIGKILL gets posted and then coredump happens
seems impossible because do_coredump won't be called from get_signal
due to signal_group_exit check (get_signal checks signal_group_exit
while holding sighand->siglock and complete_signal sets
SIGNAL_GROUP_EXIT while holding the same lock). There is a path from
__seccomp_filter calling do_coredump while processing
SECCOMP_RET_KILL_xxx but even then it should bail out when
coredump_wait()->zap_threads(current) checks signal_group_exit().
(Thanks Oleg for clarifying this for me).

If we are really concerned about possible increase in failed coredumps
because of the proactive reaping I could make it conditional on
whether coredumping mm is possible by placing this feature behind
!get_dumpable(mm) condition. Another possibility is to check
RLIMIT_CORE to decide if coredumps are possible (although if pipe is
used for coredump that limit seems to be ignored, so that check would
have to take this into consideration).

On the issue of SIGBUS happening when accessed memory was already
reaped, my understanding that SIGBUS being a synchronous signal will
still have to be fetched using dequeue_synchronous_signal from
get_signal but not before signal_group_exit is checked. So again if
SIGKILL is pending I think SIGBUS will be ignored (please correct me
if that's not correct).

One additional question I would like to clarify is whether per-node
reapers like Roman suggested would make a big difference (All CPUs
that I've seen used for Android are single-node ones, so looking for
more feedback here). If it's important then reaping victim's memory by
the killer is probably not an option.

  parent reply	other threads:[~2019-04-25 16:10 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-11  1:43 [RFC 0/2] opportunistic memory reclaim of a killed process Suren Baghdasaryan
2019-04-11  1:43 ` [RFC 1/2] mm: oom: expose expedite_reclaim to use oom_reaper outside of oom_kill.c Suren Baghdasaryan
2019-04-25 21:12   ` Tetsuo Handa
2019-04-25 21:56     ` Suren Baghdasaryan
2019-04-11  1:43 ` [RFC 2/2] signal: extend pidfd_send_signal() to allow expedited process killing Suren Baghdasaryan
2019-04-11 10:30   ` Christian Brauner
2019-04-11 10:34     ` Christian Brauner
2019-04-11 15:18     ` Suren Baghdasaryan
2019-04-11 15:23       ` Suren Baghdasaryan
2019-04-11 16:25         ` Daniel Colascione
2019-04-11 15:33   ` Matthew Wilcox
2019-04-11 17:05     ` Johannes Weiner
2019-04-11 17:09     ` Suren Baghdasaryan
2019-04-11 17:33       ` Daniel Colascione
2019-04-11 17:36         ` Matthew Wilcox
2019-04-11 17:47           ` Daniel Colascione
2019-04-12  6:49             ` Michal Hocko
2019-04-12 14:15               ` Suren Baghdasaryan
2019-04-12 14:20                 ` Daniel Colascione
2019-04-12 21:03             ` Matthew Wilcox
2019-04-11 17:52           ` Suren Baghdasaryan
2019-04-11 21:45       ` Roman Gushchin
2019-04-11 21:59         ` Suren Baghdasaryan
2019-04-12  6:53     ` Michal Hocko
2019-04-12 14:10       ` Suren Baghdasaryan
2019-04-12 14:14       ` Daniel Colascione
2019-04-12 15:30         ` Daniel Colascione
2019-04-25 16:09         ` Suren Baghdasaryan [this message]
2019-04-11 10:51 ` [RFC 0/2] opportunistic memory reclaim of a killed process Michal Hocko
2019-04-11 16:18   ` Joel Fernandes
2019-04-11 18:12     ` Michal Hocko
2019-04-11 19:14       ` Joel Fernandes
2019-04-11 20:11         ` Michal Hocko
2019-04-11 21:11           ` Joel Fernandes
2019-04-11 16:20   ` Sandeep Patil
2019-04-11 16:47   ` Suren Baghdasaryan
2019-04-11 18:19     ` Michal Hocko
2019-04-11 19:56       ` Suren Baghdasaryan
2019-04-11 20:17         ` Michal Hocko
2019-04-11 17:19   ` Johannes Weiner
2019-04-11 11:51 ` [Lsf-pc] " Rik van Riel
2019-04-11 12:16   ` Michal Hocko
2019-04-11 16:54     ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJuCfpHQm5=QrOiOQ4thA5FmRUQjNGdSfis=iEaNGv7npmMVxQ@mail.gmail.com' \
    --to=surenb@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=christian@brauner.io \
    --cc=dancol@google.com \
    --cc=ebiederm@xmission.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=jrdr.linux@gmail.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=mhocko@kernel.org \
    --cc=minchan@kernel.org \
    --cc=oleg@redhat.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=timmurray@google.com \
    --cc=willy@infradead.org \
    --cc=yuzhoujian@didichuxing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).