linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <guro@fb.com>, Rik van Riel <riel@surriel.com>,
	Christian Brauner <christian@brauner.io>,
	Oleg Nesterov <oleg@redhat.com>,
	Tim Murray <timmurray@google.com>,
	linux-api@vger.kernel.org, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	kernel-team <kernel-team@android.com>,
	Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCH 1/1] RFC: add pidfd_send_signal flag to reclaim mm while killing a process
Date: Fri, 13 Nov 2020 18:16:32 -0800	[thread overview]
Message-ID: <20201113181632.6d98489465430a987c96568d@linux-foundation.org> (raw)
In-Reply-To: <CAJuCfpH-Qjm5uqfaUcfk0QV2zC76uL96FQjd88bZGBvCuXE_aA@mail.gmail.com>

On Fri, 13 Nov 2020 17:57:02 -0800 Suren Baghdasaryan <surenb@google.com> wrote:

> On Fri, Nov 13, 2020 at 5:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Fri, 13 Nov 2020 17:09:37 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > > > > Seems to me that the ability to reap another process's memory is a
> > > > > > generally useful one, and that it should not be tied to delivering a
> > > > > > signal in this fashion.
> > > > > >
> > > > > > And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> > > > > > few changes and tweaks, but can't that be used to solve this problem?
> > > > >
> > > > > Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> > > > > one of the options recently discussed in
> > > > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > > > . The thread describes some of the issues with that approach but if we
> > > > > limit it to processes with pending SIGKILL only then I think that
> > > > > would be doable.
> > > >
> > > > Why would it be necessary to read /proc/pid/maps?  I'd have thought
> > > > that a starting effort would be
> > > >
> > > >         madvise((void *)0, (void *)-1, MADV_PAGEOUT)
> > > >
> > > > (after translation into process_madvise() speak).  Which is equivalent
> > > > to the proposed process_madvise(MADV_DONTNEED_MM)?
> > >
> > > Yep, this is very similar to option #3 in
> > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > and I actually have a tested prototype for that.
> >
> > Why is the `vector=NULL' needed?  Can't `vector' point at a single iovec
> > which spans the whole address range?
> 
> That would be the option #4 from the same discussion and the issues
> noted there are "process_madvise return value can't handle such a
> large number of bytes and there is MAX_RW_COUNT limit on max number of
> bytes one process_madvise call can handle". In my prototype I have a
> special handling for such "bulk operation" to work around the
> MAX_RW_COUNT limitation.

Ah, OK, return value.  Maybe process_madvise() shouldn't have done that
and should have simply returned 0 on success, like madvise().

I guess a special "nuke whole address space" command is OK.  But, again
in the search for generality, the ability to nuke very large amounts of
address space (but not the entire address space) would be better. 

The process_madvise() return value issue could be addressed by adding a
process_madvise() mode which return 0 on success.

And I guess the MAX_RW_COUNT issue is solvable by adding an
import_iovec() arg to say "don't check that".  Along those lines.

It's all sounding a bit painful (but not *too* painful).  But to
reiterate, I do think that adding the ability for a process to shoot
down a large amount of another process's memory is a lot more generally
useful than tying it to SIGKILL, agree?

> >
> > > If that's the
> > > preferred method then I can post it quite quickly.
> >
> > I assume you've tested that prototype.  How did its usefulness compare
> > with this SIGKILL-based approach?
> 
> Just to make sure I understand correctly your question, you are asking
> about performance comparison of:
> 
> // approach in this RFC
> pidfd_send_signal(SIGKILL, SYNC_REAP_MM)
> 
> vs
> 
> // option #4 in the previous RFC
> kill(SIGKILL); process_madvise(vector=NULL, MADV_DONTNEED);
> 
> If so, I have results for the current RFC approach but the previous
> approach was testing on an older device, so don't have
> apples-to-apples comparison results at the moment. I can collect the
> data for fair comparison if desired, however I don't expect a
> noticeable performance difference since they both do pretty much the
> same thing (even on different devices my results are quite close). I
> think it's more a question of which API would be more appropriate.

OK.  I wouldn't expect performance to be very different (and things can
be sped up if so), but the API usefulness might be an issue.  Using
process_madvise() (or similar) makes it a two-step operation, whereas
tying it to SIGKILL&&TASK_UNINTERRUPTIBLE provides a more precise tool.
Any thoughts on this?


  reply	other threads:[~2020-11-14  2:16 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-13 17:34 [PATCH 1/1] RFC: add pidfd_send_signal flag to reclaim mm while killing a process Suren Baghdasaryan
2020-11-13 23:55 ` Andrew Morton
2020-11-14  0:06   ` Suren Baghdasaryan
2020-11-14  1:00     ` Andrew Morton
2020-11-14  1:09       ` Suren Baghdasaryan
2020-11-14  1:18         ` Andrew Morton
2020-11-14  1:57           ` Suren Baghdasaryan
2020-11-14  2:16             ` Andrew Morton [this message]
2020-11-14  2:51               ` Suren Baghdasaryan
2020-11-16 23:24               ` Minchan Kim
2020-11-18 19:10               ` Michal Hocko
2020-11-18 19:22                 ` Suren Baghdasaryan
2020-11-18 19:32                   ` Michal Hocko
2020-11-18 19:51                     ` Suren Baghdasaryan
2020-11-18 19:55                       ` Suren Baghdasaryan
2020-11-19  0:13                         ` Suren Baghdasaryan
2020-11-24  5:45                           ` Suren Baghdasaryan
2020-11-18 10:32   ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201113181632.6d98489465430a987c96568d@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=christian@brauner.io \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@android.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=minchan@kernel.org \
    --cc=oleg@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).