linux-api.vger.kernel.org archive mirror
From: Suren Baghdasaryan <surenb@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <guro@fb.com>, Rik van Riel <riel@surriel.com>,
	Christian Brauner <christian@brauner.io>,
	Oleg Nesterov <oleg@redhat.com>,
	Tim Murray <timmurray@google.com>,
	linux-api@vger.kernel.org, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	kernel-team <kernel-team@android.com>,
	Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCH 1/1] RFC: add pidfd_send_signal flag to reclaim mm while killing a process
Date: Wed, 18 Nov 2020 16:13:52 -0800
Message-ID: <CAJuCfpH8nMijL+ADZnEWiceYE0MXEePYspSGyoNxq4CQC-nXgg@mail.gmail.com>
In-Reply-To: <CAJuCfpHP0n6Fyi6Lt9dUyYE72S5=iONkvDMkVSmKo6oRPjbMXQ@mail.gmail.com>

On Wed, Nov 18, 2020 at 11:55 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Nov 18, 2020 at 11:51 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Nov 18, 2020 at 11:32 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 18-11-20 11:22:21, Suren Baghdasaryan wrote:
> > > > On Wed, Nov 18, 2020 at 11:10 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Fri 13-11-20 18:16:32, Andrew Morton wrote:
> > > > > [...]
> > > > > > It's all sounding a bit painful (but not *too* painful).  But to
> > > > > > reiterate, I do think that adding the ability for a process to shoot
> > > > > > down a large amount of another process's memory is a lot more generally
> > > > > > useful than tying it to SIGKILL, agree?

I was looking into how to work around the MAX_RW_COUNT limitation.
The conceptual issue there is "struct iovec", whose iov_len is a
size_t that cannot express ranges like "entire process memory". I
would like to get your reaction to the following idea, which can be
implemented without painful surgery on import_iovec and its friends.
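
For a sense of scale, here is a rough userspace sketch of that limit. It assumes 4 KiB pages and restates the kernel's MAX_RW_COUNT definition (INT_MAX & PAGE_MASK) under local SKETCH_* names; calls_needed() is a hypothetical helper for illustration, not an existing API.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel defines MAX_RW_COUNT as
 * (INT_MAX & PAGE_MASK); it is restated here for userspace. */
#define SKETCH_PAGE_SIZE    4096ULL
#define SKETCH_MAX_RW_COUNT ((uint64_t)INT_MAX & ~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical helper: how many calls are needed to cover range_len
 * bytes when each call can describe at most SKETCH_MAX_RW_COUNT bytes. */
static uint64_t calls_needed(uint64_t range_len)
{
    return range_len / SKETCH_MAX_RW_COUNT +
           (range_len % SKETCH_MAX_RW_COUNT != 0);
}
```

Under these assumptions each call covers just under 2 GiB, so an 8 GiB range would take five calls.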

process_madvise(pidfd,
                iovec = [ { range_start_addr, 0 }, { range_end_addr, 0 } ],
                vlen = 2, behavior = MADV_xxx, flags = PMADV_FLAG_RANGE)

So, to represent a range we pass a new PMADV_FLAG_RANGE flag and
construct a 2-element vector that expresses the range start and range
end in the iovec.iov_base members. The iov_len member of each element
is ignored in this mode. I know it sounds hacky, but I think it's the
simplest way if we want the ability to express an arbitrarily large
range.

Another option is to do what Andrew described as "madvise((void *)0,
(void *)-1, MADV_PAGEOUT)", which means this mode works only with the
entire mm of the process.

WDYT?
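
To make the range encoding concrete, here is a minimal userspace sketch of how the 2-element vector could be built and decoded. PMADV_FLAG_RANGE is only proposed in this mail and exists in no kernel; its value below and both helper names (pmadv_make_range, pmadv_decode_range) are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical flag from this proposal; the value is arbitrary. */
#define PMADV_FLAG_RANGE 0x1u

/* Caller side: only iov_base carries information in range mode. */
static void pmadv_make_range(struct iovec vec[2],
                             uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    vec[0].iov_base = (void *)start;
    vec[0].iov_len  = 0;            /* ignored in range mode */
    vec[1].iov_base = (void *)end;
    vec[1].iov_len  = 0;            /* ignored in range mode */
}

/* Receiver side: one way the vector might be validated and decoded. */
static int pmadv_decode_range(const struct iovec *vec, size_t vlen,
                              unsigned int flags,
                              uintptr_t *start, uintptr_t *end)
{
    if (!(flags & PMADV_FLAG_RANGE) || vlen != 2)
        return -EINVAL;

    uintptr_t s = (uintptr_t)vec[0].iov_base;
    uintptr_t e = (uintptr_t)vec[1].iov_base;

    if (s > e)
        return -EINVAL;

    *start = s;
    *end = e;
    return 0;
}
```

Because both bounds travel as pointer-sized iov_base values, the encoding can describe any range up to UINTPTR_MAX without touching iov_len or the MAX_RW_COUNT clamp.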

> > > > >
> > > > > I am not sure TBH. Is there any reasonable usecase where uncoordinated
> > > > > memory tear down is OK and a target process which is able to see the
> > > > > unmapped memory?
> > > >
> > > > I think uncoordinated memory tear down is a special case which makes
> > > > sense only when the target process is being killed (and we can enforce
> > > > that by allowing MADV_DONTNEED to be used only if the target process
> > > > has pending SIGKILL).
> > >
> > > That would be safe but then I am wondering whether it makes sense to
> > > implement as a madvise call. It is quite strange to expect somebody call
> > > a syscall on a killed process. But this is more a detail. I am not a
> > > great fan of a more generic MADV_DONTNEED on a remote process. This is
> > > just too dangerous IMHO.
> >
> > Agree 100%
>
> I assumed here that by "a more generic MADV_DONTNEED on a remote
> process" you meant "process_madvise(MADV_DONTNEED) applied to a
> process that is not being killed". Re-reading your comment I realized
> that you might have meant "process_madvise() with generic support to
> large memory areas". I hope I understood you correctly.
>
> >
> > >
> > > > However, the ability to apply other flavors of
> > > > process_madvise() to large memory areas spanning multiple VMAs can be
> > > > useful in more cases.
> > >
> > > Yes I do agree with that. The error reporting would be more tricky but
> > > I am not really sure that the exact reporting is really necessary for
> > > advice like interface.
> >
> > Andrew's suggestion for this special mode to change return semantics
> > to the usual "0 or error code" seems to me like the most reasonable
> > way to deal with the return value limitation.
> >
> > >
> > > > For example in Android we will use
> > > > process_madvise(MADV_PAGEOUT) to "shrink" an inactive background
> > > > process.
> > >
> > > That makes sense to me.
> > > --
> > > Michal Hocko
> > > SUSE Labs
