linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-api@vger.kernel.org, oleksandr@redhat.com,
	Tim Murray <timmurray@google.com>,
	Daniel Colascione <dancol@google.com>,
	Sandeep Patil <sspatil@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	John Dias <joaodias@google.com>,
	Joel Fernandes <joel@joelfernandes.org>
Subject: Re: [PATCH v3 2/5] mm: introduce external memory hinting API
Date: Mon, 10 Feb 2020 13:27:27 -0800	[thread overview]
Message-ID: <20200210212727.GA48790@google.com> (raw)
In-Reply-To: <CAJuCfpFOBUYfxyQZZCvKjD0pyh_D-ZEpJP9kLD8xav+hVHZWYQ@mail.gmail.com>

Hi Suren,

On Mon, Feb 10, 2020 at 09:50:20AM -0800, Suren Baghdasaryan wrote:
> On Mon, Jan 27, 2020 at 4:17 PM Minchan Kim <minchan@kernel.org> wrote:
> >
> > There is usecase that System Management Software(SMS) want to give
> > a memory hint like MADV_[COLD|PAGEEOUT] to other processes and
> > in the case of Android, it is the ActivityManagerService.
> >
> > It's similar in spirit to madvise(MADV_WONTNEED), but the information
> > required to make the reclaim decision is not known to the app. Instead,
> > it is known to the centralized userspace daemon(ActivityManagerService),
> > and that daemon must be able to initiate reclaim on its own without
> > any app involvement.
> >
> > To solve the issue, this patch introduces a new syscall process_madvise(2).
> > It uses pidfd of an external process to give the hint.
> >
> >  int process_madvise(int pidfd, void *addr, size_t length, int advise,
> >                         unsigned long flag);
> >
> > Since it could affect other process's address range, only privileged
> > process(CAP_SYS_PTRACE) or something else(e.g., being the same UID)
> > gives it the right to ptrace the process could use it successfully.
> > The flag argument is reserved for future use if we need to extend the
> > API.
> >
> > I think supporting all hints madvise has/will supported/support to
> > process_madvise is rather risky. Because we are not sure all hints make
> > sense from external process and implementation for the hint may rely on
> > the caller being in the current context so it could be error-prone.
> > Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch.
> >
> > If someone want to add other hints, we could hear hear the usecase and
> > review it for each hint. It's safer for maintenance rather than
> > introducing a buggy syscall but hard to fix it later.
> 
> I would definitely be interested in adding MADV_DONTNEED support for
> process_madvise() to allow quick memory reclaim after a kill. The
> scenario is that userspace daemon can kill a process and try to help
> reclaim its memory. Having process_madvise(MADV_DONTNEED) support
> helps in the following cases:
> 1. Process issuing process_madvise has a higher CPU bandwidth
> allowance than the victim process, therefore can reclaim victim's
> memory quicker.
> 2. In case the victim occupies large amounts of memory the process
> issuing process_madvise can spawn multiple (possibly high priority)
> threads each reclaiming portions of the victim's memory.
> Such an extension will add a destructive kind of madvise into the set
> supported by process_madvise and I want to make sure we can accomodate
> for that in the future. Do you see any issues with supporting
> MADV_DONTNEED in the future?

Or kernel could do by themselves to spawn mulitple tasks if the system has
available badwidth and target process has a lot memory to be reclaimed

Anyway, it doesn't have any issue because we already have some synchrnoization
methods(e.g., signal or cgroup freezer) to freeze target processes before
giving a hint. It's not different with usual write syscall on shared file
among processes.

< snip >

> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 0c901de531e4..00ffa7e92f79 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/falloc.h>
> >  #include <linux/fadvise.h>
> >  #include <linux/sched.h>
> > +#include <linux/sched/mm.h>
> >  #include <linux/ksm.h>
> >  #include <linux/fs.h>
> >  #include <linux/file.h>
> > @@ -315,6 +316,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >
> >         if (fatal_signal_pending(task))
> >                 return -EINTR;
> > +       else if (current != task && fatal_signal_pending(current))
> > +               return -EINTR;
> 
> I think this can be simplified as:
> 
> +       if (fatal_signal_pending(current))
> +               return -EINTR;
> 
> current != task condition is not needed because if current == task
> then you would return earlier after checking
> fatal_signal_pending(task).

True, I will remove it.

Thanks!

  reply	other threads:[~2020-02-10 21:27 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-28  0:16 [PATCH v3 0/5] introduce memory hinting API for external process Minchan Kim
2020-01-28  0:16 ` [PATCH v3 1/5] mm: factor out madvise's core functionality Minchan Kim
2020-02-10 23:00   ` Alexander Duyck
2020-02-11 21:08     ` Minchan Kim
2020-01-28  0:16 ` [PATCH v3 2/5] mm: introduce external memory hinting API Minchan Kim
2020-02-10 17:50   ` Suren Baghdasaryan
2020-02-10 21:27     ` Minchan Kim [this message]
2020-01-28  0:16 ` [PATCH v3 3/5] mm/madvise: employ mmget_still_valid for write lock Minchan Kim
2020-01-28  0:16 ` [PATCH v3 4/5] mm/madvise: allow KSM hints for remote API Minchan Kim
2020-01-28  0:16 ` [PATCH v3 5/5] mm: support both pid and pidfd for process_madvise Minchan Kim
2020-02-10 23:12   ` Alexander Duyck
2020-02-11 21:11     ` Minchan Kim
2020-02-10 21:29 ` [PATCH v3 0/5] introduce memory hinting API for external process Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200210212727.GA48790@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bgeffon@google.com \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=joaodias@google.com \
    --cc=joel@joelfernandes.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=oleksandr@redhat.com \
    --cc=shakeelb@google.com \
    --cc=sonnyrao@google.com \
    --cc=sspatil@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).