All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Michal Hocko <mhocko@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>,
	Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-api@vger.kernel.org, oleksandr@redhat.com,
	Suren Baghdasaryan <surenb@google.com>,
	Tim Murray <timmurray@google.com>,
	Daniel Colascione <dancol@google.com>,
	Sandeep Patil <sspatil@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	John Dias <joaodias@google.com>,
	christian.brauner@ubuntu.com, sjpark@amazon.de
Subject: Re: [PATCH v2 2/5] mm: introduce external memory hinting API
Date: Mon, 20 Jan 2020 15:39:35 +0300	[thread overview]
Message-ID: <20200120123935.onlls7enjtzenbvt@box> (raw)
In-Reply-To: <20200120112722.GY18451@dhcp22.suse.cz>

On Mon, Jan 20, 2020 at 12:27:22PM +0100, Michal Hocko wrote:
> On Mon 20-01-20 13:24:35, Kirill Tkhai wrote:
> > On 17.01.2020 14:52, Michal Hocko wrote:
> > > On Thu 16-01-20 15:59:50, Minchan Kim wrote:
> > >> There is usecase that System Management Software(SMS) want to give
> > >> a memory hint like MADV_[COLD|PAGEEOUT] to other processes and
> > >> in the case of Android, it is the ActivityManagerService.
> > >>
> > >> It's similar in spirit to madvise(MADV_WONTNEED), but the information
> > >> required to make the reclaim decision is not known to the app. Instead,
> > >> it is known to the centralized userspace daemon(ActivityManagerService),
> > >> and that daemon must be able to initiate reclaim on its own without
> > >> any app involvement.
> > >>
> > >> To solve the issue, this patch introduces new syscall process_madvise(2).
> > >> It uses pidfd of an external processs to give the hint.
> > >>
> > >>  int process_madvise(int pidfd, void *addr, size_t length, int advise,
> > >> 			unsigned long flag);
> > >>
> > >> Since it could affect other process's address range, only privileged
> > >> process(CAP_SYS_PTRACE) or something else(e.g., being the same UID)
> > >> gives it the right to ptrace the process could use it successfully.
> > >> The flag argument is reserved for future use if we need to extend the
> > >> API.
> > >>
> > >> I think supporting all hints madvise has/will supported/support to
> > >> process_madvise is rather risky. Because we are not sure all hints make
> > >> sense from external process and implementation for the hint may rely on
> > >> the caller being in the current context so it could be error-prone.
> > >> Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch.
> > >>
> > >> If someone want to add other hints, we could hear hear the usecase and
> > >> review it for each hint. It's more safe for maintainace rather than
> > >> introducing a buggy syscall but hard to fix it later.
> > > 
> > > I have brought this up when we discussed this in the past but there is
> > > no reflection on that here so let me bring that up again. 
> > > 
> > > I believe that the interface has an inherent problem that it is racy.
> > > The external entity needs to know the address space layout of the target
> > > process to do anyhing useful on it. The address space is however under
> > > the full control of the target process though and the external entity
> > > has no means to find out that the layout has changed. So
> > > time-to-check-time-to-act is an inherent problem.
> > > 
> > > This is a serious design flaw and it should be explained why it doesn't
> > > matter or how to use the interface properly to prevent that problem.
> > 
> > Really, any address space manipulation, where more than one process is
> > involved, is racy.
> 
> They are, indeed. But that is not the point I wanted to make.
> 
> > Even two threads on common memory need a synchronization
> > to manage mappings in a sane way. Managing memory from two processes
> > is the same in principle, and the only difference is that another level
> > of synchronization is required.
> 
> Well, not really. The operation might simply attempt to perform an
> operation on a specific memory area and get a failure if it doesn't
> reference the same object anymore. What I think we need is some form of
> a handle to operate on. In the past we have discussed several
> directions. I was proposing /proc/self/map_anon/ (analogous to
> map_files) where you could inspect anonymous memory and get a file
> handle for it. madvise would then operate on the fd and then there
> shouldn't be a real problem to revalidate that the object is still
> valid. But there was no general enthusiasm about that approach. There
> are likely some land mines on the way.

Converting anon memory to file-backed is bad idea and going to backfire.
It will break many things around rmap (for THP in particular). We have
assumption that an anon page cannot be mapped twice in the same process.
Breaking rules around memory inheritance is not helpful too.

It is a major re-design of the subsystem and I don't see any real need for
this.

-- 
 Kirill A. Shutemov

  reply	other threads:[~2020-01-20 12:39 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-16 23:59 [PATCH v2 0/5] introduce memory hinting API for external process Minchan Kim
2020-01-16 23:59 ` [PATCH v2 1/5] mm: factor out madvise's core functionality Minchan Kim
2020-01-17 10:02   ` Kirill Tkhai
2020-01-17 10:02     ` Kirill Tkhai
2020-01-17 18:14     ` Minchan Kim
2020-01-17 18:14       ` Minchan Kim
2020-01-16 23:59 ` [PATCH v2 2/5] mm: introduce external memory hinting API Minchan Kim
2020-01-16 23:59   ` Minchan Kim
2020-01-17 11:52   ` Michal Hocko
2020-01-17 11:52     ` Michal Hocko
2020-01-17 15:58     ` Kirill A. Shutemov
2020-01-17 15:58       ` Kirill A. Shutemov
2020-01-17 17:32       ` Minchan Kim
2020-01-17 17:32         ` Minchan Kim
2020-01-17 21:26         ` Kirill A. Shutemov
2020-01-17 21:26           ` Kirill A. Shutemov
2020-01-18  9:40           ` SeongJae Park
2020-01-18  9:40             ` SeongJae Park
2020-01-19 16:14           ` sspatil
2020-01-19 16:14             ` sspatil-hpIqsD4AKlfQT0dZR+AlfA
2020-01-20  7:58             ` Michal Hocko
2020-01-20 10:39               ` Kirill Tkhai
2020-01-20 10:39                 ` Kirill Tkhai
2020-01-21 18:32               ` Minchan Kim
2020-01-22  8:28                 ` Michal Hocko
2020-01-22  8:28                   ` Michal Hocko
2020-01-22  9:36                   ` SeongJae Park
2020-01-22  9:36                     ` SeongJae Park
2020-01-22 10:02                     ` Michal Hocko
2020-01-22 10:02                       ` Michal Hocko
2020-01-22 13:28                       ` SeongJae Park
2020-01-22 13:28                         ` SeongJae Park
2020-01-23  1:41                   ` Minchan Kim
2020-01-23  1:41                     ` Minchan Kim
2020-01-23  9:13                     ` Michal Hocko
2020-01-21 18:11           ` Minchan Kim
2020-01-21 18:11             ` Minchan Kim
2020-01-22 10:44             ` Oleksandr Natalenko
2020-01-23  1:43               ` Minchan Kim
2020-01-23  7:29                 ` Oleksandr Natalenko
2020-01-17 17:25     ` Minchan Kim
2020-01-17 17:25       ` Minchan Kim
2020-01-20  8:03       ` Michal Hocko
2020-01-20  8:03         ` Michal Hocko
2020-01-20 10:24     ` Kirill Tkhai
2020-01-20 10:24       ` Kirill Tkhai
2020-01-20 11:27       ` Michal Hocko
2020-01-20 11:27         ` Michal Hocko
2020-01-20 12:39         ` Kirill A. Shutemov [this message]
2020-01-20 13:24           ` Michal Hocko
2020-01-20 13:24             ` Michal Hocko
2020-01-20 14:21             ` Kirill A. Shutemov
2020-01-20 15:44               ` Michal Hocko
2020-01-20 15:44                 ` Michal Hocko
2020-01-21 18:43             ` Minchan Kim
2020-01-21 18:43               ` Minchan Kim
2020-01-16 23:59 ` [PATCH v2 3/5] mm/madvise: employ mmget_still_valid for write lock Minchan Kim
2020-01-16 23:59 ` [PATCH v2 4/5] mm/madvise: allow KSM hints for remote API Minchan Kim
2020-01-17 10:13   ` Kirill Tkhai
2020-01-17 10:13     ` Kirill Tkhai
2020-01-17 12:34     ` Oleksandr Natalenko
2020-01-17 12:34       ` Oleksandr Natalenko
2020-01-21 17:45       ` Minchan Kim
2020-01-21 17:45         ` Minchan Kim
2020-01-16 23:59 ` [PATCH v2 5/5] mm: support both pid and pidfd for process_madvise Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200120123935.onlls7enjtzenbvt@box \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=bgeffon@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=joaodias@google.com \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=minchan@kernel.org \
    --cc=oleksandr@redhat.com \
    --cc=shakeelb@google.com \
    --cc=sjpark@amazon.de \
    --cc=sonnyrao@google.com \
    --cc=sspatil@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.