Linux-mm Archive on lore.kernel.org
From: Christian Brauner <christian@brauner.io>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tim Murray <timmurray@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Daniel Colascione <dancol@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>,
	jannh@google.com
Subject: Re: [RFC 0/7] introduce memory hinting API for external process
Date: Tue, 21 May 2019 10:42:00 +0200
Message-ID: <20190521084158.s5wwjgewexjzrsm6@brauner.io> (raw)
In-Reply-To: <20190520035254.57579-1-minchan@kernel.org>

On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> - Background
> 
> The Android terminology used for forking a new process and starting an app
> from scratch is a cold start, while resuming an existing app is a hot start.
> While we continually try to improve the performance of cold starts, hot
> starts will always be significantly less power hungry as well as faster so
> we are trying to make hot start more likely than cold start.
> 
> To make hot starts more likely, Android userspace manages the order in
> which apps should be killed in a process called ActivityManagerService.
> ActivityManagerService tracks every Android app or service that the user
> could be interacting with at any time and translates that into a ranked
> list for lmkd (low memory killer daemon). Those apps are likely to be killed
> by lmkd if the system has to reclaim memory. In that sense they are similar
> to entries in any other cache.
> Those apps are kept alive for opportunistic performance improvements but
> those performance improvements will vary based on the memory requirements of
> individual workloads.
> 
> - Problem
> 
> Naturally, cached apps were the dominant consumers of memory on the system.
> However, they were not significant consumers of swap even though they are
> good candidates for swap. On investigation, we found that swapping out only
> begins once the low zone watermark is hit and kswapd wakes up, but the
> overall allocation rate in the system might trip lmkd thresholds and cause
> a cached process to be killed. (We measured the performance of swapping out
> vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x
> faster even though we use zram, which is much faster than real storage.)
> A kill from lmkd will therefore often satisfy the high zone watermark,
> resulting in very few pages actually being moved to swap.
> 
> - Approach
> 
> The approach we chose was to use a new interface to allow userspace to
> proactively reclaim entire processes by leveraging platform information.
> This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> that are known to be cold from userspace and to avoid races with lmkd
> by reclaiming apps as soon as they entered the cached state. Additionally,
> it gives the platform many opportunities to use the information it has to
> optimize memory efficiency.
> 
> IMHO we should spell it out that this patchset complements MADV_DONTNEED
> and MADV_FREE by adding non-destructive ways to gain some free memory
> space. MADV_COLD is similar to MADV_DONTNEED in that it hints to the
> kernel that the memory region is not currently needed and should be
> reclaimed immediately; MADV_COOL is similar to MADV_FREE in that it hints
> to the kernel that the memory region is not currently needed and should be
> reclaimed when memory pressure rises.
> 
> To achieve the goal, the patchset introduces two new options for madvise.
> One is MADV_COOL, which will deactivate active pages, and the other is
> MADV_COLD, which will reclaim private pages instantly. These new options
> complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in that
> it hints to the kernel that the memory region is not currently needed and
> should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in that
> it hints to the kernel that the memory region is not currently needed and
> should be reclaimed when memory pressure rises.
> 
> This approach is similar in spirit to madvise(MADV_DONTNEED), but the
> information required to make the reclaim decision is not known to the app.
> Instead, it is known to a centralized userspace daemon, and that daemon
> must be able to initiate reclaim on its own without any app involvement.
> To address this, this patchset introduces a new syscall:
> 
> 	struct pr_madvise_param {
> 		int size;
> 		const struct iovec *vec;
> 	};
> 
> 	int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
> 				struct pr_madvise_param *results,
> 				struct pr_madvise_param *ranges,
> 				unsigned long flags);
> 
> The syscall takes a pidfd so that it can give hints to an external process,
> and it provides a pair of results/ranges vector arguments so that it can
> give several hints to address ranges all at once.
> 
> I guess others have different ideas about the naming of the syscall and
> options, so feel free to suggest better naming.

Yes, all new syscalls making use of pidfds should be named
pidfd_<action>. So please make this pidfd_madvise.
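
As a rough illustration only, the renamed call with the argument layout from
the quoted RFC might be exercised as sketched below. The struct is copied from
the cover letter; everything else is an assumption made for this sketch:
pidfd_madvise_sketch() is a hypothetical userspace stand-in that makes no
syscall, the results vector is left out for brevity, `size` is assumed to hold
sizeof(struct pr_madvise_param), and behavior[i] is assumed to pair with
ranges->vec[i].

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Layout copied from the quoted RFC. Treating `size` as the size of
 * this struct is an assumption made for this sketch. */
struct pr_madvise_param {
	int size;
	const struct iovec *vec;
};

/*
 * Hypothetical stand-in for the renamed call. It performs no syscall:
 * it only checks the arguments and returns the number of bytes the
 * ranges cover, or -1 if the arguments are malformed.
 */
static long pidfd_madvise_sketch(int pidfd, ssize_t nr_elem,
				 const int *behavior,
				 const struct pr_madvise_param *ranges,
				 unsigned long flags)
{
	long total = 0;

	if (pidfd < 0 || nr_elem <= 0 || !behavior || !ranges || flags)
		return -1;
	if (ranges->size != (int)sizeof(*ranges) || !ranges->vec)
		return -1;

	/* Assumed pairing: behavior[i] applies to ranges->vec[i]. */
	for (ssize_t i = 0; i < nr_elem; i++)
		total += (long)ranges->vec[i].iov_len;

	return total;
}
```

A caller would fill one struct iovec per address range in the target process,
set `size`, and pass the pidfd of that process as the first argument.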

Please make sure to Cc me on this in the future as I'm maintaining
pidfds. Would be great to have Jann on this too since he's been touching
both mm and parts of the pidfd stuff with me.
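
For context on the "non-destructive" wording in the quoted cover letter, here
is a minimal sketch of why the existing MADV_DONTNEED counts as destructive on
Linux: a private anonymous page loses its contents once the hint is applied.
The helper name dontneed_discards_contents() is made up for this example, and
the code only touches the calling process's own mapping.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a private anonymous page reads back as zero after
 * madvise(MADV_DONTNEED), 0 if the old contents survived, -1 on error.
 */
static int dontneed_discards_contents(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int discarded;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	p[0] = 0xAA;	/* dirty the page so there is something to lose */

	if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	/* On Linux the next access faults in a fresh zero-filled page. */
	discarded = (p[0] == 0);

	munmap(p, (size_t)page);
	return discarded;
}
```

The hints proposed in the quoted text are described as differing on exactly
this point: the data is meant to remain recoverable after reclaim rather than
being thrown away.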

