linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-api@vger.kernel.org, Johannes Weiner <hannes@cmpxchg.org>,
	Tim Murray <timmurray@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Daniel Colascione <dancol@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>,
	jannh@google.com, oleg@redhat.com, christian@brauner.io,
	oleksandr@redhat.com, hdanton@sina.com
Subject: Re: [RFCv2 1/6] mm: introduce MADV_COLD
Date: Fri, 31 May 2019 16:03:32 +0200	[thread overview]
Message-ID: <20190531140332.GT6896@dhcp22.suse.cz> (raw)
In-Reply-To: <20190531133904.GC195463@google.com>

On Fri 31-05-19 22:39:04, Minchan Kim wrote:
> On Fri, May 31, 2019 at 10:47:52AM +0200, Michal Hocko wrote:
> > On Fri 31-05-19 15:43:08, Minchan Kim wrote:
> > > When a process expects no accesses to a certain memory range, it could
> > > give a hint to kernel that the pages can be reclaimed when memory pressure
> > > happens but data should be preserved for future use.  This could reduce
> > > workingset eviction so it ends up increasing performance.
> > > 
> > > This patch introduces the new MADV_COLD hint to madvise(2) syscall.
> > > MADV_COLD can be used by a process to mark a memory range as not expected
> > > to be used in the near future. The hint can help kernel in deciding which
> > > pages to evict early during memory pressure.
> > > 
> > > Internally, it works via deactivating pages from active list to inactive's
> > > head if the page is private because inactive list could be full of
> > > used-once pages which are first candidate for the reclaiming and that's a
> > > reason why MADV_FREE move pages to head of inactive LRU list. Therefore,
> > > if the memory pressure happens, they will be reclaimed earlier than other
> > > active pages unless there is no access until the time.
> > 
> > [I am intentionally not looking at the implementation because below
> > points should be clear from the changelog - sorry about nagging ;)]
> > 
> > What kind of pages can be deactivated? Anonymous/File backed.
> > Private/shared? If shared, are there any restrictions?
> 
> Both file and private pages could be deactived from each active LRU
> to each inactive LRU if the page has one map_count. In other words,
> 
>     if (page_mapcount(page) <= 1)
>         deactivate_page(page);

Why do we restrict to pages that are single mapped?

> > Are there any restrictions on mappings? E.g. what would be an effect of
> > this operation on hugetlbfs mapping?
> 
> VM_LOCKED|VM_HUGETLB|VM_PFNMAP vma will be skipped like MADV_FREE|DONTNEED

OK documenting that this is restricted to the same vmas as MADV_FREE|DONTNEED
is really useful to mention.

> 
> > 
> > Also you are talking about inactive LRU but what kind of LRU is that? Is
> > it the anonymous LRU? If yes, don't we have the same problem as with the
> 
> active file page -> inactive file LRU
> active anon page -> inacdtive anon LRU
> 
> > early MADV_FREE implementation when enough page cache causes that
> > deactivated anonymous memory doesn't get reclaimed anytime soon. Or
> > worse never when there is no swap available?
> 
> I think MADV_COLD is a little bit different symantic with MAVD_FREE.
> MADV_FREE means it's okay to discard when the memory pressure because
> the content of the page is *garbage*. Furthemore, freeing such pages is
> almost zero overhead since we don't need to swap out and access
> afterward causes minor fault. Thus, it would make sense to put those
> freeable pages in inactive file LRU to compete other used-once pages.
> 
> However, MADV_COLD doesn't means it's a garbage and freeing requires
> swap out/swap in afterward. So, it would be better to move inactive
> anon's LRU list, not file LRU. Furthermore, it would avoid unnecessary
> scanning of those cold anonymous if system doesn't have a swap device.

Please document this, if this is really a desirable semantic because
then you have the same set of problems as we've had with the early
MADV_FREE implementation mentioned above.

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2019-05-31 14:03 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-31  6:43 [RFCv2 0/6] introduce memory hinting API for external process Minchan Kim
2019-05-31  6:43 ` [RFCv2 1/6] mm: introduce MADV_COLD Minchan Kim
2019-05-31  8:47   ` Michal Hocko
2019-05-31 13:39     ` Minchan Kim
2019-05-31 14:03       ` Michal Hocko [this message]
2019-05-31 14:34         ` Minchan Kim
2019-06-03  7:16           ` Michal Hocko
2019-06-03 15:43             ` Daniel Colascione
2019-06-03 17:27             ` Johannes Weiner
2019-06-03 20:32               ` Michal Hocko
2019-06-03 21:50                 ` Johannes Weiner
2019-06-03 23:02                   ` Minchan Kim
2019-06-04  6:56                     ` Michal Hocko
2019-06-04 12:06                     ` Johannes Weiner
2019-06-04  6:55                   ` Michal Hocko
2019-06-04  4:26             ` Minchan Kim
2019-06-04  7:02               ` Michal Hocko
2019-05-31  6:43 ` [RFCv2 2/6] mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM Minchan Kim
2019-05-31  6:43 ` [RFCv2 3/6] mm: introduce MADV_PAGEOUT Minchan Kim
2019-05-31  8:50   ` Michal Hocko
2019-05-31 13:44     ` Minchan Kim
2019-05-31 16:59   ` Johannes Weiner
2019-05-31 23:14     ` Minchan Kim
2019-05-31  6:43 ` [RFCv2 4/6] mm: factor out madvise's core functionality Minchan Kim
2019-05-31  7:04   ` Oleksandr Natalenko
2019-05-31 13:12     ` Minchan Kim
2019-05-31 14:35       ` Oleksandr Natalenko
2019-05-31 23:29         ` Minchan Kim
2019-06-05 13:27           ` Oleksandr Natalenko
2019-06-10 10:12             ` Minchan Kim
2019-05-31  6:43 ` [RFCv2 5/6] mm: introduce external memory hinting API Minchan Kim
2019-05-31  8:37   ` Michal Hocko
2019-05-31 13:19     ` Minchan Kim
2019-05-31 14:00       ` Michal Hocko
2019-05-31 14:11         ` Minchan Kim
2019-05-31 17:35   ` Daniel Colascione
2019-05-31  6:43 ` [RFCv2 6/6] mm: extend process_madvise syscall to support vector arrary Minchan Kim
2019-05-31 10:06   ` Yann Droneaud
2019-05-31 23:18     ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190531140332.GT6896@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bgeffon@google.com \
    --cc=christian@brauner.io \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=oleg@redhat.com \
    --cc=oleksandr@redhat.com \
    --cc=shakeelb@google.com \
    --cc=sonnyrao@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).