linux-mm.kvack.org archive mirror
From: Mel Gorman <mgorman@suse.de>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
	Michal Hocko <mhocko@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Minchan Kim <minchan@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Hugh Dickins <hughd@google.com>,
	Alexander Duyck <alexander.duyck@gmail.com>
Subject: Re: [RFC 0/3] mm: Discard lazily freed pages when migrating
Date: Fri, 28 Feb 2020 09:49:54 +0000
Message-ID: <20200228094954.GB3772@suse.de>
In-Reply-To: <871rqf850z.fsf@yhuang-dev.intel.com>

On Fri, Feb 28, 2020 at 04:55:40PM +0800, Huang, Ying wrote:
> > E.g., free page reporting in QEMU wants to use MADV_FREE. The guest will
> > report currently free pages to the hypervisor, which will MADV_FREE the
> > reported memory. As long as there is no memory pressure, there is no
> > need to actually free the pages. Once the guest reuses such a page, it
> > could happen that there is still the old page and pulling in a fresh
> > (zeroed) page can be avoided.
> >
> > AFAIKs, after your change, we would get more pages discarded from our
> > guest, resulting in more fresh (zeroed) pages having to be pulled in
> > when a guest touches a reported free page again. But OTOH, page
> > migration is sped up (these pages no longer need to be migrated).
> 
> Let's look at this problem from another perspective.  To migrate the
> MADV_FREE pages of the QEMU process from node A to node B, we
> need to free the original pages in node A, and (maybe) allocate the
> same number of pages in node B.  So the question becomes
> 
> - we may need to allocate some pages in node B
> - these pages may or may not be accessed by the application
> - whether we should allocate all these pages in advance or allocate them
>   lazily when they are accessed.
> 
> We thought the common philosophy in Linux kernel is to allocate lazily.
> 

I also think there needs to be an example of a real application that
benefits from this new behaviour. Consider the possible sources of page
migration:

1. NUMA balancing -- The application has to read/write the data for this
   to trigger. In the case of a write, MADV_FREE is cancelled, and it is
   most likely going to be a write unless it's an application bug.
2. sys_move_pages -- the application has explicitly stated the data is in
   use on a particular node, yet any MADV_FREE page gets discarded.
3. Compaction -- there may be no memory pressure at all, but the
   MADV_FREE memory is discarded prematurely.
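For case 2, this is roughly what "explicitly stated" looks like from user
space: the application names the exact pages and the node it wants them
on. The following is only a minimal sketch, assuming Linux with the
move_pages(2) system call reachable through syscall(2) (libnuma's
move_pages() wrapper in <numaif.h> is the usual route); move_one_page is a
hypothetical helper, not an existing API. If the page had earlier been
released with MADV_FREE, the proposed change would discard it at this
point instead of moving it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag value from the kernel's <linux/mempolicy.h>; defined here so the
 * sketch does not depend on libnuma's <numaif.h>. */
#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)
#endif

/*
 * Hypothetical helper: ask the kernel to migrate the page containing
 * 'addr' to NUMA node 'node' using the raw move_pages(2) system call.
 * Returns the syscall result: negative on failure (e.g. ENOSYS without
 * CONFIG_NUMA, EPERM under a seccomp filter). When it returns 0,
 * *status holds the page's node or a negative errno for that page.
 */
static long move_one_page(void *addr, int node, int *status)
{
	assert(addr != NULL && status != NULL);
#ifdef SYS_move_pages
	void *pages[1] = { addr };
	int nodes[1] = { node };

	/* pid 0 means the calling process. */
	return syscall(SYS_move_pages, 0, 1UL, pages, nodes, status,
		       MPOL_MF_MOVE);
#else
	(void)node;
	return -1;
#endif
}
```

Called on, say, the address of a local variable with node 0, it either
fails outright or leaves *status at 0 (the page is on node 0) or at a
negative per-page error.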

In the first case, the data is explicitly in use, most likely due to
a write, in which case it's inappropriate to discard it. Discarding and
reallocating a zeroed page is not expected. In the second case, the data is
likely in use or else why would the system call be used? In the third case,
the timing of when MADV_FREE pages disappear is arbitrary, as it can happen
without any actual memory pressure.  This may or may not be problematic, but
it leads to unpredictable latencies for applications that use MADV_FREE
for a quick malloc/free implementation.  Before, as long as there was no
pressure, reusing a MADV_FREE page incurred just a small penalty, but now,
with compaction, it does not matter whether the system avoids memory
pressure, because the application may still incur a fault to allocate and
zero a new page.
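The malloc/free pattern described above can be sketched as follows. This
is a minimal illustration, assuming Linux 4.5 or later (where MADV_FREE
exists) and a private anonymous mapping; madv_free_then_reuse and
enum reuse_outcome are hypothetical names. It cannot show the RFC's
behaviour itself, only the two outcomes an allocator can observe when it
reuses a lazily freed page: the old page survives (cheap), or it was
discarded and the write faults in a fresh zeroed page (the latency
concern). With the RFC, migration such as compaction becomes another way
to reach the second outcome without any memory pressure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names: what an allocator observes when it reuses a page. */
enum reuse_outcome {
	REUSE_UNSUPPORTED = -1, /* mmap/MADV_FREE unavailable (e.g. Linux < 4.5) */
	REUSE_OLD_PAGE = 0,     /* old page and contents kept: cheap reuse */
	REUSE_FRESH_PAGE = 1,   /* page was discarded: fault + zeroed page */
};

static enum reuse_outcome madv_free_then_reuse(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	enum reuse_outcome ret;

	if (pagesz <= 0)
		return REUSE_UNSUPPORTED;
	p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return REUSE_UNSUPPORTED;

	/* malloc(): the application fills the page. */
	memset(p, 0xAA, (size_t)pagesz);

#ifdef MADV_FREE
	/* free(): the kernel may now discard the page, but only lazily. */
	if (madvise(p, (size_t)pagesz, MADV_FREE) != 0) {
		munmap(p, (size_t)pagesz);
		return REUSE_UNSUPPORTED;
	}
#else
	munmap(p, (size_t)pagesz);
	return REUSE_UNSUPPORTED;
#endif

	/*
	 * malloc() again: writing cancels the lazy free if the old page is
	 * still there; if it was discarded, this write faults in a zeroed page.
	 */
	p[0] = 0x55;
	assert(p[0] == 0x55);

	/* An untouched byte reveals which of the two cases happened. */
	assert(p[pagesz - 1] == 0xAA || p[pagesz - 1] == 0x00);
	ret = (p[pagesz - 1] == 0xAA) ? REUSE_OLD_PAGE : REUSE_FRESH_PAGE;

	munmap(p, (size_t)pagesz);
	return ret;
}
```

On an idle machine one would normally expect REUSE_OLD_PAGE, since nothing
reclaims the page between the madvise() and the write; the point of the
objection is that discarding on migration makes REUSE_FRESH_PAGE possible
even then.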

There is a hypothetical fourth case which I only mention because of your
email address. If persistent memory is ever used for tiered memory, then
MADV_FREE pages that would migrate from DRAM to pmem get discarded instead
of migrated. When such a page is reused, it gets reallocated from DRAM
regardless of whether that region is hot or not.  This may lead to an odd
scenario whereby applications occupy DRAM prematurely due to a single
reference to a MADV_FREE page.

It's all subtle enough that we really should have an example application
in mind that benefits so we can weigh the benefits against the potential
risks.

-- 
Mel Gorman
SUSE Labs

