linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: mgorman@techsingularity.net, riel@surriel.com,
	hannes@cmpxchg.org, akpm@linux-foundation.org,
	dave.hansen@intel.com, keith.busch@intel.com,
	dan.j.williams@intel.com, fengguang.wu@intel.com,
	fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node
Date: Wed, 17 Apr 2019 11:17:48 +0200	[thread overview]
Message-ID: <20190417091748.GF655@dhcp22.suse.cz> (raw)
In-Reply-To: <876768ad-a63a-99c3-59de-458403f008c4@linux.alibaba.com>

On Tue 16-04-19 12:19:21, Yang Shi wrote:
> 
> 
> On 4/16/19 12:47 AM, Michal Hocko wrote:
[...]
> > Why cannot we simply demote in the proximity order? Why do you make
> > cpuless nodes so special? If other close nodes are vacant then just use
> > them.
> 
> We could. But, this raises another question, would we prefer to just demote
> to the next fallback node (just try once), if it is contended, then just
> swap (i.e. DRAM0 -> PMEM0 -> Swap); or would we prefer to try all the nodes
> in the fallback order to find the first less contended one (i.e. DRAM0 ->
> PMEM0 -> DRAM1 -> PMEM1 -> Swap)?

I would go with the later. Why, because it is more natural. Because that
is the natural allocation path so I do not see why this shouldn't be the
natural demotion path.

> 
> |------|     |------| |------|        |------|
> |PMEM0|---|DRAM0| --- CPU0 --- CPU1 --- |DRAM1| --- |PMEM1|
> |------|     |------| |------|       |------|
> 
> The first one sounds simpler, and the current implementation does so and
> this needs find out the closest PMEM node by recognizing cpuless node.

Unless you are specifying an explicit nodemask then the allocator will
do the allocation fallback for the migration target for you.

> If we prefer go with the second option, it is definitely unnecessary to
> specialize any node.
> 
> > > > I would expect that the very first attempt wouldn't do much more than
> > > > migrate to-be-reclaimed pages (without an explicit binding) with a
> > > Do you mean respect mempolicy or cpuset when doing demotion? I was wondering
> > > this, but I didn't do so in the current implementation since it may need
> > > walk the rmap to retrieve the mempolicy in the reclaim path. Is there any
> > > easier way to do so?
> > You definitely have to follow policy. You cannot demote to a node which
> > is outside of the cpuset/mempolicy because you are breaking contract
> > expected by the userspace. That implies doing a rmap walk.
> 
> OK, however, this may prevent from demoting unmapped page cache since there
> is no way to find those pages' policy.

I do not really expect that hard numa binding for the page cache is a
usecase we really have to lose sleep over for now.

> And, we have to think about what we should do when the demotion target has
> conflict with the mempolicy.

Simply skip it.

> The easiest way is to just skip those conflict
> pages in demotion. Or we may have to do the demotion one page by one page
> instead of migrating a list of pages.

Yes one page at the time sounds reasonable to me. THis is how we do
reclaim anyway.
-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2019-04-17  9:17 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-11  3:56 [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node Yang Shi
2019-04-11  3:56 ` [v2 PATCH 1/9] mm: define N_CPU_MEM node states Yang Shi
2019-04-11  3:56 ` [v2 PATCH 2/9] mm: page_alloc: make find_next_best_node find return cpuless node Yang Shi
2019-04-11  3:56 ` [v2 PATCH 3/9] mm: numa: promote pages to DRAM when it gets accessed twice Yang Shi
2019-04-11  3:56 ` [v2 PATCH 4/9] mm: migrate: make migrate_pages() return nr_succeeded Yang Shi
2019-04-11  3:56 ` [v2 PATCH 5/9] mm: vmscan: demote anon DRAM pages to PMEM node Yang Shi
2019-04-11 14:31   ` Dave Hansen
2019-04-15 22:10     ` Yang Shi
2019-04-15 22:14       ` Dave Hansen
2019-04-15 22:26         ` Yang Shi
2019-04-11  3:56 ` [v2 PATCH 6/9] mm: vmscan: don't demote for memcg reclaim Yang Shi
2019-04-11  3:56 ` [v2 PATCH 7/9] mm: vmscan: check if the demote target node is contended or not Yang Shi
2019-04-11 16:06   ` Dave Hansen
2019-04-15 22:06     ` Yang Shi
2019-04-15 22:13       ` Dave Hansen
2019-04-15 22:23         ` Yang Shi
2019-04-11  3:56 ` [v2 PATCH 8/9] mm: vmscan: add page demotion counter Yang Shi
2019-04-11  3:56 ` [v2 PATCH 9/9] mm: numa: add page promotion counter Yang Shi
2019-04-11 14:28 ` [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node Dave Hansen
2019-04-12  8:47 ` Michal Hocko
2019-04-16  0:09   ` Yang Shi
2019-04-16  7:47     ` Michal Hocko
2019-04-16 14:30       ` Dave Hansen
2019-04-16 14:39         ` Michal Hocko
2019-04-16 15:46           ` Dave Hansen
2019-04-16 18:34             ` Michal Hocko
2019-04-16 15:33         ` Zi Yan
2019-04-16 15:55           ` Dave Hansen
2019-04-16 16:12             ` Zi Yan
2019-04-16 19:19       ` Yang Shi
2019-04-16 21:22         ` Dave Hansen
2019-04-16 21:59           ` Yang Shi
2019-04-16 23:04             ` Dave Hansen
2019-04-16 23:17               ` Yang Shi
2019-04-17 15:13                 ` Keith Busch
2019-04-17  9:23           ` Michal Hocko
2019-04-17 15:23             ` Keith Busch
2019-04-17 15:39               ` Michal Hocko
2019-04-17 15:37                 ` Keith Busch
2019-04-17 16:39                   ` Michal Hocko
2019-04-17 17:26                     ` Yang Shi
2019-04-17 17:29                       ` Keith Busch
2019-04-17 17:51                       ` Michal Hocko
2019-04-18 16:24                         ` Yang Shi
2019-04-17 17:13             ` Dave Hansen
2019-04-17 17:57               ` Michal Hocko
2019-04-18 18:16               ` Keith Busch
2019-04-18 19:23                 ` Yang Shi
2019-04-18 21:07                   ` Zi Yan
2019-04-16 23:18         ` Yang Shi
2019-04-17  9:17         ` Michal Hocko [this message]
2019-05-01  6:43           ` Fengguang Wu
2019-04-17 20:43         ` Yang Shi
2019-04-18  9:02           ` Michal Hocko
2019-05-01  5:20             ` Fengguang Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190417091748.GF655@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=fan.du@intel.com \
    --cc=fengguang.wu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=riel@surriel.com \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ying.huang@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).