linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wei Xu <weixugc@google.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Yang Shi <shy828301@gmail.com>,
	David Rientjes <rientjes@google.com>,
	Huang Ying <ying.huang@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH 02/10] mm/numa: automatically generate node migration order
Date: Wed, 14 Apr 2021 21:07:01 -0700	[thread overview]
Message-ID: <CAAPL-u-w_WShb0RyXhs8koihTOPvFK_dCwB22RhzA=f9kRyqqQ@mail.gmail.com> (raw)
In-Reply-To: <20210414080849.GA20886@linux>

On Wed, Apr 14, 2021 at 1:08 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> Hi Wei Xu,
>
> I have some questions about it
>
> Fast class/memory are pictured as those nodes with CPUs, while Slow class/memory
> are PMEM, right?
> Then, what stands for medium class/memory?

That is Dave's example.  I think David's guess makes sense (HBM - fast, DRAM -
medium, PMEM - slow).  It may also be possible that we have DDR5 as fast,
CXL-DDR4 as medium, and CXL-PMEM as slow.  But the most likely use cases for
now should be just two tiers: DRAM vs PMEM or other types of slower
memory devices.

> In Dave's example, list is created in a way that stays local to the socket,
> and we go from the fast one to the slow one.
> In yours, lists are created taking the fastest nodes from all sockets and
> we work our way down, which means have cross-socket nodes in the list.
> How much of a penalty is that?

Cross-socket demotion is certainly more expensive.  But because it is
sequential access
and can also be optimized with non-temporal stores, it may not be much
slower than
demotion to a local node in the next tier.  The actual penalty will
depend on the devices.

> And while I get your point, I am not sure if that is what we pretend here.
> This patchset aims to place cold pages that are about to be reclaim in slower
> nodes to give them a second chance, while your design seems more to have kind
> of different memory clases and be able to place applications in one of those tiers
> depending on its demands or sysadmin-demand.
>
> Could you expand some more?

Sure.  What I have described has the same goal as Dave's patchset,
i,e, to demote
cold pages to the slower nodes when they are about to be reclaimed.  The only
difference is that in my suggestion the demotion target of a fast tier
node is expanded
from a single node to a set of nodes from the slow tier and one node
in such a set
can be marked as the preferred/local demotion target.   This can help
enable more
flexible demotion policies to be configured, such as to allow a cgroup
to allocate from
all fast tier nodes, but only demote to a local slow tier node.  Such
a policy can reduce
memory stranding at the fast tier (compared to if memory hardwall is
used) and still
allow demotion from all fast tier nodes without incurring the expensive random
accesses to the demoted pages if they were demoted to remote slow tier nodes.

I understand that Dave started this patchset with a simplified
demotion path definition,
which I agree.  Meanwhile, I think this more generalized definition of
demotion path
is useful and can also be important for some use cases.

  parent reply	other threads:[~2021-04-15  4:07 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01 18:32 [PATCH 00/10] [v7][RESEND] Migrate Pages in lieu of discard Dave Hansen
2021-04-01 18:32 ` [PATCH 01/10] mm/numa: node demotion data structure and lookup Dave Hansen
2021-04-08  8:03   ` Oscar Salvador
2021-04-08 21:29     ` Dave Hansen
2021-04-09  5:32   ` Wei Xu
2021-04-01 18:32 ` [PATCH 02/10] mm/numa: automatically generate node migration order Dave Hansen
2021-04-08  8:26   ` Oscar Salvador
2021-04-08 21:51     ` Dave Hansen
2021-04-09  8:17       ` Oscar Salvador
2021-04-10  3:07   ` Wei Xu
2021-04-14  8:08     ` Oscar Salvador
2021-04-14  8:11       ` Oscar Salvador
2021-04-14  8:12       ` David Hildenbrand
2021-04-14  8:14         ` Oscar Salvador
2021-04-14  8:20           ` David Hildenbrand
2021-04-15  4:07       ` Wei Xu [this message]
2021-04-15 15:35         ` Dave Hansen
2021-04-15 20:25           ` Wei Xu
2021-04-01 18:32 ` [PATCH 03/10] mm/migrate: update node demotion order during on hotplug events Dave Hansen
2021-04-08  9:52   ` Oscar Salvador
2021-04-09 10:14     ` Oscar Salvador
2021-04-09 10:15       ` Oscar Salvador
2021-04-09 18:59       ` David Hildenbrand
2021-04-12  7:19         ` Oscar Salvador
2021-04-12  9:19           ` David Hildenbrand
2021-04-01 18:32 ` [PATCH 04/10] mm/migrate: make migrate_pages() return nr_succeeded Dave Hansen
2021-04-01 22:39   ` Wei Xu
     [not found]   ` <CAAPL-u-o-M2T25xBtSoipYjUnu+3aJNcJ9uS84yKaAnbrXpefw@mail.gmail.com>
2021-04-01 23:21     ` Dave Hansen
2021-04-08 10:14   ` Oscar Salvador
2021-04-08 17:26     ` Yang Shi
2021-04-08 18:17       ` Oscar Salvador
2021-04-08 18:21         ` Oscar Salvador
2021-04-08 20:40         ` Yang Shi
2021-04-09  5:06           ` Oscar Salvador
2021-04-09  5:43             ` Wei Xu
2021-04-09 15:43             ` Yang Shi
2021-04-09 15:50         ` Dave Hansen
2021-04-09 18:47           ` Wei Xu
2021-04-09 20:10           ` Yang Shi
2021-04-01 18:32 ` [PATCH 05/10] mm/migrate: demote pages during reclaim Dave Hansen
2021-04-01 20:01   ` Yang Shi
2021-04-01 22:58     ` Dave Hansen
2021-04-08 10:47   ` Oscar Salvador
2021-04-10  3:35   ` Wei Xu
2021-04-01 18:32 ` [PATCH 06/10] mm/vmscan: add page demotion counter Dave Hansen
2021-04-10  3:40   ` Wei Xu
2021-04-01 18:32 ` [PATCH 07/10] mm/vmscan: add helper for querying ability to age anonymous pages Dave Hansen
2021-04-07 18:40   ` Wei Xu
2021-04-09  8:31     ` Oscar Salvador
2021-04-01 18:32 ` [PATCH 08/10] mm/vmscan: Consider anonymous pages without swap Dave Hansen
2021-04-02  0:55   ` Wei Xu
2021-04-01 18:32 ` [PATCH 09/10] mm/vmscan: never demote for memcg reclaim Dave Hansen
2021-04-02  0:18   ` Wei Xu
2021-04-01 18:32 ` [PATCH 10/10] mm/migrate: new zone_reclaim_mode to enable reclaim migration Dave Hansen
2021-04-01 20:06   ` Yang Shi
2021-04-10  4:10   ` Wei Xu
2021-04-16 12:35 ` [PATCH 00/10] [v7][RESEND] Migrate Pages in lieu of discard Michal Hocko
2021-04-16 14:26   ` Dave Hansen
2021-04-16 15:02     ` Michal Hocko
2021-04-21  2:39       ` Huang, Ying
2021-05-07  6:14       ` Huang, Ying
2021-06-11  5:50       ` Huang, Ying
  -- strict thread matches above, loose matches on Subject: below --
2021-03-04 23:59 [PATCH 00/10] [v6] " Dave Hansen
2021-03-04 23:59 ` [PATCH 02/10] mm/numa: automatically generate node migration order Dave Hansen
2021-03-08 23:59   ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAAPL-u-w_WShb0RyXhs8koihTOPvFK_dCwB22RhzA=f9kRyqqQ@mail.gmail.com' \
    --to=weixugc@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=osalvador@suse.de \
    --cc=rientjes@google.com \
    --cc=shy828301@gmail.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).