Linux-mm Archive on lore.kernel.org
From: Zi Yan <ziy@nvidia.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Keith Busch <kbusch@kernel.org>,
	Yang Shi <yang.shi@linux.alibaba.com>, <mhocko@suse.com>,
	<mgorman@techsingularity.net>, <riel@surriel.com>,
	<hannes@cmpxchg.org>, <akpm@linux-foundation.org>,
	"Busch, Keith" <keith.busch@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"Wu, Fengguang" <fengguang.wu@intel.com>,
	"Du, Fan" <fan.du@intel.com>,
	"Huang, Ying" <ying.huang@intel.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 06/10] mm: vmscan: demote anon DRAM pages to PMEM node
Date: Wed, 27 Mar 2019 13:37:46 -0700
Message-ID: <6A903D34-A293-4056-B135-6FA227DE1828@nvidia.com> (raw)
In-Reply-To: <3fd20a95-7f2d-f395-73f6-21561eae9912@intel.com>

On 27 Mar 2019, at 11:00, Dave Hansen wrote:

> On 3/27/19 10:48 AM, Zi Yan wrote:
>> For 40MB/s vs 750MB/s, they were using sys_migrate_pages(). Sorry
>> about the confusion there. As I measure only the migrate_pages() in
>> the kernel, the throughput becomes: migrating 4KB page: 0.312GB/s
>> vs migrating 512 4KB pages: 0.854GB/s. They are still >2x
>> difference.
>>
>> Furthermore, if we only consider the migrate_page_copy() in
>> mm/migrate.c, which only calls copy_highpage() and
>> migrate_page_states(), the throughput becomes: migrating 4KB page:
>> 1.385GB/s vs migrating 512 4KB pages: 1.983GB/s. The gap is
>> smaller, but migrating 512 4KB pages still achieves 40% more
>> throughput.
>>
>> Do these numbers make sense to you?
>
> Yes.  It would be very interesting to batch the migrations in the
> kernel and see how it affects the code.  A 50% boost is interesting,
> but not if it's only in microbenchmarks and takes 2k lines of code.
>
> 50% is *very* interesting if it happens in the real world and we can
> do it in 10 lines of code.
>
> So, let's see what the code looks like.

Actually, the migration throughput difference does not come from any kernel
changes. It is a pure comparison between migrate_pages(single 4KB page) and
migrate_pages(a list of 4KB pages). The point I wanted to make is that
Yang’s approach, which migrates a list of pages at the end of shrink_page_list(),
can achieve higher throughput than Keith’s approach, which migrates one page
at a time in the while loop inside shrink_page_list().
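
One way to read the two migrate_pages() figures quoted above (0.312GB/s for
a single 4KB page, 0.854GB/s for a list of 512 4KB pages) is as a fixed
per-call cost plus a per-page cost. The sketch below fits that two-parameter
model to the two data points. The linear model, the helper names
(call_time_us, fit_linear_cost) and the choice of decimal gigabytes are
assumptions made for illustration only, not something measured in this thread.

```python
# Hypothetical cost model (an assumption, not a measurement):
#   time for one migrate_pages() call over n pages = fixed + per_page * n
PAGE_BYTES = 4096
GB = 1e9  # assumption: decimal gigabytes; the email does not say which unit is meant


def call_time_us(pages, throughput_gb_s):
    """Microseconds for one call that migrates `pages` 4KB pages at the given throughput."""
    return pages * PAGE_BYTES / (throughput_gb_s * GB) * 1e6


def fit_linear_cost(single_gb_s, batch_gb_s, batch_pages=512):
    """Solve t(1) and t(batch_pages) for (fixed_us, per_page_us)."""
    t_single = call_time_us(1, single_gb_s)
    t_batch = call_time_us(batch_pages, batch_gb_s)
    per_page_us = (t_batch - t_single) / (batch_pages - 1)
    fixed_us = t_single - per_page_us
    return fixed_us, per_page_us


# Throughput figures taken from the quoted measurements.
fixed_us, per_page_us = fit_linear_cost(0.312, 0.854)
print(f"fixed per-call cost ~{fixed_us:.1f} us, per-page cost ~{per_page_us:.1f} us")
```

Under these assumptions the fit attributes roughly 8 us to each call and
about 5 us to each page, which would be consistent with a list-based
migration amortizing the per-call cost over many pages.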

In addition to the above, migrating a single THP gets us even higher throughput.
Here are the throughput numbers comparing all three cases:
                             |  migrate_pages() |  migrate_page_copy()
migrating single 4KB page:   |  0.312GB/s       |  1.385GB/s
migrating 512 4KB pages:     |  0.854GB/s       |  1.983GB/s
migrating single 2MB THP:    |  2.387GB/s       |  2.481GB/s

Obviously, we would like to migrate THPs as a whole instead of 512 4KB pages
individually. Of course, this assumes we have free space in PMEM for THPs and
all subpages in the THPs are cold.
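
To make the relative gains in the table explicit, here is a small sketch that
divides each row by the single-4KB-page baseline. The numbers are copied from
the table; the first column is taken to be the migrate_pages() measurement
described in the quoted text, and the dictionary layout and the speedups()
helper are just illustrative names.

```python
# Throughput in GB/s, copied from the table above.
throughput = {
    "single 4KB page": {"migrate_pages": 0.312, "migrate_page_copy": 1.385},
    "512 4KB pages": {"migrate_pages": 0.854, "migrate_page_copy": 1.983},
    "single 2MB THP": {"migrate_pages": 2.387, "migrate_page_copy": 2.481},
}


def speedups(column, baseline="single 4KB page"):
    """Ratio of each case's throughput to the baseline case, for one column."""
    base = throughput[baseline][column]
    return {case: row[column] / base for case, row in throughput.items()}


for column in ("migrate_pages", "migrate_page_copy"):
    for case, ratio in speedups(column).items():
        print(f"{column:>18}  {case:<16} {ratio:.2f}x")
```

This puts the 512-page case at about 2.7x and the THP case at about 7.7x the
single-page migrate_pages() throughput. For the copy step alone the ratios
are about 1.4x and 1.8x.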


To batch the migration, I posted some code a while ago: https://lwn.net/Articles/714991/,
which shows good throughput improvement when microbenchmarking sys_migrate_pages().
It also included using multiple threads to copy a page, aggregating multiple
migrate_page_copy() calls, and even using DMA instead of CPUs to copy data. We could
revisit the code if necessary.

In terms of end-to-end results, I also have some results from my paper:
http://www.cs.yale.edu/homes/abhishek/ziyan-asplos19.pdf (Figures 8 to 11 show the
microbenchmark results and Figure 12 shows the end-to-end results). I basically called
shrink_active/inactive_list() every 5 seconds to track page hotness and used all of the page
migration optimizations above, which gave a 40% application runtime speedup on average.
The experiments were done on a two-socket NUMA machine where one node was slowed down to
have 1/2 the bandwidth and 2x the access latency of the other node. I can discuss it
more if you are interested.


--
Best Regards,
Yan Zi
