From: Zi Yan <ziy@nvidia.com>
To: Keith Busch <keith.busch@intel.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-nvdimm@lists.01.org>, Dave Hansen <dave.hansen@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	John Hubbard <jhubbard@nvidia.com>,
	Michal Hocko <mhocko@suse.com>,
	David Nellans <dnellans@nvidia.com>
Subject: Re: [PATCH 0/5] Page demotion for memory reclaim
Date: Thu, 21 Mar 2019 17:12:33 -0700	[thread overview]
Message-ID: <F33CDC43-745B-4555-B8E0-D50D8024C727@nvidia.com> (raw)
In-Reply-To: <20190321223706.GA29817@localhost.localdomain>


<snip>
>> 2. For the demotion path, a common case would be from high-performance memory, like HBM
>> or Multi-Channel DRAM, to DRAM, then to PMEM, and finally to disks, right? More general
>> case for demotion path would be derived from the memory performance description from HMAT[1],
>> right? Do you have any algorithm to form such a path from HMAT?
>
> Yes, I have a PoC for the kernel setting up a demotion path based on
> HMAT properties here:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/kbusch/linux.git/commit/?h=mm-migrate&id=4d007659e1dd1b0dad49514348be4441fbe7cadb
>
> The above is just from an experimental branch.

Got it. Thanks.

>
>> 3. Do you have a plan for promoting pages from lower-level memory to higher-level memory,
>> like from PMEM to DRAM? Will this one-way demotion make all pages sink to PMEM and disk?
>
> Promoting previously demoted pages would require the application do
> something to make that happen if you turn demotion on with this series.
> Kernel auto-promotion is still being investigated, and it's a little
> trickier than reclaim.
>
> If it sinks to disk, though, the next access behavior is the same as
> before, without this series.

This means that, when demotion is on, the path for a page would be DRAM->PMEM->Disk->DRAM->PMEM->….
This could be a starting point.

I actually did something similar here for a two-level heterogeneous memory structure: https://github.com/ysarch-lab/nimble_page_management_asplos_2019/blob/nimble_page_management_4_14_78/mm/memory_manage.c#L401.
What I did, basically, was call shrink_page_list() periodically, so pages get separated into active and
inactive lists. Then, pages on the _inactive_ list of fast memory (like DRAM) are migrated to slow memory
(like PMEM), and pages on the _active_ list of slow memory are migrated to fast memory. It is kind of
abusing the existing page lists. :)
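
For illustration, here is a hedged sketch of that two-way policy (not the actual memory_manage.c code;
isolate_fast_inactive(), isolate_slow_active(), and alloc_page_on_node() are hypothetical helpers standing
in for the LRU-isolation and target-node allocation logic):

static void balance_heterogeneous_nodes(int fast_nid, int slow_nid)
{
        LIST_HEAD(demote_list);
        LIST_HEAD(promote_list);

        /* Cold (inactive) pages on the fast node move down to slow memory... */
        isolate_fast_inactive(fast_nid, &demote_list);
        if (migrate_pages(&demote_list, alloc_page_on_node, NULL,
                        slow_nid, MIGRATE_ASYNC, MR_NUMA_MISPLACED))
                putback_movable_pages(&demote_list);

        /* ...and hot (active) pages on the slow node move up to fast memory. */
        isolate_slow_active(slow_nid, &promote_list);
        if (migrate_pages(&promote_list, alloc_page_on_node, NULL,
                        fast_nid, MIGRATE_ASYNC, MR_NUMA_MISPLACED))
                putback_movable_pages(&promote_list);
}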

My conclusion from those experiments is that you need high-throughput page migration mechanisms,
like multi-threaded page migration, migrating a batch of pages at once (https://github.com/ysarch-lab/nimble_page_management_asplos_2019/blob/nimble_page_management_4_14_78/mm/copy_page.c), and
a new mechanism called exchange pages (https://github.com/ysarch-lab/nimble_page_management_asplos_2019/blob/nimble_page_management_4_14_78/mm/exchange.c), before using page migration to manage multi-level
memory systems becomes worthwhile. Otherwise, the overheads of page migration (TLB shootdowns and other
kernel activities in the migration process) may kill the benefit. Because the performance gap between
DRAM and PMEM is supposed to be smaller than the one between DRAM and disk, the benefit of putting data
in DRAM might not compensate for the cost of migrating cold pages from DRAM to PMEM. In other words,
directly placing data in PMEM once DRAM is full might be better.


>> 4. In your patch 3, you created a new method migrate_demote_mapping() to migrate pages to
>> other memory node, is there any problem of reusing existing migrate_pages() interface?
>
> Yes, we may not want to migrate everything in the shrink_page_list()
> pages. We might want to keep a page, so we have to do those checks first. At
> the point we know we want to attempt migration, the page is already
> locked and not in a list, so it is just easier to directly invoke the
> new __unmap_and_move_locked() that migrate_pages() eventually also calls.

Right, I understand that you want to migrate only small pages to begin with. My question is
why not use the existing migrate_pages() in your patch 3, like:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a5ad0b35ab8e..0a0753af357f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1261,6 +1261,20 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                        ; /* try to reclaim the page below */
                }

+               if (!PageCompound(page)) {
+                       int next_nid = next_migration_node(page);
+                       int err;
+
+                       if (next_nid != TERMINAL_NODE) {
+                               LIST_HEAD(migrate_list);
+                               list_add(&page->lru, &migrate_list);
+                               err = migrate_pages(&migrate_list, alloc_new_node_page, NULL,
+                                       next_nid, MIGRATE_ASYNC, MR_DEMOTION);
+                               if (err)
+                                       putback_movable_pages(&migrate_list);
+                       }
+               }
+
                /*
                 * Anonymous process memory has backing store?
                 * Try to allocate it some swap space here.

I ask because your new migrate_demote_mapping() basically does the same thing as the code above.
If you are not happy with the gfp flags in alloc_new_node_page(), you can just write your own
replacement for alloc_new_node_page(). :)
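
In case it helps, here is a hedged sketch of what such a custom callback could look like, assuming only
base pages are demoted here; the function name and gfp flags are illustrative, not taken from your series:

static struct page *demote_new_node_page(struct page *page, unsigned long node)
{
        /*
         * Illustrative gfp flags: stay on the target node and do not
         * trigger reclaim there, so demotion cannot recurse into more
         * reclaim on the slower node.
         */
        gfp_t gfp = (GFP_HIGHUSER_MOVABLE | __GFP_THISNODE) & ~__GFP_RECLAIM;

        return __alloc_pages_node(node, gfp, 0);
}

That callback could then be passed to migrate_pages() in place of alloc_new_node_page in the hunk above.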

>
>> 5. In addition, you only migrate base pages, is there any performance concern on migrating THPs?
>> Is it too costly to migrate THPs?
>
> It was just easier to consider single pages first, so we let a THP split
> if possible. I'm not sure of the cost in migrating THPs directly.

AFAICT, when migrating the same 2MB of data, migrating one THP is much quicker than migrating
512 4KB pages, because you save 511 TLB shootdowns in the THP migration and copying 2MB of contiguous
data achieves higher throughput than copying individual 4KB pages. But it highly depends on whether
some subpages in a THP are hotter than others; migrating a THP as a whole might sometimes hurt
performance. Just some observations from my own experiments.
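
If you do end up migrating THPs directly, a variant of the callback sketched above could allocate a huge
page on the target node instead of splitting; again a hedged sketch with illustrative names and flags,
not code from the series:

static struct page *demote_new_thp_or_page(struct page *page, unsigned long node)
{
        if (PageTransHuge(page)) {
                struct page *thp;

                /* Illustrative flags: light THP allocation bound to the target node. */
                thp = alloc_pages_node(node, GFP_TRANSHUGE_LIGHT | __GFP_THISNODE,
                                       HPAGE_PMD_ORDER);
                if (!thp)
                        return NULL;
                prep_transhuge_page(thp);
                return thp;
        }
        return __alloc_pages_node(node,
                        (GFP_HIGHUSER_MOVABLE | __GFP_THISNODE) & ~__GFP_RECLAIM, 0);
}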


--
Best Regards,
Yan Zi

