From: Pavel Tatashin <pasha.tatashin@oracle.com>
To: dan.j.williams@intel.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	tony.luck@intel.com, yehs1@lenovo.com, vishal.l.verma@intel.com,
	jack@suse.cz, willy@infradead.org, dave.jiang@intel.com,
	hpa@zytor.com, tglx@linutronix.de, dalias@libc.org,
	fenghua.yu@intel.com, Daniel Jordan <daniel.m.jordan@oracle.com>,
	ysato@users.sourceforge.jp, benh@kernel.crashing.org,
	Michal Hocko <mhocko@suse.com>,
	paulus@samba.org, hch@lst.de, jglisse@redhat.com,
	mingo@redhat.com, mpe@ellerman.id.au,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	x86@kernel.org, logang@deltatee.com,
	ross.zwisler@linux.intel.com, jmoyer@redhat.com,
	jthumshirn@suse.de, schwidefsky@de.ibm.com,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-nvdimm@lists.01.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 00/14] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE
Date: Mon, 16 Jul 2018 15:12:58 -0400
Message-ID: <CAGM2rea9AwQGaf1JiV_SDDKTKyP_n+dG9Z20gtTZEkuZPFnXFQ@mail.gmail.com>
In-Reply-To: <153176041838.12695.3365448145295112857.stgit@dwillia2-desk3.amr.corp.intel.com>

On Mon, Jul 16, 2018 at 1:10 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Changes since v1 [1]:
> * Teach memmap_sync() to take over a sub-set of memmap initialization in
>   the foreground. This foreground work still needs to await the
>   completion of vmemmap_populate_hugepages(), but it will otherwise
>   steal 1/1024th of the 'struct page' init work for the given range.
>   (Jan)
> * Add kernel-doc for all the new 'async' structures.
> * Split foreach_order_pgoff() to its own patch.
> * Add Pavel and Daniel to the cc as they have been active in the memory
>   hotplug code.
> * Fix a typo that prevented CONFIG_DAX_DRIVER_DEBUG=y from performing
>   early pfn retrieval at dax-filesystem mount time.
> * Improve some of the changelogs
>
> [1]: https://lwn.net/Articles/759117/
>
> ---
>
> In order to keep pfn_to_page() a simple offset calculation the 'struct
> page' memmap needs to be mapped and initialized in advance of any usage
> of a page. This poses a problem for large memory systems as it delays
> full availability of memory resources for 10s to 100s of seconds.
>
> For typical 'System RAM' the problem is mitigated by the fact that large
> memory allocations tend to happen after the kernel has fully initialized
> and userspace services / applications are launched. A small amount, 2GB
> of memory, is initialized up front. The remainder is initialized in the
> background and freed to the page allocator over time.
>
> Unfortunately, that scheme is not directly reusable for persistent
> memory and dax because userspace has visibility to the entire resource
> pool and can choose to access any offset directly at its choosing. In
> other words there is no allocator indirection where the kernel can
> satisfy requests with arbitrary pages as they become initialized.
>
> That said, we can approximate the optimization by performing the
> initialization in the background, allow the kernel to fully boot the
> platform, start up pmem block devices, mount filesystems in dax mode,
> and only incur delay at the first userspace dax fault. When that initial
> fault occurs that process is delegated a portion of the memmap to
> initialize in the foreground so that it need not wait for initialization
> of resources that it does not immediately need.
>
> With this change an 8 socket system was observed to initialize pmem
> namespaces in ~4 seconds whereas it was previously taking ~4 minutes.

Hi Dan,

I am worried that this work adds yet another way to multi-thread struct
page initialization without reusing the existing method. The code is
already a mess, and it leads to bugs [1] because of the number of
different memory layouts, architecture-specific quirks, and different
struct page initialization methods.

So, when DEFERRED_STRUCT_PAGE_INIT is used, we initialize struct pages
on demand until page_alloc_init_late() is called, and at that time we
initialize all the remaining struct pages by calling:

page_alloc_init_late()
  deferred_init_memmap() (a thread per node)
    deferred_init_pages()
      __init_single_page()

This is because memmap_init_zone() is not multi-threaded. However,
this work makes memmap_init_zone() multi-threaded. So, I think we
should really either be using deferred_init_memmap() here, or teach
DEFERRED_STRUCT_PAGE_INIT to use the new multi-threaded
memmap_init_zone(), but not both.
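
For readers unfamiliar with the existing path, the call chain above can
be sketched as a small user-space model. This is not kernel code: every
model_* name, the flat memmap array, and the use of pthreads in place of
kernel threads are assumptions made purely for illustration. It only
shows the shape of "one thread per node, each initializing its own pfn
range".

```c
/* User-space model only; all model_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct model_page {
	int initialized;
	int nid;
};

struct model_node {
	int nid;
	size_t start_pfn;          /* first pfn owned by this node */
	size_t end_pfn;            /* one past the last pfn */
	struct model_page *memmap; /* shared flat array indexed by pfn */
};

/* Stands in for __init_single_page(). */
static void model_init_single_page(struct model_page *page, int nid)
{
	page->nid = nid;
	page->initialized = 1;
}

/* Stands in for deferred_init_pages(): walk one pfn range. */
static void model_deferred_init_pages(struct model_node *node)
{
	for (size_t pfn = node->start_pfn; pfn < node->end_pfn; pfn++)
		model_init_single_page(&node->memmap[pfn], node->nid);
}

/* Stands in for deferred_init_memmap(): the per-node thread body. */
static void *model_deferred_init_memmap(void *arg)
{
	model_deferred_init_pages(arg);
	return NULL;
}

/* Stands in for page_alloc_init_late(): start one thread per node and
 * wait for all of them. Returns 0 on success, -1 on failure. */
int model_page_alloc_init_late(struct model_node *nodes, size_t nr_nodes)
{
	pthread_t *threads;
	size_t started;
	int ret = 0;

	if (nr_nodes == 0)
		return 0;
	threads = calloc(nr_nodes, sizeof(*threads));
	if (!threads)
		return -1;
	for (started = 0; started < nr_nodes; started++) {
		if (pthread_create(&threads[started], NULL,
				   model_deferred_init_memmap,
				   &nodes[started])) {
			ret = -1;
			break;
		}
	}
	for (size_t i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	free(threads);
	return ret;
}

/* Two nodes of four pages each; returns the number of initialized pages. */
size_t model_deferred_demo(void)
{
	struct model_page memmap[8] = {0};
	struct model_node nodes[2] = {
		{ .nid = 0, .start_pfn = 0, .end_pfn = 4, .memmap = memmap },
		{ .nid = 1, .start_pfn = 4, .end_pfn = 8, .memmap = memmap },
	};
	size_t count = 0;

	if (model_page_alloc_init_late(nodes, 2))
		return 0;
	for (size_t pfn = 0; pfn < 8; pfn++) {
		assert(memmap[pfn].nid == (pfn < 4 ? 0 : 1));
		count += (size_t)memmap[pfn].initialized;
	}
	return count;
}
```

Because the per-node pfn ranges in this model do not overlap, the
threads need no locking on the shared array.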

I am planning to study the memmap layouts and figure out how we can
reduce their number or merge some of the code. I'd also like to
simplify memmap_init_zone() by at least splitting it into two
functions: one that handles the boot case, and another that handles
the hotplug case, as those are substantially different and make
memmap_init_zone() more complicated than needed.
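
A rough sketch of what such a split could look like, again as a
user-space model rather than a patch. It assumes, only for
illustration, that the sole difference between the two cases is a
boot-time check that skips holes. The model_* names and the hole
pattern are hypothetical, and the real function handles considerably
more than this.

```c
/* User-space model only; all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum model_context { MODEL_EARLY, MODEL_HOTPLUG };

struct model_page {
	bool initialized;
};

/* Hypothetical boot-only check: pretend every fourth pfn is a hole. */
static bool model_early_pfn_valid(size_t pfn)
{
	return pfn % 4 != 3;
}

/* Combined form: one loop with the context test interleaved. */
size_t model_memmap_init_zone(struct model_page *map, size_t start,
			      size_t end, enum model_context ctx)
{
	size_t n = 0;

	for (size_t pfn = start; pfn < end; pfn++) {
		if (ctx == MODEL_EARLY && !model_early_pfn_valid(pfn))
			continue;
		map[pfn].initialized = true;
		n++;
	}
	return n;
}

/* Split form, boot case: keeps the hole check, no context argument. */
size_t model_memmap_init_boot(struct model_page *map, size_t start,
			      size_t end)
{
	size_t n = 0;

	for (size_t pfn = start; pfn < end; pfn++) {
		if (!model_early_pfn_valid(pfn))
			continue;
		map[pfn].initialized = true;
		n++;
	}
	return n;
}

/* Split form, hotplug case: every pfn in the range is initialized. */
size_t model_memmap_init_hotplug(struct model_page *map, size_t start,
				 size_t end)
{
	for (size_t pfn = start; pfn < end; pfn++)
		map[pfn].initialized = true;
	return end > start ? end - start : 0;
}

/* Returns 1 if the split functions behave like the combined one. */
int model_split_demo(void)
{
	struct model_page a[8], b[8];

	memset(a, 0, sizeof(a));
	memset(b, 0, sizeof(b));
	if (model_memmap_init_zone(a, 0, 8, MODEL_EARLY) !=
	    model_memmap_init_boot(b, 0, 8) ||
	    memcmp(a, b, sizeof(a)) != 0)
		return 0;

	memset(a, 0, sizeof(a));
	memset(b, 0, sizeof(b));
	if (model_memmap_init_zone(a, 0, 8, MODEL_HOTPLUG) !=
	    model_memmap_init_hotplug(b, 0, 8) ||
	    memcmp(a, b, sizeof(a)) != 0)
		return 0;

	return 1;
}
```

The point of the split in this model is that neither function needs a
context argument, so each loop body contains only the checks that apply
to its own case.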

Thank you,
Pavel

[1] https://www.spinics.net/lists/linux-mm/msg157271.html

>
> These patches apply on top of the HMM + devm_memremap_pages() reworks:

