* [PATCH v2 00/14] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE
@ 2018-07-16 17:00 Dan Williams
From: Dan Williams @ 2018-07-16 17:00 UTC
  To: akpm
  Cc: Michal Hocko, jack, Benjamin Herrenschmidt, Heiko Carstens,
	linux-mm, Rich Felker, Paul Mackerras, H. Peter Anvin,
	Christoph Hellwig, Yoshinori Sato, linux-nvdimm, x86,
	Pavel Tatashin, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
	Fenghua Yu, Jérôme Glisse, Thomas Gleixner, Tony Luck,
	linux-kernel, Michael Ellerman, Martin Schwidefsky

Changes since v1 [1]:
* Teach memmap_sync() to take over a subset of memmap initialization in
  the foreground. This foreground work still needs to await the
  completion of vmemmap_populate_hugepages(), but it will otherwise
  steal 1/1024th of the 'struct page' init work for the given range.
  (Jan)
* Add kernel-doc for all the new 'async' structures.
* Split foreach_order_pgoff() to its own patch.
* Add Pavel and Daniel to the cc as they have been active in the memory
  hotplug code.
* Fix a typo that prevented CONFIG_DAX_DRIVER_DEBUG=y from performing
  early pfn retrieval at dax-filesystem mount time.
* Improve some of the changelogs.

[1]: https://lwn.net/Articles/759117/

---

In order to keep pfn_to_page() a simple offset calculation, the 'struct
page' memmap needs to be mapped and initialized in advance of any usage
of a page. This poses a problem for large memory systems as it delays
full availability of memory resources by tens to hundreds of seconds.
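
To make the "simple offset calculation" concrete, here is a minimal
userspace sketch. It is not the kernel's implementation: the struct
layout and the names memmap_range, range_pfn_to_page() and
range_page_to_pfn() are hypothetical and exist only for illustration.
The point it shows is that the lookup is plain pointer arithmetic, which
is only safe if every entry of the array already exists and has been
initialized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for the kernel's 'struct page'. */
struct page {
	unsigned long flags;
	int refcount;
};

/* Hypothetical descriptor for one contiguous pfn range and its memmap. */
struct memmap_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
	struct page *map;	/* one entry per pfn, allocated up front */
};

/* pfn -> page is pointer arithmetic on the preallocated array. */
static inline struct page *range_pfn_to_page(const struct memmap_range *r,
					      unsigned long pfn)
{
	assert(pfn >= r->start_pfn && pfn - r->start_pfn < r->nr_pages);
	return r->map + (pfn - r->start_pfn);
}

/* The inverse is the same arithmetic run backwards. */
static inline unsigned long range_page_to_pfn(const struct memmap_range *r,
					       const struct page *page)
{
	return r->start_pfn + (unsigned long)(page - r->map);
}
```

Because nothing in the lookup checks whether an entry has been set up,
the cost of initializing the whole array is paid before first use.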

For typical 'System RAM' the problem is mitigated by the fact that large
memory allocations tend to happen after the kernel has fully initialized
and userspace services / applications are launched. A small amount of
memory, 2GB, is initialized up front. The remainder is initialized in the
background and freed to the page allocator over time.

Unfortunately, that scheme is not directly reusable for persistent
memory and dax because userspace has visibility into the entire resource
pool and can access any offset directly at its choosing. In other words,
there is no allocator indirection where the kernel can satisfy requests
with arbitrary pages as they become initialized.

That said, we can approximate the optimization by performing the
initialization in the background, allowing the kernel to fully boot the
platform, start up pmem block devices, and mount filesystems in dax
mode, and only incurring delay at the first userspace dax fault. When
that initial fault occurs, the faulting process is delegated a portion
of the memmap to initialize in the foreground so that it need not wait
for initialization of resources that it does not immediately need.
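
The division of labor between the background pass and the first fault
can be sketched in userspace as follows. This is an illustrative model
under stated assumptions, not the code in this series: struct
async_memmap, init_chunk(), memmap_init_background() and
sketch_memmap_sync_pfn() are hypothetical names, the range is split into
1024 equal chunks to mirror the "1/1024th" share mentioned in the
changelog, and a per-page counter stands in for real 'struct page'
initialization. It also leaves out the wait for
vmemmap_populate_hugepages() that the real foreground path still needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CHUNKS 1024UL	/* assumption: mirrors the 1/1024th share */

enum chunk_state { CHUNK_UNINIT, CHUNK_BUSY, CHUNK_DONE };

/* Hypothetical bookkeeping for one range being initialized. */
struct async_memmap {
	unsigned long start_pfn;
	unsigned long nr_pages;		 /* assumed multiple of NR_CHUNKS */
	unsigned char *page_initialized; /* per-page init count (stand-in) */
	atomic_int state[NR_CHUNKS];	 /* zero-initialized == CHUNK_UNINIT */
};

static unsigned long chunk_of(const struct async_memmap *m, unsigned long pfn)
{
	unsigned long per_chunk = m->nr_pages / NR_CHUNKS;

	return (pfn - m->start_pfn) / per_chunk;
}

/* Claim a chunk and initialize it; true if this caller did the work. */
static bool init_chunk(struct async_memmap *m, unsigned long chunk)
{
	unsigned long per_chunk = m->nr_pages / NR_CHUNKS;
	int expected = CHUNK_UNINIT;
	unsigned long i;

	if (!atomic_compare_exchange_strong(&m->state[chunk], &expected,
					    CHUNK_BUSY))
		return false;	/* someone else owns or finished it */
	for (i = chunk * per_chunk; i < (chunk + 1) * per_chunk; i++)
		m->page_initialized[i]++;
	atomic_store(&m->state[chunk], CHUNK_DONE);
	return true;
}

/* Background path: walk every chunk, skipping ones already claimed. */
static void memmap_init_background(struct async_memmap *m)
{
	unsigned long c;

	for (c = 0; c < NR_CHUNKS; c++)
		init_chunk(m, c);
}

/* Foreground path (first fault): only the chunk covering @pfn matters. */
static void sketch_memmap_sync_pfn(struct async_memmap *m, unsigned long pfn)
{
	unsigned long c = chunk_of(m, pfn);

	if (init_chunk(m, c))
		return;		/* stole the work from the background pass */
	while (atomic_load(&m->state[c]) != CHUNK_DONE)
		;		/* busy-wait; a real implementation would sleep */
}
```

In this model the background pass would run in its own thread (or
several, each covering part of the chunk space), and the compare-and-swap
guarantees that each chunk is initialized exactly once regardless of
whether the background pass or a faulting process reaches it first.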

With this change, an 8-socket system was observed to initialize pmem
namespaces in ~4 seconds, whereas it was previously taking ~4 minutes.

These patches apply on top of the HMM + devm_memremap_pages() reworks:

https://marc.info/?l=linux-mm&m=153128668008585&w=2

---

Dan Williams (10):
      mm: Plumb dev_pagemap instead of vmem_altmap to memmap_init_zone()
      mm: Enable asynchronous __add_pages() and vmemmap_populate_hugepages()
      mm: Teach memmap_init_zone() to initialize ZONE_DEVICE pages
      mm: Multithread ZONE_DEVICE initialization
      mm, memremap: Up-level foreach_order_pgoff()
      mm: Allow an external agent to coordinate memmap initialization
      filesystem-dax: Make mount time pfn validation a debug check
      libnvdimm, pmem: Initialize the memmap in the background
      device-dax: Initialize the memmap in the background
      libnvdimm, namespace: Publish page structure init state / control

Huaisheng Ye (4):
      libnvdimm, pmem: Allow a NULL-pfn to ->direct_access()
      tools/testing/nvdimm: Allow a NULL-pfn to ->direct_access()
      s390, dcssblk: Allow a NULL-pfn to ->direct_access()
      filesystem-dax: Do not request a pfn when not required


 arch/ia64/mm/init.c             |    5 +
 arch/powerpc/mm/mem.c           |    5 +
 arch/s390/mm/init.c             |    8 +
 arch/sh/mm/init.c               |    5 +
 arch/x86/mm/init_32.c           |    8 +
 arch/x86/mm/init_64.c           |   27 ++--
 drivers/dax/Kconfig             |   10 +
 drivers/dax/dax-private.h       |    2 
 drivers/dax/device-dax.h        |    2 
 drivers/dax/device.c            |   16 ++
 drivers/dax/pmem.c              |    5 +
 drivers/dax/super.c             |   64 ++++++---
 drivers/nvdimm/nd.h             |    2 
 drivers/nvdimm/pfn_devs.c       |   50 +++++--
 drivers/nvdimm/pmem.c           |   17 ++
 drivers/nvdimm/pmem.h           |    1 
 drivers/s390/block/dcssblk.c    |    5 -
 fs/dax.c                        |   10 -
 include/linux/memmap_async.h    |  110 ++++++++++++++++
 include/linux/memory_hotplug.h  |   18 ++-
 include/linux/memremap.h        |   31 ++++
 include/linux/mm.h              |    8 +
 kernel/memremap.c               |   85 ++++++------
 mm/memory_hotplug.c             |   73 ++++++++---
 mm/page_alloc.c                 |  271 +++++++++++++++++++++++++++++++++++----
 mm/sparse-vmemmap.c             |   56 ++++++--
 tools/testing/nvdimm/pmem-dax.c |   11 +-
 27 files changed, 717 insertions(+), 188 deletions(-)
 create mode 100644 include/linux/memmap_async.h
Thread overview: 28+ messages
2018-07-16 17:00 [PATCH v2 00/14] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE Dan Williams
2018-07-16 17:00 ` [PATCH v2 01/14] mm: Plumb dev_pagemap instead of vmem_altmap to memmap_init_zone() Dan Williams
2018-07-16 17:00 ` [PATCH v2 02/14] mm: Enable asynchronous __add_pages() and vmemmap_populate_hugepages() Dan Williams
2018-07-16 17:00 ` [PATCH v2 03/14] mm: Teach memmap_init_zone() to initialize ZONE_DEVICE pages Dan Williams
2018-07-16 17:00 ` [PATCH v2 04/14] mm: Multithread ZONE_DEVICE initialization Dan Williams
2018-07-16 17:00 ` [PATCH v2 05/14] mm, memremap: Up-level foreach_order_pgoff() Dan Williams
2018-07-16 21:00   ` Matthew Wilcox
2018-07-16 17:00 ` [PATCH v2 06/14] mm: Allow an external agent to coordinate memmap initialization Dan Williams
2018-07-16 17:00 ` [PATCH v2 07/14] libnvdimm, pmem: Allow a NULL-pfn to ->direct_access() Dan Williams
2018-07-16 17:01 ` [PATCH v2 08/14] tools/testing/nvdimm: " Dan Williams
2018-07-16 17:01 ` [PATCH v2 09/14] s390, dcssblk: " Dan Williams
2018-07-16 17:01 ` [PATCH v2 10/14] filesystem-dax: Do not request a pfn when not required Dan Williams
2018-07-16 17:01 ` [PATCH v2 11/14] filesystem-dax: Make mount time pfn validation a debug check Dan Williams
2018-07-16 17:01 ` [PATCH v2 12/14] libnvdimm, pmem: Initialize the memmap in the background Dan Williams
2018-07-16 17:01 ` [PATCH v2 13/14] device-dax: " Dan Williams
2018-07-16 17:01 ` [PATCH v2 14/14] libnvdimm, namespace: Publish page structure init state / control Dan Williams
2018-07-16 19:12 ` [PATCH v2 00/14] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE Pavel Tatashin
2018-07-16 20:30   ` Dan Williams
2018-07-17 14:46     ` Pavel Tatashin
2018-07-17 15:50       ` Michal Hocko
2018-07-17 17:32         ` Dan Williams
2018-07-18 12:05           ` Michal Hocko
2018-07-19 18:41             ` Dave Hansen
2018-07-23 11:09               ` Michal Hocko
2018-07-23 16:15                 ` Dave Hansen
2018-07-24  7:29                   ` Michal Hocko
2018-09-10 19:06                     ` Dan Williams
2018-09-10 19:47                       ` Alexander Duyck
