From: Jan Kara <jack@suse.cz>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Michal Hocko" <mhocko@suse.com>, "Jan Kara" <jack@suse.cz>,
    "Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
    "Heiko Carstens" <heiko.carstens@de.ibm.com>, linux-mm@kvack.org,
    "Rich Felker" <dalias@libc.org>, "Paul Mackerras" <paulus@samba.org>,
    "H. Peter Anvin" <hpa@zytor.com>, "Christoph Hellwig" <hch@lst.de>,
    "Yoshinori Sato" <ysato@users.sourceforge.jp>,
    linux-nvdimm@lists.01.org, x86@kernel.org,
    "Ingo Molnar" <mingo@redhat.com>, "Fenghua Yu" <fenghua.yu@intel.com>,
    "Jérôme Glisse" <jglisse@redhat.com>,
    "Thomas Gleixner" <tglx@linutronix.de>,
    "Vlastimil Babka" <vbabka@suse.cz>, "Tony Luck" <tony.luck@intel.com>,
    linux-kernel@vger.kernel.org, "Michael Ellerman" <mpe@ellerman.id.au>,
    "Martin Schwidefsky" <schwidefsky@de.ibm.com>,
    akpm@linux-foundation.org
Subject: Re: [PATCH 00/13] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE
Date: Mon, 9 Jul 2018 14:56:41 +0200
Message-ID: <20180709125641.xpoq66p4r7dzsgyj@quack2.suse.cz>
In-Reply-To: <153077334130.40830.2714147692560185329.stgit@dwillia2-desk3.amr.corp.intel.com>

On Wed 04-07-18 23:49:02, Dan Williams wrote:
> In order to keep pfn_to_page() a simple offset calculation the 'struct
> page' memmap needs to be mapped and initialized in advance of any usage
> of a page. This poses a problem for large memory systems as it delays
> full availability of memory resources for 10s to 100s of seconds.
>
> For typical 'System RAM' the problem is mitigated by the fact that large
> memory allocations tend to happen after the kernel has fully initialized
> and userspace services / applications are launched. A small amount, 2GB
> of memory, is initialized up front. The remainder is initialized in the
> background and freed to the page allocator over time.
>
> Unfortunately, that scheme is not directly reusable for persistent
> memory and dax because userspace has visibility to the entire resource
> pool and can choose to access any offset directly at its choosing. In
> other words there is no allocator indirection where the kernel can
> satisfy requests with arbitrary pages as they become initialized.
>
> That said, we can approximate the optimization by performing the
> initialization in the background, allow the kernel to fully boot the
> platform, start up pmem block devices, mount filesystems in dax mode,
> and only incur the delay at the first userspace dax fault.
>
> With this change an 8 socket system was observed to initialize pmem
> namespaces in ~4 seconds whereas it was previously taking ~4 minutes.
>
> These patches apply on top of the HMM + devm_memremap_pages() reworks
> [1]. Andrew, once the reviews come back, please consider this series for
> -mm as well.
>
> [1]: https://lkml.org/lkml/2018/6/19/108

One question: Why not (in addition to background initialization) have
->direct_access() initialize a block of struct pages around the pfn it
needs if it finds it's not initialized yet? That would make devices usable
immediately without waiting for init to complete...

								Honza

>
> ---
>
> Dan Williams (9):
>       mm: Plumb dev_pagemap instead of vmem_altmap to memmap_init_zone()
>       mm: Enable asynchronous __add_pages() and vmemmap_populate_hugepages()
>       mm: Teach memmap_init_zone() to initialize ZONE_DEVICE pages
>       mm: Multithread ZONE_DEVICE initialization
>       mm: Allow an external agent to wait for memmap initialization
>       filesystem-dax: Make mount time pfn validation a debug check
>       libnvdimm, pmem: Initialize the memmap in the background
>       device-dax: Initialize the memmap in the background
>       libnvdimm, namespace: Publish page structure init state / control
>
> Huaisheng Ye (4):
>       nvdimm/pmem: check the validity of the pointer pfn
>       nvdimm/pmem-dax: check the validity of the pointer pfn
>       s390/block/dcssblk: check the validity of the pointer pfn
>       fs/dax: Assign NULL to pfn of dax_direct_access if useless
>
>
>  arch/ia64/mm/init.c             |    5 +
>  arch/powerpc/mm/mem.c           |    5 +
>  arch/s390/mm/init.c             |    8 +
>  arch/sh/mm/init.c               |    5 +
>  arch/x86/mm/init_32.c           |    8 +
>  arch/x86/mm/init_64.c           |   27 +++--
>  drivers/dax/Kconfig             |   10 ++
>  drivers/dax/dax-private.h       |    2
>  drivers/dax/device-dax.h        |    2
>  drivers/dax/device.c            |   16 +++
>  drivers/dax/pmem.c              |    5 +
>  drivers/dax/super.c             |   64 +++++++-----
>  drivers/nvdimm/nd.h             |    2
>  drivers/nvdimm/pfn_devs.c       |   54 ++++++++--
>  drivers/nvdimm/pmem.c           |   17 ++-
>  drivers/nvdimm/pmem.h           |    1
>  drivers/s390/block/dcssblk.c    |    5 +
>  fs/dax.c                        |   10 +-
>  include/linux/memmap_async.h    |   55 ++++++++++
>  include/linux/memory_hotplug.h  |   18 ++-
>  include/linux/memremap.h        |   31 ++++++
>  include/linux/mm.h              |    8 +
>  kernel/memremap.c               |   85 ++++++++-------
>  mm/memory_hotplug.c             |   73 ++++++++++---
>  mm/page_alloc.c                 |  215 +++++++++++++++++++++++++++++++++------
>  mm/sparse-vmemmap.c             |   56 ++++++++--
>  tools/testing/nvdimm/pmem-dax.c |   11 ++
>  27 files changed, 610 insertions(+), 188 deletions(-)
>  create mode 100644 include/linux/memmap_async.h

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
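To make the question above concrete, here is a rough user-space model of the on-demand idea: a lookup that initializes the aligned block of page structures containing the requested pfn if that block has not been touched yet. This is only a sketch under loud assumptions. Every name in it (fake_page, lookup_page, init_block, the block size of 64 pages) is hypothetical and is not taken from the series, and it ignores the synchronization against the background init threads that a real ->direct_access() implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define NR_PFNS         1024u  /* size of the simulated device, in pages */
#define PAGES_PER_BLOCK   64u  /* assumed granularity of on-demand init */
#define NR_BLOCKS       (NR_PFNS / PAGES_PER_BLOCK)

struct fake_page {
	unsigned long pfn;
	bool initialized;
};

static struct fake_page memmap[NR_PFNS];   /* stands in for the memmap */
static bool block_ready[NR_BLOCKS];        /* one flag per block */
static unsigned int blocks_initialized;    /* counter, for illustration */

/* Initialize every page structure in one aligned block. */
void init_block(size_t block)
{
	size_t start = block * PAGES_PER_BLOCK;

	for (size_t i = start; i < start + PAGES_PER_BLOCK; i++) {
		memmap[i].pfn = i;
		memmap[i].initialized = true;
	}
	block_ready[block] = true;
	blocks_initialized++;
}

/*
 * Model of a direct_access-style lookup: if the block containing @pfn
 * has not been initialized yet, do it now instead of waiting for a
 * background pass to reach it. No locking is modeled here.
 */
struct fake_page *lookup_page(unsigned long pfn)
{
	size_t block;

	assert(pfn < NR_PFNS);
	block = pfn / PAGES_PER_BLOCK;
	if (!block_ready[block])
		init_block(block);
	return &memmap[pfn];
}
```

In this model the first lookup_page() call for a pfn pays for initializing its 64-page block, and later calls for pfns in the same block only pay for the flag check. A background pass could walk the blocks in order and skip any whose flag is already set.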