* [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
@ 2018-09-10 23:43 ` Alexander Duyck
From: Alexander Duyck @ 2018-09-10 23:43 UTC (permalink / raw)
To: linux-mm, linux-kernel, linux-nvdimm
Cc: pavel.tatashin, mhocko, dave.jiang, mingo, dave.hansen, jglisse,
akpm, logang, dan.j.williams, kirill.shutemov
From: Alexander Duyck <alexander.h.duyck@intel.com>
The ZONE_DEVICE pages were being initialized in two locations. One was with
the memory_hotplug lock held and another was outside of that lock. The
problem with this is that it was nearly doubling the memory initialization
time. Instead of doing this twice, once while holding a global lock and
once without, I am opting to defer the initialization to the one outside of
the lock. This allows us to avoid serializing the overhead for memory init
and we can instead focus on per-node init times.
One issue I encountered is that devm_memremap_pages and
hmm_devmem_pages_create were initializing only the pgmap field the same
way. One wasn't initializing hmm_data, and the other was initializing it to
a poison value. Since this is something that is exposed to the driver in
the case of hmm I am opting for a third option and just initializing
hmm_data to 0 since this is going to be exposed to unknown third party
drivers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
include/linux/mm.h | 2 +
kernel/memremap.c | 24 +++++---------
mm/hmm.c | 12 ++++---
mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 105 insertions(+), 22 deletions(-)
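As a userspace illustration of the deferral described in the changelog above (the types and the function here — mock_page, mock_pgmap, mock_memmap_init_zone_device — are hypothetical stand-ins, not the kernel's definitions), a single unlocked pass can set the ->pgmap back pointer and zero hmm_data instead of touching each page twice:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct dev_pagemap and struct page. */
struct mock_pgmap { int id; };

struct mock_page {
	struct mock_pgmap *pgmap;	/* back pointer to the device pagemap */
	unsigned long hmm_data;		/* zeroed, since drivers may read it */
	int reserved;			/* stands in for PageReserved */
};

/*
 * One deferred pass over the device pfn range, done after the hotplug
 * lock has been dropped: each page is initialized exactly once.
 */
static void mock_memmap_init_zone_device(struct mock_page *pages,
					 unsigned long nr_pages,
					 struct mock_pgmap *pgmap)
{
	for (unsigned long pfn = 0; pfn < nr_pages; pfn++) {
		struct mock_page *page = &pages[pfn];

		memset(page, 0, sizeof(*page));
		page->reserved = 1;	/* __SetPageReserved() analogue */
		page->pgmap = pgmap;	/* previously a second, locked loop */
		page->hmm_data = 0;	/* explicit zero, per the changelog */
	}
}
```

Calling this once per pagemap mirrors collapsing the two initialization loops into the single unlocked one that the patch introduces.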
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..47b440bb3050 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -848,6 +848,8 @@ static inline bool is_zone_device_page(const struct page *page)
{
return page_zonenum(page) == ZONE_DEVICE;
}
+extern void memmap_init_zone_device(struct zone *, unsigned long,
+ unsigned long, struct dev_pagemap *);
#else
static inline bool is_zone_device_page(const struct page *page)
{
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 5b8600d39931..d0c32e473f82 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -175,10 +175,10 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
struct vmem_altmap *altmap = pgmap->altmap_valid ?
&pgmap->altmap : NULL;
struct resource *res = &pgmap->res;
- unsigned long pfn, pgoff, order;
+ struct dev_pagemap *conflict_pgmap;
pgprot_t pgprot = PAGE_KERNEL;
+ unsigned long pgoff, order;
int error, nid, is_ram;
- struct dev_pagemap *conflict_pgmap;
align_start = res->start & ~(SECTION_SIZE - 1);
align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
@@ -256,19 +256,13 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
if (error)
goto err_add_memory;
- for_each_device_pfn(pfn, pgmap) {
- struct page *page = pfn_to_page(pfn);
-
- /*
- * ZONE_DEVICE pages union ->lru with a ->pgmap back
- * pointer. It is a bug if a ZONE_DEVICE page is ever
- * freed or placed on a driver-private list. Seed the
- * storage with LIST_POISON* values.
- */
- list_del(&page->lru);
- page->pgmap = pgmap;
- percpu_ref_get(pgmap->ref);
- }
+ /*
+ * Initialization of the pages has been deferred until now in order
+ * to allow us to do the work while not holding the hotplug lock.
+ */
+ memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
+ align_start >> PAGE_SHIFT,
+ align_size >> PAGE_SHIFT, pgmap);
devm_add_action(dev, devm_memremap_pages_release, pgmap);
diff --git a/mm/hmm.c b/mm/hmm.c
index c968e49f7a0c..774d684fa2b4 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -1024,7 +1024,6 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
resource_size_t key, align_start, align_size, align_end;
struct device *device = devmem->device;
int ret, nid, is_ram;
- unsigned long pfn;
align_start = devmem->resource->start & ~(PA_SECTION_SIZE - 1);
align_size = ALIGN(devmem->resource->start +
@@ -1109,11 +1108,14 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
align_size >> PAGE_SHIFT, NULL);
mem_hotplug_done();
- for (pfn = devmem->pfn_first; pfn < devmem->pfn_last; pfn++) {
- struct page *page = pfn_to_page(pfn);
+ /*
+ * Initialization of the pages has been deferred until now in order
+ * to allow us to do the work while not holding the hotplug lock.
+ */
+ memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
+ align_start >> PAGE_SHIFT,
+ align_size >> PAGE_SHIFT, &devmem->pagemap);
- page->pgmap = &devmem->pagemap;
- }
return 0;
error_add_memory:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a9b095a72fd9..81a3fd942c45 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5454,6 +5454,83 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
#endif
}
+#ifdef CONFIG_ZONE_DEVICE
+void __ref memmap_init_zone_device(struct zone *zone, unsigned long pfn,
+ unsigned long size,
+ struct dev_pagemap *pgmap)
+{
+ struct pglist_data *pgdat = zone->zone_pgdat;
+ unsigned long zone_idx = zone_idx(zone);
+ unsigned long end_pfn = pfn + size;
+ unsigned long start = jiffies;
+ int nid = pgdat->node_id;
+ unsigned long nr_pages;
+
+ if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
+ return;
+
+ /*
+ * The call to memmap_init_zone should have already taken care
+ * of the pages reserved for the memmap, so we can just jump to
+ * the end of that region and start processing the device pages.
+ */
+ if (pgmap->altmap_valid) {
+ struct vmem_altmap *altmap = &pgmap->altmap;
+
+ pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
+ }
+
+ /* Record the number of pages we are about to initialize */
+ nr_pages = end_pfn - pfn;
+
+ for (; pfn < end_pfn; pfn++) {
+ struct page *page = pfn_to_page(pfn);
+
+ __init_single_page(page, pfn, zone_idx, nid);
+
+ /*
+ * Mark page reserved as it will need to wait for onlining
+ * phase for it to be fully associated with a zone.
+ *
+ * We can use the non-atomic __set_bit operation for setting
+ * the flag as we are still initializing the pages.
+ */
+ __SetPageReserved(page);
+
+ /*
+ * ZONE_DEVICE pages union ->lru with a ->pgmap back
+ * pointer and hmm_data. It is a bug if a ZONE_DEVICE
+ * page is ever freed or placed on a driver-private list.
+ */
+ page->pgmap = pgmap;
+ page->hmm_data = 0;
+
+ /*
+ * Mark the block movable so that blocks are reserved for
+ * movable at startup. This will force kernel allocations
+ * to reserve their blocks rather than leaking throughout
+ * the address space during boot when many long-lived
+ * kernel allocations are made.
+ *
+ * bitmap is created for zone's valid pfn range. but memmap
+ * can be created for invalid pages (for alignment)
+ * check here not to call set_pageblock_migratetype() against
+ * pfn out of zone.
+ *
+ * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
+ * because this is done early in sparse_add_one_section
+ */
+ if (!(pfn & (pageblock_nr_pages - 1))) {
+ set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+ cond_resched();
+ }
+ }
+
+ pr_info("%s initialised, %lu pages in %ums\n", dev_name(pgmap->dev),
+ nr_pages, jiffies_to_msecs(jiffies - start));
+}
+
+#endif
/*
* Initially all pages are reserved - free ones are freed
* up by free_all_bootmem() once the early boot process is
@@ -5477,10 +5554,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
/*
* Honor reservation requested by the driver for this ZONE_DEVICE
- * memory
+ * memory. We limit the total number of pages to initialize to just
+ * those that might contain the memory mapping. We will defer the
+ * ZONE_DEVICE page initialization until after we have released
+ * the hotplug lock.
*/
- if (altmap && start_pfn == altmap->base_pfn)
+ if (altmap && start_pfn == altmap->base_pfn) {
start_pfn += altmap->reserve;
+ end_pfn = altmap->base_pfn +
+ vmem_altmap_offset(altmap);
+ } else if (zone == ZONE_DEVICE) {
+ end_pfn = start_pfn;
+ }
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
/*
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-10 23:43 ` Alexander Duyck
@ 2018-09-11 7:49 ` kbuild test robot
From: kbuild test robot @ 2018-09-11 7:49 UTC (permalink / raw)
To: Alexander Duyck
Cc: pavel.tatashin, mhocko, kirill.shutemov, linux-nvdimm,
dave.hansen, linux-kernel, linux-mm, jglisse, kbuild-all, akpm,
mingo
Hi Alexander,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[also build test WARNING on v4.19-rc3]
[cannot apply to next-20180910]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Alexander-Duyck/Address-issues-slowing-persistent-memory-initialization/20180911-144536
config: x86_64-randconfig-x009-201836 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
In file included from include/asm-generic/bug.h:5:0,
from arch/x86/include/asm/bug.h:83,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:9,
from mm/page_alloc.c:18:
mm/page_alloc.c: In function 'memmap_init_zone':
mm/page_alloc.c:5566:21: error: 'ZONE_DEVICE' undeclared (first use in this function); did you mean 'ZONE_MOVABLE'?
} else if (zone == ZONE_DEVICE) {
^
include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^~~~
>> mm/page_alloc.c:5566:9: note: in expansion of macro 'if'
} else if (zone == ZONE_DEVICE) {
^~
mm/page_alloc.c:5566:21: note: each undeclared identifier is reported only once for each function it appears in
} else if (zone == ZONE_DEVICE) {
^
include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^~~~
>> mm/page_alloc.c:5566:9: note: in expansion of macro 'if'
} else if (zone == ZONE_DEVICE) {
^~
vim +/if +5566 mm/page_alloc.c
5551
5552 if (highest_memmap_pfn < end_pfn - 1)
5553 highest_memmap_pfn = end_pfn - 1;
5554
5555 /*
5556 * Honor reservation requested by the driver for this ZONE_DEVICE
5557 * memory. We limit the total number of pages to initialize to just
5558 * those that might contain the memory mapping. We will defer the
5559 * ZONE_DEVICE page initialization until after we have released
5560 * the hotplug lock.
5561 */
5562 if (altmap && start_pfn == altmap->base_pfn) {
5563 start_pfn += altmap->reserve;
5564 end_pfn = altmap->base_pfn +
5565 vmem_altmap_offset(altmap);
> 5566 } else if (zone == ZONE_DEVICE) {
5567 end_pfn = start_pfn;
5568 }
5569
5570 for (pfn = start_pfn; pfn < end_pfn; pfn++) {
5571 /*
5572 * There can be holes in boot-time mem_map[]s handed to this
5573 * function. They do not exist on hotplugged memory.
5574 */
5575 if (context != MEMMAP_EARLY)
5576 goto not_early;
5577
5578 if (!early_pfn_valid(pfn))
5579 continue;
5580 if (!early_pfn_in_nid(pfn, nid))
5581 continue;
5582 if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
5583 break;
5584
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
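The build failure reported above comes from comparing against an enum member that only exists when CONFIG_ZONE_DEVICE is set. A minimal userspace sketch of that failure mode and the usual guard (the zone_is_device() helper below is illustrative; the kernel's own spellings are is_dev_zone() and #ifdef CONFIG_ZONE_DEVICE blocks):

```c
#include <assert.h>

/*
 * ZONE_DEVICE is only a member of the enum when the config option is
 * defined, so an unguarded `zone == ZONE_DEVICE` comparison fails to
 * compile on !CONFIG_ZONE_DEVICE builds, as in the robot's report.
 */
enum zone_type {
	ZONE_NORMAL,
#ifdef CONFIG_ZONE_DEVICE
	ZONE_DEVICE,
#endif
	MAX_NR_ZONES
};

/* Guarded helper that compiles away when the option is off. */
static int zone_is_device(enum zone_type zone)
{
#ifdef CONFIG_ZONE_DEVICE
	return zone == ZONE_DEVICE;
#else
	(void)zone;
	return 0;
#endif
}
```

Built without -DCONFIG_ZONE_DEVICE, the helper still compiles and simply reports false for every zone, which is the behavior the unguarded comparison in the patch cannot provide.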
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-10 23:43 ` Alexander Duyck
@ 2018-09-11 7:54 ` kbuild test robot
-1 siblings, 0 replies; 55+ messages in thread
From: kbuild test robot @ 2018-09-11 7:54 UTC (permalink / raw)
To: Alexander Duyck
Cc: pavel.tatashin, mhocko, kirill.shutemov, linux-nvdimm,
dave.hansen, linux-kernel, linux-mm, jglisse, kbuild-all, akpm,
mingo
Hi Alexander,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v4.19-rc3]
[cannot apply to next-20180910]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Alexander-Duyck/Address-issues-slowing-persistent-memory-initialization/20180911-144536
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=openrisc
All errors (new ones prefixed by >>):
mm/page_alloc.c: In function 'memmap_init_zone':
>> mm/page_alloc.c:5566:21: error: 'ZONE_DEVICE' undeclared (first use in this function)
} else if (zone == ZONE_DEVICE) {
^~~~~~~~~~~
mm/page_alloc.c:5566:21: note: each undeclared identifier is reported only once for each function it appears in
vim +/ZONE_DEVICE +5566 mm/page_alloc.c
5551
5552 if (highest_memmap_pfn < end_pfn - 1)
5553 highest_memmap_pfn = end_pfn - 1;
5554
5555 /*
5556 * Honor reservation requested by the driver for this ZONE_DEVICE
5557 * memory. We limit the total number of pages to initialize to just
5558 * those that might contain the memory mapping. We will defer the
5559 * ZONE_DEVICE page initialization until after we have released
5560 * the hotplug lock.
5561 */
5562 if (altmap && start_pfn == altmap->base_pfn) {
5563 start_pfn += altmap->reserve;
5564 end_pfn = altmap->base_pfn +
5565 vmem_altmap_offset(altmap);
> 5566 } else if (zone == ZONE_DEVICE) {
5567 end_pfn = start_pfn;
5568 }
5569
5570 for (pfn = start_pfn; pfn < end_pfn; pfn++) {
5571 /*
5572 * There can be holes in boot-time mem_map[]s handed to this
5573 * function. They do not exist on hotplugged memory.
5574 */
5575 if (context != MEMMAP_EARLY)
5576 goto not_early;
5577
5578 if (!early_pfn_valid(pfn))
5579 continue;
5580 if (!early_pfn_in_nid(pfn, nid))
5581 continue;
5582 if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
5583 break;
5584
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-10 23:43 ` Alexander Duyck
@ 2018-09-11 22:35 ` Dan Williams
From: Dan Williams @ 2018-09-11 22:35 UTC (permalink / raw)
To: Alexander Duyck
Cc: pavel.tatashin, Michal Hocko, linux-nvdimm, Dave Hansen,
Linux Kernel Mailing List, Linux MM, Jérôme Glisse,
Andrew Morton, Ingo Molnar, Kirill A. Shutemov
On Mon, Sep 10, 2018 at 4:43 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> The ZONE_DEVICE pages were being initialized in two locations. One was with
> the memory_hotplug lock held and another was outside of that lock. The
> problem with this is that it was nearly doubling the memory initialization
> time. Instead of doing this twice, once while holding a global lock and
> once without, I am opting to defer the initialization to the one outside of
> the lock. This allows us to avoid serializing the overhead for memory init
> and we can instead focus on per-node init times.
>
> One issue I encountered is that devm_memremap_pages and
> hmm_devmem_pages_create were initializing only the pgmap field the same
> way. One wasn't initializing hmm_data, and the other was initializing it to
> a poison value. Since this is something that is exposed to the driver in
> the case of hmm I am opting for a third option and just initializing
> hmm_data to 0 since this is going to be exposed to unknown third party
> drivers.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> include/linux/mm.h | 2 +
> kernel/memremap.c | 24 +++++---------
> mm/hmm.c | 12 ++++---
> mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
Hmm, why mm/page_alloc.c and not kernel/memremap.c for this new
helper? I think that would address the kbuild reports and keeps all
the devm_memremap_pages / ZONE_DEVICE special casing centralized. I
also think it makes sense to move memremap.c to mm/ rather than
kernel/ especially since commit 5981690ddb8f "memremap: split
devm_memremap_pages() and memremap() infrastructure". Arguably, that
commit should have gone ahead with the directory move.
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-11 22:35 ` Dan Williams
@ 2018-09-12 0:51 ` Alexander Duyck
From: Alexander Duyck @ 2018-09-12 0:51 UTC (permalink / raw)
To: dan.j.williams
Cc: pavel.tatashin, Michal Hocko, linux-nvdimm, Dave Hansen, LKML,
linux-mm, jglisse, Andrew Morton, Ingo Molnar,
Kirill A. Shutemov
On Tue, Sep 11, 2018 at 3:35 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Mon, Sep 10, 2018 at 4:43 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > From: Alexander Duyck <alexander.h.duyck@intel.com>
> >
> > The ZONE_DEVICE pages were being initialized in two locations. One was with
> > the memory_hotplug lock held and another was outside of that lock. The
> > problem with this is that it was nearly doubling the memory initialization
> > time. Instead of doing this twice, once while holding a global lock and
> > once without, I am opting to defer the initialization to the one outside of
> > the lock. This allows us to avoid serializing the overhead for memory init
> > and we can instead focus on per-node init times.
> >
> > One issue I encountered is that devm_memremap_pages and
> > hmm_devmem_pages_create were initializing only the pgmap field the same
> > way. One wasn't initializing hmm_data, and the other was initializing it to
> > a poison value. Since this is something that is exposed to the driver in
> > the case of hmm I am opting for a third option and just initializing
> > hmm_data to 0 since this is going to be exposed to unknown third party
> > drivers.
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> > ---
> > include/linux/mm.h | 2 +
> > kernel/memremap.c | 24 +++++---------
> > mm/hmm.c | 12 ++++---
> > mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>
> Hmm, why mm/page_alloc.c and not kernel/memremap.c for this new
> helper? I think that would address the kbuild reports and keeps all
> the devm_memremap_pages / ZONE_DEVICE special casing centralized. I
> also think it makes sense to move memremap.c to mm/ rather than
> kernel/ especially since commit 5981690ddb8f "memremap: split
> devm_memremap_pages() and memremap() infrastructure". Arguably, that
> commit should have gone ahead with the directory move.
The issue ends up being the fact that I would then have to start
exporting infrastructure such as __init_single_page from page_alloc. I
have some follow-up patches I am working on that will generate some
other shared functions that can be used by both memmap_init_zone and
memmap_init_zone_device, as well as pulling in some of the code from
the deferred memory init.
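The layering Alexander describes can be sketched as a userspace analogue (all identifiers below are illustrative, not the kernel's): because the per-page helper is `static` to one translation unit, both initializers must live in that same file unless the helper is exported.

```c
/*
 * Analogue of __init_single_page() being static in mm/page_alloc.c:
 * file-local, so callers must live in the same translation unit.
 */
static void init_single_slot(int *slot, int tag)
{
	*slot = tag;
}

/* Analogue of memmap_init_zone(): the boot-time path. */
void init_zone(int *map, int n)
{
	for (int i = 0; i < n; i++)
		init_single_slot(&map[i], 1);
}

/*
 * Analogue of memmap_init_zone_device(): the deferred ZONE_DEVICE path,
 * reusing the same static helper instead of forcing an export to
 * another file such as kernel/memremap.c.
 */
void init_zone_device(int *map, int n)
{
	for (int i = 0; i < n; i++)
		init_single_slot(&map[i], 2);
}
```

Moving init_zone_device() to another file would require making init_single_slot() external, which is the cost Alexander is avoiding by keeping the new helper in page_alloc.c.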
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-12 0:51 ` Alexander Duyck
@ 2018-09-12 0:59 ` Dan Williams
From: Dan Williams @ 2018-09-12 0:59 UTC (permalink / raw)
To: Alexander Duyck
Cc: pavel.tatashin, Michal Hocko, linux-nvdimm, Dave Hansen, LKML,
linux-mm, Jérôme Glisse, Andrew Morton, Ingo Molnar,
Kirill A. Shutemov
On Tue, Sep 11, 2018 at 5:51 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Tue, Sep 11, 2018 at 3:35 PM Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> On Mon, Sep 10, 2018 at 4:43 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>> >
>> > From: Alexander Duyck <alexander.h.duyck@intel.com>
>> >
>> > The ZONE_DEVICE pages were being initialized in two locations. One was with
>> > the memory_hotplug lock held and another was outside of that lock. The
>> > problem with this is that it was nearly doubling the memory initialization
>> > time. Instead of doing this twice, once while holding a global lock and
>> > once without, I am opting to defer the initialization to the one outside of
>> > the lock. This allows us to avoid serializing the overhead for memory init
>> > and we can instead focus on per-node init times.
>> >
>> > One issue I encountered is that devm_memremap_pages and
> > hmm_devmem_pages_create were initializing only the pgmap field the same
>> > way. One wasn't initializing hmm_data, and the other was initializing it to
>> > a poison value. Since this is something that is exposed to the driver in
>> > the case of hmm I am opting for a third option and just initializing
>> > hmm_data to 0 since this is going to be exposed to unknown third party
>> > drivers.
>> >
>> > Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> > ---
>> > include/linux/mm.h | 2 +
>> > kernel/memremap.c | 24 +++++---------
>> > mm/hmm.c | 12 ++++---
>> > mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>>
>> Hmm, why mm/page_alloc.c and not kernel/memremap.c for this new
>> helper? I think that would address the kbuild reports and keep all
>> the devm_memremap_pages / ZONE_DEVICE special casing centralized. I
>> also think it makes sense to move memremap.c to mm/ rather than
>> kernel/ especially since commit 5981690ddb8f "memremap: split
>> devm_memremap_pages() and memremap() infrastructure". Arguably, that
>> commit should have gone ahead with the directory move.
>
> The issue ends up being the fact that I would then have to start
> exporting infrastructure such as __init_single_page from page_alloc. I
> have some follow-up patches I am working on that will generate some
> other shared functions that can be used by both memmap_init_zone and
> memmap_init_zone_device, as well as pulling in some of the code from
> the deferred memory init.
You wouldn't need to export it, just make it public to mm/ in
mm/internal.h, or a similar local header. With kernel/memremap.c moved
to mm/memremap.c this becomes even easier and better scoped for the
shared symbols.
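For readers following along, the mm/internal.h approach Dan describes would look roughly like the sketch below. The prototype matches what the patch implies for __init_single_page, but the placement is an illustration of the suggestion, not the change that was actually merged:

```c
/* Hypothetical excerpt of mm/internal.h: declarations here are visible
 * only to files under mm/, so __init_single_page could be shared between
 * page_alloc.c and a relocated mm/memremap.c without being exported to
 * modules or to the rest of the kernel. */
void __init_single_page(struct page *page, unsigned long pfn,
			unsigned long zone, int nid);
```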
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-10 23:43 ` Alexander Duyck
@ 2018-09-12 13:59 ` Pasha Tatashin
2018-09-12 15:48 ` Alexander Duyck
-1 siblings, 1 reply; 55+ messages in thread
From: Pasha Tatashin @ 2018-09-12 13:59 UTC (permalink / raw)
To: Alexander Duyck, linux-mm, linux-kernel, linux-nvdimm
Cc: mhocko, dave.jiang, mingo, dave.hansen, jglisse, akpm, logang,
dan.j.williams, kirill.shutemov
Hi Alex,
Please re-base on linux-next, memmap_init_zone() has been updated there
compared to mainline. You might even find a way to unify some parts of
memmap_init_zone and memmap_init_zone_device as memmap_init_zone() is a
lot simpler now.
I think __init_single_page() should stay local to page_alloc.c to keep
the inlining optimization.
I will review this patch once you send an updated version.
Thank you,
Pavel
On 9/10/18 7:43 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> The ZONE_DEVICE pages were being initialized in two locations. One was with
> the memory_hotplug lock held and another was outside of that lock. The
> problem with this is that it was nearly doubling the memory initialization
> time. Instead of doing this twice, once while holding a global lock and
> once without, I am opting to defer the initialization to the one outside of
> the lock. This allows us to avoid serializing the overhead for memory init
> and we can instead focus on per-node init times.
>
> One issue I encountered is that devm_memremap_pages and
> hmm_devmem_pages_create were initializing only the pgmap field the same
> way. One wasn't initializing hmm_data, and the other was initializing it to
> a poison value. Since this is something that is exposed to the driver in
> the case of hmm I am opting for a third option and just initializing
> hmm_data to 0 since this is going to be exposed to unknown third party
> drivers.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> include/linux/mm.h | 2 +
> kernel/memremap.c | 24 +++++---------
> mm/hmm.c | 12 ++++---
> mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 4 files changed, 105 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a61ebe8ad4ca..47b440bb3050 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -848,6 +848,8 @@ static inline bool is_zone_device_page(const struct page *page)
> {
> return page_zonenum(page) == ZONE_DEVICE;
> }
> +extern void memmap_init_zone_device(struct zone *, unsigned long,
> + unsigned long, struct dev_pagemap *);
> #else
> static inline bool is_zone_device_page(const struct page *page)
> {
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index 5b8600d39931..d0c32e473f82 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -175,10 +175,10 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> struct vmem_altmap *altmap = pgmap->altmap_valid ?
> &pgmap->altmap : NULL;
> struct resource *res = &pgmap->res;
> - unsigned long pfn, pgoff, order;
> + struct dev_pagemap *conflict_pgmap;
> pgprot_t pgprot = PAGE_KERNEL;
> + unsigned long pgoff, order;
> int error, nid, is_ram;
> - struct dev_pagemap *conflict_pgmap;
>
> align_start = res->start & ~(SECTION_SIZE - 1);
> align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
> @@ -256,19 +256,13 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> if (error)
> goto err_add_memory;
>
> - for_each_device_pfn(pfn, pgmap) {
> - struct page *page = pfn_to_page(pfn);
> -
> - /*
> - * ZONE_DEVICE pages union ->lru with a ->pgmap back
> - * pointer. It is a bug if a ZONE_DEVICE page is ever
> - * freed or placed on a driver-private list. Seed the
> - * storage with LIST_POISON* values.
> - */
> - list_del(&page->lru);
> - page->pgmap = pgmap;
> - percpu_ref_get(pgmap->ref);
> - }
> + /*
> + * Initialization of the pages has been deferred until now in order
> + * to allow us to do the work while not holding the hotplug lock.
> + */
> + memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
> + align_start >> PAGE_SHIFT,
> + align_size >> PAGE_SHIFT, pgmap);
>
> devm_add_action(dev, devm_memremap_pages_release, pgmap);
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index c968e49f7a0c..774d684fa2b4 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -1024,7 +1024,6 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
> resource_size_t key, align_start, align_size, align_end;
> struct device *device = devmem->device;
> int ret, nid, is_ram;
> - unsigned long pfn;
>
> align_start = devmem->resource->start & ~(PA_SECTION_SIZE - 1);
> align_size = ALIGN(devmem->resource->start +
> @@ -1109,11 +1108,14 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
> align_size >> PAGE_SHIFT, NULL);
> mem_hotplug_done();
>
> - for (pfn = devmem->pfn_first; pfn < devmem->pfn_last; pfn++) {
> - struct page *page = pfn_to_page(pfn);
> + /*
> + * Initialization of the pages has been deferred until now in order
> + * to allow us to do the work while not holding the hotplug lock.
> + */
> + memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
> + align_start >> PAGE_SHIFT,
> + align_size >> PAGE_SHIFT, &devmem->pagemap);
>
> - page->pgmap = &devmem->pagemap;
> - }
> return 0;
>
> error_add_memory:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a9b095a72fd9..81a3fd942c45 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5454,6 +5454,83 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
> #endif
> }
>
> +#ifdef CONFIG_ZONE_DEVICE
> +void __ref memmap_init_zone_device(struct zone *zone, unsigned long pfn,
> + unsigned long size,
> + struct dev_pagemap *pgmap)
> +{
> + struct pglist_data *pgdat = zone->zone_pgdat;
> + unsigned long zone_idx = zone_idx(zone);
> + unsigned long end_pfn = pfn + size;
> + unsigned long start = jiffies;
> + int nid = pgdat->node_id;
> + unsigned long nr_pages;
> +
> + if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
> + return;
> +
> + /*
> + * The call to memmap_init_zone should have already taken care
> + * of the pages reserved for the memmap, so we can just jump to
> + * the end of that region and start processing the device pages.
> + */
> + if (pgmap->altmap_valid) {
> + struct vmem_altmap *altmap = &pgmap->altmap;
> +
> + pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
> + }
> +
> + /* Record the number of pages we are about to initialize */
> + nr_pages = end_pfn - pfn;
> +
> + for (; pfn < end_pfn; pfn++) {
> + struct page *page = pfn_to_page(pfn);
> +
> + __init_single_page(page, pfn, zone_idx, nid);
> +
> + /*
> + * Mark page reserved as it will need to wait for onlining
> + * phase for it to be fully associated with a zone.
> + *
> + * We can use the non-atomic __set_bit operation for setting
> + * the flag as we are still initializing the pages.
> + */
> + __SetPageReserved(page);
> +
> + /*
> + * ZONE_DEVICE pages union ->lru with a ->pgmap back
> + * pointer and hmm_data. It is a bug if a ZONE_DEVICE
> + * page is ever freed or placed on a driver-private list.
> + */
> + page->pgmap = pgmap;
> + page->hmm_data = 0;
> +
> + /*
> + * Mark the block movable so that blocks are reserved for
> + * movable at startup. This will force kernel allocations
> + * to reserve their blocks rather than leaking throughout
> + * the address space during boot when many long-lived
> + * kernel allocations are made.
> + *
> + * bitmap is created for zone's valid pfn range. but memmap
> + * can be created for invalid pages (for alignment)
> + * check here not to call set_pageblock_migratetype() against
> + * pfn out of zone.
> + *
> + * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
> + * because this is done early in sparse_add_one_section
> + */
> + if (!(pfn & (pageblock_nr_pages - 1))) {
> + set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> + cond_resched();
> + }
> + }
> +
> + pr_info("%s initialised, %lu pages in %ums\n", dev_name(pgmap->dev),
> + nr_pages, jiffies_to_msecs(jiffies - start));
> +}
> +
> +#endif
> /*
> * Initially all pages are reserved - free ones are freed
> * up by free_all_bootmem() once the early boot process is
> @@ -5477,10 +5554,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>
> /*
> * Honor reservation requested by the driver for this ZONE_DEVICE
> - * memory
> + * memory. We limit the total number of pages to initialize to just
> + * those that might contain the memory mapping. We will defer the
> + * ZONE_DEVICE page initialization until after we have released
> + * the hotplug lock.
> */
> - if (altmap && start_pfn == altmap->base_pfn)
> + if (altmap && start_pfn == altmap->base_pfn) {
> start_pfn += altmap->reserve;
> + end_pfn = altmap->base_pfn +
> + vmem_altmap_offset(altmap);
> + } else if (zone == ZONE_DEVICE) {
> + end_pfn = start_pfn;
> + }
>
> for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> /*
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-12 13:59 ` Pasha Tatashin
@ 2018-09-12 15:48 ` Alexander Duyck
0 siblings, 0 replies; 55+ messages in thread
From: Alexander Duyck @ 2018-09-12 15:48 UTC (permalink / raw)
To: Pavel.Tatashin
Cc: Michal Hocko, linux-nvdimm, Dave Hansen, LKML, linux-mm, jglisse,
Kirill A. Shutemov, Andrew Morton, Ingo Molnar
On Wed, Sep 12, 2018 at 6:59 AM Pasha Tatashin
<Pavel.Tatashin@microsoft.com> wrote:
>
> Hi Alex,
Hi Pavel,
> Please re-base on linux-next, memmap_init_zone() has been updated there
> compared to mainline. You might even find a way to unify some parts of
> memmap_init_zone and memmap_init_zone_device as memmap_init_zone() is a
> lot simpler now.
This patch applied to the linux-next tree with only a little bit of
fuzz. It looks like it is mostly due to some code you had added above
the function as well. I have updated this patch so that it will apply
to both linux and linux-next by just moving the new function to
underneath memmap_init_zone instead of above it.
> I think __init_single_page() should stay local to page_alloc.c to keep
> the inlining optimization.
I agree. In addition it will make pulling common init together into
one space easier. I would rather not have us create an opportunity for
things to further diverge by making it available for anybody to use.
> I will review this patch once you send an updated version.
Other than moving the new function from being added above versus below
there isn't much else that needs to change, at least for this patch. I
have some follow-up patches I am planning that will be targeted for
linux-next. Those I think will focus more on what you have in mind in
terms of combining this new function with memmap_init_zone.
> Thank you,
> Pavel
Thanks,
- Alex
> On 9/10/18 7:43 PM, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@intel.com>
> >
> > The ZONE_DEVICE pages were being initialized in two locations. One was with
> > the memory_hotplug lock held and another was outside of that lock. The
> > problem with this is that it was nearly doubling the memory initialization
> > time. Instead of doing this twice, once while holding a global lock and
> > once without, I am opting to defer the initialization to the one outside of
> > the lock. This allows us to avoid serializing the overhead for memory init
> > and we can instead focus on per-node init times.
> >
> > One issue I encountered is that devm_memremap_pages and
> > hmm_devmem_pages_create were initializing only the pgmap field the same
> > way. One wasn't initializing hmm_data, and the other was initializing it to
> > a poison value. Since this is something that is exposed to the driver in
> > the case of hmm I am opting for a third option and just initializing
> > hmm_data to 0 since this is going to be exposed to unknown third party
> > drivers.
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> > ---
> > include/linux/mm.h | 2 +
> > kernel/memremap.c | 24 +++++---------
> > mm/hmm.c | 12 ++++---
> > mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> > 4 files changed, 105 insertions(+), 22 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index a61ebe8ad4ca..47b440bb3050 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -848,6 +848,8 @@ static inline bool is_zone_device_page(const struct page *page)
> > {
> > return page_zonenum(page) == ZONE_DEVICE;
> > }
> > +extern void memmap_init_zone_device(struct zone *, unsigned long,
> > + unsigned long, struct dev_pagemap *);
> > #else
> > static inline bool is_zone_device_page(const struct page *page)
> > {
> > diff --git a/kernel/memremap.c b/kernel/memremap.c
> > index 5b8600d39931..d0c32e473f82 100644
> > --- a/kernel/memremap.c
> > +++ b/kernel/memremap.c
> > @@ -175,10 +175,10 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> > struct vmem_altmap *altmap = pgmap->altmap_valid ?
> > &pgmap->altmap : NULL;
> > struct resource *res = &pgmap->res;
> > - unsigned long pfn, pgoff, order;
> > + struct dev_pagemap *conflict_pgmap;
> > pgprot_t pgprot = PAGE_KERNEL;
> > + unsigned long pgoff, order;
> > int error, nid, is_ram;
> > - struct dev_pagemap *conflict_pgmap;
> >
> > align_start = res->start & ~(SECTION_SIZE - 1);
> > align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
> > @@ -256,19 +256,13 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> > if (error)
> > goto err_add_memory;
> >
> > - for_each_device_pfn(pfn, pgmap) {
> > - struct page *page = pfn_to_page(pfn);
> > -
> > - /*
> > - * ZONE_DEVICE pages union ->lru with a ->pgmap back
> > - * pointer. It is a bug if a ZONE_DEVICE page is ever
> > - * freed or placed on a driver-private list. Seed the
> > - * storage with LIST_POISON* values.
> > - */
> > - list_del(&page->lru);
> > - page->pgmap = pgmap;
> > - percpu_ref_get(pgmap->ref);
> > - }
> > + /*
> > + * Initialization of the pages has been deferred until now in order
> > + * to allow us to do the work while not holding the hotplug lock.
> > + */
> > + memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
> > + align_start >> PAGE_SHIFT,
> > + align_size >> PAGE_SHIFT, pgmap);
> >
> > devm_add_action(dev, devm_memremap_pages_release, pgmap);
> >
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index c968e49f7a0c..774d684fa2b4 100644
> > --- a/mm/hmm.c
> > +++ b/mm/hmm.c
> > @@ -1024,7 +1024,6 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
> > resource_size_t key, align_start, align_size, align_end;
> > struct device *device = devmem->device;
> > int ret, nid, is_ram;
> > - unsigned long pfn;
> >
> > align_start = devmem->resource->start & ~(PA_SECTION_SIZE - 1);
> > align_size = ALIGN(devmem->resource->start +
> > @@ -1109,11 +1108,14 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
> > align_size >> PAGE_SHIFT, NULL);
> > mem_hotplug_done();
> >
> > - for (pfn = devmem->pfn_first; pfn < devmem->pfn_last; pfn++) {
> > - struct page *page = pfn_to_page(pfn);
> > + /*
> > + * Initialization of the pages has been deferred until now in order
> > + * to allow us to do the work while not holding the hotplug lock.
> > + */
> > + memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
> > + align_start >> PAGE_SHIFT,
> > + align_size >> PAGE_SHIFT, &devmem->pagemap);
> >
> > - page->pgmap = &devmem->pagemap;
> > - }
> > return 0;
> >
> > error_add_memory:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a9b095a72fd9..81a3fd942c45 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5454,6 +5454,83 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
> > #endif
> > }
> >
> > +#ifdef CONFIG_ZONE_DEVICE
> > +void __ref memmap_init_zone_device(struct zone *zone, unsigned long pfn,
> > + unsigned long size,
> > + struct dev_pagemap *pgmap)
> > +{
> > + struct pglist_data *pgdat = zone->zone_pgdat;
> > + unsigned long zone_idx = zone_idx(zone);
> > + unsigned long end_pfn = pfn + size;
> > + unsigned long start = jiffies;
> > + int nid = pgdat->node_id;
> > + unsigned long nr_pages;
> > +
> > + if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
> > + return;
> > +
> > + /*
> > + * The call to memmap_init_zone should have already taken care
> > + * of the pages reserved for the memmap, so we can just jump to
> > + * the end of that region and start processing the device pages.
> > + */
> > + if (pgmap->altmap_valid) {
> > + struct vmem_altmap *altmap = &pgmap->altmap;
> > +
> > + pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
> > + }
> > +
> > + /* Record the number of pages we are about to initialize */
> > + nr_pages = end_pfn - pfn;
> > +
> > + for (; pfn < end_pfn; pfn++) {
> > + struct page *page = pfn_to_page(pfn);
> > +
> > + __init_single_page(page, pfn, zone_idx, nid);
> > +
> > + /*
> > + * Mark page reserved as it will need to wait for onlining
> > + * phase for it to be fully associated with a zone.
> > + *
> > + * We can use the non-atomic __set_bit operation for setting
> > + * the flag as we are still initializing the pages.
> > + */
> > + __SetPageReserved(page);
> > +
> > + /*
> > + * ZONE_DEVICE pages union ->lru with a ->pgmap back
> > + * pointer and hmm_data. It is a bug if a ZONE_DEVICE
> > + * page is ever freed or placed on a driver-private list.
> > + */
> > + page->pgmap = pgmap;
> > + page->hmm_data = 0;
> > +
> > + /*
> > + * Mark the block movable so that blocks are reserved for
> > + * movable at startup. This will force kernel allocations
> > + * to reserve their blocks rather than leaking throughout
> > + * the address space during boot when many long-lived
> > + * kernel allocations are made.
> > + *
> > + * bitmap is created for zone's valid pfn range. but memmap
> > + * can be created for invalid pages (for alignment)
> > + * check here not to call set_pageblock_migratetype() against
> > + * pfn out of zone.
> > + *
> > + * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
> > + * because this is done early in sparse_add_one_section
> > + */
> > + if (!(pfn & (pageblock_nr_pages - 1))) {
> > + set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> > + cond_resched();
> > + }
> > + }
> > +
> > + pr_info("%s initialised, %lu pages in %ums\n", dev_name(pgmap->dev),
> > + nr_pages, jiffies_to_msecs(jiffies - start));
> > +}
> > +
> > +#endif
> > /*
> > * Initially all pages are reserved - free ones are freed
> > * up by free_all_bootmem() once the early boot process is
> > @@ -5477,10 +5554,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >
> > /*
> > * Honor reservation requested by the driver for this ZONE_DEVICE
> > - * memory
> > + * memory. We limit the total number of pages to initialize to just
> > + * those that might contain the memory mapping. We will defer the
> > + * ZONE_DEVICE page initialization until after we have released
> > + * the hotplug lock.
> > */
> > - if (altmap && start_pfn == altmap->base_pfn)
> > + if (altmap && start_pfn == altmap->base_pfn) {
> > start_pfn += altmap->reserve;
> > + end_pfn = altmap->base_pfn +
> > + vmem_altmap_offset(altmap);
> > + } else if (zone == ZONE_DEVICE) {
> > + end_pfn = start_pfn;
> > + }
> >
> > for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > /*
> >
^ permalink raw reply [flat|nested] 55+ messages in thread
> > + * to reserve their blocks rather than leaking throughout
> > + * the address space during boot when many long-lived
> > + * kernel allocations are made.
> > + *
> > + * bitmap is created for zone's valid pfn range. but memmap
> > + * can be created for invalid pages (for alignment)
> > + * check here not to call set_pageblock_migratetype() against
> > + * pfn out of zone.
> > + *
> > + * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
> > + * because this is done early in sparse_add_one_section
> > + */
> > + if (!(pfn & (pageblock_nr_pages - 1))) {
> > + set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> > + cond_resched();
> > + }
> > + }
> > +
> > + pr_info("%s initialised, %lu pages in %ums\n", dev_name(pgmap->dev),
> > + nr_pages, jiffies_to_msecs(jiffies - start));
> > +}
> > +
> > +#endif
> > /*
> > * Initially all pages are reserved - free ones are freed
> > * up by free_all_bootmem() once the early boot process is
> > @@ -5477,10 +5554,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >
> > /*
> > * Honor reservation requested by the driver for this ZONE_DEVICE
> > - * memory
> > + * memory. We limit the total number of pages to initialize to just
> > + * those that might contain the memory mapping. We will defer the
> > + * ZONE_DEVICE page initialization until after we have released
> > + * the hotplug lock.
> > */
> > - if (altmap && start_pfn == altmap->base_pfn)
> > + if (altmap && start_pfn == altmap->base_pfn) {
> > start_pfn += altmap->reserve;
> > + end_pfn = altmap->base_pfn +
> > + vmem_altmap_offset(altmap);
> > + } else if (zone == ZONE_DEVICE) {
> > + end_pfn = start_pfn;
> > + }
> >
> > for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > /*
> >
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-12 15:48 ` Alexander Duyck
@ 2018-09-12 15:54 ` Pasha Tatashin
2018-09-12 16:44 ` Alexander Duyck
-1 siblings, 1 reply; 55+ messages in thread
From: Pasha Tatashin @ 2018-09-12 15:54 UTC (permalink / raw)
To: Alexander Duyck
Cc: linux-mm, LKML, linux-nvdimm, Michal Hocko, dave.jiang,
Ingo Molnar, Dave Hansen, jglisse, Andrew Morton, logang,
dan.j.williams, Kirill A. Shutemov
On 9/12/18 11:48 AM, Alexander Duyck wrote:
> On Wed, Sep 12, 2018 at 6:59 AM Pasha Tatashin
> <Pavel.Tatashin@microsoft.com> wrote:
>>
>> Hi Alex,
>
> Hi Pavel,
>
>> Please re-base on linux-next, memmap_init_zone() has been updated there
>> compared to mainline. You might even find a way to unify some parts of
>> memmap_init_zone and memmap_init_zone_device as memmap_init_zone() is a
>> lot simpler now.
>
> This patch applied to the linux-next tree with only a little bit of
> fuzz. It looks like it is mostly due to some code you had added above
> the function as well. I have updated this patch so that it will apply
> to both linux and linux-next by just moving the new function to
> underneath memmap_init_zone instead of above it.
>
>> I think __init_single_page() should stay local to page_alloc.c to keep
>> the inlining optimization.
>
> I agree. In addition it will make pulling common init together into
> one space easier. I would rather not have us create an opportunity for
> things to further diverge by making it available for anybody to use.
>
>> I will review this patch once you send an updated version.
>
> Other than moving the new function from being added above versus below
> there isn't much else that needs to change, at least for this patch. I
> have some follow-up patches I am planning that will be targeted for
> linux-next. Those I think will focus more on what you have in mind in
> terms of combining this new function
Hi Alex,
I'd like to see the combining be part of the same series. Maybe this
patch can be pulled from this series and merged with your upcoming
patch series?
Thank you,
Pavel
>
>> Thank you,
>> Pavel
>
> Thanks,
> - Alex
>
>> On 9/10/18 7:43 PM, Alexander Duyck wrote:
>>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>>
>>> The ZONE_DEVICE pages were being initialized in two locations. One was with
>>> the memory_hotplug lock held and another was outside of that lock. The
>>> problem with this is that it was nearly doubling the memory initialization
>>> time. Instead of doing this twice, once while holding a global lock and
>>> once without, I am opting to defer the initialization to the one outside of
>>> the lock. This allows us to avoid serializing the overhead for memory init
>>> and we can instead focus on per-node init times.
>>>
>>> One issue I encountered is that devm_memremap_pages and
>>> hmm_devmem_pages_create were initializing only the pgmap field the same
>>> way. One wasn't initializing hmm_data, and the other was initializing it to
>>> a poison value. Since this is something that is exposed to the driver in
>>> the case of hmm I am opting for a third option and just initializing
>>> hmm_data to 0 since this is going to be exposed to unknown third party
>>> drivers.
>>>
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>> ---
>>> include/linux/mm.h | 2 +
>>> kernel/memremap.c | 24 +++++---------
>>> mm/hmm.c | 12 ++++---
>>> mm/page_alloc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>>> 4 files changed, 105 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index a61ebe8ad4ca..47b440bb3050 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -848,6 +848,8 @@ static inline bool is_zone_device_page(const struct page *page)
>>> {
>>> return page_zonenum(page) == ZONE_DEVICE;
>>> }
>>> +extern void memmap_init_zone_device(struct zone *, unsigned long,
>>> + unsigned long, struct dev_pagemap *);
>>> #else
>>> static inline bool is_zone_device_page(const struct page *page)
>>> {
>>> diff --git a/kernel/memremap.c b/kernel/memremap.c
>>> index 5b8600d39931..d0c32e473f82 100644
>>> --- a/kernel/memremap.c
>>> +++ b/kernel/memremap.c
>>> @@ -175,10 +175,10 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
>>> struct vmem_altmap *altmap = pgmap->altmap_valid ?
>>> &pgmap->altmap : NULL;
>>> struct resource *res = &pgmap->res;
>>> - unsigned long pfn, pgoff, order;
>>> + struct dev_pagemap *conflict_pgmap;
>>> pgprot_t pgprot = PAGE_KERNEL;
>>> + unsigned long pgoff, order;
>>> int error, nid, is_ram;
>>> - struct dev_pagemap *conflict_pgmap;
>>>
>>> align_start = res->start & ~(SECTION_SIZE - 1);
>>> align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
>>> @@ -256,19 +256,13 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
>>> if (error)
>>> goto err_add_memory;
>>>
>>> - for_each_device_pfn(pfn, pgmap) {
>>> - struct page *page = pfn_to_page(pfn);
>>> -
>>> - /*
>>> - * ZONE_DEVICE pages union ->lru with a ->pgmap back
>>> - * pointer. It is a bug if a ZONE_DEVICE page is ever
>>> - * freed or placed on a driver-private list. Seed the
>>> - * storage with LIST_POISON* values.
>>> - */
>>> - list_del(&page->lru);
>>> - page->pgmap = pgmap;
>>> - percpu_ref_get(pgmap->ref);
>>> - }
>>> + /*
>>> + * Initialization of the pages has been deferred until now in order
>>> + * to allow us to do the work while not holding the hotplug lock.
>>> + */
>>> + memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
>>> + align_start >> PAGE_SHIFT,
>>> + align_size >> PAGE_SHIFT, pgmap);
>>>
>>> devm_add_action(dev, devm_memremap_pages_release, pgmap);
>>>
>>> diff --git a/mm/hmm.c b/mm/hmm.c
>>> index c968e49f7a0c..774d684fa2b4 100644
>>> --- a/mm/hmm.c
>>> +++ b/mm/hmm.c
>>> @@ -1024,7 +1024,6 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
>>> resource_size_t key, align_start, align_size, align_end;
>>> struct device *device = devmem->device;
>>> int ret, nid, is_ram;
>>> - unsigned long pfn;
>>>
>>> align_start = devmem->resource->start & ~(PA_SECTION_SIZE - 1);
>>> align_size = ALIGN(devmem->resource->start +
>>> @@ -1109,11 +1108,14 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
>>> align_size >> PAGE_SHIFT, NULL);
>>> mem_hotplug_done();
>>>
>>> - for (pfn = devmem->pfn_first; pfn < devmem->pfn_last; pfn++) {
>>> - struct page *page = pfn_to_page(pfn);
>>> + /*
>>> + * Initialization of the pages has been deferred until now in order
>>> + * to allow us to do the work while not holding the hotplug lock.
>>> + */
>>> + memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
>>> + align_start >> PAGE_SHIFT,
>>> + align_size >> PAGE_SHIFT, &devmem->pagemap);
>>>
>>> - page->pgmap = &devmem->pagemap;
>>> - }
>>> return 0;
>>>
>>> error_add_memory:
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index a9b095a72fd9..81a3fd942c45 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5454,6 +5454,83 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
>>> #endif
>>> }
>>>
>>> +#ifdef CONFIG_ZONE_DEVICE
>>> +void __ref memmap_init_zone_device(struct zone *zone, unsigned long pfn,
>>> + unsigned long size,
>>> + struct dev_pagemap *pgmap)
>>> +{
>>> + struct pglist_data *pgdat = zone->zone_pgdat;
>>> + unsigned long zone_idx = zone_idx(zone);
>>> + unsigned long end_pfn = pfn + size;
>>> + unsigned long start = jiffies;
>>> + int nid = pgdat->node_id;
>>> + unsigned long nr_pages;
>>> +
>>> + if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
>>> + return;
>>> +
>>> + /*
>>> + * The call to memmap_init_zone should have already taken care
>>> + * of the pages reserved for the memmap, so we can just jump to
>>> + * the end of that region and start processing the device pages.
>>> + */
>>> + if (pgmap->altmap_valid) {
>>> + struct vmem_altmap *altmap = &pgmap->altmap;
>>> +
>>> + pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
>>> + }
>>> +
>>> + /* Record the number of pages we are about to initialize */
>>> + nr_pages = end_pfn - pfn;
>>> +
>>> + for (; pfn < end_pfn; pfn++) {
>>> + struct page *page = pfn_to_page(pfn);
>>> +
>>> + __init_single_page(page, pfn, zone_idx, nid);
>>> +
>>> + /*
>>> + * Mark page reserved as it will need to wait for onlining
>>> + * phase for it to be fully associated with a zone.
>>> + *
>>> + * We can use the non-atomic __set_bit operation for setting
>>> + * the flag as we are still initializing the pages.
>>> + */
>>> + __SetPageReserved(page);
>>> +
>>> + /*
>>> + * ZONE_DEVICE pages union ->lru with a ->pgmap back
>>> + * pointer and hmm_data. It is a bug if a ZONE_DEVICE
>>> + * page is ever freed or placed on a driver-private list.
>>> + */
>>> + page->pgmap = pgmap;
>>> + page->hmm_data = 0;
>>> +
>>> + /*
>>> + * Mark the block movable so that blocks are reserved for
>>> + * movable at startup. This will force kernel allocations
>>> + * to reserve their blocks rather than leaking throughout
>>> + * the address space during boot when many long-lived
>>> + * kernel allocations are made.
>>> + *
>>> + * bitmap is created for zone's valid pfn range. but memmap
>>> + * can be created for invalid pages (for alignment)
>>> + * check here not to call set_pageblock_migratetype() against
>>> + * pfn out of zone.
>>> + *
>>> + * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
>>> + * because this is done early in sparse_add_one_section
>>> + */
>>> + if (!(pfn & (pageblock_nr_pages - 1))) {
>>> + set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>>> + cond_resched();
>>> + }
>>> + }
>>> +
>>> + pr_info("%s initialised, %lu pages in %ums\n", dev_name(pgmap->dev),
>>> + nr_pages, jiffies_to_msecs(jiffies - start));
>>> +}
>>> +
>>> +#endif
>>> /*
>>> * Initially all pages are reserved - free ones are freed
>>> * up by free_all_bootmem() once the early boot process is
>>> @@ -5477,10 +5554,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>
>>> /*
>>> * Honor reservation requested by the driver for this ZONE_DEVICE
>>> - * memory
>>> + * memory. We limit the total number of pages to initialize to just
>>> + * those that might contain the memory mapping. We will defer the
>>> + * ZONE_DEVICE page initialization until after we have released
>>> + * the hotplug lock.
>>> */
>>> - if (altmap && start_pfn == altmap->base_pfn)
>>> + if (altmap && start_pfn == altmap->base_pfn) {
>>> start_pfn += altmap->reserve;
>>> + end_pfn = altmap->base_pfn +
>>> + vmem_altmap_offset(altmap);
>>> + } else if (zone == ZONE_DEVICE) {
>>> + end_pfn = start_pfn;
>>> + }
>>>
>>> for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>> /*
>>>
>
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-12 15:54 ` Pasha Tatashin
@ 2018-09-12 16:44 ` Alexander Duyck
0 siblings, 0 replies; 55+ messages in thread
From: Alexander Duyck @ 2018-09-12 16:44 UTC (permalink / raw)
To: Pavel.Tatashin
Cc: Michal Hocko, linux-nvdimm, Dave Hansen, LKML, linux-mm, jglisse,
Kirill A. Shutemov, Andrew Morton, Ingo Molnar
On Wed, Sep 12, 2018 at 8:54 AM Pasha Tatashin
<Pavel.Tatashin@microsoft.com> wrote:
>
>
>
> On 9/12/18 11:48 AM, Alexander Duyck wrote:
> > On Wed, Sep 12, 2018 at 6:59 AM Pasha Tatashin
> > <Pavel.Tatashin@microsoft.com> wrote:
> >>
> >> Hi Alex,
> >
> > Hi Pavel,
> >
> >> Please re-base on linux-next, memmap_init_zone() has been updated there
> >> compared to mainline. You might even find a way to unify some parts of
> >> memmap_init_zone and memmap_init_zone_device as memmap_init_zone() is a
> >> lot simpler now.
> >
> > This patch applied to the linux-next tree with only a little bit of
> > fuzz. It looks like it is mostly due to some code you had added above
> > the function as well. I have updated this patch so that it will apply
> > to both linux and linux-next by just moving the new function to
> > underneath memmap_init_zone instead of above it.
> >
> >> I think __init_single_page() should stay local to page_alloc.c to keep
> >> the inlining optimization.
> >
> > I agree. In addition it will make pulling common init together into
> > one space easier. I would rather not have us create an opportunity for
> > things to further diverge by making it available for anybody to use.
> >
> >> I will review this patch once you send an updated version.
> >
> > Other than moving the new function from being added above versus below
> > there isn't much else that needs to change, at least for this patch. I
> > have some follow-up patches I am planning that will be targeted for
> > linux-next. Those I think will focus more on what you have in mind in
> > terms of combining this new function
>
> Hi Alex,
>
> I'd like to see the combining be part of the same series. Maybe this
> patch can be pulled from this series and merged with your upcoming
> patch series?
>
> Thank you,
> Pavel
The problem is that the issue is somewhat time sensitive, and the
patches I put out in this set need to be easily backported. That is one
of the reasons this patch set is as conservative as it is.
I was hoping to make 4.20 with this patch set at the latest. My
follow-up patches are more of what I would consider 4.21 material, as
they are something we will probably want to give some testing time, and
I figure there will end up being a few revisions. I would probably have
them ready for review in another week or so.
Thanks.
- Alex
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-12 15:48 ` Alexander Duyck
@ 2018-09-12 16:50 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2018-09-12 16:50 UTC (permalink / raw)
To: Alexander Duyck
Cc: Pavel.Tatashin, Michal Hocko, linux-nvdimm, Dave Hansen, LKML,
linux-mm, Jérôme Glisse, Andrew Morton, Ingo Molnar,
Kirill A. Shutemov
On Wed, Sep 12, 2018 at 8:48 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Wed, Sep 12, 2018 at 6:59 AM Pasha Tatashin
> <Pavel.Tatashin@microsoft.com> wrote:
>>
>> Hi Alex,
>
> Hi Pavel,
>
>> Please re-base on linux-next, memmap_init_zone() has been updated there
>> compared to mainline. You might even find a way to unify some parts of
>> memmap_init_zone and memmap_init_zone_device as memmap_init_zone() is a
>> lot simpler now.
>
> This patch applied to the linux-next tree with only a little bit of
> fuzz. It looks like it is mostly due to some code you had added above
> the function as well. I have updated this patch so that it will apply
> to both linux and linux-next by just moving the new function to
> underneath memmap_init_zone instead of above it.
>
>> I think __init_single_page() should stay local to page_alloc.c to keep
>> the inlining optimization.
>
> I agree. In addition it will make pulling common init together into
> one space easier. I would rather not have us create an opportunity for
> things to further diverge by making it available for anybody to use.
I'll buy the inline argument for keeping the new routine in
page_alloc.c, but I otherwise do not see the divergence danger or
"making __init_single_page() available for anybody" given that the
declaration is limited in scope to an mm/-local header file.
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-12 16:50 ` Dan Williams
@ 2018-09-12 17:46 ` Pasha Tatashin
2018-09-12 19:11 ` Dan Williams
-1 siblings, 1 reply; 55+ messages in thread
From: Pasha Tatashin @ 2018-09-12 17:46 UTC (permalink / raw)
To: Dan Williams, Alexander Duyck
Cc: linux-mm, LKML, linux-nvdimm, Michal Hocko, Dave Jiang,
Ingo Molnar, Dave Hansen, Jérôme Glisse, Andrew Morton,
Logan Gunthorpe, Kirill A. Shutemov
On 9/12/18 12:50 PM, Dan Williams wrote:
> On Wed, Sep 12, 2018 at 8:48 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Wed, Sep 12, 2018 at 6:59 AM Pasha Tatashin
>> <Pavel.Tatashin@microsoft.com> wrote:
>>>
>>> Hi Alex,
>>
>> Hi Pavel,
>>
>>> Please re-base on linux-next, memmap_init_zone() has been updated there
>>> compared to mainline. You might even find a way to unify some parts of
>>> memmap_init_zone and memmap_init_zone_device as memmap_init_zone() is a
>>> lot simpler now.
>>
>> This patch applied to the linux-next tree with only a little bit of
>> fuzz. It looks like it is mostly due to some code you had added above
>> the function as well. I have updated this patch so that it will apply
>> to both linux and linux-next by just moving the new function to
>> underneath memmap_init_zone instead of above it.
>>
>>> I think __init_single_page() should stay local to page_alloc.c to keep
>>> the inlining optimization.
>>
>> I agree. In addition it will make pulling common init together into
>> one space easier. I would rather not have us create an opportunity for
>> things to further diverge by making it available for anybody to use.
>
> I'll buy the inline argument for keeping the new routine in
> page_alloc.c, but I otherwise do not see the divergence danger or
> "making __init_single_page() available for anybody" given that the
> declaration is limited in scope to a mm/ local header file.
>
Hi Dan,
It is much harder for the compiler to decide that a function can be
inlined once it is non-static. Of course, we could simply move the
function to a header file and declare it inline to begin with.
But __init_single_page() is so performance sensitive that I'd like to
reduce the number of callers to this function and keep it in a .c file.
Thank you,
Pavel
* Re: [PATCH 3/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
2018-09-12 17:46 ` Pasha Tatashin
@ 2018-09-12 19:11 ` Dan Williams
0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2018-09-12 19:11 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Michal Hocko, linux-nvdimm, Dave Hansen, LKML, linux-mm,
Jérôme Glisse, Andrew Morton, Ingo Molnar,
Kirill A. Shutemov
On Wed, Sep 12, 2018 at 10:46 AM, Pasha Tatashin
<Pavel.Tatashin@microsoft.com> wrote:
>
>
> On 9/12/18 12:50 PM, Dan Williams wrote:
>> On Wed, Sep 12, 2018 at 8:48 AM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Wed, Sep 12, 2018 at 6:59 AM Pasha Tatashin
>>> <Pavel.Tatashin@microsoft.com> wrote:
>>>>
>>>> Hi Alex,
>>>
>>> Hi Pavel,
>>>
>>>> Please re-base on linux-next, memmap_init_zone() has been updated there
>>>> compared to mainline. You might even find a way to unify some parts of
>>>> memmap_init_zone and memmap_init_zone_device as memmap_init_zone() is a
>>>> lot simpler now.
>>>
>>> This patch applied to the linux-next tree with only a little bit of
>>> fuzz. It looks like it is mostly due to some code you had added above
>>> the function as well. I have updated this patch so that it will apply
>>> to both linux and linux-next by just moving the new function to
>>> underneath memmap_init_zone instead of above it.
>>>
>>>> I think __init_single_page() should stay local to page_alloc.c to keep
>>>> the inlining optimization.
>>>
>>> I agree. In addition it will make pulling common init together into
>>> one space easier. I would rather not have us create an opportunity for
>>> things to further diverge by making it available for anybody to use.
>>
>> I'll buy the inline argument for keeping the new routine in
>> page_alloc.c, but I otherwise do not see the divergence danger or
>> "making __init_single_page() available for anybody" given that the
>> declaration is limited in scope to a mm/ local header file.
>>
>
> Hi Dan,
>
> It is much harder for the compiler to decide that a function can be inlined
> once it is non-static. Of course, we can simply move this function to a
> header file, and declare it inline to begin with.
>
> But, still __init_single_page() is so performance sensitive that I'd
> like to reduce the number of callers to this function and keep it in a .c file.
Yes, agree, inline considerations win the day. I was just objecting to
the "make it available for anybody" assertion.