* [v1 0/5] parallelized "struct page" zeroing
@ 2017-03-23 23:01 Pavel Tatashin
2017-03-23 23:01 ` [v1 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
When the deferred struct page initialization feature is enabled, we get a
performance gain from initializing the vmemmap in parallel after the other
CPUs are started. However, we still zero the memory for the vmemmap using
one boot CPU. This patch set removes the memset-zeroing limitation by
deferring it as well.
Here is example performance gain on SPARC with 32T:
base
https://hastebin.com/ozanelatat.go
fix
https://hastebin.com/utonawukof.go
As you can see, without the fix it takes 97.89s to boot;
with the fix it takes 46.91s.
On x86 the time saving will be even greater (proportional to memory size)
because there are twice as many "struct page"s for the same amount of memory,
as base pages are half the size.
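To illustrate the idea outside the kernel, here is a minimal userspace sketch (hypothetical, simplified names; malloc() stands in for the memblock allocator): the backing memory for "struct page" is allocated without zeroing, and each entry is zeroed only when it is initialized, which is the step that already runs on many CPUs in parallel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins; not the kernel's real definitions. */
struct page {
	unsigned long flags;
	void *mapping;
};

/* Models a raw (non-zeroing) boot allocation: no memset() of the block. */
static struct page *alloc_vmemmap_raw(size_t nr)
{
	return malloc(nr * sizeof(struct page));
}

/* Models per-page initialization with deferred zeroing: the entry is
 * zeroed here, on whichever CPU happens to be initializing it. */
static void init_single_page(struct page *page)
{
	memset(page, 0, sizeof(*page));
	page->flags = 1UL;	/* stand-in for set_page_links() etc. */
}
```

With deferred initialization, the init_single_page() loop runs on the non-boot CPUs, so the memset() cost moves off the boot CPU together with the rest of the initialization.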
Pavel Tatashin (5):
sparc64: simplify vmemmap_populate
mm: defining memblock_virt_alloc_try_nid_raw
mm: add "zero" argument to vmemmap allocators
mm: zero struct pages during initialization
mm: teach platforms not to zero struct pages memory
arch/powerpc/mm/init_64.c | 4 +-
arch/s390/mm/vmem.c | 5 ++-
arch/sparc/mm/init_64.c | 26 +++++++----------------
arch/x86/mm/init_64.c | 3 +-
include/linux/bootmem.h | 3 ++
include/linux/mm.h | 15 +++++++++++--
mm/memblock.c | 46 ++++++++++++++++++++++++++++++++++++------
mm/page_alloc.c | 3 ++
mm/sparse-vmemmap.c | 48 +++++++++++++++++++++++++++++---------------
9 files changed, 103 insertions(+), 50 deletions(-)
* [v1 1/5] sparc64: simplify vmemmap_populate
2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
2017-03-23 23:01 ` [v1 2/5] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
Remove duplicated code by using the common functions
vmemmap_pgd_populate() and vmemmap_pud_populate().
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
arch/sparc/mm/init_64.c | 23 ++++++-----------------
1 files changed, 6 insertions(+), 17 deletions(-)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 2c0cb2a..01eccab 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2526,30 +2526,19 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
vstart = vstart & PMD_MASK;
vend = ALIGN(vend, PMD_SIZE);
for (; vstart < vend; vstart += PMD_SIZE) {
- pgd_t *pgd = pgd_offset_k(vstart);
+ pgd_t *pgd = vmemmap_pgd_populate(vstart, node);
unsigned long pte;
pud_t *pud;
pmd_t *pmd;
- if (pgd_none(*pgd)) {
- pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
+ if (!pgd)
+ return -ENOMEM;
- if (!new)
- return -ENOMEM;
- pgd_populate(&init_mm, pgd, new);
- }
-
- pud = pud_offset(pgd, vstart);
- if (pud_none(*pud)) {
- pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
-
- if (!new)
- return -ENOMEM;
- pud_populate(&init_mm, pud, new);
- }
+ pud = vmemmap_pud_populate(pgd, vstart, node);
+ if (!pud)
+ return -ENOMEM;
pmd = pmd_offset(pud, vstart);
-
pte = pmd_val(*pmd);
if (!(pte & _PAGE_VALID)) {
void *block = vmemmap_alloc_block(PMD_SIZE, node);
--
1.7.1
* [v1 2/5] mm: defining memblock_virt_alloc_try_nid_raw
2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
2017-03-23 23:01 ` [v1 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
2017-03-23 23:01 ` [v1 3/5] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
A new version of the memblock_virt_alloc_* allocations that:
- does not zero the allocated memory
- does not panic if the request cannot be satisfied
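A minimal userspace sketch of the resulting allocator family (hypothetical names; malloc() stands in for the memblock region search, and the panicking variant is omitted). The point is that one internal allocator carries a new zero flag, and the public wrappers differ only in how they set it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of memblock_virt_alloc_internal() after this patch:
 * a single internal allocator, with zeroing controlled by the flag. */
static void *virt_alloc_internal(size_t size, bool zero)
{
	void *ptr = malloc(size);	/* stands in for the memblock search */

	if (!ptr)
		return NULL;
	if (zero)
		memset(ptr, 0, size);
	return ptr;
}

/* _raw variant: no zeroing, NULL (not panic) on failure. */
static void *virt_alloc_raw(size_t size)
{
	return virt_alloc_internal(size, false);
}

/* _nopanic variant: zeroed, NULL on failure (existing behavior). */
static void *virt_alloc_nopanic(size_t size)
{
	return virt_alloc_internal(size, true);
}
```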
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
include/linux/bootmem.h | 3 +++
mm/memblock.c | 46 +++++++++++++++++++++++++++++++++++++++-------
2 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index dbaf312..b61ea10 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -160,6 +160,9 @@ extern int reserve_bootmem_node(pg_data_t *pgdat,
#define BOOTMEM_ALLOC_ANYWHERE (~(phys_addr_t)0)
/* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
+void *memblock_virt_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
+ phys_addr_t min_addr,
+ phys_addr_t max_addr, int nid);
void *memblock_virt_alloc_try_nid_nopanic(phys_addr_t size,
phys_addr_t align, phys_addr_t min_addr,
phys_addr_t max_addr, int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index 696f06d..7fdc555 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1271,7 +1271,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
static void * __init memblock_virt_alloc_internal(
phys_addr_t size, phys_addr_t align,
phys_addr_t min_addr, phys_addr_t max_addr,
- int nid)
+ int nid, bool zero)
{
phys_addr_t alloc;
void *ptr;
@@ -1322,7 +1322,8 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
return NULL;
done:
ptr = phys_to_virt(alloc);
- memset(ptr, 0, size);
+ if (zero)
+ memset(ptr, 0, size);
/*
* The min_count is set to 0 so that bootmem allocated blocks
@@ -1336,6 +1337,37 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
}
/**
+ * memblock_virt_alloc_try_nid_raw - allocate boot memory block without zeroing
+ * memory and without panicking
+ * @size: size of memory block to be allocated in bytes
+ * @align: alignment of the region and block's size
+ * @min_addr: the lower bound of the memory region from where the allocation
+ * is preferred (phys address)
+ * @max_addr: the upper bound of the memory region from where the allocation
+ * is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to
+ * allocate only from memory limited by memblock.current_limit value
+ * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ *
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. Does not zero allocated memory, does not panic if request
+ * cannot be satisfied.
+ *
+ * RETURNS:
+ * Virtual address of allocated memory block on success, NULL on failure.
+ */
+void * __init memblock_virt_alloc_try_nid_raw(
+ phys_addr_t size, phys_addr_t align,
+ phys_addr_t min_addr, phys_addr_t max_addr,
+ int nid)
+{
+ memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
+ __func__, (u64)size, (u64)align, nid, (u64)min_addr,
+ (u64)max_addr, (void *)_RET_IP_);
+ return memblock_virt_alloc_internal(size, align,
+ min_addr, max_addr, nid, false);
+}
+
+/**
* memblock_virt_alloc_try_nid_nopanic - allocate boot memory block
* @size: size of memory block to be allocated in bytes
* @align: alignment of the region and block's size
@@ -1346,8 +1378,8 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
* allocate only from memory limited by memblock.current_limit value
* @nid: nid of the free area to find, %NUMA_NO_NODE for any node
*
- * Public version of _memblock_virt_alloc_try_nid_nopanic() which provides
- * additional debug information (including caller info), if enabled.
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. This function zeroes the allocated memory.
*
* RETURNS:
* Virtual address of allocated memory block on success, NULL on failure.
@@ -1361,7 +1393,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
__func__, (u64)size, (u64)align, nid, (u64)min_addr,
(u64)max_addr, (void *)_RET_IP_);
return memblock_virt_alloc_internal(size, align, min_addr,
- max_addr, nid);
+ max_addr, nid, true);
}
/**
@@ -1375,7 +1407,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
* allocate only from memory limited by memblock.current_limit value
* @nid: nid of the free area to find, %NUMA_NO_NODE for any node
*
- * Public panicking version of _memblock_virt_alloc_try_nid_nopanic()
+ * Public panicking version of memblock_virt_alloc_try_nid_nopanic()
* which provides debug information (including caller info), if enabled,
* and panics if the request can not be satisfied.
*
@@ -1393,7 +1425,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
__func__, (u64)size, (u64)align, nid, (u64)min_addr,
(u64)max_addr, (void *)_RET_IP_);
ptr = memblock_virt_alloc_internal(size, align,
- min_addr, max_addr, nid);
+ min_addr, max_addr, nid, true);
if (ptr)
return ptr;
--
1.7.1
* [v1 3/5] mm: add "zero" argument to vmemmap allocators
2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
2017-03-23 23:01 ` [v1 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
2017-03-23 23:01 ` [v1 2/5] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
2017-03-23 23:01 ` [v1 4/5] mm: zero struct pages during initialization Pavel Tatashin
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
Allow clients to request non-zeroed memory from the vmemmap allocator.
The following two public functions gain a new boolean argument called zero:
__vmemmap_alloc_block_buf()
vmemmap_alloc_block()
When zero is true, memory allocated by the memblock allocator is zeroed
(the current behavior); when it is false, the memory is not zeroed.
This change allows for optimizations where the client knows when it is
better to zero the memory: perhaps later, once other CPUs are started, or
perhaps the client is going to set every byte in the allocated memory
anyway, so there is no need to zero it beforehand.
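One subtlety of this patch is the buffered path: the large per-node buffer is now allocated without zeroing, and each chunk is zeroed at hand-out time only if the caller asked for it. A userspace sketch of that carve-out logic (hypothetical, simplified model of alloc_block_buf(); the fallback to the non-buffered allocator is elided, and size is assumed to be a power of two, as it is for the page- and PMD-sized callers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static char buf[4096];			/* stands in for vmemmap_buf */
static char *buf_cur = buf;
static char *buf_end = buf + sizeof(buf);

static void *alloc_block_buf(size_t size, bool zero)
{
	/* Align the cursor up to 'size' (assumed to be a power of two). */
	uintptr_t p = ((uintptr_t)buf_cur + size - 1) & ~(uintptr_t)(size - 1);

	if ((char *)p + size > buf_end)
		return NULL;		/* fallback path elided */
	buf_cur = (char *)p + size;

	/* Zero only the chunk being handed out, and only on request. */
	if (zero)
		memset((void *)p, 0, size);
	return (void *)p;
}
```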
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
arch/powerpc/mm/init_64.c | 4 +-
arch/s390/mm/vmem.c | 5 ++-
arch/sparc/mm/init_64.c | 3 +-
arch/x86/mm/init_64.c | 3 +-
include/linux/mm.h | 6 ++--
mm/sparse-vmemmap.c | 48 +++++++++++++++++++++++++++++---------------
6 files changed, 43 insertions(+), 26 deletions(-)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 9be9920..eb4c270 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -133,7 +133,7 @@ static int __meminit vmemmap_populated(unsigned long start, int page_size)
/* allocate a page when required and hand out chunks */
if (!num_left) {
- next = vmemmap_alloc_block(PAGE_SIZE, node);
+ next = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (unlikely(!next)) {
WARN_ON(1);
return NULL;
@@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
if (vmemmap_populated(start, page_size))
continue;
- p = vmemmap_alloc_block(page_size, node);
+ p = vmemmap_alloc_block(page_size, node, true);
if (!p)
return -ENOMEM;
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 60d3899..9c75214 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -251,7 +251,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
if (MACHINE_HAS_EDAT1) {
void *new_page;
- new_page = vmemmap_alloc_block(PMD_SIZE, node);
+ new_page = vmemmap_alloc_block(PMD_SIZE, node,
+ true);
if (!new_page)
goto out;
pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
@@ -271,7 +272,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
if (pte_none(*pt_dir)) {
void *new_page;
- new_page = vmemmap_alloc_block(PAGE_SIZE, node);
+ new_page = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!new_page)
goto out;
pte_val(*pt_dir) = __pa(new_page) | pgt_prot;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 01eccab..d91e462 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2541,7 +2541,8 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
pmd = pmd_offset(pud, vstart);
pte = pmd_val(*pmd);
if (!(pte & _PAGE_VALID)) {
- void *block = vmemmap_alloc_block(PMD_SIZE, node);
+ void *block = vmemmap_alloc_block(PMD_SIZE, node,
+ true);
if (!block)
return -ENOMEM;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 15173d3..46101b6 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1176,7 +1176,8 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
if (pmd_none(*pmd)) {
void *p;
- p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
+ p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap,
+ true);
if (p) {
pte_t entry;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5f01c88..54df194 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2410,13 +2410,13 @@ void sparse_mem_maps_populate_node(struct page **map_map,
pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
-void *vmemmap_alloc_block(unsigned long size, int node);
+void *vmemmap_alloc_block(unsigned long size, int node, bool zero);
struct vmem_altmap;
void *__vmemmap_alloc_block_buf(unsigned long size, int node,
- struct vmem_altmap *altmap);
+ struct vmem_altmap *altmap, bool zero);
static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
{
- return __vmemmap_alloc_block_buf(size, node, NULL);
+ return __vmemmap_alloc_block_buf(size, node, NULL, true);
}
void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a56c398..1e9508b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -39,16 +39,27 @@
static void * __ref __earlyonly_bootmem_alloc(int node,
unsigned long size,
unsigned long align,
- unsigned long goal)
+ unsigned long goal,
+ bool zero)
{
- return memblock_virt_alloc_try_nid(size, align, goal,
- BOOTMEM_ALLOC_ACCESSIBLE, node);
+ void *mem = memblock_virt_alloc_try_nid_raw(size, align, goal,
+ BOOTMEM_ALLOC_ACCESSIBLE,
+ node);
+ if (!mem) {
+ panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=0x%lx\n",
+ __func__, size, align, node, goal);
+ return NULL;
+ }
+
+ if (zero)
+ memset(mem, 0, size);
+ return mem;
}
static void *vmemmap_buf;
static void *vmemmap_buf_end;
-void * __meminit vmemmap_alloc_block(unsigned long size, int node)
+void * __meminit vmemmap_alloc_block(unsigned long size, int node, bool zero)
{
/* If the main allocator is up use that, fallback to bootmem. */
if (slab_is_available()) {
@@ -67,24 +78,27 @@
return NULL;
} else
return __earlyonly_bootmem_alloc(node, size, size,
- __pa(MAX_DMA_ADDRESS));
+ __pa(MAX_DMA_ADDRESS), zero);
}
/* need to make sure size is all the same during early stage */
-static void * __meminit alloc_block_buf(unsigned long size, int node)
+static void * __meminit alloc_block_buf(unsigned long size, int node, bool zero)
{
void *ptr;
if (!vmemmap_buf)
- return vmemmap_alloc_block(size, node);
+ return vmemmap_alloc_block(size, node, zero);
/* take the from buf */
ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
if (ptr + size > vmemmap_buf_end)
- return vmemmap_alloc_block(size, node);
+ return vmemmap_alloc_block(size, node, zero);
vmemmap_buf = ptr + size;
+ if (zero)
+ memset(ptr, 0, size);
+
return ptr;
}
@@ -152,11 +166,11 @@ static unsigned long __meminit vmem_altmap_alloc(struct vmem_altmap *altmap,
/* need to make sure size is all the same during early stage */
void * __meminit __vmemmap_alloc_block_buf(unsigned long size, int node,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, bool zero)
{
if (altmap)
return altmap_alloc_block_buf(size, altmap);
- return alloc_block_buf(size, node);
+ return alloc_block_buf(size, node, zero);
}
void __meminit vmemmap_verify(pte_t *pte, int node,
@@ -175,7 +189,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
pte_t *pte = pte_offset_kernel(pmd, addr);
if (pte_none(*pte)) {
pte_t entry;
- void *p = alloc_block_buf(PAGE_SIZE, node);
+ void *p = alloc_block_buf(PAGE_SIZE, node, true);
if (!p)
return NULL;
entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
@@ -188,7 +202,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
{
pmd_t *pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!p)
return NULL;
pmd_populate_kernel(&init_mm, pmd, p);
@@ -200,7 +214,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
{
pud_t *pud = pud_offset(p4d, addr);
if (pud_none(*pud)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!p)
return NULL;
pud_populate(&init_mm, pud, p);
@@ -212,7 +226,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
{
p4d_t *p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!p)
return NULL;
p4d_populate(&init_mm, p4d, p);
@@ -224,7 +238,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
{
pgd_t *pgd = pgd_offset_k(addr);
if (pgd_none(*pgd)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!p)
return NULL;
pgd_populate(&init_mm, pgd, p);
@@ -290,8 +304,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
void *vmemmap_buf_start;
size = ALIGN(size, PMD_SIZE);
- vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
- PMD_SIZE, __pa(MAX_DMA_ADDRESS));
+ vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size
+ * map_count, PMD_SIZE, __pa(MAX_DMA_ADDRESS), false);
if (vmemmap_buf_start) {
vmemmap_buf = vmemmap_buf_start;
--
1.7.1
* [v1 4/5] mm: zero struct pages during initialization
2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
2017-03-23 23:01 ` [v1 3/5] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
2017-03-23 23:01 ` [v1 5/5] mm: teach platforms not to zero struct pages memory Pavel Tatashin
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
When deferred struct page initialization is enabled, do not expect that
the memory allocated for struct pages has been zeroed by the allocator;
zero it when the "struct page"s are initialized instead.
Also, a boolean constant VMEMMAP_ZERO is defined to tell platforms whether
they should zero the memory or can defer it.
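A userspace sketch of how the two pieces fit together (the kernel config option is modeled as a plain macro here, and the struct is a simplified stand-in): platforms pass VMEMMAP_ZERO to the allocators, and when zeroing was skipped there, __init_single_page() zeroes each entry itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Models CONFIG_DEFERRED_STRUCT_PAGE_INIT being enabled. */
#define CONFIG_DEFERRED_STRUCT_PAGE_INIT 1

#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
#define VMEMMAP_ZERO	false	/* allocator leaves memory uninitialized */
#else
#define VMEMMAP_ZERO	true	/* allocator zeroes, as before */
#endif

/* Simplified stand-in; not the kernel's real struct page. */
struct page {
	unsigned long flags;
	unsigned long private;
};

static void __init_single_page(struct page *page)
{
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
	/* Allocator did not zero this entry, so do it here. */
	memset(page, 0, sizeof(struct page));
#endif
	page->flags = 1UL;	/* stand-in for set_page_links() etc. */
}
```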
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
include/linux/mm.h | 9 +++++++++
mm/page_alloc.c | 3 +++
2 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 54df194..eb052f6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2427,6 +2427,15 @@ int vmemmap_populate_basepages(unsigned long start, unsigned long end,
#ifdef CONFIG_MEMORY_HOTPLUG
void vmemmap_free(unsigned long start, unsigned long end);
#endif
+/*
+ * Don't zero "struct page"es during early boot, and zero only when they are
+ * initialized in parallel.
+ */
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+#define VMEMMAP_ZERO false
+#else
+#define VMEMMAP_ZERO true
+#endif
void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
unsigned long size);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f202f8b..02945e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1168,6 +1168,9 @@ static void free_one_page(struct zone *zone,
static void __meminit __init_single_page(struct page *page, unsigned long pfn,
unsigned long zone, int nid)
{
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+ memset(page, 0, sizeof(struct page));
+#endif
set_page_links(page, zone, nid, pfn);
init_page_count(page);
page_mapcount_reset(page);
--
1.7.1
* [v1 5/5] mm: teach platforms not to zero struct pages memory
2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
2017-03-23 23:01 ` [v1 4/5] mm: zero struct pages during initialization Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
2017-03-23 23:26 ` [v1 0/5] parallelized "struct page" zeroing Matthew Wilcox
2017-03-24 8:51 ` Christian Borntraeger
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
When the deferred struct page initialization feature is used, most
"struct page"s are initialized after the other CPUs are started, so we
benefit from doing this job in parallel. However, we still zero all the
memory allocated for "struct page"s using the boot CPU. This patch solves
the problem by deferring the zeroing of "struct page"s until they are
initialized.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
arch/powerpc/mm/init_64.c | 2 +-
arch/sparc/mm/init_64.c | 2 +-
arch/x86/mm/init_64.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index eb4c270..24faf2d 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
if (vmemmap_populated(start, page_size))
continue;
- p = vmemmap_alloc_block(page_size, node, true);
+ p = vmemmap_alloc_block(page_size, node, VMEMMAP_ZERO);
if (!p)
return -ENOMEM;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index d91e462..280834e 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2542,7 +2542,7 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
pte = pmd_val(*pmd);
if (!(pte & _PAGE_VALID)) {
void *block = vmemmap_alloc_block(PMD_SIZE, node,
- true);
+ VMEMMAP_ZERO);
if (!block)
return -ENOMEM;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 46101b6..9d8c72c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1177,7 +1177,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
void *p;
p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap,
- true);
+ VMEMMAP_ZERO);
if (p) {
pte_t entry;
--
1.7.1
* Re: [v1 0/5] parallelized "struct page" zeroing
2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
2017-03-23 23:01 ` [v1 5/5] mm: teach platforms not to zero struct pages memory Pavel Tatashin
@ 2017-03-23 23:26 ` Matthew Wilcox
2017-03-23 23:35 ` David Miller
2017-03-23 23:36 ` Pasha Tatashin
2017-03-24 8:51 ` Christian Borntraeger
From: Matthew Wilcox @ 2017-03-23 23:26 UTC (permalink / raw)
To: Pavel Tatashin
Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
On Thu, Mar 23, 2017 at 07:01:48PM -0400, Pavel Tatashin wrote:
> When deferred struct page initialization feature is enabled, we get a
> performance gain of initializing vmemmap in parallel after other CPUs are
> started. However, we still zero the memory for vmemmap using one boot CPU.
> This patch-set fixes the memset-zeroing limitation by deferring it as well.
>
> Here is example performance gain on SPARC with 32T:
> base
> https://hastebin.com/ozanelatat.go
>
> fix
> https://hastebin.com/utonawukof.go
>
> As you can see without the fix it takes: 97.89s to boot
> With the fix it takes: 46.91s to boot.
How long does it take if we just don't zero this memory at all? We seem
to be initialising most of struct page in __init_single_page(), so it
seems like a lot of additional complexity to conditionally zero the rest
of struct page.
* Re: [v1 0/5] parallelized "struct page" zeroing
2017-03-23 23:26 ` [v1 0/5] parallelized "struct page" zeroing Matthew Wilcox
@ 2017-03-23 23:35 ` David Miller
2017-03-23 23:47 ` Pasha Tatashin
2017-03-23 23:36 ` Pasha Tatashin
From: David Miller @ 2017-03-23 23:35 UTC (permalink / raw)
To: willy
Cc: pasha.tatashin, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
linux-s390
From: Matthew Wilcox <willy@infradead.org>
Date: Thu, 23 Mar 2017 16:26:38 -0700
> On Thu, Mar 23, 2017 at 07:01:48PM -0400, Pavel Tatashin wrote:
>> When deferred struct page initialization feature is enabled, we get a
>> performance gain of initializing vmemmap in parallel after other CPUs are
>> started. However, we still zero the memory for vmemmap using one boot CPU.
>> This patch-set fixes the memset-zeroing limitation by deferring it as well.
>>
>> Here is example performance gain on SPARC with 32T:
>> base
>> https://hastebin.com/ozanelatat.go
>>
>> fix
>> https://hastebin.com/utonawukof.go
>>
>> As you can see without the fix it takes: 97.89s to boot
>> With the fix it takes: 46.91s to boot.
>
> How long does it take if we just don't zero this memory at all? We seem
> to be initialising most of struct page in __init_single_page(), so it
> seems like a lot of additional complexity to conditionally zero the rest
> of struct page.
Alternatively, just zero out the entire vmemmap area when it is set up
in the kernel page tables.
* Re: [v1 0/5] parallelized "struct page" zeroing
2017-03-23 23:26 ` [v1 0/5] parallelized "struct page" zeroing Matthew Wilcox
2017-03-23 23:35 ` David Miller
@ 2017-03-23 23:36 ` Pasha Tatashin
From: Pasha Tatashin @ 2017-03-23 23:36 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
Hi Matthew,
Thank you for your comment. If you look at the data, having the memset()
actually benefits the initialization.
With base it takes:
[ 66.148867] node 0 initialised, 128312523 pages in 7200ms
With fix:
[ 15.260634] node 0 initialised, 128312523 pages in 4190ms
So it is 4.19s vs. 7.2s for the same number of "struct page"s. This is
because memset() brings "struct page" into the cache with the efficient
block-initializing store instruction. I have not tested whether there is
the same effect on Intel.
Pasha
On 03/23/2017 07:26 PM, Matthew Wilcox wrote:
> On Thu, Mar 23, 2017 at 07:01:48PM -0400, Pavel Tatashin wrote:
>> When deferred struct page initialization feature is enabled, we get a
>> performance gain of initializing vmemmap in parallel after other CPUs are
>> started. However, we still zero the memory for vmemmap using one boot CPU.
>> This patch-set fixes the memset-zeroing limitation by deferring it as well.
>>
>> Here is example performance gain on SPARC with 32T:
>> base
>> https://hastebin.com/ozanelatat.go
>>
>> fix
>> https://hastebin.com/utonawukof.go
>>
>> As you can see without the fix it takes: 97.89s to boot
>> With the fix it takes: 46.91s to boot.
>
> How long does it take if we just don't zero this memory at all? We seem
> to be initialising most of struct page in __init_single_page(), so it
> seems like a lot of additional complexity to conditionally zero the rest
> of struct page.
>
* Re: [v1 0/5] parallelized "struct page" zeroing
2017-03-23 23:35 ` David Miller
@ 2017-03-23 23:47 ` Pasha Tatashin
2017-03-24 1:15 ` Pasha Tatashin
From: Pasha Tatashin @ 2017-03-23 23:47 UTC (permalink / raw)
To: David Miller, willy
Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
On 03/23/2017 07:35 PM, David Miller wrote:
> From: Matthew Wilcox <willy@infradead.org>
> Date: Thu, 23 Mar 2017 16:26:38 -0700
>
>> On Thu, Mar 23, 2017 at 07:01:48PM -0400, Pavel Tatashin wrote:
>>> When deferred struct page initialization feature is enabled, we get a
>>> performance gain of initializing vmemmap in parallel after other CPUs are
>>> started. However, we still zero the memory for vmemmap using one boot CPU.
>>> This patch-set fixes the memset-zeroing limitation by deferring it as well.
>>>
>>> Here is example performance gain on SPARC with 32T:
>>> base
>>> https://hastebin.com/ozanelatat.go
>>>
>>> fix
>>> https://hastebin.com/utonawukof.go
>>>
>>> As you can see without the fix it takes: 97.89s to boot
>>> With the fix it takes: 46.91s to boot.
>>
>> How long does it take if we just don't zero this memory at all? We seem
>> to be initialising most of struct page in __init_single_page(), so it
>> seems like a lot of additional complexity to conditionally zero the rest
>> of struct page.
>
> Alternatively, just zero out the entire vmemmap area when it is setup
> in the kernel page tables.
Hi Dave,
I can do this; either way is fine with me. It would be a little slower
compared to the current approach, where we benefit from memset() also
working as a prefetch. But that would become negligible once, in the
future, we increase the granularity of the multi-threading; currently
there is only one thread per mnode to multi-thread the vmemmap. Your call.
Thank you,
Pasha
* Re: [v1 0/5] parallelized "struct page" zeroing
2017-03-23 23:47 ` Pasha Tatashin
@ 2017-03-24 1:15 ` Pasha Tatashin
From: Pasha Tatashin @ 2017-03-24 1:15 UTC (permalink / raw)
To: David Miller, willy
Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390
On 03/23/2017 07:47 PM, Pasha Tatashin wrote:
>>>
>>> How long does it take if we just don't zero this memory at all? We seem
>>> to be initialising most of struct page in __init_single_page(), so it
>>> seems like a lot of additional complexity to conditionally zero the rest
>>> of struct page.
>>
>> Alternatively, just zero out the entire vmemmap area when it is setup
>> in the kernel page tables.
>
> Hi Dave,
>
> I can do this; either way is fine with me. It would be a little slower
> compared to the current approach, where we benefit from memset() also
> working as a prefetch. But that would become negligible once, in the
> future, we increase the granularity of the multi-threading; currently
> there is only one thread per mnode to multi-thread the vmemmap. Your call.
>
> Thank you,
> Pasha
Hi Dave and Matthew,
I've been thinking about it some more, and figured that the current
approach is better:
1. Most importantly: part of the vmemmap is initialized early during boot
so that Linux can get to the multi-CPU environment. This means we would
need to figure out beforehand, in a single thread, which part of the
vmemmap needs to be zeroed, and then zero the rest multi-threaded. That
would be architecturally very awkward and error prone.
2. As I already showed, the current approach is significantly faster.
So perhaps it should be the default behavior even for non-deferred
"struct page" initialization: unconditionally do not zero the vmemmap in
the memblock allocator, and always zero in __init_single_page(). But I am
afraid that could cause boot-time regressions on platforms where memset()
is not optimized, so I would not do it in this patch set. Hopefully, more
platforms will gradually support deferred struct page initialization, and
this will become the default behavior.
3. By zeroing "struct page" in __init_single_page(), we set every byte
of "struct page" in one place instead of scattering the writes across
different places. This could also help in the future when we
multi-thread the addition of hotplugged memory.
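Point 3 can be sketched as follows. Again this is a hedged user-space model: struct page is a stand-in and the real __init_single_page() initializes more fields, but it shows the shape of "every byte set in one place".

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the kernel's struct page; field names are illustrative. */
struct page {
	unsigned long flags;
	unsigned long private;
	long refcount;
};

/*
 * The approach in this patch-set: the allocator returns raw (unzeroed)
 * memory, and every byte of struct page is written in one place, per
 * page.  The memset() doubles as a prefetch of the cache line, and the
 * per-page work parallelizes naturally with deferred initialization.
 */
static void init_single_page(struct page *p, unsigned long flags)
{
	memset(p, 0, sizeof(*p));	/* zero everything, including padding */
	p->flags = flags;
	p->refcount = 1;
}
```

Because each page's bytes are fully written here, no earlier bulk zeroing pass is required, which is what makes the memblock "raw" allocation safe.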
Thank you,
Pasha
* Re: [v1 0/5] parallelized "struct page" zeroing
2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
` (5 preceding siblings ...)
2017-03-23 23:26 ` [v1 0/5] parallelized "struct page" zeroing Matthew Wilcox
@ 2017-03-24 8:51 ` Christian Borntraeger
2017-03-24 9:35 ` Heiko Carstens
6 siblings, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2017-03-24 8:51 UTC (permalink / raw)
To: Pavel Tatashin, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
linux-s390
On 03/24/2017 12:01 AM, Pavel Tatashin wrote:
> When the deferred struct page initialization feature is enabled, we get a
> performance gain from initializing the vmemmap in parallel after other
> CPUs are started. However, we still zero the memory for the vmemmap using
> one boot CPU. This patch-set fixes the memset-zeroing limitation by
> deferring it as well.
>
> Here is an example of the performance gain on SPARC with 32T:
> base
> https://hastebin.com/ozanelatat.go
>
> fix
> https://hastebin.com/utonawukof.go
>
> As you can see, without the fix it takes 97.89s to boot;
> with the fix it takes 46.91s.
>
> On x86 the time saving will be even greater (proportional to memory size)
> because there are twice as many struct pages for the same amount of
> memory, since base pages are half the size.
Fixing the linux-s390 mailing list email.
This might be useful for s390 as well.
>
>
> Pavel Tatashin (5):
> sparc64: simplify vmemmap_populate
> mm: defining memblock_virt_alloc_try_nid_raw
> mm: add "zero" argument to vmemmap allocators
> mm: zero struct pages during initialization
> mm: teach platforms not to zero struct pages memory
>
> arch/powerpc/mm/init_64.c | 4 +-
> arch/s390/mm/vmem.c | 5 ++-
> arch/sparc/mm/init_64.c | 26 +++++++----------------
> arch/x86/mm/init_64.c | 3 +-
> include/linux/bootmem.h | 3 ++
> include/linux/mm.h | 15 +++++++++++--
> mm/memblock.c | 46 ++++++++++++++++++++++++++++++++++++------
> mm/page_alloc.c | 3 ++
> mm/sparse-vmemmap.c | 48 +++++++++++++++++++++++++++++---------------
> 9 files changed, 103 insertions(+), 50 deletions(-)
>
* Re: [v1 0/5] parallelized "struct page" zeroing
2017-03-24 8:51 ` Christian Borntraeger
@ 2017-03-24 9:35 ` Heiko Carstens
0 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2017-03-24 9:35 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Pavel Tatashin, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
linux-s390
On Fri, Mar 24, 2017 at 09:51:09AM +0100, Christian Borntraeger wrote:
> On 03/24/2017 12:01 AM, Pavel Tatashin wrote:
> > When the deferred struct page initialization feature is enabled, we get
> > a performance gain from initializing the vmemmap in parallel after other
> > CPUs are started. However, we still zero the memory for the vmemmap
> > using one boot CPU. This patch-set fixes the memset-zeroing limitation
> > by deferring it as well.
> >
> > Here is an example of the performance gain on SPARC with 32T:
> > base
> > https://hastebin.com/ozanelatat.go
> >
> > fix
> > https://hastebin.com/utonawukof.go
> >
> > As you can see, without the fix it takes 97.89s to boot;
> > with the fix it takes 46.91s.
> >
> > On x86 the time saving will be even greater (proportional to memory
> > size) because there are twice as many struct pages for the same amount
> > of memory, since base pages are half the size.
>
> Fixing the linux-s390 mailing list email.
> This might be useful for s390 as well.
Unfortunately only for the fake NUMA case, since, as far as I understand
it, parallelization happens only at node granularity. And since we
usually have only one node...
But anyway, it won't hurt to set ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
on s390 as well. I'll do some testing and then we'll see.
Pavel, could you please change your patch 5 so that it also converts the
s390 call sites of vmemmap_alloc_block() to use VMEMMAP_ZERO instead of
'true' as the argument?
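The conversion Heiko is asking for is cosmetic at the call sites. A rough user-space sketch of the idea follows; the allocator below is a mock standing in for the real vmemmap_alloc_block(), whose actual signature differs, and the constant definitions are modelled on the patch-set rather than copied from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Named constants make the zeroing behavior explicit; a bare 'true' or
 * 'false' at a call site does not say what it controls.
 */
#define VMEMMAP_ZERO	true
#define VMEMMAP_NO_ZERO	false

/* Mock allocator standing in for the real vmemmap_alloc_block(). */
static void *vmemmap_alloc_block(size_t size, bool zero)
{
	void *p = malloc(size);

	if (p && zero)
		memset(p, 0, size);
	return p;
}
```

A call site then reads `vmemmap_alloc_block(size, VMEMMAP_ZERO)` rather than `vmemmap_alloc_block(size, true)`, which is the change requested for the s390 call sites.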
end of thread, other threads:[~2017-03-24 9:36 UTC | newest]
Thread overview: 13+ messages
-- links below jump to the message on this page --
2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
2017-03-23 23:01 ` [v1 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
2017-03-23 23:01 ` [v1 2/5] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
2017-03-23 23:01 ` [v1 3/5] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
2017-03-23 23:01 ` [v1 4/5] mm: zero struct pages during initialization Pavel Tatashin
2017-03-23 23:01 ` [v1 5/5] mm: teach platforms not to zero struct pages memory Pavel Tatashin
2017-03-23 23:26 ` [v1 0/5] parallelized "struct page" zeroing Matthew Wilcox
2017-03-23 23:35 ` David Miller
2017-03-23 23:47 ` Pasha Tatashin
2017-03-24 1:15 ` Pasha Tatashin
2017-03-23 23:36 ` Pasha Tatashin
2017-03-24 8:51 ` Christian Borntraeger
2017-03-24 9:35 ` Heiko Carstens