* [v1 0/5] parallelized "struct page" zeroing
@ 2017-03-23 23:01 Pavel Tatashin
  2017-03-23 23:01 ` [v1 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390

When the deferred struct page initialization feature is enabled, we get a
performance gain from initializing the vmemmap in parallel after the other
CPUs are started. However, we still zero the vmemmap memory using only the
boot CPU. This patch set removes that limitation by deferring the zeroing
as well.

Here is an example of the performance gain on SPARC with 32T of memory:
base
https://hastebin.com/ozanelatat.go

fix
https://hastebin.com/utonawukof.go

As you can see, without the fix it takes 97.89s to boot;
with the fix it takes 46.91s.

On x86 the time saving will be even greater (proportionally to memory size),
because there are twice as many "struct page" entries for the same amount
of memory, as the base pages are half the size.
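
For perspective, an illustrative back-of-the-envelope calculation (assuming
a 64-byte "struct page"):

	SPARC, 8K base pages: 32T / 8K * 64 = 256G of vmemmap to zero
	x86,   4K base pages: 32T / 4K * 64 = 512G of vmemmap to zero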


Pavel Tatashin (5):
  sparc64: simplify vmemmap_populate
  mm: defining memblock_virt_alloc_try_nid_raw
  mm: add "zero" argument to vmemmap allocators
  mm: zero struct pages during initialization
  mm: teach platforms not to zero struct pages memory

 arch/powerpc/mm/init_64.c |    4 +-
 arch/s390/mm/vmem.c       |    5 ++-
 arch/sparc/mm/init_64.c   |   26 +++++++----------------
 arch/x86/mm/init_64.c     |    3 +-
 include/linux/bootmem.h   |    3 ++
 include/linux/mm.h        |   15 +++++++++++--
 mm/memblock.c             |   46 ++++++++++++++++++++++++++++++++++++------
 mm/page_alloc.c           |    3 ++
 mm/sparse-vmemmap.c       |   48 +++++++++++++++++++++++++++++---------------
 9 files changed, 103 insertions(+), 50 deletions(-)


* [v1 1/5] sparc64: simplify vmemmap_populate
  2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
  2017-03-23 23:01 ` [v1 2/5] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390

Remove duplicated code by using the common vmemmap_pgd_populate() and
vmemmap_pud_populate() functions.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 arch/sparc/mm/init_64.c |   23 ++++++-----------------
 1 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 2c0cb2a..01eccab 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2526,30 +2526,19 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 	vstart = vstart & PMD_MASK;
 	vend = ALIGN(vend, PMD_SIZE);
 	for (; vstart < vend; vstart += PMD_SIZE) {
-		pgd_t *pgd = pgd_offset_k(vstart);
+		pgd_t *pgd = vmemmap_pgd_populate(vstart, node);
 		unsigned long pte;
 		pud_t *pud;
 		pmd_t *pmd;
 
-		if (pgd_none(*pgd)) {
-			pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
+		if (!pgd)
+			return -ENOMEM;
 
-			if (!new)
-				return -ENOMEM;
-			pgd_populate(&init_mm, pgd, new);
-		}
-
-		pud = pud_offset(pgd, vstart);
-		if (pud_none(*pud)) {
-			pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
-
-			if (!new)
-				return -ENOMEM;
-			pud_populate(&init_mm, pud, new);
-		}
+		pud = vmemmap_pud_populate(pgd, vstart, node);
+		if (!pud)
+			return -ENOMEM;
 
 		pmd = pmd_offset(pud, vstart);
-
 		pte = pmd_val(*pmd);
 		if (!(pte & _PAGE_VALID)) {
 			void *block = vmemmap_alloc_block(PMD_SIZE, node);
-- 
1.7.1


* [v1 2/5] mm: defining memblock_virt_alloc_try_nid_raw
  2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
  2017-03-23 23:01 ` [v1 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
  2017-03-23 23:01 ` [v1 3/5] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390

Add a new variant of the memblock_virt_alloc_* allocators that:
- Does not zero the allocated memory
- Does not panic if the request cannot be satisfied
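
A minimal sketch of a caller (illustrative only; my_early_alloc() is a
hypothetical helper, not part of this patch):

	static void * __init my_early_alloc(phys_addr_t size, int nid)
	{
		/* Raw variant: no memset(), NULL instead of panic on failure */
		void *ptr = memblock_virt_alloc_try_nid_raw(size, PAGE_SIZE,
					__pa(MAX_DMA_ADDRESS),
					BOOTMEM_ALLOC_ACCESSIBLE, nid);

		if (!ptr)
			return NULL;

		/* The caller now decides if, and when, to zero the memory */
		memset(ptr, 0, size);
		return ptr;
	}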

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 include/linux/bootmem.h |    3 +++
 mm/memblock.c           |   46 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index dbaf312..b61ea10 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -160,6 +160,9 @@ extern int reserve_bootmem_node(pg_data_t *pgdat,
 #define BOOTMEM_ALLOC_ANYWHERE		(~(phys_addr_t)0)
 
 /* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
+void *memblock_virt_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
+				      phys_addr_t min_addr,
+				      phys_addr_t max_addr, int nid);
 void *memblock_virt_alloc_try_nid_nopanic(phys_addr_t size,
 		phys_addr_t align, phys_addr_t min_addr,
 		phys_addr_t max_addr, int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index 696f06d..7fdc555 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1271,7 +1271,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 static void * __init memblock_virt_alloc_internal(
 				phys_addr_t size, phys_addr_t align,
 				phys_addr_t min_addr, phys_addr_t max_addr,
-				int nid)
+				int nid, bool zero)
 {
 	phys_addr_t alloc;
 	void *ptr;
@@ -1322,7 +1322,8 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 	return NULL;
 done:
 	ptr = phys_to_virt(alloc);
-	memset(ptr, 0, size);
+	if (zero)
+		memset(ptr, 0, size);
 
 	/*
 	 * The min_count is set to 0 so that bootmem allocated blocks
@@ -1336,6 +1337,37 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 }
 
 /**
+ * memblock_virt_alloc_try_nid_raw - allocate boot memory block without zeroing
+ * memory and without panicking
+ * @size: size of memory block to be allocated in bytes
+ * @align: alignment of the region and block's size
+ * @min_addr: the lower bound of the memory region from where the allocation
+ *	  is preferred (phys address)
+ * @max_addr: the upper bound of the memory region from where the allocation
+ *	      is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to
+ *	      allocate only from memory limited by memblock.current_limit value
+ * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ *
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. Does not zero allocated memory, does not panic if request
+ * cannot be satisfied.
+ *
+ * RETURNS:
+ * Virtual address of allocated memory block on success, NULL on failure.
+ */
+void * __init memblock_virt_alloc_try_nid_raw(
+			phys_addr_t size, phys_addr_t align,
+			phys_addr_t min_addr, phys_addr_t max_addr,
+			int nid)
+{
+	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
+		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
+		     (u64)max_addr, (void *)_RET_IP_);
+	return memblock_virt_alloc_internal(size, align,
+					   min_addr, max_addr, nid, false);
+}
+
+/**
  * memblock_virt_alloc_try_nid_nopanic - allocate boot memory block
  * @size: size of memory block to be allocated in bytes
  * @align: alignment of the region and block's size
@@ -1346,8 +1378,8 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
  *	      allocate only from memory limited by memblock.current_limit value
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  *
- * Public version of _memblock_virt_alloc_try_nid_nopanic() which provides
- * additional debug information (including caller info), if enabled.
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. This function zeroes the allocated memory.
  *
  * RETURNS:
  * Virtual address of allocated memory block on success, NULL on failure.
@@ -1361,7 +1393,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
 		     (u64)max_addr, (void *)_RET_IP_);
 	return memblock_virt_alloc_internal(size, align, min_addr,
-					     max_addr, nid);
+					     max_addr, nid, true);
 }
 
 /**
@@ -1375,7 +1407,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
  *	      allocate only from memory limited by memblock.current_limit value
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  *
- * Public panicking version of _memblock_virt_alloc_try_nid_nopanic()
+ * Public panicking version of memblock_virt_alloc_try_nid_nopanic()
  * which provides debug information (including caller info), if enabled,
  * and panics if the request can not be satisfied.
  *
@@ -1393,7 +1425,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
 		     (u64)max_addr, (void *)_RET_IP_);
 	ptr = memblock_virt_alloc_internal(size, align,
-					   min_addr, max_addr, nid);
+					   min_addr, max_addr, nid, true);
 	if (ptr)
 		return ptr;
 
-- 
1.7.1


* [v1 3/5] mm: add "zero" argument to vmemmap allocators
  2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
  2017-03-23 23:01 ` [v1 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
  2017-03-23 23:01 ` [v1 2/5] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
  2017-03-23 23:01 ` [v1 4/5] mm: zero struct pages during initialization Pavel Tatashin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390

Allow clients to request non-zeroed memory from the vmemmap allocators.
The following two public functions gain a new boolean argument, zero:

__vmemmap_alloc_block_buf()
vmemmap_alloc_block()

When zero is true, the memory allocated by the memblock allocator is zeroed
(the current behavior); when it is false, the memory is left unzeroed.

This change enables optimizations where the client knows better when to
zero the memory: perhaps later, once the other CPUs are started, or perhaps
never, because the client is going to set every byte of the allocated
memory anyway.
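
For example (an illustrative sketch, not taken from this patch):

	/* zeroed by the allocator, as before */
	void *map = vmemmap_alloc_block(PMD_SIZE, node, true);

	/* left unzeroed; the caller will initialize every byte itself */
	void *raw = vmemmap_alloc_block(PMD_SIZE, node, false);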

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 arch/powerpc/mm/init_64.c |    4 +-
 arch/s390/mm/vmem.c       |    5 ++-
 arch/sparc/mm/init_64.c   |    3 +-
 arch/x86/mm/init_64.c     |    3 +-
 include/linux/mm.h        |    6 ++--
 mm/sparse-vmemmap.c       |   48 +++++++++++++++++++++++++++++---------------
 6 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 9be9920..eb4c270 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -133,7 +133,7 @@ static int __meminit vmemmap_populated(unsigned long start, int page_size)
 
 	/* allocate a page when required and hand out chunks */
 	if (!num_left) {
-		next = vmemmap_alloc_block(PAGE_SIZE, node);
+		next = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (unlikely(!next)) {
 			WARN_ON(1);
 			return NULL;
@@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		if (vmemmap_populated(start, page_size))
 			continue;
 
-		p = vmemmap_alloc_block(page_size, node);
+		p = vmemmap_alloc_block(page_size, node, true);
 		if (!p)
 			return -ENOMEM;
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 60d3899..9c75214 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -251,7 +251,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 			if (MACHINE_HAS_EDAT1) {
 				void *new_page;
 
-				new_page = vmemmap_alloc_block(PMD_SIZE, node);
+				new_page = vmemmap_alloc_block(PMD_SIZE, node,
+							       true);
 				if (!new_page)
 					goto out;
 				pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
@@ -271,7 +272,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		if (pte_none(*pt_dir)) {
 			void *new_page;
 
-			new_page = vmemmap_alloc_block(PAGE_SIZE, node);
+			new_page = vmemmap_alloc_block(PAGE_SIZE, node, true);
 			if (!new_page)
 				goto out;
 			pte_val(*pt_dir) = __pa(new_page) | pgt_prot;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 01eccab..d91e462 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2541,7 +2541,8 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 		pmd = pmd_offset(pud, vstart);
 		pte = pmd_val(*pmd);
 		if (!(pte & _PAGE_VALID)) {
-			void *block = vmemmap_alloc_block(PMD_SIZE, node);
+			void *block = vmemmap_alloc_block(PMD_SIZE, node,
+							  true);
 
 			if (!block)
 				return -ENOMEM;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 15173d3..46101b6 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1176,7 +1176,8 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 		if (pmd_none(*pmd)) {
 			void *p;
 
-			p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
+			p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap,
+						      true);
 			if (p) {
 				pte_t entry;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5f01c88..54df194 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2410,13 +2410,13 @@ void sparse_mem_maps_populate_node(struct page **map_map,
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
 pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
 pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
-void *vmemmap_alloc_block(unsigned long size, int node);
+void *vmemmap_alloc_block(unsigned long size, int node, bool zero);
 struct vmem_altmap;
 void *__vmemmap_alloc_block_buf(unsigned long size, int node,
-		struct vmem_altmap *altmap);
+		struct vmem_altmap *altmap, bool zero);
 static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
 {
-	return __vmemmap_alloc_block_buf(size, node, NULL);
+	return __vmemmap_alloc_block_buf(size, node, NULL, true);
 }
 
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a56c398..1e9508b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -39,16 +39,27 @@
 static void * __ref __earlyonly_bootmem_alloc(int node,
 				unsigned long size,
 				unsigned long align,
-				unsigned long goal)
+				unsigned long goal,
+				bool zero)
 {
-	return memblock_virt_alloc_try_nid(size, align, goal,
-					    BOOTMEM_ALLOC_ACCESSIBLE, node);
+	void *mem = memblock_virt_alloc_try_nid_raw(size, align, goal,
+						    BOOTMEM_ALLOC_ACCESSIBLE,
+						    node);
+	if (!mem) {
+		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=0x%lx\n",
+		      __func__, size, align, node, goal);
+		return NULL;
+	}
+
+	if (zero)
+		memset(mem, 0, size);
+	return mem;
 }
 
 static void *vmemmap_buf;
 static void *vmemmap_buf_end;
 
-void * __meminit vmemmap_alloc_block(unsigned long size, int node)
+void * __meminit vmemmap_alloc_block(unsigned long size, int node, bool zero)
 {
 	/* If the main allocator is up use that, fallback to bootmem. */
 	if (slab_is_available()) {
@@ -67,24 +78,27 @@
 		return NULL;
 	} else
 		return __earlyonly_bootmem_alloc(node, size, size,
-				__pa(MAX_DMA_ADDRESS));
+				__pa(MAX_DMA_ADDRESS), zero);
 }
 
 /* need to make sure size is all the same during early stage */
-static void * __meminit alloc_block_buf(unsigned long size, int node)
+static void * __meminit alloc_block_buf(unsigned long size, int node, bool zero)
 {
 	void *ptr;
 
 	if (!vmemmap_buf)
-		return vmemmap_alloc_block(size, node);
+		return vmemmap_alloc_block(size, node, zero);
 
 	/* take the from buf */
 	ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
 	if (ptr + size > vmemmap_buf_end)
-		return vmemmap_alloc_block(size, node);
+		return vmemmap_alloc_block(size, node, zero);
 
 	vmemmap_buf = ptr + size;
 
+	if (zero)
+		memset(ptr, 0, size);
+
 	return ptr;
 }
 
@@ -152,11 +166,11 @@ static unsigned long __meminit vmem_altmap_alloc(struct vmem_altmap *altmap,
 
 /* need to make sure size is all the same during early stage */
 void * __meminit __vmemmap_alloc_block_buf(unsigned long size, int node,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, bool zero)
 {
 	if (altmap)
 		return altmap_alloc_block_buf(size, altmap);
-	return alloc_block_buf(size, node);
+	return alloc_block_buf(size, node, zero);
 }
 
 void __meminit vmemmap_verify(pte_t *pte, int node,
@@ -175,7 +189,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 	pte_t *pte = pte_offset_kernel(pmd, addr);
 	if (pte_none(*pte)) {
 		pte_t entry;
-		void *p = alloc_block_buf(PAGE_SIZE, node);
+		void *p = alloc_block_buf(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
@@ -188,7 +202,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 {
 	pmd_t *pmd = pmd_offset(pud, addr);
 	if (pmd_none(*pmd)) {
-		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+		void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		pmd_populate_kernel(&init_mm, pmd, p);
@@ -200,7 +214,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 {
 	pud_t *pud = pud_offset(p4d, addr);
 	if (pud_none(*pud)) {
-		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+		void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		pud_populate(&init_mm, pud, p);
@@ -212,7 +226,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 {
 	p4d_t *p4d = p4d_offset(pgd, addr);
 	if (p4d_none(*p4d)) {
-		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+		void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		p4d_populate(&init_mm, p4d, p);
@@ -224,7 +238,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 {
 	pgd_t *pgd = pgd_offset_k(addr);
 	if (pgd_none(*pgd)) {
-		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+		void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		pgd_populate(&init_mm, pgd, p);
@@ -290,8 +304,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	void *vmemmap_buf_start;
 
 	size = ALIGN(size, PMD_SIZE);
-	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
-			 PMD_SIZE, __pa(MAX_DMA_ADDRESS));
+	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size
+			* map_count, PMD_SIZE, __pa(MAX_DMA_ADDRESS), false);
 
 	if (vmemmap_buf_start) {
 		vmemmap_buf = vmemmap_buf_start;
-- 
1.7.1


* [v1 4/5] mm: zero struct pages during initialization
  2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
                   ` (2 preceding siblings ...)
  2017-03-23 23:01 ` [v1 3/5] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
  2017-03-23 23:01 ` [v1 5/5] mm: teach platforms not to zero struct pages memory Pavel Tatashin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390

When deferred struct page initialization is enabled, do not expect the
memory that was allocated for struct pages to be zeroed by the allocator.
Instead, zero it when the "struct pages" themselves are initialized.

Also, a new boolean define, VMEMMAP_ZERO, tells platforms whether they must
zero the memory up front or can defer it.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 include/linux/mm.h |    9 +++++++++
 mm/page_alloc.c    |    3 +++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 54df194..eb052f6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2427,6 +2427,15 @@ int vmemmap_populate_basepages(unsigned long start, unsigned long end,
 #ifdef CONFIG_MEMORY_HOTPLUG
 void vmemmap_free(unsigned long start, unsigned long end);
 #endif
+/*
+ * Don't zero "struct page" memory during early boot; zero it only when
+ * the pages are initialized, in parallel.
+ */
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+#define VMEMMAP_ZERO	false
+#else
+#define VMEMMAP_ZERO	true
+#endif
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long size);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f202f8b..02945e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1168,6 +1168,9 @@ static void free_one_page(struct zone *zone,
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+	memset(page, 0, sizeof(struct page));
+#endif
 	set_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
-- 
1.7.1


* [v1 5/5] mm: teach platforms not to zero struct pages memory
  2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
                   ` (3 preceding siblings ...)
  2017-03-23 23:01 ` [v1 4/5] mm: zero struct pages during initialization Pavel Tatashin
@ 2017-03-23 23:01 ` Pavel Tatashin
  2017-03-23 23:26 ` [v1 0/5] parallelized "struct page" zeroing Matthew Wilcox
  2017-03-24  8:51 ` Christian Borntraeger
  6 siblings, 0 replies; 13+ messages in thread
From: Pavel Tatashin @ 2017-03-23 23:01 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390

When the deferred struct page initialization feature is used, most "struct
page" entries are initialized after the other CPUs are started, and hence
we benefit from doing this job in parallel. However, we still zero all the
memory allocated for "struct pages" on the boot CPU.  This patch solves the
problem by deferring the zeroing of "struct pages" until they are
initialized.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 arch/powerpc/mm/init_64.c |    2 +-
 arch/sparc/mm/init_64.c   |    2 +-
 arch/x86/mm/init_64.c     |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index eb4c270..24faf2d 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		if (vmemmap_populated(start, page_size))
 			continue;
 
-		p = vmemmap_alloc_block(page_size, node, true);
+		p = vmemmap_alloc_block(page_size, node, VMEMMAP_ZERO);
 		if (!p)
 			return -ENOMEM;
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index d91e462..280834e 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2542,7 +2542,7 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 		pte = pmd_val(*pmd);
 		if (!(pte & _PAGE_VALID)) {
 			void *block = vmemmap_alloc_block(PMD_SIZE, node,
-							  true);
+							  VMEMMAP_ZERO);
 
 			if (!block)
 				return -ENOMEM;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 46101b6..9d8c72c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1177,7 +1177,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 			void *p;
 
 			p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap,
-						      true);
+						      VMEMMAP_ZERO);
 			if (p) {
 				pte_t entry;
 
-- 
1.7.1


* Re: [v1 0/5] parallelized "struct page" zeroing
  2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
                   ` (4 preceding siblings ...)
  2017-03-23 23:01 ` [v1 5/5] mm: teach platforms not to zero struct pages memory Pavel Tatashin
@ 2017-03-23 23:26 ` Matthew Wilcox
  2017-03-23 23:35   ` David Miller
  2017-03-23 23:36   ` Pasha Tatashin
  2017-03-24  8:51 ` Christian Borntraeger
  6 siblings, 2 replies; 13+ messages in thread
From: Matthew Wilcox @ 2017-03-23 23:26 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390

On Thu, Mar 23, 2017 at 07:01:48PM -0400, Pavel Tatashin wrote:
> When deferred struct page initialization feature is enabled, we get a
> performance gain of initializing vmemmap in parallel after other CPUs are
> started. However, we still zero the memory for vmemmap using one boot CPU.
> This patch-set fixes the memset-zeroing limitation by deferring it as well.
> 
> Here is example performance gain on SPARC with 32T:
> base
> https://hastebin.com/ozanelatat.go
> 
> fix
> https://hastebin.com/utonawukof.go
> 
> As you can see without the fix it takes: 97.89s to boot
> With the fix it takes: 46.91 to boot.

How long does it take if we just don't zero this memory at all?  We seem
to be initialising most of struct page in __init_single_page(), so it
seems like a lot of additional complexity to conditionally zero the rest
of struct page.


* Re: [v1 0/5] parallelized "struct page" zeroing
  2017-03-23 23:26 ` [v1 0/5] parallelized "struct page" zeroing Matthew Wilcox
@ 2017-03-23 23:35   ` David Miller
  2017-03-23 23:47     ` Pasha Tatashin
  2017-03-23 23:36   ` Pasha Tatashin
  1 sibling, 1 reply; 13+ messages in thread
From: David Miller @ 2017-03-23 23:35 UTC (permalink / raw)
  To: willy
  Cc: pasha.tatashin, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
	linux-s390

From: Matthew Wilcox <willy@infradead.org>
Date: Thu, 23 Mar 2017 16:26:38 -0700

> On Thu, Mar 23, 2017 at 07:01:48PM -0400, Pavel Tatashin wrote:
>> When deferred struct page initialization feature is enabled, we get a
>> performance gain of initializing vmemmap in parallel after other CPUs are
>> started. However, we still zero the memory for vmemmap using one boot CPU.
>> This patch-set fixes the memset-zeroing limitation by deferring it as well.
>> 
>> Here is example performance gain on SPARC with 32T:
>> base
>> https://hastebin.com/ozanelatat.go
>> 
>> fix
>> https://hastebin.com/utonawukof.go
>> 
>> As you can see without the fix it takes: 97.89s to boot
>> With the fix it takes: 46.91 to boot.
> 
> How long does it take if we just don't zero this memory at all?  We seem
> to be initialising most of struct page in __init_single_page(), so it
> seems like a lot of additional complexity to conditionally zero the rest
> of struct page.

Alternatively, just zero out the entire vmemmap area when it is setup
in the kernel page tables.


* Re: [v1 0/5] parallelized "struct page" zeroing
  2017-03-23 23:26 ` [v1 0/5] parallelized "struct page" zeroing Matthew Wilcox
  2017-03-23 23:35   ` David Miller
@ 2017-03-23 23:36   ` Pasha Tatashin
  1 sibling, 0 replies; 13+ messages in thread
From: Pasha Tatashin @ 2017-03-23 23:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390

Hi Matthew,

Thank you for your comment. If you look at the data, having the memset() 
actually speeds up the initialization.

With base it takes:
[   66.148867] node 0 initialised, 128312523 pages in 7200ms

With fix:
[   15.260634] node 0 initialised, 128312523 pages in 4190ms

So it is 4.19s vs. 7.2s for the same number of "struct page" entries. This 
is because memset() brings each "struct page" into the cache with an 
efficient block-initializing store instruction. I have not tested whether 
there is the same effect on Intel.
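
Conceptually (an illustrative sketch of the effect, not a code change):

	memset(page, 0, sizeof(struct page));	/* block-init store allocates
						 * the cache line without
						 * reading memory */
	set_page_links(page, zone, nid, pfn);	/* subsequent stores now hit
						 * in the cache */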

Pasha

On 03/23/2017 07:26 PM, Matthew Wilcox wrote:
> On Thu, Mar 23, 2017 at 07:01:48PM -0400, Pavel Tatashin wrote:
>> When deferred struct page initialization feature is enabled, we get a
>> performance gain of initializing vmemmap in parallel after other CPUs are
>> started. However, we still zero the memory for vmemmap using one boot CPU.
>> This patch-set fixes the memset-zeroing limitation by deferring it as well.
>>
>> Here is example performance gain on SPARC with 32T:
>> base
>> https://hastebin.com/ozanelatat.go
>>
>> fix
>> https://hastebin.com/utonawukof.go
>>
>> As you can see without the fix it takes: 97.89s to boot
>> With the fix it takes: 46.91 to boot.
>
> How long does it take if we just don't zero this memory at all?  We seem
> to be initialising most of struct page in __init_single_page(), so it
> seems like a lot of additional complexity to conditionally zero the rest
> of struct page.
>


* Re: [v1 0/5] parallelized "struct page" zeroing
  2017-03-23 23:35   ` David Miller
@ 2017-03-23 23:47     ` Pasha Tatashin
  2017-03-24  1:15       ` Pasha Tatashin
  0 siblings, 1 reply; 13+ messages in thread
From: Pasha Tatashin @ 2017-03-23 23:47 UTC (permalink / raw)
  To: David Miller, willy
  Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390



On 03/23/2017 07:35 PM, David Miller wrote:
> From: Matthew Wilcox <willy@infradead.org>
> Date: Thu, 23 Mar 2017 16:26:38 -0700
>
>> On Thu, Mar 23, 2017 at 07:01:48PM -0400, Pavel Tatashin wrote:
>>> When deferred struct page initialization feature is enabled, we get a
>>> performance gain of initializing vmemmap in parallel after other CPUs are
>>> started. However, we still zero the memory for vmemmap using one boot CPU.
>>> This patch-set fixes the memset-zeroing limitation by deferring it as well.
>>>
>>> Here is example performance gain on SPARC with 32T:
>>> base
>>> https://hastebin.com/ozanelatat.go
>>>
>>> fix
>>> https://hastebin.com/utonawukof.go
>>>
>>> As you can see without the fix it takes: 97.89s to boot
>>> With the fix it takes: 46.91 to boot.
>>
>> How long does it take if we just don't zero this memory at all?  We seem
>> to be initialising most of struct page in __init_single_page(), so it
>> seems like a lot of additional complexity to conditionally zero the rest
>> of struct page.
>
> Alternatively, just zero out the entire vmemmap area when it is setup
> in the kernel page tables.

Hi Dave,

I can do this; either way is fine with me. It would be a little slower 
than the current approach, where we benefit from memset() acting as a 
prefetch. But that difference would become negligible once we increase the 
granularity of the multi-threading in the future; currently it is only one 
thread per mnode that multi-threads the vmemmap. Your call.

Thank you,
Pasha


* Re: [v1 0/5] parallelized "struct page" zeroing
  2017-03-23 23:47     ` Pasha Tatashin
@ 2017-03-24  1:15       ` Pasha Tatashin
  0 siblings, 0 replies; 13+ messages in thread
From: Pasha Tatashin @ 2017-03-24  1:15 UTC (permalink / raw)
  To: David Miller, willy
  Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390


On 03/23/2017 07:47 PM, Pasha Tatashin wrote:
>>>
>>> How long does it take if we just don't zero this memory at all?  We seem
>>> to be initialising most of struct page in __init_single_page(), so it
>>> seems like a lot of additional complexity to conditionally zero the rest
>>> of struct page.
>>
>> Alternatively, just zero out the entire vmemmap area when it is setup
>> in the kernel page tables.
>
> Hi Dave,
>
> I can do this, either way is fine with me. It would be a little slower
> compared to the current approach where we benefit from having memset()
> to work as prefetch. But that would become negligible, once in the
> future we will increase the granularity of multi-threading, currently it
> is only one thread per-mnode to multithread vmemamp. Your call.
>
> Thank  you,
> Pasha

Hi Dave and Matthew,

I've been thinking about it some more, and I believe the current approach 
is better:

1. Most importantly: part of the vmemmap is initialized early during boot 
so that Linux can get to the multi-CPU environment. This means we would 
need to figure out beforehand which part of the vmemmap must be zeroed 
single-threaded, and then zero the rest multi-threaded. This would be 
architecturally awkward and error prone.

2. As I already showed, the current approach is significantly faster. 
Perhaps it should even be the default behavior for non-deferred "struct 
page" initialization: unconditionally skip zeroing the vmemmap in the 
memblock allocator, and always zero in __init_single_page(). But I am 
afraid that could cause boot-time regressions on platforms where memset() 
is not optimized, so I would not do it in this patchset. Hopefully, more 
platforms will gradually support deferred struct page initialization, and 
this will become the default behavior.

3. By zeroing "struct page" in __init_single_page(), we set every byte of 
"struct page" in one place instead of scattering the writes across 
different places. This could also help in the future when we multi-thread 
the addition of hotplugged memory.

Thank you,
Pasha


* Re: [v1 0/5] parallelized "struct page" zeroing
  2017-03-23 23:01 [v1 0/5] parallelized "struct page" zeroing Pavel Tatashin
                   ` (5 preceding siblings ...)
  2017-03-23 23:26 ` [v1 0/5] parallelized "struct page" zeroing Matthew Wilcox
@ 2017-03-24  8:51 ` Christian Borntraeger
  2017-03-24  9:35   ` Heiko Carstens
  6 siblings, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2017-03-24  8:51 UTC (permalink / raw)
  To: Pavel Tatashin, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
	linux-s390

On 03/24/2017 12:01 AM, Pavel Tatashin wrote:
> When deferred struct page initialization feature is enabled, we get a
> performance gain of initializing vmemmap in parallel after other CPUs are
> started. However, we still zero the memory for vmemmap using one boot CPU.
> This patch-set fixes the memset-zeroing limitation by deferring it as well.
> 
> Here is example performance gain on SPARC with 32T:
> base
> https://hastebin.com/ozanelatat.go
> 
> fix
> https://hastebin.com/utonawukof.go
> 
> As you can see without the fix it takes: 97.89s to boot
> With the fix it takes: 46.91 to boot.
> 
> On x86 time saving is going to be even greater (proportionally to memory size)
> because there are twice as many "struct page"es for the same amount of memory,
> as base pages are twice smaller.

Fixing the linux-s390 mailing list email.
This might be useful for s390 as well.

> 
> 
> Pavel Tatashin (5):
>   sparc64: simplify vmemmap_populate
>   mm: defining memblock_virt_alloc_try_nid_raw
>   mm: add "zero" argument to vmemmap allocators
>   mm: zero struct pages during initialization
>   mm: teach platforms not to zero struct pages memory
> 
>  arch/powerpc/mm/init_64.c |    4 +-
>  arch/s390/mm/vmem.c       |    5 ++-
>  arch/sparc/mm/init_64.c   |   26 +++++++----------------
>  arch/x86/mm/init_64.c     |    3 +-
>  include/linux/bootmem.h   |    3 ++
>  include/linux/mm.h        |   15 +++++++++++--
>  mm/memblock.c             |   46 ++++++++++++++++++++++++++++++++++++------
>  mm/page_alloc.c           |    3 ++
>  mm/sparse-vmemmap.c       |   48 +++++++++++++++++++++++++++++---------------
>  9 files changed, 103 insertions(+), 50 deletions(-)
> 


* Re: [v1 0/5] parallelized "struct page" zeroing
  2017-03-24  8:51 ` Christian Borntraeger
@ 2017-03-24  9:35   ` Heiko Carstens
  0 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2017-03-24  9:35 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Pavel Tatashin, linux-kernel, sparclinux, linux-mm, linuxppc-dev,
	linux-s390

On Fri, Mar 24, 2017 at 09:51:09AM +0100, Christian Borntraeger wrote:
> On 03/24/2017 12:01 AM, Pavel Tatashin wrote:
> > When deferred struct page initialization feature is enabled, we get a
> > performance gain of initializing vmemmap in parallel after other CPUs are
> > started. However, we still zero the memory for vmemmap using one boot CPU.
> > This patch-set fixes the memset-zeroing limitation by deferring it as well.
> > 
> > Here is example performance gain on SPARC with 32T:
> > base
> > https://hastebin.com/ozanelatat.go
> > 
> > fix
> > https://hastebin.com/utonawukof.go
> > 
> > As you can see without the fix it takes: 97.89s to boot
> > With the fix it takes: 46.91 to boot.
> > 
> > On x86 time saving is going to be even greater (proportionally to memory size)
> > because there are twice as many "struct page"es for the same amount of memory,
> > as base pages are twice smaller.
> 
> Fixing the linux-s390 mailing list email.
> This might be useful for s390 as well.

Unfortunately only for the fake NUMA case since, as far as I understand it,
parallelization happens only at node granularity. And we usually have only
one node...

But anyway, it won't hurt to set ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT on
s390 as well. I'll do some testing and then we'll see.

Pavel, could you please change your patch 5 so that it also converts the
s390 call sites of vmemmap_alloc_block() to use VMEMMAP_ZERO instead of
'true' as the argument?
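
Presumably something like this untested sketch, for both the PAGE_SIZE and
the PMD_SIZE call sites in arch/s390/mm/vmem.c:

	-	new_page = vmemmap_alloc_block(PAGE_SIZE, node, true);
	+	new_page = vmemmap_alloc_block(PAGE_SIZE, node, VMEMMAP_ZERO);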
