linux-kernel.vger.kernel.org archive mirror
* [v2 0/5] parallelized "struct page" zeroing
@ 2017-03-24 19:19 Pavel Tatashin
  2017-03-24 19:19 ` [v2 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Pavel Tatashin @ 2017-03-24 19:19 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, heiko.carstens, davem, willy

Changelog:
	v1 - v2
	- Per request, added s390 to deferred "struct page" zeroing
	- Collected performance data on x86, which proves the importance of
	  keeping memset() as a prefetch (see below).

When the deferred struct page initialization feature is enabled, we get a
performance gain from initializing vmemmap in parallel after other CPUs are
started. However, we still zero the memory for vmemmap using one boot CPU.
This patch set fixes the memset-zeroing limitation by deferring it as well.

Performance gain on SPARC with 32T:
base:	https://hastebin.com/ozanelatat.go
fix:	https://hastebin.com/utonawukof.go

As you can see, without the fix it takes 97.89s to boot;
with the fix it takes 46.91s to boot.

Performance gain on x86 with 1T:
base:	https://hastebin.com/uvifasohon.pas
fix:	https://hastebin.com/anodiqaguj.pas

On Intel we save 10.66s/T while on SPARC we save 1.59s/T. Intel has
twice as many pages per terabyte, and also fewer nodes than SPARC
(SPARC has 32 nodes vs. Intel's 8).

It takes one thread 11.25s to zero the vmemmap on Intel for 1T, so it should
take an additional 11.25 / 8 = ~1.4s per node (this machine has 8 nodes) to
initialize the memory, but it takes only an additional 0.456s per node, which
means that on Intel we also benefit from doing the memset() and the
initialization of all other fields in one place.

Pavel Tatashin (5):
  sparc64: simplify vmemmap_populate
  mm: defining memblock_virt_alloc_try_nid_raw
  mm: add "zero" argument to vmemmap allocators
  mm: zero struct pages during initialization
  mm: teach platforms not to zero struct pages memory

 arch/powerpc/mm/init_64.c |    4 +-
 arch/s390/mm/vmem.c       |    5 ++-
 arch/sparc/mm/init_64.c   |   26 +++++++----------------
 arch/x86/mm/init_64.c     |    3 +-
 include/linux/bootmem.h   |    3 ++
 include/linux/mm.h        |   15 +++++++++++--
 mm/memblock.c             |   46 ++++++++++++++++++++++++++++++++++++------
 mm/page_alloc.c           |    3 ++
 mm/sparse-vmemmap.c       |   48 +++++++++++++++++++++++++++++---------------
 9 files changed, 103 insertions(+), 50 deletions(-)


* [v2 1/5] sparc64: simplify vmemmap_populate
  2017-03-24 19:19 [v2 0/5] parallelized "struct page" zeroing Pavel Tatashin
@ 2017-03-24 19:19 ` Pavel Tatashin
  2017-03-24 19:19 ` [v2 2/5] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Pavel Tatashin @ 2017-03-24 19:19 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, heiko.carstens, davem, willy

Remove duplicated code by using the common functions
vmemmap_pud_populate() and vmemmap_pgd_populate().

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 arch/sparc/mm/init_64.c |   23 ++++++-----------------
 1 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 2c0cb2a..01eccab 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2526,30 +2526,19 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 	vstart = vstart & PMD_MASK;
 	vend = ALIGN(vend, PMD_SIZE);
 	for (; vstart < vend; vstart += PMD_SIZE) {
-		pgd_t *pgd = pgd_offset_k(vstart);
+		pgd_t *pgd = vmemmap_pgd_populate(vstart, node);
 		unsigned long pte;
 		pud_t *pud;
 		pmd_t *pmd;
 
-		if (pgd_none(*pgd)) {
-			pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
+		if (!pgd)
+			return -ENOMEM;
 
-			if (!new)
-				return -ENOMEM;
-			pgd_populate(&init_mm, pgd, new);
-		}
-
-		pud = pud_offset(pgd, vstart);
-		if (pud_none(*pud)) {
-			pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
-
-			if (!new)
-				return -ENOMEM;
-			pud_populate(&init_mm, pud, new);
-		}
+		pud = vmemmap_pud_populate(pgd, vstart, node);
+		if (!pud)
+			return -ENOMEM;
 
 		pmd = pmd_offset(pud, vstart);
-
 		pte = pmd_val(*pmd);
 		if (!(pte & _PAGE_VALID)) {
 			void *block = vmemmap_alloc_block(PMD_SIZE, node);
-- 
1.7.1


* [v2 2/5] mm: defining memblock_virt_alloc_try_nid_raw
  2017-03-24 19:19 [v2 0/5] parallelized "struct page" zeroing Pavel Tatashin
  2017-03-24 19:19 ` [v2 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
@ 2017-03-24 19:19 ` Pavel Tatashin
  2017-03-24 19:19 ` [v2 3/5] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Pavel Tatashin @ 2017-03-24 19:19 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, heiko.carstens, davem, willy

A new version of the memblock_virt_alloc_* allocators that:
- Does not zero the allocated memory
- Does not panic if the request cannot be satisfied
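
For illustration only (an editorial sketch, not a hunk from this series;
'size' and 'nid' are placeholder variables), a caller of the raw variant is
expected to check for NULL itself and to eventually write every byte of the
block:

	void *buf;

	/* sketch: no implicit memset(), no panic on failure */
	buf = memblock_virt_alloc_try_nid_raw(size, PAGE_SIZE,
					      __pa(MAX_DMA_ADDRESS),
					      BOOTMEM_ALLOC_ACCESSIBLE, nid);
	if (!buf)
		return -ENOMEM;	/* caller decides how to handle failure */
	/* caller must later initialize every byte of 'buf' */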

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 include/linux/bootmem.h |    3 +++
 mm/memblock.c           |   46 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index dbaf312..b61ea10 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -160,6 +160,9 @@ extern int reserve_bootmem_node(pg_data_t *pgdat,
 #define BOOTMEM_ALLOC_ANYWHERE		(~(phys_addr_t)0)
 
 /* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
+void *memblock_virt_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
+				      phys_addr_t min_addr,
+				      phys_addr_t max_addr, int nid);
 void *memblock_virt_alloc_try_nid_nopanic(phys_addr_t size,
 		phys_addr_t align, phys_addr_t min_addr,
 		phys_addr_t max_addr, int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index 696f06d..7fdc555 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1271,7 +1271,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 static void * __init memblock_virt_alloc_internal(
 				phys_addr_t size, phys_addr_t align,
 				phys_addr_t min_addr, phys_addr_t max_addr,
-				int nid)
+				int nid, bool zero)
 {
 	phys_addr_t alloc;
 	void *ptr;
@@ -1322,7 +1322,8 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 	return NULL;
 done:
 	ptr = phys_to_virt(alloc);
-	memset(ptr, 0, size);
+	if (zero)
+		memset(ptr, 0, size);
 
 	/*
 	 * The min_count is set to 0 so that bootmem allocated blocks
@@ -1336,6 +1337,37 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 }
 
 /**
+ * memblock_virt_alloc_try_nid_raw - allocate boot memory block without zeroing
+ * memory and without panicking
+ * @size: size of memory block to be allocated in bytes
+ * @align: alignment of the region and block's size
+ * @min_addr: the lower bound of the memory region from where the allocation
+ *	  is preferred (phys address)
+ * @max_addr: the upper bound of the memory region from where the allocation
+ *	      is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to
+ *	      allocate only from memory limited by memblock.current_limit value
+ * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ *
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. Does not zero allocated memory, does not panic if request
+ * cannot be satisfied.
+ *
+ * RETURNS:
+ * Virtual address of allocated memory block on success, NULL on failure.
+ */
+void * __init memblock_virt_alloc_try_nid_raw(
+			phys_addr_t size, phys_addr_t align,
+			phys_addr_t min_addr, phys_addr_t max_addr,
+			int nid)
+{
+	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
+		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
+		     (u64)max_addr, (void *)_RET_IP_);
+	return memblock_virt_alloc_internal(size, align,
+					   min_addr, max_addr, nid, false);
+}
+
+/**
  * memblock_virt_alloc_try_nid_nopanic - allocate boot memory block
  * @size: size of memory block to be allocated in bytes
  * @align: alignment of the region and block's size
@@ -1346,8 +1378,8 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
  *	      allocate only from memory limited by memblock.current_limit value
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  *
- * Public version of _memblock_virt_alloc_try_nid_nopanic() which provides
- * additional debug information (including caller info), if enabled.
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. This function zeroes the allocated memory.
  *
  * RETURNS:
  * Virtual address of allocated memory block on success, NULL on failure.
@@ -1361,7 +1393,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
 		     (u64)max_addr, (void *)_RET_IP_);
 	return memblock_virt_alloc_internal(size, align, min_addr,
-					     max_addr, nid);
+					     max_addr, nid, true);
 }
 
 /**
@@ -1375,7 +1407,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
  *	      allocate only from memory limited by memblock.current_limit value
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
  *
- * Public panicking version of _memblock_virt_alloc_try_nid_nopanic()
+ * Public panicking version of memblock_virt_alloc_try_nid_nopanic()
  * which provides debug information (including caller info), if enabled,
  * and panics if the request can not be satisfied.
  *
@@ -1393,7 +1425,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
 		     __func__, (u64)size, (u64)align, nid, (u64)min_addr,
 		     (u64)max_addr, (void *)_RET_IP_);
 	ptr = memblock_virt_alloc_internal(size, align,
-					   min_addr, max_addr, nid);
+					   min_addr, max_addr, nid, true);
 	if (ptr)
 		return ptr;
 
-- 
1.7.1


* [v2 3/5] mm: add "zero" argument to vmemmap allocators
  2017-03-24 19:19 [v2 0/5] parallelized "struct page" zeroing Pavel Tatashin
  2017-03-24 19:19 ` [v2 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
  2017-03-24 19:19 ` [v2 2/5] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
@ 2017-03-24 19:19 ` Pavel Tatashin
  2017-05-03 14:34   ` David Miller
  2017-03-24 19:19 ` [v2 4/5] mm: zero struct pages during initialization Pavel Tatashin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Pavel Tatashin @ 2017-03-24 19:19 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, heiko.carstens, davem, willy

Allow clients to request non-zeroed memory from the vmemmap allocator.
The following two public functions gain a new boolean argument called zero:

__vmemmap_alloc_block_buf()
vmemmap_alloc_block()

When zero is true, memory allocated by the memblock allocator is zeroed
(the current behavior); when zero is false, the memory is not zeroed.

This change allows for optimizations where the client knows when it is
better to zero memory: perhaps later, once other CPUs are started, or
perhaps never, because the client is going to set every byte of the
allocated memory, so there is no need to zero it beforehand.
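
A minimal before/after sketch (illustrative only, not taken from the diff
below; 'node' is a placeholder):

	/* zero == true: the allocator zeroes the block (current behavior) */
	void *p = vmemmap_alloc_block(PMD_SIZE, node, true);

	/*
	 * zero == false: skip the memset() because the caller either writes
	 * every byte itself or defers zeroing to a later, parallel pass.
	 */
	void *q = vmemmap_alloc_block(PMD_SIZE, node, false);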

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 arch/powerpc/mm/init_64.c |    4 +-
 arch/s390/mm/vmem.c       |    5 ++-
 arch/sparc/mm/init_64.c   |    3 +-
 arch/x86/mm/init_64.c     |    3 +-
 include/linux/mm.h        |    6 ++--
 mm/sparse-vmemmap.c       |   48 +++++++++++++++++++++++++++++---------------
 6 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 9be9920..eb4c270 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -133,7 +133,7 @@ static int __meminit vmemmap_populated(unsigned long start, int page_size)
 
 	/* allocate a page when required and hand out chunks */
 	if (!num_left) {
-		next = vmemmap_alloc_block(PAGE_SIZE, node);
+		next = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (unlikely(!next)) {
 			WARN_ON(1);
 			return NULL;
@@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		if (vmemmap_populated(start, page_size))
 			continue;
 
-		p = vmemmap_alloc_block(page_size, node);
+		p = vmemmap_alloc_block(page_size, node, true);
 		if (!p)
 			return -ENOMEM;
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 60d3899..9c75214 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -251,7 +251,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 			if (MACHINE_HAS_EDAT1) {
 				void *new_page;
 
-				new_page = vmemmap_alloc_block(PMD_SIZE, node);
+				new_page = vmemmap_alloc_block(PMD_SIZE, node,
+							       true);
 				if (!new_page)
 					goto out;
 				pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
@@ -271,7 +272,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		if (pte_none(*pt_dir)) {
 			void *new_page;
 
-			new_page = vmemmap_alloc_block(PAGE_SIZE, node);
+			new_page = vmemmap_alloc_block(PAGE_SIZE, node, true);
 			if (!new_page)
 				goto out;
 			pte_val(*pt_dir) = __pa(new_page) | pgt_prot;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 01eccab..d91e462 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2541,7 +2541,8 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 		pmd = pmd_offset(pud, vstart);
 		pte = pmd_val(*pmd);
 		if (!(pte & _PAGE_VALID)) {
-			void *block = vmemmap_alloc_block(PMD_SIZE, node);
+			void *block = vmemmap_alloc_block(PMD_SIZE, node,
+							  true);
 
 			if (!block)
 				return -ENOMEM;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 15173d3..46101b6 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1176,7 +1176,8 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 		if (pmd_none(*pmd)) {
 			void *p;
 
-			p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
+			p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap,
+						      true);
 			if (p) {
 				pte_t entry;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5f01c88..54df194 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2410,13 +2410,13 @@ void sparse_mem_maps_populate_node(struct page **map_map,
 pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
 pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
 pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
-void *vmemmap_alloc_block(unsigned long size, int node);
+void *vmemmap_alloc_block(unsigned long size, int node, bool zero);
 struct vmem_altmap;
 void *__vmemmap_alloc_block_buf(unsigned long size, int node,
-		struct vmem_altmap *altmap);
+		struct vmem_altmap *altmap, bool zero);
 static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
 {
-	return __vmemmap_alloc_block_buf(size, node, NULL);
+	return __vmemmap_alloc_block_buf(size, node, NULL, true);
 }
 
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a56c398..1e9508b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -39,16 +39,27 @@
 static void * __ref __earlyonly_bootmem_alloc(int node,
 				unsigned long size,
 				unsigned long align,
-				unsigned long goal)
+				unsigned long goal,
+				bool zero)
 {
-	return memblock_virt_alloc_try_nid(size, align, goal,
-					    BOOTMEM_ALLOC_ACCESSIBLE, node);
+	void *mem = memblock_virt_alloc_try_nid_raw(size, align, goal,
+						    BOOTMEM_ALLOC_ACCESSIBLE,
+						    node);
+	if (!mem) {
+		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=0x%lx\n",
+		      __func__, size, align, node, goal);
+		return NULL;
+	}
+
+	if (zero)
+		memset(mem, 0, size);
+	return mem;
 }
 
 static void *vmemmap_buf;
 static void *vmemmap_buf_end;
 
-void * __meminit vmemmap_alloc_block(unsigned long size, int node)
+void * __meminit vmemmap_alloc_block(unsigned long size, int node, bool zero)
 {
 	/* If the main allocator is up use that, fallback to bootmem. */
 	if (slab_is_available()) {
@@ -67,24 +78,27 @@
 		return NULL;
 	} else
 		return __earlyonly_bootmem_alloc(node, size, size,
-				__pa(MAX_DMA_ADDRESS));
+				__pa(MAX_DMA_ADDRESS), zero);
 }
 
 /* need to make sure size is all the same during early stage */
-static void * __meminit alloc_block_buf(unsigned long size, int node)
+static void * __meminit alloc_block_buf(unsigned long size, int node, bool zero)
 {
 	void *ptr;
 
 	if (!vmemmap_buf)
-		return vmemmap_alloc_block(size, node);
+		return vmemmap_alloc_block(size, node, zero);
 
 	/* take the from buf */
 	ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
 	if (ptr + size > vmemmap_buf_end)
-		return vmemmap_alloc_block(size, node);
+		return vmemmap_alloc_block(size, node, zero);
 
 	vmemmap_buf = ptr + size;
 
+	if (zero)
+		memset(ptr, 0, size);
+
 	return ptr;
 }
 
@@ -152,11 +166,11 @@ static unsigned long __meminit vmem_altmap_alloc(struct vmem_altmap *altmap,
 
 /* need to make sure size is all the same during early stage */
 void * __meminit __vmemmap_alloc_block_buf(unsigned long size, int node,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, bool zero)
 {
 	if (altmap)
 		return altmap_alloc_block_buf(size, altmap);
-	return alloc_block_buf(size, node);
+	return alloc_block_buf(size, node, zero);
 }
 
 void __meminit vmemmap_verify(pte_t *pte, int node,
@@ -175,7 +189,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 	pte_t *pte = pte_offset_kernel(pmd, addr);
 	if (pte_none(*pte)) {
 		pte_t entry;
-		void *p = alloc_block_buf(PAGE_SIZE, node);
+		void *p = alloc_block_buf(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
@@ -188,7 +202,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 {
 	pmd_t *pmd = pmd_offset(pud, addr);
 	if (pmd_none(*pmd)) {
-		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+		void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		pmd_populate_kernel(&init_mm, pmd, p);
@@ -200,7 +214,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 {
 	pud_t *pud = pud_offset(p4d, addr);
 	if (pud_none(*pud)) {
-		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+		void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		pud_populate(&init_mm, pud, p);
@@ -212,7 +226,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 {
 	p4d_t *p4d = p4d_offset(pgd, addr);
 	if (p4d_none(*p4d)) {
-		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+		void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		p4d_populate(&init_mm, p4d, p);
@@ -224,7 +238,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 {
 	pgd_t *pgd = pgd_offset_k(addr);
 	if (pgd_none(*pgd)) {
-		void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+		void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
 		if (!p)
 			return NULL;
 		pgd_populate(&init_mm, pgd, p);
@@ -290,8 +304,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	void *vmemmap_buf_start;
 
 	size = ALIGN(size, PMD_SIZE);
-	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
-			 PMD_SIZE, __pa(MAX_DMA_ADDRESS));
+	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size
+			* map_count, PMD_SIZE, __pa(MAX_DMA_ADDRESS), false);
 
 	if (vmemmap_buf_start) {
 		vmemmap_buf = vmemmap_buf_start;
-- 
1.7.1


* [v2 4/5] mm: zero struct pages during initialization
  2017-03-24 19:19 [v2 0/5] parallelized "struct page" zeroing Pavel Tatashin
                   ` (2 preceding siblings ...)
  2017-03-24 19:19 ` [v2 3/5] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
@ 2017-03-24 19:19 ` Pavel Tatashin
  2017-03-24 19:19 ` [v2 5/5] mm: teach platforms not to zero struct pages memory Pavel Tatashin
  2017-03-25 21:25 ` [v2 0/5] parallelized "struct page" zeroing Matthew Wilcox
  5 siblings, 0 replies; 11+ messages in thread
From: Pavel Tatashin @ 2017-03-24 19:19 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, heiko.carstens, davem, willy

When deferred struct page initialization is enabled, do not expect that
the memory allocated for struct pages was zeroed by the allocator. Zero it
when the "struct page"s are initialized instead.

Also, a boolean define, VMEMMAP_ZERO, is provided to tell platforms whether
they should zero the memory up front or can defer it.
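
With CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled, VMEMMAP_ZERO evaluates to
false and the memset() moves into __init_single_page(); a converted call
site (as done for all platforms in the next patch) then looks roughly like
this sketch:

	/* sketch: zeroing is skipped here whenever it can be deferred */
	p = vmemmap_alloc_block(page_size, node, VMEMMAP_ZERO);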

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 include/linux/mm.h |    9 +++++++++
 mm/page_alloc.c    |    3 +++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 54df194..eb052f6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2427,6 +2427,15 @@ int vmemmap_populate_basepages(unsigned long start, unsigned long end,
 #ifdef CONFIG_MEMORY_HOTPLUG
 void vmemmap_free(unsigned long start, unsigned long end);
 #endif
+/*
+ * Don't zero "struct page"es during early boot, and zero only when they are
+ * initialized in parallel.
+ */
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+#define VMEMMAP_ZERO	false
+#else
+#define VMEMMAP_ZERO	true
+#endif
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long size);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f202f8b..02945e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1168,6 +1168,9 @@ static void free_one_page(struct zone *zone,
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+	memset(page, 0, sizeof(struct page));
+#endif
 	set_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
-- 
1.7.1


* [v2 5/5] mm: teach platforms not to zero struct pages memory
  2017-03-24 19:19 [v2 0/5] parallelized "struct page" zeroing Pavel Tatashin
                   ` (3 preceding siblings ...)
  2017-03-24 19:19 ` [v2 4/5] mm: zero struct pages during initialization Pavel Tatashin
@ 2017-03-24 19:19 ` Pavel Tatashin
  2017-03-27  6:00   ` Heiko Carstens
  2017-03-25 21:25 ` [v2 0/5] parallelized "struct page" zeroing Matthew Wilcox
  5 siblings, 1 reply; 11+ messages in thread
From: Pavel Tatashin @ 2017-03-24 19:19 UTC (permalink / raw)
  To: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, heiko.carstens, davem, willy

If we are using the deferred struct page initialization feature, most
"struct page"s are initialized after other CPUs are started, and hence we
benefit from doing this job in parallel. However, we still zero all of the
memory allocated for "struct page"s using the boot CPU.  This patch solves
that problem by deferring the zeroing of "struct page"s until they are
initialized.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 arch/powerpc/mm/init_64.c |    2 +-
 arch/s390/mm/vmem.c       |    2 +-
 arch/sparc/mm/init_64.c   |    2 +-
 arch/x86/mm/init_64.c     |    2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index eb4c270..24faf2d 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 		if (vmemmap_populated(start, page_size))
 			continue;
 
-		p = vmemmap_alloc_block(page_size, node, true);
+		p = vmemmap_alloc_block(page_size, node, VMEMMAP_ZERO);
 		if (!p)
 			return -ENOMEM;
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 9c75214..ffe9ba1 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -252,7 +252,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 				void *new_page;
 
 				new_page = vmemmap_alloc_block(PMD_SIZE, node,
-							       true);
+							       VMEMMAP_ZERO);
 				if (!new_page)
 					goto out;
 				pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index d91e462..280834e 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2542,7 +2542,7 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
 		pte = pmd_val(*pmd);
 		if (!(pte & _PAGE_VALID)) {
 			void *block = vmemmap_alloc_block(PMD_SIZE, node,
-							  true);
+							  VMEMMAP_ZERO);
 
 			if (!block)
 				return -ENOMEM;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 46101b6..9d8c72c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1177,7 +1177,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 			void *p;
 
 			p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap,
-						      true);
+						      VMEMMAP_ZERO);
 			if (p) {
 				pte_t entry;
 
-- 
1.7.1


* Re: [v2 0/5] parallelized "struct page" zeroing
  2017-03-24 19:19 [v2 0/5] parallelized "struct page" zeroing Pavel Tatashin
                   ` (4 preceding siblings ...)
  2017-03-24 19:19 ` [v2 5/5] mm: teach platforms not to zero struct pages memory Pavel Tatashin
@ 2017-03-25 21:25 ` Matthew Wilcox
  5 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox @ 2017-03-25 21:25 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, heiko.carstens, davem

On Fri, Mar 24, 2017 at 03:19:47PM -0400, Pavel Tatashin wrote:
> Changelog:
> 	v1 - v2
> 	- Per request, added s390 to deferred "struct page" zeroing
> 	- Collected performance data on x86 which proofs the importance to
> 	  keep memset() as prefetch (see below).
> 
> When deferred struct page initialization feature is enabled, we get a
> performance gain of initializing vmemmap in parallel after other CPUs are
> started. However, we still zero the memory for vmemmap using one boot CPU.
> This patch-set fixes the memset-zeroing limitation by deferring it as well.
> 
> Performance gain on SPARC with 32T:
> base:	https://hastebin.com/ozanelatat.go
> fix:	https://hastebin.com/utonawukof.go
> 
> As you can see without the fix it takes: 97.89s to boot
> With the fix it takes: 46.91 to boot.
> 
> Performance gain on x86 with 1T:
> base:	https://hastebin.com/uvifasohon.pas
> fix:	https://hastebin.com/anodiqaguj.pas
> 
> On Intel we save 10.66s/T while on SPARC we save 1.59s/T. Intel has
> twice as many pages, and also fewer nodes than SPARC (sparc 32 nodes, vs.
> intel 8 nodes).
> 
> It takes one thread 11.25s to zero vmemmap on Intel for 1T, so it should
> take additional 11.25 / 8 = 1.4s  (this machine has 8 nodes) per node to
> initialize the memory, but it takes only additional 0.456s per node, which
> means on Intel we also benefit from having memset() and initializing all
> other fields in one place.

My question was how long it takes if you memset in neither place.


* Re: [v2 5/5] mm: teach platforms not to zero struct pages memory
  2017-03-24 19:19 ` [v2 5/5] mm: teach platforms not to zero struct pages memory Pavel Tatashin
@ 2017-03-27  6:00   ` Heiko Carstens
  2017-04-07  6:15     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 11+ messages in thread
From: Heiko Carstens @ 2017-03-27  6:00 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, davem, willy

On Fri, Mar 24, 2017 at 03:19:52PM -0400, Pavel Tatashin wrote:
> If we are using deferred struct page initialization feature, most of
> "struct page"es are getting initialized after other CPUs are started, and
> hence we are benefiting from doing this job in parallel. However, we are
> still zeroing all the memory that is allocated for "struct pages" using the
> boot CPU.  This patch solves this problem, by deferring zeroing "struct
> pages" to only when they are initialized.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
> ---
>  arch/powerpc/mm/init_64.c |    2 +-
>  arch/s390/mm/vmem.c       |    2 +-
>  arch/sparc/mm/init_64.c   |    2 +-
>  arch/x86/mm/init_64.c     |    2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index eb4c270..24faf2d 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
>  		if (vmemmap_populated(start, page_size))
>  			continue;
> 
> -		p = vmemmap_alloc_block(page_size, node, true);
> +		p = vmemmap_alloc_block(page_size, node, VMEMMAP_ZERO);
>  		if (!p)
>  			return -ENOMEM;
> 
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index 9c75214..ffe9ba1 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -252,7 +252,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
>  				void *new_page;
> 
>  				new_page = vmemmap_alloc_block(PMD_SIZE, node,
> -							       true);
> +							       VMEMMAP_ZERO);
>  				if (!new_page)
>  					goto out;
>  				pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;

s390 has two call sites that need to be converted, like you did in one of
your previous patches. The same seems to be true for powerpc, unless there
is a reason to not convert them?
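
(Editorial sketch, not part of the posted series: the second s390 call
site, which patch 3 converted to pass true, would presumably become the
following.)

 		if (pte_none(*pt_dir)) {
 			void *new_page;

-			new_page = vmemmap_alloc_block(PAGE_SIZE, node, true);
+			new_page = vmemmap_alloc_block(PAGE_SIZE, node,
+						       VMEMMAP_ZERO);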


* Re: [v2 5/5] mm: teach platforms not to zero struct pages memory
  2017-03-27  6:00   ` Heiko Carstens
@ 2017-04-07  6:15     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2017-04-07  6:15 UTC (permalink / raw)
  To: Heiko Carstens, Pavel Tatashin
  Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, davem, willy

Heiko Carstens <heiko.carstens@de.ibm.com> writes:

> On Fri, Mar 24, 2017 at 03:19:52PM -0400, Pavel Tatashin wrote:
>> If we are using deferred struct page initialization feature, most of
>> "struct page"es are getting initialized after other CPUs are started, and
>> hence we are benefiting from doing this job in parallel. However, we are
>> still zeroing all the memory that is allocated for "struct pages" using the
>> boot CPU.  This patch solves this problem, by deferring zeroing "struct
>> pages" to only when they are initialized.
>> 
>> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
>> ---
>>  arch/powerpc/mm/init_64.c |    2 +-
>>  arch/s390/mm/vmem.c       |    2 +-
>>  arch/sparc/mm/init_64.c   |    2 +-
>>  arch/x86/mm/init_64.c     |    2 +-
>>  4 files changed, 4 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
>> index eb4c270..24faf2d 100644
>> --- a/arch/powerpc/mm/init_64.c
>> +++ b/arch/powerpc/mm/init_64.c
>> @@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
>>  		if (vmemmap_populated(start, page_size))
>>  			continue;
>> 
>> -		p = vmemmap_alloc_block(page_size, node, true);
>> +		p = vmemmap_alloc_block(page_size, node, VMEMMAP_ZERO);
>>  		if (!p)
>>  			return -ENOMEM;
>> 
>> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
>> index 9c75214..ffe9ba1 100644
>> --- a/arch/s390/mm/vmem.c
>> +++ b/arch/s390/mm/vmem.c
>> @@ -252,7 +252,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
>>  				void *new_page;
>> 
>>  				new_page = vmemmap_alloc_block(PMD_SIZE, node,
>> -							       true);
>> +							       VMEMMAP_ZERO);
>>  				if (!new_page)
>>  					goto out;
>>  				pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
>
> s390 has two call sites that need to be converted, like you did in one of
> your previous patches. The same seems to be true for powerpc, unless there
> is a reason to not convert them?
>

vmemmap_list_alloc is not really a struct page allocation, right? We are
just allocating memory to be used as vmemmap_backing. But considering
we are updating all three elements of the struct, we can avoid that
memset. But instead of VMEMMAP_ZERO, we can just pass false in that
case?

-aneesh
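
(Editorial sketch of the suggestion above, not part of the posted series:
the powerpc vmemmap_list backing allocation from patch 3 would pass an
explicit false rather than VMEMMAP_ZERO.)

 	/* allocate a page when required and hand out chunks */
 	if (!num_left) {
-		next = vmemmap_alloc_block(PAGE_SIZE, node, true);
+		next = vmemmap_alloc_block(PAGE_SIZE, node, false);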


* Re: [v2 3/5] mm: add "zero" argument to vmemmap allocators
  2017-03-24 19:19 ` [v2 3/5] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
@ 2017-05-03 14:34   ` David Miller
  2017-05-03 15:05     ` Pasha Tatashin
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2017-05-03 14:34 UTC (permalink / raw)
  To: pasha.tatashin
  Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, heiko.carstens, willy

From: Pavel Tatashin <pasha.tatashin@oracle.com>
Date: Fri, 24 Mar 2017 15:19:50 -0400

> Allow clients to request non-zeroed memory from vmemmap allocator.
> The following two public function have a new boolean argument called zero:
> 
> __vmemmap_alloc_block_buf()
> vmemmap_alloc_block()
> 
> When zero is true, memory that is allocated by memblock allocator is zeroed
> (the current behavior), when argument is false, the memory is not zeroed.
> 
> This change allows for optimizations where client knows when it is better
> to zero memory: may be later when other CPUs are started, or may be client
> is going to set every byte in the allocated memory, so no need to zero
> memory beforehand.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>

I think when you add a new argument that can adjust behavior, you
should add the new argument but retain exactly the current behavior in
the existing calls.

Then later you can piece by piece change behavior, and document properly
in the commit message what is happening and why the transformation is
legal.

Here, you are adding the new boolean to __earlyonly_bootmem_alloc() and
then making sparse_mem_maps_populate_node() pass false, which changes
behavior such that it doesn't get zero'd memory any more.

Please make one change at a time.  Otherwise review and bisection is
going to be difficult.


* Re: [v2 3/5] mm: add "zero" argument to vmemmap allocators
  2017-05-03 14:34   ` David Miller
@ 2017-05-03 15:05     ` Pasha Tatashin
  0 siblings, 0 replies; 11+ messages in thread
From: Pasha Tatashin @ 2017-05-03 15:05 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, sparclinux, linux-mm, linuxppc-dev, linux-s390,
	borntraeger, heiko.carstens, willy

Hi Dave,

Thank you for the review. I will address your comment and update the patchset.

Pasha

On 05/03/2017 10:34 AM, David Miller wrote:
> From: Pavel Tatashin <pasha.tatashin@oracle.com>
> Date: Fri, 24 Mar 2017 15:19:50 -0400
> 
>> Allow clients to request non-zeroed memory from vmemmap allocator.
>> The following two public function have a new boolean argument called zero:
>>
>> __vmemmap_alloc_block_buf()
>> vmemmap_alloc_block()
>>
>> When zero is true, memory that is allocated by memblock allocator is zeroed
>> (the current behavior), when argument is false, the memory is not zeroed.
>>
>> This change allows for optimizations where client knows when it is better
>> to zero memory: may be later when other CPUs are started, or may be client
>> is going to set every byte in the allocated memory, so no need to zero
>> memory beforehand.
>>
>> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
> 
> I think when you add a new argument that can adjust behavior, you
> should add the new argument but retain exactly the current behavior in
> the existing calls.
> 
> Then later you can piece by piece change behavior, and document properly
> in the commit message what is happening and why the transformation is
> legal.
> 
> Here, you are adding the new boolean to __earlyonly_bootmem_alloc() and
> then making sparse_mem_maps_populate_node() pass false, which changes
> behavior such that it doesn't get zero'd memory any more.
> 
> Please make one change at a time.  Otherwise review and bisection is
> going to be difficult.
> 


end of thread, other threads:[~2017-05-03 15:07 UTC | newest]

Thread overview: 11+ messages
2017-03-24 19:19 [v2 0/5] parallelized "struct page" zeroing Pavel Tatashin
2017-03-24 19:19 ` [v2 1/5] sparc64: simplify vmemmap_populate Pavel Tatashin
2017-03-24 19:19 ` [v2 2/5] mm: defining memblock_virt_alloc_try_nid_raw Pavel Tatashin
2017-03-24 19:19 ` [v2 3/5] mm: add "zero" argument to vmemmap allocators Pavel Tatashin
2017-05-03 14:34   ` David Miller
2017-05-03 15:05     ` Pasha Tatashin
2017-03-24 19:19 ` [v2 4/5] mm: zero struct pages during initialization Pavel Tatashin
2017-03-24 19:19 ` [v2 5/5] mm: teach platforms not to zero struct pages memory Pavel Tatashin
2017-03-27  6:00   ` Heiko Carstens
2017-04-07  6:15     ` Aneesh Kumar K.V
2017-03-25 21:25 ` [v2 0/5] parallelized "struct page" zeroing Matthew Wilcox
