linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/5] sparse_init rewrite
@ 2018-07-12 20:37 Pavel Tatashin
  2018-07-12 20:37 ` [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations Pavel Tatashin
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-12 20:37 UTC (permalink / raw)
  To: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, osalvador, pasha.tatashin, abdhalee, mpe

Changelog:
v5 - v4
	- Fixed the issue that was reported on ppc64 when
	  CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER is removed
	- Consolidated the new buffer allocation between vmemmap
	  and non-vmemmap variants of sparse layout.
	- Removed all review-by comments, because I had to do
	  significant amount of changes compared to previous version
	  and need another round of review.
	- I also would appreciate if those who reported problems with
	  PPC64 could test this change.
v4 - v3
	- Addressed comments from Dave Hansen
v3 - v1
	- Fixed two issues found by Baoquan He
v1 - v2
	- Addressed comments from Oscar Salvador

In sparse_init() we allocate two large buffers to temporary hold usemap and
memmap for the whole machine. However, we can avoid doing that if we
changed sparse_init() to operated on per-node bases instead of doing it on
the whole machine beforehand.

As shown by Baoquan
http://lkml.kernel.org/r/20180628062857.29658-1-bhe@redhat.com

The buffers are large enough to cause machine stop to boot on small memory
systems.

Another benefit of these changes is that they also obsolete
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER.

Pavel Tatashin (5):
  mm/sparse: abstract sparse buffer allocations
  mm/sparse: use the new sparse buffer functions in non-vmemmap
  mm/sparse: move buffer init/fini to the common place
  mm/sparse: add new sparse_init_nid() and sparse_init()
  mm/sparse: delete old sprase_init and enable new one

 include/linux/mm.h  |   7 +-
 mm/Kconfig          |   4 -
 mm/sparse-vmemmap.c |  59 +--------
 mm/sparse.c         | 300 +++++++++++++++-----------------------------
 4 files changed, 105 insertions(+), 265 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations
  2018-07-12 20:37 [PATCH v5 0/5] sparse_init rewrite Pavel Tatashin
@ 2018-07-12 20:37 ` Pavel Tatashin
  2018-07-12 22:45   ` Andrew Morton
  2018-07-13 13:17   ` Oscar Salvador
  2018-07-12 20:37 ` [PATCH v5 2/5] mm/sparse: use the new sparse buffer functions in non-vmemmap Pavel Tatashin
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-12 20:37 UTC (permalink / raw)
  To: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, osalvador, pasha.tatashin, abdhalee, mpe

When struct pages are allocated for sparse-vmemmap VA layout, we first
try to allocate one large buffer, and than if that fails allocate struct
pages for each section as we go.

The code that allocates buffer is uses global variables and is spread
across several call sites.

Cleanup the code by introducing three functions to handle the global
buffer:

sparse_buffer_init()	initialize the buffer
sparse_buffer_fini()	free the remaining part of the buffer
sparse_buffer_alloc()	alloc from the buffer, and if buffer is empty
return NULL

Define these functions in sparse.c instead of sparse-vmemmap.c because
later we will use them for non-vmemmap sparse allocations as well.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 include/linux/mm.h  |  4 ++++
 mm/sparse-vmemmap.c | 40 ++++++----------------------------------
 mm/sparse.c         | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 54 insertions(+), 35 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 577e578eb640..a83d3e0e66d4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2655,6 +2655,10 @@ void sparse_mem_maps_populate_node(struct page **map_map,
 				   unsigned long map_count,
 				   int nodeid);
 
+unsigned long __init section_map_size(void);
+void sparse_buffer_init(unsigned long size, int nid);
+void sparse_buffer_fini(void);
+void *sparse_buffer_alloc(unsigned long size);
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 95e2c7638a5c..b05c7663c640 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -43,12 +43,9 @@ static void * __ref __earlyonly_bootmem_alloc(int node,
 				unsigned long goal)
 {
 	return memblock_virt_alloc_try_nid_raw(size, align, goal,
-					    BOOTMEM_ALLOC_ACCESSIBLE, node);
+					       BOOTMEM_ALLOC_ACCESSIBLE, node);
 }
 
-static void *vmemmap_buf;
-static void *vmemmap_buf_end;
-
 void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 {
 	/* If the main allocator is up use that, fallback to bootmem. */
@@ -76,18 +73,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 /* need to make sure size is all the same during early stage */
 void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node)
 {
-	void *ptr;
-
-	if (!vmemmap_buf)
-		return vmemmap_alloc_block(size, node);
-
-	/* take the from buf */
-	ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
-	if (ptr + size > vmemmap_buf_end)
-		return vmemmap_alloc_block(size, node);
-
-	vmemmap_buf = ptr + size;
+	void *ptr = sparse_buffer_alloc(size);
 
+	if (!ptr)
+		ptr = vmemmap_alloc_block(size, node);
 	return ptr;
 }
 
@@ -279,19 +268,9 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 					  unsigned long map_count, int nodeid)
 {
 	unsigned long pnum;
-	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
-	void *vmemmap_buf_start;
 	int nr_consumed_maps = 0;
 
-	size = ALIGN(size, PMD_SIZE);
-	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
-			 PMD_SIZE, __pa(MAX_DMA_ADDRESS));
-
-	if (vmemmap_buf_start) {
-		vmemmap_buf = vmemmap_buf_start;
-		vmemmap_buf_end = vmemmap_buf_start + size * map_count;
-	}
-
+	sparse_buffer_init(section_map_size() * map_count, nodeid);
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
@@ -303,12 +282,5 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
 	}
-
-	if (vmemmap_buf_start) {
-		/* need to free left buf */
-		memblock_free_early(__pa(vmemmap_buf),
-				    vmemmap_buf_end - vmemmap_buf);
-		vmemmap_buf = NULL;
-		vmemmap_buf_end = NULL;
-	}
+	sparse_buffer_fini();
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index 2ea8b3dbd0df..ac57bae476f4 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -400,7 +400,14 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 	}
 }
 
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+unsigned long __init section_map_size(void)
+
+{
+	return ALIGN(sizeof(struct page) * PAGES_PER_SECTION, PMD_SIZE);
+}
+
+#else
 struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap)
 {
@@ -457,6 +464,42 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
+static void *sparsemap_buf __meminitdata;
+static void *sparsemap_buf_end __meminitdata;
+
+void __init sparse_buffer_init(unsigned long size, int nid)
+{
+	BUG_ON(sparsemap_buf);
+	sparsemap_buf =
+		memblock_virt_alloc_try_nid_raw(size, PAGE_SIZE,
+						__pa(MAX_DMA_ADDRESS),
+						BOOTMEM_ALLOC_ACCESSIBLE, nid);
+	sparsemap_buf_end = sparsemap_buf + size;
+}
+
+void __init sparse_buffer_fini(void)
+{
+	unsigned long size = sparsemap_buf_end - sparsemap_buf;
+
+	if (sparsemap_buf && size > 0)
+		memblock_free_early(__pa(sparsemap_buf), size);
+	sparsemap_buf = NULL;
+}
+
+void * __meminit sparse_buffer_alloc(unsigned long size)
+{
+	void *ptr = NULL;
+
+	if (sparsemap_buf) {
+		ptr = (void *)ALIGN((unsigned long)sparsemap_buf, size);
+		if (ptr + size > sparsemap_buf_end)
+			ptr = NULL;
+		else
+			sparsemap_buf = ptr + size;
+	}
+	return ptr;
+}
+
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 static void __init sparse_early_mem_maps_alloc_node(void *data,
 				 unsigned long pnum_begin,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 2/5] mm/sparse: use the new sparse buffer functions in non-vmemmap
  2018-07-12 20:37 [PATCH v5 0/5] sparse_init rewrite Pavel Tatashin
  2018-07-12 20:37 ` [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations Pavel Tatashin
@ 2018-07-12 20:37 ` Pavel Tatashin
  2018-07-12 20:37 ` [PATCH v5 3/5] mm/sparse: move buffer init/fini to the common place Pavel Tatashin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-12 20:37 UTC (permalink / raw)
  To: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, osalvador, pasha.tatashin, abdhalee, mpe

non-vmemmap sparse also allocated large contiguous chunk of memory, and if
fails falls back to smaller allocations. Use the same functions to allocate
buffer as the vmemmap-sparse

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/sparse.c | 41 ++++++++++++++---------------------------
 1 file changed, 14 insertions(+), 27 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index ac57bae476f4..976854a3af8b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -408,13 +408,20 @@ unsigned long __init section_map_size(void)
 }
 
 #else
+unsigned long __init section_map_size(void)
+{
+	return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
+}
+
 struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap)
 {
-	struct page *map;
-	unsigned long size;
+	unsigned long size = section_map_size();
+	struct page *map = sparse_buffer_alloc(size);
+
+	if (map)
+		return map;
 
-	size = PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
 	map = memblock_virt_alloc_try_nid(size,
 					  PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
 					  BOOTMEM_ALLOC_ACCESSIBLE, nid);
@@ -425,42 +432,22 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 					  unsigned long pnum_end,
 					  unsigned long map_count, int nodeid)
 {
-	void *map;
 	unsigned long pnum;
-	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
-	int nr_consumed_maps;
-
-	size = PAGE_ALIGN(size);
-	map = memblock_virt_alloc_try_nid_raw(size * map_count,
-					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
-					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
-	if (map) {
-		nr_consumed_maps = 0;
-		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-			if (!present_section_nr(pnum))
-				continue;
-			map_map[nr_consumed_maps] = map;
-			map += size;
-			nr_consumed_maps++;
-		}
-		return;
-	}
+	unsigned long size = section_map_size();
+	int nr_consumed_maps = 0;
 
-	/* fallback */
-	nr_consumed_maps = 0;
+	sparse_buffer_init(size * map_count, nodeid);
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-		struct mem_section *ms;
-
 		if (!present_section_nr(pnum))
 			continue;
 		map_map[nr_consumed_maps] =
 				sparse_mem_map_populate(pnum, nodeid, NULL);
 		if (map_map[nr_consumed_maps++])
 			continue;
-		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
 	}
+	sparse_buffer_fini();
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 3/5] mm/sparse: move buffer init/fini to the common place
  2018-07-12 20:37 [PATCH v5 0/5] sparse_init rewrite Pavel Tatashin
  2018-07-12 20:37 ` [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations Pavel Tatashin
  2018-07-12 20:37 ` [PATCH v5 2/5] mm/sparse: use the new sparse buffer functions in non-vmemmap Pavel Tatashin
@ 2018-07-12 20:37 ` Pavel Tatashin
  2018-07-12 20:37 ` [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init() Pavel Tatashin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-12 20:37 UTC (permalink / raw)
  To: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, osalvador, pasha.tatashin, abdhalee, mpe

Now, that both variants of sparse memory use the same buffers to populate
memory map, we can move sparse_buffer_init()/sparse_buffer_fini() to the
common place.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 include/linux/mm.h  |  3 ---
 mm/sparse-vmemmap.c |  2 --
 mm/sparse.c         | 14 +++++++-------
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a83d3e0e66d4..99d8c50adef6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2655,9 +2655,6 @@ void sparse_mem_maps_populate_node(struct page **map_map,
 				   unsigned long map_count,
 				   int nodeid);
 
-unsigned long __init section_map_size(void);
-void sparse_buffer_init(unsigned long size, int nid);
-void sparse_buffer_fini(void);
 void *sparse_buffer_alloc(unsigned long size);
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index b05c7663c640..cd15f3d252c3 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -270,7 +270,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	unsigned long pnum;
 	int nr_consumed_maps = 0;
 
-	sparse_buffer_init(section_map_size() * map_count, nodeid);
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
@@ -282,5 +281,4 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
 	}
-	sparse_buffer_fini();
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index 976854a3af8b..01c616342909 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -401,14 +401,14 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 }
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-unsigned long __init section_map_size(void)
+static unsigned long __init section_map_size(void)
 
 {
 	return ALIGN(sizeof(struct page) * PAGES_PER_SECTION, PMD_SIZE);
 }
 
 #else
-unsigned long __init section_map_size(void)
+static unsigned long __init section_map_size(void)
 {
 	return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
 }
@@ -433,10 +433,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 					  unsigned long map_count, int nodeid)
 {
 	unsigned long pnum;
-	unsigned long size = section_map_size();
 	int nr_consumed_maps = 0;
 
-	sparse_buffer_init(size * map_count, nodeid);
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
@@ -447,14 +445,13 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
 	}
-	sparse_buffer_fini();
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static void *sparsemap_buf __meminitdata;
 static void *sparsemap_buf_end __meminitdata;
 
-void __init sparse_buffer_init(unsigned long size, int nid)
+static void __init sparse_buffer_init(unsigned long size, int nid)
 {
 	BUG_ON(sparsemap_buf);
 	sparsemap_buf =
@@ -464,7 +461,7 @@ void __init sparse_buffer_init(unsigned long size, int nid)
 	sparsemap_buf_end = sparsemap_buf + size;
 }
 
-void __init sparse_buffer_fini(void)
+static void __init sparse_buffer_fini(void)
 {
 	unsigned long size = sparsemap_buf_end - sparsemap_buf;
 
@@ -494,8 +491,11 @@ static void __init sparse_early_mem_maps_alloc_node(void *data,
 				 unsigned long map_count, int nodeid)
 {
 	struct page **map_map = (struct page **)data;
+
+	sparse_buffer_init(section_map_size() * map_count, nodeid);
 	sparse_mem_maps_populate_node(map_map, pnum_begin, pnum_end,
 					 map_count, nodeid);
+	sparse_buffer_fini();
 }
 #else
 static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init()
  2018-07-12 20:37 [PATCH v5 0/5] sparse_init rewrite Pavel Tatashin
                   ` (2 preceding siblings ...)
  2018-07-12 20:37 ` [PATCH v5 3/5] mm/sparse: move buffer init/fini to the common place Pavel Tatashin
@ 2018-07-12 20:37 ` Pavel Tatashin
  2018-07-13 12:03   ` Oscar Salvador
  2018-07-12 20:37 ` [PATCH v5 5/5] mm/sparse: delete old sprase_init and enable new one Pavel Tatashin
  2018-07-13  9:59 ` [PATCH v5 0/5] sparse_init rewrite Oscar Salvador
  5 siblings, 1 reply; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-12 20:37 UTC (permalink / raw)
  To: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, osalvador, pasha.tatashin, abdhalee, mpe

sparse_init() requires to temporary allocate two large buffers:
usemap_map and map_map. Baoquan He has identified that these buffers are so
large that Linux is not bootable on small memory machines, such as a kdump
boot. The buffers are especially large when CONFIG_X86_5LEVEL is set, as
they are scaled to the maximum physical memory size.

Baoquan provided a fix, which reduces these sizes of these buffers, but it
is much better to get rid of them entirely.

Add a new way to initialize sparse memory: sparse_init_nid(), which only
operates within one memory node, and thus allocates memory either in large
contiguous block or allocates section by section. This eliminates the need
for use of temporary buffers.

For simplified bisecting and review temporarly call sparse_init()
new_sparse_init(), the new interface is going to be enabled as well as old
code removed in the next patch.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/sparse.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index 01c616342909..4087b94afddf 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -200,6 +200,11 @@ static inline int next_present_section_nr(int section_nr)
 	      (section_nr <= __highest_present_section_nr));	\
 	     section_nr = next_present_section_nr(section_nr))
 
+static inline unsigned long first_present_section_nr(void)
+{
+	return next_present_section_nr(-1);
+}
+
 /*
  * Record how many memory sections are marked as present
  * during system bootup.
@@ -668,6 +673,86 @@ void __init sparse_init(void)
 	memblock_free_early(__pa(usemap_map), size);
 }
 
+/*
+ * Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end)
+ * And number of present sections in this node is map_count.
+ */
+static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
+				   unsigned long pnum_end,
+				   unsigned long map_count)
+{
+	unsigned long pnum, usemap_longs, *usemap;
+	struct page *map;
+
+	usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS);
+	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
+							  usemap_size() *
+							  map_count);
+	if (!usemap) {
+		pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
+		goto failed;
+	}
+	sparse_buffer_init(map_count * section_map_size(), nid);
+	for_each_present_section_nr(pnum_begin, pnum) {
+		if (pnum >= pnum_end)
+			break;
+
+		map = sparse_mem_map_populate(pnum, nid, NULL);
+		if (!map) {
+			pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.",
+			       __func__, nid);
+			pnum_begin = pnum;
+			goto failed;
+		}
+		check_usemap_section_nr(nid, usemap);
+		sparse_init_one_section(__nr_to_section(pnum), pnum, map, usemap);
+		usemap += usemap_longs;
+	}
+	sparse_buffer_fini();
+	return;
+failed:
+	/* We failed to allocate, mark all the following pnums as not present */
+	for_each_present_section_nr(pnum_begin, pnum) {
+		struct mem_section *ms;
+
+		if (pnum >= pnum_end)
+			break;
+		ms = __nr_to_section(pnum);
+		ms->section_mem_map = 0;
+	}
+}
+
+/*
+ * Allocate the accumulated non-linear sections, allocate a mem_map
+ * for each and record the physical to section mapping.
+ */
+void __init new_sparse_init(void)
+{
+	unsigned long pnum_begin = first_present_section_nr();
+	int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
+	unsigned long pnum_end, map_count = 1;
+
+	/* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
+	set_pageblock_order();
+
+	for_each_present_section_nr(pnum_begin + 1, pnum_end) {
+		int nid = sparse_early_nid(__nr_to_section(pnum_end));
+
+		if (nid == nid_begin) {
+			map_count++;
+			continue;
+		}
+		/* Init node with sections in range [pnum_begin, pnum_end) */
+		sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
+		nid_begin = nid;
+		pnum_begin = pnum_end;
+		map_count = 1;
+	}
+	/* cover the last node */
+	sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
+	vmemmap_populate_print_last();
+}
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 
 /* Mark all memory sections within the pfn range as online */
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 5/5] mm/sparse: delete old sprase_init and enable new one
  2018-07-12 20:37 [PATCH v5 0/5] sparse_init rewrite Pavel Tatashin
                   ` (3 preceding siblings ...)
  2018-07-12 20:37 ` [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init() Pavel Tatashin
@ 2018-07-12 20:37 ` Pavel Tatashin
  2018-07-13  9:09   ` Oscar Salvador
  2018-07-13  9:59 ` [PATCH v5 0/5] sparse_init rewrite Oscar Salvador
  5 siblings, 1 reply; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-12 20:37 UTC (permalink / raw)
  To: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, osalvador, pasha.tatashin, abdhalee, mpe

Rename new_sparse_init() to sparse_init() which enables it. Delete old
sparse_init() and all the code that became obsolete with.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 include/linux/mm.h  |   6 --
 mm/Kconfig          |   4 -
 mm/sparse-vmemmap.c |  21 -----
 mm/sparse.c         | 217 +-------------------------------------------
 4 files changed, 1 insertion(+), 247 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 99d8c50adef6..726e71475144 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2649,12 +2649,6 @@ extern int randomize_va_space;
 const char * arch_vma_name(struct vm_area_struct *vma);
 void print_vma_addr(char *prefix, unsigned long rip);
 
-void sparse_mem_maps_populate_node(struct page **map_map,
-				   unsigned long pnum_begin,
-				   unsigned long pnum_end,
-				   unsigned long map_count,
-				   int nodeid);
-
 void *sparse_buffer_alloc(unsigned long size);
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap);
diff --git a/mm/Kconfig b/mm/Kconfig
index 28fcf54946ea..b78e7cd4e9fe 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -115,10 +115,6 @@ config SPARSEMEM_EXTREME
 config SPARSEMEM_VMEMMAP_ENABLE
 	bool
 
-config SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	def_bool y
-	depends on SPARSEMEM && X86_64
-
 config SPARSEMEM_VMEMMAP
 	bool "Sparse Memory virtual memmap"
 	depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index cd15f3d252c3..8301293331a2 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -261,24 +261,3 @@ struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid,
 
 	return map;
 }
-
-void __init sparse_mem_maps_populate_node(struct page **map_map,
-					  unsigned long pnum_begin,
-					  unsigned long pnum_end,
-					  unsigned long map_count, int nodeid)
-{
-	unsigned long pnum;
-	int nr_consumed_maps = 0;
-
-	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-		if (!present_section_nr(pnum))
-			continue;
-
-		map_map[nr_consumed_maps] =
-				sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[nr_consumed_maps++])
-			continue;
-		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
-		       __func__);
-	}
-}
diff --git a/mm/sparse.c b/mm/sparse.c
index 4087b94afddf..3335a1c7e9be 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -205,12 +205,6 @@ static inline unsigned long first_present_section_nr(void)
 	return next_present_section_nr(-1);
 }
 
-/*
- * Record how many memory sections are marked as present
- * during system bootup.
- */
-static int __initdata nr_present_sections;
-
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -240,7 +234,6 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
 			ms->section_mem_map = sparse_encode_early_nid(nid) |
 							SECTION_IS_ONLINE;
 			section_mark_present(ms);
-			nr_present_sections++;
 		}
 	}
 }
@@ -377,34 +370,6 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-static void __init sparse_early_usemaps_alloc_node(void *data,
-				 unsigned long pnum_begin,
-				 unsigned long pnum_end,
-				 unsigned long usemap_count, int nodeid)
-{
-	void *usemap;
-	unsigned long pnum;
-	unsigned long **usemap_map = (unsigned long **)data;
-	int size = usemap_size();
-	int nr_consumed_maps = 0;
-
-	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
-							  size * usemap_count);
-	if (!usemap) {
-		pr_warn("%s: allocation failed\n", __func__);
-		return;
-	}
-
-	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-		if (!present_section_nr(pnum))
-			continue;
-		usemap_map[nr_consumed_maps] = usemap;
-		usemap += size;
-		check_usemap_section_nr(nodeid, usemap_map[nr_consumed_maps]);
-		nr_consumed_maps++;
-	}
-}
-
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static unsigned long __init section_map_size(void)
 
@@ -489,190 +454,10 @@ void * __meminit sparse_buffer_alloc(unsigned long size)
 	return ptr;
 }
 
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-static void __init sparse_early_mem_maps_alloc_node(void *data,
-				 unsigned long pnum_begin,
-				 unsigned long pnum_end,
-				 unsigned long map_count, int nodeid)
-{
-	struct page **map_map = (struct page **)data;
-
-	sparse_buffer_init(section_map_size() * map_count, nodeid);
-	sparse_mem_maps_populate_node(map_map, pnum_begin, pnum_end,
-					 map_count, nodeid);
-	sparse_buffer_fini();
-}
-#else
-static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
-{
-	struct page *map;
-	struct mem_section *ms = __nr_to_section(pnum);
-	int nid = sparse_early_nid(ms);
-
-	map = sparse_mem_map_populate(pnum, nid, NULL);
-	if (map)
-		return map;
-
-	pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
-	       __func__);
-	return NULL;
-}
-#endif
-
 void __weak __meminit vmemmap_populate_print_last(void)
 {
 }
 
-/**
- *  alloc_usemap_and_memmap - memory alloction for pageblock flags and vmemmap
- *  @map: usemap_map for pageblock flags or mmap_map for vmemmap
- *  @unit_size: size of map unit
- */
-static void __init alloc_usemap_and_memmap(void (*alloc_func)
-					(void *, unsigned long, unsigned long,
-					unsigned long, int), void *data,
-					int data_unit_size)
-{
-	unsigned long pnum;
-	unsigned long map_count;
-	int nodeid_begin = 0;
-	unsigned long pnum_begin = 0;
-
-	for_each_present_section_nr(0, pnum) {
-		struct mem_section *ms;
-
-		ms = __nr_to_section(pnum);
-		nodeid_begin = sparse_early_nid(ms);
-		pnum_begin = pnum;
-		break;
-	}
-	map_count = 1;
-	for_each_present_section_nr(pnum_begin + 1, pnum) {
-		struct mem_section *ms;
-		int nodeid;
-
-		ms = __nr_to_section(pnum);
-		nodeid = sparse_early_nid(ms);
-		if (nodeid == nodeid_begin) {
-			map_count++;
-			continue;
-		}
-		/* ok, we need to take cake of from pnum_begin to pnum - 1*/
-		alloc_func(data, pnum_begin, pnum,
-						map_count, nodeid_begin);
-		/* new start, update count etc*/
-		nodeid_begin = nodeid;
-		pnum_begin = pnum;
-		data += map_count * data_unit_size;
-		map_count = 1;
-	}
-	/* ok, last chunk */
-	alloc_func(data, pnum_begin, __highest_present_section_nr+1,
-						map_count, nodeid_begin);
-}
-
-/*
- * Allocate the accumulated non-linear sections, allocate a mem_map
- * for each and record the physical to section mapping.
- */
-void __init sparse_init(void)
-{
-	unsigned long pnum;
-	struct page *map;
-	unsigned long *usemap;
-	unsigned long **usemap_map;
-	int size;
-	int nr_consumed_maps = 0;
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	int size2;
-	struct page **map_map;
-#endif
-
-	/* see include/linux/mmzone.h 'struct mem_section' definition */
-	BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
-
-	/* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
-	set_pageblock_order();
-
-	/*
-	 * map is using big page (aka 2M in x86 64 bit)
-	 * usemap is less one page (aka 24 bytes)
-	 * so alloc 2M (with 2M align) and 24 bytes in turn will
-	 * make next 2M slip to one more 2M later.
-	 * then in big system, the memory will have a lot of holes...
-	 * here try to allocate 2M pages continuously.
-	 *
-	 * powerpc need to call sparse_init_one_section right after each
-	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
-	 */
-	size = sizeof(unsigned long *) * nr_present_sections;
-	usemap_map = memblock_virt_alloc(size, 0);
-	if (!usemap_map)
-		panic("can not allocate usemap_map\n");
-	alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
-				(void *)usemap_map,
-				sizeof(usemap_map[0]));
-
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	size2 = sizeof(struct page *) * nr_present_sections;
-	map_map = memblock_virt_alloc(size2, 0);
-	if (!map_map)
-		panic("can not allocate map_map\n");
-	alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
-				(void *)map_map,
-				sizeof(map_map[0]));
-#endif
-
-	/*
-	 * The number of present sections stored in nr_present_sections
-	 * are kept the same since mem sections are marked as present in
-	 * memory_present(). In this for loop, we need check which sections
-	 * failed to allocate memmap or usemap, then clear its
-	 * ->section_mem_map accordingly. During this process, we need
-	 * increase 'nr_consumed_maps' whether its allocation of memmap
-	 * or usemap failed or not, so that after we handle the i-th
-	 * memory section, can get memmap and usemap of (i+1)-th section
-	 * correctly.
-	 */
-	for_each_present_section_nr(0, pnum) {
-		struct mem_section *ms;
-
-		if (nr_consumed_maps >= nr_present_sections) {
-			pr_err("nr_consumed_maps goes beyond nr_present_sections\n");
-			break;
-		}
-		ms = __nr_to_section(pnum);
-		usemap = usemap_map[nr_consumed_maps];
-		if (!usemap) {
-			ms->section_mem_map = 0;
-			nr_consumed_maps++;
-			continue;
-		}
-
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-		map = map_map[nr_consumed_maps];
-#else
-		map = sparse_early_mem_map_alloc(pnum);
-#endif
-		if (!map) {
-			ms->section_mem_map = 0;
-			nr_consumed_maps++;
-			continue;
-		}
-
-		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
-								usemap);
-		nr_consumed_maps++;
-	}
-
-	vmemmap_populate_print_last();
-
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	memblock_free_early(__pa(map_map), size2);
-#endif
-	memblock_free_early(__pa(usemap_map), size);
-}
-
 /*
  * Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end)
  * And number of present sections in this node is map_count.
@@ -726,7 +511,7 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
  * Allocate the accumulated non-linear sections, allocate a mem_map
  * for each and record the physical to section mapping.
  */
-void __init new_sparse_init(void)
+void __init sparse_init(void)
 {
 	unsigned long pnum_begin = first_present_section_nr();
 	int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations
  2018-07-12 20:37 ` [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations Pavel Tatashin
@ 2018-07-12 22:45   ` Andrew Morton
  2018-07-13 13:17   ` Oscar Salvador
  1 sibling, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2018-07-12 22:45 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, linux-kernel, kirill.shutemov,
	mhocko, linux-mm, dan.j.williams, jack, jglisse, jrdr.linux, bhe,
	gregkh, vbabka, richard.weiyang, dave.hansen, rientjes, mingo,
	osalvador, abdhalee, mpe

On Thu, 12 Jul 2018 16:37:26 -0400 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:

> When struct pages are allocated for sparse-vmemmap VA layout, we first
> try to allocate one large buffer, and than if that fails allocate struct
> pages for each section as we go.
> 
> The code that allocates buffer is uses global variables and is spread
> across several call sites.
> 
> Cleanup the code by introducing three functions to handle the global
> buffer:
> 
> sparse_buffer_init()	initialize the buffer
> sparse_buffer_fini()	free the remaining part of the buffer
> sparse_buffer_alloc()	alloc from the buffer, and if buffer is empty
> return NULL
> 
> Define these functions in sparse.c instead of sparse-vmemmap.c because
> later we will use them for non-vmemmap sparse allocations as well.
> 
> ...
>
> +void * __meminit sparse_buffer_alloc(unsigned long size)
> +{
> +	void *ptr = NULL;
> +
> +	if (sparsemap_buf) {
> +		ptr = (void *)ALIGN((unsigned long)sparsemap_buf, size);
> +		if (ptr + size > sparsemap_buf_end)
> +			ptr = NULL;
> +		else
> +			sparsemap_buf = ptr + size;
> +	}
> +	return ptr;
> +}

tweak...

diff -puN mm/sparse.c~mm-sparse-abstract-sparse-buffer-allocations-fix mm/sparse.c
--- a/mm/sparse.c~mm-sparse-abstract-sparse-buffer-allocations-fix
+++ a/mm/sparse.c
@@ -491,7 +491,7 @@ void * __meminit sparse_buffer_alloc(uns
 	void *ptr = NULL;
 
 	if (sparsemap_buf) {
-		ptr = (void *)ALIGN((unsigned long)sparsemap_buf, size);
+		ptr = PTR_ALIGN(sparsemap_buf, size);
 		if (ptr + size > sparsemap_buf_end)
 			ptr = NULL;
 		else


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/5] mm/sparse: delete old sprase_init and enable new one
  2018-07-12 20:37 ` [PATCH v5 5/5] mm/sparse: delete old sprase_init and enable new one Pavel Tatashin
@ 2018-07-13  9:09   ` Oscar Salvador
  2018-07-13 11:15     ` Pavel Tatashin
  0 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2018-07-13  9:09 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, abdhalee, mpe

  
> -#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> -static void __init sparse_early_mem_maps_alloc_node(void *data,
> -				 unsigned long pnum_begin,
> -				 unsigned long pnum_end,
> -				 unsigned long map_count, int nodeid)
> -{
> -	struct page **map_map = (struct page **)data;
> -
> -	sparse_buffer_init(section_map_size() * map_count, nodeid);
> -	sparse_mem_maps_populate_node(map_map, pnum_begin, pnum_end,
> -					 map_count, nodeid);
> -	sparse_buffer_fini();
> -}

From now on, sparse_mem_maps_populate_node() is not being used anymore, so I guess we can just
remove it from sparse.c, right? (as it is done in sparse-vmemmap.c).
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 0/5] sparse_init rewrite
  2018-07-12 20:37 [PATCH v5 0/5] sparse_init rewrite Pavel Tatashin
                   ` (4 preceding siblings ...)
  2018-07-12 20:37 ` [PATCH v5 5/5] mm/sparse: delete old sprase_init and enable new one Pavel Tatashin
@ 2018-07-13  9:59 ` Oscar Salvador
  2018-07-13 11:10   ` Pavel Tatashin
  5 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2018-07-13  9:59 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, abdhalee, mpe

On Thu, Jul 12, 2018 at 04:37:25PM -0400, Pavel Tatashin wrote:
> Changelog:
> v5 - v4
> 	- Fixed the issue that was reported on ppc64 when
> 	  CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER is removed
> 	- Consolidated the new buffer allocation between vmemmap
> 	  and non-vmemmap variants of sparse layout.
> 	- Removed all review-by comments, because I had to do
> 	  significant amount of changes compared to previous version
> 	  and need another round of review.
> 	- I also would appreciate if those who reported problems with
> 	  PPC64 could test this change.

About PPC64, your patchset fixes the issue as the population gets followed by a
sparse_init_one_section().

It can be seen here:

Before:

kernel: vmemmap_populate f000000000000000..f000000000004000, node 0
kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
kernel: vmemmap_populate f000000000000000..f000000000008000, node 0
kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
kernel: vmemmap_populate f000000000000000..f00000000000c000, node 0
kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)


After:

kernel: vmemmap_populate f000000000000000..f000000000004000, node 0
kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
kernel: vmemmap_populate f000000000000000..f000000000008000, node 0
kernel: vmemmap_populate f000000000000000..f00000000000c000, node 0
kernel: vmemmap_populate f000000000000000..f000000000010000, node 0
kernel: vmemmap_populate f000000000010000..f000000000014000, node 0
kernel:       * f000000000010000..f000000000020000 allocated at (____ptrval____)


As can be seen, before the patchset, we keep calling vmemmap_create_mapping() even if we
populated that section already, because of vmemmap_populated() checking for SECTION_HAS_MEM_MAP.

After the patchset, since each population is being followed by a call to sparse_init_one_section(),
when vmemmap_populated() gets called, we have SECTION_HAS_MEM_MAP already in case the section
was populated.
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 0/5] sparse_init rewrite
  2018-07-13  9:59 ` [PATCH v5 0/5] sparse_init rewrite Oscar Salvador
@ 2018-07-13 11:10   ` Pavel Tatashin
  2018-07-16  6:40     ` Michael Ellerman
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-13 11:10 UTC (permalink / raw)
  To: osalvador
  Cc: Steven Sistare, Daniel Jordan, LKML, Andrew Morton,
	kirill.shutemov, Michal Hocko, Linux Memory Management List,
	dan.j.williams, jack, jglisse, Souptick Joarder, bhe, gregkh,
	Vlastimil Babka, Wei Yang, dave.hansen, rientjes, mingo,
	abdhalee, mpe

> About PPC64, your patchset fixes the issue as the population gets followed by a
> sparse_init_one_section().
>
> It can be seen here:
>
> Before:
>
> kernel: vmemmap_populate f000000000000000..f000000000004000, node 0
> kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
> kernel: vmemmap_populate f000000000000000..f000000000008000, node 0
> kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
> kernel: vmemmap_populate f000000000000000..f00000000000c000, node 0
> kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
>
>
> After:
>
> kernel: vmemmap_populate f000000000000000..f000000000004000, node 0
> kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
> kernel: vmemmap_populate f000000000000000..f000000000008000, node 0
> kernel: vmemmap_populate f000000000000000..f00000000000c000, node 0
> kernel: vmemmap_populate f000000000000000..f000000000010000, node 0
> kernel: vmemmap_populate f000000000010000..f000000000014000, node 0
> kernel:       * f000000000010000..f000000000020000 allocated at (____ptrval____)
>
>
> As can be seen, before the patchset, we keep calling vmemmap_create_mapping() even if we
> populated that section already, because of vmemmap_populated() checking for SECTION_HAS_MEM_MAP.
>
> After the patchset, since each population is being followed by a call to sparse_init_one_section(),
> when vmemmap_populated() gets called, we have SECTION_HAS_MEM_MAP already in case the section
> was populated.

Hi Oscar,

Right, I also like that this solution removes one extra loop, thus
reduces the code size. We were populating pages in one place, and then
loop again to set sections, now we do both in one place, but still
allow preallocation of memory to reduces fragmentation on all
platforms. However, I still wanted to see if someone could test on
real hardware.

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/5] mm/sparse: delete old sprase_init and enable new one
  2018-07-13  9:09   ` Oscar Salvador
@ 2018-07-13 11:15     ` Pavel Tatashin
  0 siblings, 0 replies; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-13 11:15 UTC (permalink / raw)
  To: osalvador
  Cc: Steven Sistare, Daniel Jordan, LKML, Andrew Morton,
	kirill.shutemov, Michal Hocko, Linux Memory Management List,
	dan.j.williams, jack, jglisse, Souptick Joarder, bhe, gregkh,
	Vlastimil Babka, Wei Yang, dave.hansen, rientjes, mingo,
	abdhalee, mpe

On Fri, Jul 13, 2018 at 5:09 AM Oscar Salvador
<osalvador@techadventures.net> wrote:
>
>
> > -#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> > -static void __init sparse_early_mem_maps_alloc_node(void *data,
> > -                              unsigned long pnum_begin,
> > -                              unsigned long pnum_end,
> > -                              unsigned long map_count, int nodeid)
> > -{
> > -     struct page **map_map = (struct page **)data;
> > -
> > -     sparse_buffer_init(section_map_size() * map_count, nodeid);
> > -     sparse_mem_maps_populate_node(map_map, pnum_begin, pnum_end,
> > -                                      map_count, nodeid);
> > -     sparse_buffer_fini();
> > -}
>
> From now on, sparse_mem_maps_populate_node() is not being used anymore, so I guess we can just
> remove it from sparse.c, right? (as it is done in sparse-vmemmap.c).

Missed this one, even more code can be deleted! :) I will include this
in updated patches, after review comments.

Thank you,
Pavel

> --
> Oscar Salvador
> SUSE L3
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init()
  2018-07-12 20:37 ` [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init() Pavel Tatashin
@ 2018-07-13 12:03   ` Oscar Salvador
  2018-07-13 12:37     ` Pavel Tatashin
  0 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2018-07-13 12:03 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, abdhalee, mpe

On Thu, Jul 12, 2018 at 04:37:29PM -0400, Pavel Tatashin wrote:
> sparse_init() requires to temporary allocate two large buffers:
> usemap_map and map_map. Baoquan He has identified that these buffers are so
> large that Linux is not bootable on small memory machines, such as a kdump
> boot. The buffers are especially large when CONFIG_X86_5LEVEL is set, as
> they are scaled to the maximum physical memory size.
> 
> Baoquan provided a fix, which reduces these sizes of these buffers, but it
> is much better to get rid of them entirely.
> 
> Add a new way to initialize sparse memory: sparse_init_nid(), which only
> operates within one memory node, and thus allocates memory either in large
> contiguous block or allocates section by section. This eliminates the need
> for use of temporary buffers.
> 
> For simplified bisecting and review temporarly call sparse_init()
> new_sparse_init(), the new interface is going to be enabled as well as old
> code removed in the next patch.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>

Looks good to me, and it will make the code much shorter/easier.

Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init()
  2018-07-13 12:03   ` Oscar Salvador
@ 2018-07-13 12:37     ` Pavel Tatashin
  0 siblings, 0 replies; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-13 12:37 UTC (permalink / raw)
  To: osalvador
  Cc: Steven Sistare, Daniel Jordan, LKML, Andrew Morton,
	kirill.shutemov, Michal Hocko, Linux Memory Management List,
	dan.j.williams, jack, jglisse, Souptick Joarder, bhe, gregkh,
	Vlastimil Babka, Wei Yang, dave.hansen, rientjes, mingo,
	abdhalee, mpe

> > Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>
> Looks good to me, and it will make the code much shorter/easier.
>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>

Thank you!

Pave

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations
  2018-07-12 20:37 ` [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations Pavel Tatashin
  2018-07-12 22:45   ` Andrew Morton
@ 2018-07-13 13:17   ` Oscar Salvador
  2018-07-13 13:24     ` Pavel Tatashin
  1 sibling, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2018-07-13 13:17 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, abdhalee, mpe

On Thu, Jul 12, 2018 at 04:37:26PM -0400, Pavel Tatashin wrote:
> +static void *sparsemap_buf __meminitdata;
> +static void *sparsemap_buf_end __meminitdata;
> +
> +void __init sparse_buffer_init(unsigned long size, int nid)
> +{
> +	BUG_ON(sparsemap_buf);

Why do we need a BUG_ON() here?
Looking at the code I cannot really see how we can end up with sparsemap_buf being NULL.
Is it just for over-protection?

> +	sparsemap_buf =
> +		memblock_virt_alloc_try_nid_raw(size, PAGE_SIZE,
> +						__pa(MAX_DMA_ADDRESS),
> +						BOOTMEM_ALLOC_ACCESSIBLE, nid);

In your previous version, you didn't pass a required alignment when setting up sparsemap_buf.
size is already PMD_SIZE aligned, do we need to align it also to PAGE_SIZE?

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations
  2018-07-13 13:17   ` Oscar Salvador
@ 2018-07-13 13:24     ` Pavel Tatashin
  2018-07-13 20:02       ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Tatashin @ 2018-07-13 13:24 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: steven.sistare, daniel.m.jordan, linux-kernel, akpm,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, abdhalee, mpe



On 07/13/2018 09:17 AM, Oscar Salvador wrote:
> On Thu, Jul 12, 2018 at 04:37:26PM -0400, Pavel Tatashin wrote:
>> +static void *sparsemap_buf __meminitdata;
>> +static void *sparsemap_buf_end __meminitdata;
>> +
>> +void __init sparse_buffer_init(unsigned long size, int nid)
>> +{
>> +	BUG_ON(sparsemap_buf);
> 
> Why do we need a BUG_ON() here?
> Looking at the code I cannot really see how we can end up with sparsemap_buf being NULL.
> Is it just for over-protection?

This checks that we do not accidentally leak memory by calling sparse_buffer_init() consequently without sparse_buffer_fini() in-between.

> 
>> +	sparsemap_buf =
>> +		memblock_virt_alloc_try_nid_raw(size, PAGE_SIZE,
>> +						__pa(MAX_DMA_ADDRESS),
>> +						BOOTMEM_ALLOC_ACCESSIBLE, nid);
> 
> In your previous version, you didn't pass a required alignment when setting up sparsemap_buf.
> size is already PMD_SIZE aligned, do we need to align it also to PAGE_SIZE?
> 

I decided to add PAGE_SIZE alignment, because the implicit memblock alignment is SMP_CACHE_BYTES which is smaller than page size. While, in practice we will most likely get a page size aligned allocation, it is still possible that some ranges in memblock are not page size aligned if that the way they were passed from BIOS.

Thank you,
Pavel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations
  2018-07-13 13:24     ` Pavel Tatashin
@ 2018-07-13 20:02       ` Andrew Morton
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2018-07-13 20:02 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Oscar Salvador, steven.sistare, daniel.m.jordan, linux-kernel,
	kirill.shutemov, mhocko, linux-mm, dan.j.williams, jack, jglisse,
	jrdr.linux, bhe, gregkh, vbabka, richard.weiyang, dave.hansen,
	rientjes, mingo, abdhalee, mpe

On Fri, 13 Jul 2018 09:24:44 -0400 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:

> On 07/13/2018 09:17 AM, Oscar Salvador wrote:
> > On Thu, Jul 12, 2018 at 04:37:26PM -0400, Pavel Tatashin wrote:
> >> +static void *sparsemap_buf __meminitdata;
> >> +static void *sparsemap_buf_end __meminitdata;
> >> +
> >> +void __init sparse_buffer_init(unsigned long size, int nid)
> >> +{
> >> +	BUG_ON(sparsemap_buf);
> > 
> > Why do we need a BUG_ON() here?
> > Looking at the code I cannot really see how we can end up with sparsemap_buf being NULL.
> > Is it just for over-protection?
> 
> This checks that we do not accidentally leak memory by calling sparse_buffer_init() consequently without sparse_buffer_fini() in-between.

A memory leak isn't serious enough to justify crashing the kernel. 
Therefore

--- a/mm/sparse.c~mm-sparse-abstract-sparse-buffer-allocations-fix-fix
+++ a/mm/sparse.c
@@ -469,7 +469,7 @@ static void *sparsemap_buf_end __meminit
 
 void __init sparse_buffer_init(unsigned long size, int nid)
 {
-	BUG_ON(sparsemap_buf);
+	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
 	sparsemap_buf =
 		memblock_virt_alloc_try_nid_raw(size, PAGE_SIZE,
 						__pa(MAX_DMA_ADDRESS),
_


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 0/5] sparse_init rewrite
  2018-07-13 11:10   ` Pavel Tatashin
@ 2018-07-16  6:40     ` Michael Ellerman
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2018-07-16  6:40 UTC (permalink / raw)
  To: Pavel Tatashin, osalvador
  Cc: Steven Sistare, Daniel Jordan, LKML, Andrew Morton,
	kirill.shutemov, Michal Hocko, Linux Memory Management List,
	dan.j.williams, jack, jglisse, Souptick Joarder, bhe, gregkh,
	Vlastimil Babka, Wei Yang, dave.hansen, rientjes, mingo,
	abdhalee

Pavel Tatashin <pasha.tatashin@oracle.com> writes:

>> About PPC64, your patchset fixes the issue as the population gets followed by a
>> sparse_init_one_section().
>>
>> It can be seen here:
>>
>> Before:
>>
>> kernel: vmemmap_populate f000000000000000..f000000000004000, node 0
>> kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
>> kernel: vmemmap_populate f000000000000000..f000000000008000, node 0
>> kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
>> kernel: vmemmap_populate f000000000000000..f00000000000c000, node 0
>> kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
>>
>>
>> After:
>>
>> kernel: vmemmap_populate f000000000000000..f000000000004000, node 0
>> kernel:       * f000000000000000..f000000000010000 allocated at (____ptrval____)
>> kernel: vmemmap_populate f000000000000000..f000000000008000, node 0
>> kernel: vmemmap_populate f000000000000000..f00000000000c000, node 0
>> kernel: vmemmap_populate f000000000000000..f000000000010000, node 0
>> kernel: vmemmap_populate f000000000010000..f000000000014000, node 0
>> kernel:       * f000000000010000..f000000000020000 allocated at (____ptrval____)
>>
>>
>> As can be seen, before the patchset, we keep calling vmemmap_create_mapping() even if we
>> populated that section already, because of vmemmap_populated() checking for SECTION_HAS_MEM_MAP.
>>
>> After the patchset, since each population is being followed by a call to sparse_init_one_section(),
>> when vmemmap_populated() gets called, we have SECTION_HAS_MEM_MAP already in case the section
>> was populated.
>
> Hi Oscar,
>
> Right, I also like that this solution removes one extra loop, thus
> reduces the code size. We were populating pages in one place, and then
> loop again to set sections, now we do both in one place, but still
> allow preallocation of memory to reduces fragmentation on all
> platforms. However, I still wanted to see if someone could test on
> real hardware.

I booted it on a small VM and a 160 CPU 4 node machine, both booted
fine.

If you want:
  Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)


Thanks for fixing it up for us.

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-07-16  6:42 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-12 20:37 [PATCH v5 0/5] sparse_init rewrite Pavel Tatashin
2018-07-12 20:37 ` [PATCH v5 1/5] mm/sparse: abstract sparse buffer allocations Pavel Tatashin
2018-07-12 22:45   ` Andrew Morton
2018-07-13 13:17   ` Oscar Salvador
2018-07-13 13:24     ` Pavel Tatashin
2018-07-13 20:02       ` Andrew Morton
2018-07-12 20:37 ` [PATCH v5 2/5] mm/sparse: use the new sparse buffer functions in non-vmemmap Pavel Tatashin
2018-07-12 20:37 ` [PATCH v5 3/5] mm/sparse: move buffer init/fini to the common place Pavel Tatashin
2018-07-12 20:37 ` [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init() Pavel Tatashin
2018-07-13 12:03   ` Oscar Salvador
2018-07-13 12:37     ` Pavel Tatashin
2018-07-12 20:37 ` [PATCH v5 5/5] mm/sparse: delete old sprase_init and enable new one Pavel Tatashin
2018-07-13  9:09   ` Oscar Salvador
2018-07-13 11:15     ` Pavel Tatashin
2018-07-13  9:59 ` [PATCH v5 0/5] sparse_init rewrite Oscar Salvador
2018-07-13 11:10   ` Pavel Tatashin
2018-07-16  6:40     ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).