All of lore.kernel.org
* [PATCH v2 0/3] mm/sparse: Optimize memmap allocation during sparse_init()
@ 2018-02-22  9:11 ` Baoquan He
  0 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-02-22  9:11 UTC (permalink / raw)
  To: linux-kernel, dave.hansen
  Cc: linux-mm, akpm, kirill.shutemov, mhocko, tglx, Baoquan He

This is the v2 post. V1 can be found here:
https://www.spinics.net/lists/linux-mm/msg144486.html

In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
are allocated with NR_MEM_SECTIONS entries each. They are used to store
each present memory section's usemap and mem map. In 5-level paging
mode this costs 512M of memory, even though both arrays are released
at the end of sparse_init(). Systems with little memory, like a kdump
kernel which usually has only about 256M, will fail to boot because of
the allocation failure if CONFIG_X86_5LEVEL=y.

This patchset optimizes the memmap allocation code to allocate
usemap_map and map_map with only nr_present_sections entries. This
lets the kdump kernel boot up with a normal crashkernel='' setting
when CONFIG_X86_5LEVEL=y.

Baoquan He (3):
  mm/sparse: Add a static variable nr_present_sections
  mm/sparsemem: Defer the ms->section_mem_map clearing
  mm/sparse: Optimize memmap allocation during sparse_init()

 mm/sparse-vmemmap.c |  9 +++++----
 mm/sparse.c         | 54 +++++++++++++++++++++++++++++++++++------------------
 2 files changed, 41 insertions(+), 22 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 22+ messages in thread


* [PATCH v2 1/3] mm/sparse: Add a static variable nr_present_sections
  2018-02-22  9:11 ` Baoquan He
@ 2018-02-22  9:11   ` Baoquan He
  -1 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-02-22  9:11 UTC (permalink / raw)
  To: linux-kernel, dave.hansen
  Cc: linux-mm, akpm, kirill.shutemov, mhocko, tglx, Baoquan He

It records how many memory sections are marked as present during
system boot, and will be used in a later patch.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index 7af5e7a92528..433d3c9b4b56 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -202,6 +202,7 @@ static inline int next_present_section_nr(int section_nr)
 	      (section_nr <= __highest_present_section_nr));	\
 	     section_nr = next_present_section_nr(section_nr))
 
+static int nr_present_sections;
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -231,6 +232,7 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
 			ms->section_mem_map = sparse_encode_early_nid(nid) |
 							SECTION_IS_ONLINE;
 			section_mark_present(ms);
+			nr_present_sections++;
 		}
 	}
 }
-- 
2.13.6



* [PATCH v2 2/3] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-02-22  9:11 ` Baoquan He
@ 2018-02-22  9:11   ` Baoquan He
  -1 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-02-22  9:11 UTC (permalink / raw)
  To: linux-kernel, dave.hansen
  Cc: linux-mm, akpm, kirill.shutemov, mhocko, tglx, Baoquan He

In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, the
system allocates one contiguous memory chunk for the mem maps on each
node and populates the relevant page tables to map the memory sections
one by one. If populating a certain mem section fails, a warning is
printed and that section's ->section_mem_map is cleared to cancel its
present marking. As a result, the number of mem sections marked as
present can shrink during sparse_init() execution.

This patch defers the ms->section_mem_map clearing on page table
population failure until the last for_each_present_section_nr() loop.
This is in preparation for the later mem map allocation optimization.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse-vmemmap.c |  1 -
 mm/sparse.c         | 12 ++++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index bd0276d5f66b..640e68f8324b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -303,7 +303,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
-		ms->section_mem_map = 0;
 	}
 
 	if (vmemmap_buf_start) {
diff --git a/mm/sparse.c b/mm/sparse.c
index 433d3c9b4b56..e9311b44e28a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -485,7 +485,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
-		ms->section_mem_map = 0;
 	}
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
@@ -513,7 +512,6 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
 
 	pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 	       __func__);
-	ms->section_mem_map = 0;
 	return NULL;
 }
 #endif
@@ -617,17 +615,23 @@ void __init sparse_init(void)
 #endif
 
 	for_each_present_section_nr(0, pnum) {
+		struct mem_section *ms;
+		ms = __nr_to_section(pnum);
 		usemap = usemap_map[pnum];
-		if (!usemap)
+		if (!usemap) {
+			ms->section_mem_map = 0;
 			continue;
+		}
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 		map = map_map[pnum];
 #else
 		map = sparse_early_mem_map_alloc(pnum);
 #endif
-		if (!map)
+		if (!map) {
+			ms->section_mem_map = 0;
 			continue;
+		}
 
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 								usemap);
-- 
2.13.6



* [PATCH v2 3/3] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-02-22  9:11 ` Baoquan He
@ 2018-02-22  9:11   ` Baoquan He
  -1 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-02-22  9:11 UTC (permalink / raw)
  To: linux-kernel, dave.hansen
  Cc: linux-mm, akpm, kirill.shutemov, mhocko, tglx, Baoquan He

In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
are allocated with NR_MEM_SECTIONS entries each. They are used to store
each present memory section's usemap and mem map. With the help of
these two arrays, one contiguous memory chunk is allocated for the
usemaps and memmaps of the memory sections on each node. This avoids
excessive memory fragmentation. In the diagram below, '1' indicates a
present memory section and '0' an absent one. The number 'n' can be
much smaller than NR_MEM_SECTIONS on most systems.

|1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
-------------------------------------------------------
 0 1 2 3         4 5         i   i+1     n-1   n

If populating the page tables to map one section's memmap fails, that
section's ->section_mem_map will finally be cleared to indicate that it
is not present. Both arrays are released at the end of sparse_init().

In 4-level paging mode each array costs 4M, which is negligible. But
in 5-level paging they cost 256M each, 512M altogether. A kdump kernel
usually reserves very little memory, e.g. 256M. So even though the
arrays are only temporarily allocated, that cost is not acceptable.

In fact, there's no need to allocate them with NR_MEM_SECTIONS entries.
Since the ->section_mem_map clearing has been deferred to the end, the
number of present memory sections stays constant during sparse_init()
until we finally clear out a section's ->section_mem_map because its
usemap or memmap was not correctly handled. Thus, whenever a
for_each_present_section_nr() loop is taken in between, the i-th
present memory section is always the same one.

This patch allocates usemap_map and map_map with only
'nr_present_sections' entries. For the i-th present memory section,
install its usemap and memmap into usemap_map[i] and map_map[i] during
allocation. Then, in the last for_each_present_section_nr() loop which
clears the failed memory sections' ->section_mem_map, fetch the usemap
and memmap from the usemap_map[] and map_map[] arrays and set them into
mem_section[] accordingly.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse-vmemmap.c |  8 +++++---
 mm/sparse.c         | 40 ++++++++++++++++++++++++++--------------
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 640e68f8324b..f83723a49e47 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
 	void *vmemmap_buf_start;
+	int i = 0;
 
 	size = ALIGN(size, PMD_SIZE);
 	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
@@ -291,14 +292,15 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		vmemmap_buf_end = vmemmap_buf_start + size * map_count;
 	}
 
-	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
 		struct mem_section *ms;
 
 		if (!present_section_nr(pnum))
 			continue;
 
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		i++;
+		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[i-1])
 			continue;
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
diff --git a/mm/sparse.c b/mm/sparse.c
index e9311b44e28a..aafb6d838872 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -405,6 +405,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 	unsigned long pnum;
 	unsigned long **usemap_map = (unsigned long **)data;
 	int size = usemap_size();
+	int i = 0;
 
 	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
 							  size * usemap_count);
@@ -413,12 +414,13 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 		return;
 	}
 
-	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+	for (pnum = pnum_begin; pnum < pnum_end && i < usemap_count; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
-		usemap_map[pnum] = usemap;
+		usemap_map[i] = usemap;
 		usemap += size;
-		check_usemap_section_nr(nodeid, usemap_map[pnum]);
+		check_usemap_section_nr(nodeid, usemap_map[i]);
+		i++;
 	}
 }
 
@@ -447,14 +449,17 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	void *map;
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
+	int i;
 
 	map = alloc_remap(nodeid, size * map_count);
 	if (map) {
-		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+		i = 0;
+		for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
 			if (!present_section_nr(pnum))
 				continue;
-			map_map[pnum] = map;
+			map_map[i] = map;
 			map += size;
+			i++;
 		}
 		return;
 	}
@@ -464,23 +469,27 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
 					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
 	if (map) {
-		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+		i = 0;
+		for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
 			if (!present_section_nr(pnum))
 				continue;
-			map_map[pnum] = map;
+			map_map[i] = map;
 			map += size;
+			i++;
 		}
 		return;
 	}
 
 	/* fallback */
-	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+	i = 0;
+	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
 		struct mem_section *ms;
 
 		if (!present_section_nr(pnum))
 			continue;
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		i++;
+		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[i-1])
 			continue;
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
@@ -558,6 +567,7 @@ static void __init alloc_usemap_and_memmap(void (*alloc_func)
 		/* new start, update count etc*/
 		nodeid_begin = nodeid;
 		pnum_begin = pnum;
+		data += map_count;
 		map_count = 1;
 	}
 	/* ok, last chunk */
@@ -576,6 +586,7 @@ void __init sparse_init(void)
 	unsigned long *usemap;
 	unsigned long **usemap_map;
 	int size;
+	int i = 0;
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	int size2;
 	struct page **map_map;
@@ -598,7 +609,7 @@ void __init sparse_init(void)
 	 * powerpc need to call sparse_init_one_section right after each
 	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
 	 */
-	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
+	size = sizeof(unsigned long *) * nr_present_sections;
 	usemap_map = memblock_virt_alloc(size, 0);
 	if (!usemap_map)
 		panic("can not allocate usemap_map\n");
@@ -606,7 +617,7 @@ void __init sparse_init(void)
 							(void *)usemap_map);
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
+	size2 = sizeof(struct page *) * nr_present_sections;
 	map_map = memblock_virt_alloc(size2, 0);
 	if (!map_map)
 		panic("can not allocate map_map\n");
@@ -617,14 +628,15 @@ void __init sparse_init(void)
 	for_each_present_section_nr(0, pnum) {
 		struct mem_section *ms;
 		ms = __nr_to_section(pnum);
-		usemap = usemap_map[pnum];
+		i++;
+		usemap = usemap_map[i-1];
 		if (!usemap) {
 			ms->section_mem_map = 0;
 			continue;
 		}
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-		map = map_map[pnum];
+		map = map_map[i-1];
 #else
 		map = sparse_early_mem_map_alloc(pnum);
 #endif
-- 
2.13.6



* Re: [PATCH v2 0/3] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-02-22  9:11 ` Baoquan He
@ 2018-02-22  9:15   ` Baoquan He
  -1 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-02-22  9:15 UTC (permalink / raw)
  To: linux-kernel, dave.hansen; +Cc: linux-mm, akpm, kirill.shutemov, mhocko, tglx

On 02/22/18 at 05:11pm, Baoquan He wrote:
> This is v2 post. V1 can be found here:
> https://www.spinics.net/lists/linux-mm/msg144486.html
> 
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> are allocated with the size of NR_MEM_SECTIONS. They are used to store
> each memory section's usemap and mem map if marked as present. In
> 5-level paging mode, this will cost 512M memory though they will be
> released at the end of sparse_init(). System with few memory, like
> kdump kernel which usually only has about 256M, will fail to boot
> because of allocation failure if CONFIG_X86_5LEVEL=y.
> 
> In this patchset, optimize the memmap allocation code to only use
> usemap_map and map_map with the size of nr_present_sections. This
> makes kdump kernel boot up with normal crashkernel='' setting when
> CONFIG_X86_5LEVEL=y.

Sorry, I forgot to add the change log.

v1-v2:
  Split out the nr_present_sections adding as a single patch for easier
  reviewing.

  Rewrite patch log according to Dave's suggestion.

  Fix code bug in patch 0002 reported by test robot.

> 
> Baoquan He (3):
>   mm/sparse: Add a static variable nr_present_sections
>   mm/sparsemem: Defer the ms->section_mem_map clearing
>   mm/sparse: Optimize memmap allocation during sparse_init()
> 
>  mm/sparse-vmemmap.c |  9 +++++----
>  mm/sparse.c         | 54 +++++++++++++++++++++++++++++++++++------------------
>  2 files changed, 41 insertions(+), 22 deletions(-)
> 
> -- 
> 2.13.6
> 



* Re: [PATCH v2 3/3] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-02-22  9:11   ` Baoquan He
@ 2018-02-22 10:07     ` Pankaj Gupta
  -1 siblings, 0 replies; 22+ messages in thread
From: Pankaj Gupta @ 2018-02-22 10:07 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, dave hansen, linux-mm, akpm, kirill shutemov, mhocko, tglx


Hi Baoquan,

> 
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> are allocated with the size of NR_MEM_SECTIONS. They are used to store
> each memory section's usemap and mem map if marked as present. With
> the help of these two arrays, a continuous memory chunk is allocated
> for the usemaps and memmaps of the memory sections on one node. This
> avoids excessive memory fragmentation. In the diagram below, '1'
> indicates a present memory section and '0' an absent one. The number
> 'n' can be much smaller than NR_MEM_SECTIONS on most systems.
> 
> |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
> -------------------------------------------------------
>  0 1 2 3         4 5         i   i+1     n-1   n
> 
> If we fail to populate the page tables to map one section's memmap, its
> ->section_mem_map will be cleared finally to indicate that it's not present.
> After use, these two arrays will be released at the end of sparse_init().
> 
> In 4-level paging mode each array costs 4M, which is negligible, while
> in 5-level paging they cost 256M each, 512M altogether. A kdump kernel
> usually reserves very little memory, e.g. 256M, so even though they are
> only temporarily allocated, the cost is still not acceptable.
> 
> In fact, there's no need to allocate them with the size of NR_MEM_SECTIONS.
> Since the ->section_mem_map clearing has been deferred to the last, the
> number of present memory sections is kept the same during sparse_init()
> until we finally clear out the memory section's ->section_mem_map if its
> usemap or memmap is not correctly handled. Thus in the middle whenever
> for_each_present_section_nr() loop is taken, the i-th present memory
> section is always the same one.
> 
> Here only allocate usemap_map and map_map with the size of
> 'nr_present_sections'. For the i-th present memory section, install its
> usemap and memmap to usemap_map[i] and map_map[i] during allocation. Then
> in the last for_each_present_section_nr() loop which clears the failed
> memory section's ->section_mem_map, fetch usemap and memmap from
> usemap_map[] and map_map[] array and set them into mem_section[]
> accordingly.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/sparse-vmemmap.c |  8 +++++---
>  mm/sparse.c         | 40 ++++++++++++++++++++++++++--------------
>  2 files changed, 31 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 640e68f8324b..f83723a49e47 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  	unsigned long pnum;
>  	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
>  	void *vmemmap_buf_start;
> +	int i = 0;
>  
>  	size = ALIGN(size, PMD_SIZE);
>  	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
> @@ -291,14 +292,15 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  		vmemmap_buf_end = vmemmap_buf_start + size * map_count;
>  	}
>  
> -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> +	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
>  		struct mem_section *ms;
>  
>  		if (!present_section_nr(pnum))
>  			continue;
>  
> -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> -		if (map_map[pnum])
> +		i++;
> +		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
> +		if (map_map[i-1])
>  			continue;
>  		ms = __nr_to_section(pnum);
>  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
> diff --git a/mm/sparse.c b/mm/sparse.c
> index e9311b44e28a..aafb6d838872 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -405,6 +405,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
>  	unsigned long pnum;
>  	unsigned long **usemap_map = (unsigned long **)data;
>  	int size = usemap_size();
> +	int i = 0;
>  
>  	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
>  							  size * usemap_count);
> @@ -413,12 +414,13 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
>  		return;
>  	}
>  
> -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> +	for (pnum = pnum_begin; pnum < pnum_end && i < usemap_count; pnum++) {
>  		if (!present_section_nr(pnum))
>  			continue;
> -		usemap_map[pnum] = usemap;
> +		usemap_map[i] = usemap;
>  		usemap += size;
> -		check_usemap_section_nr(nodeid, usemap_map[pnum]);
> +		check_usemap_section_nr(nodeid, usemap_map[i]);
> +		i++;
>  	}
>  }
>  
> @@ -447,14 +449,17 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  	void *map;
>  	unsigned long pnum;
>  	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> +	int i;
>  
>  	map = alloc_remap(nodeid, size * map_count);
>  	if (map) {
> -		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> +		i = 0;
> +		for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
>  			if (!present_section_nr(pnum))
>  				continue;
> -			map_map[pnum] = map;
> +			map_map[i] = map;
>  			map += size;
> +			i++;
>  		}
>  		return;
>  	}
> @@ -464,23 +469,27 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
>  					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
>  	if (map) {
> -		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> +		i = 0;
> +		for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
>  			if (!present_section_nr(pnum))
>  				continue;
> -			map_map[pnum] = map;
> +			map_map[i] = map;
>  			map += size;
> +			i++;
>  		}
>  		return;
>  	}
>  
>  	/* fallback */
> -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> +	i = 0;
> +	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
>  		struct mem_section *ms;
>  
>  		if (!present_section_nr(pnum))
>  			continue;
> -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> -		if (map_map[pnum])
> +		i++;
> +		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
> +		if (map_map[i-1])
>  			continue;

Would the statement below read better?

                map_map[i] = sparse_mem_map_populate(pnum, nodeid, NULL);
                if (map_map[i++])
  			continue;


>  		ms = __nr_to_section(pnum);
>  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
> @@ -558,6 +567,7 @@ static void __init alloc_usemap_and_memmap(void (*alloc_func)
>  		/* new start, update count etc*/
>  		nodeid_begin = nodeid;
>  		pnum_begin = pnum;
> +		data += map_count;
>  		map_count = 1;
>  	}
>  	/* ok, last chunk */
> @@ -576,6 +586,7 @@ void __init sparse_init(void)
>  	unsigned long *usemap;
>  	unsigned long **usemap_map;
>  	int size;
> +	int i = 0;
>  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
>  	int size2;
>  	struct page **map_map;
> @@ -598,7 +609,7 @@ void __init sparse_init(void)
>  	 * powerpc need to call sparse_init_one_section right after each
>  	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
>  	 */
> -	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
> +	size = sizeof(unsigned long *) * nr_present_sections;
>  	usemap_map = memblock_virt_alloc(size, 0);
>  	if (!usemap_map)
>  		panic("can not allocate usemap_map\n");
> @@ -606,7 +617,7 @@ void __init sparse_init(void)
>  							(void *)usemap_map);
>  
>  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> -	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
> +	size2 = sizeof(struct page *) * nr_present_sections;
>  	map_map = memblock_virt_alloc(size2, 0);
>  	if (!map_map)
>  		panic("can not allocate map_map\n");
> @@ -617,14 +628,15 @@ void __init sparse_init(void)
>  	for_each_present_section_nr(0, pnum) {
>  		struct mem_section *ms;
>  		ms = __nr_to_section(pnum);
> -		usemap = usemap_map[pnum];
> +		i++;
> +		usemap = usemap_map[i-1];

You could try the same in the other places where possible.

>  		if (!usemap) {
>  			ms->section_mem_map = 0;
>  			continue;
>  		}
>  
>  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> -		map = map_map[pnum];
> +		map = map_map[i-1];
>  #else
>  		map = sparse_early_mem_map_alloc(pnum);
>  #endif
> --
> 2.13.6
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/3] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-02-22 10:07     ` Pankaj Gupta
@ 2018-02-22 10:39       ` Baoquan He
  -1 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-02-22 10:39 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, dave hansen, linux-mm, akpm, kirill shutemov, mhocko, tglx

Hi Pankaj,

On 02/22/18 at 05:07am, Pankaj Gupta wrote:
> 
> Hi Baoquan,
> 
> > 
> > In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> > are allocated with the size of NR_MEM_SECTIONS. They are used to store
> > each memory section's usemap and mem map if marked as present. With
> > the help of these two arrays, a continuous memory chunk is allocated
> > for the usemaps and memmaps of the memory sections on one node. This
> > avoids excessive memory fragmentation. In the diagram below, '1'
> > indicates a present memory section and '0' an absent one. The number
> > 'n' can be much smaller than NR_MEM_SECTIONS on most systems.
> > 
> > |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
> > -------------------------------------------------------
> >  0 1 2 3         4 5         i   i+1     n-1   n
> > 
> > If we fail to populate the page tables to map one section's memmap, its
> > ->section_mem_map will be cleared finally to indicate that it's not present.
> > After use, these two arrays will be released at the end of sparse_init().
> > 
> > In 4-level paging mode each array costs 4M, which is negligible, while
> > in 5-level paging they cost 256M each, 512M altogether. A kdump kernel
> > usually reserves very little memory, e.g. 256M, so even though they are
> > only temporarily allocated, the cost is still not acceptable.
> > 
> > In fact, there's no need to allocate them with the size of NR_MEM_SECTIONS.
> > Since the ->section_mem_map clearing has been deferred to the last, the
> > number of present memory sections is kept the same during sparse_init()
> > until we finally clear out the memory section's ->section_mem_map if its
> > usemap or memmap is not correctly handled. Thus in the middle whenever
> > for_each_present_section_nr() loop is taken, the i-th present memory
> > section is always the same one.
> > 
> > Here only allocate usemap_map and map_map with the size of
> > 'nr_present_sections'. For the i-th present memory section, install its
> > usemap and memmap to usemap_map[i] and map_map[i] during allocation. Then
> > in the last for_each_present_section_nr() loop which clears the failed
> > memory section's ->section_mem_map, fetch usemap and memmap from
> > usemap_map[] and map_map[] array and set them into mem_section[]
> > accordingly.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/sparse-vmemmap.c |  8 +++++---
> >  mm/sparse.c         | 40 ++++++++++++++++++++++++++--------------
> >  2 files changed, 31 insertions(+), 17 deletions(-)
> > 
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index 640e68f8324b..f83723a49e47 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  	unsigned long pnum;
> >  	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> >  	void *vmemmap_buf_start;
> > +	int i = 0;
> >  
> >  	size = ALIGN(size, PMD_SIZE);
> >  	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
> > @@ -291,14 +292,15 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  		vmemmap_buf_end = vmemmap_buf_start + size * map_count;
> >  	}
> >  
> > -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
> >  		struct mem_section *ms;
> >  
> >  		if (!present_section_nr(pnum))
> >  			continue;
> >  
> > -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > -		if (map_map[pnum])
> > +		i++;
> > +		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > +		if (map_map[i-1])
> >  			continue;
> >  		ms = __nr_to_section(pnum);
> >  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index e9311b44e28a..aafb6d838872 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -405,6 +405,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
> >  	unsigned long pnum;
> >  	unsigned long **usemap_map = (unsigned long **)data;
> >  	int size = usemap_size();
> > +	int i = 0;
> >  
> >  	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
> >  							  size * usemap_count);
> > @@ -413,12 +414,13 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
> >  		return;
> >  	}
> >  
> > -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +	for (pnum = pnum_begin; pnum < pnum_end && i < usemap_count; pnum++) {
> >  		if (!present_section_nr(pnum))
> >  			continue;
> > -		usemap_map[pnum] = usemap;
> > +		usemap_map[i] = usemap;
> >  		usemap += size;
> > -		check_usemap_section_nr(nodeid, usemap_map[pnum]);
> > +		check_usemap_section_nr(nodeid, usemap_map[i]);
> > +		i++;
> >  	}
> >  }
> >  
> > @@ -447,14 +449,17 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  	void *map;
> >  	unsigned long pnum;
> >  	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> > +	int i;
> >  
> >  	map = alloc_remap(nodeid, size * map_count);
> >  	if (map) {
> > -		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +		i = 0;
> > +		for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
> >  			if (!present_section_nr(pnum))
> >  				continue;
> > -			map_map[pnum] = map;
> > +			map_map[i] = map;
> >  			map += size;
> > +			i++;
> >  		}
> >  		return;
> >  	}
> > @@ -464,23 +469,27 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
> >  					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
> >  	if (map) {
> > -		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +		i = 0;
> > +		for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
> >  			if (!present_section_nr(pnum))
> >  				continue;
> > -			map_map[pnum] = map;
> > +			map_map[i] = map;
> >  			map += size;
> > +			i++;
> >  		}
> >  		return;
> >  	}
> >  
> >  	/* fallback */
> > -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +	i = 0;
> > +	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
> >  		struct mem_section *ms;
> >  
> >  		if (!present_section_nr(pnum))
> >  			continue;
> > -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > -		if (map_map[pnum])
> > +		i++;
> > +		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > +		if (map_map[i-1])
> >  			continue;
> 
> Would the statement below read better?
> 
>                 map_map[i] = sparse_mem_map_populate(pnum, nodeid, NULL);
>                 if (map_map[i++])
>   			continue;

Thanks for reviewing.

It looks like a similar trick to the one used in the patch. If there's no
other input about this, I will update with your suggestion, and in the
other places too.

Thanks
Baoquan

> 
> 
> >  		ms = __nr_to_section(pnum);
> >  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
> > @@ -558,6 +567,7 @@ static void __init alloc_usemap_and_memmap(void (*alloc_func)
> >  		/* new start, update count etc*/
> >  		nodeid_begin = nodeid;
> >  		pnum_begin = pnum;
> > +		data += map_count;
> >  		map_count = 1;
> >  	}
> >  	/* ok, last chunk */
> > @@ -576,6 +586,7 @@ void __init sparse_init(void)
> >  	unsigned long *usemap;
> >  	unsigned long **usemap_map;
> >  	int size;
> > +	int i = 0;
> >  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> >  	int size2;
> >  	struct page **map_map;
> > @@ -598,7 +609,7 @@ void __init sparse_init(void)
> >  	 * powerpc need to call sparse_init_one_section right after each
> >  	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
> >  	 */
> > -	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
> > +	size = sizeof(unsigned long *) * nr_present_sections;
> >  	usemap_map = memblock_virt_alloc(size, 0);
> >  	if (!usemap_map)
> >  		panic("can not allocate usemap_map\n");
> > @@ -606,7 +617,7 @@ void __init sparse_init(void)
> >  							(void *)usemap_map);
> >  
> >  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> > -	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
> > +	size2 = sizeof(struct page *) * nr_present_sections;
> >  	map_map = memblock_virt_alloc(size2, 0);
> >  	if (!map_map)
> >  		panic("can not allocate map_map\n");
> > @@ -617,14 +628,15 @@ void __init sparse_init(void)
> >  	for_each_present_section_nr(0, pnum) {
> >  		struct mem_section *ms;
> >  		ms = __nr_to_section(pnum);
> > -		usemap = usemap_map[pnum];
> > +		i++;
> > +		usemap = usemap_map[i-1];
> 
> You could try the same in the other places where possible.
> 
> >  		if (!usemap) {
> >  			ms->section_mem_map = 0;
> >  			continue;
> >  		}
> >  
> >  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> > -		map = map_map[pnum];
> > +		map = map_map[i-1];
> >  #else
> >  		map = sparse_early_mem_map_alloc(pnum);
> >  #endif
> > --
> > 2.13.6
> > 
> > 

^ permalink raw reply	[flat|nested] 22+ messages in thread

> >  				continue;
> > -			map_map[pnum] = map;
> > +			map_map[i] = map;
> >  			map += size;
> > +			i++;
> >  		}
> >  		return;
> >  	}
> >  
> >  	/* fallback */
> > -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +	i = 0;
> > +	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
> >  		struct mem_section *ms;
> >  
> >  		if (!present_section_nr(pnum))
> >  			continue;
> > -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > -		if (map_map[pnum])
> > +		i++;
> > +		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > +		if (map_map[i-1])
> >  			continue;
> 
> Below statement will look better?
> 
>                 map_map[i] = sparse_mem_map_populate(pnum, nodeid, NULL);
>                 if (map_map[i++])
>   			continue;

Thanks for reviewing.

It looks like the same trick as the one used in the patch. If there is no
other input about this, I will update it with your suggestion, and in the
other places too.
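For reference, the dense-indexing scheme under discussion can be sketched
in isolation as below. All names and values are illustrative stand-ins,
not the kernel's actual code: 'pnum' walks every section number, while a
second counter advances only for present sections, so the temporary array
only needs nr_present_sections slots rather than NR_MEM_SECTIONS:

```c
#include <assert.h>

/* Toy model of the indexing scheme in the patch; names and the
 * present[] bitmap are purely illustrative. */
#define TOY_NR_SECTIONS 8

static const int toy_present[TOY_NR_SECTIONS] = { 1, 1, 0, 0, 1, 0, 1, 0 };

/* Densely fill map[] with the present section numbers (standing in
 * for the allocated memmaps); return how many slots were consumed. */
static int toy_fill_dense(int *map)
{
	int pnum, nr_consumed_maps = 0;

	for (pnum = 0; pnum < TOY_NR_SECTIONS; pnum++) {
		if (!toy_present[pnum])
			continue;
		map[nr_consumed_maps++] = pnum;
	}
	return nr_consumed_maps;
}
```

The later for_each_present_section_nr() loop then fetches entries with the
same counter, so writes and reads stay in lockstep.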

Thanks
Baoquan

> 
> 
> >  		ms = __nr_to_section(pnum);
> >  		pr_err("%s: sparsemem memory map backing failed some memory will not be
> >  		available\n",
> > @@ -558,6 +567,7 @@ static void __init alloc_usemap_and_memmap(void
> > (*alloc_func)
> >  		/* new start, update count etc*/
> >  		nodeid_begin = nodeid;
> >  		pnum_begin = pnum;
> > +		data += map_count;
> >  		map_count = 1;
> >  	}
> >  	/* ok, last chunk */
> > @@ -576,6 +586,7 @@ void __init sparse_init(void)
> >  	unsigned long *usemap;
> >  	unsigned long **usemap_map;
> >  	int size;
> > +	int i = 0;
> >  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> >  	int size2;
> >  	struct page **map_map;
> > @@ -598,7 +609,7 @@ void __init sparse_init(void)
> >  	 * powerpc need to call sparse_init_one_section right after each
> >  	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
> >  	 */
> > -	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
> > +	size = sizeof(unsigned long *) * nr_present_sections;
> >  	usemap_map = memblock_virt_alloc(size, 0);
> >  	if (!usemap_map)
> >  		panic("can not allocate usemap_map\n");
> > @@ -606,7 +617,7 @@ void __init sparse_init(void)
> >  							(void *)usemap_map);
> >  
> >  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> > -	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
> > +	size2 = sizeof(struct page *) * nr_present_sections;
> >  	map_map = memblock_virt_alloc(size2, 0);
> >  	if (!map_map)
> >  		panic("can not allocate map_map\n");
> > @@ -617,14 +628,15 @@ void __init sparse_init(void)
> >  	for_each_present_section_nr(0, pnum) {
> >  		struct mem_section *ms;
> >  		ms = __nr_to_section(pnum);
> > -		usemap = usemap_map[pnum];
> > +		i++;
> > +		usemap = usemap_map[i-1];
> 
> Can try same as above and other places where possible
> 
> >  		if (!usemap) {
> >  			ms->section_mem_map = 0;
> >  			continue;
> >  		}
> >  
> >  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> > -		map = map_map[pnum];
> > +		map = map_map[i-1];
> >  #else
> >  		map = sparse_early_mem_map_alloc(pnum);
> >  #endif
> > --
> > 2.13.6
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> > 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] mm/sparse: Add a static variable nr_present_sections
  2018-02-22  9:11   ` Baoquan He
@ 2018-02-22 21:24     ` Andrew Morton
  -1 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2018-02-22 21:24 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, dave.hansen, linux-mm, kirill.shutemov, mhocko, tglx

On Thu, 22 Feb 2018 17:11:28 +0800 Baoquan He <bhe@redhat.com> wrote:

> It's used to record how many memory sections are marked as present
> during system boot up, and will be used in the later patch.
> 
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -202,6 +202,7 @@ static inline int next_present_section_nr(int section_nr)
>  	      (section_nr <= __highest_present_section_nr));	\
>  	     section_nr = next_present_section_nr(section_nr))
>  
> +static int nr_present_sections;

I think this could be __initdata.

A nice comment explaining why it exists would be nice.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/3] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-02-22  9:11   ` Baoquan He
@ 2018-02-22 22:22     ` Dave Hansen
  -1 siblings, 0 replies; 22+ messages in thread
From: Dave Hansen @ 2018-02-22 22:22 UTC (permalink / raw)
  To: Baoquan He, linux-kernel; +Cc: linux-mm, akpm, kirill.shutemov, mhocko, tglx

First of all, this is a much-improved changelog.  Thanks for that!

On 02/22/2018 01:11 AM, Baoquan He wrote:
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> are allocated with the size of NR_MEM_SECTIONS. They are used to store
> each memory section's usemap and mem map if marked as present. With
> the help of these two arrays, continuous memory chunk is allocated for
> usemap and memmap for memory sections on one node. This avoids too many
> memory fragmentations. Like below diagram, '1' indicates the present
> memory section, '0' means absent one. The number 'n' could be much
> smaller than NR_MEM_SECTIONS on most of systems.
> 
> |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
> -------------------------------------------------------
>  0 1 2 3         4 5         i   i+1     n-1   n
> 
> If we fail to populate the page tables to map one section's memmap, its
> ->section_mem_map will be cleared finally to indicate that it's not present.
> After use, these two arrays will be released at the end of sparse_init().

Let me see if I understand this.  tl;dr version of this changelog:

Today, we allocate usemap and mem_map for all sections up front and then
free them later if they are not needed.  With 5-level paging, this eats
all memory and we fall over before we can free them.  Fix it by only
allocating what we _need_ (nr_present_sections).
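As a rough sanity check of the numbers (assuming the x86_64 values of
MAX_PHYSMEM_BITS = 52 with 5-level paging and SECTION_SIZE_BITS = 27, and
8-byte pointers on a 64-bit build), the two temporary pointer arrays come
to exactly the 512M quoted in the cover letter:

```c
#include <assert.h>
#include <stddef.h>

/* Back-of-the-envelope for the cover letter's 512M figure; the bit
 * widths are passed in as parameters since they are config-dependent. */
static unsigned long long temp_arrays_bytes(int physmem_bits, int section_bits)
{
	unsigned long long nr_mem_sections = 1ULL << (physmem_bits - section_bits);

	/* usemap_map plus map_map, one pointer per section each */
	return 2ULL * nr_mem_sections * sizeof(void *);
}
```

With 4-level paging (MAX_PHYSMEM_BITS = 46) the same arrays take only 8M,
which is why the problem only bites with CONFIG_X86_5LEVEL=y.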


> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 640e68f8324b..f83723a49e47 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  	unsigned long pnum;
>  	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
>  	void *vmemmap_buf_start;
> +	int i = 0;

'i' is a criminally negligent variable name for how it is used here.

>  	size = ALIGN(size, PMD_SIZE);
>  	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
> @@ -291,14 +292,15 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  		vmemmap_buf_end = vmemmap_buf_start + size * map_count;
>  	}
>  
> -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> +	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
>  		struct mem_section *ms;
>  
>  		if (!present_section_nr(pnum))
>  			continue;
>  
> -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> -		if (map_map[pnum])
> +		i++;
> +		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
> +		if (map_map[i-1])
>  			continue;

The i-1 stuff here looks pretty funky.  Isn't this much more readable?

	map_map[i] = sparse_mem_map_populate(pnum, nodeid, NULL);
	if (map_map[i]) {
		i++;
		continue;
	}


> diff --git a/mm/sparse.c b/mm/sparse.c
> index e9311b44e28a..aafb6d838872 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -405,6 +405,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
>  	unsigned long pnum;
>  	unsigned long **usemap_map = (unsigned long **)data;
>  	int size = usemap_size();
> +	int i = 0;

Ditto on the naming.  Shouldn't it be nr_consumed_maps or something?

>  	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
>  							  size * usemap_count);
> @@ -413,12 +414,13 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
>  		return;
>  	}
>  
> -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> +	for (pnum = pnum_begin; pnum < pnum_end && i < usemap_count; pnum++) {
>  		if (!present_section_nr(pnum))
>  			continue;
> -		usemap_map[pnum] = usemap;
> +		usemap_map[i] = usemap;
>  		usemap += size;
> -		check_usemap_section_nr(nodeid, usemap_map[pnum]);
> +		check_usemap_section_nr(nodeid, usemap_map[i]);
> +		i++;
>  	}
>  }

How would 'i' ever exceed usemap_count?

Also, are there any other side-effects from changing map_map[] to be
indexed by something other than the section number?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] mm/sparse: Add a static variable nr_present_sections
  2018-02-22 21:24     ` Andrew Morton
@ 2018-02-22 23:56       ` Baoquan He
  -1 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-02-22 23:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, dave.hansen, linux-mm, kirill.shutemov, mhocko, tglx

On 02/22/18 at 01:24pm, Andrew Morton wrote:
> On Thu, 22 Feb 2018 17:11:28 +0800 Baoquan He <bhe@redhat.com> wrote:
> 
> > It's used to record how many memory sections are marked as present
> > during system boot up, and will be used in the later patch.
> > 
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -202,6 +202,7 @@ static inline int next_present_section_nr(int section_nr)
> >  	      (section_nr <= __highest_present_section_nr));	\
> >  	     section_nr = next_present_section_nr(section_nr))
> >  
> > +static int nr_present_sections;
> 
> I think this could be __initdata.
> 
> A nice comment explaining why it exists would be nice.

Thanks, I will update as you suggested.
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/3] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-02-22 22:22     ` Dave Hansen
@ 2018-02-23  2:38       ` Baoquan He
  -1 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-02-23  2:38 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, akpm, kirill.shutemov, mhocko, tglx

On 02/22/18 at 02:22pm, Dave Hansen wrote:
> First of all, this is a much-improved changelog.  Thanks for that!
> 
> On 02/22/2018 01:11 AM, Baoquan He wrote:
> > In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> > are allocated with the size of NR_MEM_SECTIONS. They are used to store
> > each memory section's usemap and mem map if marked as present. With
> > the help of these two arrays, continuous memory chunk is allocated for
> > usemap and memmap for memory sections on one node. This avoids too many
> > memory fragmentations. Like below diagram, '1' indicates the present
> > memory section, '0' means absent one. The number 'n' could be much
> > smaller than NR_MEM_SECTIONS on most of systems.
> > 
> > |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
> > -------------------------------------------------------
> >  0 1 2 3         4 5         i   i+1     n-1   n
> > 
> > If fail to populate the page tables to map one section's memmap, its
> > ->section_mem_map will be cleared finally to indicate that it's not present.
> > After use, these two arrays will be released at the end of sparse_init().
> 

Thanks, Dave.

> Let me see if I understand this.  tl;dr version of this changelog:
> 
> Today, we allocate usemap and mem_map for all sections up front and then
> free them later if they are not needed.  With 5-level paging, this eats
> all memory and we fall over before we can free them.  Fix it by only
> allocating what we _need_ (nr_present_sections).

Might not be quite right: we allocate the pointer arrays usemap_map and
map_map for all sections, then we allocate usemap and memmap only for the
present sections, and make usemap_map point at the allocated usemap and
map_map point at the allocated memmap. Finally, we set them into
mem_section[]'s members; please see the sparse_init_one_section() call in
sparse_init(). After that, the pointer arrays usemap_map and map_map are
not needed any more and are freed entirely.

And yes, with 5-level paging, this eats too much memory. We fall over
because we can't allocate that much memory on a system with little memory,
e.g. in a kdump kernel, which usually has only 256M.

Here, the pointer arrays usemap_map and map_map are auxiliary data
structures. Without them, we would have to allocate usemap and memmap for
each section one by one, and we tend to allocate each node's data on that
node itself. This would cause too much memory fragmentation.

In this patch, we only allocate those temporary pointer arrays usemap_map
and map_map with the size of 'nr_present_sections'. As you can see, in the
sections loop there are two variables increasing in different steps:
'pnum' steps up from 0 to NR_MEM_SECTIONS, while 'i' steps up only when a
section is present.

> 
> 
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index 640e68f8324b..f83723a49e47 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  	unsigned long pnum;
> >  	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> >  	void *vmemmap_buf_start;
> > +	int i = 0;
> 
> 'i' is a criminally negligent variable name for how it is used here.

Hmm, I considered this. However, it's mainly used to index the maps, and I
can't think of a good name to represent the present section; I also don't
want to make the array-indexing lines too long. I would like to hear any
suggestion for a better name.

> 
> >  	size = ALIGN(size, PMD_SIZE);
> >  	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
> > @@ -291,14 +292,15 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  		vmemmap_buf_end = vmemmap_buf_start + size * map_count;
> >  	}
> >  
> > -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
> >  		struct mem_section *ms;
> >  
> >  		if (!present_section_nr(pnum))
> >  			continue;
> >  
> > -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > -		if (map_map[pnum])
> > +		i++;
> > +		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > +		if (map_map[i-1])
> >  			continue;
> 
> The i-1 stuff here looks pretty funky.  Isn't this much more readable?


The code below would need another 'i++;' when map_map[i] is NULL, which
might not look good. That is why I used the trick to avoid it.

	map_map[i] = sparse_mem_map_populate(pnum, nodeid, NULL);
	if (map_map[i]) {
	        i++;
	        continue;
	}
	ms = __nr_to_section(pnum);
	pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
		__func__);
	i++;


Pankaj's suggestion looks better, I plan to take his if no objection.

        map_map[i] = sparse_mem_map_populate(pnum, nodeid, NULL);
        if (map_map[i++])                                                                                                                 
                continue;
        ms = __nr_to_section(pnum);
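As a quick sanity check of that post-increment idiom (toy stand-ins below,
not the kernel functions): map_map[i++] tests the value just stored and
advances the counter on both the success and the failure path, leaving a
NULL slot for a failed section so the later cleanup loop can find it:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for sparse_mem_map_populate(): "fails" for odd pnum.
 * Purely illustrative; not the kernel function. */
static void *toy_populate(int pnum)
{
	return (pnum % 2 == 0) ? (void *)(long)(pnum + 1) : NULL;
}

/* Fill map_map[] using the suggested if (map_map[i++]) idiom;
 * return how many slots were consumed. */
static int toy_fill(void *map_map[], const int pnums[], int count)
{
	int i = 0, k;

	for (k = 0; k < count; k++) {
		map_map[i] = toy_populate(pnums[k]);
		if (map_map[i++])
			continue;
		/* error path: slot i-1 stays NULL, i already advanced */
	}
	return i;
}
```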
> 
> 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index e9311b44e28a..aafb6d838872 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -405,6 +405,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
> >  	unsigned long pnum;
> >  	unsigned long **usemap_map = (unsigned long **)data;
> >  	int size = usemap_size();
> > +	int i = 0;
> 
> Ditto on the naming.  Shouldn't it be nr_consumed_maps or something?

I hesitated on this before because it would make the code lines too long.

		usemap_map[nr_consumed_maps] = usemap;

I am fine with nr_consumed_maps, or is it OK to replace 'i' with
'nr_present' in all places?

> 
> >  	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
> >  							  size * usemap_count);
> > @@ -413,12 +414,13 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
> >  		return;
> >  	}
> >  
> > -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +	for (pnum = pnum_begin; pnum < pnum_end && i < usemap_count; pnum++) {
> >  		if (!present_section_nr(pnum))
> >  			continue;
> > -		usemap_map[pnum] = usemap;
> > +		usemap_map[i] = usemap;
> >  		usemap += size;
> > -		check_usemap_section_nr(nodeid, usemap_map[pnum]);
> > +		check_usemap_section_nr(nodeid, usemap_map[i]);
> > +		i++;
> >  	}
> >  }
> 
> How would 'i' ever exceed usemap_count?

'i' should not exceed usemap_count; it's just a limit I added out of
caution. I will remove the 'i < usemap_count' check.

> 
> Also, are there any other side-effects from changing map_map[] to be
> indexed by something other than the section number?

From the code, it won't bring any side-effects. As I said above, we just
write into map_map[] indexed by 'i', and fetch from map_map[] with 'i'
too. And I agree we need to be very careful since this is core code; it
needs more eyes to help review.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/3] mm/sparse: Optimize memmap allocation during sparse_init()
@ 2018-02-23  2:38       ` Baoquan He
  0 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-02-23  2:38 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, akpm, kirill.shutemov, mhocko, tglx

On 02/22/18 at 02:22pm, Dave Hansen wrote:
> First of all, this is a much-improved changelog.  Thanks for that!
> 
> On 02/22/2018 01:11 AM, Baoquan He wrote:
> > In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> > are allocated with the size of NR_MEM_SECTIONS. They are used to store
> > each memory section's usemap and mem map if marked as present. With
> > the help of these two arrays, continuous memory chunk is allocated for
> > usemap and memmap for memory sections on one node. This avoids too many
> > memory fragmentations. Like below diagram, '1' indicates the present
> > memory section, '0' means absent one. The number 'n' could be much
> > smaller than NR_MEM_SECTIONS on most of systems.
> > 
> > |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
> > -------------------------------------------------------
> >  0 1 2 3         4 5         i   i+1     n-1   n
> > 
> > If fail to populate the page tables to map one section's memmap, its
> > ->section_mem_map will be cleared finally to indicate that it's not present.
> > After use, these two arrays will be released at the end of sparse_init().
> 

Thanks, Dave.

> Let me see if I understand this.  tl;dr version of this changelog:
> 
> Today, we allocate usemap and mem_map for all sections up front and then
> free them later if they are not needed.  With 5-level paging, this eats
> all memory and we fall over before we can free them.  Fix it by only
> allocating what we _need_ (nr_present_sections).

Might no quite right, we allocate pointer array usemap_map and map_map
for all sections, then we allocate usemap and memmap for present sections,
and use usemap_map to point at the allocated usemap, map_map to point at
allocated memmap. At last, we set them into mem_section[]'s member,
please see sparse_init_one_section() calling in sparse_init(). And
pointer array usemap_map and map_map are not needed any more, they are
freed totally.

And yes, with 5-level paging, this eats too much memory. We fall over
because we can't allocate so much memory on system with few memory, e.g
in kdump kernel with 256M memory usually.

Here, pointer array usemap_map and map_map are auxiliary data
structures. Without them, we have to allocate usemap and memmap for 
section one by one, and we tend to allocate each node's data on that
node itself. This will cause too many memory fragmentations.

In this patch, only allocate those temporary pointer arrays usemap_map
and map_map with 'nr_present_sections'. You can see, in sections loop,
there are two variables increasing with different steps. 'pnum' steps up
from 0 to NR_MEM_SECTIONS, while 'i' steps up only if section is
present.

> 
> 
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index 640e68f8324b..f83723a49e47 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  	unsigned long pnum;
> >  	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> >  	void *vmemmap_buf_start;
> > +	int i = 0;
> 
> 'i' is a criminally negligent variable name for how it is used here.

Hmm, I considered this. However, it's mainly used to index map, I can't
think of a good name to represent the present section, and also do not
want to make the array indexing line too long. I would like to hear any
suggestion about a better naming.

> 
> >  	size = ALIGN(size, PMD_SIZE);
> >  	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
> > @@ -291,14 +292,15 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  		vmemmap_buf_end = vmemmap_buf_start + size * map_count;
> >  	}
> >  
> > -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +	for (pnum = pnum_begin; pnum < pnum_end && i < map_count; pnum++) {
> >  		struct mem_section *ms;
> >  
> >  		if (!present_section_nr(pnum))
> >  			continue;
> >  
> > -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > -		if (map_map[pnum])
> > +		i++;
> > +		map_map[i-1] = sparse_mem_map_populate(pnum, nodeid, NULL);
> > +		if (map_map[i-1])
> >  			continue;
> 
> The i-1 stuff here looks pretty funky.  Isn't this much more readable?


Below code needs another 'i++;' if map_map[i] == 0, it might look not good.
That is why I used trick to avoid it.

	map_map[i] = sparse_mem_map_populate(pnum, nodeid, NULL);
	if (map_map[i]) {
	        i++;
	        continue;
	}
	ms = __nr_to_section(pnum);
	pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
		__func__);
	i++;


Pankaj's suggestion looks better, I plan to take his if no objection.

        map_map[i] = sparse_mem_map_populate(pnum, nodeid, NULL);
        if (map_map[i++])                                                                                                                 
                continue;
        ms = __nr_to_section(pnum);
> 
> 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index e9311b44e28a..aafb6d838872 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -405,6 +405,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
> >  	unsigned long pnum;
> >  	unsigned long **usemap_map = (unsigned long **)data;
> >  	int size = usemap_size();
> > +	int i = 0;
> 
> Ditto on the naming.  Shouldn't it be nr_consumed_maps or something?

Before I hesitated on this because it would make the code line too long.

		usemap_map[nr_consumed_maps] = usemap;

I am fine with nr_consumed_maps, or is it OK to replace 'i' with
'nr_present' in all places?

> 
> >  	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
> >  							  size * usemap_count);
> > @@ -413,12 +414,13 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
> >  		return;
> >  	}
> >  
> > -	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> > +	for (pnum = pnum_begin; pnum < pnum_end && i < usemap_count; pnum++) {
> >  		if (!present_section_nr(pnum))
> >  			continue;
> > -		usemap_map[pnum] = usemap;
> > +		usemap_map[i] = usemap;
> >  		usemap += size;
> > -		check_usemap_section_nr(nodeid, usemap_map[pnum]);
> > +		check_usemap_section_nr(nodeid, usemap_map[i]);
> > +		i++;
> >  	}
> >  }
> 
> How would 'i' ever exceed usemap_count?

'i' should never exceed usemap_count; the check was only a defensive limit I
added out of worry. I will remove the 'i < usemap_count' check.

> 
> Also, are there any other side-effects from changing map_map[] to be
> indexed by something other than the section number?

From the code, it won't bring side effects. As I said above, we only write
into map_map[] indexed by 'i', and fetch from map_map[] with 'i' too. But I
agree we need to be very careful since this is core code, and it needs more
eyes to help review.

Thanks
Baoquan



end of thread, other threads:[~2018-02-23  2:38 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-22  9:11 [PATCH v2 0/3] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
2018-02-22  9:11 ` [PATCH v2 1/3] mm/sparse: Add a static variable nr_present_sections Baoquan He
2018-02-22 21:24   ` Andrew Morton
2018-02-22 23:56     ` Baoquan He
2018-02-22  9:11 ` [PATCH v2 2/3] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
2018-02-22  9:11 ` [PATCH v2 3/3] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
2018-02-22 10:07   ` Pankaj Gupta
2018-02-22 10:39     ` Baoquan He
2018-02-22 22:22   ` Dave Hansen
2018-02-23  2:38     ` Baoquan He
2018-02-22  9:15 ` [PATCH v2 0/3] " Baoquan He
