linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init()
@ 2018-06-27  1:31 Baoquan He
  2018-06-27  1:31 ` [PATCH v5 1/4] mm/sparse: Add a static variable nr_present_sections Baoquan He
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Baoquan He @ 2018-06-27  1:31 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

This is v4 post. V3 can be found here:
https://lkml.org/lkml/2018/2/27/928

V1 can be found here:
https://www.spinics.net/lists/linux-mm/msg144486.html

In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
are allocated with the size of NR_MEM_SECTIONS. They are used to store
each memory section's usemap and mem map if it is marked as present. In
5-level paging mode this costs 512M of memory, even though the arrays
are released at the end of sparse_init(). Systems with little memory,
like a kdump kernel which usually has only about 256M, will fail to boot
because of allocation failure if CONFIG_X86_5LEVEL=y.

In this patchset, optimize the memmap allocation code to only use
usemap_map and map_map with the size of nr_present_sections. This makes
the kdump kernel boot up with a normal crashkernel='' setting when
CONFIG_X86_5LEVEL=y.
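
For reference, the 512M figure can be reproduced with a quick
back-of-the-envelope sketch, assuming the x86_64 values
MAX_PHYSMEM_BITS=52 under 5-level paging, SECTION_SIZE_BITS=27, and
8-byte pointers:

	#include <stdio.h>

	int main(void)
	{
		/* 2^(52 - 27) = 2^25 = 33554432 sections */
		unsigned long nr_mem_sections = 1UL << (52 - 27);
		/* one pointer per section, per array */
		unsigned long per_array = sizeof(void *) * nr_mem_sections;

		printf("%luM per array, %luM altogether\n",
		       per_array >> 20, (2 * per_array) >> 20);
		return 0;	/* prints: 256M per array, 512M altogether */
	}

512M altogether is far more than a ~256M kdump reservation.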

Change log:
v4->v5:
  Improve patch 3/4 log according to Dave's suggestion.

  Correct a wrong copy&paste that mistakenly turned 'nr_consumed_maps'
  into 'alloc_usemap_and_memmap', which was pointed out by Dave in a
  patch 4/4 code comment.

  Otherwise, no code change in this version.
v3->v4:
  Improve the code according to Dave's three concerns about patch 0004:

  Rename the variable 'idx_present' to 'nr_consumed_maps', which is
  used to index the memmap and usemap of present sections.

  Add a check for whether 'nr_consumed_maps' goes beyond
  nr_present_sections.

  Add a code comment above the final for_each_present_section_nr() to
  explain why 'nr_consumed_maps' needs to be increased in each
  iteration whether or not the 'ms->section_mem_map' needs to be
  cleared.

v2->v3:
  Make nr_present_sections __initdata and add a code comment,
  according to Andrew's suggestion.

  Rename the local variable 'i' to 'idx_present', which loops over the
  present sections, and improve the code. These changes were suggested
  by Dave and Pankaj.

  Add a new patch 0003 which adds a new parameter 'data_unit_size' to
  alloc_usemap_and_memmap(), in which we will update 'data' to make it
  point at the new position. Its type 'void *' alone can't give us the
  information needed to do that, so the unit size has to be passed in.
  Change the code in patch 0004 accordingly. This is a bug fix found
  when testing memory deployed on multiple nodes.

v1->v2:
  Split out the addition of nr_present_sections as a single patch for
  easier reviewing.

  Rewrite patch log according to Dave's suggestion.

  Fix a code bug in patch 0002 reported by the test robot.

Baoquan He (4):
  mm/sparse: Add a static variable nr_present_sections
  mm/sparsemem: Defer the ms->section_mem_map clearing
  mm/sparse: Add a new parameter 'data_unit_size' for
    alloc_usemap_and_memmap
  mm/sparse: Optimize memmap allocation during sparse_init()

 mm/sparse-vmemmap.c |  6 ++---
 mm/sparse.c         | 72 +++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 59 insertions(+), 19 deletions(-)

-- 
2.13.6



* [PATCH v5 1/4] mm/sparse: Add a static variable nr_present_sections
  2018-06-27  1:31 [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
@ 2018-06-27  1:31 ` Baoquan He
  2018-06-28  3:10   ` Pavel Tatashin
  2018-06-27  1:31 ` [PATCH v5 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Baoquan He @ 2018-06-27  1:31 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

It is used to record how many memory sections are marked as present
during system boot-up, and will be used in a later patch.
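
For context, patch 4/4 eventually consumes this counter when sizing the
temporary arrays; excerpted from that patch:

	size = sizeof(unsigned long *) * nr_present_sections;
	usemap_map = memblock_virt_alloc(size, 0);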

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index f13f2723950a..6314303130b0 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -200,6 +200,12 @@ static inline int next_present_section_nr(int section_nr)
 	      (section_nr <= __highest_present_section_nr));	\
 	     section_nr = next_present_section_nr(section_nr))
 
+/*
+ * Record how many memory sections are marked as present
+ * during system bootup.
+ */
+static int __initdata nr_present_sections;
+
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -229,6 +235,7 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
 			ms->section_mem_map = sparse_encode_early_nid(nid) |
 							SECTION_IS_ONLINE;
 			section_mark_present(ms);
+			nr_present_sections++;
 		}
 	}
 }
-- 
2.13.6



* [PATCH v5 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-06-27  1:31 [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  2018-06-27  1:31 ` [PATCH v5 1/4] mm/sparse: Add a static variable nr_present_sections Baoquan He
@ 2018-06-27  1:31 ` Baoquan He
  2018-06-27  9:54   ` Oscar Salvador
  2018-06-27  1:31 ` [PATCH v5 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Baoquan He @ 2018-06-27  1:31 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, the
system allocates one contiguous memory chunk for the mem maps on one
node and populates the relevant page tables to map the memory sections
one by one. If populating a certain mem section fails, a warning is
printed and its ->section_mem_map is cleared to cancel the marking of
being present. As a result, the number of mem sections marked as
present can decrease during sparse_init() execution.

Here, when populating a section's page tables fails, just defer the
clearing of its ms->section_mem_map until the last
for_each_present_section_nr() loop. This is in preparation for later
optimizing the mem map allocation.
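
The "cancel the marking of being present" works because present-ness is
a flag bit encoded inside ->section_mem_map itself, so zeroing that
word also clears the flag; abridged from include/linux/mmzone.h:

	#define SECTION_MARKED_PRESENT	(1UL<<0)

	static inline int present_section(struct mem_section *section)
	{
		return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
	}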

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse-vmemmap.c |  1 -
 mm/sparse.c         | 12 ++++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index bd0276d5f66b..640e68f8324b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -303,7 +303,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
-		ms->section_mem_map = 0;
 	}
 
 	if (vmemmap_buf_start) {
diff --git a/mm/sparse.c b/mm/sparse.c
index 6314303130b0..71ad53da2cd1 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -451,7 +451,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
-		ms->section_mem_map = 0;
 	}
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
@@ -479,7 +478,6 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
 
 	pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 	       __func__);
-	ms->section_mem_map = 0;
 	return NULL;
 }
 #endif
@@ -583,17 +581,23 @@ void __init sparse_init(void)
 #endif
 
 	for_each_present_section_nr(0, pnum) {
+		struct mem_section *ms;
+		ms = __nr_to_section(pnum);
 		usemap = usemap_map[pnum];
-		if (!usemap)
+		if (!usemap) {
+			ms->section_mem_map = 0;
 			continue;
+		}
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 		map = map_map[pnum];
 #else
 		map = sparse_early_mem_map_alloc(pnum);
 #endif
-		if (!map)
+		if (!map) {
+			ms->section_mem_map = 0;
 			continue;
+		}
 
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 								usemap);
-- 
2.13.6



* [PATCH v5 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap
  2018-06-27  1:31 [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  2018-06-27  1:31 ` [PATCH v5 1/4] mm/sparse: Add a static variable nr_present_sections Baoquan He
  2018-06-27  1:31 ` [PATCH v5 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
@ 2018-06-27  1:31 ` Baoquan He
  2018-06-28  3:14   ` Pavel Tatashin
  2018-06-27  1:31 ` [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Baoquan He @ 2018-06-27  1:31 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

alloc_usemap_and_memmap() is passed a "void *" that points to
usemap_map or map_map. In the next patch we will change both map
allocations from using 'NR_MEM_SECTIONS' as the length to using
'nr_present_sections'. After that, the passed-in 'void *' needs to be
updated as things get consumed, but the function knows only the
quantity of objects consumed and not their type. The new
'data_unit_size' parameter effectively tells it enough about the type
to let it update the pointer as objects are consumed.
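
As a sketch of the problem (a hypothetical helper, not part of the
patch): given only a 'void *', the callee cannot step the pointer past
the objects it has consumed, but with the unit size it can, which is
exactly what patch 4/4 does with "data += map_count * data_unit_size;":

	/* Hypothetical illustration: advance 'data' past 'consumed' objects. */
	static void advance(void **data, unsigned long consumed, int data_unit_size)
	{
		/* cast to char *: a bare void * carries no object size */
		*data = (char *)*data + consumed * data_unit_size;
	}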

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 71ad53da2cd1..b2848cc6e32a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -489,10 +489,12 @@ void __weak __meminit vmemmap_populate_print_last(void)
 /**
  *  alloc_usemap_and_memmap - memory alloction for pageblock flags and vmemmap
  *  @map: usemap_map for pageblock flags or mmap_map for vmemmap
+ *  @unit_size: size of map unit
  */
 static void __init alloc_usemap_and_memmap(void (*alloc_func)
 					(void *, unsigned long, unsigned long,
-					unsigned long, int), void *data)
+					unsigned long, int), void *data,
+					int data_unit_size)
 {
 	unsigned long pnum;
 	unsigned long map_count;
@@ -569,7 +571,8 @@ void __init sparse_init(void)
 	if (!usemap_map)
 		panic("can not allocate usemap_map\n");
 	alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
-							(void *)usemap_map);
+				(void *)usemap_map,
+				sizeof(usemap_map[0]));
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
@@ -577,7 +580,8 @@ void __init sparse_init(void)
 	if (!map_map)
 		panic("can not allocate map_map\n");
 	alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
-							(void *)map_map);
+				(void *)map_map,
+				sizeof(map_map[0]));
 #endif
 
 	for_each_present_section_nr(0, pnum) {
-- 
2.13.6



* [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-27  1:31 [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (2 preceding siblings ...)
  2018-06-27  1:31 ` [PATCH v5 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
@ 2018-06-27  1:31 ` Baoquan He
  2018-06-28  3:19   ` Pavel Tatashin
  2018-06-29 17:16   ` Dave Hansen
  2018-06-27  1:47 ` [PATCH v5 0/4] " Baoquan He
  2018-06-27 17:47 ` Pavel Tatashin
  5 siblings, 2 replies; 22+ messages in thread
From: Baoquan He @ 2018-06-27  1:31 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
are allocated with the size of NR_MEM_SECTIONS. They are used to store
each memory section's usemap and mem map if it is marked as present.
With the help of these two arrays, one contiguous memory chunk is
allocated for the usemaps and memmaps of the memory sections on one
node. This avoids excessive memory fragmentation. In the diagram below,
'1' indicates a present memory section and '0' an absent one. The
number 'n' can be much smaller than NR_MEM_SECTIONS on most systems.

|1|1|1|1|0|0|0|0|1|1|0|0|...|1|0|1|0|...|1|0|1|...|0|
-----------------------------------------------------
 0 1 2 3         4 5         i  i+1     n-1   n

If populating the page tables to map one section's memmap fails, its
->section_mem_map will finally be cleared to indicate that it is not
present. After use, these two arrays are released at the end of
sparse_init().

In 4-level paging mode each array costs 4M, which is negligible. But in
5-level paging they cost 256M each, 512M altogether. A kdump kernel
usually reserves only very little memory, e.g. 256M. So even though
they are temporarily allocated, this is still not acceptable.

In fact, there's no need to allocate them with the size of
NR_MEM_SECTIONS. Since the ->section_mem_map clearing has been deferred
to the end, the number of present memory sections stays the same during
sparse_init() until we finally clear out the ->section_mem_map of the
memory sections whose usemap or memmap was not correctly handled. Thus,
whenever the for_each_present_section_nr() loop is taken in the middle,
the i-th present memory section is always the same one.

Here, only allocate usemap_map and map_map with the size of
'nr_present_sections'. For the i-th present memory section, install its
usemap and memmap at usemap_map[i] and map_map[i] during allocation.
Then, in the last for_each_present_section_nr() loop, which clears the
failed memory sections' ->section_mem_map, fetch the usemap and memmap
from the usemap_map[] and map_map[] arrays and set them into
mem_section[] accordingly.
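
A worked example of the new indexing (hypothetical section numbers):

	/* With sections 0, 1, 4 and 5 present:
	 *   pnum (section nr):  0  1  2  3  4  5
	 *   present?            1  1  0  0  1  1
	 *   nr_consumed_maps:   0  1  -  -  2  3
	 * so usemap_map[2]/map_map[2] belong to section 4, not section 2.
	 */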

Signed-off-by: Baoquan He <bhe@redhat.com>

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse-vmemmap.c |  5 +++--
 mm/sparse.c         | 43 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 640e68f8324b..a98ec2fb6915 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
 	void *vmemmap_buf_start;
+	int nr_consumed_maps = 0;
 
 	size = ALIGN(size, PMD_SIZE);
 	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
@@ -297,8 +298,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		if (!present_section_nr(pnum))
 			continue;
 
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[nr_consumed_maps++])
 			continue;
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
diff --git a/mm/sparse.c b/mm/sparse.c
index b2848cc6e32a..2eb8ee72e44d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -386,6 +386,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 	unsigned long pnum;
 	unsigned long **usemap_map = (unsigned long **)data;
 	int size = usemap_size();
+	int nr_consumed_maps = 0;
 
 	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
 							  size * usemap_count);
@@ -397,9 +398,10 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
-		usemap_map[pnum] = usemap;
+		usemap_map[nr_consumed_maps] = usemap;
 		usemap += size;
-		check_usemap_section_nr(nodeid, usemap_map[pnum]);
+		check_usemap_section_nr(nodeid, usemap_map[nr_consumed_maps]);
+		nr_consumed_maps++;
 	}
 }
 
@@ -424,29 +426,33 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	void *map;
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
+	int nr_consumed_maps;
 
 	size = PAGE_ALIGN(size);
 	map = memblock_virt_alloc_try_nid_raw(size * map_count,
 					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
 					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
 	if (map) {
+		nr_consumed_maps = 0;
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))
 				continue;
-			map_map[pnum] = map;
+			map_map[nr_consumed_maps] = map;
 			map += size;
+			nr_consumed_maps++;
 		}
 		return;
 	}
 
 	/* fallback */
+	nr_consumed_maps = 0;
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		struct mem_section *ms;
 
 		if (!present_section_nr(pnum))
 			continue;
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[nr_consumed_maps++])
 			continue;
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
@@ -526,6 +532,7 @@ static void __init alloc_usemap_and_memmap(void (*alloc_func)
 		/* new start, update count etc*/
 		nodeid_begin = nodeid;
 		pnum_begin = pnum;
+		data += map_count * data_unit_size;
 		map_count = 1;
 	}
 	/* ok, last chunk */
@@ -544,6 +551,7 @@ void __init sparse_init(void)
 	unsigned long *usemap;
 	unsigned long **usemap_map;
 	int size;
+	int nr_consumed_maps = 0;
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	int size2;
 	struct page **map_map;
@@ -566,7 +574,7 @@ void __init sparse_init(void)
 	 * powerpc need to call sparse_init_one_section right after each
 	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
 	 */
-	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
+	size = sizeof(unsigned long *) * nr_present_sections;
 	usemap_map = memblock_virt_alloc(size, 0);
 	if (!usemap_map)
 		panic("can not allocate usemap_map\n");
@@ -575,7 +583,7 @@ void __init sparse_init(void)
 				sizeof(usemap_map[0]));
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
+	size2 = sizeof(struct page *) * nr_present_sections;
 	map_map = memblock_virt_alloc(size2, 0);
 	if (!map_map)
 		panic("can not allocate map_map\n");
@@ -584,27 +592,44 @@ void __init sparse_init(void)
 				sizeof(map_map[0]));
 #endif
 
+	/* The numner of present sections stored in nr_present_sections
+	 * are kept the same since mem sections are marked as present in
+	 * memory_present(). In this for loop, we need check which sections
+	 * failed to allocate memmap or usemap, then clear its
+	 * ->section_mem_map accordingly. During this process, we need
+	 * increase 'nr_consumed_maps' whether its allocation of memmap
+	 * or usemap failed or not, so that after we handle the i-th
+	 * memory section, can get memmap and usemap of (i+1)-th section
+	 * correctly. */
 	for_each_present_section_nr(0, pnum) {
 		struct mem_section *ms;
+
+		if (nr_consumed_maps >= nr_present_sections) {
+			pr_err("nr_consumed_maps goes beyond nr_present_sections\n");
+			break;
+		}
 		ms = __nr_to_section(pnum);
-		usemap = usemap_map[pnum];
+		usemap = usemap_map[nr_consumed_maps];
 		if (!usemap) {
 			ms->section_mem_map = 0;
+			nr_consumed_maps++;
 			continue;
 		}
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-		map = map_map[pnum];
+		map = map_map[nr_consumed_maps];
 #else
 		map = sparse_early_mem_map_alloc(pnum);
 #endif
 		if (!map) {
 			ms->section_mem_map = 0;
+			nr_consumed_maps++;
 			continue;
 		}
 
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 								usemap);
+		nr_consumed_maps++;
 	}
 
 	vmemmap_populate_print_last();
-- 
2.13.6



* Re: [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-27  1:31 [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (3 preceding siblings ...)
  2018-06-27  1:31 ` [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
@ 2018-06-27  1:47 ` Baoquan He
  2018-06-27 17:47 ` Pavel Tatashin
  5 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-06-27  1:47 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta; +Cc: linux-mm, kirill.shutemov

On 06/27/18 at 09:31am, Baoquan He wrote:
> This is v4 post. V3 can be found here:
> https://lkml.org/lkml/2018/2/27/928

Sorry, I forgot to update this part.

This is the v5 post. V4 can be found here:
http://lkml.kernel.org/r/20180521101555.25610-1-bhe@redhat.com

> 
> V1 can be found here:
> https://www.spinics.net/lists/linux-mm/msg144486.html
> 
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
> are allocated with the size of NR_MEM_SECTIONS. They are used to store
> each memory section's usemap and mem map if it is marked as present. In
> 5-level paging mode this costs 512M of memory, even though the arrays
> are released at the end of sparse_init(). Systems with little memory,
> like a kdump kernel which usually has only about 256M, will fail to boot
> because of allocation failure if CONFIG_X86_5LEVEL=y.
> 
> In this patchset, optimize the memmap allocation code to only use
> usemap_map and map_map with the size of nr_present_sections. This makes
> the kdump kernel boot up with a normal crashkernel='' setting when
> CONFIG_X86_5LEVEL=y.
> 
> Change log:
> v4->v5:
>   Improve patch 3/4 log according to Dave's suggestion.
> 
>   Correct a wrong copy&paste that mistakenly turned 'nr_consumed_maps'
>   into 'alloc_usemap_and_memmap', which was pointed out by Dave in a
>   patch 4/4 code comment.
> 
>   Otherwise, no code change in this version.
> v3->v4:
>   Improve the code according to Dave's three concerns about patch 0004:
> 
>   Rename the variable 'idx_present' to 'nr_consumed_maps', which is
>   used to index the memmap and usemap of present sections.
> 
>   Add a check for whether 'nr_consumed_maps' goes beyond
>   nr_present_sections.
> 
>   Add a code comment above the final for_each_present_section_nr() to
>   explain why 'nr_consumed_maps' needs to be increased in each
>   iteration whether or not the 'ms->section_mem_map' needs to be
>   cleared.
> 
> v2->v3:
>   Make nr_present_sections __initdata and add a code comment,
>   according to Andrew's suggestion.
> 
>   Rename the local variable 'i' to 'idx_present', which loops over the
>   present sections, and improve the code. These changes were suggested
>   by Dave and Pankaj.
> 
>   Add a new patch 0003 which adds a new parameter 'data_unit_size' to
>   alloc_usemap_and_memmap(), in which we will update 'data' to make it
>   point at the new position. Its type 'void *' alone can't give us the
>   information needed to do that, so the unit size has to be passed in.
>   Change the code in patch 0004 accordingly. This is a bug fix found
>   when testing memory deployed on multiple nodes.
> 
> v1->v2:
>   Split out the addition of nr_present_sections as a single patch for
>   easier reviewing.
> 
>   Rewrite patch log according to Dave's suggestion.
> 
>   Fix a code bug in patch 0002 reported by the test robot.
> 
> Baoquan He (4):
>   mm/sparse: Add a static variable nr_present_sections
>   mm/sparsemem: Defer the ms->section_mem_map clearing
>   mm/sparse: Add a new parameter 'data_unit_size' for
>     alloc_usemap_and_memmap
>   mm/sparse: Optimize memmap allocation during sparse_init()
> 
>  mm/sparse-vmemmap.c |  6 ++---
>  mm/sparse.c         | 72 +++++++++++++++++++++++++++++++++++++++++------------
>  2 files changed, 59 insertions(+), 19 deletions(-)
> 
> -- 
> 2.13.6
> 


* Re: [PATCH v5 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-06-27  1:31 ` [PATCH v5 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
@ 2018-06-27  9:54   ` Oscar Salvador
  2018-06-27 22:59     ` Baoquan He
  0 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2018-06-27  9:54 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, akpm, dave.hansen, pagupta, linux-mm, kirill.shutemov

On Wed, Jun 27, 2018 at 09:31:14AM +0800, Baoquan He wrote:
> In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, the
> system allocates one contiguous memory chunk for the mem maps on one
> node and populates the relevant page tables to map the memory sections
> one by one. If populating a certain mem section fails, a warning is
> printed and its ->section_mem_map is cleared to cancel the marking of
> being present. As a result, the number of mem sections marked as
> present can decrease during sparse_init() execution.
> 
> Here, when populating a section's page tables fails, just defer the
> clearing of its ms->section_mem_map until the last
> for_each_present_section_nr() loop. This is in preparation for later
> optimizing the mem map allocation.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/sparse-vmemmap.c |  1 -
>  mm/sparse.c         | 12 ++++++++----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index bd0276d5f66b..640e68f8324b 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -303,7 +303,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  		ms = __nr_to_section(pnum);
>  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
>  		       __func__);
> -		ms->section_mem_map = 0;

Since we are deferring the clearing of section_mem_map, I guess we do not need

struct mem_section *ms;
ms = __nr_to_section(pnum);

anymore, right?

>  	}
>  
>  	if (vmemmap_buf_start) {
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 6314303130b0..71ad53da2cd1 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -451,7 +451,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  		ms = __nr_to_section(pnum);
>  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
>  		       __func__);
> -		ms->section_mem_map = 0;

The same goes here.



-- 
Oscar Salvador
SUSE L3


* Re: [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-27  1:31 [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (4 preceding siblings ...)
  2018-06-27  1:47 ` [PATCH v5 0/4] " Baoquan He
@ 2018-06-27 17:47 ` Pavel Tatashin
  2018-06-27 23:39   ` Baoquan He
  5 siblings, 1 reply; 22+ messages in thread
From: Pavel Tatashin @ 2018-06-27 17:47 UTC (permalink / raw)
  To: bhe
  Cc: LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

This work made me wonder why we even have
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER. This really should be the
default behavior for all systems, yet it is enabled only on x86_64.
We could clean up an already messy sparse.c if we removed this config
and enabled its path for all arches. We would not break anything,
because if we cannot allocate one large map_map we still fall back to
allocating a page at a time, the same as what happens when
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=n.

Pavel


* Re: [PATCH v5 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-06-27  9:54   ` Oscar Salvador
@ 2018-06-27 22:59     ` Baoquan He
  2018-06-28  3:11       ` Pavel Tatashin
  0 siblings, 1 reply; 22+ messages in thread
From: Baoquan He @ 2018-06-27 22:59 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, akpm, dave.hansen, pagupta, linux-mm, kirill.shutemov

On 06/27/18 at 11:54am, Oscar Salvador wrote:
> On Wed, Jun 27, 2018 at 09:31:14AM +0800, Baoquan He wrote:
> > In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, the
> > system allocates one contiguous memory chunk for the mem maps on one
> > node and populates the relevant page tables to map the memory sections
> > one by one. If populating a certain mem section fails, a warning is
> > printed and its ->section_mem_map is cleared to cancel the marking of
> > being present. As a result, the number of mem sections marked as
> > present can decrease during sparse_init() execution.
> > 
> > Here, when populating a section's page tables fails, just defer the
> > clearing of its ms->section_mem_map until the last
> > for_each_present_section_nr() loop. This is in preparation for later
> > optimizing the mem map allocation.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/sparse-vmemmap.c |  1 -
> >  mm/sparse.c         | 12 ++++++++----
> >  2 files changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index bd0276d5f66b..640e68f8324b 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -303,7 +303,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  		ms = __nr_to_section(pnum);
> >  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
> >  		       __func__);
> > -		ms->section_mem_map = 0;
> 
> Since we are deferring the clearing of section_mem_map, I guess we do not need
> 
> struct mem_section *ms;
> ms = __nr_to_section(pnum);
> 
> anymore, right?

Right, good catch, thanks.

I will post a new round to fix this.

> 
> >  	}
> >  
> >  	if (vmemmap_buf_start) {
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 6314303130b0..71ad53da2cd1 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -451,7 +451,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  		ms = __nr_to_section(pnum);
> >  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
> >  		       __func__);
> > -		ms->section_mem_map = 0;
> 
> The same goes here.
> 
> 
> 
> -- 
> Oscar Salvador
> SUSE L3


* Re: [PATCH v5 0/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-27 17:47 ` Pavel Tatashin
@ 2018-06-27 23:39   ` Baoquan He
  0 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-06-27 23:39 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

Hi Pavel,

On 06/27/18 at 01:47pm, Pavel Tatashin wrote:
> This work made me wonder why we even have
> CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER. This really should be the
> default behavior for all systems, yet it is enabled only on x86_64.
> We could clean up an already messy sparse.c if we removed this config
> and enabled its path for all arches. We would not break anything,
> because if we cannot allocate one large map_map we still fall back to
> allocating a page at a time, the same as what happens when
> CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=n.

Thanks for your idea.

It seems the common arches all have ARCH_SPARSEMEM_ENABLE, such as
x86, arm/arm64, power, s390 and mips; the others don't. For them,
removing CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER makes sense.

I will make a cleanup patch to do this, but I can only test it on x86.
If the test robot or others report issues with the cleanup patch,
Andrew can pick up only the current 4 patches after updating, and then
we can continue discussing the cleanup patch. From the current code, it
should be OK on all arches.

Thanks
Baoquan


* Re: [PATCH v5 1/4] mm/sparse: Add a static variable nr_present_sections
  2018-06-27  1:31 ` [PATCH v5 1/4] mm/sparse: Add a static variable nr_present_sections Baoquan He
@ 2018-06-28  3:10   ` Pavel Tatashin
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Tatashin @ 2018-06-28  3:10 UTC (permalink / raw)
  To: bhe
  Cc: LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
On Tue, Jun 26, 2018 at 9:31 PM Baoquan He <bhe@redhat.com> wrote:
>
> It is used to record how many memory sections are marked as present
> during system boot-up, and will be used in a later patch.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/sparse.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f13f2723950a..6314303130b0 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -200,6 +200,12 @@ static inline int next_present_section_nr(int section_nr)
>               (section_nr <= __highest_present_section_nr));    \
>              section_nr = next_present_section_nr(section_nr))
>
> +/*
> + * Record how many memory sections are marked as present
> + * during system bootup.
> + */
> +static int __initdata nr_present_sections;
> +
>  /* Record a memory area against a node. */
>  void __init memory_present(int nid, unsigned long start, unsigned long end)
>  {
> @@ -229,6 +235,7 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
>                         ms->section_mem_map = sparse_encode_early_nid(nid) |
>                                                         SECTION_IS_ONLINE;
>                         section_mark_present(ms);
> +                       nr_present_sections++;
>                 }
>         }
>  }
> --
> 2.13.6
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-06-27 22:59     ` Baoquan He
@ 2018-06-28  3:11       ` Pavel Tatashin
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Tatashin @ 2018-06-28  3:11 UTC (permalink / raw)
  To: bhe
  Cc: osalvador, LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

Once you remove the ms mentioned by Oscar:
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
On Wed, Jun 27, 2018 at 6:59 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 06/27/18 at 11:54am, Oscar Salvador wrote:
> > On Wed, Jun 27, 2018 at 09:31:14AM +0800, Baoquan He wrote:
> > > In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, the
> > > system allocates one contiguous memory chunk for the mem maps on one
> > > node and populates the relevant page tables to map the memory sections
> > > one by one. If populating a certain mem section fails, a warning is
> > > printed and its ->section_mem_map is cleared to cancel the marking of
> > > being present. As a result, the number of mem sections marked as
> > > present can decrease during sparse_init() execution.
> > >
> > > Here, when populating a section's page tables fails, just defer the
> > > clearing of its ms->section_mem_map until the last
> > > for_each_present_section_nr() loop. This is in preparation for later
> > > optimizing the mem map allocation.
> > >
> > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > ---
> > >  mm/sparse-vmemmap.c |  1 -
> > >  mm/sparse.c         | 12 ++++++++----
> > >  2 files changed, 8 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > > index bd0276d5f66b..640e68f8324b 100644
> > > --- a/mm/sparse-vmemmap.c
> > > +++ b/mm/sparse-vmemmap.c
> > > @@ -303,7 +303,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> > >             ms = __nr_to_section(pnum);
> > >             pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
> > >                    __func__);
> > > -           ms->section_mem_map = 0;
> >
> > Since we are deferring the clearing of section_mem_map, I guess we do not need
> >
> > struct mem_section *ms;
> > ms = __nr_to_section(pnum);
> >
> > anymore, right?
>
> Right, good catch, thanks.
>
> I will post a new round to fix this.
>
> >
> > >     }
> > >
> > >     if (vmemmap_buf_start) {
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index 6314303130b0..71ad53da2cd1 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -451,7 +451,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> > >             ms = __nr_to_section(pnum);
> > >             pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
> > >                    __func__);
> > > -           ms->section_mem_map = 0;
> >
> > The same goes here.
> >
> >
> >
> > --
> > Oscar Salvador
> > SUSE L3
>


* Re: [PATCH v5 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap
  2018-06-27  1:31 ` [PATCH v5 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
@ 2018-06-28  3:14   ` Pavel Tatashin
  2018-06-28  6:57     ` Baoquan He
  0 siblings, 1 reply; 22+ messages in thread
From: Pavel Tatashin @ 2018-06-28  3:14 UTC (permalink / raw)
  To: bhe
  Cc: LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

Honestly, I do not like this new argument, but it will do for now. I
could not think of a better way without rewriting everything.

Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

However, I will submit a series of patches to clean up sparse.c and
completely remove the large and confusing temporary buffers map_map and
usemap_map. In those patches, I will remove alloc_usemap_and_memmap().
On Tue, Jun 26, 2018 at 9:31 PM Baoquan He <bhe@redhat.com> wrote:
>
> alloc_usemap_and_memmap() is passed a "void *" that points to
> usemap_map or map_map. In the next patch we will change both map
> allocations from using 'NR_MEM_SECTIONS' as the length to using
> 'nr_present_sections'. After that, the passed-in 'void *' needs to be
> updated as things get consumed, but the function knows only the
> quantity of objects consumed and not their type. The new
> 'data_unit_size' parameter effectively tells it enough about the type
> to let it update the pointer as objects are consumed.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/sparse.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 71ad53da2cd1..b2848cc6e32a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -489,10 +489,12 @@ void __weak __meminit vmemmap_populate_print_last(void)
>  /**
>   *  alloc_usemap_and_memmap - memory alloction for pageblock flags and vmemmap
>   *  @map: usemap_map for pageblock flags or mmap_map for vmemmap
> + *  @unit_size: size of map unit
>   */
>  static void __init alloc_usemap_and_memmap(void (*alloc_func)
>                                         (void *, unsigned long, unsigned long,
> -                                       unsigned long, int), void *data)
> +                                       unsigned long, int), void *data,
> +                                       int data_unit_size)
>  {
>         unsigned long pnum;
>         unsigned long map_count;
> @@ -569,7 +571,8 @@ void __init sparse_init(void)
>         if (!usemap_map)
>                 panic("can not allocate usemap_map\n");
>         alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
> -                                                       (void *)usemap_map);
> +                               (void *)usemap_map,
> +                               sizeof(usemap_map[0]));
>
>  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
>         size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
> @@ -577,7 +580,8 @@ void __init sparse_init(void)
>         if (!map_map)
>                 panic("can not allocate map_map\n");
>         alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
> -                                                       (void *)map_map);
> +                               (void *)map_map,
> +                               sizeof(map_map[0]));
>  #endif
>
>         for_each_present_section_nr(0, pnum) {
> --
> 2.13.6
>


* Re: [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-27  1:31 ` [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
@ 2018-06-28  3:19   ` Pavel Tatashin
  2018-06-28  6:39     ` Baoquan He
  2018-06-29 17:16   ` Dave Hansen
  1 sibling, 1 reply; 22+ messages in thread
From: Pavel Tatashin @ 2018-06-28  3:19 UTC (permalink / raw)
  To: bhe
  Cc: LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

> Signed-off-by: Baoquan He <bhe@redhat.com>
>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Please remove the duplicated Signed-off-by.

>                 if (!usemap) {
>                         ms->section_mem_map = 0;
> +                       nr_consumed_maps++;

Currently, we do not set ms->section_mem_map to 0 when we fail to
allocate a usemap; we only set section_mem_map to 0 when we fail to
allocate a memmap. I think this is an existing bug.

Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>


* Re: [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-28  3:19   ` Pavel Tatashin
@ 2018-06-28  6:39     ` Baoquan He
  0 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-06-28  6:39 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

On 06/27/18 at 11:19pm, Pavel Tatashin wrote:
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> 
> Please remove duplicated signed-off

Done.
> 
> >                 if (!usemap) {
> >                         ms->section_mem_map = 0;
> > +                       nr_consumed_maps++;
> 
> > Currently, we do not set ms->section_mem_map to 0 when we fail to
> > allocate a usemap; we only set section_mem_map to 0 when we fail to
> > allocate a memmap. I think this is an existing bug.

Yes, I found it when changing the code. Later in sparse_init(), I added
a check of whether the usemap is available; if not, we also do
"ms->section_mem_map = 0;" to clear its ->section_mem_map.

To be thorough, we may also need to free the relevant memmap here,
because the usemaps are allocated together while memmaps can be
allocated section by section. I didn't do that because the usemap
allocation is the smaller one; if even that allocation fails at this
early system initialization stage, the kernel won't live long, so it's
not worth complicating the code for it.

> 
> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>


* Re: [PATCH v5 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap
  2018-06-28  3:14   ` Pavel Tatashin
@ 2018-06-28  6:57     ` Baoquan He
  0 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2018-06-28  6:57 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

On 06/27/18 at 11:14pm, Pavel Tatashin wrote:
> Honestly, I do not like this new argument, but it will do for now. I
> could not think of a better way without rewriting everything.
> 
> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> 
> However, I will submit a series of patches to clean up sparse.c and
> completely remove the large and confusing temporary buffers map_map and
> usemap_map. In those patches, I will remove alloc_usemap_and_memmap().

Great, I look forward to seeing them; I can help review.

> On Tue, Jun 26, 2018 at 9:31 PM Baoquan He <bhe@redhat.com> wrote:
> >
> > alloc_usemap_and_memmap() is passed a "void *" that points to
> > usemap_map or map_map. In the next patch we will change both map
> > allocations from using 'NR_MEM_SECTIONS' as the length to using
> > 'nr_present_sections'. After that, the passed-in 'void *' needs to be
> > updated as things get consumed, but the function knows only the
> > quantity of objects consumed and not their type. The new
> > 'data_unit_size' parameter effectively tells it enough about the type
> > to let it update the pointer as objects are consumed.
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/sparse.c | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 71ad53da2cd1..b2848cc6e32a 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -489,10 +489,12 @@ void __weak __meminit vmemmap_populate_print_last(void)
> >  /**
> >   *  alloc_usemap_and_memmap - memory alloction for pageblock flags and vmemmap
> >   *  @map: usemap_map for pageblock flags or mmap_map for vmemmap
> > + *  @unit_size: size of map unit
> >   */
> >  static void __init alloc_usemap_and_memmap(void (*alloc_func)
> >                                         (void *, unsigned long, unsigned long,
> > -                                       unsigned long, int), void *data)
> > +                                       unsigned long, int), void *data,
> > +                                       int data_unit_size)
> >  {
> >         unsigned long pnum;
> >         unsigned long map_count;
> > @@ -569,7 +571,8 @@ void __init sparse_init(void)
> >         if (!usemap_map)
> >                 panic("can not allocate usemap_map\n");
> >         alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
> > -                                                       (void *)usemap_map);
> > +                               (void *)usemap_map,
> > +                               sizeof(usemap_map[0]));
> >
> >  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> >         size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
> > @@ -577,7 +580,8 @@ void __init sparse_init(void)
> >         if (!map_map)
> >                 panic("can not allocate map_map\n");
> >         alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
> > -                                                       (void *)map_map);
> > +                               (void *)map_map,
> > +                               sizeof(map_map[0]));
> >  #endif
> >
> >         for_each_present_section_nr(0, pnum) {
> > --
> > 2.13.6
> >


* Re: [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-27  1:31 ` [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  2018-06-28  3:19   ` Pavel Tatashin
@ 2018-06-29 17:16   ` Dave Hansen
  2018-06-29 17:48     ` Pavel Tatashin
  1 sibling, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2018-06-29 17:16 UTC (permalink / raw)
  To: Baoquan He, linux-kernel, akpm, pagupta; +Cc: linux-mm, kirill.shutemov

> +	/* The numner of present sections stored in nr_present_sections

		^ "number", please.

This comment needs CodingStyle love.

> +	 * are kept the same since mem sections are marked as present in
	
	   ^ s/are/is/

This sentence is really odd to me.  What is the point?  Just that we are
not making sections present?  Could we just say that instead of
referring to functions and variable names?

> +	 * memory_present(). In this for loop, we need check which sections
> +	 * failed to allocate memmap or usemap, then clear its
> +	 * ->section_mem_map accordingly.

Rather than referring to the for loop, how about we actually comment the
code that is doing this operation?

>  					   During this process, we need

This is missing a "to": "we need _to_ increase".

> +	 * increase 'nr_consumed_maps' whether its allocation of memmap
> +	 * or usemap failed or not, so that after we handle the i-th
> +	 * memory section, can get memmap and usemap of (i+1)-th section
> +	 * correctly. */

This makes no sense to me.  Why are we incrementing 'nr_consumed_maps'
when we do not consume one?

You say that we increment it so that things will work, but not how or
why it makes things work.  I'm confused.


* Re: [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-29 17:16   ` Dave Hansen
@ 2018-06-29 17:48     ` Pavel Tatashin
  2018-06-29 17:52       ` Dave Hansen
  0 siblings, 1 reply; 22+ messages in thread
From: Pavel Tatashin @ 2018-06-29 17:48 UTC (permalink / raw)
  To: dave.hansen
  Cc: bhe, LKML, Andrew Morton, pagupta, Linux Memory Management List,
	kirill.shutemov

> > +      * increase 'nr_consumed_maps' whether its allocation of memmap
> > +      * or usemap failed or not, so that after we handle the i-th
> > +      * memory section, can get memmap and usemap of (i+1)-th section
> > +      * correctly. */
>
> This makes no sense to me.  Why are we incrementing 'nr_consumed_maps'
> when we do not consume one?
>
> You say that we increment it so that things will work, but not how or
> why it makes things work.  I'm confused.

Hi Dave,

nr_consumed_maps is a local counter. map_map contains the struct pages
for each section. In order to assign them to the correct sections, this
local counter must be incremented even when some parts of map_map are
empty.

Here is an example:
Node1:
map_map[0] -> Struct pages ...
map_map[1] -> NULL
Node2:
map_map[2] -> Struct pages ...

We always want to configure a section from Node2 with struct pages from
Node2, even if there are holes in-between. The same goes for the usemap.
Pavel


* Re: [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-29 17:48     ` Pavel Tatashin
@ 2018-06-29 17:52       ` Dave Hansen
  2018-06-29 18:01         ` Pavel Tatashin
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2018-06-29 17:52 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: bhe, LKML, Andrew Morton, pagupta, Linux Memory Management List,
	kirill.shutemov

On 06/29/2018 10:48 AM, Pavel Tatashin wrote:
> Here is an example:
> Node1:
> map_map[0] -> Struct pages ...
> map_map[1] -> NULL
> Node2:
> map_map[2] -> Struct pages ...
> 
> We always want to configure a section from Node2 with struct pages from
> Node2, even if there are holes in-between. The same goes for the usemap.

Right...  But your example consumes two mem_map[]s.

But, from scanning the code, we increment nr_consumed_maps three times.
Correct?


* Re: [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-29 17:52       ` Dave Hansen
@ 2018-06-29 18:01         ` Pavel Tatashin
  2018-06-29 18:56           ` Dave Hansen
  0 siblings, 1 reply; 22+ messages in thread
From: Pavel Tatashin @ 2018-06-29 18:01 UTC (permalink / raw)
  To: Dave Hansen
  Cc: bhe, LKML, Andrew Morton, pagupta, Linux Memory Management List,
	kirill.shutemov

On 06/29/2018 01:52 PM, Dave Hansen wrote:
> On 06/29/2018 10:48 AM, Pavel Tatashin wrote:
>> Here is an example:
>> Node1:
>> map_map[0] -> Struct pages ...
>> map_map[1] -> NULL
>> Node2:
>> map_map[2] -> Struct pages ...
>>
>> We always want to configure a section from Node2 with struct pages from
>> Node2, even if there are holes in-between. The same goes for the usemap.
> 
> Right...  But your example consumes two mem_map[]s.
> 
> But, from scanning the code, we increment nr_consumed_maps three times.
> Correct?

Correct: it should be incremented on every iteration of the loop, no
matter whether the entries contain valid data or NULLs. So we increment
in three places:

if map_map[] has an invalid entry: increment, continue
if usemap_map[] has an invalid entry: increment, continue
at the end of the loop, when everything was valid: increment

This is done so that nr_consumed_maps does not get out of sync with the
current pnum. pnum is not equal to nr_consumed_maps, as there may be
holes in the pnums, but there is a one-to-one correlation.
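
In code form, the three increment sites condense to this (a simplified
sketch of the final loop in patch 4/4, declarations omitted):

	for_each_present_section_nr(0, pnum) {
		ms = __nr_to_section(pnum);
		usemap = usemap_map[nr_consumed_maps];
		if (!usemap) {
			ms->section_mem_map = 0;
			nr_consumed_maps++;	/* count the slot anyway */
			continue;
		}
		map = map_map[nr_consumed_maps];
		if (!map) {
			ms->section_mem_map = 0;
			nr_consumed_maps++;	/* count the slot anyway */
			continue;
		}
		sparse_init_one_section(ms, pnum, map, usemap);
		nr_consumed_maps++;		/* success: slot consumed */
	}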

Pavel


* Re: [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-29 18:01         ` Pavel Tatashin
@ 2018-06-29 18:56           ` Dave Hansen
  2018-06-29 18:59             ` Pavel Tatashin
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2018-06-29 18:56 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: bhe, LKML, Andrew Morton, pagupta, Linux Memory Management List,
	kirill.shutemov

On 06/29/2018 11:01 AM, Pavel Tatashin wrote:
> Correct: it should be incremented on every iteration of the loop, no
> matter whether the entries contain valid data or NULLs. So we
> increment in three places:
> 
> if map_map[] has an invalid entry: increment, continue
> if usemap_map[] has an invalid entry: increment, continue
> at the end of the loop, when everything was valid: increment
> 
> This is done so that nr_consumed_maps does not get out of sync with
> the current pnum. pnum is not equal to nr_consumed_maps, as there may
> be holes in the pnums, but there is a one-to-one correlation.
Can this be made clearer in the code?


* Re: [PATCH v5 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-29 18:56           ` Dave Hansen
@ 2018-06-29 18:59             ` Pavel Tatashin
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Tatashin @ 2018-06-29 18:59 UTC (permalink / raw)
  To: dave.hansen
  Cc: bhe, LKML, Andrew Morton, pagupta, Linux Memory Management List,
	kirill.shutemov

> > This is done so that nr_consumed_maps does not get out of sync with
> > the current pnum. pnum is not equal to nr_consumed_maps, as there
> > may be holes in the pnums, but there is a one-to-one correlation.
> Can this be made clearer in the code?

Absolutely. I've done it here:
http://lkml.kernel.org/r/20180628173010.23849-1-pasha.tatashin@oracle.com

Pavel

