LKML Archive on lore.kernel.org
 help / Atom feed
* [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init()
@ 2018-06-28  6:28 Baoquan He
  2018-06-28  6:28 ` [PATCH v6 1/5] mm/sparse: Add a static variable nr_present_sections Baoquan He
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Baoquan He @ 2018-06-28  6:28 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin, Oscar Salvador
  Cc: linux-mm, kirill.shutemov, Baoquan He

This is v6 post.

In sparse_init(), two temporary pointer arrays, usemap_map and map_map
are allocated with the size of NR_MEM_SECTIONS. They are used to store
each memory section's usemap and mem map if marked as present. In
5-level paging mode, this will cost 512M memory though they will be
released at the end of sparse_init(). System with few memory, like
kdump kernel which usually only has about 256M, will fail to boot
because of allocation failure if CONFIG_X86_5LEVEL=y.

In this patchset, optimize the memmap allocation code to only use
usemap_map and map_map with the size of nr_present_sections. This
makes kdump kernel boot up with normal crashkernel='' setting when
CONFIG_X86_5LEVEL=y.

The old version can be found below:

v5:
http://lkml.kernel.org/r/20180627013116.12411-1-bhe@redhat.com
v4:
http://lkml.kernel.org/r/20180521101555.25610-1-bhe@redhat.com

v3:
https://lkml.org/lkml/2018/2/27/928

V1 can be found here:
https://www.spinics.net/lists/linux-mm/msg144486.html

Change log:
v5->v6:
  Oscar found the redundant "struct mem_section *ms" definition and
  in the old patch 2/4, after deferring the clearing of section_mem_map.
  Clean them up in this version.

  Pavel pointed out that allocating memmap together for one node at
  one time should be a default behaviour for all ARCH-es. And if failed
  on large memory, it will drop to the fallback to allocate memmap
  for one section at one time, it shoult not break anything. Add
  patch 5/5 to remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER and clean
  up the related codes.
v4->v5:
  Improve patch 3/4 log according to Dave's suggestion.

  Correct the wrong copy&paste of making 'nr_consumed_maps' to
  'alloc_usemap_and_memmap' mistakenly which is pointed out by
  Dave in patch 4/4 code comment.

  Otherwise, no code change in this version.
v3->v4:
  Improve according to Dave's three concerns which are in patch 0004:

  Rename variable 'idx_present' to 'nr_consumed_maps' which used to
  index the memmap and usemap of present sections.

  Add a check if 'nr_consumed_maps' goes beyond nr_present_sections.

  Add code comment above the final for_each_present_section_nr() to
  tell why 'nr_consumed_maps' need be increased in each iteration
  whether the 'ms->section_mem_map' need cleared or out.

v2->v3:
  Change nr_present_sections as __initdata and add code comment
  according to Andrew's suggestion.

  Change the local variable 'i' as idx_present which loops over the
  present sections, and improve the code. These are suggested by
  Dave and Pankaj.

  Add a new patch 0003 which adds a new parameter 'data_unit_size'
  to function alloc_usemap_and_memmap() in which we will update 'data'
  to make it point at new position. However its type 'void *' can't give
  us needed info to do that. Need pass the unit size in. So change code
  in patch 0004 accordingly. This is a code bug fix found when tested
  the memory deployed on multiple nodes.

v1-v2:
  Split out the nr_present_sections adding as a single patch for easier
  reviewing.

  Rewrite patch log according to Dave's suggestion.

  Fix code bug in patch 0002 reported by test robot.

Baoquan He (5):
  mm/sparse: Add a static variable nr_present_sections
  mm/sparsemem: Defer the ms->section_mem_map clearing
  mm/sparse: Add a new parameter 'data_unit_size' for
    alloc_usemap_and_memmap
  mm/sparse: Optimize memmap allocation during sparse_init()
  mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER

 mm/Kconfig          |   4 --
 mm/sparse-vmemmap.c |   9 ++---
 mm/sparse.c         | 109 ++++++++++++++++++++++++++++------------------------
 3 files changed, 62 insertions(+), 60 deletions(-)

-- 
2.13.6


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 1/5] mm/sparse: Add a static variable nr_present_sections
  2018-06-28  6:28 [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
@ 2018-06-28  6:28 ` Baoquan He
  2018-06-28 11:24   ` Oscar Salvador
  2018-06-28  6:28 ` [PATCH v6 2/5] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Baoquan He @ 2018-06-28  6:28 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin, Oscar Salvador
  Cc: linux-mm, kirill.shutemov, Baoquan He

It's used to record how many memory sections are marked as present
during system boot up, and will be used in the later patch.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/sparse.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index f13f2723950a..6314303130b0 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -200,6 +200,12 @@ static inline int next_present_section_nr(int section_nr)
 	      (section_nr <= __highest_present_section_nr));	\
 	     section_nr = next_present_section_nr(section_nr))
 
+/*
+ * Record how many memory sections are marked as present
+ * during system bootup.
+ */
+static int __initdata nr_present_sections;
+
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -229,6 +235,7 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
 			ms->section_mem_map = sparse_encode_early_nid(nid) |
 							SECTION_IS_ONLINE;
 			section_mark_present(ms);
+			nr_present_sections++;
 		}
 	}
 }
-- 
2.13.6


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 2/5] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-06-28  6:28 [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  2018-06-28  6:28 ` [PATCH v6 1/5] mm/sparse: Add a static variable nr_present_sections Baoquan He
@ 2018-06-28  6:28 ` Baoquan He
  2018-06-28 11:19   ` Oscar Salvador
  2018-06-28  6:28 ` [PATCH v6 3/5] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Baoquan He @ 2018-06-28  6:28 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin, Oscar Salvador
  Cc: linux-mm, kirill.shutemov, Baoquan He

In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, system
will allocate one continuous memory chunk for mem maps on one node and
populate the relevant page tables to map memory section one by one. If
fail to populate for a certain mem section, print warning and its
->section_mem_map will be cleared to cancel the marking of being present.
Like this, the number of mem sections marked as present could become
less during sparse_init() execution.

Here just defer the ms->section_mem_map clearing if failed to populate
its page tables until the last for_each_present_section_nr() loop. This
is in preparation for later optimizing the mem map allocation.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/sparse-vmemmap.c |  4 ----
 mm/sparse.c         | 15 ++++++++-------
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index bd0276d5f66b..68bb65b2d34d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -292,18 +292,14 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	}
 
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-		struct mem_section *ms;
-
 		if (!present_section_nr(pnum))
 			continue;
 
 		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
 		if (map_map[pnum])
 			continue;
-		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
-		ms->section_mem_map = 0;
 	}
 
 	if (vmemmap_buf_start) {
diff --git a/mm/sparse.c b/mm/sparse.c
index 6314303130b0..6a706093307d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -441,17 +441,13 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 
 	/* fallback */
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-		struct mem_section *ms;
-
 		if (!present_section_nr(pnum))
 			continue;
 		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
 		if (map_map[pnum])
 			continue;
-		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
-		ms->section_mem_map = 0;
 	}
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
@@ -479,7 +475,6 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
 
 	pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 	       __func__);
-	ms->section_mem_map = 0;
 	return NULL;
 }
 #endif
@@ -583,17 +578,23 @@ void __init sparse_init(void)
 #endif
 
 	for_each_present_section_nr(0, pnum) {
+		struct mem_section *ms;
+		ms = __nr_to_section(pnum);
 		usemap = usemap_map[pnum];
-		if (!usemap)
+		if (!usemap) {
+			ms->section_mem_map = 0;
 			continue;
+		}
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 		map = map_map[pnum];
 #else
 		map = sparse_early_mem_map_alloc(pnum);
 #endif
-		if (!map)
+		if (!map) {
+			ms->section_mem_map = 0;
 			continue;
+		}
 
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 								usemap);
-- 
2.13.6


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 3/5] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap
  2018-06-28  6:28 [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  2018-06-28  6:28 ` [PATCH v6 1/5] mm/sparse: Add a static variable nr_present_sections Baoquan He
  2018-06-28  6:28 ` [PATCH v6 2/5] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
@ 2018-06-28  6:28 ` Baoquan He
  2018-06-28 13:08   ` Oscar Salvador
  2018-06-28  6:28 ` [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Baoquan He @ 2018-06-28  6:28 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin, Oscar Salvador
  Cc: linux-mm, kirill.shutemov, Baoquan He

alloc_usemap_and_memmap() is passing in a "void *" that points to
usemap_map or memmap_map. In next patch we will change both of the
map allocation from taking 'NR_MEM_SECTIONS' as the length to taking
'nr_present_sections' as the length. After that, the passed in 'void*'
needs to update as things get consumed. But, it knows only the
quantity of objects consumed and not the type.  This effectively
tells it enough about the type to let it update the pointer as
objects are consumed.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/sparse.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 6a706093307d..4458a23e5293 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -486,10 +486,12 @@ void __weak __meminit vmemmap_populate_print_last(void)
 /**
  *  alloc_usemap_and_memmap - memory alloction for pageblock flags and vmemmap
  *  @map: usemap_map for pageblock flags or mmap_map for vmemmap
+ *  @unit_size: size of map unit
  */
 static void __init alloc_usemap_and_memmap(void (*alloc_func)
 					(void *, unsigned long, unsigned long,
-					unsigned long, int), void *data)
+					unsigned long, int), void *data,
+					int data_unit_size)
 {
 	unsigned long pnum;
 	unsigned long map_count;
@@ -566,7 +568,8 @@ void __init sparse_init(void)
 	if (!usemap_map)
 		panic("can not allocate usemap_map\n");
 	alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
-							(void *)usemap_map);
+				(void *)usemap_map,
+				sizeof(usemap_map[0]));
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
@@ -574,7 +577,8 @@ void __init sparse_init(void)
 	if (!map_map)
 		panic("can not allocate map_map\n");
 	alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
-							(void *)map_map);
+				(void *)map_map,
+				sizeof(map_map[0]));
 #endif
 
 	for_each_present_section_nr(0, pnum) {
-- 
2.13.6


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-28  6:28 [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (2 preceding siblings ...)
  2018-06-28  6:28 ` [PATCH v6 3/5] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
@ 2018-06-28  6:28 ` Baoquan He
  2018-06-28 12:09   ` Oscar Salvador
  2018-06-28  6:28 ` [PATCH v6 5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER Baoquan He
  2018-07-08  2:09 ` [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  5 siblings, 1 reply; 17+ messages in thread
From: Baoquan He @ 2018-06-28  6:28 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin, Oscar Salvador
  Cc: linux-mm, kirill.shutemov, Baoquan He

In sparse_init(), two temporary pointer arrays, usemap_map and map_map
are allocated with the size of NR_MEM_SECTIONS. They are used to store
each memory section's usemap and mem map if marked as present. With
the help of these two arrays, continuous memory chunk is allocated for
usemap and memmap for memory sections on one node. This avoids too many
memory fragmentations. Like below diagram, '1' indicates the present
memory section, '0' means absent one. The number 'n' could be much
smaller than NR_MEM_SECTIONS on most of systems.

|1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
-------------------------------------------------------
 0 1 2 3         4 5         i   i+1     n-1   n

If fail to populate the page tables to map one section's memmap, its
->section_mem_map will be cleared finally to indicate that it's not present.
After use, these two arrays will be released at the end of sparse_init().

In 4-level paging mode, each array costs 4M which can be ignorable. While
in 5-level paging, they costs 256M each, 512M altogether. Kdump kernel
Usually only reserves very few memory, e.g 256M. So, even thouth they are
temporarily allocated, still not acceptable.

In fact, there's no need to allocate them with the size of NR_MEM_SECTIONS.
Since the ->section_mem_map clearing has been deferred to the last, the
number of present memory sections are kept the same during sparse_init()
until we finally clear out the memory section's ->section_mem_map if its
usemap or memmap is not correctly handled. Thus in the middle whenever
for_each_present_section_nr() loop is taken, the i-th present memory
section is always the same one.

Here only allocate usemap_map and map_map with the size of
'nr_present_sections'. For the i-th present memory section, install its
usemap and memmap to usemap_map[i] and mam_map[i] during allocation. Then
in the last for_each_present_section_nr() loop which clears the failed
memory section's ->section_mem_map, fetch usemap and memmap from
usemap_map[] and map_map[] array and set them into mem_section[]
accordingly.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/sparse-vmemmap.c |  5 +++--
 mm/sparse.c         | 43 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 68bb65b2d34d..e1a54ba411ec 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
 	void *vmemmap_buf_start;
+	int nr_consumed_maps = 0;
 
 	size = ALIGN(size, PMD_SIZE);
 	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
@@ -295,8 +296,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		if (!present_section_nr(pnum))
 			continue;
 
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[nr_consumed_maps++])
 			continue;
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
diff --git a/mm/sparse.c b/mm/sparse.c
index 4458a23e5293..e1767d9fe4f3 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -386,6 +386,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 	unsigned long pnum;
 	unsigned long **usemap_map = (unsigned long **)data;
 	int size = usemap_size();
+	int nr_consumed_maps = 0;
 
 	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
 							  size * usemap_count);
@@ -397,9 +398,10 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
-		usemap_map[pnum] = usemap;
+		usemap_map[nr_consumed_maps] = usemap;
 		usemap += size;
-		check_usemap_section_nr(nodeid, usemap_map[pnum]);
+		check_usemap_section_nr(nodeid, usemap_map[nr_consumed_maps]);
+		nr_consumed_maps++;
 	}
 }
 
@@ -424,27 +426,31 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	void *map;
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
+	int nr_consumed_maps;
 
 	size = PAGE_ALIGN(size);
 	map = memblock_virt_alloc_try_nid_raw(size * map_count,
 					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
 					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
 	if (map) {
+		nr_consumed_maps = 0;
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))
 				continue;
-			map_map[pnum] = map;
+			map_map[nr_consumed_maps] = map;
 			map += size;
+			nr_consumed_maps++;
 		}
 		return;
 	}
 
 	/* fallback */
+	nr_consumed_maps = 0;
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[nr_consumed_maps++])
 			continue;
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
@@ -523,6 +529,7 @@ static void __init alloc_usemap_and_memmap(void (*alloc_func)
 		/* new start, update count etc*/
 		nodeid_begin = nodeid;
 		pnum_begin = pnum;
+		data += map_count * data_unit_size;
 		map_count = 1;
 	}
 	/* ok, last chunk */
@@ -541,6 +548,7 @@ void __init sparse_init(void)
 	unsigned long *usemap;
 	unsigned long **usemap_map;
 	int size;
+	int nr_consumed_maps = 0;
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	int size2;
 	struct page **map_map;
@@ -563,7 +571,7 @@ void __init sparse_init(void)
 	 * powerpc need to call sparse_init_one_section right after each
 	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
 	 */
-	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
+	size = sizeof(unsigned long *) * nr_present_sections;
 	usemap_map = memblock_virt_alloc(size, 0);
 	if (!usemap_map)
 		panic("can not allocate usemap_map\n");
@@ -572,7 +580,7 @@ void __init sparse_init(void)
 				sizeof(usemap_map[0]));
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
+	size2 = sizeof(struct page *) * nr_present_sections;
 	map_map = memblock_virt_alloc(size2, 0);
 	if (!map_map)
 		panic("can not allocate map_map\n");
@@ -581,27 +589,44 @@ void __init sparse_init(void)
 				sizeof(map_map[0]));
 #endif
 
+	/* The numner of present sections stored in nr_present_sections
+	 * are kept the same since mem sections are marked as present in
+	 * memory_present(). In this for loop, we need check which sections
+	 * failed to allocate memmap or usemap, then clear its
+	 * ->section_mem_map accordingly. During this process, we need
+	 * increase 'nr_consumed_maps' whether its allocation of memmap
+	 * or usemap failed or not, so that after we handle the i-th
+	 * memory section, can get memmap and usemap of (i+1)-th section
+	 * correctly. */
 	for_each_present_section_nr(0, pnum) {
 		struct mem_section *ms;
+
+		if (nr_consumed_maps >= nr_present_sections) {
+			pr_err("nr_consumed_maps goes beyond nr_present_sections\n");
+			break;
+		}
 		ms = __nr_to_section(pnum);
-		usemap = usemap_map[pnum];
+		usemap = usemap_map[nr_consumed_maps];
 		if (!usemap) {
 			ms->section_mem_map = 0;
+			nr_consumed_maps++;
 			continue;
 		}
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-		map = map_map[pnum];
+		map = map_map[nr_consumed_maps];
 #else
 		map = sparse_early_mem_map_alloc(pnum);
 #endif
 		if (!map) {
 			ms->section_mem_map = 0;
+			nr_consumed_maps++;
 			continue;
 		}
 
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 								usemap);
+		nr_consumed_maps++;
 	}
 
 	vmemmap_populate_print_last();
-- 
2.13.6


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
  2018-06-28  6:28 [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (3 preceding siblings ...)
  2018-06-28  6:28 ` [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
@ 2018-06-28  6:28 ` Baoquan He
  2018-06-28 12:15   ` Pavel Tatashin
  2018-07-11 22:26   ` [v6,5/5] " Guenter Roeck
  2018-07-08  2:09 ` [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  5 siblings, 2 replies; 17+ messages in thread
From: Baoquan He @ 2018-06-28  6:28 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin, Oscar Salvador
  Cc: linux-mm, kirill.shutemov, Baoquan He

Pavel pointed out that the behaviour of allocating memmap together
for one node should be defaulted for all ARCH-es. It won't break
anything because it will drop to the fallback action to allocate
imemmap for each section at one time if failed on large chunk of
memory.

So remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER and clean up the
related codes.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/Kconfig  |  4 ----
 mm/sparse.c | 32 ++------------------------------
 2 files changed, 2 insertions(+), 34 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index ce95491abd6a..75a196bf83e6 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -115,10 +115,6 @@ config SPARSEMEM_EXTREME
 config SPARSEMEM_VMEMMAP_ENABLE
 	bool
 
-config SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	def_bool y
-	depends on SPARSEMEM && X86_64
-
 config SPARSEMEM_VMEMMAP
 	bool "Sparse Memory virtual memmap"
 	depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE
diff --git a/mm/sparse.c b/mm/sparse.c
index e1767d9fe4f3..d18e2697a781 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -458,7 +458,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 static void __init sparse_early_mem_maps_alloc_node(void *data,
 				 unsigned long pnum_begin,
 				 unsigned long pnum_end,
@@ -468,22 +467,6 @@ static void __init sparse_early_mem_maps_alloc_node(void *data,
 	sparse_mem_maps_populate_node(map_map, pnum_begin, pnum_end,
 					 map_count, nodeid);
 }
-#else
-static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
-{
-	struct page *map;
-	struct mem_section *ms = __nr_to_section(pnum);
-	int nid = sparse_early_nid(ms);
-
-	map = sparse_mem_map_populate(pnum, nid, NULL);
-	if (map)
-		return map;
-
-	pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
-	       __func__);
-	return NULL;
-}
-#endif
 
 void __weak __meminit vmemmap_populate_print_last(void)
 {
@@ -545,14 +528,11 @@ void __init sparse_init(void)
 {
 	unsigned long pnum;
 	struct page *map;
+	struct page **map_map;
 	unsigned long *usemap;
 	unsigned long **usemap_map;
-	int size;
+	int size, size2;
 	int nr_consumed_maps = 0;
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	int size2;
-	struct page **map_map;
-#endif
 
 	/* see include/linux/mmzone.h 'struct mem_section' definition */
 	BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
@@ -579,7 +559,6 @@ void __init sparse_init(void)
 				(void *)usemap_map,
 				sizeof(usemap_map[0]));
 
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	size2 = sizeof(struct page *) * nr_present_sections;
 	map_map = memblock_virt_alloc(size2, 0);
 	if (!map_map)
@@ -587,7 +566,6 @@ void __init sparse_init(void)
 	alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
 				(void *)map_map,
 				sizeof(map_map[0]));
-#endif
 
 	/* The numner of present sections stored in nr_present_sections
 	 * are kept the same since mem sections are marked as present in
@@ -613,11 +591,7 @@ void __init sparse_init(void)
 			continue;
 		}
 
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 		map = map_map[nr_consumed_maps];
-#else
-		map = sparse_early_mem_map_alloc(pnum);
-#endif
 		if (!map) {
 			ms->section_mem_map = 0;
 			nr_consumed_maps++;
@@ -631,9 +605,7 @@ void __init sparse_init(void)
 
 	vmemmap_populate_print_last();
 
-#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	memblock_free_early(__pa(map_map), size2);
-#endif
 	memblock_free_early(__pa(usemap_map), size);
 }
 
-- 
2.13.6


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 2/5] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-06-28  6:28 ` [PATCH v6 2/5] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
@ 2018-06-28 11:19   ` Oscar Salvador
  0 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2018-06-28 11:19 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin,
	linux-mm, kirill.shutemov

On Thu, Jun 28, 2018 at 02:28:54PM +0800, Baoquan He wrote:
> In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, system
> will allocate one continuous memory chunk for mem maps on one node and
> populate the relevant page tables to map memory section one by one. If
> fail to populate for a certain mem section, print warning and its
> ->section_mem_map will be cleared to cancel the marking of being present.
> Like this, the number of mem sections marked as present could become
> less during sparse_init() execution.
> 
> Here just defer the ms->section_mem_map clearing if failed to populate
> its page tables until the last for_each_present_section_nr() loop. This
> is in preparation for later optimizing the mem map allocation.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

Looks good to me.

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
>  mm/sparse-vmemmap.c |  4 ----
>  mm/sparse.c         | 15 ++++++++-------
>  2 files changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index bd0276d5f66b..68bb65b2d34d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -292,18 +292,14 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  	}
>  
>  	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> -		struct mem_section *ms;
> -
>  		if (!present_section_nr(pnum))
>  			continue;
>  
>  		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
>  		if (map_map[pnum])
>  			continue;
> -		ms = __nr_to_section(pnum);
>  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
>  		       __func__);
> -		ms->section_mem_map = 0;
>  	}
>  
>  	if (vmemmap_buf_start) {
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 6314303130b0..6a706093307d 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -441,17 +441,13 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  
>  	/* fallback */
>  	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> -		struct mem_section *ms;
> -
>  		if (!present_section_nr(pnum))
>  			continue;
>  		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
>  		if (map_map[pnum])
>  			continue;
> -		ms = __nr_to_section(pnum);
>  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
>  		       __func__);
> -		ms->section_mem_map = 0;
>  	}
>  }
>  #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
> @@ -479,7 +475,6 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
>  
>  	pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
>  	       __func__);
> -	ms->section_mem_map = 0;
>  	return NULL;
>  }
>  #endif
> @@ -583,17 +578,23 @@ void __init sparse_init(void)
>  #endif
>  
>  	for_each_present_section_nr(0, pnum) {
> +		struct mem_section *ms;
> +		ms = __nr_to_section(pnum);
>  		usemap = usemap_map[pnum];
> -		if (!usemap)
> +		if (!usemap) {
> +			ms->section_mem_map = 0;
>  			continue;
> +		}
>  
>  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
>  		map = map_map[pnum];
>  #else
>  		map = sparse_early_mem_map_alloc(pnum);
>  #endif
> -		if (!map)
> +		if (!map) {
> +			ms->section_mem_map = 0;
>  			continue;
> +		}
>  
>  		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
>  								usemap);
> -- 
> 2.13.6
> 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/5] mm/sparse: Add a static variable nr_present_sections
  2018-06-28  6:28 ` [PATCH v6 1/5] mm/sparse: Add a static variable nr_present_sections Baoquan He
@ 2018-06-28 11:24   ` Oscar Salvador
  0 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2018-06-28 11:24 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin,
	linux-mm, kirill.shutemov

On Thu, Jun 28, 2018 at 02:28:53PM +0800, Baoquan He wrote:
> It's used to record how many memory sections are marked as present
> during system boot up, and will be used in the later patch.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
>  mm/sparse.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f13f2723950a..6314303130b0 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -200,6 +200,12 @@ static inline int next_present_section_nr(int section_nr)
>  	      (section_nr <= __highest_present_section_nr));	\
>  	     section_nr = next_present_section_nr(section_nr))
>  
> +/*
> + * Record how many memory sections are marked as present
> + * during system bootup.
> + */
> +static int __initdata nr_present_sections;
> +
>  /* Record a memory area against a node. */
>  void __init memory_present(int nid, unsigned long start, unsigned long end)
>  {
> @@ -229,6 +235,7 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
>  			ms->section_mem_map = sparse_encode_early_nid(nid) |
>  							SECTION_IS_ONLINE;
>  			section_mark_present(ms);
> +			nr_present_sections++;
>  		}
>  	}
>  }
> -- 
> 2.13.6
> 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-28  6:28 ` [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
@ 2018-06-28 12:09   ` Oscar Salvador
  2018-06-28 12:12     ` Pavel Tatashin
  0 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2018-06-28 12:09 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin,
	linux-mm, kirill.shutemov

On Thu, Jun 28, 2018 at 02:28:56PM +0800, Baoquan He wrote:
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> are allocated with the size of NR_MEM_SECTIONS. They are used to store
> each memory section's usemap and mem map if marked as present. With
> the help of these two arrays, continuous memory chunk is allocated for
> usemap and memmap for memory sections on one node. This avoids too many
> memory fragmentations. Like below diagram, '1' indicates the present
> memory section, '0' means absent one. The number 'n' could be much
> smaller than NR_MEM_SECTIONS on most of systems.
> 
> |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
> -------------------------------------------------------
>  0 1 2 3         4 5         i   i+1     n-1   n
> 
> If fail to populate the page tables to map one section's memmap, its
> ->section_mem_map will be cleared finally to indicate that it's not present.
> After use, these two arrays will be released at the end of sparse_init().
> 
> In 4-level paging mode, each array costs 4M which can be ignorable. While
> in 5-level paging, they costs 256M each, 512M altogether. Kdump kernel
> Usually only reserves very few memory, e.g 256M. So, even thouth they are
> temporarily allocated, still not acceptable.
> 
> In fact, there's no need to allocate them with the size of NR_MEM_SECTIONS.
> Since the ->section_mem_map clearing has been deferred to the last, the
> number of present memory sections are kept the same during sparse_init()
> until we finally clear out the memory section's ->section_mem_map if its
> usemap or memmap is not correctly handled. Thus in the middle whenever
> for_each_present_section_nr() loop is taken, the i-th present memory
> section is always the same one.
> 
> Here only allocate usemap_map and map_map with the size of
> 'nr_present_sections'. For the i-th present memory section, install its
> usemap and memmap to usemap_map[i] and mam_map[i] during allocation. Then
> in the last for_each_present_section_nr() loop which clears the failed
> memory section's ->section_mem_map, fetch usemap and memmap from
> usemap_map[] and map_map[] array and set them into mem_section[]
> accordingly.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> ---
>  mm/sparse-vmemmap.c |  5 +++--
>  mm/sparse.c         | 43 ++++++++++++++++++++++++++++++++++---------
>  2 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 68bb65b2d34d..e1a54ba411ec 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  	unsigned long pnum;
>  	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
>  	void *vmemmap_buf_start;
> +	int nr_consumed_maps = 0;
>  
>  	size = ALIGN(size, PMD_SIZE);
>  	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
> @@ -295,8 +296,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  		if (!present_section_nr(pnum))
>  			continue;
>  
> -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> -		if (map_map[pnum])
> +		map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
> +		if (map_map[nr_consumed_maps++])
>  			continue;
>  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
>  		       __func__);
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 4458a23e5293..e1767d9fe4f3 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -386,6 +386,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
>  	unsigned long pnum;
>  	unsigned long **usemap_map = (unsigned long **)data;
>  	int size = usemap_size();
> +	int nr_consumed_maps = 0;
>  
>  	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
>  							  size * usemap_count);
> @@ -397,9 +398,10 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
>  	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
>  		if (!present_section_nr(pnum))
>  			continue;
> -		usemap_map[pnum] = usemap;
> +		usemap_map[nr_consumed_maps] = usemap;
>  		usemap += size;
> -		check_usemap_section_nr(nodeid, usemap_map[pnum]);
> +		check_usemap_section_nr(nodeid, usemap_map[nr_consumed_maps]);
> +		nr_consumed_maps++;
>  	}
>  }
>  
> @@ -424,27 +426,31 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  	void *map;
>  	unsigned long pnum;
>  	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> +	int nr_consumed_maps;
>  
>  	size = PAGE_ALIGN(size);
>  	map = memblock_virt_alloc_try_nid_raw(size * map_count,
>  					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
>  					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
>  	if (map) {
> +		nr_consumed_maps = 0;
>  		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
>  			if (!present_section_nr(pnum))
>  				continue;
> -			map_map[pnum] = map;
> +			map_map[nr_consumed_maps] = map;
>  			map += size;
> +			nr_consumed_maps++;
>  		}
>  		return;
>  	}
>  
>  	/* fallback */
> +	nr_consumed_maps = 0;
>  	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
>  		if (!present_section_nr(pnum))
>  			continue;
> -		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> -		if (map_map[pnum])
> +		map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
> +		if (map_map[nr_consumed_maps++])
>  			continue;
>  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
>  		       __func__);
> @@ -523,6 +529,7 @@ static void __init alloc_usemap_and_memmap(void (*alloc_func)
>  		/* new start, update count etc*/
>  		nodeid_begin = nodeid;
>  		pnum_begin = pnum;
> +		data += map_count * data_unit_size;
>  		map_count = 1;
>  	}
>  	/* ok, last chunk */
> @@ -541,6 +548,7 @@ void __init sparse_init(void)
>  	unsigned long *usemap;
>  	unsigned long **usemap_map;
>  	int size;
> +	int nr_consumed_maps = 0;
>  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
>  	int size2;
>  	struct page **map_map;
> @@ -563,7 +571,7 @@ void __init sparse_init(void)
>  	 * powerpc need to call sparse_init_one_section right after each
>  	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
>  	 */
> -	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
> +	size = sizeof(unsigned long *) * nr_present_sections;
>  	usemap_map = memblock_virt_alloc(size, 0);
>  	if (!usemap_map)
>  		panic("can not allocate usemap_map\n");
> @@ -572,7 +580,7 @@ void __init sparse_init(void)
>  				sizeof(usemap_map[0]));
>  
>  #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> -	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
> +	size2 = sizeof(struct page *) * nr_present_sections;
>  	map_map = memblock_virt_alloc(size2, 0);
>  	if (!map_map)
>  		panic("can not allocate map_map\n");
> @@ -581,27 +589,44 @@ void __init sparse_init(void)
>  				sizeof(map_map[0]));
>  #endif
>  
> +	/* The numner of present sections stored in nr_present_sections
> +	 * are kept the same since mem sections are marked as present in
> +	 * memory_present(). In this for loop, we need check which sections
> +	 * failed to allocate memmap or usemap, then clear its
> +	 * ->section_mem_map accordingly. During this process, we need
> +	 * increase 'nr_consumed_maps' whether its allocation of memmap
> +	 * or usemap failed or not, so that after we handle the i-th
> +	 * memory section, can get memmap and usemap of (i+1)-th section
> +	 * correctly. */
>  	for_each_present_section_nr(0, pnum) {
>  		struct mem_section *ms;
> +
> +		if (nr_consumed_maps >= nr_present_sections) {
> +			pr_err("nr_consumed_maps goes beyond nr_present_sections\n");
> +			break;
> +		}

Hi Baoquan,

I am sure I am missing something here, but is this check really needed?

I mean, for_each_present_section_nr() only returns the section nr if the section
has been marked as SECTION_MARKED_PRESENT.
That happens in memory_present(), where now we also increment nr_present_sections whenever
we find a present section.

So, for_each_present_section_nr() should return the same nr of section as nr_present_sections.
Since we only increment nr_consumed_maps once in the loop, I am not so sure we can
go beyond nr_present_sections.

Did I overlook something?

Other than that, this looks good to me.

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-28 12:09   ` Oscar Salvador
@ 2018-06-28 12:12     ` Pavel Tatashin
  2018-06-28 13:12       ` Oscar Salvador
  2018-06-28 14:05       ` Dave Hansen
  0 siblings, 2 replies; 17+ messages in thread
From: Pavel Tatashin @ 2018-06-28 12:12 UTC (permalink / raw)
  To: osalvador
  Cc: bhe, LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

> > +             if (nr_consumed_maps >= nr_present_sections) {
> > +                     pr_err("nr_consumed_maps goes beyond nr_present_sections\n");
> > +                     break;
> > +             }
>
> Hi Baoquan,
>
> I am sure I am missing something here, but is this check really needed?
>
> I mean, for_each_present_section_nr() only returns the section nr if the section
> has been marked as SECTION_MARKED_PRESENT.
> That happens in memory_present(), where now we also increment nr_present_sections whenever
> we find a present section.
>
> So, for_each_present_section_nr() should return the same nr of section as nr_present_sections.
> Since we only increment nr_consumed_maps once in the loop, I am not so sure we can
> go beyond nr_present_sections.
>
> Did I overlook something?

You did not, this is basically a safety check. A BUG_ON() would be
better here. As, this something that should really not happening, and
would mean a bug in the current project.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
  2018-06-28  6:28 ` [PATCH v6 5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER Baoquan He
@ 2018-06-28 12:15   ` Pavel Tatashin
  2018-07-11 22:26   ` [v6,5/5] " Guenter Roeck
  1 sibling, 0 replies; 17+ messages in thread
From: Pavel Tatashin @ 2018-06-28 12:15 UTC (permalink / raw)
  To: bhe
  Cc: LKML, Andrew Morton, dave.hansen, pagupta, osalvador,
	Linux Memory Management List, kirill.shutemov

On Thu, Jun 28, 2018 at 2:29 AM Baoquan He <bhe@redhat.com> wrote:
>
> Pavel pointed out that the behaviour of allocating memmap together
> for one node should be defaulted for all ARCH-es. It won't break
> anything because it will drop to the fallback action to allocate
> imemmap for each section at one time if failed on large chunk of
> memory.
>
> So remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER and clean up the
> related codes.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>

Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 3/5] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap
  2018-06-28  6:28 ` [PATCH v6 3/5] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
@ 2018-06-28 13:08   ` Oscar Salvador
  0 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2018-06-28 13:08 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin,
	linux-mm, kirill.shutemov

On Thu, Jun 28, 2018 at 02:28:55PM +0800, Baoquan He wrote:
> alloc_usemap_and_memmap() is passing in a "void *" that points to
> usemap_map or memmap_map. In next patch we will change both of the
> map allocation from taking 'NR_MEM_SECTIONS' as the length to taking
> 'nr_present_sections' as the length. After that, the passed in 'void*'
> needs to update as things get consumed. But, it knows only the
> quantity of objects consumed and not the type.  This effectively
> tells it enough about the type to let it update the pointer as
> objects are consumed.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-28 12:12     ` Pavel Tatashin
@ 2018-06-28 13:12       ` Oscar Salvador
  2018-06-28 14:05       ` Dave Hansen
  1 sibling, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2018-06-28 13:12 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: bhe, LKML, Andrew Morton, dave.hansen, pagupta,
	Linux Memory Management List, kirill.shutemov

On Thu, Jun 28, 2018 at 08:12:04AM -0400, Pavel Tatashin wrote:
> > > +             if (nr_consumed_maps >= nr_present_sections) {
> > > +                     pr_err("nr_consumed_maps goes beyond nr_present_sections\n");
> > > +                     break;
> > > +             }
> >
> > Hi Baoquan,
> >
> > I am sure I am missing something here, but is this check really needed?
> >
> > I mean, for_each_present_section_nr() only returns the section nr if the section
> > has been marked as SECTION_MARKED_PRESENT.
> > That happens in memory_present(), where now we also increment nr_present_sections whenever
> > we find a present section.
> >
> > So, for_each_present_section_nr() should return the same nr of section as nr_present_sections.
> > Since we only increment nr_consumed_maps once in the loop, I am not so sure we can
> > go beyond nr_present_sections.
> >
> > Did I overlook something?
> 
> You did not, this is basically a safety check. A BUG_ON() would be
> better here. As, this something that should really not happening, and
> would mean a bug in the current project.

I think we would be better off having a BUG_ON() there.
Otherwise the system can go sideways later on. 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-28 12:12     ` Pavel Tatashin
  2018-06-28 13:12       ` Oscar Salvador
@ 2018-06-28 14:05       ` Dave Hansen
  2018-06-28 14:09         ` Pavel Tatashin
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2018-06-28 14:05 UTC (permalink / raw)
  To: Pavel Tatashin, osalvador
  Cc: bhe, LKML, Andrew Morton, pagupta, Linux Memory Management List,
	kirill.shutemov

On 06/28/2018 05:12 AM, Pavel Tatashin wrote:
> You did not, this is basically a safety check. A BUG_ON() would be
> better here. As, this something that should really not happening, and
> would mean a bug in the current project.

Is this at a point in boot where a BUG_ON() generally produces useful
output, or will it just produce and early-boot silent hang with no
console output?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-28 14:05       ` Dave Hansen
@ 2018-06-28 14:09         ` Pavel Tatashin
  0 siblings, 0 replies; 17+ messages in thread
From: Pavel Tatashin @ 2018-06-28 14:09 UTC (permalink / raw)
  To: dave.hansen
  Cc: osalvador, bhe, LKML, Andrew Morton, pagupta,
	Linux Memory Management List, kirill.shutemov

> Is this at a point in boot where a BUG_ON() generally produces useful
> output, or will it just produce and early-boot silent hang with no
> console output?

Probably depends on the platform, but in KVM, I see a nice panic
message (inserted BUG_ON(1) into sparse_init()):

[    0.000000] kernel BUG at mm/sparse.c:490!
PANIC: early exception 0x06 IP 10:ffffffffb6bd43d9 error 0 cr2
0xffff898747575000
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-rc2_pt_sparse #6
[    0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[    0.000000] RIP: 0010:sparse_init+0x0/0x2
[    0.000000] Code: fe 3b 05 ba d0 16 00 7e 06 89 05 b2 d0 16 00 49
83 08 01 48 81 c3 00 80 00 00 e9 73 ff ff ff 48 83 c4 10 5b 5d 41 5c
41 5d c3 <0f> 0b 48 8b 05 ae 46 8f ff 48 c1 e2 15 48 01 d0 c3 41 56 48
8b 05
[    0.000000] RSP: 0000:ffffffffb6603e98 EFLAGS: 00010086 ORIG_RAX:
0000000000000000
[    0.000000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb6603e80
[    0.000000] RDX: ffffffffb6603e78 RSI: 0000000000000040 RDI: ffffffffb6603e70
[    0.000000] RBP: 0000000007f7ec00 R08: ffffffffb6603e74 R09: 0000000000007fe0
[    0.000000] R10: 0000000000000100 R11: 0000000007fd6000 R12: 0000000000000000
[    0.000000] R13: ffffffffb6603f18 R14: 0000000000000000 R15: 0000000000000000
[    0.000000] FS:  0000000000000000(0000) GS:ffffffffb6b82000(0000)
knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: ffff898747575000 CR3: 0000000006e0a000 CR4: 00000000000606b0
[    0.000000] Call Trace:
[    0.000000]  ? paging_init+0xf/0x2c
[    0.000000]  ? setup_arch+0xae8/0xc17
[    0.000000]  ? printk+0x53/0x6a
[    0.000000]  ? start_kernel+0x62/0x4b3
[    0.000000]  ? load_ucode_bsp+0x3d/0x129
[    0.000000]  ? secondary_startup_64+0xa5/0xb0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-06-28  6:28 [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (4 preceding siblings ...)
  2018-06-28  6:28 ` [PATCH v6 5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER Baoquan He
@ 2018-07-08  2:09 ` Baoquan He
  5 siblings, 0 replies; 17+ messages in thread
From: Baoquan He @ 2018-07-08  2:09 UTC (permalink / raw)
  To: linux-kernel, akpm
  Cc: Oscar Salvador, Pavel Tatashin, dave.hansen, pagupta, linux-mm,
	kirill.shutemov

Hi Andrew,

Could you pick this series into mm tree so that it can catch 4.18?

Thanks
Baoquan

On 06/28/18 at 02:28pm, Baoquan He wrote:
> This is v6 post.
> 
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> are allocated with the size of NR_MEM_SECTIONS. They are used to store
> each memory section's usemap and mem map if marked as present. In
> 5-level paging mode, this will cost 512M memory though they will be
> released at the end of sparse_init(). System with few memory, like
> kdump kernel which usually only has about 256M, will fail to boot
> because of allocation failure if CONFIG_X86_5LEVEL=y.
> 
> In this patchset, optimize the memmap allocation code to only use
> usemap_map and map_map with the size of nr_present_sections. This
> makes kdump kernel boot up with normal crashkernel='' setting when
> CONFIG_X86_5LEVEL=y.
> 
> The old version can be found below:
> 
> v5:
> http://lkml.kernel.org/r/20180627013116.12411-1-bhe@redhat.com
> v4:
> http://lkml.kernel.org/r/20180521101555.25610-1-bhe@redhat.com
> 
> v3:
> https://lkml.org/lkml/2018/2/27/928
> 
> V1 can be found here:
> https://www.spinics.net/lists/linux-mm/msg144486.html
> 
> Change log:
> v5->v6:
>   Oscar found the redundant "struct mem_section *ms" definition and
>   in the old patch 2/4, after deferring the clearing of section_mem_map.
>   Clean them up in this version.
> 
>   Pavel pointed out that allocating memmap together for one node at
>   one time should be a default behaviour for all ARCH-es. And if failed
>   on large memory, it will drop to the fallback to allocate memmap
>   for one section at one time, it shoult not break anything. Add
>   patch 5/5 to remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER and clean
>   up the related codes.
> v4->v5:
>   Improve patch 3/4 log according to Dave's suggestion.
> 
>   Correct the wrong copy&paste of making 'nr_consumed_maps' to
>   'alloc_usemap_and_memmap' mistakenly which is pointed out by
>   Dave in patch 4/4 code comment.
> 
>   Otherwise, no code change in this version.
> v3->v4:
>   Improve according to Dave's three concerns which are in patch 0004:
> 
>   Rename variable 'idx_present' to 'nr_consumed_maps' which used to
>   index the memmap and usemap of present sections.
> 
>   Add a check if 'nr_consumed_maps' goes beyond nr_present_sections.
> 
>   Add code comment above the final for_each_present_section_nr() to
>   tell why 'nr_consumed_maps' need be increased in each iteration
>   whether the 'ms->section_mem_map' need cleared or out.
> 
> v2->v3:
>   Change nr_present_sections as __initdata and add code comment
>   according to Andrew's suggestion.
> 
>   Change the local variable 'i' as idx_present which loops over the
>   present sections, and improve the code. These are suggested by
>   Dave and Pankaj.
> 
>   Add a new patch 0003 which adds a new parameter 'data_unit_size'
>   to function alloc_usemap_and_memmap() in which we will update 'data'
>   to make it point at new position. However its type 'void *' can't give
>   us needed info to do that. Need pass the unit size in. So change code
>   in patch 0004 accordingly. This is a code bug fix found when tested
>   the memory deployed on multiple nodes.
> 
> v1-v2:
>   Split out the nr_present_sections adding as a single patch for easier
>   reviewing.
> 
>   Rewrite patch log according to Dave's suggestion.
> 
>   Fix code bug in patch 0002 reported by test robot.
> 
> Baoquan He (5):
>   mm/sparse: Add a static variable nr_present_sections
>   mm/sparsemem: Defer the ms->section_mem_map clearing
>   mm/sparse: Add a new parameter 'data_unit_size' for
>     alloc_usemap_and_memmap
>   mm/sparse: Optimize memmap allocation during sparse_init()
>   mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> 
>  mm/Kconfig          |   4 --
>  mm/sparse-vmemmap.c |   9 ++---
>  mm/sparse.c         | 109 ++++++++++++++++++++++++++++------------------------
>  3 files changed, 62 insertions(+), 60 deletions(-)
> 
> -- 
> 2.13.6
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [v6,5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
  2018-06-28  6:28 ` [PATCH v6 5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER Baoquan He
  2018-06-28 12:15   ` Pavel Tatashin
@ 2018-07-11 22:26   ` " Guenter Roeck
  1 sibling, 0 replies; 17+ messages in thread
From: Guenter Roeck @ 2018-07-11 22:26 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, akpm, dave.hansen, pagupta, Pavel Tatashin,
	Oscar Salvador, linux-mm, kirill.shutemov, Michael Ellerman

Hi,

On Thu, Jun 28, 2018 at 02:28:57PM +0800, Baoquan He wrote:
> Pavel pointed out that the behaviour of allocating memmap together
> for one node should be defaulted for all ARCH-es. It won't break
> anything because it will drop to the fallback action to allocate
> imemmap for each section at one time if failed on large chunk of
> memory.
> 
> So remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER and clean up the
> related codes.
> 

This patch causes several of my qemu boot tests to fail.
	powerpc:mac99:ppc64_book3s_defconfig:initrd:nosmp
	powerpc:mac99:ppc64_book3s_defconfig:initrd:smp4 
	powerpc:mac99:ppc64_book3s_defconfig:rootfs:smp4 
	powerpc:pseries:pseries_defconfig:initrd 
	powerpc:pseries:pseries_defconfig:rootfs 
	powerpc:powernv:powernv_defconfig:initrd 

Bisect points to this patch. Bisect log is attached for reference.
Reverting the patch fixes the problem.

[ Yes, I am aware that Michael already reported the problem at
  https://lkml.org/lkml/2018/7/11/480 ]

Guenter

---
# bad: [98be45067040799a801e6ce52d8bf4659a153893] Add linux-next specific files for 20180711
# good: [1e4b044d22517cae7047c99038abb444423243ca] Linux 4.18-rc4
git bisect start 'HEAD' 'v4.18-rc4'
# good: [ade30e73739a5174bcaee5860fee76c2365548c5] Merge remote-tracking branch 'crypto/master'
git bisect good ade30e73739a5174bcaee5860fee76c2365548c5
# good: [792be221c35d19a1c486789e5b5c91c05279b94d] Merge remote-tracking branch 'tip/auto-latest'
git bisect good 792be221c35d19a1c486789e5b5c91c05279b94d
# good: [1d66737ba99400ab9a79c906a25b2090f4cc8b18] Merge remote-tracking branch 'mux/for-next'
git bisect good 1d66737ba99400ab9a79c906a25b2090f4cc8b18
# good: [c02d5416bd8504866dd80d2129f4648747166b6f] Merge remote-tracking branch 'kspp/for-next/kspp'
git bisect good c02d5416bd8504866dd80d2129f4648747166b6f
# bad: [1e741337a9416010a48c6034855e316ba8057111] ntb: ntb_hw_switchtec: cleanup 64bit IO defines to use the common header
git bisect bad 1e741337a9416010a48c6034855e316ba8057111
# good: [205a106bac127145a4defae7d0d35945001fe924] kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN
git bisect good 205a106bac127145a4defae7d0d35945001fe924
# bad: [e87ebebf76c9ceeaea21a256341d6765c657e550] mm, oom: remove sleep from under oom_lock
git bisect bad e87ebebf76c9ceeaea21a256341d6765c657e550
# good: [9f95e9e87283a578fb676f14fd5e1edd4cb401f7] mm/vmscan.c: iterate only over charged shrinkers during memcg shrink_slab()
git bisect good 9f95e9e87283a578fb676f14fd5e1edd4cb401f7
# bad: [0ba29a108979bdbe3f77094e8b5cc06652e2698b] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
git bisect bad 0ba29a108979bdbe3f77094e8b5cc06652e2698b
# good: [ca464c7768574c49f58a5be8962330814262820f] mm/sparse.c: add a static variable nr_present_sections
git bisect good ca464c7768574c49f58a5be8962330814262820f
# good: [4f02f8533cbad406e004b3bfe75f65e6b791efd5] mm/sparse.c: add a new parameter 'data_unit_size' for alloc_usemap_and_memmap
git bisect good 4f02f8533cbad406e004b3bfe75f65e6b791efd5
# good: [110cb339e5d95c77cf83d33de25f6f392d0ca7f6] mm-sparse-optimize-memmap-allocation-during-sparse_init-checkpatch-fixes
git bisect good 110cb339e5d95c77cf83d33de25f6f392d0ca7f6
# first bad commit: [0ba29a108979bdbe3f77094e8b5cc06652e2698b] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, back to index

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-28  6:28 [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
2018-06-28  6:28 ` [PATCH v6 1/5] mm/sparse: Add a static variable nr_present_sections Baoquan He
2018-06-28 11:24   ` Oscar Salvador
2018-06-28  6:28 ` [PATCH v6 2/5] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
2018-06-28 11:19   ` Oscar Salvador
2018-06-28  6:28 ` [PATCH v6 3/5] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
2018-06-28 13:08   ` Oscar Salvador
2018-06-28  6:28 ` [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
2018-06-28 12:09   ` Oscar Salvador
2018-06-28 12:12     ` Pavel Tatashin
2018-06-28 13:12       ` Oscar Salvador
2018-06-28 14:05       ` Dave Hansen
2018-06-28 14:09         ` Pavel Tatashin
2018-06-28  6:28 ` [PATCH v6 5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER Baoquan He
2018-06-28 12:15   ` Pavel Tatashin
2018-07-11 22:26   ` [v6,5/5] " Guenter Roeck
2018-07-08  2:09 ` [PATCH v6 0/5] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox