linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/4] mm/sparse: Optimize memmap allocation during sparse_init()
@ 2018-02-28  3:26 Baoquan He
  2018-02-28  3:26 ` [PATCH v3 1/4] mm/sparse: Add a static variable nr_present_sections Baoquan He
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Baoquan He @ 2018-02-28  3:26 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

This is the v3 post. v1 can be found here:
https://www.spinics.net/lists/linux-mm/msg144486.html

In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
are allocated with the size of NR_MEM_SECTIONS. They are used to store
each memory section's usemap and mem map if it is marked as present. In
5-level paging mode this costs 512M of memory, even though the arrays
are released at the end of sparse_init(). Systems with little memory,
such as a kdump kernel which usually has only about 256M, will fail to
boot because of the allocation failure if CONFIG_X86_5LEVEL=y.

This patchset optimizes the memmap allocation code so that usemap_map
and map_map are sized by nr_present_sections instead. This lets the
kdump kernel boot with the normal crashkernel='' setting when
CONFIG_X86_5LEVEL=y.
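
As a quick sanity check of those numbers, below is a minimal user-space
sketch of the arithmetic; the SECTION_SIZE_BITS value of 27 and the
8-byte pointer size are assumptions taken from x86_64, not something
this series changes:

#include <stdio.h>

int main(void)
{
	/* NR_MEM_SECTIONS = 1 << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) */
	unsigned long nr_4level = 1UL << (46 - 27);	/* 4-level paging */
	unsigned long nr_5level = 1UL << (52 - 27);	/* 5-level paging */

	/* usemap_map + map_map: two pointer arrays, one slot per section */
	printf("4-level: 2 * %lu * 8 = %lu MB\n",
	       nr_4level, (2 * nr_4level * 8) >> 20);
	printf("5-level: 2 * %lu * 8 = %lu MB\n",
	       nr_5level, (2 * nr_5level * 8) >> 20);
	return 0;
}

This prints 8 MB for 4-level and 512 MB for 5-level paging, matching
the figures above.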

Change log:
v2->v3:
  Make nr_present_sections __initdata and add a code comment, as
  Andrew suggested.

  Rename the local variable 'i' to idx_present, which indexes the
  present sections, and improve the code, as suggested by Dave and
  Pankaj.

  Add a new patch 0003 which adds a parameter 'data_unit_size' to
  alloc_usemap_and_memmap(), in which 'data' is updated to point at its
  new position. Its 'void *' type alone cannot tell us how far to
  advance it, so the unit size has to be passed in; patch 0004 is
  changed accordingly. This fixes a code bug found when testing with
  memory deployed on multiple nodes.

v1->v2:
  Split the addition of nr_present_sections into a separate patch for
  easier reviewing.

  Rewrite the patch log according to Dave's suggestion.

  Fix a code bug in patch 0002 reported by the test robot.

Baoquan He (4):
  mm/sparse: Add a static variable nr_present_sections
  mm/sparsemem: Defer the ms->section_mem_map clearing
  mm/sparse: Add a new parameter 'data_unit_size' for
    alloc_usemap_and_memmap
  mm/sparse: Optimize memmap allocation during sparse_init()

 mm/sparse-vmemmap.c |  6 +++---
 mm/sparse.c         | 62 ++++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 48 insertions(+), 20 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/4] mm/sparse: Add a static variable nr_present_sections
  2018-02-28  3:26 [PATCH v3 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
@ 2018-02-28  3:26 ` Baoquan He
  2018-02-28  3:26 ` [PATCH v3 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Baoquan He @ 2018-02-28  3:26 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

nr_present_sections records how many memory sections are marked as
present during system boot-up; it will be used in a later patch.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index 7af5e7a92528..3400c1c02f7a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -202,6 +202,12 @@ static inline int next_present_section_nr(int section_nr)
 	      (section_nr <= __highest_present_section_nr));	\
 	     section_nr = next_present_section_nr(section_nr))
 
+/*
+ * Record how many memory sections are marked as present
+ * during system bootup.
+ */
+static int __initdata nr_present_sections;
+
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -231,6 +237,7 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
 			ms->section_mem_map = sparse_encode_early_nid(nid) |
 							SECTION_IS_ONLINE;
 			section_mark_present(ms);
+			nr_present_sections++;
 		}
 	}
 }
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-02-28  3:26 [PATCH v3 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  2018-02-28  3:26 ` [PATCH v3 1/4] mm/sparse: Add a static variable nr_present_sections Baoquan He
@ 2018-02-28  3:26 ` Baoquan He
  2018-04-06 14:23   ` Dave Hansen
  2018-02-28  3:26 ` [PATCH v3 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Baoquan He @ 2018-02-28  3:26 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, the
system allocates one contiguous memory chunk for the mem maps on one
node and populates the relevant page tables to map the memory sections
one by one. If populating a certain mem section fails, a warning is
printed and its ->section_mem_map is cleared to cancel its 'present'
marking. As a result, the number of mem sections marked as present can
shrink while sparse_init() is still running.

Defer the ms->section_mem_map clearing for sections whose page tables
failed to populate until the last for_each_present_section_nr() loop.
This prepares for the later optimization of the mem map allocation.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse-vmemmap.c |  1 -
 mm/sparse.c         | 12 ++++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index bd0276d5f66b..640e68f8324b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -303,7 +303,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
-		ms->section_mem_map = 0;
 	}
 
 	if (vmemmap_buf_start) {
diff --git a/mm/sparse.c b/mm/sparse.c
index 3400c1c02f7a..5769a0a79dc1 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -490,7 +490,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 		       __func__);
-		ms->section_mem_map = 0;
 	}
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
@@ -518,7 +517,6 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
 
 	pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
 	       __func__);
-	ms->section_mem_map = 0;
 	return NULL;
 }
 #endif
@@ -622,17 +620,23 @@ void __init sparse_init(void)
 #endif
 
 	for_each_present_section_nr(0, pnum) {
+		struct mem_section *ms;
+		ms = __nr_to_section(pnum);
 		usemap = usemap_map[pnum];
-		if (!usemap)
+		if (!usemap) {
+			ms->section_mem_map = 0;
 			continue;
+		}
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 		map = map_map[pnum];
 #else
 		map = sparse_early_mem_map_alloc(pnum);
 #endif
-		if (!map)
+		if (!map) {
+			ms->section_mem_map = 0;
 			continue;
+		}
 
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 								usemap);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap
  2018-02-28  3:26 [PATCH v3 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  2018-02-28  3:26 ` [PATCH v3 1/4] mm/sparse: Add a static variable nr_present_sections Baoquan He
  2018-02-28  3:26 ` [PATCH v3 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
@ 2018-02-28  3:26 ` Baoquan He
  2018-02-28  3:26 ` [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
  2018-04-05 22:08 ` [PATCH v3 0/4] " Andrew Morton
  4 siblings, 0 replies; 17+ messages in thread
From: Baoquan He @ 2018-02-28  3:26 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

The new parameter passes the size of one map data unit into
alloc_usemap_and_memmap(), in preparation for the next patch.
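
To illustrate why the unit size has to be passed in, here is a
user-space sketch with a made-up helper (not the kernel code): a
'void *' cursor carries no element size, so stepping it over the
entries one node has consumed needs the caller to supply that size.

#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: step a type-erased cursor over 'consumed' entries. */
static void *advance(void *cursor, size_t consumed, size_t unit_size)
{
	return (char *)cursor + consumed * unit_size;
}

int main(void)
{
	unsigned long *usemap_map[8] = { NULL };
	void *data = usemap_map;

	/* Pretend one node consumed 3 slots; move the cursor past them. */
	data = advance(data, 3, sizeof(usemap_map[0]));
	printf("advanced %zu bytes\n",
	       (size_t)((char *)data - (char *)usemap_map));
	return 0;
}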

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 5769a0a79dc1..e1aa2f44530d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -528,10 +528,12 @@ void __weak __meminit vmemmap_populate_print_last(void)
 /**
  *  alloc_usemap_and_memmap - memory alloction for pageblock flags and vmemmap
  *  @map: usemap_map for pageblock flags or mmap_map for vmemmap
+ *  @unit_size: size of map unit
  */
 static void __init alloc_usemap_and_memmap(void (*alloc_func)
 					(void *, unsigned long, unsigned long,
-					unsigned long, int), void *data)
+					unsigned long, int), void *data,
+					int data_unit_size)
 {
 	unsigned long pnum;
 	unsigned long map_count;
@@ -608,7 +610,8 @@ void __init sparse_init(void)
 	if (!usemap_map)
 		panic("can not allocate usemap_map\n");
 	alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
-							(void *)usemap_map);
+				(void *)usemap_map,
+				sizeof(usemap_map[0]));
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
@@ -616,7 +619,8 @@ void __init sparse_init(void)
 	if (!map_map)
 		panic("can not allocate map_map\n");
 	alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
-							(void *)map_map);
+				(void *)map_map,
+				sizeof(map_map[0]));
 #endif
 
 	for_each_present_section_nr(0, pnum) {
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-02-28  3:26 [PATCH v3 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (2 preceding siblings ...)
  2018-02-28  3:26 ` [PATCH v3 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
@ 2018-02-28  3:26 ` Baoquan He
  2018-04-06 14:50   ` Dave Hansen
  2018-04-05 22:08 ` [PATCH v3 0/4] " Andrew Morton
  4 siblings, 1 reply; 17+ messages in thread
From: Baoquan He @ 2018-02-28  3:26 UTC (permalink / raw)
  To: linux-kernel, akpm, dave.hansen, pagupta
  Cc: linux-mm, kirill.shutemov, Baoquan He

In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
are allocated with the size of NR_MEM_SECTIONS. They are used to store
each memory section's usemap and mem map if it is marked as present.
With the help of these two arrays, a contiguous memory chunk is
allocated for the usemaps and memmaps of the memory sections on one
node, which avoids excessive memory fragmentation. In the diagram
below, '1' indicates a present memory section and '0' an absent one.
The number of present sections, 'n', can be much smaller than
NR_MEM_SECTIONS on most systems.

|1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
-------------------------------------------------------
 0 1 2 3         4 5         i   i+1     n-1   n

If populating the page tables to map one section's memmap fails, its
->section_mem_map is finally cleared to indicate that it is not present.
After use, the two arrays are released at the end of sparse_init().

In 4-level paging mode each array costs 4M, which is negligible. In
5-level paging mode they cost 256M each, 512M altogether. A kdump
kernel usually reserves very little memory, e.g. 256M, so even though
the arrays are only temporarily allocated, that cost is not acceptable.

In fact, there is no need to allocate them with the size of
NR_MEM_SECTIONS. Since the ->section_mem_map clearing has been deferred
to the end, the number of present memory sections stays the same
throughout sparse_init(), until we finally clear a section's
->section_mem_map because its usemap or memmap was not correctly
handled. Thus, whenever a for_each_present_section_nr() loop runs in
the meantime, the i-th present memory section is always the same one.

So allocate usemap_map and map_map with the size of
'nr_present_sections' only. For the i-th present memory section,
install its usemap and memmap into usemap_map[i] and map_map[i] during
allocation. Then, in the last for_each_present_section_nr() loop, which
clears the ->section_mem_map of the failed memory sections, fetch the
usemap and memmap from the usemap_map[] and map_map[] arrays and set
them into mem_section[] accordingly.
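
The indexing invariant can be pictured with a small user-space sketch
(toy sizes and a made-up section layout, not the kernel code): since
nothing is un-marked as present until the end, every walk over the
present sections consumes the compact slots in the same order, so the
i-th present section always corresponds to slot i, on the error path
as well as on the success path.

#include <stdio.h>

#define NR_SECTIONS	16	/* stand-in for NR_MEM_SECTIONS */
#define NR_PRESENT	6	/* stand-in for nr_present_sections */

int main(void)
{
	/* 1 = marked present in memory_present(), 0 = absent */
	int present[NR_SECTIONS] = { 1, 1, 0, 0, 1, 0, 1, 0,
				     0, 1, 0, 0, 0, 1, 0, 0 };
	int map_map[NR_PRESENT];	/* compact array, one slot per present section */
	int idx_present = 0;
	int pnum;

	/*
	 * Allocation pass: the i-th present section always lands in
	 * slot i, whether or not its (simulated) population succeeds.
	 */
	for (pnum = 0; pnum < NR_SECTIONS; pnum++) {
		if (!present[pnum])
			continue;
		map_map[idx_present++] = pnum;	/* stands in for the map */
	}

	/* Final pass: walk the present sections again, consuming slots in order. */
	idx_present = 0;
	for (pnum = 0; pnum < NR_SECTIONS; pnum++) {
		if (!present[pnum])
			continue;
		printf("section %2d -> map_map[%d] = %d\n",
		       pnum, idx_present, map_map[idx_present]);
		idx_present++;
	}
	return 0;
}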

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse-vmemmap.c |  5 +++--
 mm/sparse.c         | 33 +++++++++++++++++++++++----------
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 640e68f8324b..ca72203d5b49 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -281,6 +281,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
 	void *vmemmap_buf_start;
+	int idx_present = 0;
 
 	size = ALIGN(size, PMD_SIZE);
 	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
@@ -297,8 +298,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		if (!present_section_nr(pnum))
 			continue;
 
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		map_map[idx_present] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[idx_present++])
 			continue;
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
diff --git a/mm/sparse.c b/mm/sparse.c
index e1aa2f44530d..34a28c0c7376 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -410,6 +410,7 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 	unsigned long pnum;
 	unsigned long **usemap_map = (unsigned long **)data;
 	int size = usemap_size();
+	int idx_present = 0;
 
 	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
 							  size * usemap_count);
@@ -421,9 +422,10 @@ static void __init sparse_early_usemaps_alloc_node(void *data,
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
-		usemap_map[pnum] = usemap;
+		usemap_map[idx_present] = usemap;
 		usemap += size;
-		check_usemap_section_nr(nodeid, usemap_map[pnum]);
+		check_usemap_section_nr(nodeid, usemap_map[idx_present]);
+		idx_present++;
 	}
 }
 
@@ -452,14 +454,17 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 	void *map;
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
+	int idx_present;
 
 	map = alloc_remap(nodeid, size * map_count);
 	if (map) {
+		idx_present = 0;
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))
 				continue;
-			map_map[pnum] = map;
+			map_map[idx_present] = map;
 			map += size;
+			idx_present++;
 		}
 		return;
 	}
@@ -469,23 +474,26 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
 					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
 	if (map) {
+		idx_present = 0;
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))
 				continue;
-			map_map[pnum] = map;
+			map_map[idx_present] = map;
 			map += size;
+			idx_present++;
 		}
 		return;
 	}
 
 	/* fallback */
+	idx_present = 0;
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		struct mem_section *ms;
 
 		if (!present_section_nr(pnum))
 			continue;
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		map_map[idx_present] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[idx_present++])
 			continue;
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
@@ -565,6 +573,7 @@ static void __init alloc_usemap_and_memmap(void (*alloc_func)
 		/* new start, update count etc*/
 		nodeid_begin = nodeid;
 		pnum_begin = pnum;
+		data += map_count * data_unit_size;
 		map_count = 1;
 	}
 	/* ok, last chunk */
@@ -583,6 +592,7 @@ void __init sparse_init(void)
 	unsigned long *usemap;
 	unsigned long **usemap_map;
 	int size;
+	int idx_present = 0;
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	int size2;
 	struct page **map_map;
@@ -605,7 +615,7 @@ void __init sparse_init(void)
 	 * powerpc need to call sparse_init_one_section right after each
 	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
 	 */
-	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
+	size = sizeof(unsigned long *) * nr_present_sections;
 	usemap_map = memblock_virt_alloc(size, 0);
 	if (!usemap_map)
 		panic("can not allocate usemap_map\n");
@@ -614,7 +624,7 @@ void __init sparse_init(void)
 				sizeof(usemap_map[0]));
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
+	size2 = sizeof(struct page *) * nr_present_sections;
 	map_map = memblock_virt_alloc(size2, 0);
 	if (!map_map)
 		panic("can not allocate map_map\n");
@@ -626,24 +636,27 @@ void __init sparse_init(void)
 	for_each_present_section_nr(0, pnum) {
 		struct mem_section *ms;
 		ms = __nr_to_section(pnum);
-		usemap = usemap_map[pnum];
+		usemap = usemap_map[idx_present];
 		if (!usemap) {
 			ms->section_mem_map = 0;
+			idx_present++;
 			continue;
 		}
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-		map = map_map[pnum];
+		map = map_map[idx_present];
 #else
 		map = sparse_early_mem_map_alloc(pnum);
 #endif
 		if (!map) {
 			ms->section_mem_map = 0;
+			idx_present++;
 			continue;
 		}
 
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 								usemap);
+		idx_present++;
 	}
 
 	vmemmap_populate_print_last();
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-02-28  3:26 [PATCH v3 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
                   ` (3 preceding siblings ...)
  2018-02-28  3:26 ` [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
@ 2018-04-05 22:08 ` Andrew Morton
  2018-04-06 11:05   ` Kirill A. Shutemov
  4 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2018-04-05 22:08 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, dave.hansen, pagupta, linux-mm, kirill.shutemov

On Wed, 28 Feb 2018 11:26:53 +0800 Baoquan He <bhe@redhat.com> wrote:

> This is v3 post. V1 can be found here:
> https://www.spinics.net/lists/linux-mm/msg144486.html
> 
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> are allocated with the size of NR_MEM_SECTIONS. They are used to store
> each memory section's usemap and mem map if marked as present. In
> 5-level paging mode, this will cost 512M memory though they will be
> released at the end of sparse_init(). System with few memory, like
> kdump kernel which usually only has about 256M, will fail to boot
> because of allocation failure if CONFIG_X86_5LEVEL=y.
> 
> In this patchset, optimize the memmap allocation code to only use
> usemap_map and map_map with the size of nr_present_sections. This
> makes kdump kernel boot up with normal crashkernel='' setting when
> CONFIG_X86_5LEVEL=y.

This patchset could do with some more review, please?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-04-05 22:08 ` [PATCH v3 0/4] " Andrew Morton
@ 2018-04-06 11:05   ` Kirill A. Shutemov
  0 siblings, 0 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2018-04-06 11:05 UTC (permalink / raw)
  To: Andrew Morton, dave.hansen
  Cc: Baoquan He, linux-kernel, pagupta, linux-mm, kirill.shutemov

On Thu, Apr 05, 2018 at 03:08:42PM -0700, Andrew Morton wrote:
> On Wed, 28 Feb 2018 11:26:53 +0800 Baoquan He <bhe@redhat.com> wrote:
> 
> > This is v3 post. V1 can be found here:
> > https://www.spinics.net/lists/linux-mm/msg144486.html
> > 
> > In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> > are allocated with the size of NR_MEM_SECTIONS. They are used to store
> > each memory section's usemap and mem map if marked as present. In
> > 5-level paging mode, this will cost 512M memory though they will be
> > released at the end of sparse_init(). System with few memory, like
> > kdump kernel which usually only has about 256M, will fail to boot
> > because of allocation failure if CONFIG_X86_5LEVEL=y.
> > 
> > In this patchset, optimize the memmap allocation code to only use
> > usemap_map and map_map with the size of nr_present_sections. This
> > makes kdump kernel boot up with normal crashkernel='' setting when
> > CONFIG_X86_5LEVEL=y.
> 
> This patchset could do with some more review, please?

I don't really understand sparsemem well enough to comment on the
patchset.

Dave, could you review this?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-02-28  3:26 ` [PATCH v3 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
@ 2018-04-06 14:23   ` Dave Hansen
  2018-04-08  6:50     ` Baoquan He
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2018-04-06 14:23 UTC (permalink / raw)
  To: Baoquan He, linux-kernel, akpm, pagupta; +Cc: linux-mm, kirill.shutemov

On 02/27/2018 07:26 PM, Baoquan He wrote:
> In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, system
> will allocate one continuous memory chunk for mem maps on one node and
> populate the relevant page tables to map memory section one by one. If
> fail to populate for a certain mem section, print warning and its
> ->section_mem_map will be cleared to cancel the marking of being present.
> Like this, the number of mem sections marked as present could become
> less during sparse_init() execution.
> 
> Here just defer the ms->section_mem_map clearing if failed to populate
> its page tables until the last for_each_present_section_nr() loop. This
> is in preparation for later optimizing the mem map allocation.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/sparse-vmemmap.c |  1 -
>  mm/sparse.c         | 12 ++++++++----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index bd0276d5f66b..640e68f8324b 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -303,7 +303,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
>  		ms = __nr_to_section(pnum);
>  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
>  		       __func__);
> -		ms->section_mem_map = 0;
>  	}

I think you might have been trying to say this in the description, but I
was not able to parse it out of there.  What is in ms->section_mem_map
that needs to get cleared?

It *looks* like memory_present() uses ms->section_mem_map to just mark
which sections are online relatively early in boot.  We need this
clearing to mark that they are effectively *not* present any longer.
Correct?

I guess the concern here is that if you miss any of the error sites,
we'll end up with a bogus, non-null ms->section_mem_map.  Do we handle
that nicely?

Should the " = 0" instead be clearing SECTION_MARKED_PRESENT or
something?  That would make it easier to match the code up with the code
that it is effectively undoing.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-02-28  3:26 ` [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
@ 2018-04-06 14:50   ` Dave Hansen
  2018-04-08  8:20     ` Baoquan He
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2018-04-06 14:50 UTC (permalink / raw)
  To: Baoquan He, linux-kernel, akpm, pagupta; +Cc: linux-mm, kirill.shutemov

I'm having a really hard time tying all the pieces back together.  Let
me give it a shot and you can tell me where I go wrong.

On 02/27/2018 07:26 PM, Baoquan He wrote:
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> are allocated with the size of NR_MEM_SECTIONS.

In sparse_init(), two temporary pointer arrays, usemap_map and map_map
are allocated to hold the maps for every possible memory section
(NR_MEM_SECTIONS).  However, we obviously only need the array sized for
nr_present_sections (introduced in patch 1).

The reason this is a problem is that, with 5-level paging,
NR_MEM_SECTIONS (8M->512M) went up dramatically and these temporary
arrays can eat all of memory, like on kdump kernels.

This patch does two things: it makes sure to give usemap_map/mem_map a
less gluttonous size on small systems, and it changes the map allocation
and handling to handle the now more compact, less sparse arrays.

---

The code looks fine to me.  It's a bit of a shame that there's no
verification to ensure that idx_present never goes beyond the shiny new
nr_present_sections.


> @@ -583,6 +592,7 @@ void __init sparse_init(void)
>  	unsigned long *usemap;
>  	unsigned long **usemap_map;
>  	int size;
> +	int idx_present = 0;

I wonder whether idx_present is a good name.  Isn't it the number of
consumed mem_map[]s or usemaps?

> 
>  		if (!map) {
>  			ms->section_mem_map = 0;
> +			idx_present++;
>  			continue;
>  		}
>  


This hunk seems logically odd to me.  I would expect a non-used section
to *not* consume an entry from the temporary array.  Why does it?  The
error and success paths seem to do the same thing.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-04-06 14:23   ` Dave Hansen
@ 2018-04-08  6:50     ` Baoquan He
  2018-04-09 16:02       ` Dave Hansen
  0 siblings, 1 reply; 17+ messages in thread
From: Baoquan He @ 2018-04-08  6:50 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, akpm, pagupta, linux-mm, kirill.shutemov

Hi Dave,

Thanks a lot for your careful reviewing!

On 04/06/18 at 07:23am, Dave Hansen wrote:
> On 02/27/2018 07:26 PM, Baoquan He wrote:
> > In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, system
> > will allocate one continuous memory chunk for mem maps on one node and
> > populate the relevant page tables to map memory section one by one. If
> > fail to populate for a certain mem section, print warning and its
> > ->section_mem_map will be cleared to cancel the marking of being present.
> > Like this, the number of mem sections marked as present could become
> > less during sparse_init() execution.
> > 
> > Here just defer the ms->section_mem_map clearing if failed to populate
> > its page tables until the last for_each_present_section_nr() loop. This
> > is in preparation for later optimizing the mem map allocation.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/sparse-vmemmap.c |  1 -
> >  mm/sparse.c         | 12 ++++++++----
> >  2 files changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index bd0276d5f66b..640e68f8324b 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -303,7 +303,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
> >  		ms = __nr_to_section(pnum);
> >  		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
> >  		       __func__);
> > -		ms->section_mem_map = 0;
> >  	}
> 
> I think you might have been trying to say this in the description, but I
> was not able to parse it out of there.  What is in ms->section_mem_map
> that needs to get cleared?
> 
> It *looks* like memory_present() uses ms->section_mem_map to just mark
> which sections are online relatively early in boot.  We need this
> clearing to mark that they are effectively *not* present any longer.
> Correct?
> 
> I guess the concern here is that if you miss any of the error sites,
> we'll end up with a bogus, non-null ms->section_mem_map.  Do we handle
> that nicely?
> 
> Should the " = 0" instead be clearing SECTION_MARKED_PRESENT or
> something?  That would make it easier to match the code up with the code
> that it is effectively undoing.


I am not sure I understand your question correctly. In memory_present(),
the information encoded into ms->section_mem_map includes the NUMA node,
SECTION_IS_ONLINE and SECTION_MARKED_PRESENT. I am not sure it is OK to
clear only SECTION_MARKED_PRESENT; might people then wrongly check
SECTION_IS_ONLINE and do something with this memory section?

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-04-06 14:50   ` Dave Hansen
@ 2018-04-08  8:20     ` Baoquan He
  2018-04-09  2:07       ` Baoquan He
  2018-04-11 15:48       ` Dave Hansen
  0 siblings, 2 replies; 17+ messages in thread
From: Baoquan He @ 2018-04-08  8:20 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, akpm, pagupta, linux-mm, kirill.shutemov

On 04/06/18 at 07:50am, Dave Hansen wrote:
> I'm having a really hard time tying all the pieces back together.  Let
> me give it a shot and you can tell me where I go wrong.
> 
> On 02/27/2018 07:26 PM, Baoquan He wrote:
> > In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> > are allocated with the size of NR_MEM_SECTIONS.
> 
> In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> are allocated to hold the maps for every possible memory section
> (NR_MEM_SECTIONS).  However, we obviously only need the array sized for
> nr_present_sections (introduced in patch 1).

Yes, correct.

> 
> The reason this is a problem is that, with 5-level paging,
> NR_MEM_SECTIONS (8M->512M) went up dramatically and these temporary
> arrays can eat all of memory, like on kdump kernels.

With 5-level paging enabled, MAX_PHYSMEM_BITS changes from 46 to 52, so
NR_MEM_SECTIONS becomes 64 times the old value and the two temporary
pointer arrays eat correspondingly more memory: 8M -> 8M*64 = 512M.

# define MAX_PHYSMEM_BITS       (pgtable_l5_enabled ? 52 : 46)
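
The 64x factor follows from how NR_MEM_SECTIONS is derived; roughly,
the relevant definitions of this era (include/linux/mmzone.h plus the
x86_64 sparsemem header) are:

# define SECTION_SIZE_BITS	27
#define SECTIONS_SHIFT		(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
#define NR_MEM_SECTIONS		(1UL << SECTIONS_SHIFT)

so going from 46 to 52 physical address bits raises SECTIONS_SHIFT from
19 to 25, i.e. 2^6 = 64 times as many sections.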

> 
> This patch does two things: it makes sure to give usemap_map/mem_map a
> less gluttonous size on small systems, and it changes the map allocation
> and handling to handle the now more compact, less sparse arrays.

Yes. Because 99.9% of systems do not have PB-level memory, or even
TB-level, any memory allocation sized by NR_MEM_SECTIONS should be
avoided.

> 
> ---
> 
> The code looks fine to me.  It's a bit of a shame that there's no
> verification to ensure that idx_present never goes beyond the shiny new
> nr_present_sections. 

This is a good point. Do you think it is OK to replace (section_nr <
NR_MEM_SECTIONS) with (section_nr < nr_present_sections) in the
for_each macro below? for_each_present_section_nr() is only used
during sparse_init() execution.

#define for_each_present_section_nr(start, section_nr)          \
        for (section_nr = next_present_section_nr(start-1);     \
             ((section_nr >= 0) &&                              \
              (section_nr < NR_MEM_SECTIONS) &&                 \                                                                                 
              (section_nr <= __highest_present_section_nr));    \
             section_nr = next_present_section_nr(section_nr))

> 
> 
> > @@ -583,6 +592,7 @@ void __init sparse_init(void)
> >  	unsigned long *usemap;
> >  	unsigned long **usemap_map;
> >  	int size;
> > +	int idx_present = 0;
> 
> I wonder whether idx_present is a good name.  Isn't it the number of
> consumed mem_map[]s or usemaps?

Yeah, in sparse_init() it is the index of the present memory sections,
and also the number of consumed mem_map[]s or usemaps. I remember you
suggested nr_consumed_maps instead, but nr_consumed_maps is a little
long for an array index and pushes some code lines past 80 characters.
How about naming it idx_present in sparse_init() and nr_consumed_maps
in alloc_usemap_and_memmap(), the maps allocation function? I am also
fine with using nr_consumed_maps for all of them.

> 
> > 
> >  		if (!map) {
> >  			ms->section_mem_map = 0;
> > +			idx_present++;
> >  			continue;
> >  		}
> >  
> 
> 
> This hunk seems logically odd to me.  I would expect a non-used section
> to *not* consume an entry from the temporary array.  Why does it?  The
> error and success paths seem to do the same thing.

Yes, this place is the hardest to understand. The temporary arrays are
allocated beforehand with the size of 'nr_present_sections'. The error
paths you mentioned are caused by an allocation failure of mem_map or
map_map, but on both the error and the success path the section has
already been marked as present in memory_present(). The error or
success happens in alloc_usemap_and_memmap(), while checking which of
the two it was happens in the last for_each_present_section_nr() loop
of sparse_init(), which clears ms->section_mem_map on the error path.
This is the key point of the new allocation scheme.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-04-08  8:20     ` Baoquan He
@ 2018-04-09  2:07       ` Baoquan He
  2018-04-11 15:48       ` Dave Hansen
  1 sibling, 0 replies; 17+ messages in thread
From: Baoquan He @ 2018-04-09  2:07 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, akpm, pagupta, linux-mm, kirill.shutemov

On 04/08/18 at 04:20pm, Baoquan He wrote:
> On 04/06/18 at 07:50am, Dave Hansen wrote:
> > I'm having a really hard time tying all the pieces back together.  Let
> > me give it a shot and you can tell me where I go wrong.
> > 
> > On 02/27/2018 07:26 PM, Baoquan He wrote:
> > > In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> > > are allocated with the size of NR_MEM_SECTIONS.
> > 
> > In sparse_init(), two temporary pointer arrays, usemap_map and map_map
> > are allocated to hold the maps for every possible memory section
> > (NR_MEM_SECTIONS).  However, we obviously only need the array sized for
> > nr_present_sections (introduced in patch 1).
> 
> Yes, correct.
> 
> > 
> > The reason this is a problem is that, with 5-level paging,
> > NR_MEM_SECTIONS (8M->512M) went up dramatically and these temporary
> > arrays can eat all of memory, like on kdump kernels.
> 
> With 5-level paging enabled, MAX_PHYSMEM_BITS changed from 46 to
> 52. You can see NR_MEM_SECTIONS becomes 64 times of the old value. So
> the two temporary pointer arrays eat more memory, 8M -> 8M*64 = 512M.
> 
> # define MAX_PHYSMEM_BITS       (pgtable_l5_enabled ? 52 : 46)
> 
> > 
> > This patch does two things: it makes sure to give usemap_map/mem_map a
> > less gluttonous size on small systems, and it changes the map allocation
> > and handling to handle the now more compact, less sparse arrays.
> 
> Yes, because 99.9% of systems do not have PB level of memory, not even TB.
> Any place of memory allocatin with the size of NR_MEM_SECTIONS should be
> avoided.
> 
> > 
> > ---
> > 
> > The code looks fine to me.  It's a bit of a shame that there's no
> > verification to ensure that idx_present never goes beyond the shiny new
> > nr_present_sections. 
> 
> This is a good point. Do you think it's OK to replace (section_nr <
> NR_MEM_SECTIONS) with (section_nr < nr_present_sections) in below
> for_each macro? This for_each_present_section_nr() is only used
> during sparse_init() execution.

Oops, I was wrong. nr_present_sections is the number of present
sections, while section_nr indexes all sections. If we decide to do
this, can we add a check like the one below?

	if (idx_present >= nr_present_sections) {
		pr_err("idx_present goes beyond nr_present_sections, xxxx \n");
		break;
	}

> 
> #define for_each_present_section_nr(start, section_nr)          \
>         for (section_nr = next_present_section_nr(start-1);     \
>              ((section_nr >= 0) &&                              \
>               (section_nr < NR_MEM_SECTIONS) &&                 \                                                                                 
>               (section_nr <= __highest_present_section_nr));    \
>              section_nr = next_present_section_nr(section_nr))
> 
> > 
> > 
> > > @@ -583,6 +592,7 @@ void __init sparse_init(void)
> > >  	unsigned long *usemap;
> > >  	unsigned long **usemap_map;
> > >  	int size;
> > > +	int idx_present = 0;
> > 
> > I wonder whether idx_present is a good name.  Isn't it the number of
> > consumed mem_map[]s or usemaps?
> 
> Yeah, in sparse_init(), it's the index of present memory sections, and
> also the number of consumed mem_map[]s or usemaps. And I remember you
> suggested nr_consumed_maps instead. seems nr_consumed_maps is a little
> long to index array to make code line longer than 80 chars. How about
> name it idx_present in sparse_init(), nr_consumed_maps in
> alloc_usemap_and_memmap(), the maps allocation function? I am also fine
> to use nr_consumed_maps for all of them.
> 
> > 
> > > 
> > >  		if (!map) {
> > >  			ms->section_mem_map = 0;
> > > +			idx_present++;
> > >  			continue;
> > >  		}
> > >  
> > 
> > 
> > This hunk seems logically odd to me.  I would expect a non-used section
> > to *not* consume an entry from the temporary array.  Why does it?  The
> > error and success paths seem to do the same thing.
> 
> Yes, this place is the hardest to understand. The temorary arrays are
> allocated beforehand with the size of 'nr_present_sections'. The error
> paths you mentioned is caused by allocation failure of mem_map or
> map_map, but whatever it's error or success paths, the sections must be
> marked as present in memory_present(). Error or success paths happened
> in alloc_usemap_and_memmap(), while checking if it's erorr or success
> paths happened in the last for_each_present_section_nr() of
> sparse_init(), and clear the ms->section_mem_map if it goes along error
> paths. This is the key point of this new allocation way.
> 
> Thanks
> Baoquan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-04-08  6:50     ` Baoquan He
@ 2018-04-09 16:02       ` Dave Hansen
  2018-04-10  0:26         ` Baoquan He
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2018-04-09 16:02 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, akpm, pagupta, linux-mm, kirill.shutemov

On 04/07/2018 11:50 PM, Baoquan He wrote:
>> Should the " = 0" instead be clearing SECTION_MARKED_PRESENT or
>> something?  That would make it easier to match the code up with the code
>> that it is effectively undoing.
> 
> Not sure if I understand your question correctly. From memory_present(),
> information encoded into ms->section_mem_map including numa node,
> SECTION_IS_ONLINE and SECTION_MARKED_PRESENT. Not sure if it's OK to only
> clear SECTION_MARKED_PRESENT.  People may wrongly check SECTION_IS_ONLINE
> and do something on this memory section?

What I mean is that, instead of:

	
	ms->section_mem_map = 0;

we could literally do:

	ms->section_mem_map &= ~SECTION_MARKED_PRESENT;

That does the same thing in practice, but makes the _intent_ much more
clear.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing
  2018-04-09 16:02       ` Dave Hansen
@ 2018-04-10  0:26         ` Baoquan He
  0 siblings, 0 replies; 17+ messages in thread
From: Baoquan He @ 2018-04-10  0:26 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, akpm, pagupta, linux-mm, kirill.shutemov

On 04/09/18 at 09:02am, Dave Hansen wrote:
> On 04/07/2018 11:50 PM, Baoquan He wrote:
> >> Should the " = 0" instead be clearing SECTION_MARKED_PRESENT or
> >> something?  That would make it easier to match the code up with the code
> >> that it is effectively undoing.
> > 
> > Not sure if I understand your question correctly. From memory_present(),
> > information encoded into ms->section_mem_map including numa node,
> > SECTION_IS_ONLINE and SECTION_MARKED_PRESENT. Not sure if it's OK to only
> > clear SECTION_MARKED_PRESENT.  People may wrongly check SECTION_IS_ONLINE
> > and do something on this memory section?
> 
> What is mean is that, instead of:

I mean that in memory_present() all present sections are marked with
the information below.

	ms->section_mem_map = (nid << SECTION_NID_SHIFT) |
			      SECTION_MARKED_PRESENT |
			      SECTION_IS_ONLINE;

Later in sparse_init(), if we fail to allocate a mem map, the
corresponding section needs its ->section_mem_map cleared. The existing
code does the clearing with:

	ms->section_mem_map = 0;

With 'ms->section_mem_map &= ~SECTION_MARKED_PRESENT', the nid and
SECTION_IS_ONLINE would still be left in ms->section_mem_map. Someone
might mistakenly check whether this section is online and act on it, or
still read the nid from it. That is my worry.
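
For reference, the low bits of ->section_mem_map in kernels of this era
are encoded approximately as below (from include/linux/mmzone.h; treat
the exact bit positions as illustrative):

#define SECTION_MARKED_PRESENT	(1UL<<0)
#define SECTION_HAS_MEM_MAP	(1UL<<1)
#define SECTION_IS_ONLINE	(1UL<<2)
#define SECTION_NID_SHIFT	3

so clearing only SECTION_MARKED_PRESENT would indeed leave the nid and
the online bit in place.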

> 
> 	
> 	ms->section_mem_map = 0;
> 
> we could literally do:
> 
> 	ms->section_mem_map &= ~SECTION_MARKED_PRESENT;
> 
> That does the same thing in practice, but makes the _intent_ much more
> clear.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-04-08  8:20     ` Baoquan He
  2018-04-09  2:07       ` Baoquan He
@ 2018-04-11 15:48       ` Dave Hansen
  2018-04-15  2:19         ` Baoquan He
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2018-04-11 15:48 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, akpm, pagupta, linux-mm, kirill.shutemov

On 04/08/2018 01:20 AM, Baoquan He wrote:
> On 04/06/18 at 07:50am, Dave Hansen wrote:
>> The code looks fine to me.  It's a bit of a shame that there's no
>> verification to ensure that idx_present never goes beyond the shiny new
>> nr_present_sections. 
> 
> This is a good point. Do you think it's OK to replace (section_nr <
> NR_MEM_SECTIONS) with (section_nr < nr_present_sections) in below
> for_each macro? This for_each_present_section_nr() is only used
> during sparse_init() execution.
> 
> #define for_each_present_section_nr(start, section_nr)          \
>         for (section_nr = next_present_section_nr(start-1);     \
>              ((section_nr >= 0) &&                              \
>               (section_nr < NR_MEM_SECTIONS) &&                 \                                                                                 
>               (section_nr <= __highest_present_section_nr));    \
>              section_nr = next_present_section_nr(section_nr))

I was more concerned about the loops that "consume" the section maps.
It seems like they might run over the end of the array.

>>> @@ -583,6 +592,7 @@ void __init sparse_init(void)
>>>  	unsigned long *usemap;
>>>  	unsigned long **usemap_map;
>>>  	int size;
>>> +	int idx_present = 0;
>>
>> I wonder whether idx_present is a good name.  Isn't it the number of
>> consumed mem_map[]s or usemaps?
> 
> Yeah, in sparse_init(), it's the index of present memory sections, and
> also the number of consumed mem_map[]s or usemaps. And I remember you
> suggested nr_consumed_maps instead. seems nr_consumed_maps is a little
> long to index array to make code line longer than 80 chars. How about
> name it idx_present in sparse_init(), nr_consumed_maps in
> alloc_usemap_and_memmap(), the maps allocation function? I am also fine
> to use nr_consumed_maps for all of them.

Does the large array index make a bunch of lines wrap or something?  If
not, I'd just use the long name.

>>>  		if (!map) {
>>>  			ms->section_mem_map = 0;
>>> +			idx_present++;
>>>  			continue;
>>>  		}
>>>  
>>
>>
>> This hunk seems logically odd to me.  I would expect a non-used section
>> to *not* consume an entry from the temporary array.  Why does it?  The
>> error and success paths seem to do the same thing.
> 
> Yes, this place is the hardest to understand. The temorary arrays are
> allocated beforehand with the size of 'nr_present_sections'. The error
> paths you mentioned is caused by allocation failure of mem_map or
> map_map, but whatever it's error or success paths, the sections must be
> marked as present in memory_present(). Error or success paths happened
> in alloc_usemap_and_memmap(), while checking if it's erorr or success
> paths happened in the last for_each_present_section_nr() of
> sparse_init(), and clear the ms->section_mem_map if it goes along error
> paths. This is the key point of this new allocation way.

I think you owe some commenting because this is so hard to understand.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-04-11 15:48       ` Dave Hansen
@ 2018-04-15  2:19         ` Baoquan He
  2018-04-16  4:36           ` Dave Hansen
  0 siblings, 1 reply; 17+ messages in thread
From: Baoquan He @ 2018-04-15  2:19 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, akpm, pagupta, linux-mm, kirill.shutemov

Hi Dave,

Sorry for late reply.

On 04/11/18 at 08:48am, Dave Hansen wrote:
> On 04/08/2018 01:20 AM, Baoquan He wrote:
> > On 04/06/18 at 07:50am, Dave Hansen wrote:
> >> The code looks fine to me.  It's a bit of a shame that there's no
> >> verification to ensure that idx_present never goes beyond the shiny new
> >> nr_present_sections. 
> > 
> > This is a good point. Do you think it's OK to replace (section_nr <
> > NR_MEM_SECTIONS) with (section_nr < nr_present_sections) in below
> > for_each macro? This for_each_present_section_nr() is only used
> > during sparse_init() execution.
> > 
> > #define for_each_present_section_nr(start, section_nr)          \
> >         for (section_nr = next_present_section_nr(start-1);     \
> >              ((section_nr >= 0) &&                              \
> >               (section_nr < NR_MEM_SECTIONS) &&                 \                                                                                 
> >               (section_nr <= __highest_present_section_nr));    \
> >              section_nr = next_present_section_nr(section_nr))
> 
> I was more concerned about the loops that "consume" the section maps.
> It seems like they might run over the end of the array.



> 
> >>> @@ -583,6 +592,7 @@ void __init sparse_init(void)
> >>>  	unsigned long *usemap;
> >>>  	unsigned long **usemap_map;
> >>>  	int size;
> >>> +	int idx_present = 0;
> >>
> >> I wonder whether idx_present is a good name.  Isn't it the number of
> >> consumed mem_map[]s or usemaps?
> > 
> > Yeah, in sparse_init(), it's the index of present memory sections, and
> > also the number of consumed mem_map[]s or usemaps. And I remember you
> > suggested nr_consumed_maps instead. seems nr_consumed_maps is a little
> > long to index array to make code line longer than 80 chars. How about
> > name it idx_present in sparse_init(), nr_consumed_maps in
> > alloc_usemap_and_memmap(), the maps allocation function? I am also fine
> > to use nr_consumed_maps for all of them.
> 
> Does the large array index make a bunch of lines wrap or something?  If
> not, I'd just use the long name.

I am fine with the long name, and will use the 'nr_consumed_maps' you
suggested earlier as the replacement.

> 
> >>>  		if (!map) {
> >>>  			ms->section_mem_map = 0;
> >>> +			idx_present++;
> >>>  			continue;
> >>>  		}
> >>>  
> >>
> >>
> >> This hunk seems logically odd to me.  I would expect a non-used section
> >> to *not* consume an entry from the temporary array.  Why does it?  The
> >> error and success paths seem to do the same thing.
> > 
> > Yes, this place is the hardest to understand. The temorary arrays are
> > allocated beforehand with the size of 'nr_present_sections'. The error
> > paths you mentioned is caused by allocation failure of mem_map or
> > map_map, but whatever it's error or success paths, the sections must be
> > marked as present in memory_present(). Error or success paths happened
> > in alloc_usemap_and_memmap(), while checking if it's erorr or success
> > paths happened in the last for_each_present_section_nr() of
> > sparse_init(), and clear the ms->section_mem_map if it goes along error
> > paths. This is the key point of this new allocation way.
> 
> I think you owe some commenting because this is so hard to understand.

I can write a code comment above sparse_init() based on this patch's
git log; do you think that is OK?

Honestly, it took me several days to write the code, while I spent more
than a week writing the patch log. Writing patch logs is a real
headache for me.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
  2018-04-15  2:19         ` Baoquan He
@ 2018-04-16  4:36           ` Dave Hansen
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Hansen @ 2018-04-16  4:36 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, akpm, pagupta, linux-mm, kirill.shutemov

On 04/14/2018 07:19 PM, Baoquan He wrote:
>>> Yes, this place is the hardest to understand. The temorary arrays are
>>> allocated beforehand with the size of 'nr_present_sections'. The error
>>> paths you mentioned is caused by allocation failure of mem_map or
>>> map_map, but whatever it's error or success paths, the sections must be
>>> marked as present in memory_present(). Error or success paths happened
>>> in alloc_usemap_and_memmap(), while checking if it's erorr or success
>>> paths happened in the last for_each_present_section_nr() of
>>> sparse_init(), and clear the ms->section_mem_map if it goes along error
>>> paths. This is the key point of this new allocation way.
>> I think you owe some commenting because this is so hard to understand.
> I can arrange and write a code comment above sparse_init() according to
> this patch's git log, do you think it's OK?
> 
> Honestly, it took me several days to write code, while I spent more
> than one week to write the patch log. Writing patch log is really a
> headache to me.

I often find the same: writing the code is the easy part.  Explaining
why it is right is the hard part.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-04-16  4:36 UTC | newest]

Thread overview: 17+ messages
2018-02-28  3:26 [PATCH v3 0/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
2018-02-28  3:26 ` [PATCH v3 1/4] mm/sparse: Add a static variable nr_present_sections Baoquan He
2018-02-28  3:26 ` [PATCH v3 2/4] mm/sparsemem: Defer the ms->section_mem_map clearing Baoquan He
2018-04-06 14:23   ` Dave Hansen
2018-04-08  6:50     ` Baoquan He
2018-04-09 16:02       ` Dave Hansen
2018-04-10  0:26         ` Baoquan He
2018-02-28  3:26 ` [PATCH v3 3/4] mm/sparse: Add a new parameter 'data_unit_size' for alloc_usemap_and_memmap Baoquan He
2018-02-28  3:26 ` [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init() Baoquan He
2018-04-06 14:50   ` Dave Hansen
2018-04-08  8:20     ` Baoquan He
2018-04-09  2:07       ` Baoquan He
2018-04-11 15:48       ` Dave Hansen
2018-04-15  2:19         ` Baoquan He
2018-04-16  4:36           ` Dave Hansen
2018-04-05 22:08 ` [PATCH v3 0/4] " Andrew Morton
2018-04-06 11:05   ` Kirill A. Shutemov
