linux-kernel.vger.kernel.org archive mirror
* [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
@ 2014-08-11 13:31 Xishi Qiu
  2014-08-16 13:04 ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Xishi Qiu @ 2014-08-11 13:31 UTC (permalink / raw)
  To: Andrew Morton, Tang Chen, Zhang Yanfei, tj, Wen Congyang,
	Rafael J. Wysocki, H. Peter Anvin
  Cc: Linux MM, LKML, Xishi Qiu

Let memblock skip the hotpluggable memory regions in __next_mem_range().
This prevents memblock from allocating hotpluggable memory for the kernel
during early boot. The check is the same as in __next_mem_range_rev().

Clear the hotpluggable flag before releasing free pages to the buddy
allocator.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 mm/memblock.c  |    4 ++++
 mm/nobootmem.c |    2 ++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 6d2f219..5090050 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -817,6 +817,10 @@ void __init_memblock __next_mem_range(u64 *idx, int nid,
 		if (nid != NUMA_NO_NODE && nid != m_nid)
 			continue;
 
+		/* skip hotpluggable memory regions if needed */
+		if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
+			continue;
+
 		if (!type_b) {
 			if (out_start)
 				*out_start = m_start;
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 7ed5860..03de286 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -119,6 +119,8 @@ static unsigned long __init free_low_memory_core_early(void)
 	phys_addr_t start, end;
 	u64 i;
 
+	memblock_clear_hotplug(0, ULLONG_MAX);
+
 	for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
 		count += __free_memory_core(start, end);
 
-- 
1.7.1




* Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
  2014-08-11 13:31 [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range() Xishi Qiu
@ 2014-08-16 13:04 ` Tejun Heo
  2014-08-16 14:36   ` Xishi Qiu
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2014-08-16 13:04 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, Tang Chen, Zhang Yanfei, Wen Congyang,
	Rafael J. Wysocki, H. Peter Anvin, Linux MM, LKML

On Mon, Aug 11, 2014 at 09:31:22PM +0800, Xishi Qiu wrote:
> Let memblock skip the hotpluggable memory regions in __next_mem_range(),
> it is used to prevent memblock from allocating hotpluggable memory
> for the kernel at early time. The code is the same as __next_mem_range_rev().
> 
> Clear hotpluggable flag before releasing free pages to the buddy allocator.

Please try to explain "why" in addition to "what".  Why do we need to
clear hotpluggable flag in free_low_memory_core_early() in addition to
numa_clear_node_hotplug() in x86 numa.c?  Does this make x86 code
redundant?  If not, why?

-- 
tejun


* Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
  2014-08-16 13:04 ` Tejun Heo
@ 2014-08-16 14:36   ` Xishi Qiu
  2014-08-17 11:08     ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Xishi Qiu @ 2014-08-16 14:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Tang Chen, Zhang Yanfei, Wen Congyang,
	Rafael J. Wysocki, H. Peter Anvin, Linux MM, LKML

On 2014/8/16 21:04, Tejun Heo wrote:

> On Mon, Aug 11, 2014 at 09:31:22PM +0800, Xishi Qiu wrote:
>> Let memblock skip the hotpluggable memory regions in __next_mem_range(),
>> it is used to prevent memblock from allocating hotpluggable memory
>> for the kernel at early time. The code is the same as __next_mem_range_rev().
>>
>> Clear hotpluggable flag before releasing free pages to the buddy allocator.
> 
> Please try to explain "why" in addition to "what".  Why do we need to
> clear hotpluggable flag in free_low_memory_core_early() in addition to
> numa_clear_node_hotplug() in x86 numa.c?  Does this make x86 code
> redundant?  If not, why?
> 

Hi Tejun,

numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().

If we don't clear the hotpluggable flag in free_low_memory_core_early(), the
memory marked with the hotpluggable flag will not be freed to the buddy
allocator, because __next_mem_range() will skip it.

free_low_memory_core_early
	for_each_free_mem_range
		for_each_mem_range
			__next_mem_range		
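
A minimal userspace sketch of the effect described above, not kernel code.
The names sim_region, sim_next_region(), sim_clear_all_hotplug(),
sim_free_all() and demo_freed() are made up for illustration; they only
model the behaviour of __next_mem_range(), memblock_clear_hotplug() and
free_low_memory_core_early() under the assumption that movable_node is
enabled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a memblock region; not the kernel's struct. */
struct sim_region {
	unsigned long start;
	unsigned long size;
	bool hotpluggable;
};

/*
 * Model of the iterator after the patch: return the index of the next
 * usable region at or after idx, or -1 when the walk is done.
 * Hotpluggable regions are skipped when movable_node is enabled.
 */
int sim_next_region(const struct sim_region *r, size_t n, size_t idx,
		    bool movable_node)
{
	for (; idx < n; idx++) {
		if (movable_node && r[idx].hotpluggable)
			continue;
		return (int)idx;
	}
	return -1;
}

/* Model of memblock_clear_hotplug(0, ULLONG_MAX): drop every flag. */
void sim_clear_all_hotplug(struct sim_region *r, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		r[i].hotpluggable = false;
}

/* Model of the free loop: sum the sizes the walk hands out. */
unsigned long sim_free_all(const struct sim_region *r, size_t n,
			   bool movable_node)
{
	unsigned long freed = 0;
	int i;

	for (i = sim_next_region(r, n, 0, movable_node); i >= 0;
	     i = sim_next_region(r, n, (size_t)i + 1, movable_node))
		freed += r[i].size;
	return freed;
}

/* Two regions, the second one hotpluggable; returns the bytes freed. */
unsigned long demo_freed(bool clear_first)
{
	struct sim_region r[] = {
		{ 0x0000, 0x1000, false },
		{ 0x1000, 0x3000, true },
	};

	if (clear_first)
		sim_clear_all_hotplug(r, 2);
	return sim_free_all(r, 2, true);
}
```

In this model, demo_freed(false) only hands the first region to the free
loop, while demo_freed(true) hands over both, which is the reason given
for the memblock_clear_hotplug() call in the patch.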

Thanks,
Xishi Qiu



* Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
  2014-08-16 14:36   ` Xishi Qiu
@ 2014-08-17 11:08     ` Tejun Heo
  2014-08-18  1:13       ` tangchen
  2014-08-18  2:00       ` Xishi Qiu
  0 siblings, 2 replies; 8+ messages in thread
From: Tejun Heo @ 2014-08-17 11:08 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andrew Morton, Tang Chen, Zhang Yanfei, Wen Congyang,
	Rafael J. Wysocki, H. Peter Anvin, Linux MM, LKML

Hello,

On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:
> numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().

Yeah, that one.

> If we don't clear hotpluggable flag in free_low_memory_core_early(), the 
> memory which marked hotpluggable flag will not free to buddy allocator.
> Because __next_mem_range() will skip them.
> 
> free_low_memory_core_early
> 	for_each_free_mem_range
> 		for_each_mem_range
> 			__next_mem_range		

Ah, okay, so the patch fixes __next_mem_range() and thus makes
free_low_memory_core_early() skip hotpluggable regions, unlike
before.  Please explain things like that in the changelog.  Also,
what's its relationship with numa_clear_kernel_node_hotplug()?  Do we
still need them?  If so, what are the different roles that these two
separate places serve?

Thanks.

-- 
tejun


* Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
  2014-08-17 11:08     ` Tejun Heo
@ 2014-08-18  1:13       ` tangchen
  2014-08-18  3:18         ` Xishi Qiu
  2014-08-18  2:00       ` Xishi Qiu
  1 sibling, 1 reply; 8+ messages in thread
From: tangchen @ 2014-08-18  1:13 UTC (permalink / raw)
  To: Tejun Heo, Xishi Qiu
  Cc: Andrew Morton, Zhang Yanfei, Wen Congyang, Rafael J. Wysocki,
	H. Peter Anvin, Linux MM, LKML, tangchen

Hi tj,

On 08/17/2014 07:08 PM, Tejun Heo wrote:
> Hello,
>
> On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:
>> numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().
> Yeah, that one.
>
>> If we don't clear hotpluggable flag in free_low_memory_core_early(), the
>> memory which marked hotpluggable flag will not free to buddy allocator.
>> Because __next_mem_range() will skip them.
>>
>> free_low_memory_core_early
>> 	for_each_free_mem_range
>> 		for_each_mem_range
>> 			__next_mem_range		
> Ah, okay, so the patch fixes __next_mem_range() and thus makes
> free_low_memory_core_early() to skip hotpluggable regions unlike
> before.  Please explain things like that in the changelog.  Also,
> what's its relationship with numa_clear_kernel_node_hotplug()?  Do we
> still need them?  If so, what are the different roles that these two
> separate places serve?

numa_clear_kernel_node_hotplug() only clears the hotplug flags for the nodes
the kernel resides in, not for hotpluggable nodes. We do this so that the
kernel can still allocate memory in case all the nodes are hotpluggable.

We clear the hotplug flags for all the nodes in free_low_memory_core_early()
because, if we do not, none of the hotpluggable memory can be freed to
buddy after Qiu's patch.
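
A rough userspace sketch of the first role described above, assuming a
simplified model in which each memory region carries a node id and the
kernel's nodes are given as a plain list. The names node_region,
clear_kernel_node_hotplug(), count_hotpluggable() and
demo_remaining_hotpluggable() are hypothetical and only illustrate the
idea; this is not the code in arch/x86/mm/numa.c.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_NODES 8

/* Simplified region: only the node id and the hotplug flag matter here. */
struct node_region {
	int nid;
	bool hotpluggable;
};

/*
 * Clear the hotpluggable flag only on regions whose node already holds
 * kernel memory. kernel_nids lists those node ids.
 */
void clear_kernel_node_hotplug(struct node_region *mem, size_t n_mem,
			       const int *kernel_nids, size_t n_kernel)
{
	bool kernel_node[SIM_MAX_NODES] = { false };
	size_t i;

	/* Remember which nodes the kernel resides in. */
	for (i = 0; i < n_kernel; i++)
		if (kernel_nids[i] >= 0 && kernel_nids[i] < SIM_MAX_NODES)
			kernel_node[kernel_nids[i]] = true;

	/* Regions on those nodes stop being treated as hotpluggable. */
	for (i = 0; i < n_mem; i++)
		if (mem[i].nid >= 0 && mem[i].nid < SIM_MAX_NODES &&
		    kernel_node[mem[i].nid])
			mem[i].hotpluggable = false;
}

/* Count the regions that still carry the hotpluggable flag. */
size_t count_hotpluggable(const struct node_region *mem, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (mem[i].hotpluggable)
			count++;
	return count;
}

/* Three hotpluggable nodes, kernel on node 0 only. */
int demo_remaining_hotpluggable(void)
{
	struct node_region mem[] = { { 0, true }, { 1, true }, { 2, true } };
	int kernel_nids[] = { 0 };

	clear_kernel_node_hotplug(mem, 3, kernel_nids, 1);
	return (int)count_hotpluggable(mem, 3);
}
```

In this model nodes 1 and 2 stay hotpluggable after the call, so early
allocations are steered to node 0, while a later clear over all regions
is still needed before everything can be freed to buddy.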

Thanks.



* Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
  2014-08-17 11:08     ` Tejun Heo
  2014-08-18  1:13       ` tangchen
@ 2014-08-18  2:00       ` Xishi Qiu
  1 sibling, 0 replies; 8+ messages in thread
From: Xishi Qiu @ 2014-08-18  2:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Tang Chen, Zhang Yanfei, Wen Congyang,
	Rafael J. Wysocki, H. Peter Anvin, Linux MM, LKML

On 2014/8/17 19:08, Tejun Heo wrote:

> Hello,
> 
> On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:
>> numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().
> 
> Yeah, that one.
> 
>> If we don't clear hotpluggable flag in free_low_memory_core_early(), the 
>> memory which marked hotpluggable flag will not free to buddy allocator.
>> Because __next_mem_range() will skip them.
>>
>> free_low_memory_core_early
>> 	for_each_free_mem_range
>> 		for_each_mem_range
>> 			__next_mem_range		
> 
> Ah, okay, so the patch fixes __next_mem_range() and thus makes
> free_low_memory_core_early() to skip hotpluggable regions unlike
> before.  Please explain things like that in the changelog.  Also,

OK, I will send V2.

Thanks,
Xishi Qiu

> what's its relationship with numa_clear_kernel_node_hotplug()?  Do we
> still need them?  If so, what are the different roles that these two
> separate places serve?
> 
> Thanks.
> 





* Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
  2014-08-18  1:13       ` tangchen
@ 2014-08-18  3:18         ` Xishi Qiu
  2014-08-18 13:13           ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Xishi Qiu @ 2014-08-18  3:18 UTC (permalink / raw)
  To: tangchen
  Cc: Tejun Heo, Andrew Morton, Zhang Yanfei, Wen Congyang,
	H. Peter Anvin, Linux MM, LKML

On 2014/8/18 9:13, tangchen wrote:

> Hi tj,
> 
> On 08/17/2014 07:08 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:
>>> numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().
>> Yeah, that one.
>>
>>> If we don't clear hotpluggable flag in free_low_memory_core_early(), the
>>> memory which marked hotpluggable flag will not free to buddy allocator.
>>> Because __next_mem_range() will skip them.
>>>
>>> free_low_memory_core_early
>>>     for_each_free_mem_range
>>>         for_each_mem_range
>>>             __next_mem_range       
>> Ah, okay, so the patch fixes __next_mem_range() and thus makes
>> free_low_memory_core_early() to skip hotpluggable regions unlike
>> before.  Please explain things like that in the changelog.  Also,
>> what's its relationship with numa_clear_kernel_node_hotplug()?  Do we
>> still need them?  If so, what are the different roles that these two
>> separate places serve?
> 
> numa_clear_kernel_node_hotplug() only clears hotplug flags for the nodes
> the kernel resides in, not for hotpluggable nodes. The reason why we did
> this is to enable the kernel to allocate memory in case all the nodes are
> hotpluggable.
> 

Hi TangChen,

I found a problem in numa_init() (arch/x86/mm/numa.c):
numa_init()
	...
	ret = init_func();  // this will mark hotpluggable flag from SRAT
	...
	memblock_set_bottom_up(false);
	...
	ret = numa_register_memblks(&numa_meminfo);  // this will alloc node data(pglist_data) 
	...
	numa_clear_kernel_node_hotplug();  // in case all the nodes are hotpluggable
	...

If all the nodes are marked with the hotpluggable flag, allocating the node
data will fail, because __next_mem_range_rev() will skip the hotpluggable
memory regions.
numa_register_memblks()
	setup_node_data()
		memblock_find_in_range_node()
			__memblock_find_range_top_down()
				for_each_mem_range_rev()
					__next_mem_range_rev()

What do you think?
How about moving numa_clear_kernel_node_hotplug() into
numa_register_memblks(), like this:

numa_register_memblks()

...
                memblock_set_node(mb->start, mb->end - mb->start,
                                  &memblock.reserved, mb->nid);
        }

+        numa_clear_kernel_node_hotplug();

        /*
         * If sections array is gonna be used for pfn -> nid mapping, check
...
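
The ordering problem can be illustrated with a small userspace model. This
is only a sketch under stated assumptions: sim_blk, sim_find_top_down() and
demo_alloc_node_data() are hypothetical names, node 0 is assumed to be the
node the kernel resides in, and a return value of 0 stands for "no memory
found".

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified memory block; not the kernel's memblock region. */
struct sim_blk {
	unsigned long start;
	unsigned long size;
	int nid;
	bool hotpluggable;
};

/*
 * Model of a top-down search: walk the blocks from the highest to the
 * lowest and skip hotpluggable ones. Returns the start address of the
 * allocation, or 0 if nothing usable was found.
 */
unsigned long sim_find_top_down(const struct sim_blk *b, size_t n,
				unsigned long want)
{
	size_t i = n;

	while (i-- > 0) {
		if (b[i].hotpluggable)
			continue;
		if (b[i].size >= want)
			return b[i].start + b[i].size - want;
	}
	return 0;
}

/*
 * Every node is marked hotpluggable. If clear_early is true, the flag of
 * the kernel's node (node 0 here) is cleared before the node data is
 * allocated; otherwise the allocation runs first and fails.
 */
unsigned long demo_alloc_node_data(bool clear_early)
{
	struct sim_blk b[] = {
		{ 0x100000, 0x100000, 0, true },
		{ 0x200000, 0x100000, 1, true },
	};

	if (clear_early)
		b[0].hotpluggable = false;
	return sim_find_top_down(b, 2, 0x1000);
}
```

With clear_early set to false the search finds nothing, which corresponds
to the failure in setup_node_data(); with it set to true the allocation is
placed at the top of node 0.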

Thanks,
Xishi Qiu

> And we clear hotplug flags for all the nodes in free_low_memory_core_early()
> is because if we do not, all hotpluggable memory won't be able to be freed
> to buddy after Qiu's patch.
> 
> Thanks.
> 
> 
> .
> 





* Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
  2014-08-18  3:18         ` Xishi Qiu
@ 2014-08-18 13:13           ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2014-08-18 13:13 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: tangchen, Andrew Morton, Zhang Yanfei, Wen Congyang,
	H. Peter Anvin, Linux MM, LKML

Hello, Xishi, Tang.

On Mon, Aug 18, 2014 at 11:18:00AM +0800, Xishi Qiu wrote:
> If all the nodes are marked hotpluggable flag, alloc node data will fail.
> Because __next_mem_range_rev() will skip the hotpluggable memory regions.
> numa_register_memblks()
> 	setup_node_data()
> 		memblock_find_in_range_node()
> 			__memblock_find_range_top_down()
> 				for_each_mem_range_rev()
> 					__next_mem_range_rev()

I'm not sure clearing hotplug flag for all memory is the best approach
here.  The problem is that there are places where we want to be
selectively ignoring the hotplug status and apparently we may want it
back later.  Why not add an argument to memblock allocation / iteration
functions so that hotplug area can be skipped selectively?
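
One way such an argument could look, as a userspace sketch only. The flag
names WALK_ALL and WALK_SKIP_HOTPLUG and the functions walk_total() and
demo_walk() are hypothetical and are not part of the memblock API discussed
in this thread; the point is that the caller, not a global state change,
decides whether hotpluggable regions are visited.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-call flags for the walk. */
enum walk_flags {
	WALK_ALL          = 0,	/* visit every region */
	WALK_SKIP_HOTPLUG = 1,	/* leave hotpluggable regions alone */
};

/* Simplified region; not the kernel's struct. */
struct walk_region {
	unsigned long size;
	bool hotpluggable;
};

/* Sum the sizes of the regions selected by flags. */
unsigned long walk_total(const struct walk_region *r, size_t n,
			 unsigned int flags)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((flags & WALK_SKIP_HOTPLUG) && r[i].hotpluggable)
			continue;
		total += r[i].size;
	}
	return total;
}

/* One normal and one hotpluggable region, walked with the given flags. */
unsigned long demo_walk(unsigned int flags)
{
	struct walk_region r[] = {
		{ 0x1000, false },
		{ 0x3000, true },
	};

	return walk_total(r, 2, flags);
}
```

An early allocator would pass WALK_SKIP_HOTPLUG, while the code that frees
memory to the buddy allocator would pass WALK_ALL, so the hotplug flags
never have to be cleared and remain available later.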

Thanks.

-- 
tejun

