* [PATCH 1/1] numa, mm, memory-hotplug: Do not allocate pagetable to local node with MEMORY_HOTREMOVE enabled.
@ 2013-05-16 11:50 Tang Chen
From: Tang Chen @ 2013-05-16 11:50 UTC (permalink / raw)
  To: yinghai, tglx, mingo, hpa, penberg, jacob.shin, akpm, isimatu.yasuaki
  Cc: x86, linux-kernel

The following patch set allocates pagetables on the local node:
https://lkml.org/lkml/2013/4/11/829

Doing this will break memory hot-remove.

Before memory can be removed, the kernel must offline it. If offlining
fails, the memory cannot be removed. Pagetable pages are in use by the
kernel, so they cannot be offlined, and therefore the memory holding
them cannot be removed either.

Of course, we could free the pagetable pages, because the pagetables
that map the memory about to be removed are no longer needed. But
offlining memory does not mean removing it. If users only want to
offline memory, the pagetables should not be freed.

The minimum unit of memory online/offline is a block. By default, one
block contains one section, which by default is 128MB. It is possible
that half of a block holds pagetables and the other half is movable
memory.

When we offline this kind of block, the status of the block is
uncertain. We cannot simply free the pagetables in this block because
they may be used by other online blocks. But during memory hot-remove,
a failure to offline any block breaks the hot-remove logic.
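
To make the failure mode concrete, here is a minimal user-space sketch
of the rule described above. It is not kernel code: the type and
function names (mem_block, block_can_offline, can_hot_remove) are
hypothetical, and it assumes 4KB pages, so a 128MB block holds 32768
pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sizes: 128MB block, 4KB pages (assumed defaults). */
#define BLOCK_SIZE      (128UL << 20)
#define PAGE_SIZE_4K    (4UL << 10)
#define PAGES_PER_BLOCK (BLOCK_SIZE / PAGE_SIZE_4K)	/* 32768 */

/* Hypothetical page classification, not a kernel type. */
enum page_kind { PAGE_FREE, PAGE_MOVABLE, PAGE_PAGETABLE };

struct mem_block {
	enum page_kind pages[PAGES_PER_BLOCK];
};

/* Mark pages [start, end) of a block as one kind. */
static void block_fill(struct mem_block *blk, size_t start, size_t end,
		       enum page_kind kind)
{
	assert(start <= end && end <= PAGES_PER_BLOCK);
	for (size_t i = start; i < end; i++)
		blk->pages[i] = kind;
}

/*
 * A block can be offlined only if every page is free or can be
 * migrated away. A single pagetable page pins the whole block.
 */
static bool block_can_offline(const struct mem_block *blk)
{
	for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
		if (blk->pages[i] == PAGE_PAGETABLE)
			return false;
	return true;
}

/* Hot-remove needs every block of the device to be offlinable. */
static bool can_hot_remove(const struct mem_block *blks, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (!block_can_offline(&blks[i]))
			return false;
	return true;
}
```

With one block that is fully movable and one block that is half
pagetable and half movable, block_can_offline() fails for the second
block, and so can_hot_remove() fails for the pair.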


To fix this, there are three possible solutions:

1. Reserve the whole block (128MB) so that nothing else can use the
   rest of it, and skip it when offlining memory.
   When all the other blocks are offlined, free the pagetables and
   remove all the memory.

   But we lose some memory this way, and 128MB is rather a lot to
   waste.


2. Keep this block online. Although the offline operation fails, it is
   OK to remove the memory.

   But then the offline operation will always fail. And since offlining
   can fail for many reasons, it is difficult to tell whether it is
   really OK to remove the memory. So we don't suggest this approach.


3. Migrate user pages and mark this block offline. Offlining memory
   does not stop the kernel from using the pagetables stored in it, so
   this would work.

   But this changes the semantics of "offline". I'm not sure whether
   that is acceptable.


So until this problem is fixed, I think we should not allocate
pagetables on the local node when CONFIG_MEMORY_HOTREMOVE is enabled,
and restore that behaviour once we have agreed on a direction and fixed
the problem.
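
As a rough illustration of the intended policy, the range selection can
be modelled in user space as follows. This is only a sketch: pfn_range
and pick_pgt_range() are hypothetical names, and the config option is
modelled as a runtime flag, whereas the diff below makes the choice at
compile time with an #ifndef on CONFIG_MEMORY_HOTPLUG.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a [min_pfn, max_pfn) mapped range. */
struct pfn_range {
	unsigned long min_pfn;
	unsigned long max_pfn;
};

static bool range_is_valid(struct pfn_range r)
{
	return r.max_pfn > r.min_pfn;
}

/*
 * Pick the range that pagetable pages are allocated from.
 * With hot-remove support, always fall back to the low range so
 * that no pagetable page lands in memory that may be removed later.
 */
static struct pfn_range pick_pgt_range(struct pfn_range local,
				       struct pfn_range low,
				       bool hotremove_enabled)
{
	if (!hotremove_enabled && range_is_valid(local))
		return local;

	/* An empty low range is fatal; the kernel code panics here. */
	assert(range_is_valid(low));
	return low;
}
```

In this model, a valid local range is used only when the flag is off.
With the flag on, or with an empty local range, the low range is
returned.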

This patch is based on
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm

Any other solution for this problem is welcome.


Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/mm/init.c |   27 ++++++++++++++++-----------
 1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8d0007a..8cd8a2d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -55,18 +55,23 @@ __ref void *alloc_low_pages(unsigned int num)
 
 	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
 		unsigned long ret;
-		if (local_min_pfn_mapped >= local_max_pfn_mapped) {
+#ifndef CONFIG_MEMORY_HOTPLUG
+		if (local_max_pfn_mapped > local_min_pfn_mapped) {
+			ret = memblock_find_in_range(
+					local_min_pfn_mapped << PAGE_SHIFT,
+					local_max_pfn_mapped << PAGE_SHIFT,
+					PAGE_SIZE * num , PAGE_SIZE);
+		} else
+#endif
+		{
 			if (low_min_pfn_mapped >= low_max_pfn_mapped)
 				panic("alloc_low_page: ran out of memory");
 			ret = memblock_find_in_range(
 					low_min_pfn_mapped << PAGE_SHIFT,
 					low_max_pfn_mapped << PAGE_SHIFT,
 					PAGE_SIZE * num , PAGE_SIZE);
-		} else
-			ret = memblock_find_in_range(
-					local_min_pfn_mapped << PAGE_SHIFT,
-					local_max_pfn_mapped << PAGE_SHIFT,
-					PAGE_SIZE * num , PAGE_SIZE);
+		}
+
 		if (!ret)
 			panic("alloc_low_page: can not alloc memory");
 		memblock_reserve(ret, PAGE_SIZE * num);
@@ -443,6 +448,11 @@ void __init init_mem_mapping(unsigned long begin, unsigned long end)
 		if (new_mapped_ram_size > mapped_ram_size)
 			step_size <<= STEP_SIZE_SHIFT;
 		mapped_ram_size += new_mapped_ram_size;
+
+		if (is_low) {
+			low_min_pfn_mapped = local_min_pfn_mapped;
+			low_max_pfn_mapped = local_max_pfn_mapped;
+		}
 	}
 
 	if (real_end < end) {
@@ -450,11 +460,6 @@ void __init init_mem_mapping(unsigned long begin, unsigned long end)
 		if ((end >> PAGE_SHIFT) > local_max_pfn_mapped)
 			local_max_pfn_mapped = end >> PAGE_SHIFT;
 	}
-
-	if (is_low) {
-		low_min_pfn_mapped = local_min_pfn_mapped;
-		low_max_pfn_mapped = local_max_pfn_mapped;
-	}
 }
 
 #ifndef CONFIG_NUMA
-- 
1.7.1


* Re: [PATCH 1/1] numa, mm, memory-hotplug: Do not allocate pagetable to local node with MEMORY_HOTREMOVE enabled.
@ 2013-05-21  7:02 ` Pekka Enberg
From: Pekka Enberg @ 2013-05-21  7:02 UTC (permalink / raw)
  To: Tang Chen
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jacob Shin, Andrew Morton, Yasuaki Ishimatsu, x86 maintainers,
	LKML

On Thu, May 16, 2013 at 2:50 PM, Tang Chen <tangchen@cn.fujitsu.com> wrote:
> [...]

Ugh. Special-casing for CONFIG_MEMORY_HOTPLUG is just begging for
trouble. Were you able to determine which commit broke memory
hot-remove?

* Re: [PATCH 1/1] numa, mm, memory-hotplug: Do not allocate pagetable to local node with MEMORY_HOTREMOVE enabled.
@ 2013-05-21  7:12   ` Tang Chen
From: Tang Chen @ 2013-05-21  7:12 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jacob Shin, Andrew Morton, Yasuaki Ishimatsu, x86 maintainers,
	LKML

Hi

On 05/21/2013 03:02 PM, Pekka Enberg wrote:
......
>
> Ugh. Special-casing for CONFIG_MEMORY_HOTPLUG is just begging for
> trouble. Were you able to determine which commit broke memory
> hot-remove?

Please refer to the following patch set:
https://lkml.org/lkml/2013/4/11/829

Patches 21 and 22 allocate pagetables on the local node, which may
cause memory hot-remove to fail.

But this patch set is not in mainline yet.

Thanks. :)


