* [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Gary Hade @ 2008-09-05 17:21 UTC
To: linux-mm
Cc: Andrew Morton, Yasunori Goto, Badari Pulavarty, Mel Gorman, Chris McDermott, Gary Hade, linux-kernel, x86, Ingo Molnar

Resending with linux-kernel@vger.kernel.org and x86@kernel.org
copied this time. No changes other than this and the modified
Subject line.

The only response so far on linux-mm has been an Acked-by:
from Yasunori Goto <y-goto@jp.fujitsu.com>

Add memory hotremove config option to x86_64

Memory hotremove functionality can currently be configured into
the ia64, powerpc, and s390 kernels.  This patch makes it possible
to configure the memory hotremove functionality into the x86_64
kernel as well.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>

---
 arch/x86/Kconfig      |    3 +++
 arch/x86/mm/init_64.c |   18 ++++++++++++++++++
 2 files changed, 21 insertions(+)

Index: linux-2.6.27-rc5/arch/x86/Kconfig
===================================================================
--- linux-2.6.27-rc5.orig/arch/x86/Kconfig	2008-09-03 13:33:59.000000000 -0700
+++ linux-2.6.27-rc5/arch/x86/Kconfig	2008-09-03 13:34:55.000000000 -0700
@@ -1384,6 +1384,9 @@
 	def_bool y
 	depends on X86_64 || (X86_32 && HIGHMEM)
 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+	def_bool y
+
 config HAVE_ARCH_EARLY_PFN_TO_NID
 	def_bool X86_64
 	depends on NUMA
Index: linux-2.6.27-rc5/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/x86/mm/init_64.c	2008-09-03 13:34:08.000000000 -0700
+++ linux-2.6.27-rc5/arch/x86/mm/init_64.c	2008-09-03 13:34:55.000000000 -0700
@@ -740,6 +740,24 @@
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn, end_pfn;
+	unsigned long timeout = 120 * HZ;
+	int ret;
+	start_pfn = start >> PAGE_SHIFT;
+	end_pfn = start_pfn + (size >> PAGE_SHIFT);
+	ret = offline_pages(start_pfn, end_pfn, timeout);
+	if (ret)
+		goto out;
+	/* Arch-specific calls go here */
+out:
+	return ret;
+}
+EXPORT_SYMBOL_GPL(remove_memory);
+#endif /* CONFIG_MEMORY_HOTREMOVE */
+
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 /*
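The arithmetic in remove_memory() converts a physical byte range into the half-open page-frame range [start_pfn, end_pfn) that offline_pages() operates on. The following is a minimal user-space sketch of just that conversion, not kernel code. It assumes 4 KiB pages (a page shift of 12, as on x86_64), and the names SKETCH_PAGE_SHIFT, struct pfn_range and bytes_to_pfn_range() are hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages (page shift of 12), as on x86_64. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical type: half-open page-frame range [start_pfn, end_pfn). */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/*
 * Same arithmetic as the patch's remove_memory(): the start address is
 * rounded down to its page frame, and the size is truncated to whole
 * pages before being added to it.
 */
static struct pfn_range bytes_to_pfn_range(uint64_t start, uint64_t size)
{
	struct pfn_range r;

	r.start_pfn = (unsigned long)(start >> SKETCH_PAGE_SHIFT);
	r.end_pfn = r.start_pfn + (unsigned long)(size >> SKETCH_PAGE_SHIFT);
	return r;
}
```

For example, a 128 MiB range starting at 1 GiB (start 0x40000000, size 0x8000000) maps to PFNs 0x40000 up to, but not including, 0x48000, which is 0x8000 pages.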
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Ingo Molnar @ 2008-09-05 17:44 UTC
To: Gary Hade
Cc: linux-mm, Andrew Morton, Yasunori Goto, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86

* Gary Hade <garyhade@us.ibm.com> wrote:

> Add memory hotremove config option to x86_64
>
> Memory hotremove functionality can currently be configured into the
> ia64, powerpc, and s390 kernels.  This patch makes it possible to
> configure the memory hotremove functionality into the x86_64 kernel as
> well.

hm, why is it for 64-bit only?

> +++ linux-2.6.27-rc5/arch/x86/Kconfig	2008-09-03 13:34:55.000000000 -0700
> @@ -1384,6 +1384,9 @@
>  	def_bool y
>  	depends on X86_64 || (X86_32 && HIGHMEM)
>
> +config ARCH_ENABLE_MEMORY_HOTREMOVE
> +	def_bool y

so this will break the build on 32-bit, if CONFIG_MEMORY_HOTREMOVE=y?
mm/memory_hotplug.c assumes that remove_memory() is provided by the
architecture.

> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int remove_memory(u64 start, u64 size)
> +{
> +	unsigned long start_pfn, end_pfn;
> +	unsigned long timeout = 120 * HZ;
> +	int ret;
> +	start_pfn = start >> PAGE_SHIFT;
> +	end_pfn = start_pfn + (size >> PAGE_SHIFT);
> +	ret = offline_pages(start_pfn, end_pfn, timeout);
> +	if (ret)
> +		goto out;
> +	/* Arch-specific calls go here */
> +out:
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(remove_memory);
> +#endif /* CONFIG_MEMORY_HOTREMOVE */

hm, nothing appears to be arch-specific about this trivial wrapper
around offline_pages().

Shouldn't this be moved to the CONFIG_MEMORY_HOTREMOVE portion of
mm/memory_hotplug.c instead, as a weak function? That way architectures
only have to enable ARCH_ENABLE_MEMORY_HOTREMOVE - and architectures
with different/special needs can override it.

	Ingo
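Ingo's suggestion relies on weak symbols: generic code supplies a default definition marked weak, and an architecture with special needs provides an ordinary (strong) definition that the linker selects instead. Below is a minimal single-file sketch, assuming GCC or Clang on an ELF target such as Linux, both of which accept `__attribute__((weak))`. The function remove_memory_sketch() and the two return markers are hypothetical and are not kernel code.

```c
#include <stdint.h>

/* Hypothetical markers so a caller can tell which definition was linked. */
#define GENERIC_REMOVE_PATH 1
#define ARCH_REMOVE_PATH    2

/*
 * Generic default, marked weak. If another object file linked into the
 * same program defines a non-weak remove_memory_sketch(), the linker uses
 * that definition and discards this one.
 */
__attribute__((weak)) int remove_memory_sketch(uint64_t start, uint64_t size)
{
	(void)start;
	(void)size;
	return GENERIC_REMOVE_PATH;
}
```

Linked on its own, every call resolves to the weak default. Adding a separate arch file containing a plain `int remove_memory_sketch(uint64_t start, uint64_t size)` that returns ARCH_REMOVE_PATH would override it without editing the generic file. When no architecture needs an override, a regular function is simpler.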
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Badari Pulavarty @ 2008-09-05 18:14 UTC
To: Ingo Molnar
Cc: Gary Hade, linux-mm, Andrew Morton, Yasunori Goto, Mel Gorman, Chris McDermott, linux-kernel, x86

On Fri, 2008-09-05 at 19:44 +0200, Ingo Molnar wrote:
> * Gary Hade <garyhade@us.ibm.com> wrote:
>
> > Add memory hotremove config option to x86_64
> >
> > Memory hotremove functionality can currently be configured into the
> > ia64, powerpc, and s390 kernels.  This patch makes it possible to
> > configure the memory hotremove functionality into the x86_64 kernel as
> > well.
>
> hm, why is it for 64-bit only?
>
> > +++ linux-2.6.27-rc5/arch/x86/Kconfig	2008-09-03 13:34:55.000000000 -0700
> > @@ -1384,6 +1384,9 @@
> >  	def_bool y
> >  	depends on X86_64 || (X86_32 && HIGHMEM)
> >
> > +config ARCH_ENABLE_MEMORY_HOTREMOVE
> > +	def_bool y
>
> so this will break the build on 32-bit, if CONFIG_MEMORY_HOTREMOVE=y?
> mm/memory_hotplug.c assumes that remove_memory() is provided by the
> architecture.
>
> > +#ifdef CONFIG_MEMORY_HOTREMOVE
> > +int remove_memory(u64 start, u64 size)
> > +{
> > +	unsigned long start_pfn, end_pfn;
> > +	unsigned long timeout = 120 * HZ;
> > +	int ret;
> > +	start_pfn = start >> PAGE_SHIFT;
> > +	end_pfn = start_pfn + (size >> PAGE_SHIFT);
> > +	ret = offline_pages(start_pfn, end_pfn, timeout);
> > +	if (ret)
> > +		goto out;
> > +	/* Arch-specific calls go here */
> > +out:
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(remove_memory);
> > +#endif /* CONFIG_MEMORY_HOTREMOVE */
>
> hm, nothing appears to be arch-specific about this trivial wrapper
> around offline_pages().

Yes. All the archs (ppc64, ia64, s390, x86_64) have exact same
function. No architecture needed special handling so far (initial
versions of ppc64 needed extra handling, but I moved the code
to different place).

We can make this generic and kill all arch-specific ones.
Initially, we didn't know if any arch needs special handling -
so ended up having private functions for each arch.
I think it's time to merge them all.

>
> Shouldn't this be moved to the CONFIG_MEMORY_HOTREMOVE portion of
> mm/memory_hotplug.c instead, as a weak function? That way architectures
> only have to enable ARCH_ENABLE_MEMORY_HOTREMOVE - and architectures
> with different/special needs can override it.

Yes. We should do that. I will send out a patch.

Thanks,
Badari
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Ingo Molnar @ 2008-09-05 18:17 UTC
To: Badari Pulavarty
Cc: Gary Hade, linux-mm, Andrew Morton, Yasunori Goto, Mel Gorman, Chris McDermott, linux-kernel, x86

* Badari Pulavarty <pbadari@us.ibm.com> wrote:

>
> On Fri, 2008-09-05 at 19:44 +0200, Ingo Molnar wrote:
> > * Gary Hade <garyhade@us.ibm.com> wrote:
> >
> > > Add memory hotremove config option to x86_64
> > >
> > > Memory hotremove functionality can currently be configured into the
> > > ia64, powerpc, and s390 kernels.  This patch makes it possible to
> > > configure the memory hotremove functionality into the x86_64 kernel as
> > > well.
> >
> > hm, why is it for 64-bit only?
> >
> > > +++ linux-2.6.27-rc5/arch/x86/Kconfig	2008-09-03 13:34:55.000000000 -0700
> > > @@ -1384,6 +1384,9 @@
> > >  	def_bool y
> > >  	depends on X86_64 || (X86_32 && HIGHMEM)
> > >
> > > +config ARCH_ENABLE_MEMORY_HOTREMOVE
> > > +	def_bool y
> >
> > so this will break the build on 32-bit, if CONFIG_MEMORY_HOTREMOVE=y?
> > mm/memory_hotplug.c assumes that remove_memory() is provided by the
> > architecture.
> >
> > > +#ifdef CONFIG_MEMORY_HOTREMOVE
> > > +int remove_memory(u64 start, u64 size)
> > > +{
> > > +	unsigned long start_pfn, end_pfn;
> > > +	unsigned long timeout = 120 * HZ;
> > > +	int ret;
> > > +	start_pfn = start >> PAGE_SHIFT;
> > > +	end_pfn = start_pfn + (size >> PAGE_SHIFT);
> > > +	ret = offline_pages(start_pfn, end_pfn, timeout);
> > > +	if (ret)
> > > +		goto out;
> > > +	/* Arch-specific calls go here */
> > > +out:
> > > +	return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(remove_memory);
> > > +#endif /* CONFIG_MEMORY_HOTREMOVE */
> >
> > hm, nothing appears to be arch-specific about this trivial wrapper
> > around offline_pages().
>
> Yes. All the archs (ppc64, ia64, s390, x86_64) have exact same
> function. No architecture needed special handling so far (initial
> versions of ppc64 needed extra handling, but I moved the code
> to different place).
>
> We can make this generic and kill all arch-specific ones.
> Initially, we didn't know if any arch needs special handling -
> so ended up having private functions for each arch.
> I think it's time to merge them all.
>
> > Shouldn't this be moved to the CONFIG_MEMORY_HOTREMOVE portion of
> > mm/memory_hotplug.c instead, as a weak function? That way architectures
> > only have to enable ARCH_ENABLE_MEMORY_HOTREMOVE - and architectures
> > with different/special needs can override it.
>
> Yes. We should do that. I will send out a patch.

ok - if all architectures have the same function then please make it a
regular function not a weak one, and remove all the duplications.

	Ingo
* [PATCH] Cleanup to make remove_memory() arch neutral
From: Badari Pulavarty @ 2008-09-08 21:52 UTC
To: Andrew Morton, Andrew Morton
Cc: Gary Hade, linux-mm, Yasunori Goto, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

There is nothing architecture specific about remove_memory().
remove_memory() function is common for all architectures which
support hotplug memory remove. Instead of duplicating it in every
architecture, collapse them into arch neutral function.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>

 arch/ia64/mm/init.c   |   17 -----------------
 arch/powerpc/mm/mem.c |   17 -----------------
 arch/s390/mm/init.c   |   11 -----------
 mm/memory_hotplug.c   |   10 ++++++++++
 4 files changed, 10 insertions(+), 45 deletions(-)

Index: linux-2.6.27-rc5/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/ia64/mm/init.c	2008-08-28 15:52:02.000000000 -0700
+++ linux-2.6.27-rc5/arch/ia64/mm/init.c	2008-09-08 12:38:59.000000000 -0700
@@ -701,23 +701,6 @@ int arch_add_memory(int nid, u64 start,
 	return ret;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
-int remove_memory(u64 start, u64 size)
-{
-	unsigned long start_pfn, end_pfn;
-	unsigned long timeout = 120 * HZ;
-	int ret;
-	start_pfn = start >> PAGE_SHIFT;
-	end_pfn = start_pfn + (size >> PAGE_SHIFT);
-	ret = offline_pages(start_pfn, end_pfn, timeout);
-	if (ret)
-		goto out;
-	/* we can free mem_map at this point */
-out:
-	return ret;
-}
-EXPORT_SYMBOL_GPL(remove_memory);
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif
 
 /*
Index: linux-2.6.27-rc5/arch/powerpc/mm/mem.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/powerpc/mm/mem.c	2008-08-28 15:52:02.000000000 -0700
+++ linux-2.6.27-rc5/arch/powerpc/mm/mem.c	2008-09-08 12:39:19.000000000 -0700
@@ -135,23 +135,6 @@ int arch_add_memory(int nid, u64 start,
 
 	return __add_pages(zone, start_pfn, nr_pages);
 }
-
-#ifdef CONFIG_MEMORY_HOTREMOVE
-int remove_memory(u64 start, u64 size)
-{
-	unsigned long start_pfn, end_pfn;
-	int ret;
-
-	start_pfn = start >> PAGE_SHIFT;
-	end_pfn = start_pfn + (size >> PAGE_SHIFT);
-	ret = offline_pages(start_pfn, end_pfn, 120 * HZ);
-	if (ret)
-		goto out;
-	/* Arch-specific calls go here - next patch */
-out:
-	return ret;
-}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 /*
Index: linux-2.6.27-rc5/arch/s390/mm/init.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/s390/mm/init.c	2008-08-28 15:52:02.000000000 -0700
+++ linux-2.6.27-rc5/arch/s390/mm/init.c	2008-09-08 12:40:41.000000000 -0700
@@ -189,14 +189,3 @@ int arch_add_memory(int nid, u64 start,
 	return rc;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
-
-#ifdef CONFIG_MEMORY_HOTREMOVE
-int remove_memory(u64 start, u64 size)
-{
-	unsigned long start_pfn, end_pfn;
-
-	start_pfn = PFN_DOWN(start);
-	end_pfn = start_pfn + PFN_DOWN(size);
-	return offline_pages(start_pfn, end_pfn, 120 * HZ);
-}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
Index: linux-2.6.27-rc5/mm/memory_hotplug.c
===================================================================
--- linux-2.6.27-rc5.orig/mm/memory_hotplug.c	2008-08-28 15:52:02.000000000 -0700
+++ linux-2.6.27-rc5/mm/memory_hotplug.c	2008-09-08 12:41:37.000000000 -0700
@@ -26,6 +26,7 @@
 #include <linux/delay.h>
 #include <linux/migrate.h>
 #include <linux/page-isolation.h>
+#include <linux/pfn.h>
 
 #include <asm/tlbflush.h>
 
@@ -849,6 +850,15 @@ failed_removal:
 
 	return ret;
 }
+
+int remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn, end_pfn;
+
+	start_pfn = PFN_DOWN(start);
+	end_pfn = start_pfn + PFN_DOWN(size);
+	return offline_pages(start_pfn, end_pfn, 120 * HZ);
+}
 #else
 int remove_memory(u64 start, u64 size)
 {
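The generic remove_memory() uses the PFN_DOWN() helper from <linux/pfn.h>. PFN_DOWN() is a right shift by PAGE_SHIFT, the same shift the ia64, powerpc and x86_64 copies wrote out by hand, so the arithmetic itself is unchanged. The user-space sketch below re-creates PFN_DOWN() and its rounding-up counterpart PFN_UP(), assuming 4 KiB pages. It illustrates what happens when a caller passes a size that is not page-aligned: PFN_DOWN(size) drops the trailing partial page. The helpers pages_offlined() and pages_spanned() are hypothetical.

```c
#include <stdint.h>

/* Assumed 4 KiB pages; in the kernel these come from the architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* User-space copies of the <linux/pfn.h> rounding helpers. */
#define PFN_DOWN(x) ((x) >> SKETCH_PAGE_SHIFT)
#define PFN_UP(x)   (((x) + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT)

/* Pages the generic remove_memory() would ask offline_pages() to handle. */
static uint64_t pages_offlined(uint64_t size)
{
	return PFN_DOWN(size);
}

/* Pages a byte range of this size touches, assuming a page-aligned start. */
static uint64_t pages_spanned(uint64_t size)
{
	return PFN_UP(size);
}
```

For a 6 KiB (0x1800-byte) request, pages_offlined() gives 1 while pages_spanned() gives 2. In practice callers pass whole memory sections, so the truncation does not come into play.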
* Re: [PATCH] Cleanup to make remove_memory() arch neutral
From: Andrew Morton @ 2008-09-09 0:56 UTC
To: Badari Pulavarty
Cc: garyhade, linux-mm, y-goto, mel, lcm, linux-kernel, x86, mingo

On Mon, 08 Sep 2008 14:52:34 -0700
Badari Pulavarty <pbadari@us.ibm.com> wrote:

> There is nothing architecture specific about remove_memory().
> remove_memory() function is common for all architectures which
> support hotplug memory remove. Instead of duplicating it in every
> architecture, collapse them into arch neutral function.
>
> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>
>  arch/ia64/mm/init.c   |   17 -----------------
>  arch/powerpc/mm/mem.c |   17 -----------------
>  arch/s390/mm/init.c   |   11 -----------
>  mm/memory_hotplug.c   |   10 ++++++++++
>  4 files changed, 10 insertions(+), 45 deletions(-)

I spent some time trying to build-test this on ia64 and gave up.  How
the heck do you turn on memory hotplug on ia64?
* Re: [PATCH] Cleanup to make remove_memory() arch neutral
From: Randy Dunlap @ 2008-09-09 1:14 UTC
To: Andrew Morton
Cc: Badari Pulavarty, garyhade, linux-mm, y-goto, mel, lcm, linux-kernel, x86, mingo

On Mon, 8 Sep 2008 17:56:21 -0700 Andrew Morton wrote:

> On Mon, 08 Sep 2008 14:52:34 -0700
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> > There is nothing architecture specific about remove_memory().
> > remove_memory() function is common for all architectures which
> > support hotplug memory remove. Instead of duplicating it in every
> > architecture, collapse them into arch neutral function.
> >
> > Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
> >
> >  arch/ia64/mm/init.c   |   17 -----------------
> >  arch/powerpc/mm/mem.c |   17 -----------------
> >  arch/s390/mm/init.c   |   11 -----------
> >  mm/memory_hotplug.c   |   10 ++++++++++
> >  4 files changed, 10 insertions(+), 45 deletions(-)
>
> I spent some time trying to build-test this on ia64 and gave up.  How
> the heck do you turn on memory hotplug on ia64?

After using ia64 defconfig, all I had to do was enable the Sparse Memory
model instead of Discontiguous.

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
* Re: [PATCH] Cleanup to make remove_memory() arch neutral
From: Yasunori Goto @ 2008-09-09 1:21 UTC
To: Andrew Morton
Cc: Badari Pulavarty, garyhade, linux-mm, mel, lcm, linux-kernel, x86, mingo

> On Mon, 08 Sep 2008 14:52:34 -0700
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> > There is nothing architecture specific about remove_memory().
> > remove_memory() function is common for all architectures which
> > support hotplug memory remove. Instead of duplicating it in every
> > architecture, collapse them into arch neutral function.
> >
> > Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
> >
> >  arch/ia64/mm/init.c   |   17 -----------------
> >  arch/powerpc/mm/mem.c |   17 -----------------
> >  arch/s390/mm/init.c   |   11 -----------
> >  mm/memory_hotplug.c   |   10 ++++++++++
> >  4 files changed, 10 insertions(+), 45 deletions(-)
>
> I spent some time trying to build-test this on ia64 and gave up.  How
> the heck do you turn on memory hotplug on ia64?
>

EXPORT_SYMBOL_GPL(remove_memory) is removed.
It is required by drivers/acpi/acpi_memhotplug.ko.

-- 
Yasunori Goto
* Re: [PATCH] Cleanup to make remove_memory() arch neutral
From: Badari Pulavarty @ 2008-09-09 15:12 UTC
To: Yasunori Goto
Cc: Andrew Morton, garyhade, linux-mm, mel, lcm, linux-kernel, x86, mingo

On Tue, 2008-09-09 at 10:21 +0900, Yasunori Goto wrote:
> > On Mon, 08 Sep 2008 14:52:34 -0700
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > > There is nothing architecture specific about remove_memory().
> > > remove_memory() function is common for all architectures which
> > > support hotplug memory remove. Instead of duplicating it in every
> > > architecture, collapse them into arch neutral function.
> > >
> > > Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
> > >
> > >  arch/ia64/mm/init.c   |   17 -----------------
> > >  arch/powerpc/mm/mem.c |   17 -----------------
> > >  arch/s390/mm/init.c   |   11 -----------
> > >  mm/memory_hotplug.c   |   10 ++++++++++
> > >  4 files changed, 10 insertions(+), 45 deletions(-)
> >
> > I spent some time trying to build-test this on ia64 and gave up.  How
> > the heck do you turn on memory hotplug on ia64?
> >
>
> EXPORT_SYMBOL_GPL(remove_memory) is removed.
> It is required by drivers/acpi/acpi_memhotplug.ko.

Thanks for catching it. I forgot that it was being used by acpi.
Since we didn't export it for ppc and s390, I assumed it's safe
to remove the export. Sorry !!

Thanks,
Badari
* [PATCH] x86: add memory hotremove config option
From: Badari Pulavarty @ 2008-09-08 21:56 UTC
To: Ingo Molnar
Cc: Gary Hade, linux-mm, Andrew Morton, Yasunori Goto, Mel Gorman, Chris McDermott, linux-kernel, x86

Cleaned up patch without remove_memory(). Depends on the
"Cleanup to make remove_memory() arch neutral" patch.

Thanks,
Badari

Add memory hotremove config option to x86

Memory hotremove functionality can currently be configured into
the ia64, powerpc, and s390 kernels.  This patch makes it possible
to configure the memory hotremove functionality into the x86
kernel as well.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Gary Hade <garyhade@us.ibm.com>

---
 arch/x86/Kconfig |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-2.6.27-rc5/arch/x86/Kconfig
===================================================================
--- linux-2.6.27-rc5.orig/arch/x86/Kconfig	2008-09-08 12:36:06.000000000 -0700
+++ linux-2.6.27-rc5/arch/x86/Kconfig	2008-09-08 12:45:30.000000000 -0700
@@ -1384,6 +1384,10 @@ config ARCH_ENABLE_MEMORY_HOTPLUG
 	def_bool y
 	depends on X86_64 || (X86_32 && HIGHMEM)
 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+	def_bool y
+	depends on MEMORY_HOTPLUG
+
 config HAVE_ARCH_EARLY_PFN_TO_NID
 	def_bool X86_64
 	depends on NUMA
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Andi Kleen @ 2008-09-05 18:04 UTC
To: Gary Hade
Cc: linux-mm, Andrew Morton, Yasunori Goto, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

Gary Hade <garyhade@us.ibm.com> writes:
>
> Add memory hotremove config option to x86_64
>
> Memory hotremove functionality can currently be configured into
> the ia64, powerpc, and s390 kernels.  This patch makes it possible
> to configure the memory hotremove functionality into the x86_64
> kernel as well.

You forgot to describe how you tested it? Does it actually work?
And why do you want to do it? What's the use case?

The general understanding was that it doesn't work very well on a real
machine at least because it cannot be controlled how that memory maps
to real pluggable hardware (and you cannot completely empty a node at runtime)
and a Hypervisor would likely use different interfaces anyways.

-Andi
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Badari Pulavarty @ 2008-09-05 18:31 UTC
To: Andi Kleen
Cc: Gary Hade, linux-mm, Andrew Morton, Yasunori Goto, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Fri, 2008-09-05 at 20:04 +0200, Andi Kleen wrote:
> Gary Hade <garyhade@us.ibm.com> writes:
> >
> > Add memory hotremove config option to x86_64
> >
> > Memory hotremove functionality can currently be configured into
> > the ia64, powerpc, and s390 kernels.  This patch makes it possible
> > to configure the memory hotremove functionality into the x86_64
> > kernel as well.
>
> You forgot to describe how you tested it? Does it actually work?
> And why do you want to do it? What's the use case?

I will let Gary answer these :)

> The general understanding was that it doesn't work very well on a real
> machine at least because it cannot be controlled how that memory maps
> to real pluggable hardware (and you cannot completely empty a node at runtime)
> and a Hypervisor would likely use different interfaces anyways.

At this time we are interested in node removal (on x86_64).
It doesn't really work well at this time - due to some of the structures
(pgdat etc.) being striped across all nodes. There is no easy way to
relocate them. Yasunori Goto is working on patches to address some
of these issues.

But we are considering adding support to restrict/skip bootmem
allocations on selected nodes. That way, we should be able to
do node removal.

(BTW, on ppc64 this works fine - since we are interested mostly in
removing *some* sections of memory to give it back to hypervisor -
not entire node removal).

Thanks,
Badari
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Andi Kleen @ 2008-09-05 18:54 UTC
To: Badari Pulavarty
Cc: Andi Kleen, Gary Hade, linux-mm, Andrew Morton, Yasunori Goto, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

> At this time we are interested in node removal (on x86_64).
> It doesn't really work well at this time -

That's a quite euphemistic way to put it.

> due to some of the structures

That means you can never put any slab data on specific nodes.
And all the kernel subsystems on that node will not ever get local
memory. How are you going to solve that? And if you disallow
kernel allocations in so large memory areas you get many of the highmem
issues that plagued 32bit back in the 64bit kernel.

There are lots of other issues. It's quite questionable if this
whole exercise makes sense at all.

> (BTW, on ppc64 this works fine - since we are interested mostly in
> removing *some* sections of memory to give it back to hypervisor -
> not entire node removal).

Ok for hypervisors you can do it reasonably easy on x86 too, but it's likely
that some hypercall interface is better than going through
sysfs.

-Andi
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Badari Pulavarty @ 2008-09-05 22:34 UTC
To: Andi Kleen
Cc: Gary Hade, linux-mm, Andrew Morton, Yasunori Goto, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Fri, 2008-09-05 at 20:54 +0200, Andi Kleen wrote:
> > At this time we are interested in node removal (on x86_64).
> > It doesn't really work well at this time -
>
> That's a quite euphemistic way to put it.
>
> > due to some of the structures
>
> That means you can never put any slab data on specific nodes.
> And all the kernel subsystems on that node will not ever get local
> memory. How are you going to solve that? And if you disallow
> kernel allocations in so large memory areas you get many of the highmem
> issues that plagued 32bit back in the 64bit kernel.

You are absolutely correct. There is no easy solution - one has to
lose performance in order to support node removal, along with some
old x86 issues :(

We were contemplating the idea of limiting node removal to a select
set of nodes as a compromise - but it didn't sound right :(

>
> There are lots of other issues. It's quite questionable if this
> whole exercise makes sense at all.

The same issues exist with ia64, and x86_64 won't be any worse off.

Gary was trying to enable the functionality so that we can at least
test offlining memory sections more easily (test the page migration
and isolation code and hash out issues).

Another possible idea being considered (still a lot of unknowns) is
to make use of the offline memory section feature for power
management (*cough*).

Anyway, as you can see, this patch doesn't add any code - it just
enables the config option for x86_64 (if you are worried about code
bloat).

> > (BTW, on ppc64 this works fine - since we are interested mostly in
> > removing *some* sections of memory to give it back to hypervisor -
> > not entire node removal).
>
> Ok for hypervisors you can do it reasonably easy on x86 too, but it's likely
> that some hypercall interface is better than going through
> sysfs.

A sysfs interface already exists to offline sections of memory
(the same interface as online). The proposed patch provides an easy
way to find out which sections of memory belong to which node
(which could be useful on its own).

Thanks,
Badari
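Badari notes that offlining uses the same sysfs interface as onlining. Each memory section appears as a directory under /sys/devices/system/memory/, and writing "offline" or "online" to that directory's state file requests the transition. The sketch below finds the section directory for a physical address and issues the write. It assumes that the N in memoryN is the physical address divided by the section size published (in hex) in /sys/devices/system/memory/block_size_bytes. The helper names are hypothetical. Writing the file requires root and a kernel built with CONFIG_MEMORY_HOTREMOVE.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* block_size_bytes holds a hex number without a 0x prefix, e.g. "8000000\n". */
static uint64_t parse_block_size(const char *text)
{
	return strtoull(text, NULL, 16);
}

/* Index N of the /sys/devices/system/memory/memoryN directory holding addr. */
static unsigned long section_index(uint64_t phys_addr, uint64_t block_size)
{
	return (unsigned long)(phys_addr / block_size);
}

/* Build the path of a section's state file; returns snprintf()'s result. */
static int state_path(char *buf, size_t len, unsigned long index)
{
	return snprintf(buf, len, "/sys/devices/system/memory/memory%lu/state",
			index);
}

/*
 * Request a state change ("offline" or "online") for one section.
 * Returns 0 on success, -1 if the file cannot be opened or written;
 * a transition the kernel refuses typically surfaces as a failed write.
 */
static int set_section_state(unsigned long index, const char *state)
{
	char path[128];
	FILE *f;
	int ok;

	state_path(path, sizeof(path), index);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(state, f) >= 0;
	/* stdio buffers the write, so errors may only show up at fclose() */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would read block_size_bytes into a buffer and then run `set_section_state(section_index(addr, parse_block_size(buf)), "offline")`. Writing "online" afterwards reverses the change, matching the offline/re-online cycle described in this thread.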
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Gary Hade @ 2008-09-05 19:53 UTC
To: Andi Kleen
Cc: Gary Hade, linux-mm, Andrew Morton, Yasunori Goto, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Fri, Sep 05, 2008 at 08:04:55PM +0200, Andi Kleen wrote:
> Gary Hade <garyhade@us.ibm.com> writes:
> >
> > Add memory hotremove config option to x86_64
> >
> > Memory hotremove functionality can currently be configured into
> > the ia64, powerpc, and s390 kernels.  This patch makes it possible
> > to configure the memory hotremove functionality into the x86_64
> > kernel as well.
>
> You forgot to describe how you tested it? Does it actually work?

So far, I have tested it on a 2-node IBM x460, a 2-node IBM x3950, and
a 4-node IBM x3950 M2, and have been able to successfully offline and
re-online all memory sections marked as removable multiple times with
no apparent problems.  By directing the change to -mm our hope is that
others will try it on their systems and help us shake out any issues
that they may find.

> And why do you want to do it? What's the use case?

A baby step towards eventual total node removal.

>
> The general understanding was that it doesn't work very well on a real
> machine at least because it cannot be controlled how that memory maps
> to real pluggable hardware (and you cannot completely empty a node at runtime)
> and a Hypervisor would likely use different interfaces anyways.

The inability to offline all non-primary node memory sections
certainly needs to be addressed.  The pgdat removal work that
Yasunori Goto has started will hopefully continue and help resolve
this issue.  We have only just started thinking about issues related
to resources other than CPUs and memory that will need to be released
in preparation for node removal (e.g. memory and i/o resources
assigned to PCI devices on a node targeted for removal).  Much of
this is new territory for us so any suggestions that you and others
can offer will be much appreciated.

Thanks for asking.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Andi Kleen @ 2008-09-05 20:04 UTC
To: Gary Hade
Cc: Andi Kleen, linux-mm, Andrew Morton, Yasunori Goto, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

> The inability to offline all non-primary node memory sections
> certainly needs to be addressed.  The pgdat removal work that
> Yasunori Goto has started will hopefully continue and help resolve
> this issue.

You make it sound like it's just some minor technical hurdle
that needs to be addressed. But from all analysis of these issues
I've seen so far it's extremely hard and all possible solutions
have serious issues. So before doing some baby steps there
should be at least some general idea how this thing is supposed
to work in the end.

> We have only just started thinking about issues related
> to resources other than CPUs and memory that will need to be released
> in preparation for node removal (e.g. memory and i/o resources
> assigned to PCI devices on a node targeted for removal).

That's the easy stuff. The hard parts are all the kernel objects
that you cannot move.

-Andi
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Gary Hade @ 2008-09-05 21:54 UTC
To: Andi Kleen
Cc: Gary Hade, linux-mm, Andrew Morton, Yasunori Goto, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Fri, Sep 05, 2008 at 10:04:01PM +0200, Andi Kleen wrote:
> > The inability to offline all non-primary node memory sections
> > certainly needs to be addressed. The pgdat removal work that
> > Yasunori Goto has started will hopefully continue and help resolve
> > this issue.
>
> You make it sound like it's just some minor technical hurdle
> that needs to be addressed.

Sorry, that was not my intent.

> But from all analysis of these issues
> I've seen so far it's extremely hard and all possible solutions
> have serious issues. So before doing some baby steps there
> should be at least some general idea how this thing is supposed
> to work in the end.

I am not sure I understand why you appear to be opposed to enabling
the hotremove function before all the issues related to the eventual
goal of being able to free all memory on a node are addressed. Even in
the absence of solutions for these issues, it seems like there could
still be other benefits, such as the ability to selectively expand and
shrink available memory for testing or debugging purposes. I believe it
would also be helpful to those working on or testing possible solutions
for the removal issues.

Gary
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Andi Kleen @ 2008-09-06 0:01 UTC
To: Gary Hade
Cc: Andi Kleen, linux-mm, Andrew Morton, Yasunori Goto, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

> I am not sure I understand why you appear to be opposed to
> enabling the hotremove function before all the issues related

I'm quite sceptical that it can ever be made to work in a useful
way for real hardware (as opposed to a hypervisor paravirtual setup,
for which this interface is not the right way -- it should be done
in some specific driver instead).

And if it cannot be made to work then it will be a false promise
to the user. They will see it and think it will work, but it will
not.

This means I don't see a real use case for this feature.

-Andi
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Yasunori Goto @ 2008-09-06 7:06 UTC
To: Andi Kleen
Cc: Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

> > I am not sure I understand why you appear to be opposed to
> > enabling the hotremove function before all the issues related
>
> I'm quite sceptical that it can ever be made to work in a useful
> way for real hardware (as opposed to a hypervisor paravirtual setup,
> for which this interface is not the right way -- it should be done
> in some specific driver instead).
> And if it cannot be made to work then it will be a false promise
> to the user. They will see it and think it will work, but it will
> not.
>
> This means I don't see a real use case for this feature.

I don't think such a driver is almighty.
IIRC, the balloon driver can be a cause of fragmentation on a 24/7 system.

In addition, I have heard that memory hotplug would be useful for
reducing the power consumption of DIMMs.

I have to admit that memory hotplug has many issues, but I would like to
solve them step by step.

Thanks.

--
Yasunori Goto
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Andi Kleen @ 2008-09-06 8:53 UTC
To: Yasunori Goto
Cc: Andi Kleen, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Sat, Sep 06, 2008 at 04:06:38PM +0900, Yasunori Goto wrote:
> > not.
> >
> > This means I don't see a real use case for this feature.
>
> I don't think such a driver is almighty.
> IIRC, the balloon driver can be a cause of fragmentation on a 24/7 system.

Sure, the balloon driver can likely be improved too; it's just
that I don't think a balloon driver should call into the function
the original patch in the series hooked up.

>
> In addition, I have heard that memory hotplug would be useful for
> reducing the power consumption of DIMMs.

It's unclear that memory hotplug is the right model for DIMM power
management. The problem is that DIMMs are interleaved, so you again
have to completely free quite a large area. It's not much easier than
node hotplug.

> I have to admit that memory hotplug has many issues, but I would like to

Let's call it "node" or "hardware" memory hot unplug, so that no one
confuses it with the easier VM-based hot unplug or the really easy
hot-add.

> solve them step by step.

The question is whether they are even solvable in a useful way.
I'm not sure it's that useful to start and then find out
that it doesn't work anyway.

-Andi
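The interleaving point can be made concrete with a toy model. The sketch below assumes simple round-robin interleaving of `ways` DIMMs at a fixed granularity (real memory controllers may hash addresses differently, and the 4-way, 4 KiB figures are hypothetical). Under that model any contiguous range spanning at least `ways` interleave chunks is backed by every DIMM, so offlining one 128 MiB section frees a quarter of it on each of four DIMMs and powers none of them down.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): chunk k of size `gran` lives on DIMM k % ways. */
static unsigned dimm_for_addr(uint64_t addr, uint64_t gran, unsigned ways)
{
    return (unsigned)((addr / gran) % ways);
}

/* Number of distinct DIMMs backing [start, start + len).
 * Consecutive chunks rotate through the DIMMs, so the answer is the
 * chunk count, capped at `ways`. */
static unsigned dimms_touched(uint64_t start, uint64_t len,
                              uint64_t gran, unsigned ways)
{
    assert(len > 0 && gran > 0 && ways > 0);
    uint64_t first = start / gran;
    uint64_t last = (start + len - 1) / gran;
    uint64_t chunks = last - first + 1;
    return chunks >= ways ? ways : (unsigned)chunks;
}
```

To power down a single DIMM in this model, every chunk it holds across the whole interleave set would have to be freed, which is the "quite large area" problem.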
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Nick Piggin @ 2008-09-08 5:52 UTC
To: Andi Kleen
Cc: Yasunori Goto, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Saturday 06 September 2008 18:53, Andi Kleen wrote:
> On Sat, Sep 06, 2008 at 04:06:38PM +0900, Yasunori Goto wrote:
> > > not.
> > >
> > > This means I don't see a real use case for this feature.
> >
> > I don't think such a driver is almighty.
> > IIRC, the balloon driver can be a cause of fragmentation on a 24/7 system.
>
> Sure, the balloon driver can likely be improved too; it's just
> that I don't think a balloon driver should call into the function
> the original patch in the series hooked up.
>
> > In addition, I have heard that memory hotplug would be useful for
> > reducing the power consumption of DIMMs.
>
> It's unclear that memory hotplug is the right model for DIMM power
> management. The problem is that DIMMs are interleaved, so you again have to
> completely free quite a large area. It's not much easier than node hotplug.
>
> > I have to admit that memory hotplug has many issues, but I would like to
>
> Let's call it "node" or "hardware" memory hot unplug, so that no one
> confuses it with the easier VM-based hot unplug or the really
> easy hot-add.
>
> > solve them step by step.
>
> The question is whether they are even solvable in a useful way.
> I'm not sure it's that useful to start and then find out
> that it doesn't work anyway.

You use non-linear mappings for the kernel, so that kernel data is
not tied to a specific physical address. AFAIK, that is the only way
to really do it completely (like the fragmentation problem).

Of course, I don't think it would be a good idea to do that in the
foreseeable future.
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Andi Kleen @ 2008-09-08 9:36 UTC
To: Nick Piggin
Cc: Andi Kleen, Yasunori Goto, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

> You use non-linear mappings for the kernel, so that kernel data is
> not tied to a specific physical address. AFAIK, that is the only way
> to really do it completely (like the fragmentation problem).

Even with that there are lots of issues, like keeping track of
DMAs or handling executing kernel code.

>
> Of course, I don't think it would be a good idea to do that in the
> foreseeable future.

Agreed.

-Andi

--
ak@linux.intel.com
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Nick Piggin @ 2008-09-08 9:46 UTC
To: Andi Kleen
Cc: Yasunori Goto, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Monday 08 September 2008 19:36, Andi Kleen wrote:
> > You use non-linear mappings for the kernel, so that kernel data is
> > not tied to a specific physical address. AFAIK, that is the only way
> > to really do it completely (like the fragmentation problem).
>
> Even with that there are lots of issues, like keeping track of
> DMAs or handling executing kernel code.

Right, but the "high level" software solution is to have nonlinear
kernel mappings. Executing kernel code should not be so hard because
it could be handled just like executing user code (i.e. the CPU that
is executing will subsequently fault and be blocked until the
relocation is complete).

DMAs aren't trivial at all, but I guess there could be, say, a method
to submit and revoke areas of memory for DMA, and the submit would
block if the memory is currently being relocated underneath it (then
it would be able to find the new address).

Anyway, whatever the case, yeah, I'm not trying to say it is trivial
at all. Even without thinking about DMA it would be costly.

> > Of course, I don't think it would be a good idea to do that in the
> > foreseeable future.
>
> Agreed.

Same as the "anti-frag" patches. We must not proceed with this kind of
thing on the justification that "in future we'll be able to unplug any
bit of memory". Because it is not just a matter of logical steps to
reach that point, but basically a fundamental rethink of how the kernel
memory mapping should work.

Other realistic justifications are OK, but if someone wants to unplug
everything, then please put effort into *first* making the kernel
mapping nonlinear, and then we can look at the complexity and
performance costs of that fundamental step.
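One way to picture the submit/revoke idea is the user-space sketch below. All names (`dma_region`, `dma_submit`, `dma_revoke`, `dma_region_relocate`) are hypothetical and nothing here is a kernel DMA API: submitters pin the region and learn its current address, a relocation stops new submissions, waits for in-flight DMA to drain, copies, and switches the address, and blocked submitters then see the new location.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping for one relocatable buffer used for DMA. */
struct dma_region {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    void *addr;        /* current backing memory */
    size_t len;
    bool relocating;   /* a move is in progress */
    int dma_users;     /* outstanding DMA submissions */
};

static void dma_region_init(struct dma_region *r, void *addr, size_t len)
{
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->cv, NULL);
    r->addr = addr;
    r->len = len;
    r->relocating = false;
    r->dma_users = 0;
}

/* Pin the region for DMA; blocks while a relocation is in flight, then
 * returns the (possibly new) address to hand to the device. */
static void *dma_submit(struct dma_region *r)
{
    pthread_mutex_lock(&r->lock);
    while (r->relocating)
        pthread_cond_wait(&r->cv, &r->lock);
    r->dma_users++;
    void *a = r->addr;
    pthread_mutex_unlock(&r->lock);
    return a;
}

/* Drop the pin once the device has finished with the buffer. */
static void dma_revoke(struct dma_region *r)
{
    pthread_mutex_lock(&r->lock);
    assert(r->dma_users > 0);
    if (--r->dma_users == 0)
        pthread_cond_broadcast(&r->cv);
    pthread_mutex_unlock(&r->lock);
}

/* Block new submissions, wait for in-flight DMA to drain, copy the data,
 * switch the address, then wake any waiting submitters. */
static void dma_region_relocate(struct dma_region *r, void *new_addr)
{
    pthread_mutex_lock(&r->lock);
    while (r->relocating)
        pthread_cond_wait(&r->cv, &r->lock);
    r->relocating = true;
    while (r->dma_users > 0)
        pthread_cond_wait(&r->cv, &r->lock);
    memcpy(new_addr, r->addr, r->len);
    r->addr = new_addr;
    r->relocating = false;
    pthread_cond_broadcast(&r->cv);
    pthread_mutex_unlock(&r->lock);
}
```

Build with `-pthread` on older C libraries. The cost Nick mentions shows up directly: every DMA now pays for a lock round-trip, and a relocation stalls all new submitters until it completes.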
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Andi Kleen @ 2008-09-08 10:30 UTC
To: Nick Piggin
Cc: Andi Kleen, Yasunori Goto, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Mon, Sep 08, 2008 at 07:46:30PM +1000, Nick Piggin wrote:
> On Monday 08 September 2008 19:36, Andi Kleen wrote:
> > > You use non-linear mappings for the kernel, so that kernel data is
> > > not tied to a specific physical address. AFAIK, that is the only way
> > > to really do it completely (like the fragmentation problem).
> >
> > Even with that there are lots of issues, like keeping track of
> > DMAs or handling executing kernel code.
>
> Right, but the "high level" software solution is to have nonlinear
> kernel mappings. Executing kernel code should not be so hard because
> it could be handled just like executing user code (i.e. the CPU that
> is executing will subsequently fault and be blocked until the
> relocation is complete).

First, blocking arbitrary code is hard. There are some parts of the
code which are not allowed to block arbitrarily. Machine check or NMI
handlers come to mind, but there are likely more.

Then that would be essentially a hypervisor or microkernel approach.
E.g. Xen does that already, kind of, but even there it would
be quite hard to do fully in a general way. And for hardware hotplug
only the fully general way is actually useful, unfortunately.

-Andi
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Nick Piggin @ 2008-09-08 11:19 UTC
To: Andi Kleen
Cc: Yasunori Goto, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Monday 08 September 2008 20:30, Andi Kleen wrote:
> On Mon, Sep 08, 2008 at 07:46:30PM +1000, Nick Piggin wrote:
> > On Monday 08 September 2008 19:36, Andi Kleen wrote:
> > > > You use non-linear mappings for the kernel, so that kernel data is
> > > > not tied to a specific physical address. AFAIK, that is the only way
> > > > to really do it completely (like the fragmentation problem).
> > >
> > > Even with that there are lots of issues, like keeping track of
> > > DMAs or handling executing kernel code.
> >
> > Right, but the "high level" software solution is to have nonlinear
> > kernel mappings. Executing kernel code should not be so hard because
> > it could be handled just like executing user code (i.e. the CPU that
> > is executing will subsequently fault and be blocked until the
> > relocation is complete).
>
> First, blocking arbitrary code is hard. There are some parts of the
> code which are not allowed to block arbitrarily. Machine check or NMI
> handlers come to mind, but there are likely more.

Sorry, by "block", I really mean spin, I guess. I mean that the CPU will
be forced to stop executing due to the page fault during this sequence:

for prot RO:
    alloc new page
    memcpy(new, old)
    ptep_clear_flush(ptep)   <--- from here
    set_pte(ptep, newpte)    <--- until here

For prot RW, the window would also include the memcpy; however, if that
adds too much latency for execute/reads, then it can be mapped RO first,
then memcpy'd, then flushed and switched.

> Then that would be essentially a hypervisor or microkernel approach.

What would be? Blocking in interrupts? Or non-linear kernel mapping in
general?

I don't think anyone disputes that nonlinear kernel mapping is the only
way to defragment (for unplug or large allocations) arbitrary physical
memory with any sort of guarantee. In the future, if TLB costs grow very
much larger, I think this might be worth considering. But until that
becomes inevitable, I really don't want to hack the VM with crap like
transparent variable order mappings etc., but rather "encourage" CPU
manufacturers to have big fast TLBs :)

> E.g. Xen does that already, kind of, but even there it would
> be quite hard to do fully in a general way. And for hardware hotplug
> only the fully general way is actually useful, unfortunately.

Yeah, I don't really get the hardware hotplug thing. For reliability or
anything it should all be done in hardware (e.g. a warm/hot spare memory
module). For power I guess there is some argument, but I would prefer to
wait the trends out longer before committing to something big: a
non-volatile RAM replacement for DRAM, for example, might be achieved in
the future.

But if anybody disagrees, they are surely free to implement non-linear
kernel mappings and physical defragmentation and shut me up with real
numbers!
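The RO sequence above can be modelled in user space to show where the "spin" window sits. In this sketch (all names are ours; it is not kernel code) a C11 atomic pointer stands in for the PTE, NULL plays the role of a cleared entry, and accessors spin while it is NULL. The comment on `free(old)` marks the part the model deliberately omits: in the kernel, the TLB flush in ptep_clear_flush() is what guarantees no CPU still uses the old translation before the old page is freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096

/* Toy "page table entry": the current backing page, or NULL while the
 * mapping is being switched (the window where accessors must wait). */
typedef _Atomic(unsigned char *) toy_pte_t;

/* An accessor that finds the entry cleared spins until the new page is
 * installed, standing in for "fault and wait" in the mail. */
static unsigned char toy_read(toy_pte_t *pte, size_t off)
{
    unsigned char *page;
    while ((page = atomic_load(pte)) == NULL)
        ;  /* spin */
    return page[off];
}

/* Read-only migration in the order from the mail:
 * alloc new page, memcpy(new, old), clear entry, install new entry. */
static int toy_migrate_ro(toy_pte_t *pte)
{
    unsigned char *old = atomic_load(pte);
    assert(old != NULL);
    unsigned char *new_page = malloc(PAGE_SZ);
    if (new_page == NULL)
        return -1;
    memcpy(new_page, old, PAGE_SZ);  /* page is RO, so the copy stays valid */
    atomic_store(pte, NULL);         /* ~ ptep_clear_flush(): window opens */
    atomic_store(pte, new_page);     /* ~ set_pte(): window closes */
    /* A real kernel must be sure no CPU still uses the old translation
     * before freeing it; this single-threaded toy does not model that. */
    free(old);
    return 0;
}
```

The RW variant would additionally have to stop writers before the memcpy, which is why Nick suggests remapping RO first.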
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Andi Kleen @ 2008-09-08 11:30 UTC
To: Nick Piggin
Cc: Andi Kleen, Yasunori Goto, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

> Sorry, by "block", I really mean spin, I guess. I mean that the CPU will
> be forced to stop executing due to the page fault during this sequence:

It's hard for NMIs at least. They cannot take faults.

In the end you would need to define a core kernel which
cannot be remapped and the rest which can, and you end up
with even more of a microkernel-like mess.

> ptep_clear_flush(ptep)   <--- from here
> set_pte(ptep, newpte)    <--- until here
>
> For prot RW, the window would also include the memcpy; however, if that
> adds too much latency for execute/reads, then it can be mapped RO first,
> then memcpy'd, then flushed and switched.
>
> > Then that would be essentially a hypervisor or microkernel approach.
>
> What would be? Blocking in interrupts? Or non-linear kernel mapping in

Well, in general, someone remapping all the memory behind you.
That's essentially a hypervisor in my book.

-Andi
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Nick Piggin @ 2008-09-08 13:48 UTC
To: Andi Kleen
Cc: Yasunori Goto, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86, Ingo Molnar

On Monday 08 September 2008 21:30, Andi Kleen wrote:
> > Sorry, by "block", I really mean spin, I guess. I mean that the CPU will
> > be forced to stop executing due to the page fault during this sequence:
>
> It's hard for NMIs at least. They cannot take faults.

Well, just for executing code (and reading RO data), it actually
shouldn't matter at all whether the CPU starts executing from the new
page or the old page, so long as there is a way to quiesce NMIs before
freeing the old page.

So the NMI can run, and read data, but it may have a problem with
stores. At least, some kind of redesign of NMI handlers might be
required so that they can make a note of the pending operation and try
to do something sane in that case. Or, there could be a small region of
memory, a page or two, which does not get migrated and which NMIs can
write to. I don't think you need to go so far as saying the entire
kernel image must be non-movable just for NMIs.

> In the end you would need to define a core kernel which
> cannot be remapped and the rest which can, and you end up
> with even more of a microkernel-like mess.

Are there any important NMIs that really can't fit with this?

> > ptep_clear_flush(ptep)   <--- from here
> > set_pte(ptep, newpte)    <--- until here
> >
> > For prot RW, the window would also include the memcpy; however, if that
> > adds too much latency for execute/reads, then it can be mapped RO first,
> > then memcpy'd, then flushed and switched.
> >
> > > Then that would be essentially a hypervisor or microkernel approach.
> >
> > What would be? Blocking in interrupts? Or non-linear kernel mapping in
>
> Well, in general, someone remapping all the memory behind you.
> That's essentially a hypervisor in my book.

I don't see it. It is one of the things a hypervisor may do. But
anyway, call it what you will.
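Nick's "small region which does not get migrated" suggestion might look something like the sketch below: a fixed-size log, assumed to live in a pinned page, to which an NMI handler records pending operations without taking any lock (a slot is reserved with a single atomic add), and which normal context drains later. The structure and function names are hypothetical, and reusing slots after a drain would additionally require NMIs to be quiesced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NMI_LOG_SLOTS 64

/* Hypothetical log assumed to live in a small, never-migrated region.
 * Writers reserve a slot with one atomic add, so no locks are taken. */
struct nmi_log {
    atomic_uint next;                   /* next slot to reserve */
    uint64_t op[NMI_LOG_SLOTS];         /* pending-operation codes */
    atomic_bool ready[NMI_LOG_SLOTS];   /* slot payload fully written */
};

/* NMI context: record the operation and never wait. Returns false if full. */
static bool nmi_log_record(struct nmi_log *log, uint64_t op)
{
    unsigned slot = atomic_fetch_add(&log->next, 1);
    if (slot >= NMI_LOG_SLOTS)
        return false;                        /* full: drop the note */
    log->op[slot] = op;
    atomic_store(&log->ready[slot], true);   /* publish after the payload */
    return true;
}

/* Normal context: copy out completed entries (up to max) and mark them
 * consumed. Slots still being written are skipped and picked up next time. */
static unsigned nmi_log_drain(struct nmi_log *log, uint64_t *out, unsigned max)
{
    unsigned n = atomic_load(&log->next);
    if (n > NMI_LOG_SLOTS)
        n = NMI_LOG_SLOTS;
    unsigned got = 0;
    for (unsigned i = 0; i < n && got < max; i++) {
        if (atomic_load(&log->ready[i])) {
            out[got++] = log->op[i];
            atomic_store(&log->ready[i], false);
        }
    }
    return got;
}
```

The struct must start zeroed (for example with static storage duration).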
* Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Ingo Molnar @ 2008-09-06 14:33 UTC
To: Yasunori Goto
Cc: Andi Kleen, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86

* Yasunori Goto <y-goto@jp.fujitsu.com> wrote:

> I don't think such a driver is almighty. IIRC, the balloon driver can be
> a cause of fragmentation on a 24/7 system.
>
> In addition, I have heard that memory hotplug would be useful for
> reducing the power consumption of DIMMs.
>
> I have to admit that memory hotplug has many issues, but I would like
> to solve them step by step.

What would be nice is to report, both during bootup and in
/proc/meminfo and 'free' output, that hot-removable memory segments
are not generic free memory: it's currently a limited resource that
might or might not be sufficient to serve a given workload.

Perhaps even exclude it from the 'total' memory reported by meminfo - to
be on the safe side of user expectations. In terms of user-space memory
it is already generic swappable memory, but in terms of kernel-space
allocations it is not.

As I said earlier in the thread, I certainly have no objections from
the x86 maintenance side - nothing is worse than a generic kernel
feature only available on certain less frequently used platforms. Memory
hotplug has been available for some time in the MM, it's not really
causing any maintenance trouble at the moment, and it is not enabled by
default either.

Having said that, I have my doubts about its generic utility (the power
saving aspects are likely not realizable - nobody really wants DIMMs to
just sit there unused and the cost of dynamic migration is just
horrendous) - but as long as it's opt-in there's no reason to limit the
availability of an in-kernel feature artificially.

Removing those limitations of kernel-space allocations should indeed be
done in baby steps - and whether it's worth turning such memory into
completely generic kernel memory is an open question.

But the fact that a piece of memory is not fully generic is no reason
not to allow users to create special, capability-limited RAM resources
like they can already do via hugetlbfs or ramfs, as long as the
capability limitations are advertised clearly.

Yes, memory hotplug has limitations we all understand, but still it's an
arguably useful feature in some circumstances. If we never give a
feature a chance to evolve on the main Linux platform that 90%+ of our
users use, it won't ever be truly useful.

Please send the new patches against -git or -tip and we can put them
into a separate standalone feature topic, test them on various x86
boxes, and send them towards linux-next if Andrew agrees with that
process too.

Btw., it would be nice if memory hotplug had a self-test that could be
activated from the .config and would run autonomously (a bit like
rcu-torture): it would mark, say, 10% of all RAM as hot-pluggable during
bootup and would periodically hot-plug and hot-unplug that memory, every
10 or 30 seconds or so, transparently. That would also test the x86
architecture's pagetable init code, the page migration code, etc.
(Disabled by default and dependent on DEBUG_KERNEL && EXPERIMENTAL.)

	Ingo
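Ingo's proposal is an in-kernel, .config-driven test; as a rough stand-in, the same cycle can be approximated from user space through the sysfs `state` files. The sketch below is that approximation, not the proposed kernel feature: `pick_test_blocks()` picks about `percent`% of the blocks (assuming, as a heuristic of ours, that the highest-numbered blocks are the likeliest to be movable), and `cycle_block()` offlines and re-onlines one block, reporting failure if either write is refused.

```c
#include <assert.h>
#include <stdio.h>

/* Write "online" or "offline" to <root>/memory<id>/state. On a real system
 * root is "/sys/devices/system/memory"; the kernel rejects the write (e.g.
 * with EBUSY) if the block cannot change state. */
static int set_block_state(const char *root, unsigned id, const char *state)
{
    char path[256];
    if (snprintf(path, sizeof path, "%s/memory%u/state", root, id)
        >= (int)sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(state, f) >= 0;
    /* sysfs write errors may only surface when the buffer is flushed */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* One torture iteration: offline the block, then bring it back online. */
static int cycle_block(const char *root, unsigned id)
{
    if (set_block_state(root, id, "offline") != 0)
        return -1;
    return set_block_state(root, id, "online");
}

/* Choose about percent% of nblocks (at least one), taking the
 * highest-numbered blocks; returns how many ids were stored. */
static unsigned pick_test_blocks(unsigned nblocks, unsigned percent,
                                 unsigned *ids, unsigned max)
{
    unsigned count = nblocks * percent / 100;
    if (count == 0 && nblocks > 0 && percent > 0)
        count = 1;
    if (count > max)
        count = max;
    for (unsigned i = 0; i < count; i++)
        ids[i] = nblocks - count + i;
    return count;
}
```

A driver loop would call `cycle_block()` on each picked id, then `sleep(10)` or `sleep(30)`, and repeat.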
* Re: Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: kamezawa.hiroyu @ 2008-09-06 16:00 UTC
To: Ingo Molnar
Cc: Yasunori Goto, Andi Kleen, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86

----- Original Message -----
>* Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
>
>> I don't think such a driver is almighty. IIRC, the balloon driver can be
>> a cause of fragmentation on a 24/7 system.
>>
>> In addition, I have heard that memory hotplug would be useful for
>> reducing the power consumption of DIMMs.
>>
>> I have to admit that memory hotplug has many issues, but I would like
>> to solve them step by step.
>
>What would be nice is to report, both during bootup and in
>/proc/meminfo and 'free' output, that hot-removable memory segments
>are not generic free memory: it's currently a limited resource that
>might or might not be sufficient to serve a given workload.
>
>Perhaps even exclude it from the 'total' memory reported by meminfo - to
>be on the safe side of user expectations. In terms of user-space memory
>it is already generic swappable memory, but in terms of kernel-space
>allocations it is not.
>

I wonder why nobody talks about ZONE_MOVABLE... When I wrote memory
hotplug, I assumed the help of ZONE_MOVABLE and SPARSEMEM. It is shown in
meminfo. (I think memory hotplug is useful only when ZONE_MOVABLE is used.)

Most of the problems Goto wrote about concern the placement of memmap,
pgdat, and zones. One example is that "when SPARSEMEM_VMEMMAP is enabled,
memmap is not removed even when memory is removed."

>As I said earlier in the thread, I certainly have no objections from
>the x86 maintenance side - nothing is worse than a generic kernel
>feature only available on certain less frequently used platforms. Memory
>hotplug has been available for some time in the MM, it's not really
>causing any maintenance trouble at the moment, and it is not enabled by
>default either.
>
>Having said that, I have my doubts about its generic utility (the power
>saving aspects are likely not realizable - nobody really wants DIMMs to
>just sit there unused and the cost of dynamic migration is just
>horrendous) - but as long as it's opt-in there's no reason to limit the
>availability of an in-kernel feature artificially.

Nobody? Maybe it's just a trade-off problem on the user side.
Even without DIMM hotplug or a DIMM power-save mode, is making a DIMM
idle of no use? I think memory consumes much power when it is used.
Memory hotplug and ZONE_MOVABLE can make some memory idle.
(I'm sorry if my thinking is wrong.)

>
>Removing those limitations of kernel-space allocations should indeed be
>done in baby steps - and whether it's worth turning such memory into
>completely generic kernel memory is an open question.
>
I think generic kernel-space memory hotplug will never be available.

>But the fact that a piece of memory is not fully generic is no reason
>not to allow users to create special, capability-limited RAM resources
>like they can already do via hugetlbfs or ramfs, as long as the
>capability limitations are advertised clearly.
>
Hmm, would adding features like
 - offline some memory at boot
 - an online-memory-as-hugetlb mode
be useful for generic PC users?

Regards,
-Kame
* Re: Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: Ingo Molnar @ 2008-09-06 16:17 UTC
To: kamezawa.hiroyu
Cc: Yasunori Goto, Andi Kleen, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86

* kamezawa.hiroyu@jp.fujitsu.com <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> > Removing those limitations of kernel-space allocations should indeed
> > be done in baby steps - and whether it's worth turning such memory
> > into completely generic kernel memory is an open question.
>
> I think generic kernel-space memory hotplug will never be available.

Yeah, most likely. (It's technically possible even on a native kernel -
just very expensive for various aspects of the kernel.)

> > But the fact that a piece of memory is not fully generic is no
> > reason not to allow users to create special, capability-limited RAM
> > resources like they can already do via hugetlbfs or ramfs, as long
> > as the capability limitations are advertised clearly.
>
> Hmm, would adding features like
>  - offline some memory at boot
>  - an online-memory-as-hugetlb mode
> be useful for generic PC users?

Yeah - it's actually the way hugetlb should be done. Plus expand
gbpages to hugetlbfs and hotplug memory on Barcelona CPUs and you can
do user-space apps that can run for a long time without any TLB misses.
_That_ might make sense to explore in practice. (I'm not holding my
breath though; TLB misses are _fast_ on the best x86 CPUs.)

But we won't be able to make such experiments without having the
capability on x86. So I'd like to break the catch-22 by accepting all
this into arch/x86. It certainly is simple and makes some sense; it's
just that I'm not that convinced about it personally at the moment.

So feel free to turn it all into a killer feature (make hugetlb-backed
memory transparent to user-space, etc. etc.) that high-performance
computing users strive for, and all that will change.

Please send the reshaped patches so we can move past the 'what if'
discussion phase ;-)

	Ingo
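The "no TLB misses" remark comes down to TLB reach: entries times page size. The arithmetic is sketched below; the entry counts used are illustrative, not figures for any particular CPU. An 8 GiB working set needs 2,097,152 translations with 4 KiB pages and 4096 with 2 MiB pages, but only 8 with 1 GiB gbpages, which even a small TLB can hold.

```c
#include <assert.h>
#include <stdint.h>

/* TLB reach: memory a TLB can map at once without taking a miss. */
static uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}

/* Translations (TLB entries) needed to map `bytes` with pages of
 * page_size, rounding up to whole pages. */
static uint64_t entries_needed(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

For example, 512 entries of 4 KiB pages reach only 2 MiB.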
* Re: Re: Re: [PATCH] [RESEND] x86_64: add memory hotremove config option
From: kamezawa.hiroyu @ 2008-09-06 16:05 UTC
To: kamezawa.hiroyu
Cc: Ingo Molnar, Yasunori Goto, Andi Kleen, Gary Hade, linux-mm, Andrew Morton, Badari Pulavarty, Mel Gorman, Chris McDermott, linux-kernel, x86

----- Original Message -----
>>Having said that, I have my doubts about its generic utility (the power
>>saving aspects are likely not realizable - nobody really wants DIMMs to
>>just sit there unused and the cost of dynamic migration is just
>>horrendous) - but as long as it's opt-in there's no reason to limit the
>>availability of an in-kernel feature artificially.
>
>Nobody? Maybe it's just a trade-off problem on the user side.
>Even without DIMM hotplug or a DIMM power-save mode, is making a DIMM
>idle of no use? I think memory consumes much power when it is used.
>Memory hotplug and ZONE_MOVABLE can make some memory idle.
>(I'm sorry if my thinking is wrong.)
>
But I have to point out that HDD access consumes far more power than
memory. That's a trade-off problem that depends on usage, anyway.

Thanks,
-Kame
Thread overview: 31+ messages
2008-09-05 17:21 [PATCH] [RESEND] x86_64: add memory hotremove config option Gary Hade
2008-09-05 17:44 ` Ingo Molnar
2008-09-05 18:14 ` Badari Pulavarty
2008-09-05 18:17 ` Ingo Molnar
2008-09-08 21:52 ` [PATCH] Cleanup to make remove_memory() arch neutral Badari Pulavarty
2008-09-09 0:56 ` Andrew Morton
2008-09-09 1:14 ` Randy Dunlap
2008-09-09 1:21 ` Yasunori Goto
2008-09-09 15:12 ` Badari Pulavarty
2008-09-08 21:56 ` [PATCH] x86: add memory hotremove config option Badari Pulavarty
2008-09-05 18:04 ` [PATCH] [RESEND] x86_64: " Andi Kleen
2008-09-05 18:31 ` Badari Pulavarty
2008-09-05 18:54 ` Andi Kleen
2008-09-05 22:34 ` Badari Pulavarty
2008-09-05 19:53 ` Gary Hade
2008-09-05 20:04 ` Andi Kleen
2008-09-05 21:54 ` Gary Hade
2008-09-06 0:01 ` Andi Kleen
2008-09-06 7:06 ` Yasunori Goto
2008-09-06 8:53 ` Andi Kleen
2008-09-08 5:52 ` Nick Piggin
2008-09-08 9:36 ` Andi Kleen
2008-09-08 9:46 ` Nick Piggin
2008-09-08 10:30 ` Andi Kleen
2008-09-08 11:19 ` Nick Piggin
2008-09-08 11:30 ` Andi Kleen
2008-09-08 13:48 ` Nick Piggin
2008-09-06 14:33 ` Ingo Molnar
2008-09-06 16:00 ` kamezawa.hiroyu
2008-09-06 16:17 ` Ingo Molnar
2008-09-06 16:05 ` kamezawa.hiroyu