linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] Memory Add Fixes for ppc64
@ 2005-11-04 23:15 Mike Kravetz
  2005-11-04 23:18 ` [PATCH 1/4] " Mike Kravetz
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Mike Kravetz @ 2005-11-04 23:15 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc64-dev, linux-kernel, lhms-devel

When memory add was merged into mainline in 2.6.14, there were
various bits and pieces missing that prevent it from working on
ppc64.  The following patches are against 2.6.14-git7 and address
all but one of the known issues.

1) Create hptes for new sections
2) Clear page count before freeing new pages
3) Kludge to add new memory to node 0
4) Ensure probe file is created for memory add via sysfs

-- 
Mike


* [PATCH 1/4] Memory Add Fixes for ppc64
  2005-11-04 23:15 [PATCH 0/4] Memory Add Fixes for ppc64 Mike Kravetz
@ 2005-11-04 23:18 ` Mike Kravetz
  2005-11-05  0:04   ` Benjamin Herrenschmidt
  2005-11-04 23:19 ` [PATCH 2/4] " Mike Kravetz
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Mike Kravetz @ 2005-11-04 23:18 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc64-dev, linux-kernel, lhms-devel

Add the create_section_mapping() routine to create hptes for memory
sections dynamically added after system boot.
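
As a rough illustration of the address handling in this patch, the sketch below models how add_memory() derives the page frame numbers from the physical start address and then offsets that address by KERNELBASE to get the linear-map range handed to create_section_mapping(). The constant values (a KERNELBASE of 0xC000000000000000 and 4K base pages) and the names section_range and section_range_for are assumptions made for this example; they do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: linear-map base on ppc64; the patch only names KERNELBASE. */
#define KERNELBASE_ASSUMED 0xC000000000000000ULL
/* Assumption: 4K base pages. */
#define PAGE_SHIFT_ASSUMED 12

/* Hypothetical container for the values add_memory() works with. */
struct section_range {
	uint64_t start_pfn; /* first page frame, from the physical address */
	uint64_t nr_pages;  /* number of pages in the added range */
	uint64_t va_start;  /* linear-map virtual start */
	uint64_t va_end;    /* linear-map virtual end (exclusive) */
};

static struct section_range section_range_for(uint64_t phys_start, uint64_t size)
{
	struct section_range r;

	/* The pfn values are taken from the physical address first ... */
	r.start_pfn = phys_start >> PAGE_SHIFT_ASSUMED;
	r.nr_pages = size >> PAGE_SHIFT_ASSUMED;

	/* ... and only then is the address moved into the linear mapping. */
	r.va_start = phys_start + KERNELBASE_ASSUMED;
	r.va_end = r.va_start + size;
	return r;
}
```

Under these assumptions, a 16MB section at physical 0x40000000 would be mapped at the virtual range starting at 0xC000000040000000.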

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>

diff -Naupr linux-2.6.14-git7/arch/powerpc/mm/hash_utils_64.c linux-2.6.14-git7.work/arch/powerpc/mm/hash_utils_64.c
--- linux-2.6.14-git7/arch/powerpc/mm/hash_utils_64.c	2005-11-04 21:21:05.000000000 +0000
+++ linux-2.6.14-git7.work/arch/powerpc/mm/hash_utils_64.c	2005-11-04 22:05:06.000000000 +0000
@@ -176,6 +176,15 @@ static unsigned long get_hashtable_size(
 	return pteg_count << 7;
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+void create_section_mapping(unsigned long start, unsigned long end)
+{
+	create_pte_mapping(start, end,
+		_PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX,
+		cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE ? 1 : 0);
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
 void __init htab_initialize(void)
 {
 	unsigned long table, htab_size_bytes;
diff -Naupr linux-2.6.14-git7/arch/powerpc/mm/mem.c linux-2.6.14-git7.work/arch/powerpc/mm/mem.c
--- linux-2.6.14-git7/arch/powerpc/mm/mem.c	2005-11-04 21:21:05.000000000 +0000
+++ linux-2.6.14-git7.work/arch/powerpc/mm/mem.c	2005-11-04 22:05:06.000000000 +0000
@@ -124,6 +124,9 @@ int __devinit add_memory(u64 start, u64 
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
+	start += KERNELBASE;
+	create_section_mapping(start, start + size);
+
 	/* this should work for most non-highmem platforms */
 	zone = pgdata->node_zones;
 
diff -Naupr linux-2.6.14-git7/include/asm-ppc64/sparsemem.h linux-2.6.14-git7.work/include/asm-ppc64/sparsemem.h
--- linux-2.6.14-git7/include/asm-ppc64/sparsemem.h	2005-10-28 00:02:08.000000000 +0000
+++ linux-2.6.14-git7.work/include/asm-ppc64/sparsemem.h	2005-11-04 22:05:06.000000000 +0000
@@ -11,6 +11,10 @@
 #define MAX_PHYSADDR_BITS       38
 #define MAX_PHYSMEM_BITS        36
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern void create_section_mapping(unsigned long start, unsigned long end);
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
 #endif /* CONFIG_SPARSEMEM */
 
 #endif /* _ASM_PPC64_SPARSEMEM_H */


* [PATCH 2/4] Memory Add Fixes for ppc64
  2005-11-04 23:15 [PATCH 0/4] Memory Add Fixes for ppc64 Mike Kravetz
  2005-11-04 23:18 ` [PATCH 1/4] " Mike Kravetz
@ 2005-11-04 23:19 ` Mike Kravetz
  2005-11-04 23:20 ` [PATCH 3/4] " Mike Kravetz
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Mike Kravetz @ 2005-11-04 23:19 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc64-dev, linux-kernel, lhms-devel

memmap_init_zone() sets page count to 1.  Before 'freeing' the
page, we need to clear the count.  This is the same as what is done
in free_all_bootmem_core() for memory discovered at boot time.
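
The ordering described above can be pictured with a small userspace model. This is only a sketch: struct toy_page and the toy_* functions are hypothetical stand-ins, and the rule that the toy free path rejects a page with a nonzero count is an assumption of the model, not a statement about the exact kernel check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the fields the model needs. */
struct toy_page {
	int count;     /* reference count */
	bool reserved; /* models PageReserved */
};

/* Models memmap_init_zone(): new pages start reserved with count 1. */
static void toy_memmap_init(struct toy_page *p)
{
	p->count = 1;
	p->reserved = true;
}

/* Model assumption: freeing is only valid for an unreserved, count-0 page. */
static int toy_free_page(const struct toy_page *p)
{
	if (p->reserved || p->count != 0)
		return -1;
	return 0;
}

/* Models online_page() after this patch: clear the flag, zero the count, free. */
static int toy_online_page(struct toy_page *p)
{
	p->reserved = false;
	p->count = 0;
	return toy_free_page(p);
}
```

In this model a freshly initialized page cannot be freed directly, while the same page goes through toy_online_page() cleanly.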

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>

diff -Naupr linux-2.6.14-git7/arch/powerpc/mm/mem.c linux-2.6.14-git7.work/arch/powerpc/mm/mem.c
--- linux-2.6.14-git7/arch/powerpc/mm/mem.c	2005-11-04 21:21:05.000000000 +0000
+++ linux-2.6.14-git7.work/arch/powerpc/mm/mem.c	2005-11-04 22:09:59.000000000 +0000
@@ -107,6 +107,7 @@ EXPORT_SYMBOL(phys_mem_access_prot);
 void online_page(struct page *page)
 {
 	ClearPageReserved(page);
+	set_page_count(page, 0);
 	free_cold_page(page);
 	totalram_pages++;
 	num_physpages++;


* [PATCH 3/4] Memory Add Fixes for ppc64
  2005-11-04 23:15 [PATCH 0/4] Memory Add Fixes for ppc64 Mike Kravetz
  2005-11-04 23:18 ` [PATCH 1/4] " Mike Kravetz
  2005-11-04 23:19 ` [PATCH 2/4] " Mike Kravetz
@ 2005-11-04 23:20 ` Mike Kravetz
  2005-11-04 23:21 ` [PATCH 4/4] " Mike Kravetz
  2005-11-08  0:39 ` [PATCH 0/4] " Anton Blanchard
  4 siblings, 0 replies; 20+ messages in thread
From: Mike Kravetz @ 2005-11-04 23:20 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc64-dev, linux-kernel, lhms-devel

This is a temporary kludge that supports adding all new memory to
node 0.  I will provide a more complete solution similar to that
used for dynamically added CPUs in a few days.
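
A minimal userspace sketch of the lookup with the kludge applied is shown below. MEMORY_INCREMENT_SHIFT is taken from the header being patched; the 4K page shift, the function name toy_pa_to_nid, and passing max_pfn and the table as parameters are assumptions made so the example stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define MEMORY_INCREMENT_SHIFT 24 /* 16MB regions, as in mmzone.h */
#define TOY_PAGE_SHIFT 12         /* assumption: 4K base pages */

/*
 * Hypothetical model of pa_to_nid(): any physical address at or above
 * the end of boot-time memory is reported as node 0; everything else
 * is looked up per 16MB region.
 */
static int toy_pa_to_nid(uint64_t pa, uint64_t max_pfn, const signed char *table)
{
	/* kludge: hot added memory lies above max_pfn and defaults to node 0 */
	if (pa >= (max_pfn << TOY_PAGE_SHIFT))
		return 0;

	return table[pa >> MEMORY_INCREMENT_SHIFT];
}
```

The early return also keeps the lookup from indexing past the entries that were filled in at boot, which is presumably why the check sits in front of the table access.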

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>

diff -Naupr linux-2.6.14-git7/include/asm-ppc64/mmzone.h linux-2.6.14-git7.work/include/asm-ppc64/mmzone.h
--- linux-2.6.14-git7/include/asm-ppc64/mmzone.h	2005-11-04 21:21:09.000000000 +0000
+++ linux-2.6.14-git7.work/include/asm-ppc64/mmzone.h	2005-11-04 22:10:44.000000000 +0000
@@ -33,6 +33,9 @@ extern int numa_cpu_lookup_table[];
 extern char *numa_memory_lookup_table;
 extern cpumask_t numa_cpumask_lookup_table[];
 extern int nr_cpus_in_node[];
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern unsigned long max_pfn;
+#endif
 
 /* 16MB regions */
 #define MEMORY_INCREMENT_SHIFT 24
@@ -45,6 +48,11 @@ static inline int pa_to_nid(unsigned lon
 {
 	int nid;
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+	/* kludge hot added sections default to node 0 */
+	if (pa >= (max_pfn << PAGE_SHIFT))
+		return 0;
+#endif
 	nid = numa_memory_lookup_table[pa >> MEMORY_INCREMENT_SHIFT];
 
 #ifdef DEBUG_NUMA


* [PATCH 4/4] Memory Add Fixes for ppc64
  2005-11-04 23:15 [PATCH 0/4] Memory Add Fixes for ppc64 Mike Kravetz
                   ` (2 preceding siblings ...)
  2005-11-04 23:20 ` [PATCH 3/4] " Mike Kravetz
@ 2005-11-04 23:21 ` Mike Kravetz
  2005-11-07  0:59   ` Paul Mackerras
  2005-11-08  0:39 ` [PATCH 0/4] " Anton Blanchard
  4 siblings, 1 reply; 20+ messages in thread
From: Mike Kravetz @ 2005-11-04 23:21 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc64-dev, linux-kernel, lhms-devel

ppc64 needs a special sysfs probe file for adding new memory.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>

diff -Naupr linux-2.6.14-git7/arch/ppc64/Kconfig linux-2.6.14-git7.work/arch/ppc64/Kconfig
--- linux-2.6.14-git7/arch/ppc64/Kconfig	2005-11-04 21:21:06.000000000 +0000
+++ linux-2.6.14-git7.work/arch/ppc64/Kconfig	2005-11-04 22:11:16.000000000 +0000
@@ -277,6 +277,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
 	def_bool y
 	depends on NEED_MULTIPLE_NODES
 
+config ARCH_MEMORY_PROBE
+	def_bool y
+	depends on MEMORY_HOTPLUG
+
 # Some NUMA nodes have memory ranges that span
 # other nodes.  Even though a pfn is valid and
 # between a node's start and end pfns, it may not


* Re: [PATCH 1/4] Memory Add Fixes for ppc64
  2005-11-04 23:18 ` [PATCH 1/4] " Mike Kravetz
@ 2005-11-05  0:04   ` Benjamin Herrenschmidt
  2005-11-05  0:35     ` Mike Kravetz
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Benjamin Herrenschmidt @ 2005-11-05  0:04 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Paul Mackerras, linuxppc64-dev, linux-kernel, lhms-devel

On Fri, 2005-11-04 at 15:18 -0800, Mike Kravetz wrote:
> Add the create_section_mapping() routine to create hptes for memory
> sections dynamically added after system boot.
> 
> Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>

This patch will have to be slightly reworked on top of the 64k pages
one. It should be trivial though.

Ben.




* Re: [PATCH 1/4] Memory Add Fixes for ppc64
  2005-11-05  0:04   ` Benjamin Herrenschmidt
@ 2005-11-05  0:35     ` Mike Kravetz
  2005-11-05  0:43       ` Benjamin Herrenschmidt
  2005-11-07 20:47     ` Mike Kravetz
  2005-11-08  0:25     ` [PATCH 1/4] revised " Mike Kravetz
  2 siblings, 1 reply; 20+ messages in thread
From: Mike Kravetz @ 2005-11-05  0:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paul Mackerras, linuxppc64-dev, linux-kernel, lhms-devel

On Sat, Nov 05, 2005 at 11:04:30AM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2005-11-04 at 15:18 -0800, Mike Kravetz wrote:
> > Add the create_section_mapping() routine to create hptes for memory
> > sections dynamically added after system boot.
> > 
> > Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
> 
> This patch will have to be slightly reworked on top of the 64k pages
> one. It should be trivial though.

OK.  I'll respin on top of your patch at:

http://gate.crashing.org/~benh/ppc64-64k-pages.diff

Let me know if there is a different version going upstream.
-- 
Mike


* Re: [PATCH 1/4] Memory Add Fixes for ppc64
  2005-11-05  0:35     ` Mike Kravetz
@ 2005-11-05  0:43       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 20+ messages in thread
From: Benjamin Herrenschmidt @ 2005-11-05  0:43 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Paul Mackerras, linuxppc64-dev, linux-kernel, lhms-devel

On Fri, 2005-11-04 at 16:35 -0800, Mike Kravetz wrote:
> On Sat, Nov 05, 2005 at 11:04:30AM +1100, Benjamin Herrenschmidt wrote:
> > On Fri, 2005-11-04 at 15:18 -0800, Mike Kravetz wrote:
> > > Add the create_section_mapping() routine to create hptes for memory
> > > sections dynamically added after system boot.
> > > 
> > > Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
> > 
> > This patch will have to be slightly reworked on top of the 64k pages
> > one. It should be trivial though.
> 
> OK.  I'll respin on top of your patch at:
> 
> http://gate.crashing.org/~benh/ppc64-64k-pages.diff
> 
> Let me know if there is a different version going upstream

I'll check if it still applies after Linus pulls the next round of ppc
updates.

Ben.




* Re: [PATCH 4/4] Memory Add Fixes for ppc64
  2005-11-04 23:21 ` [PATCH 4/4] " Mike Kravetz
@ 2005-11-07  0:59   ` Paul Mackerras
  2005-11-07 17:39     ` Mike Kravetz
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Mackerras @ 2005-11-07  0:59 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: linuxppc64-dev, linux-kernel, lhms-devel

Mike Kravetz writes:

> ppc64 needs a special sysfs probe file for adding new memory.
> 
> Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
> 
> diff -Naupr linux-2.6.14-git7/arch/ppc64/Kconfig linux-2.6.14-git7.work/arch/ppc64/Kconfig
> --- linux-2.6.14-git7/arch/ppc64/Kconfig	2005-11-04 21:21:06.000000000 +0000
> +++ linux-2.6.14-git7.work/arch/ppc64/Kconfig	2005-11-04 22:11:16.000000000 +0000
> @@ -277,6 +277,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
>  	def_bool y
>  	depends on NEED_MULTIPLE_NODES
>  
> +config ARCH_MEMORY_PROBE
> +	def_bool y
> +	depends on MEMORY_HOTPLUG
> +

Does arch/powerpc/Kconfig need a similar fix then?

Paul.


* Re: [PATCH 4/4] Memory Add Fixes for ppc64
  2005-11-07  0:59   ` Paul Mackerras
@ 2005-11-07 17:39     ` Mike Kravetz
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Kravetz @ 2005-11-07 17:39 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc64-dev, linux-kernel, lhms-devel

On Mon, Nov 07, 2005 at 11:59:42AM +1100, Paul Mackerras wrote:
> Mike Kravetz writes:
> > ppc64 needs a special sysfs probe file for adding new memory.
> 
> Does arch/powerpc/Kconfig need a similar fix then?

Yes it does.  Sorry, I haven't been paying as much attention to the
merge as I should.  Here is a new version.

ppc64 needs a special sysfs probe file for adding new memory.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>

diff -Naupr linux-2.6.14-git7/arch/powerpc/Kconfig linux-2.6.14-git7.work/arch/powerpc/Kconfig
--- linux-2.6.14-git7/arch/powerpc/Kconfig	2005-11-04 21:21:05.000000000 +0000
+++ linux-2.6.14-git7.work/arch/powerpc/Kconfig	2005-11-07 17:32:45.000000000 +0000
@@ -569,6 +569,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
 	def_bool y
 	depends on NEED_MULTIPLE_NODES
 
+config ARCH_MEMORY_PROBE
+	def_bool y
+	depends on MEMORY_HOTPLUG
+
 # Some NUMA nodes have memory ranges that span
 # other nodes.  Even though a pfn is valid and
 # between a node's start and end pfns, it may not
diff -Naupr linux-2.6.14-git7/arch/ppc64/Kconfig linux-2.6.14-git7.work/arch/ppc64/Kconfig
--- linux-2.6.14-git7/arch/ppc64/Kconfig	2005-11-04 21:21:06.000000000 +0000
+++ linux-2.6.14-git7.work/arch/ppc64/Kconfig	2005-11-07 17:31:51.000000000 +0000
@@ -277,6 +277,10 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
 	def_bool y
 	depends on NEED_MULTIPLE_NODES
 
+config ARCH_MEMORY_PROBE
+	def_bool y
+	depends on MEMORY_HOTPLUG
+
 # Some NUMA nodes have memory ranges that span
 # other nodes.  Even though a pfn is valid and
 # between a node's start and end pfns, it may not


* Re: [PATCH 1/4] Memory Add Fixes for ppc64
  2005-11-05  0:04   ` Benjamin Herrenschmidt
  2005-11-05  0:35     ` Mike Kravetz
@ 2005-11-07 20:47     ` Mike Kravetz
  2005-11-07 21:12       ` Benjamin Herrenschmidt
  2005-11-08 14:51       ` Andy Whitcroft
  2005-11-08  0:25     ` [PATCH 1/4] revised " Mike Kravetz
  2 siblings, 2 replies; 20+ messages in thread
From: Mike Kravetz @ 2005-11-07 20:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Andy Whitcroft, Paul Mackerras, linuxppc64-dev, linux-kernel, lhms-devel

On Sat, Nov 05, 2005 at 11:04:30AM +1100, Benjamin Herrenschmidt wrote:
> This patch will have to be slightly reworked on top of the 64k pages
> one. It should be trivial though.

Ran into an issue with the interaction of SPARSEMEM and 64k pages.
SPARSEMEM defines the ppc64 section size to be 16MB, which corresponds
to the smallest LMB size.  There is a check in the SPARSEMEM code
to ensure that the MAX_ORDER (actually MAX_ORDER-1) block size is not
greater than the section size.  Within the Kconfig file, there is this:

# We optimistically allocate largepages from the VM, so make the limit
# large enough (16MB). This badly named config option is actually
# max order + 1
config FORCE_MAX_ZONEORDER
        int
        depends on PPC64
        default "13"

Just curious if we still want to boost MAX_ORDER like this with 64k
pages?  Doesn't that make the MAX_ORDER block size 256MB in this case?
Also, not quite sure what happens if memory size (a 16 MB multiple)
does not align with a MAX_ORDER block size (a 256MB multiple in this
case).  My 'guess' is that the page allocator would not use it as it
would not fit within the buddy system.
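
The block-size arithmetic in question can be checked with a short sketch. It relies on the Kconfig comment quoted above, which says FORCE_MAX_ZONEORDER is really max order + 1, so the largest buddy block has order FORCE_MAX_ZONEORDER - 1. The function name max_block_bytes is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in bytes of the largest buddy block for a given
 * FORCE_MAX_ZONEORDER value and page shift. The largest order is
 * force_max_zoneorder - 1, and a block of that order spans
 * 2^(order) pages of 2^(page_shift) bytes each.
 */
static uint64_t max_block_bytes(unsigned int force_max_zoneorder,
				unsigned int page_shift)
{
	return (uint64_t)1 << (force_max_zoneorder - 1 + page_shift);
}
```

With a value of 13 this gives 2^24 bytes (16MB) for 4K pages (shift 12) and 2^28 bytes (256MB) for 64K pages (shift 16), which matches the figures in the question above.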

cc'ing SPARSEMEM author Andy Whitcroft.
-- 
Mike


* Re: [PATCH 1/4] Memory Add Fixes for ppc64
  2005-11-07 20:47     ` Mike Kravetz
@ 2005-11-07 21:12       ` Benjamin Herrenschmidt
  2005-11-07 21:48         ` Mike Kravetz
  2005-11-08 14:51       ` Andy Whitcroft
  1 sibling, 1 reply; 20+ messages in thread
From: Benjamin Herrenschmidt @ 2005-11-07 21:12 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andy Whitcroft, Paul Mackerras, linuxppc64-dev, linux-kernel, lhms-devel


> Just curious if we still want to boost MAX_ORDER like this with 64k
> pages?  Doesn't that make the MAX_ORDER block size 256MB in this case?
> Also, not quite sure what happens if memory size (a 16 MB multiple)
> does not align with a MAX_ORDER block size (a 256MB multiple in this
> case).  My 'guess' is that the page allocator would not use it as it
> would not fit within the buddy system.
> 
> cc'ing SPARSEMEM author Andy Whitcroft.

Yes, the MAX_ORDER should be different indeed. But can Kconfig do that ?
That is have the default value be different based on a Kconfig option ?
I don't see that ... We may have to do things differently here...

Ben.




* Re: [PATCH 1/4] Memory Add Fixes for ppc64
  2005-11-07 21:12       ` Benjamin Herrenschmidt
@ 2005-11-07 21:48         ` Mike Kravetz
  2005-11-08  0:35           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Kravetz @ 2005-11-07 21:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Andy Whitcroft, Paul Mackerras, linuxppc64-dev, linux-kernel, lhms-devel

On Tue, Nov 08, 2005 at 08:12:56AM +1100, Benjamin Herrenschmidt wrote:
> Yes, the MAX_ORDER should be different indeed. But can Kconfig do that ?
> That is have the default value be different based on a Kconfig option ?
> I don't see that ... We may have to do things differently here...

This seems to be done in other parts of the Kconfig file.  Using those
as an example, this should keep the MAX_ORDER block size at 16MB.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>

diff -Naupr linux-2.6.14-git7.64k/arch/powerpc/Kconfig linux-2.6.14-git7.64k.work/arch/powerpc/Kconfig
--- linux-2.6.14-git7.64k/arch/powerpc/Kconfig	2005-11-07 18:38:50.000000000 +0000
+++ linux-2.6.14-git7.64k.work/arch/powerpc/Kconfig	2005-11-07 21:37:21.000000000 +0000
@@ -463,6 +463,7 @@ source "fs/Kconfig.binfmt"
 config FORCE_MAX_ZONEORDER
 	int
 	depends on PPC64
+	default "9" if PPC_64K_PAGES
 	default "13"
 
 config MATH_EMULATION
diff -Naupr linux-2.6.14-git7.64k/arch/ppc64/Kconfig linux-2.6.14-git7.64k.work/arch/ppc64/Kconfig
--- linux-2.6.14-git7.64k/arch/ppc64/Kconfig	2005-11-07 18:38:50.000000000 +0000
+++ linux-2.6.14-git7.64k.work/arch/ppc64/Kconfig	2005-11-07 21:36:42.000000000 +0000
@@ -56,6 +56,7 @@ config PPC_STD_MMU
 # max order + 1
 config FORCE_MAX_ZONEORDER
 	int
+	default "9" if PPC_64K_PAGES
 	default "13"
 
 source "init/Kconfig"


* [PATCH 1/4] revised Memory Add Fixes for ppc64
  2005-11-05  0:04   ` Benjamin Herrenschmidt
  2005-11-05  0:35     ` Mike Kravetz
  2005-11-07 20:47     ` Mike Kravetz
@ 2005-11-08  0:25     ` Mike Kravetz
  2005-11-08  2:07       ` Paul Mackerras
  2 siblings, 1 reply; 20+ messages in thread
From: Mike Kravetz @ 2005-11-08  0:25 UTC (permalink / raw)
  To: Paul Mackerras, Benjamin Herrenschmidt
  Cc: linuxppc64-dev, linux-kernel, lhms-devel

On Sat, Nov 05, 2005 at 11:04:30AM +1100, Benjamin Herrenschmidt wrote:
> This patch will have to be slightly reworked on top of the 64k pages
> one. It should be trivial though.

Here is a new version of the patch on top of 64k page support (actually
2.6.14-git10).  One filename also changed due to more merge changes.

Add the create_section_mapping() routine to create hptes for memory
sections dynamically added after system boot.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>

diff -Naupr linux-2.6.14-git10/arch/powerpc/mm/hash_utils_64.c linux-2.6.14-git10.work/arch/powerpc/mm/hash_utils_64.c
--- linux-2.6.14-git10/arch/powerpc/mm/hash_utils_64.c	2005-11-08 00:04:15.784924264 +0000
+++ linux-2.6.14-git10.work/arch/powerpc/mm/hash_utils_64.c	2005-11-08 00:06:46.992964608 +0000
@@ -385,6 +385,15 @@ static unsigned long __init htab_get_tab
 	return pteg_count << 7;
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+void create_section_mapping(unsigned long start, unsigned long end)
+{
+		BUG_ON(htab_bolt_mapping(start, end, start,
+			_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_COHERENT | PP_RWXX,
+			mmu_linear_psize));
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
 void __init htab_initialize(void)
 {
 	unsigned long table, htab_size_bytes;
diff -Naupr linux-2.6.14-git10/arch/powerpc/mm/mem.c linux-2.6.14-git10.work/arch/powerpc/mm/mem.c
--- linux-2.6.14-git10/arch/powerpc/mm/mem.c	2005-11-08 00:04:15.798922136 +0000
+++ linux-2.6.14-git10.work/arch/powerpc/mm/mem.c	2005-11-08 00:06:46.993964456 +0000
@@ -127,6 +127,9 @@ int __devinit add_memory(u64 start, u64 
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
+	start += KERNELBASE;
+	create_section_mapping(start, start + size);
+
 	/* this should work for most non-highmem platforms */
 	zone = pgdata->node_zones;
 
diff -Naupr linux-2.6.14-git10/include/asm-powerpc/sparsemem.h linux-2.6.14-git10.work/include/asm-powerpc/sparsemem.h
--- linux-2.6.14-git10/include/asm-powerpc/sparsemem.h	2005-11-08 00:04:28.486988472 +0000
+++ linux-2.6.14-git10.work/include/asm-powerpc/sparsemem.h	2005-11-08 00:07:39.138891344 +0000
@@ -11,6 +11,10 @@
 #define MAX_PHYSADDR_BITS       38
 #define MAX_PHYSMEM_BITS        36
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern void create_section_mapping(unsigned long start, unsigned long end);
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
 #endif /* CONFIG_SPARSEMEM */
 
 #endif /* _ASM_POWERPC_SPARSEMEM_H */


* Re: [PATCH 1/4] Memory Add Fixes for ppc64
  2005-11-07 21:48         ` Mike Kravetz
@ 2005-11-08  0:35           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 20+ messages in thread
From: Benjamin Herrenschmidt @ 2005-11-08  0:35 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andy Whitcroft, Paul Mackerras, linuxppc64-dev, linux-kernel, lhms-devel

On Mon, 2005-11-07 at 13:48 -0800, Mike Kravetz wrote:
> On Tue, Nov 08, 2005 at 08:12:56AM +1100, Benjamin Herrenschmidt wrote:
> > Yes, the MAX_ORDER should be different indeed. But can Kconfig do that ?
> > That is have the default value be different based on a Kconfig option ?
> > I don't see that ... We may have to do things differently here...
> 
> This seems to be done in other parts of the Kconfig file.  Using those
> as an example, this should keep the MAX_ORDER block size at 16MB.

Ok, I verified it does the right thing with Kconfig, thanks.

Paul, can you add to the merge tree too ?

Ben.




* Re: [PATCH 0/4] Memory Add Fixes for ppc64
  2005-11-04 23:15 [PATCH 0/4] Memory Add Fixes for ppc64 Mike Kravetz
                   ` (3 preceding siblings ...)
  2005-11-04 23:21 ` [PATCH 4/4] " Mike Kravetz
@ 2005-11-08  0:39 ` Anton Blanchard
  2005-11-08  0:48   ` Mike Kravetz
  4 siblings, 1 reply; 20+ messages in thread
From: Anton Blanchard @ 2005-11-08  0:39 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Paul Mackerras, linuxppc64-dev, linux-kernel, lhms-devel


Hi Mike,

> When memory add was merged into mainline in 2.6.14, there were
> various bits and pieces missing that prevent it from working on
> ppc64.  The following patches are against 2.6.14-git7 and address
> all but one of the known issues.
> 
> 1) Create hptes for new sections
> 2) Clear page count before freeing new pages
> 3) Kludge to add new memory to node 0
> 4) Ensure probe file is created for memory add via sysfs

I've got a patch that reworks our numa code and it might reject with
your stuff. I'll send them out for review this afternoon.

Anton


* Re: [PATCH 0/4] Memory Add Fixes for ppc64
  2005-11-08  0:39 ` [PATCH 0/4] " Anton Blanchard
@ 2005-11-08  0:48   ` Mike Kravetz
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Kravetz @ 2005-11-08  0:48 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Paul Mackerras, linuxppc64-dev, linux-kernel, lhms-devel

On Tue, Nov 08, 2005 at 11:39:01AM +1100, Anton Blanchard wrote:
> I've got a patch that reworks our numa code and it might reject with
> your stuff. I'll send them out for review this afternoon.

Interesting in that I was going to start reworking some of the
numa code to make it play nice with hot add.  Doubt this patch
set will impact your changes.  This set is not very intelligent
WRT numa and doesn't really modify any of the real code.

-- 
Mike


* Re: [PATCH 1/4] revised Memory Add Fixes for ppc64
  2005-11-08  0:25     ` [PATCH 1/4] revised " Mike Kravetz
@ 2005-11-08  2:07       ` Paul Mackerras
  2005-11-08  3:02         ` Mike Kravetz
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Mackerras @ 2005-11-08  2:07 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Benjamin Herrenschmidt, linuxppc64-dev, linux-kernel, lhms-devel

Mike Kravetz writes:

> Here is a new version of the patch on top of 64k page support (actually
> 2.6.14-git10).  One filename also changed due to more merge changes.

So, should I send this on to Linus along with the original 2/4 and 3/4
you posted and the revised 4/4?

Paul.


* Re: [PATCH 1/4] revised Memory Add Fixes for ppc64
  2005-11-08  2:07       ` Paul Mackerras
@ 2005-11-08  3:02         ` Mike Kravetz
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Kravetz @ 2005-11-08  3:02 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Benjamin Herrenschmidt, linuxppc64-dev, linux-kernel, lhms-devel

On Tue, Nov 08, 2005 at 01:07:13PM +1100, Paul Mackerras wrote:
> So, should I send this on to Linus along with the original 2/4 and 3/4
> you posted and the revised 4/4?

Yes, those should provide basic memory add support for ppc64.

Thanks,
-- 
Mike


* Re: [PATCH 1/4] Memory Add Fixes for ppc64
  2005-11-07 20:47     ` Mike Kravetz
  2005-11-07 21:12       ` Benjamin Herrenschmidt
@ 2005-11-08 14:51       ` Andy Whitcroft
  1 sibling, 0 replies; 20+ messages in thread
From: Andy Whitcroft @ 2005-11-08 14:51 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc64-dev,
	linux-kernel, lhms-devel

Mike Kravetz wrote:

> Just curious if we still want to boost MAX_ORDER like this with 64k
> pages?  Doesn't that make the MAX_ORDER block size 256MB in this case?
> Also, not quite sure what happens if memory size (a 16 MB multiple)
> does not align with a MAX_ORDER block size (a 256MB multiple in this
> case).  My 'guess' is that the page allocator would not use it as it
> would not fit within the buddy system.

The buddy system and the SPARSEMEM mem_map are separate really.  The key
limitation is that a MAX_ORDER chunk must fit within the SPARSEMEM block
size; it cannot span two blocks.  This is because the algorithm by which
the buddy system finds buddies for a returning allocation assumes that
mem_map is contiguous up to the maximum buddy size (MAX_ORDER); it
assumes it can use relative addressing to locate them.

The buddy system doesn't really care about the alignment of any of its
blocks.  The allocator is built empty and all existent pages are freed
back to it.  If there is a chunk of memory which can never coalesce back
to MAX_ORDER it will simply sit lower in the tree 'waiting' for these
nonexistent buddies and will never merge.  It will still be usable.
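
The relative addressing mentioned above can be sketched with the classic buddy-index calculation, where a block's buddy at a given order is found by flipping one bit of its page index. This is a simplified userspace model, not the kernel's implementation: the toy_* names are hypothetical, and treating memory as a single run of nr_present pages starting at index 0 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Classic buddy lookup: the buddy of a block differs only in bit 'order'. */
static unsigned long toy_buddy_index(unsigned long page_idx, unsigned int order)
{
	return page_idx ^ (1UL << order);
}

/*
 * Model assumption: only pages [0, nr_present) exist. A block can
 * merge at this order only if its whole buddy lies inside that range;
 * otherwise it stays at its current order but remains allocatable.
 */
static bool toy_buddy_exists(unsigned long page_idx, unsigned int order,
			     unsigned long nr_present)
{
	unsigned long buddy = toy_buddy_index(page_idx, order);

	return buddy + (1UL << order) <= nr_present;
}
```

For example, with only 6 pages present, the order-2 block at index 0 has its buddy at index 4, which would need pages 4 through 7. Pages 6 and 7 do not exist, so in this model the block never merges to order 3, yet it can still be handed out as an order-2 block.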

-apw

