* [PATCH v3 0/5] powerpc/mm: movable hotplug memory nodes
@ 2016-09-25 18:36 Reza Arbab
  2016-09-25 18:36 ` [PATCH v3 1/5] drivers/of: introduce of_fdt_is_available() Reza Arbab
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Reza Arbab @ 2016-09-25 18:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rob Herring, Frank Rowand, Jonathan Corbet, Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, Aneesh Kumar K.V, linux-doc, linux-kernel,
	linuxppc-dev, devicetree, linux-mm

These changes enable the dynamic creation of movable nodes on power.

On x86, the ACPI SRAT memory affinity structure can mark memory
hotpluggable, allowing the kernel to possibly create movable nodes at
boot.

While power has no analog of this SRAT information, we can still create
a movable memory node, post boot, by hotplugging all of the node's
memory into ZONE_MOVABLE.

We provide a way to describe the extents and numa associativity of such 
a node in the device tree, while deferring the memory addition to take 
place through hotplug.

In v1, this patchset introduced a new dt compatible id to explicitly 
create a memoryless node at boot. Here, things have been simplified to 
be applicable regardless of the status of node hotplug on power. We 
still intend to enable hotadding a pgdat, but that's now untangled as a 
separate topic.

v3:
* Use Rob Herring's suggestions to improve the node availability check.

* More verbose commit log in the patch enabling CONFIG_MOVABLE_NODE.

* Add a patch to restore top-down allocation the way x86 does.

v2:
* http://lkml.kernel.org/r/1473883618-14998-1-git-send-email-arbab@linux.vnet.ibm.com

* Use the "status" property of standard dt memory nodes instead of 
  introducing a new "ibm,hotplug-aperture" compatible id.

* Remove the patch which explicitly creates a memoryless node. This set 
  no longer has any bearing on whether the pgdat is created at boot or 
  at the time of memory addition.

v1:
* http://lkml.kernel.org/r/1470680843-28702-1-git-send-email-arbab@linux.vnet.ibm.com

Reza Arbab (5):
  drivers/of: introduce of_fdt_is_available()
  drivers/of: do not add memory for unavailable nodes
  powerpc/mm: allow memory hotplug into a memoryless node
  powerpc/mm: restore top-down allocation when using movable_node
  mm: enable CONFIG_MOVABLE_NODE on powerpc

 Documentation/kernel-parameters.txt |  2 +-
 arch/powerpc/mm/numa.c              | 16 ++++------------
 drivers/of/fdt.c                    | 29 ++++++++++++++++++++++++++---
 include/linux/of_fdt.h              |  2 ++
 mm/Kconfig                          |  2 +-
 5 files changed, 34 insertions(+), 17 deletions(-)

-- 
1.8.3.1


* [PATCH v3 1/5] drivers/of: introduce of_fdt_is_available()
  2016-09-25 18:36 [PATCH v3 0/5] powerpc/mm: movable hotplug memory nodes Reza Arbab
@ 2016-09-25 18:36 ` Reza Arbab
  2016-10-03 15:28   ` Rob Herring
  2016-09-25 18:36 ` [PATCH v3 2/5] drivers/of: do not add memory for unavailable nodes Reza Arbab
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Reza Arbab @ 2016-09-25 18:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rob Herring, Frank Rowand, Jonathan Corbet, Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, Aneesh Kumar K.V, linux-doc, linux-kernel,
	linuxppc-dev, devicetree, linux-mm

In __fdt_scan_reserved_mem(), the availability of a node is determined
by testing its "status" property.

Move this check into its own function, borrowing logic from the
unflattened version, of_device_is_available().

Another caller will be added in a subsequent patch.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 drivers/of/fdt.c       | 26 +++++++++++++++++++++++---
 include/linux/of_fdt.h |  2 ++
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 085c638..9241c6e 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -151,6 +151,23 @@ int of_fdt_match(const void *blob, unsigned long node,
 	return score;
 }
 
+bool of_fdt_is_available(const void *blob, unsigned long node)
+{
+	const char *status;
+	int statlen;
+
+	status = fdt_getprop(blob, node, "status", &statlen);
+	if (!status)
+		return true;
+
+	if (statlen) {
+		if (!strcmp(status, "okay") || !strcmp(status, "ok"))
+			return true;
+	}
+
+	return false;
+}
+
 static void *unflatten_dt_alloc(void **mem, unsigned long size,
 				       unsigned long align)
 {
@@ -647,7 +664,6 @@ static int __init __fdt_scan_reserved_mem(unsigned long node, const char *uname,
 					  int depth, void *data)
 {
 	static int found;
-	const char *status;
 	int err;
 
 	if (!found && depth == 1 && strcmp(uname, "reserved-memory") == 0) {
@@ -667,8 +683,7 @@ static int __init __fdt_scan_reserved_mem(unsigned long node, const char *uname,
 		return 1;
 	}
 
-	status = of_get_flat_dt_prop(node, "status", NULL);
-	if (status && strcmp(status, "okay") != 0 && strcmp(status, "ok") != 0)
+	if (!of_flat_dt_is_available(node))
 		return 0;
 
 	err = __reserved_mem_reserve_reg(node, uname);
@@ -809,6 +824,11 @@ int __init of_flat_dt_match(unsigned long node, const char *const *compat)
 	return of_fdt_match(initial_boot_params, node, compat);
 }
 
+bool __init of_flat_dt_is_available(unsigned long node)
+{
+	return of_fdt_is_available(initial_boot_params, node);
+}
+
 struct fdt_scan_status {
 	const char *name;
 	int namelen;
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 26c3302..49e0b8f 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -37,6 +37,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
 				 unsigned long node);
 extern int of_fdt_match(const void *blob, unsigned long node,
 			const char *const *compat);
+extern bool of_fdt_is_available(const void *blob, unsigned long node);
 extern void *of_fdt_unflatten_tree(const unsigned long *blob,
 				   struct device_node *dad,
 				   struct device_node **mynodes);
@@ -59,6 +60,7 @@ extern const void *of_get_flat_dt_prop(unsigned long node, const char *name,
 				       int *size);
 extern int of_flat_dt_is_compatible(unsigned long node, const char *name);
 extern int of_flat_dt_match(unsigned long node, const char *const *matches);
+extern bool of_flat_dt_is_available(unsigned long node);
 extern unsigned long of_get_flat_dt_root(void);
 extern int of_get_flat_dt_size(void);
 
-- 
1.8.3.1
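
The availability rule that this patch factors out can be modelled in
userspace roughly as follows. This is only a sketch, not kernel code:
status_means_available() and its parameters are hypothetical names, with
`status` standing in for the value fdt_getprop() would return and `len`
for its length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the "status" rule (hypothetical helper).
 * status: property value, or NULL when the node has no "status" property.
 * len:    length of the property value in bytes.
 */
static bool status_means_available(const char *status, int len)
{
	/* No "status" property at all: the node is treated as available. */
	if (!status)
		return true;

	/* A non-empty value must be exactly "okay" or "ok". */
	if (len > 0 && (!strcmp(status, "okay") || !strcmp(status, "ok")))
		return true;

	/* Anything else ("disabled", an empty string, ...) is unavailable. */
	return false;
}
```

Note that a missing property counts as available, so existing device
trees without a "status" property behave as before.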


* [PATCH v3 2/5] drivers/of: do not add memory for unavailable nodes
  2016-09-25 18:36 [PATCH v3 0/5] powerpc/mm: movable hotplug memory nodes Reza Arbab
  2016-09-25 18:36 ` [PATCH v3 1/5] drivers/of: introduce of_fdt_is_available() Reza Arbab
@ 2016-09-25 18:36 ` Reza Arbab
  2016-09-25 18:36 ` [PATCH v3 3/5] powerpc/mm: allow memory hotplug into a memoryless node Reza Arbab
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Reza Arbab @ 2016-09-25 18:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rob Herring, Frank Rowand, Jonathan Corbet, Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, Aneesh Kumar K.V, linux-doc, linux-kernel,
	linuxppc-dev, devicetree, linux-mm

Respect the standard dt "status" property when scanning memory nodes in
early_init_dt_scan_memory(), so that if the node is unavailable, no
memory will be added.

The use case at hand is accelerator or device memory, which may be
unusable until post-boot initialization of the memory link. Such a node
can be described in the dt as any other, given its status is "disabled".
Per the device tree specification,

"disabled"
	Indicates that the device is not presently operational, but it
	might become operational in the future (for example, something
	is not plugged in, or switched off).

Once such memory is made operational, it can then be hotplugged.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 drivers/of/fdt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 9241c6e..59b772a 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1056,6 +1056,9 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
 	} else if (strcmp(type, "memory") != 0)
 		return 0;
 
+	if (!of_flat_dt_is_available(node))
+		return 0;
+
 	reg = of_get_flat_dt_prop(node, "linux,usable-memory", &l);
 	if (reg == NULL)
 		reg = of_get_flat_dt_prop(node, "reg", &l);
-- 
1.8.3.1
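
As an illustration of the effect on the boot-time scan, here is a toy
userspace model. It assumes a simplified node representation; struct
toy_node, toy_node_available() and toy_scan_memory() are made-up names and
do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened view of a dt node; not a kernel structure. */
struct toy_node {
	const char *device_type;	/* e.g. "memory" */
	const char *status;		/* NULL when the property is absent */
	uint64_t base;
	uint64_t size;
};

static bool toy_node_available(const struct toy_node *n)
{
	if (!n->status)
		return true;
	return !strcmp(n->status, "okay") || !strcmp(n->status, "ok");
}

/* Returns the number of bytes that would be added as memory at boot. */
static uint64_t toy_scan_memory(const struct toy_node *nodes, size_t count)
{
	uint64_t total = 0;

	for (size_t i = 0; i < count; i++) {
		if (!nodes[i].device_type ||
		    strcmp(nodes[i].device_type, "memory") != 0)
			continue;	/* not a memory node */
		if (!toy_node_available(&nodes[i]))
			continue;	/* unavailable: left for later hotplug */
		total += nodes[i].size;
	}
	return total;
}
```

A node with status "disabled" contributes nothing at boot, but its extents
remain described in the tree for a later hot-add.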


* [PATCH v3 3/5] powerpc/mm: allow memory hotplug into a memoryless node
  2016-09-25 18:36 [PATCH v3 0/5] powerpc/mm: movable hotplug memory nodes Reza Arbab
  2016-09-25 18:36 ` [PATCH v3 1/5] drivers/of: introduce of_fdt_is_available() Reza Arbab
  2016-09-25 18:36 ` [PATCH v3 2/5] drivers/of: do not add memory for unavailable nodes Reza Arbab
@ 2016-09-25 18:36 ` Reza Arbab
  2016-09-25 18:36 ` [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node Reza Arbab
  2016-09-25 18:36 ` [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc Reza Arbab
  4 siblings, 0 replies; 16+ messages in thread
From: Reza Arbab @ 2016-09-25 18:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rob Herring, Frank Rowand, Jonathan Corbet, Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, Aneesh Kumar K.V, linux-doc, linux-kernel,
	linuxppc-dev, devicetree, linux-mm

Remove the check which prevents us from hotplugging into an empty node.

This limitation has been questioned before [1], and judging by the
response, there doesn't seem to be a reason we can't remove it. No issues
have been found in light testing.

[1] http://lkml.kernel.org/r/CAGZKiBrmkSa1yyhbf5hwGxubcjsE5SmkSMY4tpANERMe2UG4bg@mail.gmail.com
    http://lkml.kernel.org/r/20160511215051.GF22115@arbab-laptop.austin.ibm.com

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 75b9cd6..d7ac419 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1121,7 +1121,7 @@ static int hot_add_node_scn_to_nid(unsigned long scn_addr)
 int hot_add_scn_to_nid(unsigned long scn_addr)
 {
 	struct device_node *memory = NULL;
-	int nid, found = 0;
+	int nid;
 
 	if (!numa_enabled || (min_common_depth < 0))
 		return first_online_node;
@@ -1137,17 +1137,6 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
 	if (nid < 0 || !node_online(nid))
 		nid = first_online_node;
 
-	if (NODE_DATA(nid)->node_spanned_pages)
-		return nid;
-
-	for_each_online_node(nid) {
-		if (NODE_DATA(nid)->node_spanned_pages) {
-			found = 1;
-			break;
-		}
-	}
-
-	BUG_ON(!found);
 	return nid;
 }
 
-- 
1.8.3.1
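
The node selection that remains after this change can be sketched in
userspace as below. The structure and function names are hypothetical
stand-ins for node_online(), first_online_node and NODE_DATA(); only the
control flow is meant to match the patched function.

```c
#include <stdbool.h>

#define TOY_MAX_NODES 8

/* Hypothetical per-node state; models node_online() and spanned pages. */
struct toy_numa {
	bool online[TOY_MAX_NODES];
	unsigned long spanned_pages[TOY_MAX_NODES];
};

static int toy_first_online_node(const struct toy_numa *t)
{
	for (int nid = 0; nid < TOY_MAX_NODES; nid++)
		if (t->online[nid])
			return nid;
	return -1;
}

/* wanted: nid found from associativity lookup, negative when unknown. */
static int toy_pick_nid(const struct toy_numa *t, int wanted)
{
	if (wanted < 0 || wanted >= TOY_MAX_NODES || !t->online[wanted])
		return toy_first_online_node(t);

	/*
	 * With the check removed, an online node is returned even when
	 * its spanned_pages is 0, i.e. when it is currently memoryless.
	 */
	return wanted;
}
```

Before the patch, an online but empty node would have been replaced by the
first online node that already spans pages.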


* [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node
  2016-09-25 18:36 [PATCH v3 0/5] powerpc/mm: movable hotplug memory nodes Reza Arbab
                   ` (2 preceding siblings ...)
  2016-09-25 18:36 ` [PATCH v3 3/5] powerpc/mm: allow memory hotplug into a memoryless node Reza Arbab
@ 2016-09-25 18:36 ` Reza Arbab
  2016-09-26 15:47   ` Aneesh Kumar K.V
  2016-09-26 21:12   ` Benjamin Herrenschmidt
  2016-09-25 18:36 ` [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc Reza Arbab
  4 siblings, 2 replies; 16+ messages in thread
From: Reza Arbab @ 2016-09-25 18:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rob Herring, Frank Rowand, Jonathan Corbet, Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, Aneesh Kumar K.V, linux-doc, linux-kernel,
	linuxppc-dev, devicetree, linux-mm

At boot, the movable_node option sets bottom-up memblock allocation.

This reduces the chance that, in the window before movable memory has
been identified, an allocation for the kernel might come from a movable
node. By going bottom-up, early allocations will most likely come from
the same node as the kernel image, which is necessarily in a nonmovable
node.

Then, once any known hotplug memory has been marked, allocation can be
reset back to top-down. On x86, this is done in numa_init(). This patch
does the same on power, in numa initmem_init().

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d7ac419..fdf1e69 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -945,6 +945,9 @@ void __init initmem_init(void)
 	max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
 	max_pfn = max_low_pfn;
 
+	/* bottom-up allocation may have been set by movable_node */
+	memblock_set_bottom_up(false);
+
 	if (parse_numa_properties())
 		setup_nonnuma();
 	else
-- 
1.8.3.1
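
To make the direction switch concrete, here is a toy single-range
allocator in userspace C. It is not memblock; toy_range, toy_set_bottom_up()
and toy_alloc() are invented for illustration and only mimic the idea of a
global direction flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* One free range [base, base + size); a toy stand-in for memblock state. */
struct toy_range {
	uint64_t base;
	uint64_t size;
};

static bool toy_bottom_up;

static void toy_set_bottom_up(bool enable)
{
	toy_bottom_up = enable;
}

/* Carve len bytes from the range; returns the start, or UINT64_MAX. */
static uint64_t toy_alloc(struct toy_range *r, uint64_t len)
{
	uint64_t addr;

	if (len == 0 || len > r->size)
		return UINT64_MAX;

	if (toy_bottom_up) {
		addr = r->base;			/* lowest free address */
		r->base += len;
	} else {
		addr = r->base + r->size - len;	/* highest free address */
	}
	r->size -= len;
	return addr;
}
```

With the flag set, allocations grow upward from the low end (near the
kernel image in the real case); after the reset they come from the top of
the range again.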


* [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc
  2016-09-25 18:36 [PATCH v3 0/5] powerpc/mm: movable hotplug memory nodes Reza Arbab
                   ` (3 preceding siblings ...)
  2016-09-25 18:36 ` [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node Reza Arbab
@ 2016-09-25 18:36 ` Reza Arbab
  2016-09-26 15:48   ` Aneesh Kumar K.V
  2016-09-26 21:15   ` Benjamin Herrenschmidt
  4 siblings, 2 replies; 16+ messages in thread
From: Reza Arbab @ 2016-09-25 18:36 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rob Herring, Frank Rowand, Jonathan Corbet, Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, Aneesh Kumar K.V, linux-doc, linux-kernel,
	linuxppc-dev, devicetree, linux-mm

To create a movable node, we need to hotplug all of its memory into
ZONE_MOVABLE.

Note that to do this, auto_online_blocks should be off. Since the memory
will first be added to the default zone, we must explicitly use
online_movable to online.

Because such a node contains no normal memory, can_online_high_movable()
will only allow us to do the onlining if CONFIG_MOVABLE_NODE is set.
Enable the use of this config option on PPC64 platforms.

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 Documentation/kernel-parameters.txt | 2 +-
 mm/Kconfig                          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index a4f4d69..3d8460d 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2344,7 +2344,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
-	movable_node	[KNL,X86] Boot-time switch to enable the effects
+	movable_node	[KNL,X86,PPC] Boot-time switch to enable the effects
 			of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
 
 	MTD_Partition=	[MTD]
diff --git a/mm/Kconfig b/mm/Kconfig
index be0ee11..4b19cd3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -153,7 +153,7 @@ config MOVABLE_NODE
 	bool "Enable to assign a node which has only movable memory"
 	depends on HAVE_MEMBLOCK
 	depends on NO_BOOTMEM
-	depends on X86_64
+	depends on X86_64 || PPC64
 	depends on NUMA
 	default n
 	help
-- 
1.8.3.1
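
For reference, the explicit onlining step described in the commit log could
be driven from userspace along these lines. This is a sketch under the
assumption that each memory block exposes a writable "state" attribute
under /sys/devices/system/memory/memoryN/; write_attr() and
online_block_movable() are hypothetical helper names.

```c
#include <stdio.h>

/*
 * Write value to a sysfs-style attribute file at path.
 * Returns 0 on success, -1 on failure.
 */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Request that one memory block be onlined into ZONE_MOVABLE.
 * state_path is expected to look like
 * /sys/devices/system/memory/memoryN/state (N chosen by the caller).
 */
static int online_block_movable(const char *state_path)
{
	return write_attr(state_path, "online_movable");
}
```

The same helper could write "offline" to
/sys/devices/system/memory/auto_online_blocks beforehand, so that newly
added blocks are not onlined automatically into the default zone.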


* Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node
  2016-09-25 18:36 ` [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node Reza Arbab
@ 2016-09-26 15:47   ` Aneesh Kumar K.V
  2016-09-26 20:48     ` Reza Arbab
  2016-09-26 21:12   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 16+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-26 15:47 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Rob Herring, Frank Rowand, Jonathan Corbet,
	Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, linux-doc, linux-kernel, linuxppc-dev, devicetree,
	linux-mm

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> At boot, the movable_node option sets bottom-up memblock allocation.
>
> This reduces the chance that, in the window before movable memory has
> been identified, an allocation for the kernel might come from a movable
> node. By going bottom-up, early allocations will most likely come from
> the same node as the kernel image, which is necessarily in a nonmovable
> node.
>
> Then, once any known hotplug memory has been marked, allocation can be
> reset back to top-down. On x86, this is done in numa_init(). This patch
> does the same on power, in numa initmem_init().
>
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index d7ac419..fdf1e69 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -945,6 +945,9 @@ void __init initmem_init(void)
>  	max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
>  	max_pfn = max_low_pfn;
>
> +	/* bottom-up allocation may have been set by movable_node */
> +	memblock_set_bottom_up(false);
> +

By then we have done a few memblock allocations, right? IMHO, we should do
this early enough in prom.c, after we call parse_early_param(), with a
comment there explaining that we don't really support hotplug memblock,
and that when we do, this should be moved to a place where we can handle
memblock allocation such that we avoid spreading memblock allocations to
a movable node.


>  	if (parse_numa_properties())
>  		setup_nonnuma();
>  	else
> -- 
> 1.8.3.1

-aneesh


* Re: [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc
  2016-09-25 18:36 ` [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc Reza Arbab
@ 2016-09-26 15:48   ` Aneesh Kumar K.V
  2016-09-26 21:15   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-26 15:48 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Rob Herring, Frank Rowand, Jonathan Corbet,
	Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, linux-doc, linux-kernel, linuxppc-dev, devicetree,
	linux-mm

Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> To create a movable node, we need to hotplug all of its memory into
> ZONE_MOVABLE.
>
> Note that to do this, auto_online_blocks should be off. Since the memory
> will first be added to the default zone, we must explicitly use
> online_movable to online.
>
> Because such a node contains no normal memory, can_online_high_movable()
> will only allow us to do the onlining if CONFIG_MOVABLE_NODE is set.
> Enable the use of this config option on PPC64 platforms.
>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  Documentation/kernel-parameters.txt | 2 +-
>  mm/Kconfig                          | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index a4f4d69..3d8460d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2344,7 +2344,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			that the amount of memory usable for all allocations
>  			is not too small.
>
> -	movable_node	[KNL,X86] Boot-time switch to enable the effects
> +	movable_node	[KNL,X86,PPC] Boot-time switch to enable the effects
>  			of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>
>  	MTD_Partition=	[MTD]
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be0ee11..4b19cd3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -153,7 +153,7 @@ config MOVABLE_NODE
>  	bool "Enable to assign a node which has only movable memory"
>  	depends on HAVE_MEMBLOCK
>  	depends on NO_BOOTMEM
> -	depends on X86_64
> +	depends on X86_64 || PPC64
>  	depends on NUMA
>  	default n
>  	help
> -- 
> 1.8.3.1


* Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node
  2016-09-26 15:47   ` Aneesh Kumar K.V
@ 2016-09-26 20:48     ` Reza Arbab
  0 siblings, 0 replies; 16+ messages in thread
From: Reza Arbab @ 2016-09-26 20:48 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rob Herring, Frank Rowand, Jonathan Corbet, Andrew Morton,
	Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, linux-doc, linux-kernel, linuxppc-dev, devicetree,
	linux-mm

On Mon, Sep 26, 2016 at 09:17:43PM +0530, Aneesh Kumar K.V wrote:
>> +	/* bottom-up allocation may have been set by movable_node */
>> +	memblock_set_bottom_up(false);
>> +
>
>By then we have done a few memblock allocations, right?

Yes, some allocations do occur while bottom-up is set.

>IMHO, we should do this early enough in prom.c, after we call
>parse_early_param(), with a comment there explaining that we don't
>really support hotplug memblock, and that when we do, this should be
>moved to a place where we can handle memblock allocation such that we
>avoid spreading memblock allocations to a movable node.

Sure, we can do it earlier. The only consideration is that any potential 
calls to memblock_mark_hotplug() happen before we reset to top-down.  
Since we don't do that at all on power, the call can go anywhere.

-- 
Reza Arbab


* Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node
  2016-09-25 18:36 ` [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node Reza Arbab
  2016-09-26 15:47   ` Aneesh Kumar K.V
@ 2016-09-26 21:12   ` Benjamin Herrenschmidt
  2016-09-27  0:14     ` Reza Arbab
  1 sibling, 1 reply; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-26 21:12 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Paul Mackerras, Rob Herring,
	Frank Rowand, Jonathan Corbet, Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, Aneesh Kumar K.V, linux-doc, linux-kernel,
	linuxppc-dev, devicetree, linux-mm

On Sun, 2016-09-25 at 13:36 -0500, Reza Arbab wrote:
> At boot, the movable_node option sets bottom-up memblock allocation.
> 
> This reduces the chance that, in the window before movable memory has
> been identified, an allocation for the kernel might come from a movable
> node. By going bottom-up, early allocations will most likely come from
> the same node as the kernel image, which is necessarily in a nonmovable
> node.
> 
> Then, once any known hotplug memory has been marked, allocation can be
> reset back to top-down. On x86, this is done in numa_init(). This patch
> does the same on power, in numa initmem_init().

That's fragile and a bit gross.

But then I'm not *that* much of a fan of making accelerator memory be
"memory" nodes in the first place. Oh well...

In any case, if the memory hasn't been hotplugged, this shouldn't be necessary
as we shouldn't be considering it for allocation.

If we want to prevent it for another reason, we should add logic for that
in memblock, or reserve it early or something like that.

Just relying magically on the direction of the allocator is bad, really bad.

Ben.

> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index d7ac419..fdf1e69 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -945,6 +945,9 @@ void __init initmem_init(void)
>  	max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
>  	max_pfn = max_low_pfn;
>  
> +	/* bottom-up allocation may have been set by movable_node */
> +	memblock_set_bottom_up(false);
> +
>  	if (parse_numa_properties())
>  		setup_nonnuma();
>  	else


* Re: [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc
  2016-09-25 18:36 ` [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc Reza Arbab
  2016-09-26 15:48   ` Aneesh Kumar K.V
@ 2016-09-26 21:15   ` Benjamin Herrenschmidt
  2016-09-27  0:19     ` Reza Arbab
  1 sibling, 1 reply; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-26 21:15 UTC (permalink / raw)
  To: Reza Arbab, Michael Ellerman, Paul Mackerras, Rob Herring,
	Frank Rowand, Jonathan Corbet, Andrew Morton
  Cc: Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Balbir Singh, Aneesh Kumar K.V, linux-doc, linux-kernel,
	linuxppc-dev, devicetree, linux-mm

On Sun, 2016-09-25 at 13:36 -0500, Reza Arbab wrote:
> To create a movable node, we need to hotplug all of its memory into
> ZONE_MOVABLE.
> 
> Note that to do this, auto_online_blocks should be off. Since the memory
> will first be added to the default zone, we must explicitly use
> online_movable to online.
> 
> Because such a node contains no normal memory, can_online_high_movable()
> will only allow us to do the onlining if CONFIG_MOVABLE_NODE is set.
> Enable the use of this config option on PPC64 platforms.

What is that business with a command line argument? Does that mean that
we'll need some magic command line argument to properly handle LPC memory
on CAPI devices or GPUs? If yes, that's bad... kernel arguments should
be a last resort.

We should have all the information we need from the device-tree.

Note also that we shouldn't need to create those nodes at boot time;
we need to add the ability to create the whole thing at runtime. We may know
that there's an NPU with an LPC window in the system, but we won't know if it's
used until it is, and for CAPI we simply don't know until some PCI device
gets turned into CAPI mode and starts claiming LPC memory...

Ben.

> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  Documentation/kernel-parameters.txt | 2 +-
>  mm/Kconfig                          | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index a4f4d69..3d8460d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2344,7 +2344,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			that the amount of memory usable for all allocations
>  			is not too small.
>  
> -	movable_node	[KNL,X86] Boot-time switch to enable the effects
> +	movable_node	[KNL,X86,PPC] Boot-time switch to enable the effects
>  			of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>  
>  	MTD_Partition=	[MTD]
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be0ee11..4b19cd3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -153,7 +153,7 @@ config MOVABLE_NODE
>  	bool "Enable to assign a node which has only movable memory"
>  	depends on HAVE_MEMBLOCK
>  	depends on NO_BOOTMEM
> -	depends on X86_64
> +	depends on X86_64 || PPC64
>  	depends on NUMA
>  	default n
>  	help


* Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node
  2016-09-26 21:12   ` Benjamin Herrenschmidt
@ 2016-09-27  0:14     ` Reza Arbab
  2016-10-04  0:48       ` Balbir Singh
  0 siblings, 1 reply; 16+ messages in thread
From: Reza Arbab @ 2016-09-27  0:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Ellerman, Paul Mackerras, Rob Herring, Frank Rowand,
	Jonathan Corbet, Andrew Morton, Bharata B Rao, Nathan Fontenot,
	Stewart Smith, Alistair Popple, Balbir Singh, Aneesh Kumar K.V,
	linux-doc, linux-kernel, linuxppc-dev, devicetree, linux-mm

On Tue, Sep 27, 2016 at 07:12:31AM +1000, Benjamin Herrenschmidt wrote:
>In any case, if the memory hasn't been hotplugged, this shouldn't be 
>necessary as we shouldn't be considering it for allocation.

Right. To be clear, the background info I put in the commit log refers 
to x86, where the SRAT can describe movable nodes which exist at boot.  
They're trying to avoid allocations from those nodes before they've been 
identified.

On power, movable nodes can only exist via hotplug, so that scenario 
can't happen. We can immediately go back to top-down allocation. That is 
the missing call being added in the patch.

-- 
Reza Arbab


* Re: [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc
  2016-09-26 21:15   ` Benjamin Herrenschmidt
@ 2016-09-27  0:19     ` Reza Arbab
  0 siblings, 0 replies; 16+ messages in thread
From: Reza Arbab @ 2016-09-27  0:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Ellerman, Paul Mackerras, Rob Herring, Frank Rowand,
	Jonathan Corbet, Andrew Morton, Bharata B Rao, Nathan Fontenot,
	Stewart Smith, Alistair Popple, Balbir Singh, Aneesh Kumar K.V,
	linux-doc, linux-kernel, linuxppc-dev, devicetree, linux-mm

On Tue, Sep 27, 2016 at 07:15:41AM +1000, Benjamin Herrenschmidt wrote:
>What is that business with a command line argument? Does that mean that
>we'll need some magic command line argument to properly handle LPC memory
>on CAPI devices or GPUs? If yes, that's bad... kernel arguments should
>be a last resort.

Well, movable_node is just a boolean, meaning "allow nodes which contain 
only movable memory". It's _not_ like "movable_node=10,13-15,17", if 
that's what you were thinking.

>We should have all the information we need from the device-tree.
>
>Note also that we shouldn't need to create those nodes at boot time;
>we need to add the ability to create the whole thing at runtime. We may know
>that there's an NPU with an LPC window in the system, but we won't know if it's
>used until it is, and for CAPI we simply don't know until some PCI device
>gets turned into CAPI mode and starts claiming LPC memory...

Yes, this is what is planned for, if I'm understanding you correctly.

In the dt, the PCI device node has a phandle pointing to the memory 
node. The memory node describes the window into which we can hotplug at 
runtime.

-- 
Reza Arbab


* Re: [PATCH v3 1/5] drivers/of: introduce of_fdt_is_available()
  2016-09-25 18:36 ` [PATCH v3 1/5] drivers/of: introduce of_fdt_is_available() Reza Arbab
@ 2016-10-03 15:28   ` Rob Herring
  0 siblings, 0 replies; 16+ messages in thread
From: Rob Herring @ 2016-10-03 15:28 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Frank Rowand, Jonathan Corbet, Andrew Morton, Bharata B Rao,
	Nathan Fontenot, Stewart Smith, Alistair Popple, Balbir Singh,
	Aneesh Kumar K.V, linux-doc, linux-kernel, linuxppc-dev,
	devicetree, linux-mm

On Sun, Sep 25, 2016 at 1:36 PM, Reza Arbab <arbab@linux.vnet.ibm.com> wrote:
> In __fdt_scan_reserved_mem(), the availability of a node is determined
> by testing its "status" property.
>
> Move this check into its own function, borrowing logic from the
> unflattened version, of_device_is_available().
>
> Another caller will be added in a subsequent patch.
>
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  drivers/of/fdt.c       | 26 +++++++++++++++++++++++---
>  include/linux/of_fdt.h |  2 ++
>  2 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 085c638..9241c6e 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -151,6 +151,23 @@ int of_fdt_match(const void *blob, unsigned long node,
>         return score;
>  }
>
> +bool of_fdt_is_available(const void *blob, unsigned long node)

of_fdt_device_is_available

[...]

> +bool __init of_flat_dt_is_available(unsigned long node)

And of_flat_dt_device_is_available

With that,

Acked-by: Rob Herring <robh@kernel.org>
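
For readers following the review, the availability rule being factored out can be sketched in plain userspace C. This is only an illustration, not the code from the patch: `struct fake_node` and `node_is_available()` are hypothetical names, and the rule shown (a node is available when "status" is absent, "okay", or "ok") is assumed to mirror the of_device_is_available() logic the commit message says it borrows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a flattened device tree node. Only the raw
 * "status" property is modeled; NULL means the property is absent.
 */
struct fake_node {
	const char *status;
};

/*
 * Assumed rule: a node is available when "status" is absent, or when it
 * is exactly "okay" or "ok". Anything else (including "disabled" or an
 * empty string) marks the node unavailable.
 */
static bool node_is_available(const struct fake_node *node)
{
	const char *status = node->status;

	if (!status)
		return true;

	return strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0;
}
```

Under this sketch, a memory node carrying status = "disabled" would be skipped at boot, which is the behaviour the later patches in the series rely on to defer its memory to hotplug.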

* Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node
  2016-09-27  0:14     ` Reza Arbab
@ 2016-10-04  0:48       ` Balbir Singh
  2016-10-04 20:23         ` Reza Arbab
  0 siblings, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2016-10-04  0:48 UTC (permalink / raw)
  To: Reza Arbab, Benjamin Herrenschmidt
  Cc: Michael Ellerman, Paul Mackerras, Rob Herring, Frank Rowand,
	Jonathan Corbet, Andrew Morton, Bharata B Rao, Nathan Fontenot,
	Stewart Smith, Alistair Popple, Aneesh Kumar K.V, linux-doc,
	linux-kernel, linuxppc-dev, devicetree, linux-mm



On 27/09/16 10:14, Reza Arbab wrote:
> On Tue, Sep 27, 2016 at 07:12:31AM +1000, Benjamin Herrenschmidt wrote:
>> In any case, if the memory hasn't been hotplugged, this shouldn't be necessary as we shouldn't be considering it for allocation.
> 
> Right. To be clear, the background info I put in the commit log refers to x86, where the SRAT can describe movable nodes which exist at boot.  They're trying to avoid allocations from those nodes before they've been identified.
> 
> On power, movable nodes can only exist via hotplug, so that scenario can't happen. We can immediately go back to top-down allocation. That is the missing call being added in the patch.
> 

Can we fix cmdline_parse_movable_node() to do the right thing? I suspect that
code is heavily x86 only in the sense that no other arch needs it.

Balbir Singh.

* Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node
  2016-10-04  0:48       ` Balbir Singh
@ 2016-10-04 20:23         ` Reza Arbab
  0 siblings, 0 replies; 16+ messages in thread
From: Reza Arbab @ 2016-10-04 20:23 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Benjamin Herrenschmidt, Michael Ellerman, Paul Mackerras,
	Rob Herring, Frank Rowand, Jonathan Corbet, Andrew Morton,
	Bharata B Rao, Nathan Fontenot, Stewart Smith, Alistair Popple,
	Aneesh Kumar K.V, linux-doc, linux-kernel, linuxppc-dev,
	devicetree, linux-mm

On Tue, Oct 04, 2016 at 11:48:30AM +1100, Balbir Singh wrote:
>On 27/09/16 10:14, Reza Arbab wrote:
>> Right. To be clear, the background info I put in the commit log 
>> refers to x86, where the SRAT can describe movable nodes which exist 
>> at boot.  They're trying to avoid allocations from those nodes before 
>> they've been identified.
>>
>> On power, movable nodes can only exist via hotplug, so that scenario 
>> can't happen. We can immediately go back to top-down allocation. That 
>> is the missing call being added in the patch.
>
>Can we fix cmdline_parse_movable_node() to do the right thing? I 
>suspect that code is heavily x86 only in the sense that no other arch 
>needs it.

Good idea. We could change it so things only go bottom-up on x86 in the 
first place.

A nice consequence is that CONFIG_MOVABLE_NODE would then basically be 
usable on any platform with memory hotplug, not just PPC64 and X86_64.

I'll see if I can move the relevant code into an arch_*() call or 
otherwise factor it out.

-- 
Reza Arbab
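
The refactoring floated above, where only architectures that need it switch to bottom-up allocation, might look roughly like the following userspace model. Every name here is hypothetical (`struct boot_state`, the hook type, and both hook functions); it does not reproduce the kernel's cmdline_parse_movable_node() and only illustrates moving the decision behind a per-arch hook.

```c
#include <stdbool.h>

/* Hypothetical model of the two pieces of boot-time state involved. */
struct boot_state {
	bool movable_node_enabled;  /* "movable_node" was given on the command line */
	bool memblock_bottom_up;    /* early allocator direction */
};

/* Hypothetical per-arch hook: does this arch need bottom-up allocation? */
typedef bool (*arch_bottom_up_fn)(void);

/*
 * Assumption from the thread: on x86, firmware can describe movable nodes
 * that already exist at boot, so early allocations must avoid them.
 */
static bool x86_like_wants_bottom_up(void)
{
	return true;
}

/*
 * Assumption from the thread: on power, movable nodes only appear via
 * hotplug, so top-down allocation can stay in effect.
 */
static bool power_like_wants_bottom_up(void)
{
	return false;
}

/* Generic handler: always enable the feature, let the arch pick the direction. */
static void parse_movable_node(struct boot_state *st, arch_bottom_up_fn hook)
{
	st->movable_node_enabled = true;
	st->memblock_bottom_up = hook();
}
```

With the direction decided by the hook, the generic handler carries no x86-specific behaviour, which is what would let the option be offered on other platforms with memory hotplug.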

end of thread, other threads:[~2016-10-04 20:23 UTC | newest]

Thread overview: 16+ messages
2016-09-25 18:36 [PATCH v3 0/5] powerpc/mm: movable hotplug memory nodes Reza Arbab
2016-09-25 18:36 ` [PATCH v3 1/5] drivers/of: introduce of_fdt_is_available() Reza Arbab
2016-10-03 15:28   ` Rob Herring
2016-09-25 18:36 ` [PATCH v3 2/5] drivers/of: do not add memory for unavailable nodes Reza Arbab
2016-09-25 18:36 ` [PATCH v3 3/5] powerpc/mm: allow memory hotplug into a memoryless node Reza Arbab
2016-09-25 18:36 ` [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node Reza Arbab
2016-09-26 15:47   ` Aneesh Kumar K.V
2016-09-26 20:48     ` Reza Arbab
2016-09-26 21:12   ` Benjamin Herrenschmidt
2016-09-27  0:14     ` Reza Arbab
2016-10-04  0:48       ` Balbir Singh
2016-10-04 20:23         ` Reza Arbab
2016-09-25 18:36 ` [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc Reza Arbab
2016-09-26 15:48   ` Aneesh Kumar K.V
2016-09-26 21:15   ` Benjamin Herrenschmidt
2016-09-27  0:19     ` Reza Arbab