* [PATCH] mm: fix movable_node kernel command-line
@ 2017-10-20 23:32 Sharath Kumar Bhat
  2017-10-23 12:52 ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-20 23:32 UTC
  To: linux-mm; +Cc: akpm

Currently, when booted with the 'movable_node' kernel command-line
parameter, the user cannot have the functionality of 'movable_node' and at
the same time specify more movable memory than the total size of the
hotpluggable memory.

This is a problem because it limits the total amount of movable memory in
the system to the total size of the hotpluggable memory, and in a given
system that total can be very small, or all hotpluggable memory could have
been offlined. The 'movable_node' parameter was aimed at providing the
entire memory of hotpluggable NUMA nodes to applications, without any
kernel allocations in them. The option is useful if those hotpluggable
nodes have special memory, such as MCDRAM on KNL, which is high-bandwidth
memory that the user would like to use entirely for applications. But in
doing so, 'movable_node' imposes this limitation and does not allow the
user to specify more movable memory in addition to the hotpluggable
memory.

With this change the existing 'movablecore=' and 'kernelcore=' command-line
parameters can be specified in addition to the 'movable_node' kernel
parameter. This allows the user to boot the kernel with an increased amount
of movable memory in the system and still have only movable memory in
hotpluggable NUMA nodes.
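
The accounting described above for 'movablecore=' combined with
'movable_node' can be sketched in user space as follows. This is a
simplified model, not the kernel code: it assumes that the hotpluggable
nodes are entirely movable, it ignores the rounding the kernel applies and
the per-node spreading loop, and the names plan_movablecore and
struct movable_plan are hypothetical.

```c
#include <assert.h>

/* Hypothetical result type for this sketch; not a kernel structure. */
struct movable_plan {
	/* ZONE_MOVABLE pages still wanted outside the hotpluggable nodes */
	unsigned long extra_movable_pages;
	/* non-movable pages left to spread over the non-hotpluggable nodes */
	unsigned long kernelcore_pages;
};

/*
 * Model of the accounting: the hotpluggable nodes already contribute
 * hotplug_pages of movable memory, so only the remainder of the request
 * has to come from the other nodes.
 */
struct movable_plan plan_movablecore(unsigned long total_pages,
				     unsigned long hotplug_pages,
				     unsigned long requested_movable)
{
	struct movable_plan plan;
	unsigned long rest;

	assert(total_pages >= hotplug_pages);
	rest = total_pages - hotplug_pages;	/* pages on non-hotpluggable nodes */

	if (requested_movable > hotplug_pages)
		plan.extra_movable_pages = requested_movable - hotplug_pages;
	else
		plan.extra_movable_pages = 0;	/* request already satisfied */

	/* Cannot take more movable memory than the other nodes have. */
	if (plan.extra_movable_pages > rest)
		plan.extra_movable_pages = rest;

	plan.kernelcore_pages = rest - plan.extra_movable_pages;
	return plan;
}
```

For instance, with 100 pages in total, 20 of them hotpluggable, and a
request for 50 movable pages, the model asks for 30 more movable pages from
the non-hotpluggable nodes and leaves 50 pages for kernel allocations.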

Ex:

Hardware  : Intel(R) Xeon Phi(TM) CPU 7250, SNC4 flat (cluster mode)
NUMA Nodes: 8
            0-3 DDR Memory (Non-hotpluggable)
            4-7 High Bandwidth Memory (Hotpluggable)

Kernel command-line parameters: kernelcore=16G movable_node

Before this patch,
----------------------------------
NUMA Node Zone    #Pages
----------------------------------
Node 0    DMA        3999
Node 0    DMA32    756023
Node 0    Normal  5505024
Node 1    Normal  6291456
Node 2    Normal  6291456
Node 3    Normal  6291456
Node 4    Movable 1048576
Node 5    Movable 1048576
Node 6    Movable 1048576
Node 7    Movable 1048576
----------------------------------
Total non-movable pages: 95.9 GB
Total movable pages    : 16.0 GB
----------------------------------

After this patch,
----------------------------------
NUMA Node Zone    #Pages
----------------------------------
Node 0    DMA        3999
Node 0    DMA32    756023
Node 0    Normal   288768
Node 0    Movable 5216256
Node 1    Normal  1048576
Node 1    Movable 5242880
Node 2    Normal  1048576
Node 2    Movable 5242880
Node 3    Normal  1048576
Node 3    Movable 5242880
Node 4    Movable 1048576
Node 5    Movable 1048576
Node 6    Movable 1048576
Node 7    Movable 1048576
----------------------------------
Total non-movable pages: 16.0 GB
Total movable pages    : 95.9 GB
----------------------------------
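
As a cross-check of the totals in the "After this patch" table, the
per-zone page counts can be summed and converted to GiB. This is a small
user-space sketch, not kernel code; it assumes 4 KiB pages, and the helper
names sum_pages and pages_to_gib are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages. */
#define ASSUMED_PAGE_SIZE 4096UL

/* Movable zone sizes after the patch: nodes 0-3 (DDR), nodes 4-7 (HBM). */
const unsigned long movable_after[] = {
	5216256, 5242880, 5242880, 5242880,
	1048576, 1048576, 1048576, 1048576,
};

/* Non-movable zone sizes after the patch: node 0 DMA, DMA32, Normal,
 * then Normal on nodes 1-3. */
const unsigned long nonmovable_after[] = {
	3999, 756023, 288768, 1048576, 1048576, 1048576,
};

/* Sum an array of per-zone page counts. */
unsigned long sum_pages(const unsigned long *pages, size_t n)
{
	unsigned long total = 0;
	size_t i;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += pages[i];
	return total;
}

/* Convert a page count to GiB (2^30 bytes). */
double pages_to_gib(unsigned long pages)
{
	return (double)pages * ASSUMED_PAGE_SIZE / (1024.0 * 1024.0 * 1024.0);
}
```

Under the 4 KiB assumption, the non-movable zones sum to 4194518 pages
(about 16.0 GiB) and the movable zones to 25139200 pages (about 95.9 GiB),
which agrees with the totals listed in the table.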

Signed-off-by: Sharath Kumar Bhat <sharath.k.bhat@linux.intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 13 +++++++++++-
 mm/page_alloc.c                                 | 28 ++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0549662..81957e8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1807,6 +1807,11 @@
 			so you can NOT specify nn[KMGTPE] and "mirror" at the same
 			time.
 
+			When nn[KMGTPE] is specified along with movable_node
+			kernel parameter then only non-movable nodes are
+			considered for spreading the requested size while the
+			movable nodes have all movable memory.
+
 	kgdbdbgp=	[KGDB,HW] kgdb over EHCI usb debug port.
 			Format: <Controller#>[,poll interval]
 			The controller # is the number of the ehci usb debug
@@ -2324,7 +2329,13 @@
 			value but may be more. If movablecore on its own
 			is specified, the administrator must be careful
 			that the amount of memory usable for all allocations
-			is not too small.
+			is not too small. If movablecore is specified along
+			with movable_node then movablecore indicates the total
+			movable memory requested in the system that includes
+			movable memory in both movable and non-movable nodes.
+			When movable_node is specified, the minimum movable
+			memory allocated will be at least the total size of
+			movable nodes memory.
 
 	movable_node	[KNL] Boot-time switch to make hotplugable memory
 			NUMA nodes to be movable. This means that the memory
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e4d3c..4a3579e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6338,20 +6338,28 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	unsigned long totalpages = early_calculate_totalpages();
 	int usable_nodes = nodes_weight(node_states[N_MEMORY]);
 	struct memblock_region *r;
+	nodemask_t movable_nodes;
+	unsigned long movable_node_pages = 0;
 
 	/* Need to find movable_zone earlier when movable_node is specified. */
 	find_usable_zone_for_movable();
 
 	/*
 	 * If movable_node is specified, ignore kernelcore and movablecore
-	 * options.
+	 * options on hotpluggable nodes.
 	 */
+	nodes_clear(movable_nodes);
 	if (movable_node_is_enabled()) {
 		for_each_memblock(memory, r) {
 			if (!memblock_is_hotpluggable(r))
 				continue;
+			if (PFN_UP(r->base) >= PFN_DOWN(r->base + r->size))
+				continue;
 
 			nid = r->nid;
+			node_set(nid, movable_nodes);
+			movable_node_pages += PFN_DOWN(r->base + r->size) -
+						PFN_UP(r->base);
 
 			usable_startpfn = PFN_DOWN(r->base);
 			zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
@@ -6359,6 +6367,14 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 				usable_startpfn;
 		}
 
+		if (required_kernelcore || required_movablecore) {
+			usable_nodes -= nodes_weight(movable_nodes);
+			if (usable_nodes > 0 &&
+			    totalpages > movable_node_pages) {
+				totalpages -= movable_node_pages;
+				goto core_options;
+			}
+		}
 		goto out2;
 	}
 
@@ -6392,6 +6408,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		goto out2;
 	}
 
+core_options:
 	/*
 	 * If movablecore=nn[KMG] was specified, calculate what size of
 	 * kernelcore that corresponds so that memory usable for
@@ -6403,6 +6420,12 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	if (required_movablecore) {
 		unsigned long corepages;
 
+		if (movable_node_is_enabled()) {
+			if (required_movablecore > movable_node_pages)
+				required_movablecore -= movable_node_pages;
+			else
+				goto out2;
+		}
 		/*
 		 * Round-up so that ZONE_MOVABLE is at least as large as what
 		 * was requested by the user
@@ -6431,6 +6454,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	for_each_node_state(nid, N_MEMORY) {
 		unsigned long start_pfn, end_pfn;
 
+		/* Skip movable nodes if any */
+		if (node_isset(nid, movable_nodes))
+			continue;
 		/*
 		 * Recalculate kernelcore_node if the division per node
 		 * now exceeds what is necessary to satisfy the requested
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-20 23:32 [PATCH] mm: fix movable_node kernel command-line Sharath Kumar Bhat
@ 2017-10-23 12:52 ` Michal Hocko
  2017-10-23 16:03   ` Sharath Kumar Bhat
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-10-23 12:52 UTC
  To: Sharath Kumar Bhat; +Cc: linux-mm, akpm

On Fri 20-10-17 16:32:09, Sharath Kumar Bhat wrote:
> Currently when booted with the 'movable_node' kernel command-line the user
> can not have both the functionality of 'movable_node' and at the same time
> specify more movable memory than the total size of hotpluggable memories.
> 
> This is a problem because it limits the total amount of movable memory in
> the system to the total size of hotpluggable memories and in a system the
> total size of hotpluggable memories can be very small or all hotpluggable
> memories could have been offlined. The 'movable_node' parameter was aimed
> to provide the entire memory of hotpluggable NUMA nodes to applications
> without any kernel allocations in them. The 'movable_node' option will be
> useful if those hotpluggable nodes have special memory like MCDRAM as in
> KNL which is a high bandwidth memory and the user would like to use all of
> it for applications. But in doing so the 'movable_node' command-line poses
> this limitation and does not allow the user to specify more movable memory
> in addition to the hotpluggable memories.
> 
> With this change the existing 'movablecore=' and 'kernelcore=' command-line
> parameters can be specified in addition to the 'movable_node' kernel
> parameter. This allows the user to boot the kernel with an increased amount
> of movable memory in the system and still have only movable memory in
> hotpluggable NUMA nodes.

I really detest making the already cluttered kernelcore* handling even
more so. Why can't your MCDRAM simply announce itself as hotpluggable?
Also, it is not really clear to me how you can control that only your
specific memory type gets into the movable zone.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 12:52 ` Michal Hocko
@ 2017-10-23 16:03   ` Sharath Kumar Bhat
  2017-10-23 16:15     ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-23 16:03 UTC
  To: Michal Hocko; +Cc: Sharath Kumar Bhat, linux-mm, akpm

On Mon, Oct 23, 2017 at 02:52:13PM +0200, Michal Hocko wrote:
> On Fri 20-10-17 16:32:09, Sharath Kumar Bhat wrote:
> > Currently when booted with the 'movable_node' kernel command-line the user
> > can not have both the functionality of 'movable_node' and at the same time
> > specify more movable memory than the total size of hotpluggable memories.
> > 
> > This is a problem because it limits the total amount of movable memory in
> > the system to the total size of hotpluggable memories and in a system the
> > total size of hotpluggable memories can be very small or all hotpluggable
> > memories could have been offlined. The 'movable_node' parameter was aimed
> > to provide the entire memory of hotpluggable NUMA nodes to applications
> > without any kernel allocations in them. The 'movable_node' option will be
> > useful if those hotpluggable nodes have special memory like MCDRAM as in
> > KNL which is a high bandwidth memory and the user would like to use all of
> > it for applications. But in doing so the 'movable_node' command-line poses
> > this limitation and does not allow the user to specify more movable memory
> > in addition to the hotpluggable memories.
> > 
> > With this change the existing 'movablecore=' and 'kernelcore=' command-line
> > parameters can be specified in addition to the 'movable_node' kernel
> > parameter. This allows the user to boot the kernel with an increased amount
> > of movable memory in the system and still have only movable memory in
> > hotpluggable NUMA nodes.
> 
> I really detest making the already cluttered kernelcore* handling even
> more so. Why cannot your MCDRAM simply announce itself as hotplugable?
> Also it is not really clear to me how can you control that only your
> specific memory type gets into movable zone.
> -- 
> Michal Hocko
> SUSE Labs

In the example, MCDRAM is already announced as hotpluggable, and
'movable_node' is also used to ensure that there are no kernel allocations
in it. This is required functionality, but when it is used the user cannot
have a movable zone in other, non-hotpluggable memory in addition to the
hotpluggable memory.

This change won't affect any of the present use cases, such as
'kernelcore=' or 'movablecore=' alone, or using only 'movable_node'. They
continue to work as before.

In addition to those, it lets the admin specify 'kernelcore=' or
'movablecore=' when using the 'movable_node' command-line parameter.


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 16:03   ` Sharath Kumar Bhat
@ 2017-10-23 16:15     ` Michal Hocko
  2017-10-23 17:14       ` Sharath Kumar Bhat
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-10-23 16:15 UTC
  To: Sharath Kumar Bhat; +Cc: linux-mm, akpm

On Mon 23-10-17 09:03:14, Sharath Kumar Bhat wrote:
> On Mon, Oct 23, 2017 at 02:52:13PM +0200, Michal Hocko wrote:
> > On Fri 20-10-17 16:32:09, Sharath Kumar Bhat wrote:
> > > Currently when booted with the 'movable_node' kernel command-line the user
> > > can not have both the functionality of 'movable_node' and at the same time
> > > specify more movable memory than the total size of hotpluggable memories.
> > > 
> > > This is a problem because it limits the total amount of movable memory in
> > > the system to the total size of hotpluggable memories and in a system the
> > > total size of hotpluggable memories can be very small or all hotpluggable
> > > memories could have been offlined. The 'movable_node' parameter was aimed
> > > to provide the entire memory of hotpluggable NUMA nodes to applications
> > > without any kernel allocations in them. The 'movable_node' option will be
> > > useful if those hotpluggable nodes have special memory like MCDRAM as in
> > > KNL which is a high bandwidth memory and the user would like to use all of
> > > it for applications. But in doing so the 'movable_node' command-line poses
> > > this limitation and does not allow the user to specify more movable memory
> > > in addition to the hotpluggable memories.
> > > 
> > > With this change the existing 'movablecore=' and 'kernelcore=' command-line
> > > parameters can be specified in addition to the 'movable_node' kernel
> > > parameter. This allows the user to boot the kernel with an increased amount
> > > of movable memory in the system and still have only movable memory in
> > > hotpluggable NUMA nodes.
> > 
> > I really detest making the already cluttered kernelcore* handling even
> > more so. Why cannot your MCDRAM simply announce itself as hotplugable?
> > Also it is not really clear to me how can you control that only your
> > specific memory type gets into movable zone.
> > -- 
> > Michal Hocko
> > SUSE Labs
> 
> In the example MCDRAM is already being announced as hotpluggable and
> 'movable_node' is also used to ensure that there is no kernel allocations
> in that. This is a required functionality but when done so user can not have
> movable zone in other non-hotpluggable memories in addition to hotpluggable
> memory.
> 
> This change wont affect any of the present use cases such as 'kernelcore='
> or 'movablecore=' or using only 'movable_node'. They continue to work as
> before.
> 
> In addition to those it lets admin to specify 'kernelcore=' or
> 'movablecore=' when using 'movable_node' command-line

So, why exactly do we need this functionality? kernelcore is an ugly
interface, and I am not entirely thrilled about extending it even more.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 16:15     ` Michal Hocko
@ 2017-10-23 17:14       ` Sharath Kumar Bhat
  2017-10-23 17:20         ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-23 17:14 UTC
  To: Michal Hocko; +Cc: Sharath Kumar Bhat, linux-mm, akpm

On Mon, Oct 23, 2017 at 06:15:54PM +0200, Michal Hocko wrote:
> On Mon 23-10-17 09:03:14, Sharath Kumar Bhat wrote:
> > On Mon, Oct 23, 2017 at 02:52:13PM +0200, Michal Hocko wrote:
> > > On Fri 20-10-17 16:32:09, Sharath Kumar Bhat wrote:
> > > > Currently when booted with the 'movable_node' kernel command-line the user
> > > > can not have both the functionality of 'movable_node' and at the same time
> > > > specify more movable memory than the total size of hotpluggable memories.
> > > > 
> > > > This is a problem because it limits the total amount of movable memory in
> > > > the system to the total size of hotpluggable memories and in a system the
> > > > total size of hotpluggable memories can be very small or all hotpluggable
> > > > memories could have been offlined. The 'movable_node' parameter was aimed
> > > > to provide the entire memory of hotpluggable NUMA nodes to applications
> > > > without any kernel allocations in them. The 'movable_node' option will be
> > > > useful if those hotpluggable nodes have special memory like MCDRAM as in
> > > > KNL which is a high bandwidth memory and the user would like to use all of
> > > > it for applications. But in doing so the 'movable_node' command-line poses
> > > > this limitation and does not allow the user to specify more movable memory
> > > > in addition to the hotpluggable memories.
> > > > 
> > > > With this change the existing 'movablecore=' and 'kernelcore=' command-line
> > > > parameters can be specified in addition to the 'movable_node' kernel
> > > > parameter. This allows the user to boot the kernel with an increased amount
> > > > of movable memory in the system and still have only movable memory in
> > > > hotpluggable NUMA nodes.
> > > 
> > > I really detest making the already cluttered kernelcore* handling even
> > > more so. Why cannot your MCDRAM simply announce itself as hotplugable?
> > > Also it is not really clear to me how can you control that only your
> > > specific memory type gets into movable zone.
> > > -- 
> > > Michal Hocko
> > > SUSE Labs
> > 
> > In the example MCDRAM is already being announced as hotpluggable and
> > 'movable_node' is also used to ensure that there is no kernel allocations
> > in that. This is a required functionality but when done so user can not have
> > movable zone in other non-hotpluggable memories in addition to hotpluggable
> > memory.
> > 
> > This change wont affect any of the present use cases such as 'kernelcore='
> > or 'movablecore=' or using only 'movable_node'. They continue to work as
> > before.
> > 
> > In addition to those it lets admin to specify 'kernelcore=' or
> > 'movablecore=' when using 'movable_node' command-line
> 
> So, why exactly do we need this functionality? kernelcore is an ugly
> interface, I am not entirely thrilled into extending it even more.
> 
> -- 
> Michal Hocko
> SUSE Labs

This lets the admin configure the kernel to have more movable memory than
the size of the hotpluggable memory, while the hotpluggable nodes still
have only movable memory. This is useful because it gives the user more
movable memory in the system that can be offlined/onlined. When the same
hardware is shared between two OSes, this helps to dynamically provision
the physical memory between them by offlining/onlining as and when the
application/user needs change.


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 17:14       ` Sharath Kumar Bhat
@ 2017-10-23 17:20         ` Michal Hocko
  2017-10-23 17:35           ` Sharath Kumar Bhat
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-10-23 17:20 UTC
  To: Sharath Kumar Bhat; +Cc: linux-mm, akpm

On Mon 23-10-17 10:14:35, Sharath Kumar Bhat wrote:
[...]
> This lets admin to configure the kernel to have movable memory > size of
> hotpluggable memories and at the same time hotpluggable nodes have only
> movable memory.

Putting aside that I believe that having too much movable memory is
dangerous and people are not very prepared for that fact, what is the
specific use case? Allowing users something is nice, but as I've said the
interface is ugly already and putting more on top is not very desirable.

> This is useful because it lets user to have more movable
> memory in the system that can be offlined/onlined. When the same hardware
> is shared between two OS's then this helps to dynamically provision the
> physical memory between them by offlining/onlining as and when the
> application/user need changes.

Just use hotpluggable memory for that purpose. The latest memory hotplug
code allows you to online memory into a kernel or movable zone as per
admin policy without the previously hardcoded zone ordering. So I really
fail to see why to muck with the command line parameter at all.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 17:20         ` Michal Hocko
@ 2017-10-23 17:35           ` Sharath Kumar Bhat
  2017-10-23 17:49             ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-23 17:35 UTC
  To: Michal Hocko; +Cc: Sharath Kumar Bhat, linux-mm, akpm

On Mon, Oct 23, 2017 at 07:20:08PM +0200, Michal Hocko wrote:
> On Mon 23-10-17 10:14:35, Sharath Kumar Bhat wrote:
> [...]
> > This lets admin to configure the kernel to have movable memory > size of
> > hotpluggable memories and at the same time hotpluggable nodes have only
> > movable memory.
> 
> Put aside that I believe that having too much of movable memory is
> dangerous and people are not very prepared for that fact, what is the
> specific usecase. Allowing users something is nice but as I've said the
> interface is ugly already and putting more on top is not very desirable.
> 
> > This is useful because it lets user to have more movable
> > memory in the system that can be offlined/onlined. When the same hardware
> > is shared between two OS's then this helps to dynamically provision the
> > physical memory between them by offlining/onlining as and when the
> > application/user need changes.
> 
> just use hotplugable memory for that purpose. The latest memory hotplug
> code allows you to online memory into a kernel or movable zone as per
> admin policy without the previously hardcoded zone ordering. So I really
> fail to see why to mock with the command line parameter at all.

Yes, but it won't let us offline the memory blocks if they are already
in use by kernel allocations. This becomes more likely over a long period
of uptime. The command-line option ensures that the memory blocks stay
movable at all times, as reserved by the admin at boot.


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 17:35           ` Sharath Kumar Bhat
@ 2017-10-23 17:49             ` Michal Hocko
  2017-10-23 18:48               ` Sharath Kumar Bhat
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-10-23 17:49 UTC
  To: Sharath Kumar Bhat; +Cc: linux-mm, akpm

On Mon 23-10-17 10:35:44, Sharath Kumar Bhat wrote:
> On Mon, Oct 23, 2017 at 07:20:08PM +0200, Michal Hocko wrote:
> > On Mon 23-10-17 10:14:35, Sharath Kumar Bhat wrote:
> > [...]
> > > This lets admin to configure the kernel to have movable memory > size of
> > > hotpluggable memories and at the same time hotpluggable nodes have only
> > > movable memory.
> > 
> > Put aside that I believe that having too much of movable memory is
> > dangerous and people are not very prepared for that fact, what is the
> > specific usecase. Allowing users something is nice but as I've said the
> > interface is ugly already and putting more on top is not very desirable.
> > 
> > > This is useful because it lets user to have more movable
> > > memory in the system that can be offlined/onlined. When the same hardware
> > > is shared between two OS's then this helps to dynamically provision the
> > > physical memory between them by offlining/onlining as and when the
> > > application/user need changes.
> > 
> > just use hotplugable memory for that purpose. The latest memory hotplug
> > code allows you to online memory into a kernel or movable zone as per
> > admin policy without the previously hardcoded zone ordering. So I really
> > fail to see why to mock with the command line parameter at all.
> 
> Yes, but it won't let us offline the memory blocks if they are already
> in use by kernel allocations. This is more likely over a long period of
> uptime. The command-line ensures that the memory blocks are movable all
> the time as reserved by the admin from the boot.

I am really confused about your use case then. Why do you want to make
non-hotpluggable memory movable?

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 17:49             ` Michal Hocko
@ 2017-10-23 18:48               ` Sharath Kumar Bhat
  2017-10-23 19:04                 ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-23 18:48 UTC
  To: Michal Hocko; +Cc: Sharath Kumar Bhat, linux-mm, akpm

On Mon, Oct 23, 2017 at 07:49:05PM +0200, Michal Hocko wrote:
> On Mon 23-10-17 10:35:44, Sharath Kumar Bhat wrote:
> > On Mon, Oct 23, 2017 at 07:20:08PM +0200, Michal Hocko wrote:
> > > On Mon 23-10-17 10:14:35, Sharath Kumar Bhat wrote:
> > > [...]
> > > > This lets admin to configure the kernel to have movable memory > size of
> > > > hotpluggable memories and at the same time hotpluggable nodes have only
> > > > movable memory.
> > > 
> > > Put aside that I believe that having too much of movable memory is
> > > dangerous and people are not very prepared for that fact, what is the
> > > specific usecase. Allowing users something is nice but as I've said the
> > > interface is ugly already and putting more on top is not very desirable.
> > > 
> > > > This is useful because it lets user to have more movable
> > > > memory in the system that can be offlined/onlined. When the same hardware
> > > > is shared between two OS's then this helps to dynamically provision the
> > > > physical memory between them by offlining/onlining as and when the
> > > > application/user need changes.
> > > 
> > > just use hotplugable memory for that purpose. The latest memory hotplug
> > > code allows you to online memory into a kernel or movable zone as per
> > > admin policy without the previously hardcoded zone ordering. So I really
> > > fail to see why to mock with the command line parameter at all.
> > 
> > Yes, but it won't let us offline the memory blocks if they are already
> > in use by kernel allocations. This is more likely over a long period of
> > uptime. The command-line ensures that the memory blocks are movable all
> > the time as reserved by the admin from the boot.
> 
> I am really confused about your usecase then. Why do you want to make
> non-hotplugable memory to be movable then?

Let's say,

The required total memory in the system which can be dynamically
offlined/onlined, T = M + N

M = movable memory in non-hotpluggable memory (say DDR in the example)
N = movable memory in hotpluggable memory (say MCDRAM in the example)

a. We need the entire hotpluggable memory (N) to be movable. Say this is
   16GB (MCDRAM) in KNL.

b. Additionally we need guaranteed movable memory M, so that > 16GB (in this
   case) can be dynamically provisioned between two OSes.

There is the 'movable_node' command-line parameter to accomplish a. But the
problem is that this makes all other, non-hotpluggable nodes zone normal,
and over a period of time there is no guarantee that we could get 'M'
movable memory to dynamically provision.


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 18:48               ` Sharath Kumar Bhat
@ 2017-10-23 19:04                 ` Michal Hocko
  2017-10-23 19:25                   ` Sharath Kumar Bhat
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-10-23 19:04 UTC
  To: Sharath Kumar Bhat; +Cc: linux-mm, akpm

On Mon 23-10-17 11:48:52, Sharath Kumar Bhat wrote:
> On Mon, Oct 23, 2017 at 07:49:05PM +0200, Michal Hocko wrote:
[...]
> > I am really confused about your usecase then. Why do you want to make
> > non-hotplugable memory to be movable then?
> 
> Lets say,
> 
> The required total memory in the system which can be dynamically
> offlined/onlined, T = M + N
> 
> M = movable memory in non-hotpluggable memory (say DDR in the example)

Why do you need this memory to be on/offlineable if you cannot hotplug
it?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 19:04                 ` Michal Hocko
@ 2017-10-23 19:25                   ` Sharath Kumar Bhat
  2017-10-23 19:35                     ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-23 19:25 UTC
  To: Michal Hocko; +Cc: Sharath Kumar Bhat, linux-mm, akpm

On Mon, Oct 23, 2017 at 09:04:59PM +0200, Michal Hocko wrote:
> On Mon 23-10-17 11:48:52, Sharath Kumar Bhat wrote:
> > On Mon, Oct 23, 2017 at 07:49:05PM +0200, Michal Hocko wrote:
> [...]
> > > I am really confused about your usecase then. Why do you want to make
> > > non-hotplugable memory to be movable then?
> > 
> > Lets say,
> > 
> > The required total memory in the system which can be dynamically
> > offlined/onlined, T = M + N
> > 
> > M = movable memory in non-hotpluggable memory (say DDR in the example)
> 
> Why do you need this memory to be on/offlineable if you cannot hotplug
> it?

We do not need the memory to be physically hot added/removed. Instead we
just want it to be logically offlined so that these memory blocks are
no longer used by the OS which has offlined them and can be used by the
second OS. Once the second OS is done using the memory for a certain use
case, the memory can be returned by onlining it.
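
As a rough illustration of the logical offlining described above, here is a
minimal Python sketch that writes to the memory block `state` files under
`/sys/devices/system/memory`. The helper names and the `sysfs_root` parameter
are hypothetical (the parameter exists only so the sketch can be tried against
a scratch directory). On a real system this needs root, and an offline request
can fail if the block holds pages that cannot be migrated.

```python
from pathlib import Path

# Default location of the memory block directories on Linux.
SYSFS_MEMORY = Path("/sys/devices/system/memory")

# States the kernel accepts when writing to a block's 'state' file.
VALID_STATES = ("online", "offline", "online_movable", "online_kernel")


def block_state_path(block: int, sysfs_root: Path = SYSFS_MEMORY) -> Path:
    """Return the path of the 'state' file for memory block `block`."""
    return sysfs_root / f"memory{block}" / "state"


def set_block_state(block: int, state: str,
                    sysfs_root: Path = SYSFS_MEMORY) -> None:
    """Request a state change for one memory block.

    'offline' asks the kernel to migrate pages away and stop using the
    block; 'online' (or one of its variants) hands it back to the kernel.
    """
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state}")
    block_state_path(block, sysfs_root).write_text(state)


def get_block_state(block: int, sysfs_root: Path = SYSFS_MEMORY) -> str:
    """Read back the current state of a memory block."""
    return block_state_path(block, sysfs_root).read_text().strip()
```

A block that lies entirely in ZONE_MOVABLE is the case where the offline
request is expected to succeed, which is why the amount of movable memory
matters for this use case.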

* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 19:25                   ` Sharath Kumar Bhat
@ 2017-10-23 19:35                     ` Michal Hocko
  2017-10-23 19:56                       ` Sharath Kumar Bhat
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-10-23 19:35 UTC (permalink / raw)
  To: Sharath Kumar Bhat; +Cc: linux-mm, akpm

On Mon 23-10-17 12:25:24, Sharath Kumar Bhat wrote:
> On Mon, Oct 23, 2017 at 09:04:59PM +0200, Michal Hocko wrote:
> > On Mon 23-10-17 11:48:52, Sharath Kumar Bhat wrote:
> > > On Mon, Oct 23, 2017 at 07:49:05PM +0200, Michal Hocko wrote:
> > [...]
> > > > I am really confused about your usecase then. Why do you want to make
> > > > non-hotplugable memory to be movable then?
> > > 
> > > Lets say,
> > > 
> > > The required total memory in the system which can be dynamically
> > > offlined/onlined, T = M + N
> > > 
> > > M = movable memory in non-hotpluggable memory (say DDR in the example)
> > 
> > Why do you need this memory to be on/offlineable if you cannot hotplug
> > it?
> 
> We do not need the memory to be physically hot added/removed. Instead we
> just want it to be logically offlined so that these memory blocks are
> no longer used by the OS which has offlined it and can be used by the
> second OS. Once it is done using the memory for a certain use case it
> can be returned back by onlining it.

I am sorry for being dense here, but why can't you mark that memory
hotpluggable? I assume you are in control of the memory attributes
presented to the guest.

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 19:35                     ` Michal Hocko
@ 2017-10-23 19:56                       ` Sharath Kumar Bhat
  2017-10-23 21:52                         ` Dave Hansen
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-23 19:56 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sharath Kumar Bhat, linux-mm, akpm

On Mon, Oct 23, 2017 at 09:35:36PM +0200, Michal Hocko wrote:
> On Mon 23-10-17 12:25:24, Sharath Kumar Bhat wrote:
> > On Mon, Oct 23, 2017 at 09:04:59PM +0200, Michal Hocko wrote:
> > > On Mon 23-10-17 11:48:52, Sharath Kumar Bhat wrote:
> > > > On Mon, Oct 23, 2017 at 07:49:05PM +0200, Michal Hocko wrote:
> > > [...]
> > > > > I am really confused about your usecase then. Why do you want to make
> > > > > non-hotplugable memory to be movable then?
> > > > 
> > > > Lets say,
> > > > 
> > > > The required total memory in the system which can be dynamically
> > > > offlined/onlined, T = M + N
> > > > 
> > > > M = movable memory in non-hotpluggable memory (say DDR in the example)
> > > 
> > > Why do you need this memory to be on/offlineable if you cannot hotplug
> > > it?
> > 
> > We do not need the memory to be physically hot added/removed. Instead we
> > just want it to be logically offlined so that these memory blocks are
> > no longer used by the OS which has offlined it and can be used by the
> > second OS. Once it is done using the memory for a certain use case it
> > can be returned back by onlining it.
> 
> I am sorry for being dense here but why cannot you mark that memory
> hotplugable? I assume you are under the control to set attributes of the
> memory to the guest.

When I said two OSes I meant a multi-kernel environment sharing the same
hardware, not VMs. So we have no control over marking the memory
hotpluggable; that is done by the BIOS through SRAT.

* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 19:56                       ` Sharath Kumar Bhat
@ 2017-10-23 21:52                         ` Dave Hansen
  2017-10-24  1:06                           ` Sharath Kumar Bhat
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2017-10-23 21:52 UTC (permalink / raw)
  To: sharath.k.bhat, Michal Hocko; +Cc: linux-mm, akpm

On 10/23/2017 12:56 PM, Sharath Kumar Bhat wrote:
>> I am sorry for being dense here but why cannot you mark that memory
>> hotplugable? I assume you are under the control to set attributes of the
>> memory to the guest.
> When I said two OS's I meant multi-kernel environment sharing the same
> hardware and not VMs. So we do not have the control to mark the memory
> hotpluggable as done by BIOS through SRAT.

If you are going as far as to pass in custom kernel command-line
arguments, there's a bunch of other fun stuff you can do.  ACPI table
overrides come to mind.

> This facility can be used by platform/BIOS vendors to provide a Linux
> compatible environment without modifying the underlying platform firmware.

https://www.kernel.org/doc/Documentation/acpi/initrd_table_override.txt

* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-23 21:52                         ` Dave Hansen
@ 2017-10-24  1:06                           ` Sharath Kumar Bhat
  2017-10-24  7:19                             ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-24  1:06 UTC (permalink / raw)
  To: Dave Hansen; +Cc: sharath.k.bhat, Michal Hocko, linux-mm, akpm

On Mon, Oct 23, 2017 at 02:52:04PM -0700, Dave Hansen wrote:
> On 10/23/2017 12:56 PM, Sharath Kumar Bhat wrote:
> >> I am sorry for being dense here but why cannot you mark that memory
> >> hotplugable? I assume you are under the control to set attributes of the
> >> memory to the guest.
> > When I said two OS's I meant multi-kernel environment sharing the same
> > hardware and not VMs. So we do not have the control to mark the memory
> > hotpluggable as done by BIOS through SRAT.
> 
> If you are going as far as to pass in custom kernel command-line
> arguments, there's a bunch of other fun stuff you can do.  ACPI table
> overrides come to mind.
> 
> > This facility can be used by platform/BIOS vendors to provide a Linux
> > compatible environment without modifying the underlying platform firmware.
> 
> https://www.kernel.org/doc/Documentation/acpi/initrd_table_override.txt

I think an ACPI table override won't be a generic solution to this problem;
instead it would be a platform/architecture-dependent solution which may not
be flexible for users on different architectures. Moreover,
'movable_node' is implemented with the assumption that the entire
hotpluggable memory is provided as movable zone. This ACPI override would go
against that assumption. An ACPI override would also introduce additional
topology changes. Again, this would have to change every time the total
movable memory requirement changes, and the whole system and apps would have
to be re-tuned (for job launch, e.g. numactl) to comprehend this change.

* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-24  1:06                           ` Sharath Kumar Bhat
@ 2017-10-24  7:19                             ` Michal Hocko
  2017-10-25  0:53                               ` Sharath Kumar Bhat
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-10-24  7:19 UTC (permalink / raw)
  To: Sharath Kumar Bhat; +Cc: Dave Hansen, linux-mm, akpm

On Mon 23-10-17 18:06:33, Sharath Kumar Bhat wrote:
> On Mon, Oct 23, 2017 at 02:52:04PM -0700, Dave Hansen wrote:
> > On 10/23/2017 12:56 PM, Sharath Kumar Bhat wrote:
> > >> I am sorry for being dense here but why cannot you mark that memory
> > >> hotplugable? I assume you are under the control to set attributes of the
> > >> memory to the guest.
> > > When I said two OS's I meant multi-kernel environment sharing the same
> > > hardware and not VMs. So we do not have the control to mark the memory
> > > hotpluggable as done by BIOS through SRAT.
> > 
> > If you are going as far as to pass in custom kernel command-line
> > arguments, there's a bunch of other fun stuff you can do.  ACPI table
> > overrides come to mind.

absolutely agreed!

> > > This facility can be used by platform/BIOS vendors to provide a Linux
> > > compatible environment without modifying the underlying platform firmware.
> > 
> > https://www.kernel.org/doc/Documentation/acpi/initrd_table_override.txt
> 
> I think ACPI table override won't be a generic solution to this problem and
> instead would be a platform/architecture dependent solution which may not
> be flexible for the users on different architectures.

Do you have any specific architecture in mind?

> And moreover
> 'movable_node' is implemented with an assumption to provide the entire
> hotpluggable memory as movable zone. This ACPI override would be against
> that assumption.

This is true and in fact movable_node should become movable_memory over
time and only ranges marked as movable would become really movable. This
is a rather non-trivial change to do and there is not a great demand for
the feature so it is low on my TODO list.

> Also ACPI override would introduce additional topology
> changes. Again this would have to change every time the total movable
> memory requirement changes and the whole system and apps have to be
> re-tuned (for job launch ex: numactl etc) to comprehend this change.

This is something you have to do anyway when the topology of the system
changes each boot.

That being said, I would really prefer to actually _remove_ the kernelcore
parameter altogether. It is messy (just look at find_zone_movable_pfns_for_nodes
et al.) and the original usecase it was added for [1] does not hold
anymore. Adding more stuff to work around issues which can be handled
more cleanly is definitely not the right way to go.

[1] note that ZONE_MOVABLE was originally added to help
fragmentation avoidance.
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-24  7:19                             ` Michal Hocko
@ 2017-10-25  0:53                               ` Sharath Kumar Bhat
  2017-10-25  6:38                                 ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-25  0:53 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sharath Kumar Bhat, Dave Hansen, linux-mm, akpm

On Tue, Oct 24, 2017 at 09:19:06AM +0200, Michal Hocko wrote:
> On Mon 23-10-17 18:06:33, Sharath Kumar Bhat wrote:
> > On Mon, Oct 23, 2017 at 02:52:04PM -0700, Dave Hansen wrote:
> > > On 10/23/2017 12:56 PM, Sharath Kumar Bhat wrote:
> > > >> I am sorry for being dense here but why cannot you mark that memory
> > > >> hotplugable? I assume you are under the control to set attributes of the
> > > >> memory to the guest.
> > > > When I said two OS's I meant multi-kernel environment sharing the same
> > > > hardware and not VMs. So we do not have the control to mark the memory
> > > > hotpluggable as done by BIOS through SRAT.
> > > 
> > > If you are going as far as to pass in custom kernel command-line
> > > arguments, there's a bunch of other fun stuff you can do.  ACPI table
> > > overrides come to mind.
> 
> absolutely agreed!
> 
> > > > This facility can be used by platform/BIOS vendors to provide a Linux
> > > > compatible environment without modifying the underlying platform firmware.
> > > 
> > > https://www.kernel.org/doc/Documentation/acpi/initrd_table_override.txt
> > 
> > I think ACPI table override won't be a generic solution to this problem and
> > instead would be a platform/architecture dependent solution which may not
> > be flexible for the users on different architectures.
> 
> Do you have any specific architecture in mind?

There are no restrictions on the architectures we can run on, though we
are currently testing on KNL and Xeon.

> 
> > And moreover
> > 'movable_node' is implemented with an assumption to provide the entire
> > hotpluggable memory as movable zone. This ACPI override would be against
> > that assumption.
> 
> This is true and in fact movable_node should become movable_memory over
> time and only ranges marked as movable would become really movable. This
> is a rather non-trivial change to do and there is not a great demand for
> the feature so it is low on my TODO list.

Do you mean to have a single kernel command-line parameter 'movable_memory='
for this purpose and remove all other kernel command-line parameters such as
'kernelcore=', 'movablecore=' and 'movable_node'? I ask because after the
kernel boots up we cannot guarantee that a contiguous memory range can be
made zone movable, since kernel allocations could already exist in it.

> 
> > Also ACPI override would introduce additional topology
> > changes. Again this would have to change every time the total movable
> > memory requirement changes and the whole system and apps have to be
> > re-tuned (for job launch ex: numactl etc) to comprehend this change.
> 
> This is something you have to do anyway when the topology of the system
> changes each boot.

No, this is manual tuning of job launch, mem policy handling code, etc.,
which would be done once for a platform. But in this case the amount of
movable memory will change based on the application's needs, so it is really
unfair to ask users to rework their job launch and apps for every such
change.

> 
> That being said, I would really prefer to actually _remove_ the kernelcore
> parameter altogether. It is messy (just look at find_zone_movable_pfns_for_nodes
> et al.) and the original usecase it was added for [1] does not hold
> anymore. Adding more stuff to work around issues which can be handled
> more cleanly is definitely not the right way to go.

I agree that kernelcore handling is non-trivial in that function. But the
changes introduced by this patch are under the 'movable_node' case handling in
find_zone_movable_pfns_for_nodes() and do not cause any change to the
existing kernelcore behavior of the code. Also this enables all
multi-kernel users to make use of this functionality until a
new interface is available for the same purpose.

> 
> [1] note that ZONE_MOVABLE was originally added to help
> fragmentation avoidance.

Isn't this true even now, since ZONE_MOVABLE will populate only the
MIGRATE_MOVABLE free list of pages, while other zones could have
MIGRATE_UNMOVABLE pages?

* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-25  0:53                               ` Sharath Kumar Bhat
@ 2017-10-25  6:38                                 ` Michal Hocko
  2017-10-25 22:01                                   ` Sharath Kumar Bhat
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-10-25  6:38 UTC (permalink / raw)
  To: Sharath Kumar Bhat; +Cc: Dave Hansen, linux-mm, akpm

On Tue 24-10-17 17:53:14, Sharath Kumar Bhat wrote:
> On Tue, Oct 24, 2017 at 09:19:06AM +0200, Michal Hocko wrote:
> > On Mon 23-10-17 18:06:33, Sharath Kumar Bhat wrote:
[...]
> > > And moreover
> > > 'movable_node' is implemented with an assumption to provide the entire
> > > hotpluggable memory as movable zone. This ACPI override would be against
> > > that assumption.
> > 
> > This is true and in fact movable_node should become movable_memory over
> > time and only ranges marked as movable would become really movable. This
> > is a rather non-trivial change to do and there is not a great demand for
> > the feature so it is low on my TODO list.
> 
> Do you mean to have a single kernel command-line 'movable_memory=' for this
> purpose and remove all other kernel command-line parameters such as
> 'kernelcore=', 'movablecore=' and 'movable_node'?

yes.

> because after the kernel
> boots up we can not guarantee that a contig memory range can be made zone
> movable since any kernel allocations could pre-exist.

No, I meant that the zone association would be done _only_ based on
memory attributes exported by ACPI or whatever is used to configure
memory ranges on the particular platform. So this would be early init code.

> > > Also ACPI override would introduce additional topology
> > > changes. Again this would have to change every time the total movable
> > > memory requirement changes and the whole system and apps have to be
> > > re-tuned (for job launch ex: numactl etc) to comprehend this change.
> > 
> > This is something you have to do anyway when the topology of the system
> > changes each boot.
> 
> No, this is a manual tuning for job-launch, mem policy handling code etc.
> which would be done once for a platform. But in this case based on the
> application need the amount of movable memory will change so it is really
> unfair to ask user to re-work their job launch and apps for every such
> changes.

I am still confused. Why does the application even care about
movability?
 
> > That being said, I would really prefer to actually _remove_ the kernelcore
> > parameter altogether. It is messy (just look at find_zone_movable_pfns_for_nodes
> > et al.) and the original usecase it was added for [1] does not hold
> > anymore. Adding more stuff to work around issues which can be handled
> > more cleanly is definitely not the right way to go.
> 
> I agree that kernelcore handling is non-trivial in that function. But the
> changes introduced by this patch are under 'movable_node' case handling in
> find_zone_movable_pfns_for_nodes() and it does not cause any change to the
> existing kernelcore behavior of the code. Also this enables all
> multi-kernel users to make use of this functionality until later when
> new interface would be available for the same purpose.

The point is to not build on top and rather get rid of it completely.
 
> > [1] note that ZONE_MOVABLE was originally added to help
> > fragmentation avoidance.
> 
> Isn't this true even now since ZONE_MOVABLE will populate only
> MIGRATE_MOVABLE free list of pages? and other zones could have
> MIGRATE_UNMOVABLE pages?

My point was that the original motivation is gone because our compaction
code doesn't really depend on movable zone. So the movable zone is more
about making sure that the specific memory is migratable and so
offlineable.
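
To see how memory is actually split between zones on a running system, one
option is to read `/proc/zoneinfo`. The following is a minimal sketch that
extracts the `present` page count per node and zone from text in that format.
The function names are hypothetical, and the parser relies only on the
"Node N, zone NAME" header lines and the "present" lines.

```python
import re
from typing import Dict, Tuple

# "Node 0, zone   Normal" starts the section for one zone of one node.
_ZONE_RE = re.compile(r"^Node (\d+), zone\s+(\S+)")
# "        present  4096" gives the pages physically present in that zone.
_PRESENT_RE = re.compile(r"^\s+present\s+(\d+)")


def present_pages(zoneinfo_text: str) -> Dict[Tuple[int, str], int]:
    """Map (node, zone name) to the zone's present page count."""
    pages: Dict[Tuple[int, str], int] = {}
    current = None
    for line in zoneinfo_text.splitlines():
        header = _ZONE_RE.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            continue
        present = _PRESENT_RE.match(line)
        if present and current is not None:
            pages[current] = int(present.group(1))
    return pages


def movable_pages_by_node(pages: Dict[Tuple[int, str], int]) -> Dict[int, int]:
    """Sum the present pages of the Movable zone for every node seen."""
    result: Dict[int, int] = {}
    for (node, zone), count in pages.items():
        result.setdefault(node, 0)
        if zone == "Movable":
            result[node] += count
    return result
```

On a live system the input would come from
`open("/proc/zoneinfo").read()`; nodes with a non-zero result are the ones
holding memory that is expected to be migratable and therefore offlineable.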

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-25  6:38                                 ` Michal Hocko
@ 2017-10-25 22:01                                   ` Sharath Kumar Bhat
  2017-10-26  7:36                                     ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Sharath Kumar Bhat @ 2017-10-25 22:01 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sharath Kumar Bhat, Dave Hansen, linux-mm, akpm

On Wed, Oct 25, 2017 at 08:38:52AM +0200, Michal Hocko wrote:
> On Tue 24-10-17 17:53:14, Sharath Kumar Bhat wrote:
> > On Tue, Oct 24, 2017 at 09:19:06AM +0200, Michal Hocko wrote:
> > > On Mon 23-10-17 18:06:33, Sharath Kumar Bhat wrote:
> [...]
> > > > And moreover
> > > > 'movable_node' is implemented with an assumption to provide the entire
> > > > hotpluggable memory as movable zone. This ACPI override would be against
> > > > that assumption.
> > > 
> > > This is true and in fact movable_node should become movable_memory over
> > > time and only ranges marked as movable would become really movable. This
> > > is a rather non-trivial change to do and there is not a great demand for
> > > the feature so it is low on my TODO list.
> > 
> > Do you mean to have a single kernel command-line 'movable_memory=' for this
> > purpose and remove all other kernel command-line parameters such as
> > 'kernelcore=', 'movablecore=' and 'movable_node'?
> 
> yes.

OK, then I assume it will let the user specify multiple memory ranges so
that the admin can explicitly choose to have movable zones in either
hotpluggable or non-hotpluggable memory, because in this use case the
requirement is to have movable zones in both hotpluggable and
non-hotpluggable memory.

> 
> > because after the kernel
> > boots up we can not guarantee that a contig memory range can be made zone
> > movable since any kernel allocations could pre-exist.
> 
> No, I meant that the zone association would be done _only_ based on
> memory attributes exported by ACPI or whatever is used to configure
> memory ranges on the particular platform. So this would be early init code.
> 
> > > > Also ACPI override would introduce additional topology
> > > > changes. Again this would have to change every time the total movable
> > > > memory requirement changes and the whole system and apps have to be
> > > > re-tuned (for job launch ex: numactl etc) to comprehend this change.
> > > 
> > > This is something you have to do anyway when the topology of the system
> > > changes each boot.
> > 
> > No, this is a manual tuning for job-launch, mem policy handling code etc.
> > which would be done once for a platform. But in this case based on the
> > application need the amount of movable memory will change so it is really
> > unfair to ask user to re-work their job launch and apps for every such
> > changes.
> 
> I am still confused. Why does the application even care about
> movability?

Right, it's not about movability. Since 'movable_node' assumes that the
entire memory node is hotpluggable, to stay compatible with it the ranges of
non-hotpluggable memory that we want in the movable zone would have to be
exposed as complete nodes. This increases the number of NUMA nodes, and the
total number of such nodes changes as the movable memory requirement changes.
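
To make the node-count concern concrete, the sketch below compares the node
lists the kernel exports in `/sys/devices/system/node/has_memory` and
`has_normal_memory` (where available). On a 64-bit system, a node that appears
in the first list but not the second has memory only in the movable zone. The
function names are hypothetical, and the list strings in the usage note are
illustrative, loosely modelled on the 8-node example from the patch
description.

```python
from typing import Set


def parse_node_list(text: str) -> Set[int]:
    """Parse a kernel list string such as '0-3,6' into a set of node IDs."""
    nodes: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means an empty list
        if "-" in part:
            first, last = part.split("-")
            nodes.update(range(int(first), int(last) + 1))
        else:
            nodes.add(int(part))
    return nodes


def movable_only_nodes(has_memory: str, has_normal_memory: str) -> Set[int]:
    """Nodes that have memory, but none in the non-movable zones."""
    return parse_node_list(has_memory) - parse_node_list(has_normal_memory)
```

For example, `movable_only_nodes("0-7", "0-3")` yields `{4, 5, 6, 7}`. Every
additional node carved out of non-hotpluggable memory would show up in this
set, and job-launch settings keyed to node numbers (numactl and similar)
would need to follow.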

>  
> > > That being said, I would really prefer to actually _remove_ the kernelcore
> > > parameter altogether. It is messy (just look at find_zone_movable_pfns_for_nodes
> > > et al.) and the original usecase it was added for [1] does not hold
> > > anymore. Adding more stuff to work around issues which can be handled
> > > more cleanly is definitely not the right way to go.
> > 
> > I agree that kernelcore handling is non-trivial in that function. But the
> > changes introduced by this patch are under 'movable_node' case handling in
> > find_zone_movable_pfns_for_nodes() and it does not cause any change to the
> > existing kernelcore behavior of the code. Also this enables all
> > multi-kernel users to make use of this functionality until later when
> > new interface would be available for the same purpose.
> 
> The point is to not build on top and rather get rid of it completely.

I thought you mentioned it's low priority on your TODO list and you
don't expect to see it in the near future. So until then there is no
existing solution that one can use.

>  
> > > [1] note that ZONE_MOVABLE was originally added to help
> > > fragmentation avoidance.
> > 
> > Isn't this true even now since ZONE_MOVABLE will populate only
> > MIGRATE_MOVABLE free list of pages? and other zones could have
> > MIGRATE_UNMOVABLE pages?
> 
> My point was that the original motivation is gone because our compaction
> code doesn't really depend on movable zone. So the movable zone is more
> about making sure that the specific memory is migratable and so
> offlineable.
> 
> -- 
> Michal Hocko
> SUSE Labs

* Re: [PATCH] mm: fix movable_node kernel command-line
  2017-10-25 22:01                                   ` Sharath Kumar Bhat
@ 2017-10-26  7:36                                     ` Michal Hocko
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2017-10-26  7:36 UTC (permalink / raw)
  To: Sharath Kumar Bhat; +Cc: Dave Hansen, linux-mm, akpm

On Wed 25-10-17 15:01:32, Sharath Kumar Bhat wrote:
> On Wed, Oct 25, 2017 at 08:38:52AM +0200, Michal Hocko wrote:
> > On Tue 24-10-17 17:53:14, Sharath Kumar Bhat wrote:
> > > On Tue, Oct 24, 2017 at 09:19:06AM +0200, Michal Hocko wrote:
> > > > On Mon 23-10-17 18:06:33, Sharath Kumar Bhat wrote:
> > [...]
> > > > > And moreover
> > > > > 'movable_node' is implemented with an assumption to provide the entire
> > > > > hotpluggable memory as movable zone. This ACPI override would be against
> > > > > that assumption.
> > > > 
> > > > This is true and in fact movable_node should become movable_memory over
> > > > time and only ranges marked as movable would become really movable. This
> > > > is a rather non-trivial change to do and there is not a great demand for
> > > > the feature so it is low on my TODO list.
> > > 
> > > Do you mean to have a single kernel command-line 'movable_memory=' for this
> > > purpose and remove all other kernel command-line parameters such as
> > > 'kernelcore=', 'movablecore=' and 'movable_node'?
> > 
> > yes.
> 
> Ok then I believe it will let user to specify multiple memory ranges so
> that admin can explicitly choose to have movable zones in either
> hotpluggable or non-hotpluggable memories. Because in this use case the
> requirement is to have the movable zones in both hotpluggable and
> non-hotpluggable memories.

Why? Please be more specific.

[...]

> > I am still confused. Why does the application even care about
> > movability?
> 
> Right its not about movability, since 'movable_node' assumes that the entire
> memory node is hotpluggable, to stay compatible with it the memory ranges of
> non-hotpluggable memory that we want to be movable zone should be exposed as
> a complete node. This increases the number of NUMA nodes and the total
> no.of such nodes changes as the movable memory requirement changes.

And that is the primary reason why this interface is a hack and should
be replaced.

> > > > That being said, I would really prefer to actually _remove_ the kernelcore
> > > > parameter altogether. It is messy (just look at find_zone_movable_pfns_for_nodes
> > > > et al.) and the original usecase it was added for [1] does not hold
> > > > anymore. Adding more stuff to work around issues which can be handled
> > > > more cleanly is definitely not the right way to go.
> > > 
> > > I agree that kernelcore handling is non-trivial in that function. But the
> > > changes introduced by this patch are under 'movable_node' case handling in
> > > find_zone_movable_pfns_for_nodes() and it does not cause any change to the
> > > existing kernelcore behavior of the code. Also this enables all
> > > multi-kernel users to make use of this functionality until later when
> > > new interface would be available for the same purpose.
> > 
> > The point is to not build on top and rather get rid of it completely.
> 
> I thought you mentioned it's low priority on your TODO list and you
> don't expect to see it in the near future. So until then there is no
> existing solution that one can use.

Feel free to work on it. But seriously, the whole memory hotplug land is
full of half-assed solutions where everybody just cared about a specific
usecase without thinking about a more generic way to implement the
feature. It's time to stop that kind of approach and finally do
things properly.
-- 
Michal Hocko
SUSE Labs

Thread overview: 20+ messages
2017-10-20 23:32 [PATCH] mm: fix movable_node kernel command-line Sharath Kumar Bhat
2017-10-23 12:52 ` Michal Hocko
2017-10-23 16:03   ` Sharath Kumar Bhat
2017-10-23 16:15     ` Michal Hocko
2017-10-23 17:14       ` Sharath Kumar Bhat
2017-10-23 17:20         ` Michal Hocko
2017-10-23 17:35           ` Sharath Kumar Bhat
2017-10-23 17:49             ` Michal Hocko
2017-10-23 18:48               ` Sharath Kumar Bhat
2017-10-23 19:04                 ` Michal Hocko
2017-10-23 19:25                   ` Sharath Kumar Bhat
2017-10-23 19:35                     ` Michal Hocko
2017-10-23 19:56                       ` Sharath Kumar Bhat
2017-10-23 21:52                         ` Dave Hansen
2017-10-24  1:06                           ` Sharath Kumar Bhat
2017-10-24  7:19                             ` Michal Hocko
2017-10-25  0:53                               ` Sharath Kumar Bhat
2017-10-25  6:38                                 ` Michal Hocko
2017-10-25 22:01                                   ` Sharath Kumar Bhat
2017-10-26  7:36                                     ` Michal Hocko
