* [PATCH] mm, memory_hotplug: support movable_node for hotplugable nodes
From: Michal Hocko @ 2017-06-08 12:23 UTC
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Andi Kleen, David Rientjes, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

The movable_node kernel parameter places all hotpluggable memory of
hotpluggable NUMA nodes into the movable zone, which makes memory
hot-remove more or less reliable. At least this is the case for the
NUMA nodes present during boot (see find_zone_movable_pfns_for_nodes).

This is not the case for memory hotplug, though.

	echo online > /sys/devices/system/memory/memoryXYZ/state

will default to a kernel zone (usually ZONE_NORMAL) unless the
particular memblock is already in the movable zone range, which is
normally not the case when onlining memory from a udev rule for a
freshly hot-added NUMA node. The only option currently is a special
udev rule that echoes online_movable to all memblocks belonging to such
a node, which is rather clumsy. Not to mention that this is also
inconsistent: memory that ended up in the movable zone during boot will
end up in a kernel zone after hotremove & hotadd unless special care is
taken.

It would be nice to reuse memblock_is_hotpluggable, but runtime hotplug
doesn't have that information available. The boot and hotplug paths
are not shared, and making them use the same code path would be really
non-trivial because runtime hotplug doesn't interact with the memblock
allocator at all.

Teach move_pfn_range that MMOP_ONLINE_KEEP can use the movable zone if
movable_node is enabled and the range doesn't overlap with the existing
normal zone. This should provide a reasonable default onlining strategy.
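The onlining decision described above can be modeled outside the
kernel as a small standalone program. This is a sketch under
simplifying assumptions, not kernel code: struct pfn_span, struct
node_model and the helper functions are hypothetical stand-ins for
struct zone, allow_online_pfn_range(), movable_node_is_enabled() and
zone_intersects(). It assumes a kernel zone is allowed only when
ZONE_MOVABLE is empty or the range ends before it starts (the "within
or past the existing movable zone" case from the code comment).

```c
#include <stdbool.h>

/* Hypothetical stand-in for a zone's pfn span: [start, end). */
struct pfn_span {
	unsigned long start;
	unsigned long end;	/* start == end means the zone is empty */
};

/* Hypothetical per-node model: default kernel zone, ZONE_MOVABLE, flag. */
struct node_model {
	struct pfn_span normal;
	struct pfn_span movable;
	bool movable_node;	/* movable_node given on the command line */
};

/* Overlap test modeled on zone_intersects(); empty zones never overlap. */
static bool span_intersects(const struct pfn_span *z, unsigned long start,
			    unsigned long nr)
{
	if (z->start == z->end)
		return false;
	return start < z->end && z->start < start + nr;
}

/* Assumption: a kernel zone is possible unless the range reaches ZONE_MOVABLE. */
static bool kernel_zone_allowed(const struct node_model *n,
				unsigned long start, unsigned long nr)
{
	if (n->movable.start == n->movable.end)
		return true;
	return start + nr <= n->movable.start;
}

/* Follows the structure of movable_pfn_range() in the patch. */
static bool model_movable_pfn_range(const struct node_model *n,
				    unsigned long start, unsigned long nr)
{
	if (!kernel_zone_allowed(n, start, nr))
		return true;		/* kernel zone impossible: use movable */
	if (!n->movable_node)
		return false;		/* old default: kernel zone */
	return !span_intersects(&n->normal, start, nr);
}
```

With movable_node set, a range on a freshly hot-added node (both zones
still empty) yields true, i.e. ZONE_MOVABLE, while a range overlapping
the existing normal zone still yields false and stays in the kernel
zone.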

Strictly speaking, the semantics are not identical to the boot-time
initialization, because find_zone_movable_pfns_for_nodes covers only the
hotpluggable range as described by the BIOS/FW. In my experience this
is usually a full node, though (except for Node0, which is special and
never goes away completely). If this turns out to be a problem in real
life, we can tweak the code to store the hotplug flag in memblocks, but
let's keep this simple for now.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi Andrew,
I've posted this as an RFC previously [1] and there haven't been any
objections to the approach, so I've dropped the RFC tag and am sending
it for inclusion. The only change since then is an update of the
documentation to clarify the semantics, as suggested by Reza Arbab.

[1] http://lkml.kernel.org/r/20170601122004.32732-1-mhocko@kernel.org
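For reference, the documentation hunk below distinguishes the "online",
"online_movable" and "online_kernel" requests. A rough standalone
sketch of how such strings could map to the MMOP_* online types
follows; the enum is redefined locally with assumed values,
parse_state() is hypothetical, and the trailing-newline handling only
approximates the kernel's sysfs_streq().

```c
#include <string.h>

/* Local redefinition for a standalone sketch; values are assumed. */
enum online_type {
	MMOP_OFFLINE = -1,
	MMOP_ONLINE_KEEP,	/* "online": zone chosen by move_pfn_range() */
	MMOP_ONLINE_KERNEL,	/* "online_kernel" */
	MMOP_ONLINE_MOVABLE,	/* "online_movable" */
};

/* Exact match that tolerates one trailing newline on buf, as echo adds. */
static int streq_nl(const char *buf, const char *want)
{
	size_t n = strlen(want);

	return strncmp(buf, want, n) == 0 &&
	       (buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0'));
}

/* Hypothetical parser: returns 0 and stores the type, or -1 if unknown. */
static int parse_state(const char *buf, enum online_type *type)
{
	if (streq_nl(buf, "online_kernel"))
		*type = MMOP_ONLINE_KERNEL;
	else if (streq_nl(buf, "online_movable"))
		*type = MMOP_ONLINE_MOVABLE;
	else if (streq_nl(buf, "online"))
		*type = MMOP_ONLINE_KEEP;
	else if (streq_nl(buf, "offline"))
		*type = MMOP_OFFLINE;
	else
		return -1;
	return 0;
}
```

Only the plain "online" request (MMOP_ONLINE_KEEP) is affected by this
patch; the explicit requests keep their current behavior.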

 Documentation/memory-hotplug.txt | 12 +++++++++---
 mm/memory_hotplug.c              | 19 ++++++++++++++++---
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 670f3ded0802..5c628e19d6cd 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -282,20 +282,26 @@ offlined it is possible to change the individual block's state by writing to the
 % echo online > /sys/devices/system/memory/memoryXXX/state
 
 This onlining will not change the ZONE type of the target memory block,
-If the memory block is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+If the memory block doesn't belong to any zone, an appropriate kernel zone
+(usually ZONE_NORMAL) will be used, unless the movable_node kernel command
+line option is specified, in which case ZONE_MOVABLE will be used.
+
+You can explicitly request to associate it with ZONE_MOVABLE by:
 
 % echo online_movable > /sys/devices/system/memory/memoryXXX/state
 (NOTE: current limit: this memory block must be adjacent to ZONE_MOVABLE)
 
-And if the memory block is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:
 
 % echo online_kernel > /sys/devices/system/memory/memoryXXX/state
 (NOTE: current limit: this memory block must be adjacent to ZONE_NORMAL)
 
+Explicitly onlining into a zone can fail (e.g. when the range already lies
+within an existing, incompatible zone).
+
 After this, memory block XXX's state will be 'online' and the amount of
 available memory will be increased.
 
-Currently, newly added memory is added as ZONE_NORMAL (for powerpc, ZONE_DMA).
 This may be changed in future.
 
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b98fb0b3ae11..74d75583736c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -943,6 +943,19 @@ struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
 	return &pgdat->node_zones[ZONE_NORMAL];
 }
 
+static inline bool movable_pfn_range(int nid, struct zone *default_zone,
+		unsigned long start_pfn, unsigned long nr_pages)
+{
+	if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
+				MMOP_ONLINE_KERNEL))
+		return true;
+
+	if (!movable_node_is_enabled())
+		return false;
+
+	return !zone_intersects(default_zone, start_pfn, nr_pages);
+}
+
 /*
  * Associates the given pfn range with the given node and the zone appropriate
  * for the given online type.
@@ -958,10 +971,10 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 		/*
 		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
 		 * movable zone if that is not possible (e.g. we are within
-		 * or past the existing movable zone)
+		 * or past the existing movable zone). movable_node overrides
+		 * this default and prefers the movable zone.
 		 */
-		if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
-					MMOP_ONLINE_KERNEL))
+		if (movable_pfn_range(nid, zone, start_pfn, nr_pages))
 			zone = movable_zone;
 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
 		zone = &pgdat->node_zones[ZONE_MOVABLE];
-- 
2.11.0
