* [RFC PATCH 0/2] mm, memory_hotplug: remove zone onlining restriction
@ 2017-06-29  7:35 Michal Hocko
  2017-06-29  7:35 ` [PATCH 1/2] mm, memory_hotplug: display allowed zones in the preferred ordering Michal Hocko
  2017-06-29  7:35 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
  0 siblings, 2 replies; 48+ messages in thread
From: Michal Hocko @ 2017-06-29  7:35 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML

Hi,
I am sending this as an RFC because it hasn't seen a lot of testing yet,
but I would like to know whether the semantics I came up with (see
patch 2) are sensible. This work should help Joonsoo with his CMA
zone-based approach when reusing the MOVABLE zone. I think it will also
help to remove more code from memory hotplug (e.g. zone shrinking).

Patch 1 restores the original memoryXY/valid_zones semantics wrt. zone
ordering. It can be merged without patch 2, which removes the zone
overlap restriction and defines the semantics of default onlining. See
the patches for more details.

Questions, concerns, objections?

Shortlog
Michal Hocko (2):
      mm, memory_hotplug: display allowed zones in the preferred ordering
      mm, memory_hotplug: remove zone restrictions

Diffstat
 drivers/base/memory.c          | 30 ++++++++++-----
 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 87 +++++++++++++++++-------------------------
 3 files changed, 55 insertions(+), 64 deletions(-)

* [PATCH 1/2] mm, memory_hotplug: display allowed zones in the preferred ordering
  2017-06-29  7:35 [RFC PATCH 0/2] mm, memory_hotplug: remove zone onlining restriction Michal Hocko
@ 2017-06-29  7:35 ` Michal Hocko
  2017-06-30  0:45   ` Joonsoo Kim
  2017-07-07 14:34   ` Vlastimil Babka
  2017-06-29  7:35 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
  1 sibling, 2 replies; 48+ messages in thread
From: Michal Hocko @ 2017-06-29  7:35 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Prior to "mm, memory_hotplug: do not associate hotadded memory to zones
until online" we used to allow changing the valid zone types of a
memory block if it was adjacent to a different zone type. This fact was
reflected in memoryNN/valid_zones by the ordering of the printed zones.
The first one was the default (echo online > memoryNN/state) and the
other one could be onlined explicitly by online_{movable,kernel}. This
behavior was removed by the said patch, and as such the ordering was
not all that important. In most cases a kernel zone would be the default
anyway. The only exception is movable_node, handled by "mm,
memory_hotplug: support movable_node for hotpluggable nodes".

Let's reintroduce this behavior because a later patch will remove the
zone overlap restriction, so the user will be allowed to online a block
as kernel or movable regardless of its placement. The original ordering
will then become significant again because it would otherwise be
non-trivial for users to see which zone is the default to online into.

The implementation is really simple. Pull the zone selection out of
move_pfn_range into a zone_for_pfn_range helper and use it in
show_valid_zones to display the zone for default onlining first,
followed by the kernel and movable zones if they are allowed. The
default online zone is not duplicated.
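
The ordering rule above can be sketched in userspace as follows. This is
only a simplified model, not the kernel code: format_valid_zones and
append_allowed_zone are hypothetical names, zones are compared by name
rather than by pointer, and the "allowed" flags stand in for whatever
allow_online_pfn_range would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append " <zone>" unless it is disallowed or already printed as default. */
static void append_allowed_zone(char *buf, size_t size, const char *zone,
		bool allowed, const char *default_zone)
{
	if (!allowed || strcmp(zone, default_zone) == 0)
		return;

	/* Room for the separator, the name and the terminating NUL. */
	assert(strlen(buf) + 1 + strlen(zone) + 1 <= size);
	strcat(buf, " ");
	strcat(buf, zone);
}

/*
 * Build the valid_zones text for one memory block (hypothetical model):
 * the default zone always comes first, then the kernel zone, then the
 * movable zone, each printed at most once.
 */
void format_valid_zones(char *buf, size_t size, const char *default_zone,
		const char *kernel_zone, bool kernel_allowed,
		const char *movable_zone, bool movable_allowed)
{
	assert(strlen(default_zone) + 1 <= size);
	strcpy(buf, default_zone);

	append_allowed_zone(buf, size, kernel_zone, kernel_allowed,
			default_zone);
	append_allowed_zone(buf, size, movable_zone, movable_allowed,
			default_zone);
}
```

With a default of "Movable" and both types allowed, the model yields
"Movable Normal"; with a default of "Normal" it yields "Normal Movable",
so the first name always tells the user where a plain "online" will go.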

Signed-off-by: Michal Hocko <mhocko@suse.com>

fold me "mm, memory_hotplug: display allowed zones in the preferred ordering"
---
 drivers/base/memory.c          | 33 +++++++++++++------
 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 73 ++++++++++++++++++++++++------------------
 3 files changed, 65 insertions(+), 43 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c7c4e0325cdb..26383af9900c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -388,6 +388,22 @@ static ssize_t show_phys_device(struct device *dev,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
+static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
+		unsigned long nr_pages, int online_type,
+		struct zone *default_zone)
+{
+	struct zone *zone;
+
+	if (!allow_online_pfn_range(nid, start_pfn, nr_pages, online_type))
+		return;
+
+	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
+	if (zone != default_zone) {
+		strcat(buf, " ");
+		strcat(buf, zone->name);
+	}
+}
+
 static ssize_t show_valid_zones(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
@@ -395,7 +411,7 @@ static ssize_t show_valid_zones(struct device *dev,
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	unsigned long valid_start_pfn, valid_end_pfn;
-	bool append = false;
+	struct zone *default_zone;
 	int nid;
 
 	/*
@@ -418,16 +434,13 @@ static ssize_t show_valid_zones(struct device *dev,
 	}
 
 	nid = pfn_to_nid(start_pfn);
-	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
-		strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
-		append = true;
-	}
+	default_zone = zone_for_pfn_range(MMOP_ONLINE_KEEP, nid, start_pfn, nr_pages);
+	strcat(buf, default_zone->name);
 
-	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_MOVABLE)) {
-		if (append)
-			strcat(buf, " ");
-		strcat(buf, NODE_DATA(nid)->node_zones[ZONE_MOVABLE].name);
-	}
+	print_allowed_zone(buf, nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL,
+			default_zone);
+	print_allowed_zone(buf, nid, start_pfn, nr_pages, MMOP_ONLINE_MOVABLE,
+			default_zone);
 out:
 	strcat(buf, "\n");
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c8a5056a5ae0..5e6e4cc36ff4 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -319,6 +319,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 		int online_type);
-extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
+extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 		unsigned long nr_pages);
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b4015a39d108..6b9a60115e37 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -836,31 +836,6 @@ static void node_states_set_node(int node, struct memory_notify *arg)
 	node_set_state(node, N_MEMORY);
 }
 
-bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type)
-{
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
-
-	/*
-	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
-	 * physically before ZONE_MOVABLE. All we need is they do not
-	 * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE
-	 * though so let's stick with it for simplicity for now.
-	 * TODO make sure we do not overlap with ZONE_DEVICE
-	 */
-	if (online_type == MMOP_ONLINE_KERNEL) {
-		if (zone_is_empty(movable_zone))
-			return true;
-		return movable_zone->zone_start_pfn >= pfn + nr_pages;
-	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		return zone_end_pfn(default_zone) <= pfn;
-	}
-
-	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
-	return online_type == MMOP_ONLINE_KEEP;
-}
-
 static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages)
 {
@@ -919,7 +894,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
  * If no kernel zone covers this pfn range it will automatically go
  * to the ZONE_NORMAL.
  */
-struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
 		unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
@@ -935,6 +910,31 @@ struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
 	return &pgdat->node_zones[ZONE_NORMAL];
 }
 
+bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type)
+{
+	struct pglist_data *pgdat = NODE_DATA(nid);
+	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
+	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
+
+	/*
+	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
+	 * physically before ZONE_MOVABLE. All we need is they do not
+	 * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE
+	 * though so let's stick with it for simplicity for now.
+	 * TODO make sure we do not overlap with ZONE_DEVICE
+	 */
+	if (online_type == MMOP_ONLINE_KERNEL) {
+		if (zone_is_empty(movable_zone))
+			return true;
+		return movable_zone->zone_start_pfn >= pfn + nr_pages;
+	} else if (online_type == MMOP_ONLINE_MOVABLE) {
+		return zone_end_pfn(default_zone) <= pfn;
+	}
+
+	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
+	return online_type == MMOP_ONLINE_KEEP;
+}
+
 static inline bool movable_pfn_range(int nid, struct zone *default_zone,
 		unsigned long start_pfn, unsigned long nr_pages)
 {
@@ -948,12 +948,8 @@ static inline bool movable_pfn_range(int nid, struct zone *default_zone,
 	return !zone_intersects(default_zone, start_pfn, nr_pages);
 }
 
-/*
- * Associates the given pfn range with the given node and the zone appropriate
- * for the given online type.
- */
-static struct zone * __meminit move_pfn_range(int online_type, int nid,
-		unsigned long start_pfn, unsigned long nr_pages)
+struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
+		unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
@@ -972,6 +968,19 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 		zone = &pgdat->node_zones[ZONE_MOVABLE];
 	}
 
+	return zone;
+}
+
+/*
+ * Associates the given pfn range with the given node and the zone appropriate
+ * for the given online type.
+ */
+static struct zone * __meminit move_pfn_range(int online_type, int nid,
+		unsigned long start_pfn, unsigned long nr_pages)
+{
+	struct zone *zone;
+
+	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
 	move_pfn_range_to_zone(zone, start_pfn, nr_pages);
 	return zone;
 }
-- 
2.11.0

* [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-29  7:35 [RFC PATCH 0/2] mm, memory_hotplug: remove zone onlining restriction Michal Hocko
  2017-06-29  7:35 ` [PATCH 1/2] mm, memory_hotplug: display allowed zones in the preferred ordering Michal Hocko
@ 2017-06-29  7:35 ` Michal Hocko
  2017-06-30  1:16   ` Joonsoo Kim
                     ` (3 more replies)
  1 sibling, 4 replies; 48+ messages in thread
From: Michal Hocko @ 2017-06-29  7:35 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Historically we have enforced that any kernel zone (e.g. ZONE_NORMAL) has
to precede the Movable zone in the physical memory range. The purpose of
the movable zone is, however, not bound to any physical memory restriction.
It merely defines a class of migratable and reclaimable memory.

There are users (e.g. CMA) who might want to reserve specific physical
memory ranges for their own purpose. Moreover our pfn walkers have to be
prepared for zones overlapping in the physical range already because we
do support interleaving NUMA nodes and therefore zones can interleave as
well. This means we can allow each memory block to be associated with a
different zone.

Loosen the current onlining semantics and allow an explicit onlining type
on any memblock. That means that online_{kernel,movable} will be allowed
regardless of the physical address of the memblock, as long as it is
offline of course. This might result in the movable zone overlapping with
other kernel zones. Default onlining then becomes a bit tricky but still
sensible. echo online > memoryXY/state will online the given block to
	1) the default zone if the given range is outside of any zone
	2) the enclosing zone if such a zone doesn't interleave with
	   any other zone
	3) the default zone if more zones interleave for this range
where the default zone is the movable zone only if movable_node is
enabled; otherwise it is a kernel zone.
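
The three rules above can be sketched as a small userspace model. This
is an illustration under stated assumptions rather than the kernel
implementation: struct zone_model, zone_model_intersects and
default_zone_model are hypothetical stand-ins, only one kernel zone is
modelled, and movable_node is passed in as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct zone. */
struct zone_model {
	const char *name;
	unsigned long start_pfn;	/* first pfn spanned by the zone */
	unsigned long spanned;		/* pfns spanned; 0 means empty */
};

/* True if [start_pfn, start_pfn + nr_pages) intersects the zone's span. */
static bool zone_model_intersects(const struct zone_model *z,
		unsigned long start_pfn, unsigned long nr_pages)
{
	if (z->spanned == 0)
		return false;
	if (start_pfn + nr_pages <= z->start_pfn)
		return false;
	if (z->start_pfn + z->spanned <= start_pfn)
		return false;
	return true;
}

/* Zone a plain "echo online" would pick for the range in this model. */
const struct zone_model *default_zone_model(
		const struct zone_model *kernel_zone,
		const struct zone_model *movable_zone,
		unsigned long start_pfn, unsigned long nr_pages,
		bool movable_node)
{
	bool in_kernel = zone_model_intersects(kernel_zone, start_pfn,
			nr_pages);
	bool in_movable = zone_model_intersects(movable_zone, start_pfn,
			nr_pages);

	assert(nr_pages > 0);

	/* Rule 2: exactly one zone spans the range, so inherit it. */
	if (in_kernel != in_movable)
		return in_kernel ? kernel_zone : movable_zone;

	/* Rules 1 and 3: no zone or both zones span the range. */
	return movable_node ? movable_zone : kernel_zone;
}
```

Rules 1 and 3 share one code path: whenever the spans give no unique
answer, the choice falls back to the movable_node setting.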

Here is an example of the semantics (movable_node is not present, but
it works in an analogous way). We start with the following memblocks,
all of them offline:
memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable

Now, we online block 34 in default mode and block 37 as movable
root@test1:/sys/devices/system/node/node1# echo online > memory34/state
root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable

As we can see, all other blocks can still be onlined into both the Normal
and Movable zones, and Normal is the default because the Movable zone
spans only block 37 now.
root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Movable Normal
memory39/valid_zones:Movable Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable

Now the default zone for blocks 37-41 has changed because the movable
zone spans that range.
root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable

Note that block 39 now belongs to the Normal zone, so block 38, which
both zones now span, falls into Normal by default as well.

For completeness:
root@test1:/sys/devices/system/node/node1# for i in memory[34]?
do
	echo online > $i/state 2>/dev/null
done

memory34/valid_zones:Normal
memory35/valid_zones:Normal
memory36/valid_zones:Normal
memory37/valid_zones:Movable
memory38/valid_zones:Normal
memory39/valid_zones:Normal
memory40/valid_zones:Movable
memory41/valid_zones:Movable
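
The walk-through above can be replayed with a small userspace model.
This is a sketch under several assumptions, not kernel code: struct
node_model and every function below are hypothetical, zone spans are
tracked in whole memory blocks instead of pfns, only the Normal and
Movable zones exist, movable_node is off, and an online block reports
only the zone it belongs to (as in the listings above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { FIRST_BLOCK = 34, NR_BLOCKS = 8 };
enum zone_id { NORMAL, MOVABLE, NR_ZONES };
enum online_type { ONLINE_KEEP, ONLINE_KERNEL, ONLINE_MOVABLE };

/* Hypothetical per-node state: zone spans in block units plus ownership. */
struct node_model {
	int start[NR_ZONES];	/* first block spanned by the zone */
	int end[NR_ZONES];	/* one past the last block; start == end is empty */
	int owner[NR_BLOCKS];	/* zone of an online block, -1 while offline */
};

void node_model_init(struct node_model *n)
{
	for (int z = 0; z < NR_ZONES; z++)
		n->start[z] = n->end[z] = 0;
	for (int i = 0; i < NR_BLOCKS; i++)
		n->owner[i] = -1;
}

static bool spans(const struct node_model *n, int zone, int block)
{
	return n->start[zone] <= block && block < n->end[zone];
}

/* Zone a block would be onlined to for the given request. */
int zone_for_block(const struct node_model *n, int block,
		enum online_type type)
{
	if (type == ONLINE_KERNEL)
		return NORMAL;
	if (type == ONLINE_MOVABLE)
		return MOVABLE;

	/* Default: inherit Movable only if it alone spans the block. */
	if (spans(n, MOVABLE, block) && !spans(n, NORMAL, block))
		return MOVABLE;
	return NORMAL;
}

/* Online an offline block and grow the chosen zone's span to cover it. */
int online_block(struct node_model *n, int block, enum online_type type)
{
	int idx = block - FIRST_BLOCK;
	int zone;

	assert(idx >= 0 && idx < NR_BLOCKS);
	if (n->owner[idx] != -1)
		return -1;	/* already online, nothing changes */

	zone = zone_for_block(n, block, type);
	if (n->start[zone] == n->end[zone]) {
		n->start[zone] = block;
		n->end[zone] = block + 1;
	} else {
		if (block < n->start[zone])
			n->start[zone] = block;
		if (block + 1 > n->end[zone])
			n->end[zone] = block + 1;
	}
	n->owner[idx] = zone;
	return zone;
}

/* Write the valid_zones text for a block into buf (at least 16 bytes). */
void valid_zones(const struct node_model *n, int block, char *buf,
		size_t size)
{
	static const char *const names[NR_ZONES] = { "Normal", "Movable" };
	int idx = block - FIRST_BLOCK;
	int def;

	assert(idx >= 0 && idx < NR_BLOCKS && size >= 16);
	if (n->owner[idx] != -1) {
		strcpy(buf, names[n->owner[idx]]);
		return;
	}

	def = zone_for_block(n, block, ONLINE_KEEP);
	strcpy(buf, names[def]);
	strcat(buf, " ");
	strcat(buf, names[def == NORMAL ? MOVABLE : NORMAL]);
}
```

Calling online_block() for blocks 34 (ONLINE_KEEP), 37 and 41
(ONLINE_MOVABLE) and 39 (ONLINE_KERNEL), in that order, moves the
default for block 38 from Normal to Movable and back to Normal, which is
the sequence shown in the listings.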

Implementation-wise the change is quite straightforward. We can get rid
of allow_online_pfn_range altogether. online_pages allows only offline
nodes already. The original default_zone_for_pfn becomes
default_kernel_zone_for_pfn. The new default_zone_for_pfn implements the
above semantics. zone_for_pfn_range is slightly reorganized to handle
the kernel and movable online types explicitly, and MMOP_ONLINE_KEEP
becomes the catch-all default behavior.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/base/memory.c |  3 ---
 mm/memory_hotplug.c   | 74 ++++++++++++++++-----------------------------------
 2 files changed, 23 insertions(+), 54 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 26383af9900c..4e3b61cda520 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,9 +394,6 @@ static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
 {
 	struct zone *zone;
 
-	if (!allow_online_pfn_range(nid, start_pfn, nr_pages, online_type))
-		return;
-
 	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
 	if (zone != default_zone) {
 		strcat(buf, " ");
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6b9a60115e37..670f7acbecf4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -894,7 +894,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
  * If no kernel zone covers this pfn range it will automatically go
  * to the ZONE_NORMAL.
  */
-static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
 		unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
@@ -910,65 +910,40 @@ static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
 	return &pgdat->node_zones[ZONE_NORMAL];
 }
 
-bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type)
+static inline struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+		unsigned long nr_pages)
 {
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
+	struct zone *kernel_zone = default_kernel_zone_for_pfn(nid, start_pfn,
+			nr_pages);
+	struct zone *movable_zone = &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
+	bool in_kernel = zone_intersects(kernel_zone, start_pfn, nr_pages);
+	bool in_movable = zone_intersects(movable_zone, start_pfn, nr_pages);
 
 	/*
-	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
-	 * physically before ZONE_MOVABLE. All we need is they do not
-	 * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE
-	 * though so let's stick with it for simplicity for now.
-	 * TODO make sure we do not overlap with ZONE_DEVICE
+	 * We inherit the existing zone in a simple case where zones do not
+	 * overlap in the given range
 	 */
-	if (online_type == MMOP_ONLINE_KERNEL) {
-		if (zone_is_empty(movable_zone))
-			return true;
-		return movable_zone->zone_start_pfn >= pfn + nr_pages;
-	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		return zone_end_pfn(default_zone) <= pfn;
-	}
-
-	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
-	return online_type == MMOP_ONLINE_KEEP;
-}
-
-static inline bool movable_pfn_range(int nid, struct zone *default_zone,
-		unsigned long start_pfn, unsigned long nr_pages)
-{
-	if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
-				MMOP_ONLINE_KERNEL))
-		return true;
-
-	if (!movable_node_is_enabled())
-		return false;
+	if (in_kernel ^ in_movable)
+		return (in_kernel) ? kernel_zone : movable_zone;
 
-	return !zone_intersects(default_zone, start_pfn, nr_pages);
+	/*
+	 * If the range doesn't belong to any zone or two zones overlap in the
+	 * given range then we use movable zone only if movable_node is
+	 * enabled because we always online to a kernel zone by default.
+	 */
+	return movable_node_enabled ? movable_zone : kernel_zone;
 }
 
 struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 		unsigned long nr_pages)
 {
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
+	if (online_type == MMOP_ONLINE_KERNEL)
+		return default_kernel_zone_for_pfn(nid, start_pfn, nr_pages);
 
-	if (online_type == MMOP_ONLINE_KEEP) {
-		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-		/*
-		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
-		 * movable zone if that is not possible (e.g. we are within
-		 * or past the existing movable zone). movable_node overrides
-		 * this default and defaults to movable zone
-		 */
-		if (movable_pfn_range(nid, zone, start_pfn, nr_pages))
-			zone = movable_zone;
-	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		zone = &pgdat->node_zones[ZONE_MOVABLE];
-	}
+	if (online_type == MMOP_ONLINE_MOVABLE)
+		return &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
 
-	return zone;
+	return default_zone_for_pfn(nid, start_pfn, nr_pages);
 }
 
 /*
@@ -997,9 +972,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	struct memory_notify arg;
 
 	nid = pfn_to_nid(pfn);
-	if (!allow_online_pfn_range(nid, pfn, nr_pages, online_type))
-		return -EINVAL;
-
 	/* associate pfn range with the zone */
 	zone = move_pfn_range(online_type, nid, pfn, nr_pages);
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 48+ messages in thread
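
The zone selection introduced by the diff above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct zone_span`, `span_intersects`, `pick_default_zone`, `pick_zone`, the `ONLINE_*` constants and `PAGES_PER_BLOCK` are made-up stand-ins, not kernel APIs, and a single kernel zone is passed in where the patch would first look one up via default_kernel_zone_for_pfn.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative userspace stand-ins; none of these names are kernel APIs. */
enum online_type { ONLINE_KEEP, ONLINE_KERNEL, ONLINE_MOVABLE };

#define PAGES_PER_BLOCK 0x8000UL	/* assumed: 128MB blocks, 4kB pages */

struct zone_span {
	const char *name;
	unsigned long start_pfn;
	unsigned long spanned_pages;	/* 0 means the zone is empty */
};

/* Does [start_pfn, start_pfn + nr_pages) overlap the zone's span? */
static bool span_intersects(const struct zone_span *z,
			    unsigned long start_pfn, unsigned long nr_pages)
{
	if (z->spanned_pages == 0)
		return false;
	if (start_pfn >= z->start_pfn + z->spanned_pages)
		return false;
	return start_pfn + nr_pages > z->start_pfn;
}

/* Default ("echo online") choice: inherit a single enclosing zone, else fall back. */
static const struct zone_span *
pick_default_zone(const struct zone_span *kernel_zone,
		  const struct zone_span *movable_zone, bool movable_node,
		  unsigned long start_pfn, unsigned long nr_pages)
{
	bool in_kernel = span_intersects(kernel_zone, start_pfn, nr_pages);
	bool in_movable = span_intersects(movable_zone, start_pfn, nr_pages);

	if (in_kernel != in_movable)
		return in_kernel ? kernel_zone : movable_zone;

	/* Outside of both spans, or inside both: use the configured default. */
	return movable_node ? movable_zone : kernel_zone;
}

/* Explicit online types win; everything else takes the default path. */
static const struct zone_span *
pick_zone(enum online_type type, const struct zone_span *kernel_zone,
	  const struct zone_span *movable_zone, bool movable_node,
	  unsigned long start_pfn, unsigned long nr_pages)
{
	assert(kernel_zone && movable_zone);

	if (type == ONLINE_KERNEL)
		return kernel_zone;
	if (type == ONLINE_MOVABLE)
		return movable_zone;
	return pick_default_zone(kernel_zone, movable_zone, movable_node,
				 start_pfn, nr_pages);
}
```

With a Normal span covering block 34 and a Movable span covering blocks 37-41, calling pick_zone with ONLINE_KEEP for block 38 picks the Movable span. Once the Normal span is widened to blocks 34-39, the same call picks the Normal span because both spans now cover block 38 and movable_node is off.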

* Re: [PATCH 1/2] mm, memory_hotplug: display allowed zones in the preferred ordering
  2017-06-29  7:35 ` [PATCH 1/2] mm, memory_hotplug: display allowed zones in the preferred ordering Michal Hocko
@ 2017-06-30  0:45   ` Joonsoo Kim
  2017-07-07 14:34   ` Vlastimil Babka
  1 sibling, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-06-30  0:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Daniel Kiper, Igor Mammedov,
	Vitaly Kuznetsov, Wei Yang, LKML, Michal Hocko

On Thu, Jun 29, 2017 at 09:35:08AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Prior to "mm, memory_hotplug: do not associate hotadded memory to zones
> until online" we used to allow changing the valid zone types of a
> memory block if it is adjacent to a different zone type. This fact was
> reflected in memoryNN/valid_zones by the ordering of printed zones.
> The first one was default (echo online > memoryNN/state) and the other
> one could be onlined explicitly by online_{movable,kernel}. This
> behavior was removed by the said patch and as such the ordering was
> not all that important. In most cases a kernel zone would be default
> anyway. The only exception is movable_node handled by "mm,
> memory_hotplug: support movable_node for hotpluggable nodes".
> 
> Let's reintroduce this behavior because a later patch will remove
> the zone overlap restriction and so the user will be allowed to online
> a kernel resp. movable block regardless of its placement. The original
> behavior will then become significant again because it would be
> non-trivial for users to see which zone is the default to online into.
> 
> The implementation is really simple. Pull zone selection out of
> move_pfn_range into a zone_for_pfn_range helper and use it in
> show_valid_zones to display the zone for default onlining and then
> both kernel and movable if they are allowed. The default online zone is
> not duplicated.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> 

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-29  7:35 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
@ 2017-06-30  1:16   ` Joonsoo Kim
  2017-06-30  3:09   ` Wei Yang
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-06-30  1:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Daniel Kiper, Igor Mammedov,
	Vitaly Kuznetsov, Wei Yang, LKML, Michal Hocko

On Thu, Jun 29, 2017 at 09:35:09AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Historically we have enforced that any kernel zone (e.g. ZONE_NORMAL) has
> to precede the Movable zone in the physical memory range. The purpose of
> the movable zone is, however, not bound to any physical memory restriction.
> It merely defines a class of migrateable and reclaimable memory.
> 
> There are users (e.g. CMA) who might want to reserve specific physical
> memory ranges for their own purpose. Moreover our pfn walkers have to be
> prepared for zones overlapping in the physical range already because we
> do support interleaving NUMA nodes and therefore zones can interleave as
> well. This means we can allow each memory block to be associated with a
> different zone.
> 
> Loosen the current onlining semantic and allow explicit onlining type on
> any memblock. That means that online_{kernel,movable} will be allowed
> regardless of the physical address of the memblock as long as it is
> offline of course. This might result in movable zone overlapping with
> other kernel zones. Default onlining then becomes a bit tricky but still
> sensible. echo online > memoryXY/state will online the given block to
> 	1) the default zone if the given range is outside of any zone
> 	2) the enclosing zone if such a zone doesn't interleave with
> 	   any other zone
>         3) the default zone if more zones interleave for this range
> where default zone is movable zone only if movable_node is enabled
> otherwise it is a kernel zone.
> 
> Here is an example of the semantic (movable_node is not present, but
> it works in an analogous way). We start with the following memblocks,
> all of them offline:
> memory34/valid_zones:Normal Movable
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Normal Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable
> 
> Now, we online block 34 in default mode and block 37 as movable
> root@test1:/sys/devices/system/node/node1# echo online > memory34/state
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable
> 
> As we can see all other blocks can still be onlined both into Normal and
> Movable zones and the Normal is default because the Movable zone spans
> only block37 now.
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Movable Normal
> memory39/valid_zones:Movable Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
> 
> Now the default zone for blocks 37-41 has changed because movable zone
> spans that range.
> root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
> 
> Note that the block 39 now belongs to the zone Normal and so block38
> falls into Normal by default as well.
> 
> For completeness
> root@test1:/sys/devices/system/node/node1# for i in memory[34]?
> do
> 	echo online > $i/state 2>/dev/null
> done
> 
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal
> memory36/valid_zones:Normal
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable
> memory41/valid_zones:Movable
> 
> Implementation wise the change is quite straightforward. We can get rid
> of allow_online_pfn_range altogether. online_pages allows only offline
> nodes already. The original default_zone_for_pfn will become
> default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
> above semantic. zone_for_pfn_range is slightly reorganized to implement
> kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes
> a catch all default behavior.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

I appreciate your help!

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-29  7:35 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
  2017-06-30  1:16   ` Joonsoo Kim
@ 2017-06-30  3:09   ` Wei Yang
  2017-06-30  8:39     ` Michal Hocko
  2017-07-07 15:02   ` Vlastimil Babka
  2017-07-12 12:49   ` Michal Hocko
  3 siblings, 1 reply; 48+ messages in thread
From: Wei Yang @ 2017-06-30  3:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux-MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, Xishi Qiu,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

On Thu, Jun 29, 2017 at 3:35 PM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>

Michal,

I love the idea very much.

> Historically we have enforced that any kernel zone (e.g. ZONE_NORMAL) has
> to precede the Movable zone in the physical memory range. The purpose of
> the movable zone is, however, not bound to any physical memory restriction.
> It merely defines a class of migrateable and reclaimable memory.
>
> There are users (e.g. CMA) who might want to reserve specific physical
> memory ranges for their own purpose. Moreover our pfn walkers have to be
> prepared for zones overlapping in the physical range already because we
> do support interleaving NUMA nodes and therefore zones can interleave as
> well. This means we can allow each memory block to be associated with a
> different zone.
>
> Loosen the current onlining semantic and allow explicit onlining type on
> any memblock. That means that online_{kernel,movable} will be allowed
> regardless of the physical address of the memblock as long as it is
> offline of course. This might result in movable zone overlapping with
> other kernel zones. Default onlining then becomes a bit tricky but still

As mentioned here, we only remove the restriction for ZONE_MOVABLE.
For the other zones, we still keep the restriction and the ordering as before.

Maybe the title is a little misleading. Readers may think the restriction
is removed for all zones.

> sensible. echo online > memoryXY/state will online the given block to
>         1) the default zone if the given range is outside of any zone
>         2) the enclosing zone if such a zone doesn't interleave with
>            any other zone
>         3) the default zone if more zones interleave for this range
> where default zone is movable zone only if movable_node is enabled
> otherwise it is a kernel zone.
>
> Here is an example of the semantic (movable_node is not present, but
> it works in an analogous way). We start with the following memblocks,
> all of them offline:
> memory34/valid_zones:Normal Movable
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Normal Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable
>
> Now, we online block 34 in default mode and block 37 as movable
> root@test1:/sys/devices/system/node/node1# echo online > memory34/state
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable
>
> As we can see all other blocks can still be onlined both into Normal and
> Movable zones and the Normal is default because the Movable zone spans
> only block37 now.
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Movable Normal
> memory39/valid_zones:Movable Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
>

As I pointed out on the previous patch, after several rounds of
online/offline, the output of valid_zones will differ.

For example, in this case, after I offline memory37 and memory41, I expect this:

 memory34/valid_zones:Normal
 memory35/valid_zones:Normal Movable
 memory36/valid_zones:Normal Movable
 memory37/valid_zones:Normal Movable
 memory38/valid_zones:Normal Movable
 memory39/valid_zones:Normal Movable
 memory40/valid_zones:Normal Movable
 memory41/valid_zones:Normal Movable

While the current result would be

 memory34/valid_zones:Normal
 memory35/valid_zones:Normal Movable
 memory36/valid_zones:Normal Movable
 memory37/valid_zones:Movable Normal
 memory38/valid_zones:Movable Normal
 memory39/valid_zones:Movable Normal
 memory40/valid_zones:Movable Normal
 memory41/valid_zones:Movable Normal

The reason is the same: we don't adjust the zone's range when offlining
memory.

Is this also a known issue?
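
To make the effect concrete, here is a small userspace model. It is only a sketch, assuming the zone span grows on online and is left untouched on offline; `struct span`, `span_online`, `span_offline`, `span_contains` and `default_zone` are hypothetical names for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a zone span as a half-open range of block numbers. */
struct span {
	unsigned long start;
	unsigned long end;	/* start == end means empty */
};

/* Onlining a block can only widen the span. */
static void span_online(struct span *s, unsigned long block)
{
	assert(s->start <= s->end);
	if (s->start == s->end) {
		s->start = block;
		s->end = block + 1;
		return;
	}
	if (block < s->start)
		s->start = block;
	if (block + 1 > s->end)
		s->end = block + 1;
}

/* Assumption being modelled: offlining leaves the span as it was. */
static void span_offline(struct span *s, unsigned long block)
{
	(void)s;
	(void)block;
}

static bool span_contains(const struct span *s, unsigned long block)
{
	return block >= s->start && block < s->end;
}

/* First entry of valid_zones with movable_node off: 'N'ormal or 'M'ovable. */
static char default_zone(const struct span *normal, const struct span *movable,
			 unsigned long block)
{
	bool in_kernel = span_contains(normal, block);
	bool in_movable = span_contains(movable, block);

	if (in_kernel != in_movable)
		return in_kernel ? 'N' : 'M';
	return 'N';
}
```

In this model, onlining block 34 as Normal and blocks 37 and 41 as Movable, then offlining 37 and 41 again, still leaves the Movable span at 37-41, so default_zone keeps answering 'M' for those blocks even though nothing in that range is online.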

> Now the default zone for blocks 37-41 has changed because movable zone
> spans that range.
> root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
>
> Note that the block 39 now belongs to the zone Normal and so block38
> falls into Normal by default as well.
>
> For completeness
> root@test1:/sys/devices/system/node/node1# for i in memory[34]?
> do
>         echo online > $i/state 2>/dev/null
> done
>
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal
> memory36/valid_zones:Normal
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable
> memory41/valid_zones:Movable
>
> Implementation wise the change is quite straightforward. We can get rid
> of allow_online_pfn_range altogether. online_pages allows only offline
> nodes already. The original default_zone_for_pfn will become
> default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
> above semantic. zone_for_pfn_range is slightly reorganized to implement
> kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes
> a catch all default behavior.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-30  3:09   ` Wei Yang
@ 2017-06-30  8:39     ` Michal Hocko
  2017-06-30  9:39       ` Wei Yang
  0 siblings, 1 reply; 48+ messages in thread
From: Michal Hocko @ 2017-06-30  8:39 UTC (permalink / raw)
  To: Wei Yang
  Cc: Linux-MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, Xishi Qiu,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 30-06-17 11:09:51, Wei Yang wrote:
> On Thu, Jun 29, 2017 at 3:35 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> 
> Michal,
> 
> I love the idea very much.
> 
> > Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
> > to precede the Movable zone in the physical memory range. The purpose of
> > the movable zone is, however, not bound to any physical memory restriction.
> > It merely defines a class of migrateable and reclaimable memory.
> >
> > There are users (e.g. CMA) who might want to reserve specific physical
> > memory ranges for their own purpose. Moreover our pfn walkers have to be
> > prepared for zones overlapping in the physical range already because we
> > do support interleaving NUMA nodes and therefore zones can interleave as
> > well. This means we can allow each memory block to be associated with a
> > different zone.
> >
> > Loosen the current onlining semantic and allow explicit onlining type on
> > any memblock. That means that online_{kernel,movable} will be allowed
> > regardless of the physical address of the memblock as long as it is
> > offline of course. This might result in movable zone overlapping with
> > other kernel zones. Default onlining then becomes a bit tricky but still
> 
> As here mentioned, we just remove the restriction for zone_movable.
> For other zones, we still keep the restriction and the order as before.

All other zones except for ZONE_NORMAL are subject to physical
memory restrictions.
 
> Maybe the title is a little misleading. Audience may thinks no restriction
> for all zones.

I thought the context was clear from the fact that this is a
hotplug-related patch. As such, we do not allow online_{dma,dma32,normal};
we only allow onlining into a kernel zone. I can update the wording, but I
do not have a good idea how.

[...]
> As I spotted on the previous patch, after several round of online/offline,
> The output of valid_zones will differ.
> 
> For example in this case, after I offline memory37 and 41, I expect this:
> 
>  memory34/valid_zones:Normal
>  memory35/valid_zones:Normal Movable
>  memory36/valid_zones:Normal Movable
>  memory37/valid_zones:Normal Movable
>  memory38/valid_zones:Normal Movable
>  memory39/valid_zones:Normal Movable
>  memory40/valid_zones:Normal Movable
>  memory41/valid_zones:Normal Movable
> 
> While the current result would be
> 
>  memory34/valid_zones:Normal
>  memory35/valid_zones:Normal Movable
>  memory36/valid_zones:Normal Movable
>  memory37/valid_zones:Movable Normal
>  memory38/valid_zones:Movable Normal
>  memory39/valid_zones:Movable Normal
>  memory40/valid_zones:Movable Normal
>  memory41/valid_zones:Movable Normal

You haven't written your sequence of onlining but if you used the same
one as mentioned in the patch then you should get
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable Normal

Even if you kept 37 as movable and offline 38 you wouldn't get 38-41
movable by default because...

> The reason is the same, we don't adjust the zone's range when offline
> memory.

.. of this.

> This is also a known issue?

Yes, and to be honest I do not plan to fix it unless somebody has a
real-life use case for it. Now that we allow an explicit onlining type
anywhere, it seems like reasonable behavior, and it will allow us to remove
quite a lot of code, which is always a good deal wrt long-term maintenance.
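
The effect of zone ranges that are not shrunk on offline can be illustrated with a small sketch. It assumes the onlining sequence from the patch description (34 default, 37 and 41 movable, 39 kernel), followed by offlining 37 and 41, with movable_node disabled. The helper names and the tuple representation of a zone span are hypothetical, and spans are half-open ranges of block numbers.

```python
def default_zone(block, normal, movable):
    """Default zone for `block`; spans are (start, end) tuples or None if empty."""
    in_normal = normal is not None and normal[0] <= block < normal[1]
    in_movable = movable is not None and movable[0] <= block < movable[1]
    if in_normal != in_movable:
        return "Normal" if in_normal else "Movable"
    return "Normal"  # movable_node is assumed to be disabled


def valid_zones(block, online, normal, movable):
    """Model the valid_zones line: an online block lists only its own zone,
    an offline block lists the default zone first."""
    if block in online:
        return online[block]
    first = default_zone(block, normal, movable)
    second = "Movable" if first == "Normal" else "Normal"
    return f"{first} {second}"


# Blocks 37 and 41 have been offlined again; 34 and 39 are still online.
online = {34: "Normal", 39: "Normal"}
normal_span = (34, 40)   # grown by onlining 34 and 39
movable_span = (37, 42)  # stale: grown by 37 and 41, not shrunk on offline

stale = {b: valid_zones(b, online, normal_span, movable_span)
         for b in range(34, 42)}

# Hypothetical alternative: the Movable span is dropped once it is empty.
shrunk = {b: valid_zones(b, online, normal_span, None)
          for b in range(34, 42)}

for b in range(34, 42):
    print(f"memory{b}/valid_zones:{stale[b]}")
```

With the stale Movable span, blocks 40 and 41 keep Movable as their default, which matches the listing above. With the hypothetical shrinking, every offline block would default to Normal.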

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-30  8:39     ` Michal Hocko
@ 2017-06-30  9:39       ` Wei Yang
  2017-06-30  9:55         ` Michal Hocko
  0 siblings, 1 reply; 48+ messages in thread
From: Wei Yang @ 2017-06-30  9:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux-MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, Xishi Qiu,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri, Jun 30, 2017 at 4:39 PM, Michal Hocko <mhocko@kernel.org> wrote:
> On Fri 30-06-17 11:09:51, Wei Yang wrote:
>> On Thu, Jun 29, 2017 at 3:35 PM, Michal Hocko <mhocko@kernel.org> wrote:
>> > From: Michal Hocko <mhocko@suse.com>
>> >
>>
>> Michal,
>>
>> I love the idea very much.
>>

>
> You haven't written your sequence of onlining but if you used the same
> one as mentioned in the patch then you should get
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Normal Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable Normal
>
> Even if you kept 37 as movable and offline 38 you wouldn't get 38-41
> movable by default because...
>

Yes, it depends on the zone range.

>> The reason is the same, we don't adjust the zone's range when offline
>> memory.
>
> .. of this.
>
>> This is also a known issue?
>
> yes and to be honest I do not plan to fix it unless somebody has a real
> life usecase for it. Now that we allow explicit onlining type anywhere
> it seems like a reasonable behavior and this will allow us to remove
> quite some code which is always a good deal wrt longterm maintenance.
>

Hmm... the statistics displayed in /proc/zoneinfo would be meaningless
for zone_normal and zone_movable.

I am not sure; maybe no one cares about these fields.

> Thanks!
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-30  9:39       ` Wei Yang
@ 2017-06-30  9:55         ` Michal Hocko
  2017-06-30 11:01           ` Michal Hocko
  0 siblings, 1 reply; 48+ messages in thread
From: Michal Hocko @ 2017-06-30  9:55 UTC (permalink / raw)
  To: Wei Yang
  Cc: Linux-MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, Xishi Qiu,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 30-06-17 17:39:56, Wei Yang wrote:
> On Fri, Jun 30, 2017 at 4:39 PM, Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > yes and to be honest I do not plan to fix it unless somebody has a real
> > life usecase for it. Now that we allow explicit onlining type anywhere
> > it seems like a reasonable behavior and this will allow us to remove
> > quite some code which is always a good deal wrt longterm maintenance.
> >
> 
> hmm... the statistics displayed in /proc/zoneinfo would be meaningless
> for zone_normal and zone_movable.

Why would they be meaningless? Counters will always reflect the actual
use; if not, then it is a bug. And wrt the zone description, what is
meaningless about
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Movable
memory37/valid_zones:Movable Normal
memory38/valid_zones:Movable Normal
memory39/valid_zones:Movable Normal
memory40/valid_zones:Normal
memory41/valid_zones:Movable

And
Node 1, zone   Normal
  pages free     65465
        min      156
        low      221
        high     286
        spanned  229376
        present  65536
        managed  65536
[...]
  start_pfn:           1114112
Node 1, zone  Movable
  pages free     65443
        min      156
        low      221
        high     286
        spanned  196608
        present  65536
        managed  65536
[...]
  start_pfn:           1179648

ranges are clearly defined as [start_pfn, start_pfn+managed] and managed
matches the number of onlined pages (256MB).
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-30  9:55         ` Michal Hocko
@ 2017-06-30 11:01           ` Michal Hocko
  2017-07-05 23:16             ` Wei Yang
  0 siblings, 1 reply; 48+ messages in thread
From: Michal Hocko @ 2017-06-30 11:01 UTC (permalink / raw)
  To: Wei Yang
  Cc: Linux-MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, Xishi Qiu,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 30-06-17 11:55:45, Michal Hocko wrote:
> On Fri 30-06-17 17:39:56, Wei Yang wrote:
> > On Fri, Jun 30, 2017 at 4:39 PM, Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > > yes and to be honest I do not plan to fix it unless somebody has a real
> > > life usecase for it. Now that we allow explicit onlining type anywhere
> > > it seems like a reasonable behavior and this will allow us to remove
> > > quite some code which is always a good deal wrt longterm maintenance.
> > >
> > 
> > hmm... the statistics displayed in /proc/zoneinfo would be meaningless
> > for zone_normal and zone_movable.
> 
> Why would they be meaningless? Counters will always reflect the actual
> use - if not then it is a bug. And wrt to zone description what is
> meaningless about
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Movable
> memory37/valid_zones:Movable Normal
> memory38/valid_zones:Movable Normal
> memory39/valid_zones:Movable Normal
> memory40/valid_zones:Normal
> memory41/valid_zones:Movable
> 
> And
> Node 1, zone   Normal
>   pages free     65465
>         min      156
>         low      221
>         high     286
>         spanned  229376
>         present  65536
>         managed  65536
> [...]
>   start_pfn:           1114112
> Node 1, zone  Movable
>   pages free     65443
>         min      156
>         low      221
>         high     286
>         spanned  196608
>         present  65536
>         managed  65536
> [...]
>   start_pfn:           1179648
> 
> ranges are clearly defined as [start_pfn, start_pfn+managed] and managed

errr, this should be [start_pfn, start_pfn + spanned] of course.

> matches the number of onlined pages (256MB).
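
As a worked check of the corrected range definition, the zoneinfo numbers quoted above can be converted to memory block numbers. This sketch assumes 4 KiB pages (consistent with 65536 pages being 256MB) and 128 MiB memory blocks, i.e. 32768 pages per block. The block size is an assumption that does not appear in the thread, and the range is treated as half-open.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages
PAGES_PER_BLOCK = 32768   # assumption: 128 MiB memory blocks


def block_range(start_pfn, spanned):
    """First and last memory block covered by [start_pfn, start_pfn + spanned)."""
    first = start_pfn // PAGES_PER_BLOCK
    last = (start_pfn + spanned - 1) // PAGES_PER_BLOCK
    return first, last


# Values taken from the quoted /proc/zoneinfo output for node 1.
normal = block_range(1114112, 229376)
movable = block_range(1179648, 196608)

# "present" and "managed" are 65536 pages in both zones.
online_mib = 65536 * PAGE_SIZE // 2**20

print("Normal spans blocks", normal)
print("Movable spans blocks", movable)
print("online memory per zone:", online_mib, "MiB")
```

Under these assumptions Normal spans blocks 34 to 40 and Movable spans blocks 36 to 41, which agrees with the valid_zones listing: Normal is online at 34 and 40, Movable at 36 and 41, and the two spans overlap in between.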

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-30 11:01           ` Michal Hocko
@ 2017-07-05 23:16             ` Wei Yang
  2017-07-06  6:56               ` Michal Hocko
  0 siblings, 1 reply; 48+ messages in thread
From: Wei Yang @ 2017-07-05 23:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, Linux-MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, Xishi Qiu,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML


On Fri, Jun 30, 2017 at 01:01:18PM +0200, Michal Hocko wrote:
>On Fri 30-06-17 11:55:45, Michal Hocko wrote:
>> On Fri 30-06-17 17:39:56, Wei Yang wrote:
>> > On Fri, Jun 30, 2017 at 4:39 PM, Michal Hocko <mhocko@kernel.org> wrote:
>> [...]
>> > > yes and to be honest I do not plan to fix it unless somebody has a real
>> > > life usecase for it. Now that we allow explicit onlining type anywhere
>> > > it seems like a reasonable behavior and this will allow us to remove
>> > > quite some code which is always a good deal wrt longterm maintenance.
>> > >
>> > 
>> > hmm... the statistics displayed in /proc/zoneinfo would be meaningless
>> > for zone_normal and zone_movable.
>> 
>> Why would they be meaningless? Counters will always reflect the actual
>> use - if not then it is a bug. And wrt to zone description what is
>> meaningless about
>> memory34/valid_zones:Normal
>> memory35/valid_zones:Normal Movable
>> memory36/valid_zones:Movable
>> memory37/valid_zones:Movable Normal
>> memory38/valid_zones:Movable Normal
>> memory39/valid_zones:Movable Normal
>> memory40/valid_zones:Normal
>> memory41/valid_zones:Movable
>> 
>> And
>> Node 1, zone   Normal
>>   pages free     65465
>>         min      156
>>         low      221
>>         high     286
>>         spanned  229376
>>         present  65536
>>         managed  65536
>> [...]
>>   start_pfn:           1114112
>> Node 1, zone  Movable
>>   pages free     65443
>>         min      156
>>         low      221
>>         high     286
>>         spanned  196608
>>         present  65536
>>         managed  65536
>> [...]
>>   start_pfn:           1179648
>> 
>> ranges are clearly defined as [start_pfn, start_pfn+managed] and managed
>
>errr, this should be [start_pfn, start_pfn + spanned] of course.
>

The spanned value is not adjusted after offlining, and neither is start_pfn.
For example, even after offlining the whole movable_zone range, we can still
see the spanned value.

Below is the result from a slightly modified kernel that always shows start_pfn.
The sequence is:
1. bootup

Node 0, zone  Movable
        spanned  65536
        present  0
        managed  0
  start_pfn:           0

2. online 2 contiguous memory_blocks as movable

Node 0, zone  Movable
        spanned  65536
        present  65536
        managed  65536
  start_pfn:           1310720

3. offline the 2nd memory_block

Node 0, zone  Movable
        spanned  65536
        present  32768
        managed  32768
  start_pfn:           1310720

4. offline the 1st memory_block

Node 0, zone  Movable
        spanned  65536
        present  0
        managed  0
  start_pfn:           1310720

So I am not sure this is still clearly defined?
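
The counter behaviour in steps 2 to 4 can be sketched as a toy model in which onlining grows the zone range and offlining only decrements the page counters. `ZoneCounters` is a hypothetical name, 32768 pages per memory block is inferred from the numbers above, and the model starts from an empty zone, so it does not reproduce the boot-time state shown in step 1.

```python
from dataclasses import dataclass

PAGES_PER_BLOCK = 32768  # assumption: inferred from 65536 pages for two blocks


@dataclass
class ZoneCounters:
    start_pfn: int = 0
    spanned: int = 0
    present: int = 0

    def online(self, pfn, nr_pages=PAGES_PER_BLOCK):
        """Onlining extends the range to cover the new block."""
        if self.spanned == 0:
            self.start_pfn, self.spanned = pfn, nr_pages
        else:
            end = max(self.start_pfn + self.spanned, pfn + nr_pages)
            self.start_pfn = min(self.start_pfn, pfn)
            self.spanned = end - self.start_pfn
        self.present += nr_pages

    def offline(self, nr_pages=PAGES_PER_BLOCK):
        """Offlining drops the page count but leaves the range untouched."""
        self.present -= nr_pages


movable = ZoneCounters()
movable.online(1310720)                    # step 2: first block
movable.online(1310720 + PAGES_PER_BLOCK)  # step 2: second block
after_online = (movable.start_pfn, movable.spanned, movable.present)

movable.offline()                          # step 3
movable.offline()                          # step 4
after_offline = (movable.start_pfn, movable.spanned, movable.present)

print(after_online)
print(after_offline)
```

In this model `present` returns to 0 while `start_pfn` and `spanned` keep their last values, which is the state reported in step 4.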

>> matches the number of onlined pages (256MB).
>
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 48+ messages in thread
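
The Node 1 figures quoted in the message above can be checked with a small
userspace sketch. This is not kernel code: struct zone_span and its helpers
are hypothetical names, and the page-to-MiB conversion assumes 4 KiB pages.
It computes the end of each span as start_pfn + spanned, tests whether two
spans share any PFN, and converts a page count to MiB.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace model of the /proc/zoneinfo fields quoted above. */
struct zone_span {
	unsigned long start_pfn;	/* first PFN covered by the zone */
	unsigned long spanned;		/* pages covered, including holes */
	unsigned long present;		/* pages that are actually online */
};

/* One past the last PFN covered by the span: start_pfn + spanned. */
static unsigned long span_end_pfn(const struct zone_span *z)
{
	assert(z->spanned <= ULONG_MAX - z->start_pfn);
	return z->start_pfn + z->spanned;
}

/* True if the two half-open PFN ranges share at least one PFN. */
static bool spans_intersect(const struct zone_span *a,
			    const struct zone_span *b)
{
	return a->start_pfn < span_end_pfn(b) &&
	       b->start_pfn < span_end_pfn(a);
}

/* Assumption: 4 KiB pages, i.e. 256 pages per MiB. */
static unsigned long pages_to_mib(unsigned long pages)
{
	return pages / 256;
}
```

With the quoted values (Normal: start_pfn 1114112, spanned 229376; Movable:
start_pfn 1179648, spanned 196608) the two spans intersect, which is the
interleaving the patch allows, and 65536 present pages come out as 256 MiB
under the 4 KiB page assumption.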

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-05 23:16             ` Wei Yang
@ 2017-07-06  6:56               ` Michal Hocko
  2017-07-07  8:37                 ` Wei Yang
  0 siblings, 1 reply; 48+ messages in thread
From: Michal Hocko @ 2017-07-06  6:56 UTC (permalink / raw)
  To: Wei Yang
  Cc: Linux-MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, Xishi Qiu,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 06-07-17 07:16:49, Wei Yang wrote:
> On Fri, Jun 30, 2017 at 01:01:18PM +0200, Michal Hocko wrote:
> >On Fri 30-06-17 11:55:45, Michal Hocko wrote:
> >> On Fri 30-06-17 17:39:56, Wei Yang wrote:
> >> > On Fri, Jun 30, 2017 at 4:39 PM, Michal Hocko <mhocko@kernel.org> wrote:
> >> [...]
> >> > > yes and to be honest I do not plan to fix it unless somebody has a real
> >> > > life usecase for it. Now that we allow explicit onlininig type anywhere
> >> > > it seems like a reasonable behavior and this will allow us to remove
> >> > > quite some code which is always a good deal wrt longterm maintenance.
> >> > >
> >> > 
> >> > hmm... the statistics displayed in /proc/zoneinfo would be meaningless
> >> > for zone_normal and zone_movable.
> >> 
> >> Why would they be meaningless? Counters will always reflect the actual
> >> use - if not then it is a bug. And wrt to zone description what is
> >> meaningless about
> >> memory34/valid_zones:Normal
> >> memory35/valid_zones:Normal Movable
> >> memory36/valid_zones:Movable
> >> memory37/valid_zones:Movable Normal
> >> memory38/valid_zones:Movable Normal
> >> memory39/valid_zones:Movable Normal
> >> memory40/valid_zones:Normal
> >> memory41/valid_zones:Movable
> >> 
> >> And
> >> Node 1, zone   Normal
> >>   pages free     65465
> >>         min      156
> >>         low      221
> >>         high     286
> >>         spanned  229376
> >>         present  65536
> >>         managed  65536
> >> [...]
> >>   start_pfn:           1114112
> >> Node 1, zone  Movable
> >>   pages free     65443
> >>         min      156
> >>         low      221
> >>         high     286
> >>         spanned  196608
> >>         present  65536
> >>         managed  65536
> >> [...]
> >>   start_pfn:           1179648
> >> 
> >> ranges are clearly defined as [start_pfn, start_pfn+managed] and managed
> >
> >errr, this should be [start_pfn, start_pfn + spanned] of course.
> >
> 
> The spanned is not adjusted after offline, neither does start_pfn. For example,
> even offline all the movable_zone range, we can still see the spanned.

Which is completely valid. Offline only changes present/managed.

> Below is a result with a little changed kernel to show the start_pfn always.
> The sequence is:
> 1. bootup
> 
> Node 0, zone  Movable
>         spanned  65536
> 	present  0
> 	managed  0
>   start_pfn:           0
> 
> 2. online movable 2 continuous memory_blocks
> 
> Node 0, zone  Movable
>         spanned  65536
> 	present  65536
> 	managed  65536
>   start_pfn:           1310720
> 
> 3. offline 2nd memory_blocks
> 
> Node 0, zone  Movable
>         spanned  65536
> 	present  32768
> 	managed  32768
>   start_pfn:           1310720
> 
> 4. offline 1st memory_blocks
> 
> Node 0, zone  Movable
>         spanned  65536
> 	present  0
> 	managed  0
>   start_pfn:           1310720
> 
> So I am not sure this is still clearly defined?

Could you be more specific about what is not clearly defined? You have
offlined all online memory blocks, so present/managed is 0, while
spanned is unchanged because the zone is still defined in the range
[1310720, 1376256].

I also do not see how this is related to the discussed patch, as there
is no zone interleaving involved.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread
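
The behavior described in the message above (offline changes present/managed
only, the span stays put) can be modeled with a minimal userspace sketch.
This is not the kernel implementation: struct zone_model and the model_*
helpers are hypothetical names, and the block size of 32768 pages is taken
from the drop in present between steps 2 and 3 of the quoted sequence.

```c
#include <assert.h>

/* Hypothetical userspace model of one zone's counters; not kernel code. */
struct zone_model {
	unsigned long start_pfn;
	unsigned long spanned;
	unsigned long present;
	unsigned long managed;
};

/* Offlining a block drops present/managed only; the span is left alone. */
static void model_offline_block(struct zone_model *z,
				unsigned long block_pages)
{
	assert(z->present >= block_pages && z->managed >= block_pages);
	z->present -= block_pages;
	z->managed -= block_pages;
}

/* End of the span, i.e. start_pfn + spanned. */
static unsigned long model_end_pfn(const struct zone_model *z)
{
	return z->start_pfn + z->spanned;
}

/* A zone with no online pages, whatever its span says. */
static int model_has_no_online_pages(const struct zone_model *z)
{
	return z->present == 0;
}
```

Starting from step 2 of the sequence (start_pfn 1310720, spanned 65536,
present 65536) and offlining two blocks of 32768 pages leaves present at 0
while the span still ends at 1376256, matching steps 3 and 4.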

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-06  6:56               ` Michal Hocko
@ 2017-07-07  8:37                 ` Wei Yang
  2017-07-07 12:41                   ` Michal Hocko
  0 siblings, 1 reply; 48+ messages in thread
From: Wei Yang @ 2017-07-07  8:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, Linux-MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, Xishi Qiu,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu, Jul 06, 2017 at 08:56:50AM +0200, Michal Hocko wrote:
>> Below is a result with a little changed kernel to show the start_pfn always.
>> The sequence is:
>> 1. bootup
>> 
>> Node 0, zone  Movable
>>         spanned  65536
>> 	present  0
>> 	managed  0
>>   start_pfn:           0
>> 
>> 2. online movable 2 continuous memory_blocks
>> 
>> Node 0, zone  Movable
>>         spanned  65536
>> 	present  65536
>> 	managed  65536
>>   start_pfn:           1310720
>> 
>> 3. offline 2nd memory_blocks
>> 
>> Node 0, zone  Movable
>>         spanned  65536
>> 	present  32768
>> 	managed  32768
>>   start_pfn:           1310720
>> 
>> 4. offline 1st memory_blocks
>> 
>> Node 0, zone  Movable
>>         spanned  65536
>> 	present  0
>> 	managed  0
>>   start_pfn:           1310720
>> 
>> So I am not sure this is still clearly defined?
>
>Could you be more specific what is not clearly defined? You have
>offlined all online memory blocks so present/managed is 0 while the
>spanned is unchanged because the zone is still defined in range
>[1310720, 1376256].
>

The zone is empty after removing these two memory blocks, yet we still treat
it as defined in the range [1310720, 1376256]. This is what I want to point out.

>I also do not see how this is related with the discussed patch as there
>is no zone interleaving involved.

I had a patch which fixes this behavior, so that the zone really is empty
after these two memory blocks are removed. Based on what you mentioned in the
reply, http://www.spinics.net/lists/linux-mm/msg130230.html, I thought you
would have this fixed in this cycle. But it looks like we will still have this
behavior in this cycle, and there is no intention to fix it?

>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-07  8:37                 ` Wei Yang
@ 2017-07-07 12:41                   ` Michal Hocko
  0 siblings, 0 replies; 48+ messages in thread
From: Michal Hocko @ 2017-07-07 12:41 UTC (permalink / raw)
  To: Wei Yang
  Cc: Linux-MM, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, Xishi Qiu,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 07-07-17 16:37:23, Wei Yang wrote:
> On Thu, Jul 06, 2017 at 08:56:50AM +0200, Michal Hocko wrote:
> >> Below is a result with a little changed kernel to show the start_pfn always.
> >> The sequence is:
> >> 1. bootup
> >> 
> >> Node 0, zone  Movable
> >>         spanned  65536
> >> 	present  0
> >> 	managed  0
> >>   start_pfn:           0
> >> 
> >> 2. online movable 2 continuous memory_blocks
> >> 
> >> Node 0, zone  Movable
> >>         spanned  65536
> >> 	present  65536
> >> 	managed  65536
> >>   start_pfn:           1310720
> >> 
> >> 3. offline 2nd memory_blocks
> >> 
> >> Node 0, zone  Movable
> >>         spanned  65536
> >> 	present  32768
> >> 	managed  32768
> >>   start_pfn:           1310720
> >> 
> >> 4. offline 1st memory_blocks
> >> 
> >> Node 0, zone  Movable
> >>         spanned  65536
> >> 	present  0
> >> 	managed  0
> >>   start_pfn:           1310720
> >> 
> >> So I am not sure this is still clearly defined?
> >
> >Could you be more specific what is not clearly defined? You have
> >offlined all online memory blocks so present/managed is 0 while the
> >spanned is unchanged because the zone is still defined in range
> >[1310720, 1376256].
> >
> 
> The zone is empty after remove these two memory blocks, while we still think
> it is defined in range [1310720, 1376256].

Yes, and present/managed shows that the zone is empty. Its span covers
some range but there are no online pages.

> This is what I want to point.

As I've said several times already, this is something that _could_ be
fixed, but I would rather not do so until there is a _real_ usecase
which would depend on it, especially when we can online any memory block
to the zone you like. We should really strive to reduce the amount of
code rather than keep it just in case without anybody actually using it.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread
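
For comparison, the span adjustment that the exchange above argues about
could look roughly like the userspace sketch below. It is only an
illustration of the idea of shrinking a span to the blocks that are still
online, not the kernel's code and not the patch referred to in the thread;
struct pfn_span and shrink_span() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a zone span; not a kernel structure. */
struct pfn_span {
	unsigned long start_pfn;
	unsigned long spanned;
};

/*
 * Recompute the smallest span covering every online block.
 * online[i] describes the block starting at base_pfn + i * block_pages.
 * An all-offline range yields an empty span (start_pfn 0, spanned 0).
 */
static struct pfn_span shrink_span(const bool *online, size_t nr_blocks,
				   unsigned long base_pfn,
				   unsigned long block_pages)
{
	struct pfn_span s = { 0, 0 };
	size_t first = nr_blocks, last = 0;
	size_t i;

	assert(block_pages > 0);
	for (i = 0; i < nr_blocks; i++) {
		if (!online[i])
			continue;
		if (first == nr_blocks)
			first = i;
		last = i;
	}
	if (first == nr_blocks)
		return s;	/* nothing online: empty span */

	s.start_pfn = base_pfn + first * block_pages;
	s.spanned = (last - first + 1) * block_pages;
	return s;
}
```

Applied to the two blocks from the quoted sequence (base PFN 1310720, 32768
pages each), offlining the second block would shrink the span to 32768 pages
and offlining both would leave an empty span. The cost is a rescan on every
offline, which is the kind of extra code the reply above prefers not to keep
without a user.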

* Re: [PATCH 1/2] mm, memory_hotplug: display allowed zones in the preferred ordering
  2017-06-29  7:35 ` [PATCH 1/2] mm, memory_hotplug: display allowed zones in the preferred ordering Michal Hocko
  2017-06-30  0:45   ` Joonsoo Kim
@ 2017-07-07 14:34   ` Vlastimil Babka
  1 sibling, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-07-07 14:34 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML, Michal Hocko

On 06/29/2017 09:35 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Prior to "mm, memory_hotplug: do not associate hotadded memory to zones
> until online" we used to allow to change the valid zone types of a
> memory block if it is adjacent to a different zone type. This fact was
> reflected in memoryNN/valid_zones by the ordering of printed zones.
> The first one was default (echo online > memoryNN/state) and the other
> one could be onlined explicitly by online_{movable,kernel}. This
> behavior was removed by the said patch and as such the ordering was
> not all that important. In most cases a kernel zone would be default
> anyway. The only exception is movable_node handled by "mm,
> memory_hotplug: support movable_node for hotpluggable nodes".
> 
> Let's reintroduce this behavior again because later patch will remove
> the zone overlap restriction and so user will be allowed to online
> kernel resp. movable block regardless of its placement. Original
> behavior will then become significant again because it would be
> non-trivial for users to see what is the default zone to online into.
> 
> Implementation is really simple. Pull out zone selection out of
> move_pfn_range into zone_for_pfn_range helper and use it in
> show_valid_zones to display the zone for default onlining and then
> both kernel and movable if they are allowed. Default online zone is not
> duplicated.

Hm, I wouldn't call this maze of functions simple, but it seems to be correct.
Maybe patch 2/2 will simplify the code...

> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> 
> fold me "mm, memory_hotplug: display allowed zones in the preferred ordering"
> ---
>  drivers/base/memory.c          | 33 +++++++++++++------
>  include/linux/memory_hotplug.h |  2 +-
>  mm/memory_hotplug.c            | 73 ++++++++++++++++++++++++------------------
>  3 files changed, 65 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index c7c4e0325cdb..26383af9900c 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -388,6 +388,22 @@ static ssize_t show_phys_device(struct device *dev,
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> +static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
> +		unsigned long nr_pages, int online_type,
> +		struct zone *default_zone)
> +{
> +	struct zone *zone;
> +
> +	if (!allow_online_pfn_range(nid, start_pfn, nr_pages, online_type))
> +		return;
> +
> +	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
> +	if (zone != default_zone) {
> +		strcat(buf, " ");
> +		strcat(buf, zone->name);
> +	}
> +}
> +
>  static ssize_t show_valid_zones(struct device *dev,
>  				struct device_attribute *attr, char *buf)
>  {
> @@ -395,7 +411,7 @@ static ssize_t show_valid_zones(struct device *dev,
>  	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
>  	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
>  	unsigned long valid_start_pfn, valid_end_pfn;
> -	bool append = false;
> +	struct zone *default_zone;
>  	int nid;
>  
>  	/*
> @@ -418,16 +434,13 @@ static ssize_t show_valid_zones(struct device *dev,
>  	}
>  
>  	nid = pfn_to_nid(start_pfn);
> -	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
> -		strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
> -		append = true;
> -	}
> +	default_zone = zone_for_pfn_range(MMOP_ONLINE_KEEP, nid, start_pfn, nr_pages);
> +	strcat(buf, default_zone->name);
>  
> -	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_MOVABLE)) {
> -		if (append)
> -			strcat(buf, " ");
> -		strcat(buf, NODE_DATA(nid)->node_zones[ZONE_MOVABLE].name);
> -	}
> +	print_allowed_zone(buf, nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL,
> +			default_zone);
> +	print_allowed_zone(buf, nid, start_pfn, nr_pages, MMOP_ONLINE_MOVABLE,
> +			default_zone);
>  out:
>  	strcat(buf, "\n");
>  
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index c8a5056a5ae0..5e6e4cc36ff4 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -319,6 +319,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
>  					  unsigned long pnum);
>  extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
>  		int online_type);
> -extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
> +extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
>  		unsigned long nr_pages);
>  #endif /* __LINUX_MEMORY_HOTPLUG_H */
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b4015a39d108..6b9a60115e37 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -836,31 +836,6 @@ static void node_states_set_node(int node, struct memory_notify *arg)
>  	node_set_state(node, N_MEMORY);
>  }
>  
> -bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type)
> -{
> -	struct pglist_data *pgdat = NODE_DATA(nid);
> -	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
> -	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
> -
> -	/*
> -	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
> -	 * physically before ZONE_MOVABLE. All we need is they do not
> -	 * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE
> -	 * though so let's stick with it for simplicity for now.
> -	 * TODO make sure we do not overlap with ZONE_DEVICE
> -	 */
> -	if (online_type == MMOP_ONLINE_KERNEL) {
> -		if (zone_is_empty(movable_zone))
> -			return true;
> -		return movable_zone->zone_start_pfn >= pfn + nr_pages;
> -	} else if (online_type == MMOP_ONLINE_MOVABLE) {
> -		return zone_end_pfn(default_zone) <= pfn;
> -	}
> -
> -	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
> -	return online_type == MMOP_ONLINE_KEEP;
> -}
> -
>  static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
>  		unsigned long nr_pages)
>  {
> @@ -919,7 +894,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
>   * If no kernel zone covers this pfn range it will automatically go
>   * to the ZONE_NORMAL.
>   */
> -struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> +static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>  		unsigned long nr_pages)
>  {
>  	struct pglist_data *pgdat = NODE_DATA(nid);
> @@ -935,6 +910,31 @@ struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>  	return &pgdat->node_zones[ZONE_NORMAL];
>  }
>  
> +bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type)
> +{
> +	struct pglist_data *pgdat = NODE_DATA(nid);
> +	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
> +	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
> +
> +	/*
> +	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
> +	 * physically before ZONE_MOVABLE. All we need is they do not
> +	 * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE
> +	 * though so let's stick with it for simplicity for now.
> +	 * TODO make sure we do not overlap with ZONE_DEVICE
> +	 */
> +	if (online_type == MMOP_ONLINE_KERNEL) {
> +		if (zone_is_empty(movable_zone))
> +			return true;
> +		return movable_zone->zone_start_pfn >= pfn + nr_pages;
> +	} else if (online_type == MMOP_ONLINE_MOVABLE) {
> +		return zone_end_pfn(default_zone) <= pfn;
> +	}
> +
> +	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
> +	return online_type == MMOP_ONLINE_KEEP;
> +}
> +
>  static inline bool movable_pfn_range(int nid, struct zone *default_zone,
>  		unsigned long start_pfn, unsigned long nr_pages)
>  {
> @@ -948,12 +948,8 @@ static inline bool movable_pfn_range(int nid, struct zone *default_zone,
>  	return !zone_intersects(default_zone, start_pfn, nr_pages);
>  }
>  
> -/*
> - * Associates the given pfn range with the given node and the zone appropriate
> - * for the given online type.
> - */
> -static struct zone * __meminit move_pfn_range(int online_type, int nid,
> -		unsigned long start_pfn, unsigned long nr_pages)
> +struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
> +		unsigned long nr_pages)
>  {
>  	struct pglist_data *pgdat = NODE_DATA(nid);
>  	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
> @@ -972,6 +968,19 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
>  		zone = &pgdat->node_zones[ZONE_MOVABLE];
>  	}
>  
> +	return zone;
> +}
> +
> +/*
> + * Associates the given pfn range with the given node and the zone appropriate
> + * for the given online type.
> + */
> +static struct zone * __meminit move_pfn_range(int online_type, int nid,
> +		unsigned long start_pfn, unsigned long nr_pages)
> +{
> +	struct zone *zone;
> +
> +	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
>  	move_pfn_range_to_zone(zone, start_pfn, nr_pages);
>  	return zone;
>  }
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread
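
The ordering rule from the changelog quoted above (default zone first, then
the kernel and movable zones if they are allowed, without repeating the
default) can be tried out in userspace with the sketch below. It is a
simplified stand-in for show_valid_zones() and print_allowed_zone(), not the
patch itself: zones are compared by name rather than by pointer, the kernel
zone is assumed to be "Normal", and format_valid_zones() and append_zone()
are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append " <zone>" unless it is not allowed or equals the default zone. */
static void append_zone(char *buf, size_t len, bool allowed,
			const char *zone, const char *default_zone)
{
	size_t used = strlen(buf);

	if (!allowed || strcmp(zone, default_zone) == 0)
		return;
	snprintf(buf + used, len - used, " %s", zone);
}

/*
 * Build a valid_zones style line: the default zone first, then the other
 * allowed zones. Assumption: the kernel zone is always called "Normal".
 */
static void format_valid_zones(char *buf, size_t len,
			       const char *default_zone,
			       bool kernel_ok, bool movable_ok)
{
	size_t used;

	assert(len > 0);
	snprintf(buf, len, "%s", default_zone);
	append_zone(buf, len, kernel_ok, "Normal", default_zone);
	append_zone(buf, len, movable_ok, "Movable", default_zone);
	used = strlen(buf);
	snprintf(buf + used, len - used, "\n");
}
```

A block whose default is Movable and which may also be onlined as kernel
memory yields "Movable Normal", and the reverse case yields "Normal Movable",
so the first name always tells the user where a plain "echo online" will put
the block.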

> +struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
> +		unsigned long nr_pages)
>  {
>  	struct pglist_data *pgdat = NODE_DATA(nid);
>  	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
> @@ -972,6 +968,19 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
>  		zone = &pgdat->node_zones[ZONE_MOVABLE];
>  	}
>  
> +	return zone;
> +}
> +
> +/*
> + * Associates the given pfn range with the given node and the zone appropriate
> + * for the given online type.
> + */
> +static struct zone * __meminit move_pfn_range(int online_type, int nid,
> +		unsigned long start_pfn, unsigned long nr_pages)
> +{
> +	struct zone *zone;
> +
> +	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
>  	move_pfn_range_to_zone(zone, start_pfn, nr_pages);
>  	return zone;
>  }
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-29  7:35 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
  2017-06-30  1:16   ` Joonsoo Kim
  2017-06-30  3:09   ` Wei Yang
@ 2017-07-07 15:02   ` Vlastimil Babka
  2017-07-10  6:45     ` Michal Hocko
  2017-07-12 12:49   ` Michal Hocko
  3 siblings, 1 reply; 48+ messages in thread
From: Vlastimil Babka @ 2017-07-07 15:02 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML, Michal Hocko, Linux API

[+CC linux-api]

On 06/29/2017 09:35 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
> to precede the Movable zone in the physical memory range. The purpose of
> the movable zone is, however, not bound to any physical memory restriction.
> It merely defines a class of migrateable and reclaimable memory.
> 
> There are users (e.g. CMA) who might want to reserve specific physical
> memory ranges for their own purpose. Moreover our pfn walkers have to be
> prepared for zones overlapping in the physical range already because we
> do support interleaving NUMA nodes and therefore zones can interleave as
> well. This means we can allow each memory block to be associated with a
> different zone.
> 
> Loosen the current onlining semantic and allow explicit onlining type on
> any memblock. That means that online_{kernel,movable} will be allowed
> regardless of the physical address of the memblock as long as it is
> offline of course. This might result in movable zone overlapping with
> other kernel zones. Default onlining then becomes a bit tricky but still
> sensible. echo online > memoryXY/state will online the given block to
> 	1) the default zone if the given range is outside of any zone
> 	2) the enclosing zone if such a zone doesn't interleave with
> 	   any other zone
>         3) the default zone if more zones interleave for this range
> where default zone is movable zone only if movable_node is enabled
> otherwise it is a kernel zone.
> 
> Here is an example of the semantic (movable_node is not present but
> it works in an analogous way). We start with the following memblocks, all of
> them offline
> memory34/valid_zones:Normal Movable
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Normal Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable
> 
> Now, we online block 34 in default mode and block 37 as movable
> root@test1:/sys/devices/system/node/node1# echo online > memory34/state
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable

Hm so previously, blocks 37-41 would only allow Movable at this point, right?
Shouldn't we still default to Movable for them? We might be breaking some
existing userspace here.
IMHO onlining new memory past existing blocks is a more common use case than
onlining memory between two blocks that are already online?

I also agree with Wei Yang that it's rather fuzzy that a zone that has been
completely offlined will affect the defaults for the next onlining just because
it has some spanned range, which is however empty of actual populated memory.

Maybe it would be simplest for everyone to just default to Normal, except
movable_node? That's if we decide that the potential breakage I described above
is a non-issue.
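
For anyone who wants to replay the example above outside the kernel, here is a
rough user-space sketch of the patched default_zone_for_pfn(). It is only a
model under stated assumptions: struct zone_span, span_intersects(),
span_extend() and default_is_movable() are hypothetical names, a single kernel
zone (Normal) stands in for whatever default_kernel_zone_for_pfn() would
return, and every memory block is treated as one pfn.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct zone: only the spanned pfn range. */
struct zone_span {
	unsigned long start_pfn;
	unsigned long spanned_pages;	/* 0: the zone spans nothing */
};

/* Same range test as the kernel's zone_intersects(), on the simplified type. */
static bool span_intersects(const struct zone_span *z,
			    unsigned long start_pfn, unsigned long nr_pages)
{
	assert(nr_pages > 0);
	if (z->spanned_pages == 0)
		return false;
	return start_pfn < z->start_pfn + z->spanned_pages &&
	       z->start_pfn < start_pfn + nr_pages;
}

/* Grow a span so that it also covers [start_pfn, start_pfn + nr_pages). */
static void span_extend(struct zone_span *z, unsigned long start_pfn,
			unsigned long nr_pages)
{
	unsigned long end = start_pfn + nr_pages;
	unsigned long old_end = z->start_pfn + z->spanned_pages;

	if (z->spanned_pages == 0) {
		z->start_pfn = start_pfn;
		z->spanned_pages = nr_pages;
		return;
	}
	if (start_pfn < z->start_pfn)
		z->start_pfn = start_pfn;
	if (end < old_end)
		end = old_end;
	z->spanned_pages = end - z->start_pfn;
}

/*
 * Models the patched default_zone_for_pfn(): returns true when the default
 * online type would pick the Movable zone, false for the kernel zone.
 */
static bool default_is_movable(const struct zone_span *kernel,
			       const struct zone_span *movable,
			       bool movable_node,
			       unsigned long start_pfn, unsigned long nr_pages)
{
	bool in_kernel = span_intersects(kernel, start_pfn, nr_pages);
	bool in_movable = span_intersects(movable, start_pfn, nr_pages);

	/* Exactly one zone spans the range: inherit it. */
	if (in_kernel != in_movable)
		return in_movable;

	/* No zone or both zones span the range: fall back to the default. */
	return movable_node;
}
```

Extending Normal over block 34 and Movable over block 37 and then asking for
block 38 with movable_node disabled gives the kernel zone, which is the state
quoted above. Extending Movable over block 41 flips the answer for block 38 to
Movable.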

> As we can see all other blocks can still be onlined both into Normal and
> Movable zones and the Normal is default because the Movable zone spans
> only block37 now.
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Movable Normal
> memory39/valid_zones:Movable Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
> 
> Now the default zone for blocks 37-41 has changed because movable zone
> spans that range.
> root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
> 
> Note that the block 39 now belongs to the zone Normal and so block38
> falls into Normal by default as well.
> 
> For completeness
> root@test1:/sys/devices/system/node/node1# for i in memory[34]?
> do
> 	echo online > $i/state 2>/dev/null
> done
> 
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal
> memory36/valid_zones:Normal
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable
> memory41/valid_zones:Movable
> 
> Implementation wise the change is quite straightforward. We can get rid
> of allow_online_pfn_range altogether. online_pages allows only offline
> nodes already. The original default_zone_for_pfn will become
> default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
> above semantic. zone_for_pfn_range is slightly reorganized to implement
> kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes
> a catch all default behavior.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  drivers/base/memory.c |  3 ---
>  mm/memory_hotplug.c   | 74 ++++++++++++++++-----------------------------------
>  2 files changed, 23 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 26383af9900c..4e3b61cda520 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -394,9 +394,6 @@ static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
>  {
>  	struct zone *zone;
>  
> -	if (!allow_online_pfn_range(nid, start_pfn, nr_pages, online_type))
> -		return;
> -
>  	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
>  	if (zone != default_zone) {
>  		strcat(buf, " ");
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6b9a60115e37..670f7acbecf4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -894,7 +894,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
>   * If no kernel zone covers this pfn range it will automatically go
>   * to the ZONE_NORMAL.
>   */
> -static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> +static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
>  		unsigned long nr_pages)
>  {
>  	struct pglist_data *pgdat = NODE_DATA(nid);
> @@ -910,65 +910,40 @@ static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>  	return &pgdat->node_zones[ZONE_NORMAL];
>  }
>  
> -bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type)
> +static inline struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> +		unsigned long nr_pages)
>  {
> -	struct pglist_data *pgdat = NODE_DATA(nid);
> -	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
> -	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
> +	struct zone *kernel_zone = default_kernel_zone_for_pfn(nid, start_pfn,
> +			nr_pages);
> +	struct zone *movable_zone = &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
> +	bool in_kernel = zone_intersects(kernel_zone, start_pfn, nr_pages);
> +	bool in_movable = zone_intersects(movable_zone, start_pfn, nr_pages);
>  
>  	/*
> -	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
> -	 * physically before ZONE_MOVABLE. All we need is they do not
> -	 * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE
> -	 * though so let's stick with it for simplicity for now.
> -	 * TODO make sure we do not overlap with ZONE_DEVICE
> +	 * We inherit the existing zone in a simple case where zones do not
> +	 * overlap in the given range
>  	 */
> -	if (online_type == MMOP_ONLINE_KERNEL) {
> -		if (zone_is_empty(movable_zone))
> -			return true;
> -		return movable_zone->zone_start_pfn >= pfn + nr_pages;
> -	} else if (online_type == MMOP_ONLINE_MOVABLE) {
> -		return zone_end_pfn(default_zone) <= pfn;
> -	}
> -
> -	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
> -	return online_type == MMOP_ONLINE_KEEP;
> -}
> -
> -static inline bool movable_pfn_range(int nid, struct zone *default_zone,
> -		unsigned long start_pfn, unsigned long nr_pages)
> -{
> -	if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
> -				MMOP_ONLINE_KERNEL))
> -		return true;
> -
> -	if (!movable_node_is_enabled())
> -		return false;
> +	if (in_kernel ^ in_movable)
> +		return (in_kernel) ? kernel_zone : movable_zone;
>  
> -	return !zone_intersects(default_zone, start_pfn, nr_pages);
> +	/*
> +	 * If the range doesn't belong to any zone or two zones overlap in the
> +	 * given range then we use movable zone only if movable_node is
> +	 * enabled because we always online to a kernel zone by default.
> +	 */
> +	return movable_node_enabled ? movable_zone : kernel_zone;
>  }
>  
>  struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
>  		unsigned long nr_pages)
>  {
> -	struct pglist_data *pgdat = NODE_DATA(nid);
> -	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
> +	if (online_type == MMOP_ONLINE_KERNEL)
> +		return default_kernel_zone_for_pfn(nid, start_pfn, nr_pages);
>  
> -	if (online_type == MMOP_ONLINE_KEEP) {
> -		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
> -		/*
> -		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
> -		 * movable zone if that is not possible (e.g. we are within
> -		 * or past the existing movable zone). movable_node overrides
> -		 * this default and defaults to movable zone
> -		 */
> -		if (movable_pfn_range(nid, zone, start_pfn, nr_pages))
> -			zone = movable_zone;
> -	} else if (online_type == MMOP_ONLINE_MOVABLE) {
> -		zone = &pgdat->node_zones[ZONE_MOVABLE];
> -	}
> +	if (online_type == MMOP_ONLINE_MOVABLE)
> +		return &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
>  
> -	return zone;
> +	return default_zone_for_pfn(nid, start_pfn, nr_pages);
>  }
>  
>  /*
> @@ -997,9 +972,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>  	struct memory_notify arg;
>  
>  	nid = pfn_to_nid(pfn);
> -	if (!allow_online_pfn_range(nid, pfn, nr_pages, online_type))
> -		return -EINVAL;
> -
>  	/* associate pfn range with the zone */
>  	zone = move_pfn_range(online_type, nid, pfn, nr_pages);
>  
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-07 15:02   ` Vlastimil Babka
@ 2017-07-10  6:45     ` Michal Hocko
  2017-07-10 11:11       ` Vlastimil Babka
  0 siblings, 1 reply; 48+ messages in thread
From: Michal Hocko @ 2017-07-10  6:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, Mel Gorman, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML, Linux API

On Fri 07-07-17 17:02:59, Vlastimil Babka wrote:
> [+CC linux-api]
> 
> On 06/29/2017 09:35 AM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
> > to precede the Movable zone in the physical memory range. The purpose of
> > the movable zone is, however, not bound to any physical memory restriction.
> > It merely defines a class of migrateable and reclaimable memory.
> > 
> > There are users (e.g. CMA) who might want to reserve specific physical
> > memory ranges for their own purpose. Moreover our pfn walkers have to be
> > prepared for zones overlapping in the physical range already because we
> > do support interleaving NUMA nodes and therefore zones can interleave as
> > well. This means we can allow each memory block to be associated with a
> > different zone.
> > 
> > Loosen the current onlining semantic and allow explicit onlining type on
> > any memblock. That means that online_{kernel,movable} will be allowed
> > regardless of the physical address of the memblock as long as it is
> > offline of course. This might result in movable zone overlapping with
> > other kernel zones. Default onlining then becomes a bit tricky but still
> > sensible. echo online > memoryXY/state will online the given block to
> > 	1) the default zone if the given range is outside of any zone
> > 	2) the enclosing zone if such a zone doesn't interleave with
> > 	   any other zone
> >         3) the default zone if more zones interleave for this range
> > where default zone is movable zone only if movable_node is enabled
> > otherwise it is a kernel zone.
> > 
> > Here is an example of the semantic (movable_node is not present but
> > it works in an analogous way). We start with the following memblocks, all of
> > them offline
> > memory34/valid_zones:Normal Movable
> > memory35/valid_zones:Normal Movable
> > memory36/valid_zones:Normal Movable
> > memory37/valid_zones:Normal Movable
> > memory38/valid_zones:Normal Movable
> > memory39/valid_zones:Normal Movable
> > memory40/valid_zones:Normal Movable
> > memory41/valid_zones:Normal Movable
> > 
> > Now, we online block 34 in default mode and block 37 as movable
> > root@test1:/sys/devices/system/node/node1# echo online > memory34/state
> > root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
> > memory34/valid_zones:Normal
> > memory35/valid_zones:Normal Movable
> > memory36/valid_zones:Normal Movable
> > memory37/valid_zones:Movable
> > memory38/valid_zones:Normal Movable
> > memory39/valid_zones:Normal Movable
> > memory40/valid_zones:Normal Movable
> > memory41/valid_zones:Normal Movable
> 
> Hm so previously, blocks 37-41 would only allow Movable at this point, right?

yes
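
To make that concrete, the check removed by this patch can be modelled in user
space roughly as follows. This is a sketch, not the kernel code: struct
zone_span, enum online_type and old_allow_online() are hypothetical names, the
default kernel zone is passed in directly, and one memory block is treated as
one pfn.

```c
#include <stdbool.h>

/* Hypothetical counterparts of MMOP_ONLINE_{KEEP,KERNEL,MOVABLE}. */
enum online_type { ONLINE_KEEP, ONLINE_KERNEL, ONLINE_MOVABLE };

/* Hypothetical stand-in for struct zone: only the spanned pfn range. */
struct zone_span {
	unsigned long start_pfn;
	unsigned long spanned_pages;	/* 0: the zone spans nothing */
};

static unsigned long span_end(const struct zone_span *z)
{
	return z->start_pfn + z->spanned_pages;
}

/* Models the removed allow_online_pfn_range() on the simplified types. */
static bool old_allow_online(const struct zone_span *default_zone,
			     const struct zone_span *movable,
			     unsigned long pfn, unsigned long nr_pages,
			     enum online_type type)
{
	switch (type) {
	case ONLINE_KERNEL:
		/* A kernel zone had to end before the Movable zone starts. */
		return movable->spanned_pages == 0 ||
		       movable->start_pfn >= pfn + nr_pages;
	case ONLINE_MOVABLE:
		/* Movable had to start past the end of the kernel zone. */
		return span_end(default_zone) <= pfn;
	case ONLINE_KEEP:
		/* The default online type always succeeded. */
		return true;
	}
	return false;
}
```

With Normal spanning block 34 and Movable spanning block 37, block 38 is
refused for ONLINE_KERNEL and accepted for ONLINE_MOVABLE, while block 36
passes both checks.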

> Shouldn't we still default to Movable for them? We might be breaking some
> existing userspace here.

I do not think so. Prior to this merge window f1dd2cd13c4b ("mm,
memory_hotplug: do not associate hotadded memory to zones until online")
we allowed only the last offline memory block, or one adjacent to an
existing movable block, to be onlined movable. So the above wasn't possible. I
doubt we have grown a new user since the rework has been merged but if
you think we should make sure nothing like that happens then we should
probably merge this patch in this release cycle.

> IMHO onlining new memory past existing blocks is more common use case than
> onlining memory between two blocks that are already online?

I am not really sure. It is quite common to online and offline within
existing zones for memory ballooning. I do not know what kind of
online operation they use but using the default online operation has
historically preserved the zone so I would be really reluctant to change
that.

> I also agree with Wei Yang that it's rather fuzzy that a zone that has been
> completely offlined will affect the defaults for the next onlining just because
> it has some spanned range, which is however empty of actual populated memory.

I am sorry but I still do not see why. The zone is not empty. It has a
range spanned. It just doesn't have any pages online. I really fail to
see how that is different from zones with large offline holes.
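
A small user-space model of that distinction, as a sketch only: struct
zone_model and its helpers are hypothetical names, and the model assumes, as
described above, that offlining only drops the count of online pages and
leaves the spanned range alone.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a zone's accounting. */
struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;	/* size of the pfn range, holes included */
	unsigned long present_pages;	/* pages currently online in that range */
};

/* "Empty" is defined by the span, not by how many pages are online. */
static bool zone_model_is_empty(const struct zone_model *z)
{
	return z->spanned_pages == 0;
}

static bool zone_model_intersects(const struct zone_model *z,
				  unsigned long start_pfn,
				  unsigned long nr_pages)
{
	if (zone_model_is_empty(z))
		return false;
	return start_pfn < z->zone_start_pfn + z->spanned_pages &&
	       z->zone_start_pfn < start_pfn + nr_pages;
}

/* Assumption of this model: offlining leaves the span untouched. */
static void zone_model_offline(struct zone_model *z, unsigned long nr_pages)
{
	if (nr_pages > z->present_pages)
		nr_pages = z->present_pages;
	z->present_pages -= nr_pages;
}
```

After every page has been offlined, present_pages is zero but the zone still
intersects any block inside its span, exactly like a zone with one large
offline hole.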

> Maybe it would simplest for everyone to just default to Normal, except
> movable_node? That's if we decide that the potential breakage I
> described above is a non-issue.

This would break the usecase where memory is initially onlined as a certain
type and then offlined/onlined later on demand for ballooning.
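
The difference between the two policies for that usecase can be sketched as
below. Both functions are hypothetical illustrations that take the result of
the zone intersection tests as plain booleans; pick_always_default() is only
my reading of the suggestion quoted above.

```c
#include <stdbool.h>

enum zone_pick { PICK_KERNEL, PICK_MOVABLE };

/* Proposed semantic: inherit the single spanning zone, else the default. */
static enum zone_pick pick_proposed(bool in_kernel, bool in_movable,
				    bool movable_node)
{
	if (in_kernel != in_movable)
		return in_movable ? PICK_MOVABLE : PICK_KERNEL;
	return movable_node ? PICK_MOVABLE : PICK_KERNEL;
}

/* Suggested alternative: ignore existing spans, always use the default. */
static enum zone_pick pick_always_default(bool movable_node)
{
	return movable_node ? PICK_MOVABLE : PICK_KERNEL;
}
```

For a block that was onlined movable, ballooned out and is now onlined again
with plain "online" (only the Movable zone spans it, movable_node disabled),
pick_proposed() keeps it in Movable while pick_always_default() moves it to a
kernel zone.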

I wish this could be clearer but the default onlining has been fuzzy
since movable onlining was introduced and it has been hard to build
something really clear since then. The proposed semantic is the
cleanest I could come up with but I am open to any suggestions that
wouldn't break existing usage.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-10  6:45     ` Michal Hocko
@ 2017-07-10 11:11       ` Vlastimil Babka
  2017-07-10 11:17         ` Michal Hocko
  0 siblings, 1 reply; 48+ messages in thread
From: Vlastimil Babka @ 2017-07-10 11:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML, Linux API

On 07/10/2017 08:45 AM, Michal Hocko wrote:
> On Fri 07-07-17 17:02:59, Vlastimil Babka wrote:
>> [+CC linux-api]
>>
>> On 06/29/2017 09:35 AM, Michal Hocko wrote:
>>> From: Michal Hocko <mhocko@suse.com>
>>>
>>> Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
>>> to precede the Movable zone in the physical memory range. The purpose of
>>> the movable zone is, however, not bound to any physical memory restriction.
>>> It merely defines a class of migrateable and reclaimable memory.
>>>
>>> There are users (e.g. CMA) who might want to reserve specific physical
>>> memory ranges for their own purpose. Moreover our pfn walkers have to be
>>> prepared for zones overlapping in the physical range already because we
>>> do support interleaving NUMA nodes and therefore zones can interleave as
>>> well. This means we can allow each memory block to be associated with a
>>> different zone.
>>>
>>> Loosen the current onlining semantic and allow explicit onlining type on
>>> any memblock. That means that online_{kernel,movable} will be allowed
>>> regardless of the physical address of the memblock as long as it is
>>> offline of course. This might result in movable zone overlapping with
>>> other kernel zones. Default onlining then becomes a bit tricky but still
>>> sensible. echo online > memoryXY/state will online the given block to
>>> 	1) the default zone if the given range is outside of any zone
>>> 	2) the enclosing zone if such a zone doesn't interleave with
>>> 	   any other zone
>>>         3) the default zone if more zones interleave for this range
>>> where default zone is movable zone only if movable_node is enabled
>>> otherwise it is a kernel zone.
>>>
>>> Here is an example of the semantic (movable_node is not present but
>>> it works in an analogous way). We start with the following memblocks, all of
>>> them offline
>>> memory34/valid_zones:Normal Movable
>>> memory35/valid_zones:Normal Movable
>>> memory36/valid_zones:Normal Movable
>>> memory37/valid_zones:Normal Movable
>>> memory38/valid_zones:Normal Movable
>>> memory39/valid_zones:Normal Movable
>>> memory40/valid_zones:Normal Movable
>>> memory41/valid_zones:Normal Movable
>>>
>>> Now, we online block 34 in default mode and block 37 as movable
>>> root@test1:/sys/devices/system/node/node1# echo online > memory34/state
>>> root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
>>> memory34/valid_zones:Normal
>>> memory35/valid_zones:Normal Movable
>>> memory36/valid_zones:Normal Movable
>>> memory37/valid_zones:Movable
>>> memory38/valid_zones:Normal Movable
>>> memory39/valid_zones:Normal Movable
>>> memory40/valid_zones:Normal Movable
>>> memory41/valid_zones:Normal Movable
>>
>> Hm so previously, blocks 37-41 would only allow Movable at this point, right?
> 
> yes
> 
>> Shouldn't we still default to Movable for them? We might be breaking some
>> existing userspace here.
> 
> I do not think so. Prior to this merge window f1dd2cd13c4b ("mm,
> memory_hotplug: do not associate hotadded memory to zones until online")
> we allowed only the last offline or the adjacent to existing movable
> memory block to be onlined movable. So the above wasn't possible.

Not exactly the above, but let's say 1-34 is onlined as Normal, 35-37 is
Movable. Then the only possible action before would be online 38 as
Movable? Now it defaults to Normal?

> I
> doubt we have grown a new user since the rework has been merged but if
> you think we should make sure nothing like that happens then we should
> probably merge this patch in this release cycle.

If I'm right and this is a change compared to pre-rework, then it
doesn't matter.

>> IMHO onlining new memory past existing blocks is more common use case than
>> onlining memory between two blocks that are already online?
> 
> I am not really sure. It is quite common to online and offline within an
> existing zones for the memory ballooning. I do not know what kind of
> online operation they use but using the default online operation has
> historically preserved the zone so I would be really reluctant to change
> that.

Hmm all right, ballooning...

>> I also agree with Wei Yang that it's rather fuzzy that a zone that has been
>> completely offlined will affect the defaults for the next onlining just because
>> it has some spanned range, which is however empty of actual populated memory.
> 
> I am sorry but I still do not see why. The zone is not empty. It has a
> range spanned. It just doesn't have any pages online. I really fail to
> see how that is different from zones with large offline holes.
> 
>> Maybe it would simplest for everyone to just default to Normal, except
>> movable_node? That's if we decide that the potential breakage I
>> described above is a non-issue.
> 
> This would break the use case where the memory is onlined as a certain type
> initially and then offlined/onlined later on demand for ballooning.
> 
> I wish this could be clearer, but the default onlining has been fuzzy
> since movable onlining was introduced and it has been hard to build
> something really clear since then. The proposed semantic is the
> cleanest I could come up with but I am open to any suggestions that
> wouldn't break existing usage.

OK, I can live with the semantics, if we clear up the question of breaking
existing users.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-10 11:11       ` Vlastimil Babka
@ 2017-07-10 11:17         ` Michal Hocko
  2017-07-10 12:12           ` Vlastimil Babka
  0 siblings, 1 reply; 48+ messages in thread
From: Michal Hocko @ 2017-07-10 11:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, Mel Gorman, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML, Linux API

On Mon 10-07-17 13:11:29, Vlastimil Babka wrote:
> On 07/10/2017 08:45 AM, Michal Hocko wrote:
> > On Fri 07-07-17 17:02:59, Vlastimil Babka wrote:
> >> [+CC linux-api]
> >>
> >> On 06/29/2017 09:35 AM, Michal Hocko wrote:
> >>> From: Michal Hocko <mhocko@suse.com>
> >>>
> >>> Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
> >>> to precede the Movable zone in the physical memory range. The purpose of
> >>> the movable zone is, however, not bound to any physical memory restriction.
> >>> It merely defines a class of migrateable and reclaimable memory.
> >>>
> >>> There are users (e.g. CMA) who might want to reserve specific physical
> >>> memory ranges for their own purpose. Moreover our pfn walkers have to be
> >>> prepared for zones overlapping in the physical range already because we
> >>> do support interleaving NUMA nodes and therefore zones can interleave as
> >>> well. This means we can allow each memory block to be associated with a
> >>> different zone.
> >>>
> >>> Loosen the current onlining semantic and allow explicit onlining type on
> >>> any memblock. That means that online_{kernel,movable} will be allowed
> >>> regardless of the physical address of the memblock as long as it is
> >>> offline of course. This might result in movable zone overlapping with
> >>> other kernel zones. Default onlining then becomes a bit tricky but still
> >>> sensible. echo online > memoryXY/state will online the given block to
> >>> 	1) the default zone if the given range is outside of any zone
> >>> 	2) the enclosing zone if such a zone doesn't interleave with
> >>> 	   any other zone
> >>>         3) the default zone if more zones interleave for this range
> >>> where default zone is movable zone only if movable_node is enabled
> >>> otherwise it is a kernel zone.
> >>>
> >>> Here is an example of the semantic (movable_node is not present but
> >>> it works in an analogous way). We start with the following memblocks, all of
> >>> them offline
> >>> memory34/valid_zones:Normal Movable
> >>> memory35/valid_zones:Normal Movable
> >>> memory36/valid_zones:Normal Movable
> >>> memory37/valid_zones:Normal Movable
> >>> memory38/valid_zones:Normal Movable
> >>> memory39/valid_zones:Normal Movable
> >>> memory40/valid_zones:Normal Movable
> >>> memory41/valid_zones:Normal Movable
> >>>
> >>> Now, we online block 34 in default mode and block 37 as movable
> >>> root@test1:/sys/devices/system/node/node1# echo online > memory34/state
> >>> root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
> >>> memory34/valid_zones:Normal
> >>> memory35/valid_zones:Normal Movable
> >>> memory36/valid_zones:Normal Movable
> >>> memory37/valid_zones:Movable
> >>> memory38/valid_zones:Normal Movable
> >>> memory39/valid_zones:Normal Movable
> >>> memory40/valid_zones:Normal Movable
> >>> memory41/valid_zones:Normal Movable
> >>
> >> Hm so previously, blocks 37-41 would only allow Movable at this point, right?
> > 
> > yes
> > 
> >> Shouldn't we still default to Movable for them? We might be breaking some
> >> existing userspace here.
> > 
> > I do not think so. Prior to this merge window f1dd2cd13c4b ("mm,
> > memory_hotplug: do not associate hotadded memory to zones until online")
> > we allowed only the last offline or the adjacent to existing movable
> > memory block to be onlined movable. So the above wasn't possible.
> 
> Not exactly the above, but let's say 1-34 is onlined as Normal, 35-37 is
> Movable. Then the only possible action before would be online 38 as
> Movable? Now it defaults to Normal?

Yes. And let me repeat: you couldn't online 35-37 as movable before. So no
userspace could depend on that before the rework. Or do I still miss
your point?
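
For reference, a minimal Python sketch of the three default-onlining rules from the patch description, applied to the scenario above. It is a model only, not the kernel implementation: ZoneSpan and default_online_zone are hypothetical names, zones are tracked at memory-block granularity, and only the Normal and Movable zones are considered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZoneSpan:
    """Block range spanned by a zone (hypothetical helper, block granularity)."""
    name: str
    first: Optional[int] = None  # first spanned block, None if nothing is spanned
    last: Optional[int] = None   # last spanned block, inclusive

    def covers(self, block: int) -> bool:
        return self.first is not None and self.first <= block <= self.last


def default_online_zone(block, normal, movable, movable_node=False):
    """Zone name a plain 'echo online' would pick under the proposed rules."""
    default = movable if movable_node else normal
    in_normal = normal.covers(block)
    in_movable = movable.covers(block)
    # Rule 2: exactly one zone encloses the block, so that zone is kept.
    if in_normal != in_movable:
        return (normal if in_normal else movable).name
    # Rule 1 (no zone covers the block) and rule 3 (both zones interleave
    # over it): fall back to the default zone.
    return default.name


# Scenario from the thread: blocks 1-34 are Normal, blocks 35-37 are Movable.
normal = ZoneSpan("Normal", 1, 34)
movable = ZoneSpan("Movable", 35, 37)
print(default_online_zone(38, normal, movable))                     # block past both zones
print(default_online_zone(36, normal, movable))                     # block inside Movable only
print(default_online_zone(38, normal, movable, movable_node=True))  # same block, movable_node
```

Block 38 lies outside both spans, so rule 1 applies and the result depends only on movable_node; a block re-onlined inside a single zone's span keeps that zone, which is the ballooning case.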
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-10 11:17         ` Michal Hocko
@ 2017-07-10 12:12           ` Vlastimil Babka
  2017-07-10 12:30             ` Michal Hocko
  0 siblings, 1 reply; 48+ messages in thread
From: Vlastimil Babka @ 2017-07-10 12:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML, Linux API

On 07/10/2017 01:17 PM, Michal Hocko wrote:
> On Mon 10-07-17 13:11:29, Vlastimil Babka wrote:
>> On 07/10/2017 08:45 AM, Michal Hocko wrote:
>>> On Fri 07-07-17 17:02:59, Vlastimil Babka wrote:
>>>> [+CC linux-api]
>>>>
>>>>
>>>> Hm so previously, blocks 37-41 would only allow Movable at this point, right?
>>>
>>> yes
>>>
>>>> Shouldn't we still default to Movable for them? We might be breaking some
>>>> existing userspace here.
>>>
>>> I do not think so. Prior to this merge window f1dd2cd13c4b ("mm,
>>> memory_hotplug: do not associate hotadded memory to zones until online")
>>> we allowed only the last offline or the adjacent to existing movable
>>> memory block to be onlined movable. So the above wasn't possible.
>>
>> Not exactly the above, but let's say 1-34 is onlined as Normal, 35-37 is
>> Movable. Then the only possible action before would be online 38 as
>> Movable? Now it defaults to Normal?
> 
> Yes. And let me repeat: you couldn't online 35-37 as movable before. So no
> userspace could depend on that before the rework. Or do I still miss
> your point?

Ah, I see. "the last offline or the adjacent to existing movable". OK then.

It would indeed be better not to change behaviour twice then, and to merge
this into 4.13, but it's the middle of the merge window, so it's not simple...

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-10 12:12           ` Vlastimil Babka
@ 2017-07-10 12:30             ` Michal Hocko
  0 siblings, 0 replies; 48+ messages in thread
From: Michal Hocko @ 2017-07-10 12:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, Mel Gorman, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML, Linux API

On Mon 10-07-17 14:12:09, Vlastimil Babka wrote:
> On 07/10/2017 01:17 PM, Michal Hocko wrote:
> > On Mon 10-07-17 13:11:29, Vlastimil Babka wrote:
> >> On 07/10/2017 08:45 AM, Michal Hocko wrote:
> >>> On Fri 07-07-17 17:02:59, Vlastimil Babka wrote:
> >>>> [+CC linux-api]
> >>>>
> >>>>
> >>>> Hm so previously, blocks 37-41 would only allow Movable at this point, right?
> >>>
> >>> yes
> >>>
> >>>> Shouldn't we still default to Movable for them? We might be breaking some
> >>>> existing userspace here.
> >>>
> >>> I do not think so. Prior to this merge window f1dd2cd13c4b ("mm,
> >>> memory_hotplug: do not associate hotadded memory to zones until online")
> >>> we allowed only the last offline or the adjacent to existing movable
> >>> memory block to be onlined movable. So the above wasn't possible.
> >>
> >> Not exactly the above, but let's say 1-34 is onlined as Normal, 35-37 is
> >> Movable. Then the only possible action before would be online 38 as
> >> Movable? Now it defaults to Normal?
> > 
> > Yes. And let me repeat: you couldn't online 35-37 as movable before. So no
> > userspace could depend on that before the rework. Or do I still miss
> > your point?
> 
> Ah, I see. "the last offline or the adjacent to existing movable". OK then.
> 
> It would indeed be better not to change behaviour twice then, and to merge
> this into 4.13, but it's the middle of the merge window, so it's not simple...

Yeah. I was thinking about how to make the change reasonably
incremental but failed to find a way. I also didn't want to bring in too
many changes at once (the code base is just too fragile already).

If there is a general consensus about the semantic we might want to push
the patch this week. I just do not want to rush it too much as this is a
user-visible change and it might come back to bite us in the future.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-06-29  7:35 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
                     ` (2 preceding siblings ...)
  2017-07-07 15:02   ` Vlastimil Babka
@ 2017-07-12 12:49   ` Michal Hocko
  3 siblings, 0 replies; 48+ messages in thread
From: Michal Hocko @ 2017-07-12 12:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, LKML

Are there any other concerns regarding this patch? Can I repost it for
inclusion?

On Thu 29-06-17 09:35:09, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Historically we have enforced that any kernel zone (e.g. ZONE_NORMAL) has
> to precede the Movable zone in the physical memory range. The purpose of
> the movable zone is, however, not bound to any physical memory restriction.
> It merely defines a class of migratable and reclaimable memory.
> 
> There are users (e.g. CMA) who might want to reserve specific physical
> memory ranges for their own purpose. Moreover our pfn walkers have to be
> prepared for zones overlapping in the physical range already because we
> do support interleaving NUMA nodes and therefore zones can interleave as
> well. This means we can allow each memory block to be associated with a
> different zone.
> 
> Loosen the current onlining semantic and allow explicit onlining type on
> any memblock. That means that online_{kernel,movable} will be allowed
> regardless of the physical address of the memblock as long as it is
> offline of course. This might result in the movable zone overlapping with
> other kernel zones. Default onlining then becomes a bit tricky but still
> sensible. echo online > memoryXY/state will online the given block to
> 	1) the default zone if the given range is outside of any zone
> 	2) the enclosing zone if such a zone doesn't interleave with
> 	   any other zone
> 	3) the default zone if more zones interleave for this range
> where the default zone is the movable zone only if movable_node is enabled,
> otherwise it is a kernel zone.
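
A minimal sketch of the three rules above, written as a userspace model
rather than kernel code. The names zone_kind, pick_default_zone and
check_default_rules are made up for illustration; in_kernel and in_movable
stand for "the kernel zone already spans this block" and "the movable zone
already spans this block".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two candidate zones of a node. */
enum zone_kind { ZONE_KIND_KERNEL, ZONE_KIND_MOVABLE };

/*
 * Model of the default ("echo online") choice:
 *  - exactly one zone already spans the block -> inherit that zone
 *  - no zone, or both zones, span the block   -> use the default zone,
 *    which is Movable only when movable_node is enabled
 */
static enum zone_kind pick_default_zone(bool in_kernel, bool in_movable,
					bool movable_node)
{
	if (in_kernel != in_movable)
		return in_kernel ? ZONE_KIND_KERNEL : ZONE_KIND_MOVABLE;

	return movable_node ? ZONE_KIND_MOVABLE : ZONE_KIND_KERNEL;
}

/* Spot-check the three rules with movable_node disabled. */
static void check_default_rules(void)
{
	/* 1) outside of any zone -> default (kernel) zone */
	assert(pick_default_zone(false, false, false) == ZONE_KIND_KERNEL);
	/* 2) enclosed by exactly one zone -> that zone */
	assert(pick_default_zone(true, false, false) == ZONE_KIND_KERNEL);
	assert(pick_default_zone(false, true, false) == ZONE_KIND_MOVABLE);
	/* 3) zones interleave over the block -> default (kernel) zone */
	assert(pick_default_zone(true, true, false) == ZONE_KIND_KERNEL);
}
```

With movable_node enabled only cases 1) and 3) change: both then resolve
to the movable zone, while case 2) still inherits the enclosing zone.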
> 
> Here is an example of the semantic (movable_node is not enabled here, but
> it works in an analogous way). We start with the following memblocks, all
> of them offline
> memory34/valid_zones:Normal Movable
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Normal Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable
> 
> Now, we online block 34 in default mode and block 37 as movable
> root@test1:/sys/devices/system/node/node1# echo online > memory34/state
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable
> 
> As we can see all other blocks can still be onlined both into Normal and
> Movable zones and the Normal is default because the Movable zone spans
> only block37 now.
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Movable Normal
> memory39/valid_zones:Movable Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
> 
> Now the default zone for blocks 37-41 has changed because movable zone
> spans that range.
> root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
> 
> Note that the block 39 now belongs to the zone Normal and so block38
> falls into Normal by default as well.
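
The two transitions above follow from how a zone's span grows: it is the
hull of the blocks onlined into it, so offline blocks in between are
spanned as well. A rough sketch under that assumption, using block numbers
instead of pfns and with movable_node disabled. struct span and the helper
names are hypothetical and are not the kernel's struct zone API.

```c
#include <assert.h>
#include <stdbool.h>

/* Zone span as a half-open block range [start, end); start == end is empty. */
struct span {
	unsigned long start, end;
};

static bool span_empty(const struct span *s)
{
	return s->start == s->end;
}

/* Grow the span to cover block; offline blocks in between become spanned. */
static void span_add_block(struct span *s, unsigned long block)
{
	if (span_empty(s)) {
		s->start = block;
		s->end = block + 1;
		return;
	}
	if (block < s->start)
		s->start = block;
	if (block + 1 > s->end)
		s->end = block + 1;
}

static bool span_has_block(const struct span *s, unsigned long block)
{
	return !span_empty(s) && block >= s->start && block < s->end;
}

/* With movable_node disabled, Movable is the default only if it alone spans. */
static bool default_is_movable(const struct span *normal,
			       const struct span *movable, unsigned long block)
{
	return span_has_block(movable, block) && !span_has_block(normal, block);
}

/* Replay the onlining steps from the example and check block 38 and 40. */
static void replay_example(void)
{
	struct span normal = { 0, 0 }, movable = { 0, 0 };

	span_add_block(&normal, 34);	/* echo online > memory34/state */
	span_add_block(&movable, 37);	/* echo online_movable > memory37/state */
	assert(!default_is_movable(&normal, &movable, 38));

	span_add_block(&movable, 41);	/* echo online_movable > memory41/state */
	assert(default_is_movable(&normal, &movable, 38));

	span_add_block(&normal, 39);	/* echo online_kernel > memory39/state */
	assert(!default_is_movable(&normal, &movable, 38)); /* both zones span 38 */
	assert(default_is_movable(&normal, &movable, 40));  /* only Movable spans 40 */
}
```

This matches the valid_zones listings: block 38 shows "Movable Normal"
after block 41 goes movable and flips back to "Normal Movable" once block
39 is onlined as kernel, while block 40 keeps Movable as its default.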
> 
> For completeness
> root@test1:/sys/devices/system/node/node1# for i in memory[34]?
> do
> 	echo online > $i/state 2>/dev/null
> done
> 
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal
> memory36/valid_zones:Normal
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable
> memory41/valid_zones:Movable
> 
> Implementation-wise the change is quite straightforward. We can get rid
> of allow_online_pfn_range altogether. online_pages allows only offline
> nodes already. The original default_zone_for_pfn will become
> default_kernel_zone_for_pfn. The new default_zone_for_pfn implements the
> above semantic. zone_for_pfn_range is slightly reorganized to implement
> the kernel and movable online types explicitly and MMOP_ONLINE_KEEP becomes
> a catch-all default behavior.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  drivers/base/memory.c |  3 ---
>  mm/memory_hotplug.c   | 74 ++++++++++++++++-----------------------------------
>  2 files changed, 23 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 26383af9900c..4e3b61cda520 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -394,9 +394,6 @@ static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
>  {
>  	struct zone *zone;
>  
> -	if (!allow_online_pfn_range(nid, start_pfn, nr_pages, online_type))
> -		return;
> -
>  	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
>  	if (zone != default_zone) {
>  		strcat(buf, " ");
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6b9a60115e37..670f7acbecf4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -894,7 +894,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
>   * If no kernel zone covers this pfn range it will automatically go
>   * to the ZONE_NORMAL.
>   */
> -static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> +static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
>  		unsigned long nr_pages)
>  {
>  	struct pglist_data *pgdat = NODE_DATA(nid);
> @@ -910,65 +910,40 @@ static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>  	return &pgdat->node_zones[ZONE_NORMAL];
>  }
>  
> -bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type)
> +static inline struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> +		unsigned long nr_pages)
>  {
> -	struct pglist_data *pgdat = NODE_DATA(nid);
> -	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
> -	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
> +	struct zone *kernel_zone = default_kernel_zone_for_pfn(nid, start_pfn,
> +			nr_pages);
> +	struct zone *movable_zone = &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
> +	bool in_kernel = zone_intersects(kernel_zone, start_pfn, nr_pages);
> +	bool in_movable = zone_intersects(movable_zone, start_pfn, nr_pages);
>  
>  	/*
> -	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
> -	 * physically before ZONE_MOVABLE. All we need is they do not
> -	 * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE
> -	 * though so let's stick with it for simplicity for now.
> -	 * TODO make sure we do not overlap with ZONE_DEVICE
> +	 * We inherit the existing zone in a simple case where zones do not
> +	 * overlap in the given range
>  	 */
> -	if (online_type == MMOP_ONLINE_KERNEL) {
> -		if (zone_is_empty(movable_zone))
> -			return true;
> -		return movable_zone->zone_start_pfn >= pfn + nr_pages;
> -	} else if (online_type == MMOP_ONLINE_MOVABLE) {
> -		return zone_end_pfn(default_zone) <= pfn;
> -	}
> -
> -	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
> -	return online_type == MMOP_ONLINE_KEEP;
> -}
> -
> -static inline bool movable_pfn_range(int nid, struct zone *default_zone,
> -		unsigned long start_pfn, unsigned long nr_pages)
> -{
> -	if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
> -				MMOP_ONLINE_KERNEL))
> -		return true;
> -
> -	if (!movable_node_is_enabled())
> -		return false;
> +	if (in_kernel ^ in_movable)
> +		return (in_kernel) ? kernel_zone : movable_zone;
>  
> -	return !zone_intersects(default_zone, start_pfn, nr_pages);
> +	/*
> +	 * If the range doesn't belong to any zone or two zones overlap in the
> +	 * given range then we use movable zone only if movable_node is
> +	 * enabled because we always online to a kernel zone by default.
> +	 */
> +	return movable_node_enabled ? movable_zone : kernel_zone;
>  }
>  
>  struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
>  		unsigned long nr_pages)
>  {
> -	struct pglist_data *pgdat = NODE_DATA(nid);
> -	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
> +	if (online_type == MMOP_ONLINE_KERNEL)
> +		return default_kernel_zone_for_pfn(nid, start_pfn, nr_pages);
>  
> -	if (online_type == MMOP_ONLINE_KEEP) {
> -		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
> -		/*
> -		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
> -		 * movable zone if that is not possible (e.g. we are within
> -		 * or past the existing movable zone). movable_node overrides
> -		 * this default and defaults to movable zone
> -		 */
> -		if (movable_pfn_range(nid, zone, start_pfn, nr_pages))
> -			zone = movable_zone;
> -	} else if (online_type == MMOP_ONLINE_MOVABLE) {
> -		zone = &pgdat->node_zones[ZONE_MOVABLE];
> -	}
> +	if (online_type == MMOP_ONLINE_MOVABLE)
> +		return &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
>  
> -	return zone;
> +	return default_zone_for_pfn(nid, start_pfn, nr_pages);
>  }
>  
>  /*
> @@ -997,9 +972,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>  	struct memory_notify arg;
>  
>  	nid = pfn_to_nid(pfn);
> -	if (!allow_online_pfn_range(nid, pfn, nr_pages, online_type))
> -		return -EINVAL;
> -
>  	/* associate pfn range with the zone */
>  	zone = move_pfn_range(online_type, nid, pfn, nr_pages);
>  
> -- 
> 2.11.0
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-14 12:12 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
  2017-07-14 12:17   ` Vlastimil Babka
@ 2017-07-14 14:26   ` Reza Arbab
  1 sibling, 0 replies; 48+ messages in thread
From: Reza Arbab @ 2017-07-14 14:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, linux-mm, LKML, Michal Hocko, Joonsoo Kim, linux-api

On Fri, Jul 14, 2017 at 02:12:33PM +0200, Michal Hocko wrote: 
>Historically we have enforced that any kernel zone (e.g. ZONE_NORMAL) has
>to precede the Movable zone in the physical memory range. The purpose of
>the movable zone is, however, not bound to any physical memory restriction.
>It merely defines a class of migratable and reclaimable memory.
>
>There are users (e.g. CMA) who might want to reserve specific physical
>memory ranges for their own purpose. Moreover our pfn walkers have to be
>prepared for zones overlapping in the physical range already because we
>do support interleaving NUMA nodes and therefore zones can interleave as
>well. This means we can allow each memory block to be associated with a
>different zone.
>
>Loosen the current onlining semantic and allow explicit onlining type on
>any memblock. That means that online_{kernel,movable} will be allowed
>regardless of the physical address of the memblock as long as it is
>offline of course. This might result in the movable zone overlapping with
>other kernel zones. Default onlining then becomes a bit tricky but still
>sensible. echo online > memoryXY/state will online the given block to
>	1) the default zone if the given range is outside of any zone
>	2) the enclosing zone if such a zone doesn't interleave with
>	   any other zone
>	3) the default zone if more zones interleave for this range
>where the default zone is the movable zone only if movable_node is enabled,
>otherwise it is a kernel zone.
>
>Here is an example of the semantic (movable_node is not enabled here, but
>it works in an analogous way). We start with the following memblocks, all
>of them offline
>memory34/valid_zones:Normal Movable
>memory35/valid_zones:Normal Movable
>memory36/valid_zones:Normal Movable
>memory37/valid_zones:Normal Movable
>memory38/valid_zones:Normal Movable
>memory39/valid_zones:Normal Movable
>memory40/valid_zones:Normal Movable
>memory41/valid_zones:Normal Movable
>
>Now, we online block 34 in default mode and block 37 as movable
>root@test1:/sys/devices/system/node/node1# echo online > memory34/state
>root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
>memory34/valid_zones:Normal
>memory35/valid_zones:Normal Movable
>memory36/valid_zones:Normal Movable
>memory37/valid_zones:Movable
>memory38/valid_zones:Normal Movable
>memory39/valid_zones:Normal Movable
>memory40/valid_zones:Normal Movable
>memory41/valid_zones:Normal Movable
>
>As we can see all other blocks can still be onlined both into Normal and
>Movable zones and the Normal is default because the Movable zone spans
>only block37 now.
>root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
>memory34/valid_zones:Normal
>memory35/valid_zones:Normal Movable
>memory36/valid_zones:Normal Movable
>memory37/valid_zones:Movable
>memory38/valid_zones:Movable Normal
>memory39/valid_zones:Movable Normal
>memory40/valid_zones:Movable Normal
>memory41/valid_zones:Movable
>
>Now the default zone for blocks 37-41 has changed because movable zone
>spans that range.
>root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
>memory34/valid_zones:Normal
>memory35/valid_zones:Normal Movable
>memory36/valid_zones:Normal Movable
>memory37/valid_zones:Movable
>memory38/valid_zones:Normal Movable
>memory39/valid_zones:Normal
>memory40/valid_zones:Movable Normal
>memory41/valid_zones:Movable
>
>Note that the block 39 now belongs to the zone Normal and so block38
>falls into Normal by default as well.
>
>For completeness
>root@test1:/sys/devices/system/node/node1# for i in memory[34]?
>do
>	echo online > $i/state 2>/dev/null
>done
>
>memory34/valid_zones:Normal
>memory35/valid_zones:Normal
>memory36/valid_zones:Normal
>memory37/valid_zones:Movable
>memory38/valid_zones:Normal
>memory39/valid_zones:Normal
>memory40/valid_zones:Movable
>memory41/valid_zones:Movable
>
>Implementation wise the change is quite straightforward. We can get rid
>of allow_online_pfn_range altogether. online_pages allows only offline
>nodes already. The original default_zone_for_pfn will become
>default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
>above semantic. zone_for_pfn_range is slightly reorganized to implement
>kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes
>a catch all default behavior.
>
>Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>

>Cc: <linux-api@vger.kernel.org>
>Signed-off-by: Michal Hocko <mhocko@suse.com>

-- 
Reza Arbab

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
@ 2017-07-14 14:26   ` Reza Arbab
  0 siblings, 0 replies; 48+ messages in thread
From: Reza Arbab @ 2017-07-14 14:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, linux-mm, LKML, Michal Hocko, Joonsoo Kim, linux-api

On Fri, Jul 14, 2017 at 02:12:33PM +0200, Michal Hocko wrote: 
>Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
>to precede the Movable zone in the physical memory range. The purpose of
>the movable zone is, however, not bound to any physical memory restriction.
>It merely defines a class of migrateable and reclaimable memory.
>
>There are users (e.g. CMA) who might want to reserve specific physical
>memory ranges for their own purpose. Moreover our pfn walkers have to be
>prepared for zones overlapping in the physical range already because we
>do support interleaving NUMA nodes and therefore zones can interleave as
>well. This means we can allow each memory block to be associated with a
>different zone.
>
>Loosen the current onlining semantic and allow explicit onlining type on
>any memblock. That means that online_{kernel,movable} will be allowed
>regardless of the physical address of the memblock as long as it is
>offline of course. This might result in moveble zone overlapping with
>other kernel zones. Default onlining then becomes a bit tricky but still
>sensible. echo online > memoryXY/state will online the given block to
>	1) the default zone if the given range is outside of any zone
>	2) the enclosing zone if such a zone doesn't interleave with
>	   any other zone
>        3) the default zone if more zones interleave for this range
>where default zone is movable zone only if movable_node is enabled
>otherwise it is a kernel zone.
>
>Here is an example of the semantic with (movable_node is not present but
>it work in an analogous way). We start with following memblocks, all of
>them offline
>memory34/valid_zones:Normal Movable
>memory35/valid_zones:Normal Movable
>memory36/valid_zones:Normal Movable
>memory37/valid_zones:Normal Movable
>memory38/valid_zones:Normal Movable
>memory39/valid_zones:Normal Movable
>memory40/valid_zones:Normal Movable
>memory41/valid_zones:Normal Movable
>
>Now, we online block 34 in default mode and block 37 as movable
>root@test1:/sys/devices/system/node/node1# echo online > memory34/state
>root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
>memory34/valid_zones:Normal
>memory35/valid_zones:Normal Movable
>memory36/valid_zones:Normal Movable
>memory37/valid_zones:Movable
>memory38/valid_zones:Normal Movable
>memory39/valid_zones:Normal Movable
>memory40/valid_zones:Normal Movable
>memory41/valid_zones:Normal Movable
>
>As we can see all other blocks can still be onlined both into Normal and
>Movable zones and the Normal is default because the Movable zone spans
>only block37 now.
>root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
>memory34/valid_zones:Normal
>memory35/valid_zones:Normal Movable
>memory36/valid_zones:Normal Movable
>memory37/valid_zones:Movable
>memory38/valid_zones:Movable Normal
>memory39/valid_zones:Movable Normal
>memory40/valid_zones:Movable Normal
>memory41/valid_zones:Movable
>
>Now the default zone for blocks 37-41 has changed because movable zone
>spans that range.
>root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
>memory34/valid_zones:Normal
>memory35/valid_zones:Normal Movable
>memory36/valid_zones:Normal Movable
>memory37/valid_zones:Movable
>memory38/valid_zones:Normal Movable
>memory39/valid_zones:Normal
>memory40/valid_zones:Movable Normal
>memory41/valid_zones:Movable
>
>Note that the block 39 now belongs to the zone Normal and so block38
>falls into Normal by default as well.
>
>For completeness
>root@test1:/sys/devices/system/node/node1# for i in memory[34]?
>do
>	echo online > $i/state 2>/dev/null
>done
>
>memory34/valid_zones:Normal
>memory35/valid_zones:Normal
>memory36/valid_zones:Normal
>memory37/valid_zones:Movable
>memory38/valid_zones:Normal
>memory39/valid_zones:Normal
>memory40/valid_zones:Movable
>memory41/valid_zones:Movable
>
>Implementation wise the change is quite straightforward. We can get rid
>of allow_online_pfn_range altogether. online_pages allows only offline
>nodes already. The original default_zone_for_pfn will become
>default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
>above semantic. zone_for_pfn_range is slightly reorganized to implement
>kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes
>a catch all default behavior.
>
>Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>

>Cc: <linux-api@vger.kernel.org>
>Signed-off-by: Michal Hocko <mhocko@suse.com>

-- 
Reza Arbab

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-14 12:12 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
@ 2017-07-14 12:17   ` Vlastimil Babka
  2017-07-14 14:26   ` Reza Arbab
  1 sibling, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-07-14 12:17 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Joonsoo Kim, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, Wei Yang, linux-mm, LKML,
	Michal Hocko, Joonsoo Kim, linux-api

On 07/14/2017 02:12 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Historically we have enforced that any kernel zone (e.g. ZONE_NORMAL) has
> to precede the Movable zone in the physical memory range. The purpose of
> the movable zone is, however, not bound to any physical memory restriction.
> It merely defines a class of migratable and reclaimable memory.
> 
> There are users (e.g. CMA) who might want to reserve specific physical
> memory ranges for their own purpose. Moreover our pfn walkers have to be
> prepared for zones overlapping in the physical range already because we
> do support interleaving NUMA nodes and therefore zones can interleave as
> well. This means we can allow each memory block to be associated with a
> different zone.
> 
> Loosen the current onlining semantic and allow explicit onlining type on
> any memblock. That means that online_{kernel,movable} will be allowed
> regardless of the physical address of the memblock as long as it is
> offline of course. This might result in movable zone overlapping with
> other kernel zones. Default onlining then becomes a bit tricky but still
> sensible. echo online > memoryXY/state will online the given block to
> 	1) the default zone if the given range is outside of any zone
> 	2) the enclosing zone if such a zone doesn't interleave with
> 	   any other zone
>         3) the default zone if more zones interleave for this range
> where default zone is movable zone only if movable_node is enabled
> otherwise it is a kernel zone.
> 
> Here is an example of the semantic (movable_node is not present but
> it works in an analogous way). We start with the following memblocks, all
> of them offline
> memory34/valid_zones:Normal Movable
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Normal Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable
> 
> Now, we online block 34 in default mode and block 37 as movable
> root@test1:/sys/devices/system/node/node1# echo online > memory34/state
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal Movable
> memory40/valid_zones:Normal Movable
> memory41/valid_zones:Normal Movable
> 
> As we can see all other blocks can still be onlined both into Normal and
> Movable zones and the Normal is default because the Movable zone spans
> only block37 now.
> root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Movable Normal
> memory39/valid_zones:Movable Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
> 
> Now the default zone for blocks 37-41 has changed because movable zone
> spans that range.
> root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal Movable
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable Normal
> memory41/valid_zones:Movable
> 
> Note that the block 39 now belongs to the zone Normal and so block38
> falls into Normal by default as well.
> 
> For completeness
> root@test1:/sys/devices/system/node/node1# for i in memory[34]?
> do
> 	echo online > $i/state 2>/dev/null
> done
> 
> memory34/valid_zones:Normal
> memory35/valid_zones:Normal
> memory36/valid_zones:Normal
> memory37/valid_zones:Movable
> memory38/valid_zones:Normal
> memory39/valid_zones:Normal
> memory40/valid_zones:Movable
> memory41/valid_zones:Movable
> 
> Implementation wise the change is quite straightforward. We can get rid
> of allow_online_pfn_range altogether. online_pages allows only offline
> nodes already. The original default_zone_for_pfn will become
> default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
> above semantic. zone_for_pfn_range is slightly reorganized to implement
> kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes
> a catch all default behavior.
> 
> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> Cc: <linux-api@vger.kernel.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
  2017-07-14 12:12 [PATCH 0/2] mm, memory_hotplug: remove zone onlining restriction Michal Hocko
@ 2017-07-14 12:12 ` Michal Hocko
  2017-07-14 12:17   ` Vlastimil Babka
  2017-07-14 14:26   ` Reza Arbab
  0 siblings, 2 replies; 48+ messages in thread
From: Michal Hocko @ 2017-07-14 12:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Vlastimil Babka, Andrea Arcangeli, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, linux-mm, LKML, Michal Hocko, Joonsoo Kim, linux-api

From: Michal Hocko <mhocko@suse.com>

Historically we have enforced that any kernel zone (e.g. ZONE_NORMAL) has
to precede the Movable zone in the physical memory range. The purpose of
the movable zone is, however, not bound to any physical memory restriction.
It merely defines a class of migratable and reclaimable memory.

There are users (e.g. CMA) who might want to reserve specific physical
memory ranges for their own purpose. Moreover our pfn walkers have to be
prepared for zones overlapping in the physical range already because we
do support interleaving NUMA nodes and therefore zones can interleave as
well. This means we can allow each memory block to be associated with a
different zone.

Loosen the current onlining semantic and allow an explicit onlining type
on any memblock. That means that online_{kernel,movable} will be allowed
regardless of the physical address of the memblock, as long as it is
offline of course. This might result in the movable zone overlapping with
other kernel zones. Default onlining then becomes a bit tricky but still
sensible. echo online > memoryXY/state will online the given block to
	1) the default zone if the given range is outside of any zone
	2) the enclosing zone if such a zone doesn't interleave with
	   any other zone
	3) the default zone if more zones interleave for this range
where the default zone is the movable zone only if movable_node is
enabled, otherwise it is a kernel zone.
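
The three rules above can be sketched as a small userspace model. This
is only an illustration under stated assumptions, not the kernel code:
zone_kind, zone_span, span_intersects and default_online_zone are
hypothetical names, each zone is reduced to a single half-open pfn span,
and movable_node is passed in as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
enum zone_kind { ZONE_KIND_KERNEL, ZONE_KIND_MOVABLE };

/* Half-open pfn span [start, end); start == end means the zone is empty. */
struct zone_span {
	unsigned long start;
	unsigned long end;
};

/* True if [start_pfn, start_pfn + nr_pages) overlaps a non-empty span. */
static bool span_intersects(const struct zone_span *z,
			    unsigned long start_pfn, unsigned long nr_pages)
{
	if (z->start == z->end)
		return false;
	return start_pfn < z->end && start_pfn + nr_pages > z->start;
}

/* Zone picked by a plain "echo online" for the given range. */
static enum zone_kind default_online_zone(const struct zone_span *kernel,
					  const struct zone_span *movable,
					  unsigned long start_pfn,
					  unsigned long nr_pages,
					  bool movable_node)
{
	bool in_kernel, in_movable;

	assert(nr_pages > 0);
	in_kernel = span_intersects(kernel, start_pfn, nr_pages);
	in_movable = span_intersects(movable, start_pfn, nr_pages);

	/* Rule 2: exactly one zone spans the range, so inherit it. */
	if (in_kernel != in_movable)
		return in_kernel ? ZONE_KIND_KERNEL : ZONE_KIND_MOVABLE;

	/* Rules 1 and 3: no zone or both zones, use the default zone. */
	return movable_node ? ZONE_KIND_MOVABLE : ZONE_KIND_KERNEL;
}
```

With a kernel span covering block 34 and a movable span covering blocks
37-41 (one pfn per block for brevity), block 38 resolves to the movable
zone and block 35 to the kernel zone when movable_node is not set.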

Here is an example of the semantic (movable_node is not present but
it works in an analogous way). We start with the following memblocks, all
of them offline
memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable

Now, we online block 34 in default mode and block 37 as movable
root@test1:/sys/devices/system/node/node1# echo online > memory34/state
root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable

As we can see, all other blocks can still be onlined into both the Normal
and Movable zones, and Normal is the default because the Movable zone
spans only block 37 now.
root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Movable Normal
memory39/valid_zones:Movable Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable

Now the default zone for blocks 37-41 has changed because the movable
zone spans that range.
root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable

Note that block 39 now belongs to the Normal zone and so block 38
falls into Normal by default as well.
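
The way the defaults shift in the steps above follows from how a zone's
span grows when a block is onlined into it. A minimal sketch, assuming
one span per zone, block numbers used in place of pfns, and movable_node
not set; span, span_grow, span_has and default_for_block are made-up
names for illustration only:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a zone is one half-open block span [start, end). */
struct span {
	unsigned long start;
	unsigned long end;	/* start == end means the zone is empty */
};

/* Onlining a block into a zone stretches the span to cover that block. */
static void span_grow(struct span *z, unsigned long blk)
{
	assert(z);
	if (z->start == z->end) {
		z->start = blk;
		z->end = blk + 1;
		return;
	}
	if (blk < z->start)
		z->start = blk;
	if (blk + 1 > z->end)
		z->end = blk + 1;
}

static bool span_has(const struct span *z, unsigned long blk)
{
	return blk >= z->start && blk < z->end;
}

/*
 * Default zone for an offline block with movable_node not set:
 * 'M' only when the Movable span alone covers the block, else 'N'.
 */
static char default_for_block(const struct span *normal,
			      const struct span *movable, unsigned long blk)
{
	bool in_normal = span_has(normal, blk);
	bool in_movable = span_has(movable, blk);

	return (in_movable && !in_normal) ? 'M' : 'N';
}
```

Growing Normal over block 34 and Movable over blocks 37 and 41 makes
block 38 default to Movable; growing Normal over block 39 afterwards
puts block 38 inside both spans, so it falls back to Normal while block
40 stays with Movable.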

For completeness
root@test1:/sys/devices/system/node/node1# for i in memory[34]?
do
	echo online > $i/state 2>/dev/null
done

memory34/valid_zones:Normal
memory35/valid_zones:Normal
memory36/valid_zones:Normal
memory37/valid_zones:Movable
memory38/valid_zones:Normal
memory39/valid_zones:Normal
memory40/valid_zones:Movable
memory41/valid_zones:Movable

Implementation-wise the change is quite straightforward. We can get rid
of allow_online_pfn_range altogether. online_pages allows only offline
nodes already. The original default_zone_for_pfn will become
default_kernel_zone_for_pfn. The new default_zone_for_pfn implements the
above semantic. zone_for_pfn_range is slightly reorganized to implement
the kernel and movable online types explicitly and MMOP_ONLINE_KEEP
becomes a catch-all default behavior.

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/base/memory.c |  3 ---
 mm/memory_hotplug.c   | 74 ++++++++++++++++-----------------------------------
 2 files changed, 23 insertions(+), 54 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 26383af9900c..4e3b61cda520 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,9 +394,6 @@ static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
 {
 	struct zone *zone;
 
-	if (!allow_online_pfn_range(nid, start_pfn, nr_pages, online_type))
-		return;
-
 	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
 	if (zone != default_zone) {
 		strcat(buf, " ");
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b4f2583677b1..d8b771b1ae29 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -835,7 +835,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
  * If no kernel zone covers this pfn range it will automatically go
  * to the ZONE_NORMAL.
  */
-static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
 		unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
@@ -851,65 +851,40 @@ static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
 	return &pgdat->node_zones[ZONE_NORMAL];
 }
 
-bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type)
+static inline struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+		unsigned long nr_pages)
 {
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
+	struct zone *kernel_zone = default_kernel_zone_for_pfn(nid, start_pfn,
+			nr_pages);
+	struct zone *movable_zone = &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
+	bool in_kernel = zone_intersects(kernel_zone, start_pfn, nr_pages);
+	bool in_movable = zone_intersects(movable_zone, start_pfn, nr_pages);
 
 	/*
-	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
-	 * physically before ZONE_MOVABLE. All we need is they do not
-	 * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE
-	 * though so let's stick with it for simplicity for now.
-	 * TODO make sure we do not overlap with ZONE_DEVICE
+	 * We inherit the existing zone in a simple case where zones do not
+	 * overlap in the given range
 	 */
-	if (online_type == MMOP_ONLINE_KERNEL) {
-		if (zone_is_empty(movable_zone))
-			return true;
-		return movable_zone->zone_start_pfn >= pfn + nr_pages;
-	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		return zone_end_pfn(default_zone) <= pfn;
-	}
-
-	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
-	return online_type == MMOP_ONLINE_KEEP;
-}
-
-static inline bool movable_pfn_range(int nid, struct zone *default_zone,
-		unsigned long start_pfn, unsigned long nr_pages)
-{
-	if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
-				MMOP_ONLINE_KERNEL))
-		return true;
-
-	if (!movable_node_is_enabled())
-		return false;
+	if (in_kernel ^ in_movable)
+		return (in_kernel) ? kernel_zone : movable_zone;
 
-	return !zone_intersects(default_zone, start_pfn, nr_pages);
+	/*
+	 * If the range doesn't belong to any zone or two zones overlap in the
+	 * given range then we use movable zone only if movable_node is
+	 * enabled because we always online to a kernel zone by default.
+	 */
+	return movable_node_enabled ? movable_zone : kernel_zone;
 }
 
 struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 		unsigned long nr_pages)
 {
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
+	if (online_type == MMOP_ONLINE_KERNEL)
+		return default_kernel_zone_for_pfn(nid, start_pfn, nr_pages);
 
-	if (online_type == MMOP_ONLINE_KEEP) {
-		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-		/*
-		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
-		 * movable zone if that is not possible (e.g. we are within
-		 * or past the existing movable zone). movable_node overrides
-		 * this default and defaults to movable zone
-		 */
-		if (movable_pfn_range(nid, zone, start_pfn, nr_pages))
-			zone = movable_zone;
-	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		zone = &pgdat->node_zones[ZONE_MOVABLE];
-	}
+	if (online_type == MMOP_ONLINE_MOVABLE)
+		return &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
 
-	return zone;
+	return default_zone_for_pfn(nid, start_pfn, nr_pages);
 }
 
 /*
@@ -938,9 +913,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	struct memory_notify arg;
 
 	nid = pfn_to_nid(pfn);
-	if (!allow_online_pfn_range(nid, pfn, nr_pages, online_type))
-		return -EINVAL;
-
 	/* associate pfn range with the zone */
 	zone = move_pfn_range(online_type, nid, pfn, nr_pages);
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 2/2] mm, memory_hotplug: remove zone restrictions
@ 2017-07-14 12:12 ` Michal Hocko
  2017-07-14 12:17   ` Vlastimil Babka
  2017-07-14 14:26   ` Reza Arbab
  0 siblings, 2 replies; 48+ messages in thread
From: Michal Hocko @ 2017-07-14 12:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Vlastimil Babka, Andrea Arcangeli, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Wei Yang, linux-mm, LKML, Michal Hocko, Joonsoo Kim, linux-api

From: Michal Hocko <mhocko@suse.com>

Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
to precede the Movable zone in the physical memory range. The purpose of
the movable zone is, however, not bound to any physical memory restriction.
It merely defines a class of migrateable and reclaimable memory.

There are users (e.g. CMA) who might want to reserve specific physical
memory ranges for their own purpose. Moreover our pfn walkers have to be
prepared for zones overlapping in the physical range already because we
do support interleaving NUMA nodes and therefore zones can interleave as
well. This means we can allow each memory block to be associated with a
different zone.

Loosen the current onlining semantic and allow explicit onlining type on
any memblock. That means that online_{kernel,movable} will be allowed
regardless of the physical address of the memblock as long as it is
offline of course. This might result in moveble zone overlapping with
other kernel zones. Default onlining then becomes a bit tricky but still
sensible. echo online > memoryXY/state will online the given block to
	1) the default zone if the given range is outside of any zone
	2) the enclosing zone if such a zone doesn't interleave with
	   any other zone
        3) the default zone if more zones interleave for this range
where default zone is movable zone only if movable_node is enabled
otherwise it is a kernel zone.

Here is an example of the semantic with (movable_node is not present but
it work in an analogous way). We start with following memblocks, all of
them offline
memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable

Now, we online block 34 in default mode and block 37 as movable
root@test1:/sys/devices/system/node/node1# echo online > memory34/state
root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable

As we can see all other blocks can still be onlined both into Normal and
Movable zones and the Normal is default because the Movable zone spans
only block37 now.
root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Movable Normal
memory39/valid_zones:Movable Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable

Now the default zone for blocks 37-41 has changed because movable zone
spans that range.
root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable

Note that the block 39 now belongs to the zone Normal and so block38
falls into Normal by default as well.

For completness
root@test1:/sys/devices/system/node/node1# for i in memory[34]?
do
	echo online > $i/state 2>/dev/null
done

memory34/valid_zones:Normal
memory35/valid_zones:Normal
memory36/valid_zones:Normal
memory37/valid_zones:Movable
memory38/valid_zones:Normal
memory39/valid_zones:Normal
memory40/valid_zones:Movable
memory41/valid_zones:Movable

Implementation wise the change is quite straightforward. We can get rid
of allow_online_pfn_range altogether. online_pages allows only offline
nodes already. The original default_zone_for_pfn will become
default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
above semantic. zone_for_pfn_range is slightly reorganized to implement
kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes
a catch all default behavior.

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/base/memory.c |  3 ---
 mm/memory_hotplug.c   | 74 ++++++++++++++++-----------------------------------
 2 files changed, 23 insertions(+), 54 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 26383af9900c..4e3b61cda520 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,9 +394,6 @@ static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
 {
 	struct zone *zone;
 
-	if (!allow_online_pfn_range(nid, start_pfn, nr_pages, online_type))
-		return;
-
 	zone = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
 	if (zone != default_zone) {
 		strcat(buf, " ");
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b4f2583677b1..d8b771b1ae29 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -835,7 +835,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
  * If no kernel zone covers this pfn range it will automatically go
  * to the ZONE_NORMAL.
  */
-static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
 		unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
@@ -851,65 +851,40 @@ static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
 	return &pgdat->node_zones[ZONE_NORMAL];
 }
 
-bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type)
+static inline struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+		unsigned long nr_pages)
 {
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
+	struct zone *kernel_zone = default_kernel_zone_for_pfn(nid, start_pfn,
+			nr_pages);
+	struct zone *movable_zone = &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
+	bool in_kernel = zone_intersects(kernel_zone, start_pfn, nr_pages);
+	bool in_movable = zone_intersects(movable_zone, start_pfn, nr_pages);
 
 	/*
-	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
-	 * physically before ZONE_MOVABLE. All we need is they do not
-	 * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE
-	 * though so let's stick with it for simplicity for now.
-	 * TODO make sure we do not overlap with ZONE_DEVICE
+	 * We inherit the existing zone in a simple case where zones do not
+	 * overlap in the given range
 	 */
-	if (online_type == MMOP_ONLINE_KERNEL) {
-		if (zone_is_empty(movable_zone))
-			return true;
-		return movable_zone->zone_start_pfn >= pfn + nr_pages;
-	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		return zone_end_pfn(default_zone) <= pfn;
-	}
-
-	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
-	return online_type == MMOP_ONLINE_KEEP;
-}
-
-static inline bool movable_pfn_range(int nid, struct zone *default_zone,
-		unsigned long start_pfn, unsigned long nr_pages)
-{
-	if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
-				MMOP_ONLINE_KERNEL))
-		return true;
-
-	if (!movable_node_is_enabled())
-		return false;
+	if (in_kernel ^ in_movable)
+		return (in_kernel) ? kernel_zone : movable_zone;
 
-	return !zone_intersects(default_zone, start_pfn, nr_pages);
+	/*
+	 * If the range doesn't belong to any zone or two zones overlap in the
+	 * given range then we use movable zone only if movable_node is
+	 * enabled because we always online to a kernel zone by default.
+	 */
+	return movable_node_enabled ? movable_zone : kernel_zone;
 }
 
 struct zone * zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 		unsigned long nr_pages)
 {
-	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
+	if (online_type == MMOP_ONLINE_KERNEL)
+		return default_kernel_zone_for_pfn(nid, start_pfn, nr_pages);
 
-	if (online_type == MMOP_ONLINE_KEEP) {
-		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-		/*
-		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
-		 * movable zone if that is not possible (e.g. we are within
-		 * or past the existing movable zone). movable_node overrides
-		 * this default and defaults to movable zone
-		 */
-		if (movable_pfn_range(nid, zone, start_pfn, nr_pages))
-			zone = movable_zone;
-	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		zone = &pgdat->node_zones[ZONE_MOVABLE];
-	}
+	if (online_type == MMOP_ONLINE_MOVABLE)
+		return &NODE_DATA(nid)->node_zones[ZONE_MOVABLE];
 
-	return zone;
+	return default_zone_for_pfn(nid, start_pfn, nr_pages);
 }
 
 /*
@@ -938,9 +913,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	struct memory_notify arg;
 
 	nid = pfn_to_nid(pfn);
-	if (!allow_online_pfn_range(nid, pfn, nr_pages, online_type))
-		return -EINVAL;
-
 	/* associate pfn range with the zone */
 	zone = move_pfn_range(online_type, nid, pfn, nr_pages);
 
-- 
2.11.0
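
For readers following the thread, the default onlining semantic from the
hunk above can be modelled in userspace roughly as below. This is only a
sketch under stated assumptions, not the kernel code: the model_* names
and the struct layout are hypothetical, a single kernel zone stands in for
whatever default_kernel_zone_for_pfn() would pick, and in_kernel/in_movable
are assumed to come from a plain range intersection test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MMOP_ONLINE_* request types. */
enum model_online_type {
	MODEL_ONLINE_KEEP,
	MODEL_ONLINE_KERNEL,
	MODEL_ONLINE_MOVABLE,
};

/* Hypothetical, heavily reduced zone: only the spanned pfn range. */
struct model_zone {
	const char *name;
	unsigned long start_pfn;
	unsigned long spanned_pages;	/* 0 means the zone is empty */
};

/* Assumption: "in zone" means the pfn range intersects the zone's span. */
bool model_zone_intersects(const struct model_zone *zone,
			   unsigned long start_pfn, unsigned long nr_pages)
{
	if (zone->spanned_pages == 0)
		return false;
	return start_pfn < zone->start_pfn + zone->spanned_pages &&
	       start_pfn + nr_pages > zone->start_pfn;
}

/* Default choice: inherit the zone if exactly one of the two covers the range. */
const struct model_zone *
model_default_zone(const struct model_zone *kernel_zone,
		   const struct model_zone *movable_zone,
		   bool movable_node_enabled,
		   unsigned long start_pfn, unsigned long nr_pages)
{
	bool in_kernel = model_zone_intersects(kernel_zone, start_pfn, nr_pages);
	bool in_movable = model_zone_intersects(movable_zone, start_pfn, nr_pages);

	if (in_kernel ^ in_movable)
		return in_kernel ? kernel_zone : movable_zone;

	/* No zone or both zones cover the range: movable_node decides. */
	return movable_node_enabled ? movable_zone : kernel_zone;
}

/* An explicit request always wins; only "keep" falls back to the default. */
const struct model_zone *
model_zone_for_range(enum model_online_type online_type,
		     const struct model_zone *kernel_zone,
		     const struct model_zone *movable_zone,
		     bool movable_node_enabled,
		     unsigned long start_pfn, unsigned long nr_pages)
{
	assert(kernel_zone != NULL && movable_zone != NULL);

	if (online_type == MODEL_ONLINE_KERNEL)
		return kernel_zone;
	if (online_type == MODEL_ONLINE_MOVABLE)
		return movable_zone;
	return model_default_zone(kernel_zone, movable_zone,
				  movable_node_enabled, start_pfn, nr_pages);
}
```

In this model an explicit kernel or movable request is never rejected,
which mirrors the removal of the allow_online_pfn_range() check from
online_pages() in the last hunk.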




Thread overview: 48+ messages
2017-06-29  7:35 [RFC PATCH 0/2] mm, memory_hotplug: remove zone onlining restriction Michal Hocko
2017-06-29  7:35 ` [PATCH 1/2] mm, memory_hotplug: display allowed zones in the preferred ordering Michal Hocko
2017-06-30  0:45   ` Joonsoo Kim
2017-07-07 14:34   ` Vlastimil Babka
2017-06-29  7:35 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
2017-06-30  1:16   ` Joonsoo Kim
2017-06-30  3:09   ` Wei Yang
2017-06-30  8:39     ` Michal Hocko
2017-06-30  9:39       ` Wei Yang
2017-06-30  9:55         ` Michal Hocko
2017-06-30 11:01           ` Michal Hocko
2017-07-05 23:16             ` Wei Yang
2017-07-06  6:56               ` Michal Hocko
2017-07-07  8:37                 ` Wei Yang
2017-07-07 12:41                   ` Michal Hocko
2017-07-07 15:02   ` Vlastimil Babka
2017-07-10  6:45     ` Michal Hocko
2017-07-10 11:11       ` Vlastimil Babka
2017-07-10 11:17         ` Michal Hocko
2017-07-10 12:12           ` Vlastimil Babka
2017-07-10 12:30             ` Michal Hocko
2017-07-12 12:49   ` Michal Hocko
2017-07-14 12:12 [PATCH 0/2] mm, memory_hotplug: remove zone onlining restriction Michal Hocko
2017-07-14 12:12 ` [PATCH 2/2] mm, memory_hotplug: remove zone restrictions Michal Hocko
2017-07-14 12:17   ` Vlastimil Babka
2017-07-14 14:26   ` Reza Arbab
