All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] memory hotplug follow up fixes
@ 2017-06-01  8:37 ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-01  8:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens,
	LKML

Hi Andrew,
Heiko Carstens has noticed an unexpected memory online behavior for the
default onlining (aka MMOP_ONLINE_KEEP) and also online_kernel to other
kernel zones than ZONE_NORMAL. Both fixes are rather straightforward
so could you add them to the memory hotplug rewrite pile please?

Shortlog
Michal Hocko (2):
      mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
      mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

Diffstat
 drivers/base/memory.c          |  2 +-
 include/linux/memory_hotplug.h |  2 ++
 mm/memory_hotplug.c            | 36 +++++++++++++++++++++++++++++-------
 3 files changed, 32 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 0/2] memory hotplug follow up fixes
@ 2017-06-01  8:37 ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-01  8:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens,
	LKML

Hi Andrew,
Heiko Carstens has noticed an unexpected memory online behavior for the
default onlining (aka MMOP_ONLINE_KEEP) and also online_kernel to other
kernel zones than ZONE_NORMAL. Both fixes are rather straightforward
so could you add them to the memory hotplug rewrite pile please?

Shortlog
Michal Hocko (2):
      mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
      mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

Diffstat
 drivers/base/memory.c          |  2 +-
 include/linux/memory_hotplug.h |  2 ++
 mm/memory_hotplug.c            | 36 +++++++++++++++++++++++++++++-------
 3 files changed, 32 insertions(+), 8 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
  2017-06-01  8:37 ` Michal Hocko
@ 2017-06-01  8:37   ` Michal Hocko
  -1 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-01  8:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently
$ grep . memory3?/valid_zones
memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable

$ echo online_movable > memory34/state
$ grep . memory3?/valid_zones
memory34/valid_zones:Movable
memory35/valid_zones:Movable
memory36/valid_zones:Movable
memory37/valid_zones:Movable

$ echo online > memory36/state
$ grep . memory3?/valid_zones
memory34/valid_zones:Movable
memory36/valid_zones:Normal
memory37/valid_zones:Movable

so we have effectivelly punched a hole into the movable zone. The
problem is that move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
It only checks whether the given range is already part of the movable
zone which is not the case here as only memory34 is in the zone. Fix
this by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL) if that
is false then we can be sure that movable onlining is the right thing to
do.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a895df2397e..b3895fd609f4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -950,11 +950,12 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 	if (online_type == MMOP_ONLINE_KEEP) {
 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
 		/*
-		 * MMOP_ONLINE_KEEP inherits the current zone which is
-		 * ZONE_NORMAL by default but we might be within ZONE_MOVABLE
-		 * already.
+		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
+		 * movable zone if that is not possible (e.g. we are within
+		 * or past the existing movable zone)
 		 */
-		if (zone_intersects(movable_zone, start_pfn, nr_pages))
+		if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
+					MMOP_ONLINE_KERNEL))
 			zone = movable_zone;
 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
 		zone = &pgdat->node_zones[ZONE_MOVABLE];
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
@ 2017-06-01  8:37   ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-01  8:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently
$ grep . memory3?/valid_zones
memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable

$ echo online_movable > memory34/state
$ grep . memory3?/valid_zones
memory34/valid_zones:Movable
memory35/valid_zones:Movable
memory36/valid_zones:Movable
memory37/valid_zones:Movable

$ echo online > memory36/state
$ grep . memory3?/valid_zones
memory34/valid_zones:Movable
memory36/valid_zones:Normal
memory37/valid_zones:Movable

so we have effectivelly punched a hole into the movable zone. The
problem is that move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
It only checks whether the given range is already part of the movable
zone which is not the case here as only memory34 is in the zone. Fix
this by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL) if that
is false then we can be sure that movable onlining is the right thing to
do.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a895df2397e..b3895fd609f4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -950,11 +950,12 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 	if (online_type == MMOP_ONLINE_KEEP) {
 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
 		/*
-		 * MMOP_ONLINE_KEEP inherits the current zone which is
-		 * ZONE_NORMAL by default but we might be within ZONE_MOVABLE
-		 * already.
+		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
+		 * movable zone if that is not possible (e.g. we are within
+		 * or past the existing movable zone)
 		 */
-		if (zone_intersects(movable_zone, start_pfn, nr_pages))
+		if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
+					MMOP_ONLINE_KERNEL))
 			zone = movable_zone;
 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
 		zone = &pgdat->node_zones[ZONE_MOVABLE];
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
  2017-06-01  8:37 ` Michal Hocko
@ 2017-06-01  8:37   ` Michal Hocko
  -1 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-01  8:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Heiko Carstens has noticed that he can generate overlapping zones for
ZONE_DMA and ZONE_NORMAL:
DMA      [mem 0x0000000000000000-0x000000007fffffff]
Normal   [mem 0x0000000080000000-0x000000017fffffff]

$ cat /sys/devices/system/memory/block_size_bytes
10000000
$ cat /sys/devices/system/memory/memory5/valid_zones
DMA
$ echo 0 > /sys/devices/system/memory/memory5/online
$ cat /sys/devices/system/memory/memory5/valid_zones
Normal
$ echo 1 > /sys/devices/system/memory/memory5/online
Normal

$ cat /proc/zoneinfo
Node 0, zone      DMA
spanned  524288        <-----
present  458752
managed  455078
start_pfn:           0 <-----

Node 0, zone   Normal
spanned  720896
present  589824
managed  571648
start_pfn:           327680 <-----

The reason is that we assume that the default zone for kernel onlining
is ZONE_NORMAL. This was a simplification introduced by the memory
hotplug rework and it is easily fixable by checking the range overlap in
the zone order and considering the first matching zone as the default
one. If there is no such zone then assume ZONE_NORMAL as we have been
doing so far.

Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/base/memory.c          |  2 +-
 include/linux/memory_hotplug.h |  2 ++
 mm/memory_hotplug.c            | 27 ++++++++++++++++++++++++---
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b86fda30ce62..c7c4e0325cdb 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev,
 
 	nid = pfn_to_nid(start_pfn);
 	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
-		strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name);
+		strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
 		append = true;
 	}
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 9e0249d0f5e4..ed167541e4fc 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 		int online_type);
+extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
+		unsigned long nr_pages);
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b3895fd609f4..a0348de3e18c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-	struct zone *normal_zone =  &pgdat->node_zones[ZONE_NORMAL];
+	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
 
 	/*
 	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
@@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 			return true;
 		return movable_zone->zone_start_pfn >= pfn + nr_pages;
 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		return zone_end_pfn(normal_zone) <= pfn;
+		return zone_end_pfn(default_zone) <= pfn;
 	}
 
 	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
 }
 
 /*
+ * Returns a default kernel memory zone for the given pfn range.
+ * If no kernel zone covers this pfn range it will automatically go
+ * to the ZONE_NORMAL.
+ */
+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+		unsigned long nr_pages)
+{
+	struct pglist_data *pgdat = NODE_DATA(nid);
+	int zid;
+
+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
+		struct zone *zone = &pgdat->node_zones[zid];
+
+		if (zone_intersects(zone, start_pfn, nr_pages))
+			return zone;
+	}
+
+	return &pgdat->node_zones[ZONE_NORMAL];
+}
+
+/*
  * Associates the given pfn range with the given node and the zone appropriate
  * for the given online type.
  */
@@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 		unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *zone = &pgdat->node_zones[ZONE_NORMAL];
+	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
 
 	if (online_type == MMOP_ONLINE_KEEP) {
 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
@ 2017-06-01  8:37   ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-01  8:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Heiko Carstens has noticed that he can generate overlapping zones for
ZONE_DMA and ZONE_NORMAL:
DMA      [mem 0x0000000000000000-0x000000007fffffff]
Normal   [mem 0x0000000080000000-0x000000017fffffff]

$ cat /sys/devices/system/memory/block_size_bytes
10000000
$ cat /sys/devices/system/memory/memory5/valid_zones
DMA
$ echo 0 > /sys/devices/system/memory/memory5/online
$ cat /sys/devices/system/memory/memory5/valid_zones
Normal
$ echo 1 > /sys/devices/system/memory/memory5/online
Normal

$ cat /proc/zoneinfo
Node 0, zone      DMA
spanned  524288        <-----
present  458752
managed  455078
start_pfn:           0 <-----

Node 0, zone   Normal
spanned  720896
present  589824
managed  571648
start_pfn:           327680 <-----

The reason is that we assume that the default zone for kernel onlining
is ZONE_NORMAL. This was a simplification introduced by the memory
hotplug rework and it is easily fixable by checking the range overlap in
the zone order and considering the first matching zone as the default
one. If there is no such zone then assume ZONE_NORMAL as we have been
doing so far.

Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/base/memory.c          |  2 +-
 include/linux/memory_hotplug.h |  2 ++
 mm/memory_hotplug.c            | 27 ++++++++++++++++++++++++---
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b86fda30ce62..c7c4e0325cdb 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev,
 
 	nid = pfn_to_nid(start_pfn);
 	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
-		strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name);
+		strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
 		append = true;
 	}
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 9e0249d0f5e4..ed167541e4fc 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 		int online_type);
+extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
+		unsigned long nr_pages);
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b3895fd609f4..a0348de3e18c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-	struct zone *normal_zone =  &pgdat->node_zones[ZONE_NORMAL];
+	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
 
 	/*
 	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
@@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 			return true;
 		return movable_zone->zone_start_pfn >= pfn + nr_pages;
 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
-		return zone_end_pfn(normal_zone) <= pfn;
+		return zone_end_pfn(default_zone) <= pfn;
 	}
 
 	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
 }
 
 /*
+ * Returns a default kernel memory zone for the given pfn range.
+ * If no kernel zone covers this pfn range it will automatically go
+ * to the ZONE_NORMAL.
+ */
+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+		unsigned long nr_pages)
+{
+	struct pglist_data *pgdat = NODE_DATA(nid);
+	int zid;
+
+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
+		struct zone *zone = &pgdat->node_zones[zid];
+
+		if (zone_intersects(zone, start_pfn, nr_pages))
+			return zone;
+	}
+
+	return &pgdat->node_zones[ZONE_NORMAL];
+}
+
+/*
  * Associates the given pfn range with the given node and the zone appropriate
  * for the given online type.
  */
@@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 		unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
-	struct zone *zone = &pgdat->node_zones[ZONE_NORMAL];
+	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
 
 	if (online_type == MMOP_ONLINE_KEEP) {
 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
  2017-06-01  8:37   ` Michal Hocko
@ 2017-06-01 12:32     ` Vlastimil Babka
  -1 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2017-06-01 12:32 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, Mel Gorman, Andrea Arcangeli, Jerome Glisse,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Andi Kleen, David Rientjes, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens, LKML,
	Michal Hocko

On 06/01/2017 10:37 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently
> $ grep . memory3?/valid_zones
> memory34/valid_zones:Normal Movable
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Normal Movable
> 
> $ echo online_movable > memory34/state
> $ grep . memory3?/valid_zones
> memory34/valid_zones:Movable
> memory35/valid_zones:Movable
> memory36/valid_zones:Movable
> memory37/valid_zones:Movable
> 
> $ echo online > memory36/state
> $ grep . memory3?/valid_zones
> memory34/valid_zones:Movable
> memory36/valid_zones:Normal
> memory37/valid_zones:Movable
> 
> so we have effectivelly punched a hole into the movable zone. The
> problem is that move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
> It only checks whether the given range is already part of the movable
> zone which is not the case here as only memory34 is in the zone. Fix
> this by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL) if that
> is false then we can be sure that movable onlining is the right thing to
> do.
> 
> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"

Just fold it there before sending to Linus, right?

> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/memory_hotplug.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a895df2397e..b3895fd609f4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -950,11 +950,12 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
>  	if (online_type == MMOP_ONLINE_KEEP) {
>  		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>  		/*
> -		 * MMOP_ONLINE_KEEP inherits the current zone which is
> -		 * ZONE_NORMAL by default but we might be within ZONE_MOVABLE
> -		 * already.
> +		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
> +		 * movable zone if that is not possible (e.g. we are within
> +		 * or past the existing movable zone)
>  		 */
> -		if (zone_intersects(movable_zone, start_pfn, nr_pages))
> +		if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
> +					MMOP_ONLINE_KERNEL))
>  			zone = movable_zone;
>  	} else if (online_type == MMOP_ONLINE_MOVABLE) {
>  		zone = &pgdat->node_zones[ZONE_MOVABLE];
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
@ 2017-06-01 12:32     ` Vlastimil Babka
  0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2017-06-01 12:32 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, Mel Gorman, Andrea Arcangeli, Jerome Glisse,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Andi Kleen, David Rientjes, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens, LKML,
	Michal Hocko

On 06/01/2017 10:37 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently
> $ grep . memory3?/valid_zones
> memory34/valid_zones:Normal Movable
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Normal Movable
> 
> $ echo online_movable > memory34/state
> $ grep . memory3?/valid_zones
> memory34/valid_zones:Movable
> memory35/valid_zones:Movable
> memory36/valid_zones:Movable
> memory37/valid_zones:Movable
> 
> $ echo online > memory36/state
> $ grep . memory3?/valid_zones
> memory34/valid_zones:Movable
> memory36/valid_zones:Normal
> memory37/valid_zones:Movable
> 
> so we have effectivelly punched a hole into the movable zone. The
> problem is that move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
> It only checks whether the given range is already part of the movable
> zone which is not the case here as only memory34 is in the zone. Fix
> this by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL) if that
> is false then we can be sure that movable onlining is the right thing to
> do.
> 
> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"

Just fold it there before sending to Linus, right?

> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/memory_hotplug.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a895df2397e..b3895fd609f4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -950,11 +950,12 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
>  	if (online_type == MMOP_ONLINE_KEEP) {
>  		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>  		/*
> -		 * MMOP_ONLINE_KEEP inherits the current zone which is
> -		 * ZONE_NORMAL by default but we might be within ZONE_MOVABLE
> -		 * already.
> +		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
> +		 * movable zone if that is not possible (e.g. we are within
> +		 * or past the existing movable zone)
>  		 */
> -		if (zone_intersects(movable_zone, start_pfn, nr_pages))
> +		if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
> +					MMOP_ONLINE_KERNEL))
>  			zone = movable_zone;
>  	} else if (online_type == MMOP_ONLINE_MOVABLE) {
>  		zone = &pgdat->node_zones[ZONE_MOVABLE];
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
  2017-06-01 12:32     ` Vlastimil Babka
@ 2017-06-01 12:40       ` Michal Hocko
  -1 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-01 12:40 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, Mel Gorman, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens,
	LKML

On Thu 01-06-17 14:32:42, Vlastimil Babka wrote:
> On 06/01/2017 10:37 AM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently
> > $ grep . memory3?/valid_zones
> > memory34/valid_zones:Normal Movable
> > memory35/valid_zones:Normal Movable
> > memory36/valid_zones:Normal Movable
> > memory37/valid_zones:Normal Movable
> > 
> > $ echo online_movable > memory34/state
> > $ grep . memory3?/valid_zones
> > memory34/valid_zones:Movable
> > memory35/valid_zones:Movable
> > memory36/valid_zones:Movable
> > memory37/valid_zones:Movable
> > 
> > $ echo online > memory36/state
> > $ grep . memory3?/valid_zones
> > memory34/valid_zones:Movable
> > memory36/valid_zones:Normal
> > memory37/valid_zones:Movable
> > 
> > so we have effectivelly punched a hole into the movable zone. The
> > problem is that move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
> > It only checks whether the given range is already part of the movable
> > zone which is not the case here as only memory34 is in the zone. Fix
> > this by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL) if that
> > is false then we can be sure that movable onlining is the right thing to
> > do.
> > 
> > Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
> 
> Just fold it there before sending to Linus, right?

I do not have a strong preference. The changelog could still be helpful
for reference. The original patch is quite large and details like this
are likely to get lost there.

> 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
@ 2017-06-01 12:40       ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-01 12:40 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, Mel Gorman, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens,
	LKML

On Thu 01-06-17 14:32:42, Vlastimil Babka wrote:
> On 06/01/2017 10:37 AM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently
> > $ grep . memory3?/valid_zones
> > memory34/valid_zones:Normal Movable
> > memory35/valid_zones:Normal Movable
> > memory36/valid_zones:Normal Movable
> > memory37/valid_zones:Normal Movable
> > 
> > $ echo online_movable > memory34/state
> > $ grep . memory3?/valid_zones
> > memory34/valid_zones:Movable
> > memory35/valid_zones:Movable
> > memory36/valid_zones:Movable
> > memory37/valid_zones:Movable
> > 
> > $ echo online > memory36/state
> > $ grep . memory3?/valid_zones
> > memory34/valid_zones:Movable
> > memory36/valid_zones:Normal
> > memory37/valid_zones:Movable
> > 
> > so we have effectivelly punched a hole into the movable zone. The
> > problem is that move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
> > It only checks whether the given range is already part of the movable
> > zone which is not the case here as only memory34 is in the zone. Fix
> > this by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL) if that
> > is false then we can be sure that movable onlining is the right thing to
> > do.
> > 
> > Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
> 
> Just fold it there before sending to Linus, right?

I do not have a strong preference. The changelog could still be helpful
for reference. The original patch is quite large and details like this
are likely to get lost there.

> 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
  2017-06-01  8:37   ` Michal Hocko
@ 2017-06-01 12:57     ` Vlastimil Babka
  -1 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2017-06-01 12:57 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, Mel Gorman, Andrea Arcangeli, Jerome Glisse,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Andi Kleen, David Rientjes, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens, LKML,
	Michal Hocko

On 06/01/2017 10:37 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Heiko Carstens has noticed that he can generate overlapping zones for
> ZONE_DMA and ZONE_NORMAL:
> DMA      [mem 0x0000000000000000-0x000000007fffffff]
> Normal   [mem 0x0000000080000000-0x000000017fffffff]
> 
> $ cat /sys/devices/system/memory/block_size_bytes
> 10000000
> $ cat /sys/devices/system/memory/memory5/valid_zones
> DMA
> $ echo 0 > /sys/devices/system/memory/memory5/online
> $ cat /sys/devices/system/memory/memory5/valid_zones
> Normal
> $ echo 1 > /sys/devices/system/memory/memory5/online
> Normal
> 
> $ cat /proc/zoneinfo
> Node 0, zone      DMA
> spanned  524288        <-----
> present  458752
> managed  455078
> start_pfn:           0 <-----
> 
> Node 0, zone   Normal
> spanned  720896
> present  589824
> managed  571648
> start_pfn:           327680 <-----
> 
> The reason is that we assume that the default zone for kernel onlining
> is ZONE_NORMAL. This was a simplification introduced by the memory
> hotplug rework and it is easily fixable by checking the range overlap in
> the zone order and considering the first matching zone as the default
> one. If there is no such zone then assume ZONE_NORMAL as we have been
> doing so far.
> 
> Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
@ 2017-06-01 12:57     ` Vlastimil Babka
  0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2017-06-01 12:57 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, Mel Gorman, Andrea Arcangeli, Jerome Glisse,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Kani Toshimitsu, slaoub,
	Joonsoo Kim, Andi Kleen, David Rientjes, Daniel Kiper,
	Igor Mammedov, Vitaly Kuznetsov, Heiko Carstens, LKML,
	Michal Hocko

On 06/01/2017 10:37 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Heiko Carstens has noticed that he can generate overlapping zones for
> ZONE_DMA and ZONE_NORMAL:
> DMA      [mem 0x0000000000000000-0x000000007fffffff]
> Normal   [mem 0x0000000080000000-0x000000017fffffff]
> 
> $ cat /sys/devices/system/memory/block_size_bytes
> 10000000
> $ cat /sys/devices/system/memory/memory5/valid_zones
> DMA
> $ echo 0 > /sys/devices/system/memory/memory5/online
> $ cat /sys/devices/system/memory/memory5/valid_zones
> Normal
> $ echo 1 > /sys/devices/system/memory/memory5/online
> Normal
> 
> $ cat /proc/zoneinfo
> Node 0, zone      DMA
> spanned  524288        <-----
> present  458752
> managed  455078
> start_pfn:           0 <-----
> 
> Node 0, zone   Normal
> spanned  720896
> present  589824
> managed  571648
> start_pfn:           327680 <-----
> 
> The reason is that we assume that the default zone for kernel onlining
> is ZONE_NORMAL. This was a simplification introduced by the memory
> hotplug rework and it is easily fixable by checking the range overlap in
> the zone order and considering the first matching zone as the default
> one. If there is no such zone then assume ZONE_NORMAL as we have been
> doing so far.
> 
> Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
  2017-06-01  8:37   ` Michal Hocko
  (?)
  (?)
@ 2017-06-22  2:32   ` Wei Yang
  2017-06-22 18:16       ` Michal Hocko
  -1 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2017-06-22  2:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen,
	David Rientjes, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Heiko Carstens, LKML, Michal Hocko

[-- Attachment #1: Type: text/plain, Size: 6116 bytes --]

On Thu, Jun 01, 2017 at 10:37:46AM +0200, Michal Hocko wrote:
>From: Michal Hocko <mhocko@suse.com>
>
>Heiko Carstens has noticed that he can generate overlapping zones for
>ZONE_DMA and ZONE_NORMAL:
>DMA      [mem 0x0000000000000000-0x000000007fffffff]
>Normal   [mem 0x0000000080000000-0x000000017fffffff]
>
>$ cat /sys/devices/system/memory/block_size_bytes
>10000000
>$ cat /sys/devices/system/memory/memory5/valid_zones
>DMA
>$ echo 0 > /sys/devices/system/memory/memory5/online
>$ cat /sys/devices/system/memory/memory5/valid_zones
>Normal
>$ echo 1 > /sys/devices/system/memory/memory5/online
>Normal
>
>$ cat /proc/zoneinfo
>Node 0, zone      DMA
>spanned  524288        <-----
>present  458752
>managed  455078
>start_pfn:           0 <-----
>
>Node 0, zone   Normal
>spanned  720896
>present  589824
>managed  571648
>start_pfn:           327680 <-----
>
>The reason is that we assume that the default zone for kernel onlining
>is ZONE_NORMAL. This was a simplification introduced by the memory
>hotplug rework and it is easily fixable by checking the range overlap in
>the zone order and considering the first matching zone as the default
>one. If there is no such zone then assume ZONE_NORMAL as we have been
>doing so far.
>
>Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
>Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>Signed-off-by: Michal Hocko <mhocko@suse.com>
>---
> drivers/base/memory.c          |  2 +-
> include/linux/memory_hotplug.h |  2 ++
> mm/memory_hotplug.c            | 27 ++++++++++++++++++++++++---
> 3 files changed, 27 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index b86fda30ce62..c7c4e0325cdb 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev,
> 
> 	nid = pfn_to_nid(start_pfn);
> 	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
>-		strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name);
>+		strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
> 		append = true;
> 	}
> 
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 9e0249d0f5e4..ed167541e4fc 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> 					  unsigned long pnum);
> extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> 		int online_type);
>+extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
>+		unsigned long nr_pages);
> #endif /* __LINUX_MEMORY_HOTPLUG_H */
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index b3895fd609f4..a0348de3e18c 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
> 	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>-	struct zone *normal_zone =  &pgdat->node_zones[ZONE_NORMAL];
>+	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
> 
> 	/*
> 	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
>@@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> 			return true;
> 		return movable_zone->zone_start_pfn >= pfn + nr_pages;
> 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
>-		return zone_end_pfn(normal_zone) <= pfn;
>+		return zone_end_pfn(default_zone) <= pfn;
> 	}
> 
> 	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
>@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> }
> 
> /*
>+ * Returns a default kernel memory zone for the given pfn range.
>+ * If no kernel zone covers this pfn range it will automatically go
>+ * to the ZONE_NORMAL.
>+ */
>+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>+		unsigned long nr_pages)
>+{
>+	struct pglist_data *pgdat = NODE_DATA(nid);
>+	int zid;
>+
>+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
>+		struct zone *zone = &pgdat->node_zones[zid];
>+
>+		if (zone_intersects(zone, start_pfn, nr_pages))
>+			return zone;
>+	}
>+
>+	return &pgdat->node_zones[ZONE_NORMAL];
>+}

Hmm... a corner case jumped into my mind which may invalidate this
calculation.

The case is:


       Zone:         | DMA   | DMA32      | NORMAL       |
                     v       v            v              v
       
       Phy mem:      [           ]     [                  ]
       
                     ^           ^     ^                  ^
       Node:         |   Node0   |     |      Node1       |
                             A   B     C  D


The key point is
1. There is a hole between Node0 and Node1
2. The hole sits in a non-normal zone

Let's mark the boundary as A, B, C, D. Then we would have
node0->zone[dma21] = [A, B]
node1->zone[dma32] = [C, D]

If we want to hotplug a range in [B, C] on node0, it looks not that bad. While
if we want to hotplug a range in [B, C] on node1, it will introduce the
overlapped zone. Because the range [B, C] intersects none of the existing
zones on node1.

Do you think this is possible?

>+
>+/*
>  * Associates the given pfn range with the given node and the zone appropriate
>  * for the given online type.
>  */
>@@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
> 		unsigned long start_pfn, unsigned long nr_pages)
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
>-	struct zone *zone = &pgdat->node_zones[ZONE_NORMAL];
>+	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
> 
> 	if (online_type == MMOP_ONLINE_KEEP) {
> 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>-- 
>2.11.0

-- 
Wei Yang
Help you, Help me

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
  2017-06-22  2:32   ` Wei Yang
@ 2017-06-22 18:16       ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-22 18:16 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, linux-mm, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen,
	David Rientjes, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Heiko Carstens, LKML

[Again, please try to trim your quoted response to the minimum]

On Thu 22-06-17 10:32:43, Wei Yang wrote:
> On Thu, Jun 01, 2017 at 10:37:46AM +0200, Michal Hocko wrote:
[...]
> >@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> > }
> > 
> > /*
> >+ * Returns a default kernel memory zone for the given pfn range.
> >+ * If no kernel zone covers this pfn range it will automatically go
> >+ * to the ZONE_NORMAL.
> >+ */
> >+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> >+		unsigned long nr_pages)
> >+{
> >+	struct pglist_data *pgdat = NODE_DATA(nid);
> >+	int zid;
> >+
> >+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
> >+		struct zone *zone = &pgdat->node_zones[zid];
> >+
> >+		if (zone_intersects(zone, start_pfn, nr_pages))
> >+			return zone;
> >+	}
> >+
> >+	return &pgdat->node_zones[ZONE_NORMAL];
> >+}
> 
> Hmm... a corner case jumped into my mind which may invalidate this
> calculation.
> 
> The case is:
> 
> 
>        Zone:         | DMA   | DMA32      | NORMAL       |
>                      v       v            v              v
>        
>        Phy mem:      [           ]     [                  ]
>        
>                      ^           ^     ^                  ^
>        Node:         |   Node0   |     |      Node1       |
>                              A   B     C  D
> 
> 
> The key point is
> 1. There is a hole between Node0 and Node1
> 2. The hole sits in a non-normal zone
> 
> Let's mark the boundary as A, B, C, D. Then we would have
> node0->zone[dma21] = [A, B]
> node1->zone[dma32] = [C, D]
> 
> If we want to hotplug a range in [B, C] on node0, it looks not that bad. While
> if we want to hotplug a range in [B, C] on node1, it will introduce the
> overlapped zone. Because the range [B, C] intersects none of the existing
> zones on node1.
> 
> Do you think this is possible?

Yes, it is possible. I would be much more more surprised if it was real
as well. Fixing that would require to use arch_zone_{lowest,highest}_possible_pfn
which is not available after init section disappears and I am not even
sure we should care. I would rather wait for a real life example of such
a configuration to fix it.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
@ 2017-06-22 18:16       ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-06-22 18:16 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, linux-mm, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen,
	David Rientjes, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Heiko Carstens, LKML

[Again, please try to trim your quoted response to the minimum]

On Thu 22-06-17 10:32:43, Wei Yang wrote:
> On Thu, Jun 01, 2017 at 10:37:46AM +0200, Michal Hocko wrote:
[...]
> >@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> > }
> > 
> > /*
> >+ * Returns a default kernel memory zone for the given pfn range.
> >+ * If no kernel zone covers this pfn range it will automatically go
> >+ * to the ZONE_NORMAL.
> >+ */
> >+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> >+		unsigned long nr_pages)
> >+{
> >+	struct pglist_data *pgdat = NODE_DATA(nid);
> >+	int zid;
> >+
> >+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
> >+		struct zone *zone = &pgdat->node_zones[zid];
> >+
> >+		if (zone_intersects(zone, start_pfn, nr_pages))
> >+			return zone;
> >+	}
> >+
> >+	return &pgdat->node_zones[ZONE_NORMAL];
> >+}
> 
> Hmm... a corner case jumped into my mind which may invalidate this
> calculation.
> 
> The case is:
> 
> 
>        Zone:         | DMA   | DMA32      | NORMAL       |
>                      v       v            v              v
>        
>        Phy mem:      [           ]     [                  ]
>        
>                      ^           ^     ^                  ^
>        Node:         |   Node0   |     |      Node1       |
>                              A   B     C  D
> 
> 
> The key point is
> 1. There is a hole between Node0 and Node1
> 2. The hole sits in a non-normal zone
> 
> Let's mark the boundary as A, B, C, D. Then we would have
> node0->zone[dma21] = [A, B]
> node1->zone[dma32] = [C, D]
> 
> If we want to hotplug a range in [B, C] on node0, it looks not that bad. While
> if we want to hotplug a range in [B, C] on node1, it will introduce the
> overlapped zone. Because the range [B, C] intersects none of the existing
> zones on node1.
> 
> Do you think this is possible?

Yes, it is possible. I would be much more more surprised if it was real
as well. Fixing that would require to use arch_zone_{lowest,highest}_possible_pfn
which is not available after init section disappears and I am not even
sure we should care. I would rather wait for a real life example of such
a configuration to fix it.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
  2017-06-22 18:16       ` Michal Hocko
  (?)
@ 2017-06-23  1:37       ` Wei Yang
  -1 siblings, 0 replies; 16+ messages in thread
From: Wei Yang @ 2017-06-23  1:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, Andrew Morton, linux-mm, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen,
	David Rientjes, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov,
	Heiko Carstens, LKML

[-- Attachment #1: Type: text/plain, Size: 1632 bytes --]

On Thu, Jun 22, 2017 at 08:16:57PM +0200, Michal Hocko wrote:
>> 
>> Hmm... a corner case jumped into my mind which may invalidate this
>> calculation.
>> 
>> The case is:
>> 
>> 
>>        Zone:         | DMA   | DMA32      | NORMAL       |
>>                      v       v            v              v
>>        
>>        Phy mem:      [           ]     [                  ]
>>        
>>                      ^           ^     ^                  ^
>>        Node:         |   Node0   |     |      Node1       |
>>                              A   B     C  D
>> 
>> 
>> The key point is
>> 1. There is a hole between Node0 and Node1
>> 2. The hole sits in a non-normal zone
>> 
>> Let's mark the boundary as A, B, C, D. Then we would have
>> node0->zone[dma21] = [A, B]
>> node1->zone[dma32] = [C, D]
>> 
>> If we want to hotplug a range in [B, C] on node0, it looks not that bad. While
>> if we want to hotplug a range in [B, C] on node1, it will introduce the
>> overlapped zone. Because the range [B, C] intersects none of the existing
>> zones on node1.
>> 
>> Do you think this is possible?
>
>Yes, it is possible. I would be much more more surprised if it was real
>as well. Fixing that would require to use arch_zone_{lowest,highest}_possible_pfn
>which is not available after init section disappears and I am not even
>sure we should care. I would rather wait for a real life example of such
>a configuration to fix it.

Yep, not easy to fix, so wait for real case.

Or possible to add a line in commit log?

>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-06-23  1:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-01  8:37 [PATCH 0/2] memory hotplug follow up fixes Michal Hocko
2017-06-01  8:37 ` Michal Hocko
2017-06-01  8:37 ` [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior Michal Hocko
2017-06-01  8:37   ` Michal Hocko
2017-06-01 12:32   ` Vlastimil Babka
2017-06-01 12:32     ` Vlastimil Babka
2017-06-01 12:40     ` Michal Hocko
2017-06-01 12:40       ` Michal Hocko
2017-06-01  8:37 ` [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone Michal Hocko
2017-06-01  8:37   ` Michal Hocko
2017-06-01 12:57   ` Vlastimil Babka
2017-06-01 12:57     ` Vlastimil Babka
2017-06-22  2:32   ` Wei Yang
2017-06-22 18:16     ` Michal Hocko
2017-06-22 18:16       ` Michal Hocko
2017-06-23  1:37       ` Wei Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.