[PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
@ 2016-07-22  2:57 ` Xishi Qiu
  0 siblings, 0 replies; 14+ messages in thread
From: Xishi Qiu @ 2016-07-22  2:57 UTC (permalink / raw)
  To: Joonsoo Kim, Vlastimil Babka, Andrew Morton, Naoya Horiguchi,
	David Rientjes
  Cc: Linux MM, LKML

Memory offline could happen on both movable zone and non-movable zone.
We can offline the whole node if the zone is movable zone, and if the
zone is non-movable zone, we cannot offline the whole node, because
some kernel memory can't be migrated.

So if we offline a node with movable zone, use prefer mempolicy to alloc
new page from the next node instead of the current node or other remote
nodes, because re-migrate is a waste of time and the distance of the
remote nodes is often very large.

Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
zone.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e3cbdca..930a5c6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
 	return 0;
 }
 
+static struct page *new_node_page(struct page *page, unsigned long node,
+		int **result)
+{
+	if (PageHuge(page))
+		return alloc_huge_page_node(page_hstate(compound_head(page)),
+					node);
+	else
+		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
+}
+
 #define NR_OFFLINE_AT_ONCE_PAGES	(256)
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
@@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
 	int not_managed = 0;
 	int ret = 0;
+	int nid = NUMA_NO_NODE;
 	LIST_HEAD(source);
 
 	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
@@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			goto out;
 		}
 
-		/*
-		 * alloc_migrate_target should be improooooved!!
-		 * migrate_pages returns # of failed pages.
-		 */
-		ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
-					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+			if (!pfn_valid(pfn))
+				continue;
+			page = pfn_to_page(pfn);
+			if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
+				nid = next_node_in(page_to_nid(page),
+						node_online_map);
+			break;
+		}
+
+		/* Alloc new page from the next node if possible */
+		if (nid != NUMA_NO_NODE)
+			ret = migrate_pages(&source, new_node_page, NULL,
+					nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+		else
+			ret = migrate_pages(&source, alloc_migrate_target, NULL,
+					0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+
 		if (ret)
 			putback_movable_pages(&source);
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
@ 2016-07-22  2:57 ` Xishi Qiu
  0 siblings, 0 replies; 14+ messages in thread
From: Xishi Qiu @ 2016-07-22  2:57 UTC (permalink / raw)
  To: Joonsoo Kim, Vlastimil Babka, Andrew Morton, Naoya Horiguchi,
	David Rientjes
  Cc: Linux MM, LKML

Memory offline could happen on both movable zone and non-movable zone.
We can offline the whole node if the zone is movable zone, and if the
zone is non-movable zone, we cannot offline the whole node, because
some kernel memory can't be migrated.

So if we offline a node with movable zone, use prefer mempolicy to alloc
new page from the next node instead of the current node or other remote
nodes, because re-migrate is a waste of time and the distance of the
remote nodes is often very large.

Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
zone.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
 mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e3cbdca..930a5c6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
 	return 0;
 }
 
+static struct page *new_node_page(struct page *page, unsigned long node,
+		int **result)
+{
+	if (PageHuge(page))
+		return alloc_huge_page_node(page_hstate(compound_head(page)),
+					node);
+	else
+		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
+}
+
 #define NR_OFFLINE_AT_ONCE_PAGES	(256)
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
@@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
 	int not_managed = 0;
 	int ret = 0;
+	int nid = NUMA_NO_NODE;
 	LIST_HEAD(source);
 
 	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
@@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			goto out;
 		}
 
-		/*
-		 * alloc_migrate_target should be improooooved!!
-		 * migrate_pages returns # of failed pages.
-		 */
-		ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
-					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+			if (!pfn_valid(pfn))
+				continue;
+			page = pfn_to_page(pfn);
+			if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
+				nid = next_node_in(page_to_nid(page),
+						node_online_map);
+			break;
+		}
+
+		/* Alloc new page from the next node if possible */
+		if (nid != NUMA_NO_NODE)
+			ret = migrate_pages(&source, new_node_page, NULL,
+					nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+		else
+			ret = migrate_pages(&source, alloc_migrate_target, NULL,
+					0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+
 		if (ret)
 			putback_movable_pages(&source);
 	}
-- 
1.8.3.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
  2016-07-22  2:57 ` Xishi Qiu
@ 2016-07-22 20:11   ` Andrew Morton
  -1 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2016-07-22 20:11 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Joonsoo Kim, Vlastimil Babka, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On Fri, 22 Jul 2016 10:57:48 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote:

> Memory offline could happen on both movable zone and non-movable zone.
> We can offline the whole node if the zone is movable zone, and if the
> zone is non-movable zone, we cannot offline the whole node, because
> some kernel memory can't be migrated.
> 
> So if we offline a node with movable zone, use prefer mempolicy to alloc
> new page from the next node instead of the current node or other remote
> nodes, because re-migrate is a waste of time and the distance of the
> remote nodes is often very large.
> 
> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
> zone.

This conflicts pretty significantly with your "mem-hotplug: use
different mempolicy in alloc_migrate_target()".  Does it replace
"mem-hotplug: use different mempolicy in alloc_migrate_target()" and
your "mem-hotplug: use GFP_HIGHUSER_MOVABLE in,
alloc_migrate_target()", or what?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
@ 2016-07-22 20:11   ` Andrew Morton
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2016-07-22 20:11 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Joonsoo Kim, Vlastimil Babka, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On Fri, 22 Jul 2016 10:57:48 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote:

> Memory offline could happen on both movable zone and non-movable zone.
> We can offline the whole node if the zone is movable zone, and if the
> zone is non-movable zone, we cannot offline the whole node, because
> some kernel memory can't be migrated.
> 
> So if we offline a node with movable zone, use prefer mempolicy to alloc
> new page from the next node instead of the current node or other remote
> nodes, because re-migrate is a waste of time and the distance of the
> remote nodes is often very large.
> 
> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
> zone.

This conflicts pretty significantly with your "mem-hotplug: use
different mempolicy in alloc_migrate_target()".  Does it replace
"mem-hotplug: use different mempolicy in alloc_migrate_target()" and
your "mem-hotplug: use GFP_HIGHUSER_MOVABLE in,
alloc_migrate_target()", or what?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
  2016-07-22 20:11   ` Andrew Morton
@ 2016-07-25  1:52     ` Xishi Qiu
  -1 siblings, 0 replies; 14+ messages in thread
From: Xishi Qiu @ 2016-07-25  1:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, Vlastimil Babka, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On 2016/7/23 4:11, Andrew Morton wrote:

> On Fri, 22 Jul 2016 10:57:48 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote:
> 
>> Memory offline could happen on both movable zone and non-movable zone.
>> We can offline the whole node if the zone is movable zone, and if the
>> zone is non-movable zone, we cannot offline the whole node, because
>> some kernel memory can't be migrated.
>>
>> So if we offline a node with movable zone, use prefer mempolicy to alloc
>> new page from the next node instead of the current node or other remote
>> nodes, because re-migrate is a waste of time and the distance of the
>> remote nodes is often very large.
>>
>> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
>> zone.
> 
> This conflicts pretty significantly with your "mem-hotplug: use
> different mempolicy in alloc_migrate_target()".  Does it replace
> "mem-hotplug: use different mempolicy in alloc_migrate_target()" and
> your "mem-hotplug: use GFP_HIGHUSER_MOVABLE in,
> alloc_migrate_target()", or what?
> 

Hi Andrew,

Yes, this patch is v2, "mem-hotplug: use different mempolicy in alloc_migrate_target()"
and "mem-hotplug: use GFP_HIGHUSER_MOVABLE in alloc_migrate_target()" are v1,
so just replace them.

Joonsoo and Vlastimil point that migratable pages are not always from user space,
so it is not correct to use GFP_HIGHUSER_MOVABLE in alloc_migrate_target().

David points that CMA and memory offline are distinct usecases and probably 
deserve their own callbacks.

Thanks,
Xishi Qiu

> 
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
@ 2016-07-25  1:52     ` Xishi Qiu
  0 siblings, 0 replies; 14+ messages in thread
From: Xishi Qiu @ 2016-07-25  1:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, Vlastimil Babka, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On 2016/7/23 4:11, Andrew Morton wrote:

> On Fri, 22 Jul 2016 10:57:48 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote:
> 
>> Memory offline could happen on both movable zone and non-movable zone.
>> We can offline the whole node if the zone is movable zone, and if the
>> zone is non-movable zone, we cannot offline the whole node, because
>> some kernel memory can't be migrated.
>>
>> So if we offline a node with movable zone, use prefer mempolicy to alloc
>> new page from the next node instead of the current node or other remote
>> nodes, because re-migrate is a waste of time and the distance of the
>> remote nodes is often very large.
>>
>> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
>> zone.
> 
> This conflicts pretty significantly with your "mem-hotplug: use
> different mempolicy in alloc_migrate_target()".  Does it replace
> "mem-hotplug: use different mempolicy in alloc_migrate_target()" and
> your "mem-hotplug: use GFP_HIGHUSER_MOVABLE in,
> alloc_migrate_target()", or what?
> 

Hi Andrew,

Yes, this patch is v2, "mem-hotplug: use different mempolicy in alloc_migrate_target()"
and "mem-hotplug: use GFP_HIGHUSER_MOVABLE in alloc_migrate_target()" are v1,
so just replace them.

Joonsoo and Vlastimil point that migratable pages are not always from user space,
so it is not correct to use GFP_HIGHUSER_MOVABLE in alloc_migrate_target().

David points that CMA and memory offline are distinct usecases and probably 
deserve their own callbacks.

Thanks,
Xishi Qiu

> 
> .
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
  2016-07-22  2:57 ` Xishi Qiu
@ 2016-07-25  6:59   ` Vlastimil Babka
  -1 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2016-07-25  6:59 UTC (permalink / raw)
  To: Xishi Qiu, Joonsoo Kim, Andrew Morton, Naoya Horiguchi, David Rientjes
  Cc: Linux MM, LKML

On 07/22/2016 04:57 AM, Xishi Qiu wrote:
> Memory offline could happen on both movable zone and non-movable zone.
> We can offline the whole node if the zone is movable zone, and if the
> zone is non-movable zone, we cannot offline the whole node, because
> some kernel memory can't be migrated.
>
> So if we offline a node with movable zone, use prefer mempolicy to alloc
> new page from the next node instead of the current node or other remote
> nodes, because re-migrate is a waste of time and the distance of the
> remote nodes is often very large.
>
> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
> zone.
>
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>

I think this could be simpler, if you preferred the next node regardless 
of whether it's movable zone or not. What are use cases for trying to 
offline part of non-MOVABLE zone in a node? It's not guaranteed to 
succeed anyway. Also if the reasoning is that the non-MOVABLE offlining 
preference for migration target should be instead on the *same* node, 
then alloc_migrate_target() would anyway prefer the node of the current 
CPU that happens to execute the offlining, which is random wrt the node 
in question. So consistently choosing remote node is IMHO better than 
random even for non-MOVABLE zone.

> ---
>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>  1 file changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e3cbdca..930a5c6 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>  	return 0;
>  }
>
> +static struct page *new_node_page(struct page *page, unsigned long node,
> +		int **result)
> +{
> +	if (PageHuge(page))
> +		return alloc_huge_page_node(page_hstate(compound_head(page)),
> +					node);
> +	else
> +		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);

You could just test for page in movable (or highmem?) zone here in the 
callback.

> +}
> +
>  #define NR_OFFLINE_AT_ONCE_PAGES	(256)
>  static int
>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>  	int not_managed = 0;
>  	int ret = 0;
> +	int nid = NUMA_NO_NODE;
>  	LIST_HEAD(source);
>
>  	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			goto out;
>  		}
>
> -		/*
> -		 * alloc_migrate_target should be improooooved!!
> -		 * migrate_pages returns # of failed pages.
> -		 */
> -		ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
> -					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> +		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +			if (!pfn_valid(pfn))
> +				continue;
> +			page = pfn_to_page(pfn);
> +			if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
> +				nid = next_node_in(page_to_nid(page),
> +						node_online_map);
> +			break;
> +		}

Then you could remove the ZONE_MOVABLE check here. I'm not sure how much 
worth the precalculation of nid is, if it has to be a rather complicated 
code like this, hm.

Also, since we know that "next node in node_online_map" is in fact not 
optimal, what about using the opportunity to really try the best 
possible way? Maybe it's as simple as allocating via 
__alloc_pages_nodemask() with current node's zonelist (where remote 
nodes should be already sorted according to NUMA distance), but with 
current node (which would be first in the zonelist) removed from the 
nodemask so that it's skipped over? But check if memory offlining 
process didn't kill the zonelist already at this point, or something.

> +
> +		/* Alloc new page from the next node if possible */
> +		if (nid != NUMA_NO_NODE)
> +			ret = migrate_pages(&source, new_node_page, NULL,
> +					nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> +		else
> +			ret = migrate_pages(&source, alloc_migrate_target, NULL,
> +					0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);

Please just use one new callback fully tailored for memory offline, 
instead of choosing between the two like this.

>  		if (ret)
>  			putback_movable_pages(&source);
>  	}
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
@ 2016-07-25  6:59   ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2016-07-25  6:59 UTC (permalink / raw)
  To: Xishi Qiu, Joonsoo Kim, Andrew Morton, Naoya Horiguchi, David Rientjes
  Cc: Linux MM, LKML

On 07/22/2016 04:57 AM, Xishi Qiu wrote:
> Memory offline could happen on both movable zone and non-movable zone.
> We can offline the whole node if the zone is movable zone, and if the
> zone is non-movable zone, we cannot offline the whole node, because
> some kernel memory can't be migrated.
>
> So if we offline a node with movable zone, use prefer mempolicy to alloc
> new page from the next node instead of the current node or other remote
> nodes, because re-migrate is a waste of time and the distance of the
> remote nodes is often very large.
>
> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
> zone.
>
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>

I think this could be simpler, if you preferred the next node regardless 
of whether it's movable zone or not. What are use cases for trying to 
offline part of non-MOVABLE zone in a node? It's not guaranteed to 
succeed anyway. Also if the reasoning is that the non-MOVABLE offlining 
preference for migration target should be instead on the *same* node, 
then alloc_migrate_target() would anyway prefer the node of the current 
CPU that happens to execute the offlining, which is random wrt the node 
in question. So consistently choosing remote node is IMHO better than 
random even for non-MOVABLE zone.

> ---
>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>  1 file changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e3cbdca..930a5c6 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>  	return 0;
>  }
>
> +static struct page *new_node_page(struct page *page, unsigned long node,
> +		int **result)
> +{
> +	if (PageHuge(page))
> +		return alloc_huge_page_node(page_hstate(compound_head(page)),
> +					node);
> +	else
> +		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);

You could just test for page in movable (or highmem?) zone here in the 
callback.

> +}
> +
>  #define NR_OFFLINE_AT_ONCE_PAGES	(256)
>  static int
>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>  	int not_managed = 0;
>  	int ret = 0;
> +	int nid = NUMA_NO_NODE;
>  	LIST_HEAD(source);
>
>  	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			goto out;
>  		}
>
> -		/*
> -		 * alloc_migrate_target should be improooooved!!
> -		 * migrate_pages returns # of failed pages.
> -		 */
> -		ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
> -					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> +		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +			if (!pfn_valid(pfn))
> +				continue;
> +			page = pfn_to_page(pfn);
> +			if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
> +				nid = next_node_in(page_to_nid(page),
> +						node_online_map);
> +			break;
> +		}

Then you could remove the ZONE_MOVABLE check here. I'm not sure how much 
worth the precalculation of nid is, if it has to be a rather complicated 
code like this, hm.

Also, since we know that "next node in node_online_map" is in fact not 
optimal, what about using the opportunity to really try the best 
possible way? Maybe it's as simple as allocating via 
__alloc_pages_nodemask() with current node's zonelist (where remote 
nodes should be already sorted according to NUMA distance), but with 
current node (which would be first in the zonelist) removed from the 
nodemask so that it's skipped over? But check if memory offlining 
process didn't kill the zonelist already at this point, or something.

> +
> +		/* Alloc new page from the next node if possible */
> +		if (nid != NUMA_NO_NODE)
> +			ret = migrate_pages(&source, new_node_page, NULL,
> +					nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> +		else
> +			ret = migrate_pages(&source, alloc_migrate_target, NULL,
> +					0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);

Please just use one new callback fully tailored for memory offline, 
instead of choosing between the two like this.

>  		if (ret)
>  			putback_movable_pages(&source);
>  	}
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
  2016-07-25  6:59   ` Vlastimil Babka
@ 2016-07-25  7:42     ` Xishi Qiu
  -1 siblings, 0 replies; 14+ messages in thread
From: Xishi Qiu @ 2016-07-25  7:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On 2016/7/25 14:59, Vlastimil Babka wrote:

> On 07/22/2016 04:57 AM, Xishi Qiu wrote:
>> Memory offline could happen on both movable zone and non-movable zone.
>> We can offline the whole node if the zone is movable zone, and if the
>> zone is non-movable zone, we cannot offline the whole node, because
>> some kernel memory can't be migrated.
>>
>> So if we offline a node with movable zone, use prefer mempolicy to alloc
>> new page from the next node instead of the current node or other remote
>> nodes, because re-migrate is a waste of time and the distance of the
>> remote nodes is often very large.
>>
>> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
>> zone.
>>
>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> 
> I think this could be simpler, if you preferred the next node regardless of whether it's movable zone or not. What are use cases for trying to offline part of non-MOVABLE zone in a node? It's not guaranteed to succeed anyway. Also if the reasoning is that the non-MOVABLE offlining preference for migration target should be instead on the *same* node, then alloc_migrate_target() would anyway prefer the node of the current CPU that happens to execute the offlining, which is random wrt the node in question. So consistently choosing remote node is IMHO better than random even for non-MOVABLE zone.
> 

Hi Vlastimil,

use next node for movable zone, use current node for non-movable zone, right?

>> ---
>>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>>  1 file changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index e3cbdca..930a5c6 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>>      return 0;
>>  }
>>
>> +static struct page *new_node_page(struct page *page, unsigned long node,
>> +        int **result)
>> +{
>> +    if (PageHuge(page))
>> +        return alloc_huge_page_node(page_hstate(compound_head(page)),
>> +                    node);
>> +    else
>> +        return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
> 
> You could just test for page in movable (or highmem?) zone here in the callback.
> 

is_highmem_idx() always return 0 if CONFIG_HIGHMEM closed.
And GFP_HIGHUSER_MOVABLE will choose movable_zone first, then normal_zone.
So how about this check? if (PageHighMem() or zone == ZONE_MOVABLE) then use GFP_HIGHUSER_MOVABLE

>> +}
>> +
>>  #define NR_OFFLINE_AT_ONCE_PAGES    (256)
>>  static int
>>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>      int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>>      int not_managed = 0;
>>      int ret = 0;
>> +    int nid = NUMA_NO_NODE;
>>      LIST_HEAD(source);
>>
>>      for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>              goto out;
>>          }
>>
>> -        /*
>> -         * alloc_migrate_target should be improooooved!!
>> -         * migrate_pages returns # of failed pages.
>> -         */
>> -        ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
>> -                    MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>> +        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> +            if (!pfn_valid(pfn))
>> +                continue;
>> +            page = pfn_to_page(pfn);
>> +            if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
>> +                nid = next_node_in(page_to_nid(page),
>> +                        node_online_map);
>> +            break;
>> +        }
> 
> Then you could remove the ZONE_MOVABLE check here. I'm not sure how much worth the precalculation of nid is, if it has to be a rather complicated code like this, hm.
> 
> Also, since we know that "next node in node_online_map" is in fact not optimal, what about using the opportunity to really try the best possible way? Maybe it's as simple as allocating via __alloc_pages_nodemask() with current node's zonelist (where remote nodes should be already sorted according to NUMA distance), but with current node (which would be first in the zonelist) removed from the nodemask so that it's skipped over? But check if memory offlining process didn't kill the zonelist already at this point, or something.
> 

Do you mean that call __alloc_pages_nodemask(), the zonelist is from current page's node,
but it(the current page's node) is not include in the nodemask?

Thanks,
Xishi Qiu

>> +
>> +        /* Alloc new page from the next node if possible */
>> +        if (nid != NUMA_NO_NODE)
>> +            ret = migrate_pages(&source, new_node_page, NULL,
>> +                    nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>> +        else
>> +            ret = migrate_pages(&source, alloc_migrate_target, NULL,
>> +                    0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> 
> Please just use one new callback fully tailored for memory offline, instead of choosing between the two like this.
> 
>>          if (ret)
>>              putback_movable_pages(&source);
>>      }
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
@ 2016-07-25  7:42     ` Xishi Qiu
  0 siblings, 0 replies; 14+ messages in thread
From: Xishi Qiu @ 2016-07-25  7:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On 2016/7/25 14:59, Vlastimil Babka wrote:

> On 07/22/2016 04:57 AM, Xishi Qiu wrote:
>> Memory offline could happen on both movable zone and non-movable zone.
>> We can offline the whole node if the zone is movable zone, and if the
>> zone is non-movable zone, we cannot offline the whole node, because
>> some kernel memory can't be migrated.
>>
>> So if we offline a node with movable zone, use prefer mempolicy to alloc
>> new page from the next node instead of the current node or other remote
>> nodes, because re-migrate is a waste of time and the distance of the
>> remote nodes is often very large.
>>
>> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
>> zone.
>>
>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> 
> I think this could be simpler, if you preferred the next node regardless of whether it's movable zone or not. What are use cases for trying to offline part of non-MOVABLE zone in a node? It's not guaranteed to succeed anyway. Also if the reasoning is that the non-MOVABLE offlining preference for migration target should be instead on the *same* node, then alloc_migrate_target() would anyway prefer the node of the current CPU that happens to execute the offlining, which is random wrt the node in question. So consistently choosing remote node is IMHO better than random even for non-MOVABLE zone.
> 

Hi Vlastimil,

use next node for movable zone, use current node for non-movable zone, right?

>> ---
>>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>>  1 file changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index e3cbdca..930a5c6 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>>      return 0;
>>  }
>>
>> +static struct page *new_node_page(struct page *page, unsigned long node,
>> +        int **result)
>> +{
>> +    if (PageHuge(page))
>> +        return alloc_huge_page_node(page_hstate(compound_head(page)),
>> +                    node);
>> +    else
>> +        return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
> 
> You could just test for page in movable (or highmem?) zone here in the callback.
> 

is_highmem_idx() always return 0 if CONFIG_HIGHMEM closed.
And GFP_HIGHUSER_MOVABLE will choose movable_zone first, then normal_zone.
So how about this check? if (PageHighMem() or zone == ZONE_MOVABLE) then use GFP_HIGHUSER_MOVABLE

>> +}
>> +
>>  #define NR_OFFLINE_AT_ONCE_PAGES    (256)
>>  static int
>>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>      int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>>      int not_managed = 0;
>>      int ret = 0;
>> +    int nid = NUMA_NO_NODE;
>>      LIST_HEAD(source);
>>
>>      for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>              goto out;
>>          }
>>
>> -        /*
>> -         * alloc_migrate_target should be improooooved!!
>> -         * migrate_pages returns # of failed pages.
>> -         */
>> -        ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
>> -                    MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>> +        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> +            if (!pfn_valid(pfn))
>> +                continue;
>> +            page = pfn_to_page(pfn);
>> +            if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
>> +                nid = next_node_in(page_to_nid(page),
>> +                        node_online_map);
>> +            break;
>> +        }
> 
> Then you could remove the ZONE_MOVABLE check here. I'm not sure how much worth the precalculation of nid is, if it has to be a rather complicated code like this, hm.
> 
> Also, since we know that "next node in node_online_map" is in fact not optimal, what about using the opportunity to really try the best possible way? Maybe it's as simple as allocating via __alloc_pages_nodemask() with current node's zonelist (where remote nodes should be already sorted according to NUMA distance), but with current node (which would be first in the zonelist) removed from the nodemask so that it's skipped over? But check if memory offlining process didn't kill the zonelist already at this point, or something.
> 

Do you mean that call __alloc_pages_nodemask(), the zonelist is from current page's node,
but it(the current page's node) is not include in the nodemask?

Thanks,
Xishi Qiu

>> +
>> +        /* Alloc new page from the next node if possible */
>> +        if (nid != NUMA_NO_NODE)
>> +            ret = migrate_pages(&source, new_node_page, NULL,
>> +                    nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>> +        else
>> +            ret = migrate_pages(&source, alloc_migrate_target, NULL,
>> +                    0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> 
> Please just use one new callback fully tailored for memory offline, instead of choosing between the two like this.
> 
>>          if (ret)
>>              putback_movable_pages(&source);
>>      }
>>
> 
> 
> .
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
  2016-07-25  7:42     ` Xishi Qiu
@ 2016-07-25  7:52       ` Vlastimil Babka
  -1 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2016-07-25  7:52 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Joonsoo Kim, Andrew Morton, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On 07/25/2016 09:42 AM, Xishi Qiu wrote:
> On 2016/7/25 14:59, Vlastimil Babka wrote:
>
>> On 07/22/2016 04:57 AM, Xishi Qiu wrote:
>>> Memory offline could happen on both movable zone and non-movable zone.
>>> We can offline the whole node if the zone is movable zone, and if the
>>> zone is non-movable zone, we cannot offline the whole node, because
>>> some kernel memory can't be migrated.
>>>
>>> So if we offline a node with movable zone, use prefer mempolicy to alloc
>>> new page from the next node instead of the current node or other remote
>>> nodes, because re-migrate is a waste of time and the distance of the
>>> remote nodes is often very large.
>>>
>>> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
>>> zone.
>>>
>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>
>> I think this could be simpler, if you preferred the next node regardless of whether it's movable zone or not. What are use cases for trying to offline part of non-MOVABLE zone in a node? It's not guaranteed to succeed anyway. Also if the reasoning is that the non-MOVABLE offlining preference for migration target should be instead on the *same* node, then alloc_migrate_target() would anyway prefer the node of the current CPU that happens to execute the offlining, which is random wrt the node in question. So consistently choosing remote node is IMHO better than random even for non-MOVABLE zone.
>>
>
> Hi Vlastimil,
>
> use next node for movable zone, use current node for non-movable zone, right?

I was asking why not just next node for any zone, to make things 
simpler. What's the use case for offlining in non-movable zone?

>
>>> ---
>>>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>>>  1 file changed, 29 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index e3cbdca..930a5c6 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>>>      return 0;
>>>  }
>>>
>>> +static struct page *new_node_page(struct page *page, unsigned long node,
>>> +        int **result)
>>> +{
>>> +    if (PageHuge(page))
>>> +        return alloc_huge_page_node(page_hstate(compound_head(page)),
>>> +                    node);
>>> +    else
>>> +        return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
>>
>> You could just test for page in movable (or highmem?) zone here in the callback.
>>
>
> is_highmem_idx() always return 0 if CONFIG_HIGHMEM closed.

Yeah, but then it doesn't matter if __GFP_HIGHMEM is given.

> And GFP_HIGHUSER_MOVABLE will choose movable_zone first, then normal_zone.
> So how about this check? if (PageHighMem() or zone == ZONE_MOVABLE) then use GFP_HIGHUSER_MOVABLE

zone == ZONE_MOVABLE -> GFP_HIGHUSER_MOVABLE
PageHighMem() -> GFP_HIGHUSER
else GFP_USER ?

>>> +}
>>> +
>>>  #define NR_OFFLINE_AT_ONCE_PAGES    (256)
>>>  static int
>>>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>      int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>>>      int not_managed = 0;
>>>      int ret = 0;
>>> +    int nid = NUMA_NO_NODE;
>>>      LIST_HEAD(source);
>>>
>>>      for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>>> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>              goto out;
>>>          }
>>>
>>> -        /*
>>> -         * alloc_migrate_target should be improooooved!!
>>> -         * migrate_pages returns # of failed pages.
>>> -         */
>>> -        ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
>>> -                    MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>> +        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>> +            if (!pfn_valid(pfn))
>>> +                continue;
>>> +            page = pfn_to_page(pfn);
>>> +            if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
>>> +                nid = next_node_in(page_to_nid(page),
>>> +                        node_online_map);
>>> +            break;
>>> +        }
>>
>> Then you could remove the ZONE_MOVABLE check here. I'm not sure how much worth the precalculation of nid is, if it has to be a rather complicated code like this, hm.
>>
>> Also, since we know that "next node in node_online_map" is in fact not optimal, what about using the opportunity to really try the best possible way? Maybe it's as simple as allocating via __alloc_pages_nodemask() with current node's zonelist (where remote nodes should be already sorted according to NUMA distance), but with current node (which would be first in the zonelist) removed from the nodemask so that it's skipped over? But check if memory offlining process didn't kill the zonelist already at this point, or something.
>>
>
> Do you mean that call __alloc_pages_nodemask(), the zonelist is from current page's node,
> but it(the current page's node) is not include in the nodemask?

Exactly.
The question is if there should be a fallback to current page's node if 
that fails. Only make sense if somebody wants to offline only part of 
the node?

>
> Thanks,
> Xishi Qiu
>
>>> +
>>> +        /* Alloc new page from the next node if possible */
>>> +        if (nid != NUMA_NO_NODE)
>>> +            ret = migrate_pages(&source, new_node_page, NULL,
>>> +                    nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>> +        else
>>> +            ret = migrate_pages(&source, alloc_migrate_target, NULL,
>>> +                    0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>
>> Please just use one new callback fully tailored for memory offline, instead of choosing between the two like this.
>>
>>>          if (ret)
>>>              putback_movable_pages(&source);
>>>      }
>>>
>>
>>
>> .
>>
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
@ 2016-07-25  7:52       ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2016-07-25  7:52 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Joonsoo Kim, Andrew Morton, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On 07/25/2016 09:42 AM, Xishi Qiu wrote:
> On 2016/7/25 14:59, Vlastimil Babka wrote:
>
>> On 07/22/2016 04:57 AM, Xishi Qiu wrote:
>>> Memory offline could happen on both movable zone and non-movable zone.
>>> We can offline the whole node if the zone is movable zone, and if the
>>> zone is non-movable zone, we cannot offline the whole node, because
>>> some kernel memory can't be migrated.
>>>
>>> So if we offline a node with movable zone, use prefer mempolicy to alloc
>>> new page from the next node instead of the current node or other remote
>>> nodes, because re-migrate is a waste of time and the distance of the
>>> remote nodes is often very large.
>>>
>>> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
>>> zone.
>>>
>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>
>> I think this could be simpler, if you preferred the next node regardless of whether it's movable zone or not. What are use cases for trying to offline part of non-MOVABLE zone in a node? It's not guaranteed to succeed anyway. Also if the reasoning is that the non-MOVABLE offlining preference for migration target should be instead on the *same* node, then alloc_migrate_target() would anyway prefer the node of the current CPU that happens to execute the offlining, which is random wrt the node in question. So consistently choosing remote node is IMHO better than random even for non-MOVABLE zone.
>>
>
> Hi Vlastimil,
>
> use next node for movable zone, use current node for non-movable zone, right?

I was asking why not just next node for any zone, to make things 
simpler. What's the use case for offlining in non-movable zone?

>
>>> ---
>>>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>>>  1 file changed, 29 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index e3cbdca..930a5c6 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>>>      return 0;
>>>  }
>>>
>>> +static struct page *new_node_page(struct page *page, unsigned long node,
>>> +        int **result)
>>> +{
>>> +    if (PageHuge(page))
>>> +        return alloc_huge_page_node(page_hstate(compound_head(page)),
>>> +                    node);
>>> +    else
>>> +        return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
>>
>> You could just test for page in movable (or highmem?) zone here in the callback.
>>
>
> is_highmem_idx() always return 0 if CONFIG_HIGHMEM closed.

Yeah, but then it doesn't matter if __GFP_HIGHMEM is given.

> And GFP_HIGHUSER_MOVABLE will choose movable_zone first, then normal_zone.
> So how about this check? if (PageHighMem() or zone == ZONE_MOVABLE) then use GFP_HIGHUSER_MOVABLE

zone == ZONE_MOVABLE -> GFP_HIGHUSER_MOVABLE
PageHighMem() -> GFP_HIGHUSER
else GFP_USER ?

>>> +}
>>> +
>>>  #define NR_OFFLINE_AT_ONCE_PAGES    (256)
>>>  static int
>>>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>      int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>>>      int not_managed = 0;
>>>      int ret = 0;
>>> +    int nid = NUMA_NO_NODE;
>>>      LIST_HEAD(source);
>>>
>>>      for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>>> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>              goto out;
>>>          }
>>>
>>> -        /*
>>> -         * alloc_migrate_target should be improooooved!!
>>> -         * migrate_pages returns # of failed pages.
>>> -         */
>>> -        ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
>>> -                    MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>> +        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>> +            if (!pfn_valid(pfn))
>>> +                continue;
>>> +            page = pfn_to_page(pfn);
>>> +            if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
>>> +                nid = next_node_in(page_to_nid(page),
>>> +                        node_online_map);
>>> +            break;
>>> +        }
>>
>> Then you could remove the ZONE_MOVABLE check here. I'm not sure how much worth the precalculation of nid is, if it has to be a rather complicated code like this, hm.
>>
>> Also, since we know that "next node in node_online_map" is in fact not optimal, what about using the opportunity to really try the best possible way? Maybe it's as simple as allocating via __alloc_pages_nodemask() with current node's zonelist (where remote nodes should be already sorted according to NUMA distance), but with current node (which would be first in the zonelist) removed from the nodemask so that it's skipped over? But check if memory offlining process didn't kill the zonelist already at this point, or something.
>>
>
> Do you mean that call __alloc_pages_nodemask(), the zonelist is from current page's node,
> but it(the current page's node) is not include in the nodemask?

Exactly.
The question is if there should be a fallback to current page's node if 
that fails. Only make sense if somebody wants to offline only part of 
the node?

>
> Thanks,
> Xishi Qiu
>
>>> +
>>> +        /* Alloc new page from the next node if possible */
>>> +        if (nid != NUMA_NO_NODE)
>>> +            ret = migrate_pages(&source, new_node_page, NULL,
>>> +                    nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>> +        else
>>> +            ret = migrate_pages(&source, alloc_migrate_target, NULL,
>>> +                    0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>
>> Please just use one new callback fully tailored for memory offline, instead of choosing between the two like this.
>>
>>>          if (ret)
>>>              putback_movable_pages(&source);
>>>      }
>>>
>>
>>
>> .
>>
>
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
  2016-07-25  7:52       ` Vlastimil Babka
@ 2016-07-25  8:54         ` Xishi Qiu
  -1 siblings, 0 replies; 14+ messages in thread
From: Xishi Qiu @ 2016-07-25  8:54 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On 2016/7/25 15:52, Vlastimil Babka wrote:

> On 07/25/2016 09:42 AM, Xishi Qiu wrote:
>> On 2016/7/25 14:59, Vlastimil Babka wrote:
>>
>>> On 07/22/2016 04:57 AM, Xishi Qiu wrote:
>>>> Memory offline could happen on both movable zone and non-movable zone.
>>>> We can offline the whole node if the zone is movable zone, and if the
>>>> zone is non-movable zone, we cannot offline the whole node, because
>>>> some kernel memory can't be migrated.
>>>>
>>>> So if we offline a node with movable zone, use prefer mempolicy to alloc
>>>> new page from the next node instead of the current node or other remote
>>>> nodes, because re-migrate is a waste of time and the distance of the
>>>> remote nodes is often very large.
>>>>
>>>> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
>>>> zone.
>>>>
>>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>>
>>> I think this could be simpler, if you preferred the next node regardless of whether it's movable zone or not. What are use cases for trying to offline part of non-MOVABLE zone in a node? It's not guaranteed to succeed anyway. Also if the reasoning is that the non-MOVABLE offlining preference for migration target should be instead on the *same* node, then alloc_migrate_target() would anyway prefer the node of the current CPU that happens to execute the offlining, which is random wrt the node in question. So consistently choosing remote node is IMHO better than random even for non-MOVABLE zone.
>>>
>>
>> Hi Vlastimil,
>>
>> use next node for movable zone, use current node for non-movable zone, right?
> 
> I was asking why not just next node for any zone, to make things simpler. What's the use case for offlining in non-movable zone?
> 

OK, both use the next node is much simpler.

>>
>>>> ---
>>>>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>>>>  1 file changed, 29 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index e3cbdca..930a5c6 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>>>>      return 0;
>>>>  }
>>>>
>>>> +static struct page *new_node_page(struct page *page, unsigned long node,
>>>> +        int **result)
>>>> +{
>>>> +    if (PageHuge(page))
>>>> +        return alloc_huge_page_node(page_hstate(compound_head(page)),
>>>> +                    node);
>>>> +    else
>>>> +        return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
>>>
>>> You could just test for page in movable (or highmem?) zone here in the callback.
>>>
>>
>> is_highmem_idx() always return 0 if CONFIG_HIGHMEM closed.
> 
> Yeah, but then it doesn't matter if __GFP_HIGHMEM is given.
> 
>> And GFP_HIGHUSER_MOVABLE will choose movable_zone first, then normal_zone.
>> So how about this check? if (PageHighMem() or zone == ZONE_MOVABLE) then use GFP_HIGHUSER_MOVABLE
> 
> zone == ZONE_MOVABLE -> GFP_HIGHUSER_MOVABLE
> PageHighMem() -> GFP_HIGHUSER

How about use GFP_HIGHUSER_MOVABLE in the above two cases? As Joonsoo said:

"all migratable pages are not from user space. For example,
blockdev file cache has __GFP_MOVABLE and migratable but it has no
__GFP_HIGHMEM and __GFP_USER.
And, zram's memory isn't GFP_HIGHUSER_MOVABLE but has __GFP_MOVABLE."

And you said this before "GFP_USER just specifies some reclaim flags"

> else GFP_USER ?

else use GFP_USER | __GFP_MOVABLE ?

Thanks,
Xishi Qiu

> 
>>>> +}
>>>> +
>>>>  #define NR_OFFLINE_AT_ONCE_PAGES    (256)
>>>>  static int
>>>>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>      int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>>>>      int not_managed = 0;
>>>>      int ret = 0;
>>>> +    int nid = NUMA_NO_NODE;
>>>>      LIST_HEAD(source);
>>>>
>>>>      for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>>>> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>              goto out;
>>>>          }
>>>>
>>>> -        /*
>>>> -         * alloc_migrate_target should be improooooved!!
>>>> -         * migrate_pages returns # of failed pages.
>>>> -         */
>>>> -        ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
>>>> -                    MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>>> +        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>> +            if (!pfn_valid(pfn))
>>>> +                continue;
>>>> +            page = pfn_to_page(pfn);
>>>> +            if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
>>>> +                nid = next_node_in(page_to_nid(page),
>>>> +                        node_online_map);
>>>> +            break;
>>>> +        }
>>>
>>> Then you could remove the ZONE_MOVABLE check here. I'm not sure how much worth the precalculation of nid is, if it has to be a rather complicated code like this, hm.
>>>
>>> Also, since we know that "next node in node_online_map" is in fact not optimal, what about using the opportunity to really try the best possible way? Maybe it's as simple as allocating via __alloc_pages_nodemask() with current node's zonelist (where remote nodes should be already sorted according to NUMA distance), but with current node (which would be first in the zonelist) removed from the nodemask so that it's skipped over? But check if memory offlining process didn't kill the zonelist already at this point, or something.
>>>
>>
>> Do you mean that call __alloc_pages_nodemask(), the zonelist is from current page's node,
>> but it(the current page's node) is not include in the nodemask?
> 
> Exactly.
> The question is if there should be a fallback to current page's node if that fails. Only make sense if somebody wants to offline only part of the node?
> 
>>
>> Thanks,
>> Xishi Qiu
>>
>>>> +
>>>> +        /* Alloc new page from the next node if possible */
>>>> +        if (nid != NUMA_NO_NODE)
>>>> +            ret = migrate_pages(&source, new_node_page, NULL,
>>>> +                    nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>>> +        else
>>>> +            ret = migrate_pages(&source, alloc_migrate_target, NULL,
>>>> +                    0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>>
>>> Please just use one new callback fully tailored for memory offline, instead of choosing between the two like this.
>>>
>>>>          if (ret)
>>>>              putback_movable_pages(&source);
>>>>      }
>>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE
@ 2016-07-25  8:54         ` Xishi Qiu
  0 siblings, 0 replies; 14+ messages in thread
From: Xishi Qiu @ 2016-07-25  8:54 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Naoya Horiguchi, David Rientjes,
	Linux MM, LKML

On 2016/7/25 15:52, Vlastimil Babka wrote:

> On 07/25/2016 09:42 AM, Xishi Qiu wrote:
>> On 2016/7/25 14:59, Vlastimil Babka wrote:
>>
>>> On 07/22/2016 04:57 AM, Xishi Qiu wrote:
>>>> Memory offline could happen on both movable zone and non-movable zone.
>>>> We can offline the whole node if the zone is movable zone, and if the
>>>> zone is non-movable zone, we cannot offline the whole node, because
>>>> some kernel memory can't be migrated.
>>>>
>>>> So if we offline a node with movable zone, use prefer mempolicy to alloc
>>>> new page from the next node instead of the current node or other remote
>>>> nodes, because re-migrate is a waste of time and the distance of the
>>>> remote nodes is often very large.
>>>>
>>>> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
>>>> zone.
>>>>
>>>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>>>
>>> I think this could be simpler, if you preferred the next node regardless of whether it's movable zone or not. What are use cases for trying to offline part of non-MOVABLE zone in a node? It's not guaranteed to succeed anyway. Also if the reasoning is that the non-MOVABLE offlining preference for migration target should be instead on the *same* node, then alloc_migrate_target() would anyway prefer the node of the current CPU that happens to execute the offlining, which is random wrt the node in question. So consistently choosing remote node is IMHO better than random even for non-MOVABLE zone.
>>>
>>
>> Hi Vlastimil,
>>
>> use next node for movable zone, use current node for non-movable zone, right?
> 
> I was asking why not just next node for any zone, to make things simpler. What's the use case for offlining in non-movable zone?
> 

OK, both use the next node is much simpler.

>>
>>>> ---
>>>>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>>>>  1 file changed, 29 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index e3cbdca..930a5c6 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>>>>      return 0;
>>>>  }
>>>>
>>>> +static struct page *new_node_page(struct page *page, unsigned long node,
>>>> +        int **result)
>>>> +{
>>>> +    if (PageHuge(page))
>>>> +        return alloc_huge_page_node(page_hstate(compound_head(page)),
>>>> +                    node);
>>>> +    else
>>>> +        return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
>>>
>>> You could just test for page in movable (or highmem?) zone here in the callback.
>>>
>>
>> is_highmem_idx() always return 0 if CONFIG_HIGHMEM closed.
> 
> Yeah, but then it doesn't matter if __GFP_HIGHMEM is given.
> 
>> And GFP_HIGHUSER_MOVABLE will choose movable_zone first, then normal_zone.
>> So how about this check? if (PageHighMem() or zone == ZONE_MOVABLE) then use GFP_HIGHUSER_MOVABLE
> 
> zone == ZONE_MOVABLE -> GFP_HIGHUSER_MOVABLE
> PageHighMem() -> GFP_HIGHUSER

How about use GFP_HIGHUSER_MOVABLE in the above two cases? As Joonsoo said:

"all migratable pages are not from user space. For example,
blockdev file cache has __GFP_MOVABLE and migratable but it has no
__GFP_HIGHMEM and __GFP_USER.
And, zram's memory isn't GFP_HIGHUSER_MOVABLE but has __GFP_MOVABLE."

And you said this before "GFP_USER just specifies some reclaim flags"

> else GFP_USER ?

else use GFP_USER | __GFP_MOVABLE ?

Thanks,
Xishi Qiu

> 
>>>> +}
>>>> +
>>>>  #define NR_OFFLINE_AT_ONCE_PAGES    (256)
>>>>  static int
>>>>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>      int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>>>>      int not_managed = 0;
>>>>      int ret = 0;
>>>> +    int nid = NUMA_NO_NODE;
>>>>      LIST_HEAD(source);
>>>>
>>>>      for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>>>> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>              goto out;
>>>>          }
>>>>
>>>> -        /*
>>>> -         * alloc_migrate_target should be improooooved!!
>>>> -         * migrate_pages returns # of failed pages.
>>>> -         */
>>>> -        ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
>>>> -                    MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>>> +        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>> +            if (!pfn_valid(pfn))
>>>> +                continue;
>>>> +            page = pfn_to_page(pfn);
>>>> +            if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
>>>> +                nid = next_node_in(page_to_nid(page),
>>>> +                        node_online_map);
>>>> +            break;
>>>> +        }
>>>
>>> Then you could remove the ZONE_MOVABLE check here. I'm not sure how much worth the precalculation of nid is, if it has to be a rather complicated code like this, hm.
>>>
>>> Also, since we know that "next node in node_online_map" is in fact not optimal, what about using the opportunity to really try the best possible way? Maybe it's as simple as allocating via __alloc_pages_nodemask() with current node's zonelist (where remote nodes should be already sorted according to NUMA distance), but with current node (which would be first in the zonelist) removed from the nodemask so that it's skipped over? But check if memory offlining process didn't kill the zonelist already at this point, or something.
>>>
>>
>> Do you mean that call __alloc_pages_nodemask(), the zonelist is from current page's node,
>> but it(the current page's node) is not include in the nodemask?
> 
> Exactly.
> The question is if there should be a fallback to current page's node if that fails. Only make sense if somebody wants to offline only part of the node?
> 
>>
>> Thanks,
>> Xishi Qiu
>>
>>>> +
>>>> +        /* Alloc new page from the next node if possible */
>>>> +        if (nid != NUMA_NO_NODE)
>>>> +            ret = migrate_pages(&source, new_node_page, NULL,
>>>> +                    nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>>> +        else
>>>> +            ret = migrate_pages(&source, alloc_migrate_target, NULL,
>>>> +                    0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>>
>>> Please just use one new callback fully tailored for memory offline, instead of choosing between the two like this.
>>>
>>>>          if (ret)
>>>>              putback_movable_pages(&source);
>>>>      }
>>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
> 
> 
> .
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-07-25 17:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-22  2:57 [PATCH v2] mem-hotplug: alloc new page from the next node if zone is MOVABLE_ZONE Xishi Qiu
2016-07-22  2:57 ` Xishi Qiu
2016-07-22 20:11 ` Andrew Morton
2016-07-22 20:11   ` Andrew Morton
2016-07-25  1:52   ` Xishi Qiu
2016-07-25  1:52     ` Xishi Qiu
2016-07-25  6:59 ` Vlastimil Babka
2016-07-25  6:59   ` Vlastimil Babka
2016-07-25  7:42   ` Xishi Qiu
2016-07-25  7:42     ` Xishi Qiu
2016-07-25  7:52     ` Vlastimil Babka
2016-07-25  7:52       ` Vlastimil Babka
2016-07-25  8:54       ` Xishi Qiu
2016-07-25  8:54         ` Xishi Qiu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.