All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] mm, memory_hotplug: redefine memory offline retry logic
@ 2017-09-18  7:08 ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-09-18  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML, Michal Hocko,
	Vlastimil Babka

Hi,
this has been previously sent http://lkml.kernel.org/r/20170904082148.23131-1-mhocko@kernel.org
No fundamental objections have been raised. There were some questions about
potential permanent migration failures but those are deemed unlikely and
not really problematic because the context is interruptible. I have tried
to clarify the wording to be more clear.

original changelog:
While testing memory hotplug on a large 4TB machine we have noticed that
memory offlining is just too eager to fail. The primary reason is that
the retry logic is just too easy to give up. We have 4 ways out of the
offline
	- we have a permanent failure (isolation or memory notifiers fail,
	  or hugetlb pages cannot be dropped)
	- userspace sends a signal
	- a hardcoded 120s timeout expires
	- page migration fails 5 times
This is way too convoluted and it doesn't scale very well. We have seen both
temporary migration failures as well as 120s being triggered. After removing
those restrictions we were able to pass stress testing during memory hot
remove without any other negative side effects observed. Therefore I suggest
dropping both hard coded policies. I couldn't have found any specific reason
for them in the changelog. I neither didn't get any response [1] from Kamezawa.
If we need some upper bound - e.g. timeout based - then we should have a proper
and user defined policy for that. In any case there should be a clear use case
when introducing it.

Any comments, objections?

Shortlog
Michal Hocko (2):
      mm, memory_hotplug: do not fail offlining too early
      mm, memory_hotplug: remove timeout from __offline_memory

Diffstat
 mm/memory_hotplug.c | 48 ++++++++++++------------------------------------
 1 file changed, 12 insertions(+), 36 deletions(-)

[1] http://lkml.kernel.org/r/20170828094316.GF17097@dhcp22.suse.cz

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH v2 0/2] mm, memory_hotplug: redefine memory offline retry logic
@ 2017-09-18  7:08 ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-09-18  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML, Michal Hocko,
	Vlastimil Babka

Hi,
this has been previously sent http://lkml.kernel.org/r/20170904082148.23131-1-mhocko@kernel.org
No fundamental objections have been raised. There were some questions about
potential permanent migration failures but those are deemed unlikely and
not really problematic because the context is interruptible. I have tried
to clarify the wording to be more clear.

original changelog:
While testing memory hotplug on a large 4TB machine we have noticed that
memory offlining is just too eager to fail. The primary reason is that
the retry logic is just too easy to give up. We have 4 ways out of the
offline
	- we have a permanent failure (isolation or memory notifiers fail,
	  or hugetlb pages cannot be dropped)
	- userspace sends a signal
	- a hardcoded 120s timeout expires
	- page migration fails 5 times
This is way too convoluted and it doesn't scale very well. We have seen both
temporary migration failures as well as 120s being triggered. After removing
those restrictions we were able to pass stress testing during memory hot
remove without any other negative side effects observed. Therefore I suggest
dropping both hard coded policies. I couldn't have found any specific reason
for them in the changelog. I neither didn't get any response [1] from Kamezawa.
If we need some upper bound - e.g. timeout based - then we should have a proper
and user defined policy for that. In any case there should be a clear use case
when introducing it.

Any comments, objections?

Shortlog
Michal Hocko (2):
      mm, memory_hotplug: do not fail offlining too early
      mm, memory_hotplug: remove timeout from __offline_memory

Diffstat
 mm/memory_hotplug.c | 48 ++++++++++++------------------------------------
 1 file changed, 12 insertions(+), 36 deletions(-)

[1] http://lkml.kernel.org/r/20170828094316.GF17097@dhcp22.suse.cz

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-09-18  7:08 ` Michal Hocko
@ 2017-09-18  7:08   ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-09-18  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML, Michal Hocko,
	Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

Memory offlining can fail just too eagerly under a heavy memory pressure.

[ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
[ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
[ 5410.336811] page dumped because: isolation failed
[ 5410.336813] page->mem_cgroup:ffff8801cd662000
[ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed

Isolation has failed here because the page is not on LRU. Most probably
because it was on the pcp LRU cache or it has been removed from the LRU
already but it hasn't been freed yet. In both cases the page doesn't look
non-migrable so retrying more makes sense.

__offline_pages seems rather cluttered when it comes to the retry
logic. We have 5 retries at maximum and a timeout. We could argue
whether the timeout makes sense but failing just because of a race when
somebody isoltes a page from LRU or puts it on a pcp LRU lists is just
wrong. It only takes it to race with a process which unmaps some pages
and remove them from the LRU list and we can fail the whole offline
because of something that is a temporary condition and actually not
harmful for the offline.

Please note that unmovable pages should be already excluded during
start_isolate_page_range. We could argue that has_unmovable_pages is
racy and MIGRATE_MOVABLE check doesn't provide any hard guarantee either
but kernel zones (aka < ZONE_MOVABLE) will very likely detect unmovable
pages in most cases and movable zone shouldn't contain unmovable pages
at all. Some of those pages might be pinned but not for ever because
that would be a bug on its own. In any case the context is still
interruptible and so the userspace can easily bail out when the
operation takes too long. This is certainly better behavior than a
hardcoded retry loop which is racy.

Fix this by removing the max retry count and only rely on the timeout
resp. interruption by a signal from the userspace. Also retry rather
than fail when check_pages_isolated sees some !free pages because those
could be a result of the race as well.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 40 ++++++++++------------------------------
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 459bbc182d10..c9dcbe6d2ac6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1597,7 +1597,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 {
 	unsigned long pfn, nr_pages, expire;
 	long offlined_pages;
-	int ret, drain, retry_max, node;
+	int ret, node;
 	unsigned long flags;
 	unsigned long valid_start, valid_end;
 	struct zone *zone;
@@ -1634,43 +1634,25 @@ static int __ref __offline_pages(unsigned long start_pfn,
 
 	pfn = start_pfn;
 	expire = jiffies + timeout;
-	drain = 0;
-	retry_max = 5;
 repeat:
 	/* start memory hot removal */
-	ret = -EAGAIN;
+	ret = -EBUSY;
 	if (time_after(jiffies, expire))
 		goto failed_removal;
 	ret = -EINTR;
 	if (signal_pending(current))
 		goto failed_removal;
-	ret = 0;
-	if (drain) {
-		lru_add_drain_all_cpuslocked();
-		cond_resched();
-		drain_all_pages(zone);
-	}
+
+	cond_resched();
+	lru_add_drain_all_cpuslocked();
+	drain_all_pages(zone);
 
 	pfn = scan_movable_pages(start_pfn, end_pfn);
 	if (pfn) { /* We have movable pages */
 		ret = do_migrate_range(pfn, end_pfn);
-		if (!ret) {
-			drain = 1;
-			goto repeat;
-		} else {
-			if (ret < 0)
-				if (--retry_max == 0)
-					goto failed_removal;
-			yield();
-			drain = 1;
-			goto repeat;
-		}
+		goto repeat;
 	}
-	/* drain all zone's lru pagevec, this is asynchronous... */
-	lru_add_drain_all_cpuslocked();
-	yield();
-	/* drain pcp pages, this is synchronous. */
-	drain_all_pages(zone);
+
 	/*
 	 * dissolve free hugepages in the memory block before doing offlining
 	 * actually in order to make hugetlbfs's object counting consistent.
@@ -1680,10 +1662,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal;
 	/* check again */
 	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
-	if (offlined_pages < 0) {
-		ret = -EBUSY;
-		goto failed_removal;
-	}
+	if (offlined_pages < 0)
+		goto repeat;
 	pr_info("Offlined Pages %ld\n", offlined_pages);
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-09-18  7:08   ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-09-18  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML, Michal Hocko,
	Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

Memory offlining can fail just too eagerly under a heavy memory pressure.

[ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
[ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
[ 5410.336811] page dumped because: isolation failed
[ 5410.336813] page->mem_cgroup:ffff8801cd662000
[ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed

Isolation has failed here because the page is not on LRU. Most probably
because it was on the pcp LRU cache or it has been removed from the LRU
already but it hasn't been freed yet. In both cases the page doesn't look
non-migrable so retrying more makes sense.

__offline_pages seems rather cluttered when it comes to the retry
logic. We have 5 retries at maximum and a timeout. We could argue
whether the timeout makes sense but failing just because of a race when
somebody isoltes a page from LRU or puts it on a pcp LRU lists is just
wrong. It only takes it to race with a process which unmaps some pages
and remove them from the LRU list and we can fail the whole offline
because of something that is a temporary condition and actually not
harmful for the offline.

Please note that unmovable pages should be already excluded during
start_isolate_page_range. We could argue that has_unmovable_pages is
racy and MIGRATE_MOVABLE check doesn't provide any hard guarantee either
but kernel zones (aka < ZONE_MOVABLE) will very likely detect unmovable
pages in most cases and movable zone shouldn't contain unmovable pages
at all. Some of those pages might be pinned but not for ever because
that would be a bug on its own. In any case the context is still
interruptible and so the userspace can easily bail out when the
operation takes too long. This is certainly better behavior than a
hardcoded retry loop which is racy.

Fix this by removing the max retry count and only rely on the timeout
resp. interruption by a signal from the userspace. Also retry rather
than fail when check_pages_isolated sees some !free pages because those
could be a result of the race as well.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 40 ++++++++++------------------------------
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 459bbc182d10..c9dcbe6d2ac6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1597,7 +1597,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 {
 	unsigned long pfn, nr_pages, expire;
 	long offlined_pages;
-	int ret, drain, retry_max, node;
+	int ret, node;
 	unsigned long flags;
 	unsigned long valid_start, valid_end;
 	struct zone *zone;
@@ -1634,43 +1634,25 @@ static int __ref __offline_pages(unsigned long start_pfn,
 
 	pfn = start_pfn;
 	expire = jiffies + timeout;
-	drain = 0;
-	retry_max = 5;
 repeat:
 	/* start memory hot removal */
-	ret = -EAGAIN;
+	ret = -EBUSY;
 	if (time_after(jiffies, expire))
 		goto failed_removal;
 	ret = -EINTR;
 	if (signal_pending(current))
 		goto failed_removal;
-	ret = 0;
-	if (drain) {
-		lru_add_drain_all_cpuslocked();
-		cond_resched();
-		drain_all_pages(zone);
-	}
+
+	cond_resched();
+	lru_add_drain_all_cpuslocked();
+	drain_all_pages(zone);
 
 	pfn = scan_movable_pages(start_pfn, end_pfn);
 	if (pfn) { /* We have movable pages */
 		ret = do_migrate_range(pfn, end_pfn);
-		if (!ret) {
-			drain = 1;
-			goto repeat;
-		} else {
-			if (ret < 0)
-				if (--retry_max == 0)
-					goto failed_removal;
-			yield();
-			drain = 1;
-			goto repeat;
-		}
+		goto repeat;
 	}
-	/* drain all zone's lru pagevec, this is asynchronous... */
-	lru_add_drain_all_cpuslocked();
-	yield();
-	/* drain pcp pages, this is synchronous. */
-	drain_all_pages(zone);
+
 	/*
 	 * dissolve free hugepages in the memory block before doing offlining
 	 * actually in order to make hugetlbfs's object counting consistent.
@@ -1680,10 +1662,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal;
 	/* check again */
 	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
-	if (offlined_pages < 0) {
-		ret = -EBUSY;
-		goto failed_removal;
-	}
+	if (offlined_pages < 0)
+		goto repeat;
 	pr_info("Offlined Pages %ld\n", offlined_pages);
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */
-- 
2.14.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 2/2] mm, memory_hotplug: remove timeout from __offline_memory
  2017-09-18  7:08 ` Michal Hocko
@ 2017-09-18  7:08   ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-09-18  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML, Michal Hocko,
	Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

We have a hardcoded 120s timeout after which the memory offline fails
basically since the hot remove has been introduced. This is essentially
a policy implemented in the kernel. Moreover there is no way to adjust
the timeout and so we are sometimes facing memory offline failures if
the system is under a heavy memory pressure or very intensive CPU
workload on large machines.

It is not very clear what purpose the timeout actually serves. The
offline operation is interruptible by a signal so if userspace wants
some timeout based termination this can be done trivially by sending a
signal.

If there is a strong usecase to do this from the kernel then we should
do it properly and have a it tunable from the userspace with the timeout
disabled by default along with the explanation who uses it and for what
purporse.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c9dcbe6d2ac6..b8a85c11360e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1593,9 +1593,9 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
 }
 
 static int __ref __offline_pages(unsigned long start_pfn,
-		  unsigned long end_pfn, unsigned long timeout)
+		  unsigned long end_pfn)
 {
-	unsigned long pfn, nr_pages, expire;
+	unsigned long pfn, nr_pages;
 	long offlined_pages;
 	int ret, node;
 	unsigned long flags;
@@ -1633,12 +1633,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal;
 
 	pfn = start_pfn;
-	expire = jiffies + timeout;
 repeat:
 	/* start memory hot removal */
-	ret = -EBUSY;
-	if (time_after(jiffies, expire))
-		goto failed_removal;
 	ret = -EINTR;
 	if (signal_pending(current))
 		goto failed_removal;
@@ -1711,7 +1707,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 /* Must be protected by mem_hotplug_begin() or a device_lock */
 int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
-	return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+	return __offline_pages(start_pfn, start_pfn + nr_pages);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 2/2] mm, memory_hotplug: remove timeout from __offline_memory
@ 2017-09-18  7:08   ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-09-18  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML, Michal Hocko,
	Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

We have a hardcoded 120s timeout after which the memory offline fails
basically since the hot remove has been introduced. This is essentially
a policy implemented in the kernel. Moreover there is no way to adjust
the timeout and so we are sometimes facing memory offline failures if
the system is under a heavy memory pressure or very intensive CPU
workload on large machines.

It is not very clear what purpose the timeout actually serves. The
offline operation is interruptible by a signal so if userspace wants
some timeout based termination this can be done trivially by sending a
signal.

If there is a strong usecase to do this from the kernel then we should
do it properly and have a it tunable from the userspace with the timeout
disabled by default along with the explanation who uses it and for what
purporse.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c9dcbe6d2ac6..b8a85c11360e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1593,9 +1593,9 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
 }
 
 static int __ref __offline_pages(unsigned long start_pfn,
-		  unsigned long end_pfn, unsigned long timeout)
+		  unsigned long end_pfn)
 {
-	unsigned long pfn, nr_pages, expire;
+	unsigned long pfn, nr_pages;
 	long offlined_pages;
 	int ret, node;
 	unsigned long flags;
@@ -1633,12 +1633,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal;
 
 	pfn = start_pfn;
-	expire = jiffies + timeout;
 repeat:
 	/* start memory hot removal */
-	ret = -EBUSY;
-	if (time_after(jiffies, expire))
-		goto failed_removal;
 	ret = -EINTR;
 	if (signal_pending(current))
 		goto failed_removal;
@@ -1711,7 +1707,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 /* Must be protected by mem_hotplug_begin() or a device_lock */
 int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
-	return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+	return __offline_pages(start_pfn, start_pfn + nr_pages);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-- 
2.14.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-09-18  7:08   ` Michal Hocko
@ 2017-10-10 12:05     ` Michael Ellerman
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-10 12:05 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML, Michal Hocko,
	Vlastimil Babka

Michal Hocko <mhocko@kernel.org> writes:

> From: Michal Hocko <mhocko@suse.com>
>
> Memory offlining can fail just too eagerly under a heavy memory pressure.
>
> [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
> [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
> [ 5410.336811] page dumped because: isolation failed
> [ 5410.336813] page->mem_cgroup:ffff8801cd662000
> [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>
> Isolation has failed here because the page is not on LRU. Most probably
> because it was on the pcp LRU cache or it has been removed from the LRU
> already but it hasn't been freed yet. In both cases the page doesn't look
> non-migrable so retrying more makes sense.

This breaks offline for me.

Prior to this commit:
  /sys/devices/system/memory/memory0# time echo 0 > online
  -bash: echo: write error: Device or resource busy
  
  real	0m0.001s
  user	0m0.000s
  sys	0m0.001s

After:
  /sys/devices/system/memory/memory0# time echo 0 > online
  -bash: echo: write error: Device or resource busy
  
  real	2m0.009s
  user	0m0.000s
  sys	1m25.035s


There's no way that block can be removed, it contains the kernel text,
so it should instantly fail - which it used to.


With commit 3aa2823fdf66 ("mm, memory_hotplug: remove timeout from
__offline_memory") also applied, it appears to just get stuck forever,
and I get lots of:

  [ 1232.112953] INFO: task kworker/3:0:4609 blocked for more than 120 seconds.
  [ 1232.113067]       Not tainted 4.14.0-rc4-gcc6-next-20171009-g49827b9 #1
  [ 1232.113183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1232.113319] kworker/3:0     D11984  4609      2 0x00000800
  [ 1232.113416] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
  [ 1232.113531] Call Trace:
  [ 1232.113579] [c0000000fb2db7a0] [c0000000fb2db900] 0xc0000000fb2db900 (unreliable)
  [ 1232.113717] [c0000000fb2db970] [c00000000001c964] __switch_to+0x304/0x6e0
  [ 1232.113840] [c0000000fb2dba10] [c000000000a408c0] __schedule+0x2e0/0xa80
  [ 1232.113978] [c0000000fb2dbae0] [c000000000a410a8] schedule+0x48/0xc0
  [ 1232.114113] [c0000000fb2dbb10] [c000000000a44d88] rwsem_down_read_failed+0x128/0x1b0
  [ 1232.114269] [c0000000fb2dbb70] [c0000000001696a8] __percpu_down_read+0x108/0x110
  [ 1232.114426] [c0000000fb2dbba0] [c00000000032e498] get_online_mems+0x68/0x80
  [ 1232.115487] [c0000000fb2dbbc0] [c0000000002c82ec] memcg_create_kmem_cache+0x4c/0x190
  [ 1232.115651] [c0000000fb2dbc60] [c0000000003483b8] memcg_kmem_cache_create_func+0x38/0xf0
  [ 1232.115809] [c0000000fb2dbc90] [c000000000121594] process_one_work+0x2b4/0x590
  [ 1232.115964] [c0000000fb2dbd20] [c000000000121908] worker_thread+0x98/0x5d0
  [ 1232.116095] [c0000000fb2dbdc0] [c00000000012a134] kthread+0x164/0x1b0
  [ 1232.116229] [c0000000fb2dbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c


cheers

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-10 12:05     ` Michael Ellerman
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-10 12:05 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML, Michal Hocko,
	Vlastimil Babka

Michal Hocko <mhocko@kernel.org> writes:

> From: Michal Hocko <mhocko@suse.com>
>
> Memory offlining can fail just too eagerly under a heavy memory pressure.
>
> [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
> [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
> [ 5410.336811] page dumped because: isolation failed
> [ 5410.336813] page->mem_cgroup:ffff8801cd662000
> [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>
> Isolation has failed here because the page is not on LRU. Most probably
> because it was on the pcp LRU cache or it has been removed from the LRU
> already but it hasn't been freed yet. In both cases the page doesn't look
> non-migrable so retrying more makes sense.

This breaks offline for me.

Prior to this commit:
  /sys/devices/system/memory/memory0# time echo 0 > online
  -bash: echo: write error: Device or resource busy
  
  real	0m0.001s
  user	0m0.000s
  sys	0m0.001s

After:
  /sys/devices/system/memory/memory0# time echo 0 > online
  -bash: echo: write error: Device or resource busy
  
  real	2m0.009s
  user	0m0.000s
  sys	1m25.035s


There's no way that block can be removed, it contains the kernel text,
so it should instantly fail - which it used to.


With commit 3aa2823fdf66 ("mm, memory_hotplug: remove timeout from
__offline_memory") also applied, it appears to just get stuck forever,
and I get lots of:

  [ 1232.112953] INFO: task kworker/3:0:4609 blocked for more than 120 seconds.
  [ 1232.113067]       Not tainted 4.14.0-rc4-gcc6-next-20171009-g49827b9 #1
  [ 1232.113183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1232.113319] kworker/3:0     D11984  4609      2 0x00000800
  [ 1232.113416] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
  [ 1232.113531] Call Trace:
  [ 1232.113579] [c0000000fb2db7a0] [c0000000fb2db900] 0xc0000000fb2db900 (unreliable)
  [ 1232.113717] [c0000000fb2db970] [c00000000001c964] __switch_to+0x304/0x6e0
  [ 1232.113840] [c0000000fb2dba10] [c000000000a408c0] __schedule+0x2e0/0xa80
  [ 1232.113978] [c0000000fb2dbae0] [c000000000a410a8] schedule+0x48/0xc0
  [ 1232.114113] [c0000000fb2dbb10] [c000000000a44d88] rwsem_down_read_failed+0x128/0x1b0
  [ 1232.114269] [c0000000fb2dbb70] [c0000000001696a8] __percpu_down_read+0x108/0x110
  [ 1232.114426] [c0000000fb2dbba0] [c00000000032e498] get_online_mems+0x68/0x80
  [ 1232.115487] [c0000000fb2dbbc0] [c0000000002c82ec] memcg_create_kmem_cache+0x4c/0x190
  [ 1232.115651] [c0000000fb2dbc60] [c0000000003483b8] memcg_kmem_cache_create_func+0x38/0xf0
  [ 1232.115809] [c0000000fb2dbc90] [c000000000121594] process_one_work+0x2b4/0x590
  [ 1232.115964] [c0000000fb2dbd20] [c000000000121908] worker_thread+0x98/0x5d0
  [ 1232.116095] [c0000000fb2dbdc0] [c00000000012a134] kthread+0x164/0x1b0
  [ 1232.116229] [c0000000fb2dbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c


cheers

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-10 12:05     ` Michael Ellerman
@ 2017-10-10 12:27       ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-10 12:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Memory offlining can fail just too eagerly under a heavy memory pressure.
> >
> > [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
> > [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
> > [ 5410.336811] page dumped because: isolation failed
> > [ 5410.336813] page->mem_cgroup:ffff8801cd662000
> > [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
> >
> > Isolation has failed here because the page is not on LRU. Most probably
> > because it was on the pcp LRU cache or it has been removed from the LRU
> > already but it hasn't been freed yet. In both cases the page doesn't look
> > non-migrable so retrying more makes sense.
> 
> This breaks offline for me.
> 
> Prior to this commit:
>   /sys/devices/system/memory/memory0# time echo 0 > online
>   -bash: echo: write error: Device or resource busy
>   
>   real	0m0.001s
>   user	0m0.000s
>   sys	0m0.001s
> 
> After:
>   /sys/devices/system/memory/memory0# time echo 0 > online
>   -bash: echo: write error: Device or resource busy
>   
>   real	2m0.009s
>   user	0m0.000s
>   sys	1m25.035s
> 
> 
> There's no way that block can be removed, it contains the kernel text,
> so it should instantly fail - which it used to.

OK, that means that start_isolate_page_range should have failed but it
hasn't for some reason. I strongly suspect has_unmovable_pages is doing
something wrong. Is the kernel text marked somehow? E.g. PageReserved?
In other words, does the diff below helps?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3badcedf96a7..00d042052501 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7368,6 +7368,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		if (PageReserved(page))
+			return true;
+
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
 		 * We need not scan over tail pages bacause we don't


> With commit 3aa2823fdf66 ("mm, memory_hotplug: remove timeout from
> __offline_memory") also applied, it appears to just get stuck forever,
> and I get lots of:
> 
>   [ 1232.112953] INFO: task kworker/3:0:4609 blocked for more than 120 seconds.
>   [ 1232.113067]       Not tainted 4.14.0-rc4-gcc6-next-20171009-g49827b9 #1
>   [ 1232.113183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   [ 1232.113319] kworker/3:0     D11984  4609      2 0x00000800
>   [ 1232.113416] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
>   [ 1232.113531] Call Trace:
>   [ 1232.113579] [c0000000fb2db7a0] [c0000000fb2db900] 0xc0000000fb2db900 (unreliable)
>   [ 1232.113717] [c0000000fb2db970] [c00000000001c964] __switch_to+0x304/0x6e0
>   [ 1232.113840] [c0000000fb2dba10] [c000000000a408c0] __schedule+0x2e0/0xa80
>   [ 1232.113978] [c0000000fb2dbae0] [c000000000a410a8] schedule+0x48/0xc0
>   [ 1232.114113] [c0000000fb2dbb10] [c000000000a44d88] rwsem_down_read_failed+0x128/0x1b0
>   [ 1232.114269] [c0000000fb2dbb70] [c0000000001696a8] __percpu_down_read+0x108/0x110
>   [ 1232.114426] [c0000000fb2dbba0] [c00000000032e498] get_online_mems+0x68/0x80
>   [ 1232.115487] [c0000000fb2dbbc0] [c0000000002c82ec] memcg_create_kmem_cache+0x4c/0x190
>   [ 1232.115651] [c0000000fb2dbc60] [c0000000003483b8] memcg_kmem_cache_create_func+0x38/0xf0
>   [ 1232.115809] [c0000000fb2dbc90] [c000000000121594] process_one_work+0x2b4/0x590
>   [ 1232.115964] [c0000000fb2dbd20] [c000000000121908] worker_thread+0x98/0x5d0
>   [ 1232.116095] [c0000000fb2dbdc0] [c00000000012a134] kthread+0x164/0x1b0
>   [ 1232.116229] [c0000000fb2dbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c

I do not see how this is related to the offline path.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-10 12:27       ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-10 12:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Memory offlining can fail just too eagerly under a heavy memory pressure.
> >
> > [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
> > [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
> > [ 5410.336811] page dumped because: isolation failed
> > [ 5410.336813] page->mem_cgroup:ffff8801cd662000
> > [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
> >
> > Isolation has failed here because the page is not on LRU. Most probably
> > because it was on the pcp LRU cache or it has been removed from the LRU
> > already but it hasn't been freed yet. In both cases the page doesn't look
> > non-migrable so retrying more makes sense.
> 
> This breaks offline for me.
> 
> Prior to this commit:
>   /sys/devices/system/memory/memory0# time echo 0 > online
>   -bash: echo: write error: Device or resource busy
>   
>   real	0m0.001s
>   user	0m0.000s
>   sys	0m0.001s
> 
> After:
>   /sys/devices/system/memory/memory0# time echo 0 > online
>   -bash: echo: write error: Device or resource busy
>   
>   real	2m0.009s
>   user	0m0.000s
>   sys	1m25.035s
> 
> 
> There's no way that block can be removed, it contains the kernel text,
> so it should instantly fail - which it used to.

OK, that means that start_isolate_page_range should have failed but it
hasn't for some reason. I strongly suspect has_unmovable_pages is doing
something wrong. Is the kernel text marked somehow? E.g. PageReserved?
In other words, does the diff below helps?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3badcedf96a7..00d042052501 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7368,6 +7368,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		if (PageReserved(page))
+			return true;
+
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
 		 * We need not scan over tail pages bacause we don't


> With commit 3aa2823fdf66 ("mm, memory_hotplug: remove timeout from
> __offline_memory") also applied, it appears to just get stuck forever,
> and I get lots of:
> 
>   [ 1232.112953] INFO: task kworker/3:0:4609 blocked for more than 120 seconds.
>   [ 1232.113067]       Not tainted 4.14.0-rc4-gcc6-next-20171009-g49827b9 #1
>   [ 1232.113183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   [ 1232.113319] kworker/3:0     D11984  4609      2 0x00000800
>   [ 1232.113416] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
>   [ 1232.113531] Call Trace:
>   [ 1232.113579] [c0000000fb2db7a0] [c0000000fb2db900] 0xc0000000fb2db900 (unreliable)
>   [ 1232.113717] [c0000000fb2db970] [c00000000001c964] __switch_to+0x304/0x6e0
>   [ 1232.113840] [c0000000fb2dba10] [c000000000a408c0] __schedule+0x2e0/0xa80
>   [ 1232.113978] [c0000000fb2dbae0] [c000000000a410a8] schedule+0x48/0xc0
>   [ 1232.114113] [c0000000fb2dbb10] [c000000000a44d88] rwsem_down_read_failed+0x128/0x1b0
>   [ 1232.114269] [c0000000fb2dbb70] [c0000000001696a8] __percpu_down_read+0x108/0x110
>   [ 1232.114426] [c0000000fb2dbba0] [c00000000032e498] get_online_mems+0x68/0x80
>   [ 1232.115487] [c0000000fb2dbbc0] [c0000000002c82ec] memcg_create_kmem_cache+0x4c/0x190
>   [ 1232.115651] [c0000000fb2dbc60] [c0000000003483b8] memcg_kmem_cache_create_func+0x38/0xf0
>   [ 1232.115809] [c0000000fb2dbc90] [c000000000121594] process_one_work+0x2b4/0x590
>   [ 1232.115964] [c0000000fb2dbd20] [c000000000121908] worker_thread+0x98/0x5d0
>   [ 1232.116095] [c0000000fb2dbdc0] [c00000000012a134] kthread+0x164/0x1b0
>   [ 1232.116229] [c0000000fb2dbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c

I do not see how this is related to the offline path.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-10 12:27       ` Michal Hocko
@ 2017-10-11  2:37         ` Michael Ellerman
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-11  2:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

Michal Hocko <mhocko@kernel.org> writes:

> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>> 
>> > From: Michal Hocko <mhocko@suse.com>
>> >
>> > Memory offlining can fail just too eagerly under a heavy memory pressure.
>> >
>> > [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
>> > [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
>> > [ 5410.336811] page dumped because: isolation failed
>> > [ 5410.336813] page->mem_cgroup:ffff8801cd662000
>> > [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>> >
>> > Isolation has failed here because the page is not on LRU. Most probably
>> > because it was on the pcp LRU cache or it has been removed from the LRU
>> > already but it hasn't been freed yet. In both cases the page doesn't look
>> > non-migrable so retrying more makes sense.
>> 
>> This breaks offline for me.
>> 
>> Prior to this commit:
>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>   -bash: echo: write error: Device or resource busy
>>   
>>   real	0m0.001s
>>   user	0m0.000s
>>   sys	0m0.001s
>> 
>> After:
>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>   -bash: echo: write error: Device or resource busy
>>   
>>   real	2m0.009s
>>   user	0m0.000s
>>   sys	1m25.035s
>> 
>> 
>> There's no way that block can be removed, it contains the kernel text,
>> so it should instantly fail - which it used to.
>
> OK, that means that start_isolate_page_range should have failed but it
> hasn't for some reason. I strongly suspect has_unmovable_pages is doing
> something wrong. Is the kernel text marked somehow? E.g. PageReserved?

I'm not sure how the text is marked, will have to dig into that.

> In other words, does the diff below helps?

No that doesn't help.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3badcedf96a7..00d042052501 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7368,6 +7368,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  
>  		page = pfn_to_page(check);
>  
> +		if (PageReserved(page))
> +			return true;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
>
>
>> With commit 3aa2823fdf66 ("mm, memory_hotplug: remove timeout from
>> __offline_memory") also applied, it appears to just get stuck forever,
>> and I get lots of:
>> 
>>   [ 1232.112953] INFO: task kworker/3:0:4609 blocked for more than 120 seconds.
>>   [ 1232.113067]       Not tainted 4.14.0-rc4-gcc6-next-20171009-g49827b9 #1
>>   [ 1232.113183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>   [ 1232.113319] kworker/3:0     D11984  4609      2 0x00000800
>>   [ 1232.113416] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
>>   [ 1232.113531] Call Trace:
>>   [ 1232.113579] [c0000000fb2db7a0] [c0000000fb2db900] 0xc0000000fb2db900 (unreliable)
>>   [ 1232.113717] [c0000000fb2db970] [c00000000001c964] __switch_to+0x304/0x6e0
>>   [ 1232.113840] [c0000000fb2dba10] [c000000000a408c0] __schedule+0x2e0/0xa80
>>   [ 1232.113978] [c0000000fb2dbae0] [c000000000a410a8] schedule+0x48/0xc0
>>   [ 1232.114113] [c0000000fb2dbb10] [c000000000a44d88] rwsem_down_read_failed+0x128/0x1b0
>>   [ 1232.114269] [c0000000fb2dbb70] [c0000000001696a8] __percpu_down_read+0x108/0x110
>>   [ 1232.114426] [c0000000fb2dbba0] [c00000000032e498] get_online_mems+0x68/0x80
>>   [ 1232.115487] [c0000000fb2dbbc0] [c0000000002c82ec] memcg_create_kmem_cache+0x4c/0x190
>>   [ 1232.115651] [c0000000fb2dbc60] [c0000000003483b8] memcg_kmem_cache_create_func+0x38/0xf0
>>   [ 1232.115809] [c0000000fb2dbc90] [c000000000121594] process_one_work+0x2b4/0x590
>>   [ 1232.115964] [c0000000fb2dbd20] [c000000000121908] worker_thread+0x98/0x5d0
>>   [ 1232.116095] [c0000000fb2dbdc0] [c00000000012a134] kthread+0x164/0x1b0
>>   [ 1232.116229] [c0000000fb2dbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c
>
> I do not see how this is related to the offline path.

It's blocked doing get_online_mems(). So it's unrelated to the offline,
but it can't proceed until the offline finishes, which it never does,
IIUIC.

cheers

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-11  2:37         ` Michael Ellerman
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-11  2:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

Michal Hocko <mhocko@kernel.org> writes:

> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>> 
>> > From: Michal Hocko <mhocko@suse.com>
>> >
>> > Memory offlining can fail just too eagerly under a heavy memory pressure.
>> >
>> > [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
>> > [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
>> > [ 5410.336811] page dumped because: isolation failed
>> > [ 5410.336813] page->mem_cgroup:ffff8801cd662000
>> > [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>> >
>> > Isolation has failed here because the page is not on LRU. Most probably
>> > because it was on the pcp LRU cache or it has been removed from the LRU
>> > already but it hasn't been freed yet. In both cases the page doesn't look
>> > non-migrable so retrying more makes sense.
>> 
>> This breaks offline for me.
>> 
>> Prior to this commit:
>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>   -bash: echo: write error: Device or resource busy
>>   
>>   real	0m0.001s
>>   user	0m0.000s
>>   sys	0m0.001s
>> 
>> After:
>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>   -bash: echo: write error: Device or resource busy
>>   
>>   real	2m0.009s
>>   user	0m0.000s
>>   sys	1m25.035s
>> 
>> 
>> There's no way that block can be removed, it contains the kernel text,
>> so it should instantly fail - which it used to.
>
> OK, that means that start_isolate_page_range should have failed but it
> hasn't for some reason. I strongly suspect has_unmovable_pages is doing
> something wrong. Is the kernel text marked somehow? E.g. PageReserved?

I'm not sure how the text is marked, will have to dig into that.

> In other words, does the diff below helps?

No that doesn't help.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3badcedf96a7..00d042052501 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7368,6 +7368,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  
>  		page = pfn_to_page(check);
>  
> +		if (PageReserved(page))
> +			return true;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
>
>
>> With commit 3aa2823fdf66 ("mm, memory_hotplug: remove timeout from
>> __offline_memory") also applied, it appears to just get stuck forever,
>> and I get lots of:
>> 
>>   [ 1232.112953] INFO: task kworker/3:0:4609 blocked for more than 120 seconds.
>>   [ 1232.113067]       Not tainted 4.14.0-rc4-gcc6-next-20171009-g49827b9 #1
>>   [ 1232.113183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>   [ 1232.113319] kworker/3:0     D11984  4609      2 0x00000800
>>   [ 1232.113416] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
>>   [ 1232.113531] Call Trace:
>>   [ 1232.113579] [c0000000fb2db7a0] [c0000000fb2db900] 0xc0000000fb2db900 (unreliable)
>>   [ 1232.113717] [c0000000fb2db970] [c00000000001c964] __switch_to+0x304/0x6e0
>>   [ 1232.113840] [c0000000fb2dba10] [c000000000a408c0] __schedule+0x2e0/0xa80
>>   [ 1232.113978] [c0000000fb2dbae0] [c000000000a410a8] schedule+0x48/0xc0
>>   [ 1232.114113] [c0000000fb2dbb10] [c000000000a44d88] rwsem_down_read_failed+0x128/0x1b0
>>   [ 1232.114269] [c0000000fb2dbb70] [c0000000001696a8] __percpu_down_read+0x108/0x110
>>   [ 1232.114426] [c0000000fb2dbba0] [c00000000032e498] get_online_mems+0x68/0x80
>>   [ 1232.115487] [c0000000fb2dbbc0] [c0000000002c82ec] memcg_create_kmem_cache+0x4c/0x190
>>   [ 1232.115651] [c0000000fb2dbc60] [c0000000003483b8] memcg_kmem_cache_create_func+0x38/0xf0
>>   [ 1232.115809] [c0000000fb2dbc90] [c000000000121594] process_one_work+0x2b4/0x590
>>   [ 1232.115964] [c0000000fb2dbd20] [c000000000121908] worker_thread+0x98/0x5d0
>>   [ 1232.116095] [c0000000fb2dbdc0] [c00000000012a134] kthread+0x164/0x1b0
>>   [ 1232.116229] [c0000000fb2dbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c
>
> I do not see how this is related to the offline path.

It's blocked doing get_online_mems(). So it's unrelated to the offline,
but it can't proceed until the offline finishes, which it never does,
IIUIC.

cheers

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-11  2:37         ` Michael Ellerman
@ 2017-10-11  5:19           ` Michael Ellerman
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-11  5:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

Michael Ellerman <mpe@ellerman.id.au> writes:
> Michal Hocko <mhocko@kernel.org> writes:
>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>>> Michal Hocko <mhocko@kernel.org> writes:
>>> > From: Michal Hocko <mhocko@suse.com>
>>> > Memory offlining can fail just too eagerly under a heavy memory pressure.
>>> >
>>> > [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
>>> > [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
>>> > [ 5410.336811] page dumped because: isolation failed
>>> > [ 5410.336813] page->mem_cgroup:ffff8801cd662000
>>> > [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>>> >
>>> > Isolation has failed here because the page is not on LRU. Most probably
>>> > because it was on the pcp LRU cache or it has been removed from the LRU
>>> > already but it hasn't been freed yet. In both cases the page doesn't look
>>> > non-migrable so retrying more makes sense.
>>> 
>>> This breaks offline for me.
>>> 
>>> Prior to this commit:
>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>   -bash: echo: write error: Device or resource busy
>>>   
>>>   real	0m0.001s
>>>   user	0m0.000s
>>>   sys	0m0.001s
>>> 
>>> After:
>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>   -bash: echo: write error: Device or resource busy
>>>   
>>>   real	2m0.009s
>>>   user	0m0.000s
>>>   sys	1m25.035s
>>> 
>>> There's no way that block can be removed, it contains the kernel text,
>>> so it should instantly fail - which it used to.
>>
>> OK, that means that start_isolate_page_range should have failed but it
>> hasn't for some reason. I strongly suspect has_unmovable_pages is doing
>> something wrong. Is the kernel text marked somehow? E.g. PageReserved?
>
> I'm not sure how the text is marked, will have to dig into that.

Yeah it's reserved:

  $ grep __init_begin /proc/kallsyms
  c000000000d70000 T __init_begin
  $ ./page-types -r -a 0x0,0xd7
               flags	page-count       MB  symbolic-flags			long-symbolic-flags
  0x0000000100000000	       215       13  __________________________r_______________	reserved
               total	       215       13


I added some printks, we're getting EBUSY from do_migrate_range(pfn, end_pfn).

So we seem to just have an infinite loop:

  repeat:
  	/* start memory hot removal */
  	ret = -EINTR;
  	if (signal_pending(current))
  		goto failed_removal;
  
  	cond_resched();
  	lru_add_drain_all_cpuslocked();
  	drain_all_pages(zone);
  
  	pfn = scan_movable_pages(start_pfn, end_pfn);
  	if (pfn) { /* We have movable pages */
  		ret = do_migrate_range(pfn, end_pfn);
  		printk_ratelimited("memory-hotplug: migrate range returned %ld\n", ret);
  		goto repeat;
  	}


eg:

  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  __offline_pages: 354031 callbacks suppressed
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  __offline_pages: 355794 callbacks suppressed


cheers

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-11  5:19           ` Michael Ellerman
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-11  5:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

Michael Ellerman <mpe@ellerman.id.au> writes:
> Michal Hocko <mhocko@kernel.org> writes:
>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>>> Michal Hocko <mhocko@kernel.org> writes:
>>> > From: Michal Hocko <mhocko@suse.com>
>>> > Memory offlining can fail just too eagerly under a heavy memory pressure.
>>> >
>>> > [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
>>> > [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
>>> > [ 5410.336811] page dumped because: isolation failed
>>> > [ 5410.336813] page->mem_cgroup:ffff8801cd662000
>>> > [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>>> >
>>> > Isolation has failed here because the page is not on LRU. Most probably
>>> > because it was on the pcp LRU cache or it has been removed from the LRU
>>> > already but it hasn't been freed yet. In both cases the page doesn't look
>>> > non-migrable so retrying more makes sense.
>>> 
>>> This breaks offline for me.
>>> 
>>> Prior to this commit:
>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>   -bash: echo: write error: Device or resource busy
>>>   
>>>   real	0m0.001s
>>>   user	0m0.000s
>>>   sys	0m0.001s
>>> 
>>> After:
>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>   -bash: echo: write error: Device or resource busy
>>>   
>>>   real	2m0.009s
>>>   user	0m0.000s
>>>   sys	1m25.035s
>>> 
>>> There's no way that block can be removed, it contains the kernel text,
>>> so it should instantly fail - which it used to.
>>
>> OK, that means that start_isolate_page_range should have failed but it
>> hasn't for some reason. I strongly suspect has_unmovable_pages is doing
>> something wrong. Is the kernel text marked somehow? E.g. PageReserved?
>
> I'm not sure how the text is marked, will have to dig into that.

Yeah it's reserved:

  $ grep __init_begin /proc/kallsyms
  c000000000d70000 T __init_begin
  $ ./page-types -r -a 0x0,0xd7
               flags	page-count       MB  symbolic-flags			long-symbolic-flags
  0x0000000100000000	       215       13  __________________________r_______________	reserved
               total	       215       13


I added some printks, we're getting EBUSY from do_migrate_range(pfn, end_pfn).

So we seem to just have an infinite loop:

  repeat:
  	/* start memory hot removal */
  	ret = -EINTR;
  	if (signal_pending(current))
  		goto failed_removal;
  
  	cond_resched();
  	lru_add_drain_all_cpuslocked();
  	drain_all_pages(zone);
  
  	pfn = scan_movable_pages(start_pfn, end_pfn);
  	if (pfn) { /* We have movable pages */
  		ret = do_migrate_range(pfn, end_pfn);
  		printk_ratelimited("memory-hotplug: migrate range returned %ld\n", ret);
  		goto repeat;
  	}


eg:

  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  __offline_pages: 354031 callbacks suppressed
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  memory-hotplug: migrate range returned -16
  __offline_pages: 355794 callbacks suppressed


cheers

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-11  2:37         ` Michael Ellerman
@ 2017-10-11  6:51           ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-11  6:51 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
> >> Michal Hocko <mhocko@kernel.org> writes:
> >> 
> >> > From: Michal Hocko <mhocko@suse.com>
> >> >
> >> > Memory offlining can fail just too eagerly under a heavy memory pressure.
> >> >
> >> > [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
> >> > [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
> >> > [ 5410.336811] page dumped because: isolation failed
> >> > [ 5410.336813] page->mem_cgroup:ffff8801cd662000
> >> > [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
> >> >
> >> > Isolation has failed here because the page is not on LRU. Most probably
> >> > because it was on the pcp LRU cache or it has been removed from the LRU
> >> > already but it hasn't been freed yet. In both cases the page doesn't look
> >> > non-migrable so retrying more makes sense.
> >> 
> >> This breaks offline for me.
> >> 
> >> Prior to this commit:
> >>   /sys/devices/system/memory/memory0# time echo 0 > online
> >>   -bash: echo: write error: Device or resource busy
> >>   
> >>   real	0m0.001s
> >>   user	0m0.000s
> >>   sys	0m0.001s
> >> 
> >> After:
> >>   /sys/devices/system/memory/memory0# time echo 0 > online
> >>   -bash: echo: write error: Device or resource busy
> >>   
> >>   real	2m0.009s
> >>   user	0m0.000s
> >>   sys	1m25.035s
> >> 
> >> 
> >> There's no way that block can be removed, it contains the kernel text,
> >> so it should instantly fail - which it used to.
> >
> > OK, that means that start_isolate_page_range should have failed but it
> > hasn't for some reason. I strongly suspect has_unmovable_pages is doing
> > something wrong. Is the kernel text marked somehow? E.g. PageReserved?
> 
> I'm not sure how the text is marked, will have to dig into that.
> 
> > In other words, does the diff below helps?
> 
> No that doesn't help.

This is really strange! As you write in other email the page is
reserved. That means that some of the earlier checks 
	if (zone_idx(zone) == ZONE_MOVABLE)
		return false;
	mt = get_pageblock_migratetype(page);
	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
		return false;
has bailed out early. I would be quite surprised if the kernel text was
sitting in the zone movable. The migrate type check is more fishy
AFAICS. I can imagine that the kernel text can share the movable or CMA
mt block. I am not really familiar with this function but it looks
suspicious. So does it help to remove this check?
--- 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3badcedf96a7..5b4d85ae445c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
-	mt = get_pageblock_migratetype(page);
-	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
-		return false;
 
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
@@ -7368,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		if (PageReserved(page))
+			return true;
+
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
 		 * We need not scan over tail pages bacause we don't

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-11  6:51           ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-11  6:51 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
> >> Michal Hocko <mhocko@kernel.org> writes:
> >> 
> >> > From: Michal Hocko <mhocko@suse.com>
> >> >
> >> > Memory offlining can fail just too eagerly under a heavy memory pressure.
> >> >
> >> > [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
> >> > [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
> >> > [ 5410.336811] page dumped because: isolation failed
> >> > [ 5410.336813] page->mem_cgroup:ffff8801cd662000
> >> > [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
> >> >
> >> > Isolation has failed here because the page is not on LRU. Most probably
> >> > because it was on the pcp LRU cache or it has been removed from the LRU
> >> > already but it hasn't been freed yet. In both cases the page doesn't look
> >> > non-migrable so retrying more makes sense.
> >> 
> >> This breaks offline for me.
> >> 
> >> Prior to this commit:
> >>   /sys/devices/system/memory/memory0# time echo 0 > online
> >>   -bash: echo: write error: Device or resource busy
> >>   
> >>   real	0m0.001s
> >>   user	0m0.000s
> >>   sys	0m0.001s
> >> 
> >> After:
> >>   /sys/devices/system/memory/memory0# time echo 0 > online
> >>   -bash: echo: write error: Device or resource busy
> >>   
> >>   real	2m0.009s
> >>   user	0m0.000s
> >>   sys	1m25.035s
> >> 
> >> 
> >> There's no way that block can be removed, it contains the kernel text,
> >> so it should instantly fail - which it used to.
> >
> > OK, that means that start_isolate_page_range should have failed but it
> > hasn't for some reason. I strongly suspect has_unmovable_pages is doing
> > something wrong. Is the kernel text marked somehow? E.g. PageReserved?
> 
> I'm not sure how the text is marked, will have to dig into that.
> 
> > In other words, does the diff below helps?
> 
> No that doesn't help.

This is really strange! As you write in other email the page is
reserved. That means that some of the earlier checks 
	if (zone_idx(zone) == ZONE_MOVABLE)
		return false;
	mt = get_pageblock_migratetype(page);
	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
		return false;
has bailed out early. I would be quite surprised if the kernel text was
sitting in the zone movable. The migrate type check is more fishy
AFAICS. I can imagine that the kernel text can share the movable or CMA
mt block. I am not really familiar with this function but it looks
suspicious. So does it help to remove this check?
--- 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3badcedf96a7..5b4d85ae445c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
-	mt = get_pageblock_migratetype(page);
-	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
-		return false;
 
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
@@ -7368,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		if (PageReserved(page))
+			return true;
+
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
 		 * We need not scan over tail pages bacause we don't

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-11  6:51           ` Michal Hocko
@ 2017-10-11  8:04             ` Vlastimil Babka
  -1 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-11  8:04 UTC (permalink / raw)
  To: Michal Hocko, Michael Ellerman
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML

On 10/11/2017 08:51 AM, Michal Hocko wrote:
> On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>>
>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>>>> Michal Hocko <mhocko@kernel.org> writes:
>>>>
>>>>> From: Michal Hocko <mhocko@suse.com>
>>>>>
>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
>>>>>
>>>>> [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
>>>>> [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
>>>>> [ 5410.336811] page dumped because: isolation failed
>>>>> [ 5410.336813] page->mem_cgroup:ffff8801cd662000
>>>>> [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>>>>>
>>>>> Isolation has failed here because the page is not on LRU. Most probably
>>>>> because it was on the pcp LRU cache or it has been removed from the LRU
>>>>> already but it hasn't been freed yet. In both cases the page doesn't look
>>>>> non-migrable so retrying more makes sense.
>>>>
>>>> This breaks offline for me.
>>>>
>>>> Prior to this commit:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy

Well, that means offline didn't actually work for that block even before
this patch, right? Is it even a movable_node block? I guess not?

>>>>   real	0m0.001s
>>>>   user	0m0.000s
>>>>   sys	0m0.001s
>>>>
>>>> After:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy
>>>>   
>>>>   real	2m0.009s
>>>>   user	0m0.000s
>>>>   sys	1m25.035s
>>>>
>>>>
>>>> There's no way that block can be removed, it contains the kernel text,
>>>> so it should instantly fail - which it used to.

Ah, right. So your complain is really about that the failure is not
instant anymore for blocks that can't be offlined.

>>> OK, that means that start_isolate_page_range should have failed but it
>>> hasn't for some reason. I strongly suspect has_unmovable_pages is doing
>>> something wrong. Is the kernel text marked somehow? E.g. PageReserved?
>>
>> I'm not sure how the text is marked, will have to dig into that.
>>
>>> In other words, does the diff below helps?
>>
>> No that doesn't help.
> 
> This is really strange! As you write in other email the page is
> reserved. That means that some of the earlier checks 
> 	if (zone_idx(zone) == ZONE_MOVABLE)
> 		return false;
> 	mt = get_pageblock_migratetype(page);
> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))

The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
guarantee there are no unmovable pages in the block (CMA block OTOH
should be a guarantee).

> 		return false;
> has bailed out early. I would be quite surprised if the kernel text was
> sitting in the zone movable. The migrate type check is more fishy
> AFAICS. I can imagine that the kernel text can share the movable or CMA
> mt block. I am not really familiar with this function but it looks
> suspicious. So does it help to remove this check?
> --- 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3badcedf96a7..5b4d85ae445c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
> -	mt = get_pageblock_migratetype(page);
> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> -		return false;
>  
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> @@ -7368,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  
>  		page = pfn_to_page(check);
>  
> +		if (PageReserved(page))
> +			return true;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-11  8:04             ` Vlastimil Babka
  0 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-11  8:04 UTC (permalink / raw)
  To: Michal Hocko, Michael Ellerman
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML

On 10/11/2017 08:51 AM, Michal Hocko wrote:
> On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>>
>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>>>> Michal Hocko <mhocko@kernel.org> writes:
>>>>
>>>>> From: Michal Hocko <mhocko@suse.com>
>>>>>
>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
>>>>>
>>>>> [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
>>>>> [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
>>>>> [ 5410.336811] page dumped because: isolation failed
>>>>> [ 5410.336813] page->mem_cgroup:ffff8801cd662000
>>>>> [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>>>>>
>>>>> Isolation has failed here because the page is not on LRU. Most probably
>>>>> because it was on the pcp LRU cache or it has been removed from the LRU
>>>>> already but it hasn't been freed yet. In both cases the page doesn't look
>>>>> non-migrable so retrying more makes sense.
>>>>
>>>> This breaks offline for me.
>>>>
>>>> Prior to this commit:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy

Well, that means offline didn't actually work for that block even before
this patch, right? Is it even a movable_node block? I guess not?

>>>>   real	0m0.001s
>>>>   user	0m0.000s
>>>>   sys	0m0.001s
>>>>
>>>> After:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy
>>>>   
>>>>   real	2m0.009s
>>>>   user	0m0.000s
>>>>   sys	1m25.035s
>>>>
>>>>
>>>> There's no way that block can be removed, it contains the kernel text,
>>>> so it should instantly fail - which it used to.

Ah, right. So your complain is really about that the failure is not
instant anymore for blocks that can't be offlined.

>>> OK, that means that start_isolate_page_range should have failed but it
>>> hasn't for some reason. I strongly suspect has_unmovable_pages is doing
>>> something wrong. Is the kernel text marked somehow? E.g. PageReserved?
>>
>> I'm not sure how the text is marked, will have to dig into that.
>>
>>> In other words, does the diff below helps?
>>
>> No that doesn't help.
> 
> This is really strange! As you write in other email the page is
> reserved. That means that some of the earlier checks 
> 	if (zone_idx(zone) == ZONE_MOVABLE)
> 		return false;
> 	mt = get_pageblock_migratetype(page);
> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))

The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
guarantee there are no unmovable pages in the block (CMA block OTOH
should be a guarantee).

> 		return false;
> has bailed out early. I would be quite surprised if the kernel text was
> sitting in the zone movable. The migrate type check is more fishy
> AFAICS. I can imagine that the kernel text can share the movable or CMA
> mt block. I am not really familiar with this function but it looks
> suspicious. So does it help to remove this check?
> --- 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3badcedf96a7..5b4d85ae445c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
> -	mt = get_pageblock_migratetype(page);
> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> -		return false;
>  
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> @@ -7368,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  
>  		page = pfn_to_page(check);
>  
> +		if (PageReserved(page))
> +			return true;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-11  8:04             ` Vlastimil Babka
@ 2017-10-11  8:13               ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-11  8:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML

On Wed 11-10-17 10:04:39, Vlastimil Babka wrote:
> On 10/11/2017 08:51 AM, Michal Hocko wrote:
[...]
> > This is really strange! As you write in other email the page is
> > reserved. That means that some of the earlier checks 
> > 	if (zone_idx(zone) == ZONE_MOVABLE)
> > 		return false;
> > 	mt = get_pageblock_migratetype(page);
> > 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> 
> The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
> guarantee there are no unmovable pages in the block (CMA block OTOH
> should be a guarantee).

OK, thanks for confirmation. I will remove the MIGRATE_MOVABLE check
here. Do you think it is worth removing CMA check as well? This is
merely an optimization AFAIU because we do not have to check the full
pageblockworth of pfns.

Anyway, let's way for Michael to confirm it really helps. If yes I will
post a full patch and ask Andrew to add it as a prerequisite for this
patch when sending to Linus to prevent the regression.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-11  8:13               ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-11  8:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML

On Wed 11-10-17 10:04:39, Vlastimil Babka wrote:
> On 10/11/2017 08:51 AM, Michal Hocko wrote:
[...]
> > This is really strange! As you write in other email the page is
> > reserved. That means that some of the earlier checks 
> > 	if (zone_idx(zone) == ZONE_MOVABLE)
> > 		return false;
> > 	mt = get_pageblock_migratetype(page);
> > 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> 
> The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
> guarantee there are no unmovable pages in the block (CMA block OTOH
> should be a guarantee).

OK, thanks for confirmation. I will remove the MIGRATE_MOVABLE check
here. Do you think it is worth removing CMA check as well? This is
merely an optimization AFAIU because we do not have to check the full
pageblockworth of pfns.

Anyway, let's way for Michael to confirm it really helps. If yes I will
post a full patch and ask Andrew to add it as a prerequisite for this
patch when sending to Linus to prevent the regression.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-11  8:13               ` Michal Hocko
@ 2017-10-11 11:17                 ` Vlastimil Babka
  -1 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-11 11:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML

On 10/11/2017 10:13 AM, Michal Hocko wrote:
> On Wed 11-10-17 10:04:39, Vlastimil Babka wrote:
>> On 10/11/2017 08:51 AM, Michal Hocko wrote:
> [...]
>>> This is really strange! As you write in other email the page is
>>> reserved. That means that some of the earlier checks 
>>> 	if (zone_idx(zone) == ZONE_MOVABLE)
>>> 		return false;
>>> 	mt = get_pageblock_migratetype(page);
>>> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
>>
>> The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
>> guarantee there are no unmovable pages in the block (CMA block OTOH
>> should be a guarantee).
> 
> OK, thanks for confirmation. I will remove the MIGRATE_MOVABLE check
> here. Do you think it is worth removing CMA check as well? This is
> merely an optimization AFAIU because we do not have to check the full
> pageblockworth of pfns.

Actually, we should remove the CMA part as well. It's true that
MIGRATE_CMA does guarantee that the *buddy allocator* won't allocate
non-MOVABLE pages from the pageblock. But if the memory got allocated as
an actual CMA allocation (alloc_contig...) it will almost certainly not
be movable.

> Anyway, let's way for Michael to confirm it really helps. If yes I will
> post a full patch and ask Andrew to add it as a prerequisite for this
> patch when sending to Linus to prevent the regression.
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-11 11:17                 ` Vlastimil Babka
  0 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-11 11:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML

On 10/11/2017 10:13 AM, Michal Hocko wrote:
> On Wed 11-10-17 10:04:39, Vlastimil Babka wrote:
>> On 10/11/2017 08:51 AM, Michal Hocko wrote:
> [...]
>>> This is really strange! As you write in other email the page is
>>> reserved. That means that some of the earlier checks 
>>> 	if (zone_idx(zone) == ZONE_MOVABLE)
>>> 		return false;
>>> 	mt = get_pageblock_migratetype(page);
>>> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
>>
>> The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
>> guarantee there are no unmovable pages in the block (CMA block OTOH
>> should be a guarantee).
> 
> OK, thanks for confirmation. I will remove the MIGRATE_MOVABLE check
> here. Do you think it is worth removing CMA check as well? This is
> merely an optimization AFAIU because we do not have to check the full
> pageblockworth of pfns.

Actually, we should remove the CMA part as well. It's true that
MIGRATE_CMA does guarantee that the *buddy allocator* won't allocate
non-MOVABLE pages from the pageblock. But if the memory got allocated as
an actual CMA allocation (alloc_contig...) it will almost certainly not
be movable.

> Anyway, let's way for Michael to confirm it really helps. If yes I will
> post a full patch and ask Andrew to add it as a prerequisite for this
> patch when sending to Linus to prevent the regression.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-11 11:17                 ` Vlastimil Babka
@ 2017-10-11 11:24                   ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-11 11:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML

On Wed 11-10-17 13:17:13, Vlastimil Babka wrote:
> On 10/11/2017 10:13 AM, Michal Hocko wrote:
> > On Wed 11-10-17 10:04:39, Vlastimil Babka wrote:
> >> On 10/11/2017 08:51 AM, Michal Hocko wrote:
> > [...]
> >>> This is really strange! As you write in other email the page is
> >>> reserved. That means that some of the earlier checks 
> >>> 	if (zone_idx(zone) == ZONE_MOVABLE)
> >>> 		return false;
> >>> 	mt = get_pageblock_migratetype(page);
> >>> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> >>
> >> The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
> >> guarantee there are no unmovable pages in the block (CMA block OTOH
> >> should be a guarantee).
> > 
> > OK, thanks for confirmation. I will remove the MIGRATE_MOVABLE check
> > here. Do you think it is worth removing CMA check as well? This is
> > merely an optimization AFAIU because we do not have to check the full
> > pageblockworth of pfns.
> 
> Actually, we should remove the CMA part as well. It's true that
> MIGRATE_CMA does guarantee that the *buddy allocator* won't allocate
> non-MOVABLE pages from the pageblock. But if the memory got allocated as
> an actual CMA allocation (alloc_contig...) it will almost certainly not
> be movable.

That was my suspicious. Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-11 11:24                   ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-11 11:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML

On Wed 11-10-17 13:17:13, Vlastimil Babka wrote:
> On 10/11/2017 10:13 AM, Michal Hocko wrote:
> > On Wed 11-10-17 10:04:39, Vlastimil Babka wrote:
> >> On 10/11/2017 08:51 AM, Michal Hocko wrote:
> > [...]
> >>> This is really strange! As you write in other email the page is
> >>> reserved. That means that some of the earlier checks 
> >>> 	if (zone_idx(zone) == ZONE_MOVABLE)
> >>> 		return false;
> >>> 	mt = get_pageblock_migratetype(page);
> >>> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> >>
> >> The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
> >> guarantee there are no unmovable pages in the block (CMA block OTOH
> >> should be a guarantee).
> > 
> > OK, thanks for confirmation. I will remove the MIGRATE_MOVABLE check
> > here. Do you think it is worth removing CMA check as well? This is
> > merely an optimization AFAIU because we do not have to check the full
> > pageblockworth of pfns.
> 
> Actually, we should remove the CMA part as well. It's true that
> MIGRATE_CMA does guarantee that the *buddy allocator* won't allocate
> non-MOVABLE pages from the pageblock. But if the memory got allocated as
> an actual CMA allocation (alloc_contig...) it will almost certainly not
> be movable.

That was my suspicious. Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-11  5:19           ` Michael Ellerman
@ 2017-10-11 14:05             ` Anshuman Khandual
  -1 siblings, 0 replies; 102+ messages in thread
From: Anshuman Khandual @ 2017-10-11 14:05 UTC (permalink / raw)
  To: Michael Ellerman, Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

On 10/11/2017 10:49 AM, Michael Ellerman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Michal Hocko <mhocko@kernel.org> writes:
>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>>>> Michal Hocko <mhocko@kernel.org> writes:
>>>>> From: Michal Hocko <mhocko@suse.com>
>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
>>>>>
>>>>> [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
>>>>> [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
>>>>> [ 5410.336811] page dumped because: isolation failed
>>>>> [ 5410.336813] page->mem_cgroup:ffff8801cd662000
>>>>> [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>>>>>
>>>>> Isolation has failed here because the page is not on LRU. Most probably
>>>>> because it was on the pcp LRU cache or it has been removed from the LRU
>>>>> already but it hasn't been freed yet. In both cases the page doesn't look
>>>>> non-migrable so retrying more makes sense.
>>>> This breaks offline for me.
>>>>
>>>> Prior to this commit:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy
>>>>   
>>>>   real	0m0.001s
>>>>   user	0m0.000s
>>>>   sys	0m0.001s
>>>>
>>>> After:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy
>>>>   
>>>>   real	2m0.009s
>>>>   user	0m0.000s
>>>>   sys	1m25.035s
>>>>
>>>> There's no way that block can be removed, it contains the kernel text,
>>>> so it should instantly fail - which it used to.
>>> OK, that means that start_isolate_page_range should have failed but it
>>> hasn't for some reason. I strongly suspect has_unmovable_pages is doing
>>> something wrong. Is the kernel text marked somehow? E.g. PageReserved?
>> I'm not sure how the text is marked, will have to dig into that.
> Yeah it's reserved:
> 
>   $ grep __init_begin /proc/kallsyms
>   c000000000d70000 T __init_begin
>   $ ./page-types -r -a 0x0,0xd7
>                flags	page-count       MB  symbolic-flags			long-symbolic-flags
>   0x0000000100000000	       215       13  __________________________r_______________	reserved
>                total	       215       13

Hey Michael,

What tool is this 'page-types' ?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-11 14:05             ` Anshuman Khandual
  0 siblings, 0 replies; 102+ messages in thread
From: Anshuman Khandual @ 2017-10-11 14:05 UTC (permalink / raw)
  To: Michael Ellerman, Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML,
	Vlastimil Babka

On 10/11/2017 10:49 AM, Michael Ellerman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Michal Hocko <mhocko@kernel.org> writes:
>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>>>> Michal Hocko <mhocko@kernel.org> writes:
>>>>> From: Michal Hocko <mhocko@suse.com>
>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
>>>>>
>>>>> [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
>>>>> [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
>>>>> [ 5410.336811] page dumped because: isolation failed
>>>>> [ 5410.336813] page->mem_cgroup:ffff8801cd662000
>>>>> [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>>>>>
>>>>> Isolation has failed here because the page is not on LRU. Most probably
>>>>> because it was on the pcp LRU cache or it has been removed from the LRU
>>>>> already but it hasn't been freed yet. In both cases the page doesn't look
>>>>> non-migrable so retrying more makes sense.
>>>> This breaks offline for me.
>>>>
>>>> Prior to this commit:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy
>>>>   
>>>>   real	0m0.001s
>>>>   user	0m0.000s
>>>>   sys	0m0.001s
>>>>
>>>> After:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy
>>>>   
>>>>   real	2m0.009s
>>>>   user	0m0.000s
>>>>   sys	1m25.035s
>>>>
>>>> There's no way that block can be removed, it contains the kernel text,
>>>> so it should instantly fail - which it used to.
>>> OK, that means that start_isolate_page_range should have failed but it
>>> hasn't for some reason. I strongly suspect has_unmovable_pages is doing
>>> something wrong. Is the kernel text marked somehow? E.g. PageReserved?
>> I'm not sure how the text is marked, will have to dig into that.
> Yeah it's reserved:
> 
>   $ grep __init_begin /proc/kallsyms
>   c000000000d70000 T __init_begin
>   $ ./page-types -r -a 0x0,0xd7
>                flags	page-count       MB  symbolic-flags			long-symbolic-flags
>   0x0000000100000000	       215       13  __________________________r_______________	reserved
>                total	       215       13

Hey Michael,

What tool is this 'page-types' ?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-11 14:05             ` Anshuman Khandual
@ 2017-10-11 14:16               ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-11 14:16 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML, Vlastimil Babka

On Wed 11-10-17 19:35:04, Anshuman Khandual wrote:
[...]
> >   $ grep __init_begin /proc/kallsyms
> >   c000000000d70000 T __init_begin
> >   $ ./page-types -r -a 0x0,0xd7
> >                flags	page-count       MB  symbolic-flags			long-symbolic-flags
> >   0x0000000100000000	       215       13  __________________________r_______________	reserved
> >                total	       215       13
> 
> Hey Michael,
> 
> What tool is this 'page-types' ?

tools/vm/page-types.c

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-11 14:16               ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-11 14:16 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML, Vlastimil Babka

On Wed 11-10-17 19:35:04, Anshuman Khandual wrote:
[...]
> >   $ grep __init_begin /proc/kallsyms
> >   c000000000d70000 T __init_begin
> >   $ ./page-types -r -a 0x0,0xd7
> >                flags	page-count       MB  symbolic-flags			long-symbolic-flags
> >   0x0000000100000000	       215       13  __________________________r_______________	reserved
> >                total	       215       13
> 
> Hey Michael,
> 
> What tool is this 'page-types' ?

tools/vm/page-types.c

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-11  8:04             ` Vlastimil Babka
@ 2017-10-13 11:42               ` Michael Ellerman
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-13 11:42 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML

Vlastimil Babka <vbabka@suse.cz> writes:
> On 10/11/2017 08:51 AM, Michal Hocko wrote:
>> On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
>>> Michal Hocko <mhocko@kernel.org> writes:
>>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>>>>> Michal Hocko <mhocko@kernel.org> writes:
>>>>>> From: Michal Hocko <mhocko@suse.com>
>>>>>>
>>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
...
>>>>>
>>>>> This breaks offline for me.
>>>>>
>>>>> Prior to this commit:
>>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>>   -bash: echo: write error: Device or resource busy
>
> Well, that means offline didn't actually work for that block even before
> this patch, right? Is it even a movable_node block? I guess not?

Correct. It should fail.

>>>>> After:
>>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>>   -bash: echo: write error: Device or resource busy
>>>>>   
>>>>>   real	2m0.009s
>>>>>   user	0m0.000s
>>>>>   sys	1m25.035s
>>>>>
>>>>> There's no way that block can be removed, it contains the kernel text,
>>>>> so it should instantly fail - which it used to.
>
> Ah, right. So your complain is really about that the failure is not
> instant anymore for blocks that can't be offlined.

Yes. Previously it failed instantly, now it doesn't fail, and loops
infinitely (once the 2 minute limit is removed).

>> This is really strange! As you write in other email the page is
>> reserved. That means that some of the earlier checks 
>> 	if (zone_idx(zone) == ZONE_MOVABLE)
>> 		return false;
>> 	mt = get_pageblock_migratetype(page);
>> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
>
> The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
> guarantee there are no unmovable pages in the block (CMA block OTOH
> should be a guarantee).

OK I'll try that and get back to you.

cheers


>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3badcedf96a7..5b4d85ae445c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>>  	 */
>>  	if (zone_idx(zone) == ZONE_MOVABLE)
>>  		return false;
>> -	mt = get_pageblock_migratetype(page);
>> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
>> -		return false;
>>  
>>  	pfn = page_to_pfn(page);
>>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
>> @@ -7368,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>>  
>>  		page = pfn_to_page(check);
>>  
>> +		if (PageReserved(page))
>> +			return true;
>> +
>>  		/*
>>  		 * Hugepages are not in LRU lists, but they're movable.
>>  		 * We need not scan over tail pages bacause we don't
>> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-13 11:42               ` Michael Ellerman
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-13 11:42 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Igor Mammedov, Vitaly Kuznetsov, linux-mm, LKML

Vlastimil Babka <vbabka@suse.cz> writes:
> On 10/11/2017 08:51 AM, Michal Hocko wrote:
>> On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
>>> Michal Hocko <mhocko@kernel.org> writes:
>>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>>>>> Michal Hocko <mhocko@kernel.org> writes:
>>>>>> From: Michal Hocko <mhocko@suse.com>
>>>>>>
>>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
...
>>>>>
>>>>> This breaks offline for me.
>>>>>
>>>>> Prior to this commit:
>>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>>   -bash: echo: write error: Device or resource busy
>
> Well, that means offline didn't actually work for that block even before
> this patch, right? Is it even a movable_node block? I guess not?

Correct. It should fail.

>>>>> After:
>>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>>   -bash: echo: write error: Device or resource busy
>>>>>   
>>>>>   real	2m0.009s
>>>>>   user	0m0.000s
>>>>>   sys	1m25.035s
>>>>>
>>>>> There's no way that block can be removed, it contains the kernel text,
>>>>> so it should instantly fail - which it used to.
>
> Ah, right. So your complain is really about that the failure is not
> instant anymore for blocks that can't be offlined.

Yes. Previously it failed instantly, now it doesn't fail, and loops
infinitely (once the 2 minute limit is removed).

>> This is really strange! As you write in other email the page is
>> reserved. That means that some of the earlier checks 
>> 	if (zone_idx(zone) == ZONE_MOVABLE)
>> 		return false;
>> 	mt = get_pageblock_migratetype(page);
>> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
>
> The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
> guarantee there are no unmovable pages in the block (CMA block OTOH
> should be a guarantee).

OK I'll try that and get back to you.

cheers


>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3badcedf96a7..5b4d85ae445c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>>  	 */
>>  	if (zone_idx(zone) == ZONE_MOVABLE)
>>  		return false;
>> -	mt = get_pageblock_migratetype(page);
>> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
>> -		return false;
>>  
>>  	pfn = page_to_pfn(page);
>>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
>> @@ -7368,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>>  
>>  		page = pfn_to_page(check);
>>  
>> +		if (PageReserved(page))
>> +			return true;
>> +
>>  		/*
>>  		 * Hugepages are not in LRU lists, but they're movable.
>>  		 * We need not scan over tail pages bacause we don't
>> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
  2017-10-13 11:42               ` Michael Ellerman
@ 2017-10-13 11:58                 ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-13 11:58 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Vlastimil Babka, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML

On Fri 13-10-17 22:42:46, Michael Ellerman wrote:
> Vlastimil Babka <vbabka@suse.cz> writes:
> > On 10/11/2017 08:51 AM, Michal Hocko wrote:
> >> On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
> >>> Michal Hocko <mhocko@kernel.org> writes:
> >>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
> >>>>> Michal Hocko <mhocko@kernel.org> writes:
> >>>>>> From: Michal Hocko <mhocko@suse.com>
> >>>>>>
> >>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
> ...
> >>>>>
> >>>>> This breaks offline for me.
> >>>>>
> >>>>> Prior to this commit:
> >>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
> >>>>>   -bash: echo: write error: Device or resource busy
> >
> > Well, that means offline didn't actually work for that block even before
> > this patch, right? Is it even a movable_node block? I guess not?
> 
> Correct. It should fail.
> 
> >>>>> After:
> >>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
> >>>>>   -bash: echo: write error: Device or resource busy
> >>>>>   
> >>>>>   real	2m0.009s
> >>>>>   user	0m0.000s
> >>>>>   sys	1m25.035s
> >>>>>
> >>>>> There's no way that block can be removed, it contains the kernel text,
> >>>>> so it should instantly fail - which it used to.
> >
> > Ah, right. So your complain is really about that the failure is not
> > instant anymore for blocks that can't be offlined.
> 
> Yes. Previously it failed instantly, now it doesn't fail, and loops
> infinitely (once the 2 minute limit is removed).

Yeah it failed only because the migration code retried few times and we
bailed out which is wrong as well. I will send two patches as a reply to
this email.

> >> This is really strange! As you write in other email the page is
> >> reserved. That means that some of the earlier checks 
> >> 	if (zone_idx(zone) == ZONE_MOVABLE)
> >> 		return false;
> >> 	mt = get_pageblock_migratetype(page);
> >> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> >
> > The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
> > guarantee there are no unmovable pages in the block (CMA block OTOH
> > should be a guarantee).
> 
> OK I'll try that and get back to you.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
@ 2017-10-13 11:58                 ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-13 11:58 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Vlastimil Babka, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	linux-mm, LKML

On Fri 13-10-17 22:42:46, Michael Ellerman wrote:
> Vlastimil Babka <vbabka@suse.cz> writes:
> > On 10/11/2017 08:51 AM, Michal Hocko wrote:
> >> On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
> >>> Michal Hocko <mhocko@kernel.org> writes:
> >>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
> >>>>> Michal Hocko <mhocko@kernel.org> writes:
> >>>>>> From: Michal Hocko <mhocko@suse.com>
> >>>>>>
> >>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
> ...
> >>>>>
> >>>>> This breaks offline for me.
> >>>>>
> >>>>> Prior to this commit:
> >>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
> >>>>>   -bash: echo: write error: Device or resource busy
> >
> > Well, that means offline didn't actually work for that block even before
> > this patch, right? Is it even a movable_node block? I guess not?
> 
> Correct. It should fail.
> 
> >>>>> After:
> >>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
> >>>>>   -bash: echo: write error: Device or resource busy
> >>>>>   
> >>>>>   real	2m0.009s
> >>>>>   user	0m0.000s
> >>>>>   sys	1m25.035s
> >>>>>
> >>>>> There's no way that block can be removed, it contains the kernel text,
> >>>>> so it should instantly fail - which it used to.
> >
> > Ah, right. So your complain is really about that the failure is not
> > instant anymore for blocks that can't be offlined.
> 
> Yes. Previously it failed instantly, now it doesn't fail, and loops
> infinitely (once the 2 minute limit is removed).

Yeah it failed only because the migration code retried few times and we
bailed out which is wrong as well. I will send two patches as a reply to
this email.

> >> This is really strange! As you write in other email the page is
> >> reserved. That means that some of the earlier checks 
> >> 	if (zone_idx(zone) == ZONE_MOVABLE)
> >> 		return false;
> >> 	mt = get_pageblock_migratetype(page);
> >> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> >
> > The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
> > guarantee there are no unmovable pages in the block (CMA block OTOH
> > should be a guarantee).
> 
> OK I'll try that and get back to you.

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-13 11:58                 ` Michal Hocko
@ 2017-10-13 12:00                   ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-13 12:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Michael has noticed that the memory offline tries to migrate kernel code
pages when doing
 echo 0 > /sys/devices/system/memory/memory0/online

The current implementation will fail the operation after several failed
page migration attempts but we shouldn't even attempt to migrate
that memory and fail right away because this memory is clearly not
migrateable. This will become a real problem when we drop the retry loop
counter resp. timeout.

The real problem is in has_unmovable_pages in fact. We should fail if
there are any non migrateable pages in the area. In orther to guarantee
that remove the migrate type checks because MIGRATE_MOVABLE is not
guaranteed to contain only migrateable pages. It is merely a heuristic.
Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
allocate any non-migrateable pages from the block but CMA allocations
themselves are unlikely to migrateable. Therefore remove both checks.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3badcedf96a7..ad0294ab3e4f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
-	mt = get_pageblock_migratetype(page);
-	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
-		return false;
 
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-13 12:00                   ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-13 12:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Michael has noticed that the memory offline tries to migrate kernel code
pages when doing
 echo 0 > /sys/devices/system/memory/memory0/online

The current implementation will fail the operation after several failed
page migration attempts but we shouldn't even attempt to migrate
that memory and fail right away because this memory is clearly not
migrateable. This will become a real problem when we drop the retry loop
counter resp. timeout.

The real problem is in has_unmovable_pages in fact. We should fail if
there are any non migrateable pages in the area. In orther to guarantee
that remove the migrate type checks because MIGRATE_MOVABLE is not
guaranteed to contain only migrateable pages. It is merely a heuristic.
Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
allocate any non-migrateable pages from the block but CMA allocations
themselves are unlikely to migrateable. Therefore remove both checks.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3badcedf96a7..ad0294ab3e4f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
-	mt = get_pageblock_migratetype(page);
-	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
-		return false;
 
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
-- 
2.14.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
  2017-10-13 12:00                   ` Michal Hocko
@ 2017-10-13 12:00                     ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-13 12:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Reserved pages should be completely ignored by the core mm because they
have a special meaning for their owners. has_unmovable_pages doesn't
check those so we rely on other tests (reference count, or PageLRU) to
fail on such pages. Althought this happens to work it is safer to simply
check for those explicitly and do not rely on the owner of the page
to abuse those fields for special purposes.

Please note that this is more of a further fortification of the code
rahter than a fix of an existing issue.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ad0294ab3e4f..a8800b0a5619 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		if (PageReferenced(page))
+			return true;
+
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
 		 * We need not scan over tail pages bacause we don't
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
@ 2017-10-13 12:00                     ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-13 12:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Reserved pages should be completely ignored by the core mm because they
have a special meaning for their owners. has_unmovable_pages doesn't
check those so we rely on other tests (reference count, or PageLRU) to
fail on such pages. Althought this happens to work it is safer to simply
check for those explicitly and do not rely on the owner of the page
to abuse those fields for special purposes.

Please note that this is more of a further fortification of the code
rahter than a fix of an existing issue.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ad0294ab3e4f..a8800b0a5619 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		if (PageReferenced(page))
+			return true;
+
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
 		 * We need not scan over tail pages bacause we don't
-- 
2.14.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
  2017-10-13 12:00                     ` Michal Hocko
@ 2017-10-13 12:04                       ` Vlastimil Babka
  -1 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-13 12:04 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML, Michal Hocko

On 10/13/2017 02:00 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Reserved pages should be completely ignored by the core mm because they
> have a special meaning for their owners. has_unmovable_pages doesn't
> check those so we rely on other tests (reference count, or PageLRU) to
> fail on such pages. Althought this happens to work it is safer to simply
> check for those explicitly and do not rely on the owner of the page
> to abuse those fields for special purposes.
> 
> Please note that this is more of a further fortification of the code
> rahter than a fix of an existing issue.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ad0294ab3e4f..a8800b0a5619 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  
>  		page = pfn_to_page(check);
>  
> +		if (PageReferenced(page))

"Referenced" != "Reserved"

> +			return true;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
@ 2017-10-13 12:04                       ` Vlastimil Babka
  0 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-13 12:04 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML, Michal Hocko

On 10/13/2017 02:00 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Reserved pages should be completely ignored by the core mm because they
> have a special meaning for their owners. has_unmovable_pages doesn't
> check those so we rely on other tests (reference count, or PageLRU) to
> fail on such pages. Althought this happens to work it is safer to simply
> check for those explicitly and do not rely on the owner of the page
> to abuse those fields for special purposes.
> 
> Please note that this is more of a further fortification of the code
> rahter than a fix of an existing issue.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ad0294ab3e4f..a8800b0a5619 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  
>  		page = pfn_to_page(check);
>  
> +		if (PageReferenced(page))

"Referenced" != "Reserved"

> +			return true;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
  2017-10-13 12:04                       ` Vlastimil Babka
@ 2017-10-13 12:07                         ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-13 12:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Igor Mammedov,
	Vitaly Kuznetsov, LKML

On Fri 13-10-17 14:04:08, Vlastimil Babka wrote:
> On 10/13/2017 02:00 PM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Reserved pages should be completely ignored by the core mm because they
> > have a special meaning for their owners. has_unmovable_pages doesn't
> > check those so we rely on other tests (reference count, or PageLRU) to
> > fail on such pages. Althought this happens to work it is safer to simply
> > check for those explicitly and do not rely on the owner of the page
> > to abuse those fields for special purposes.
> > 
> > Please note that this is more of a further fortification of the code
> > rahter than a fix of an existing issue.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/page_alloc.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index ad0294ab3e4f..a8800b0a5619 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> >  
> >  		page = pfn_to_page(check);
> >  
> > +		if (PageReferenced(page))
> 
> "Referenced" != "Reserved"

Dohh, you are right of course. I blame auto-completion ;) but I am lame
in fact...
---
>From 44b20bdb03846bc5fd79c883d16b8f3aa436878f Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 13 Oct 2017 13:55:21 +0200
Subject: [PATCH] mm, page_alloc: fail has_unmovable_pages when seeing reserved
 pages

Reserved pages should be completely ignored by the core mm because they
have a special meaning for their owners. has_unmovable_pages doesn't
check those so we rely on other tests (reference count, or PageLRU) to
fail on such pages. Althought this happens to work it is safer to simply
check for those explicitly and do not rely on the owner of the page
to abuse those fields for special purposes.

Please note that this is more of a further fortification of the code
rahter than a fix of an existing issue.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ad0294ab3e4f..5b4d85ae445c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		if (PageReserved(page))
+			return true;
+
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
 		 * We need not scan over tail pages bacause we don't
-- 
2.14.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
@ 2017-10-13 12:07                         ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-13 12:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Igor Mammedov,
	Vitaly Kuznetsov, LKML

On Fri 13-10-17 14:04:08, Vlastimil Babka wrote:
> On 10/13/2017 02:00 PM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Reserved pages should be completely ignored by the core mm because they
> > have a special meaning for their owners. has_unmovable_pages doesn't
> > check those so we rely on other tests (reference count, or PageLRU) to
> > fail on such pages. Althought this happens to work it is safer to simply
> > check for those explicitly and do not rely on the owner of the page
> > to abuse those fields for special purposes.
> > 
> > Please note that this is more of a further fortification of the code
> > rahter than a fix of an existing issue.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/page_alloc.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index ad0294ab3e4f..a8800b0a5619 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> >  
> >  		page = pfn_to_page(check);
> >  
> > +		if (PageReferenced(page))
> 
> "Referenced" != "Reserved"

Dohh, you are right of course. I blame auto-completion ;) but I am lame
in fact...
---

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-13 12:00                   ` Michal Hocko
@ 2017-10-17 11:41                     ` Michael Ellerman
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-17 11:41 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Vlastimil Babka, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML, Michal Hocko

Michal Hocko <mhocko@kernel.org> writes:

> From: Michal Hocko <mhocko@suse.com>
>
> Michael has noticed that the memory offline tries to migrate kernel code
> pages when doing
>  echo 0 > /sys/devices/system/memory/memory0/online
>
> The current implementation will fail the operation after several failed
> page migration attempts but we shouldn't even attempt to migrate
> that memory and fail right away because this memory is clearly not
> migrateable. This will become a real problem when we drop the retry loop
> counter resp. timeout.
>
> The real problem is in has_unmovable_pages in fact. We should fail if
> there are any non migrateable pages in the area. In orther to guarantee
> that remove the migrate type checks because MIGRATE_MOVABLE is not
> guaranteed to contain only migrateable pages. It is merely a heuristic.
> Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> allocate any non-migrateable pages from the block but CMA allocations
> themselves are unlikely to migrateable. Therefore remove both checks.
>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Thanks, that works for me.

Tested-by: Michael Ellerman <mpe@ellerman.id.au>

cheers

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-17 11:41                     ` Michael Ellerman
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Ellerman @ 2017-10-17 11:41 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Vlastimil Babka, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML, Michal Hocko

Michal Hocko <mhocko@kernel.org> writes:

> From: Michal Hocko <mhocko@suse.com>
>
> Michael has noticed that the memory offline tries to migrate kernel code
> pages when doing
>  echo 0 > /sys/devices/system/memory/memory0/online
>
> The current implementation will fail the operation after several failed
> page migration attempts but we shouldn't even attempt to migrate
> that memory and fail right away because this memory is clearly not
> migrateable. This will become a real problem when we drop the retry loop
> counter resp. timeout.
>
> The real problem is in has_unmovable_pages in fact. We should fail if
> there are any non migrateable pages in the area. In orther to guarantee
> that remove the migrate type checks because MIGRATE_MOVABLE is not
> guaranteed to contain only migrateable pages. It is merely a heuristic.
> Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> allocate any non-migrateable pages from the block but CMA allocations
> themselves are unlikely to migrateable. Therefore remove both checks.
>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Thanks, that works for me.

Tested-by: Michael Ellerman <mpe@ellerman.id.au>

cheers

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-17 11:41                     ` Michael Ellerman
@ 2017-10-17 12:03                       ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-17 12:03 UTC (permalink / raw)
  To: Michael Ellerman, Andrew Morton
  Cc: linux-mm, Vlastimil Babka, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML

On Tue 17-10-17 22:41:08, Michael Ellerman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Michael has noticed that the memory offline tries to migrate kernel code
> > pages when doing
> >  echo 0 > /sys/devices/system/memory/memory0/online
> >
> > The current implementation will fail the operation after several failed
> > page migration attempts but we shouldn't even attempt to migrate
> > that memory and fail right away because this memory is clearly not
> > migrateable. This will become a real problem when we drop the retry loop
> > counter resp. timeout.
> >
> > The real problem is in has_unmovable_pages in fact. We should fail if
> > there are any non migrateable pages in the area. In orther to guarantee
> > that remove the migrate type checks because MIGRATE_MOVABLE is not
> > guaranteed to contain only migrateable pages. It is merely a heuristic.
> > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > allocate any non-migrateable pages from the block but CMA allocations
> > themselves are unlikely to migrateable. Therefore remove both checks.
> >
> > Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Thanks, that works for me.
> 
> Tested-by: Michael Ellerman <mpe@ellerman.id.au>

Thanks a lot Michael!

Andrew, could you add these two patches and merge them before
mm-memory_hotplug-do-not-fail-offlining-too-early.patch? Or should I
rather repost the full series (including 2 already merged patches?
again?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-17 12:03                       ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-17 12:03 UTC (permalink / raw)
  To: Michael Ellerman, Andrew Morton
  Cc: linux-mm, Vlastimil Babka, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML

On Tue 17-10-17 22:41:08, Michael Ellerman wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Michael has noticed that the memory offline tries to migrate kernel code
> > pages when doing
> >  echo 0 > /sys/devices/system/memory/memory0/online
> >
> > The current implementation will fail the operation after several failed
> > page migration attempts but we shouldn't even attempt to migrate
> > that memory and fail right away because this memory is clearly not
> > migrateable. This will become a real problem when we drop the retry loop
> > counter resp. timeout.
> >
> > The real problem is in has_unmovable_pages in fact. We should fail if
> > there are any non migrateable pages in the area. In orther to guarantee
> > that remove the migrate type checks because MIGRATE_MOVABLE is not
> > guaranteed to contain only migrateable pages. It is merely a heuristic.
> > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > allocate any non-migrateable pages from the block but CMA allocations
> > themselves are unlikely to migrateable. Therefore remove both checks.
> >
> > Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Thanks, that works for me.
> 
> Tested-by: Michael Ellerman <mpe@ellerman.id.au>

Thanks a lot Michael!

Andrew, could you add these two patches and merge them before
mm-memory_hotplug-do-not-fail-offlining-too-early.patch? Or should I
rather repost the full series (including 2 already merged patches?
again?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-13 12:00                   ` Michal Hocko
@ 2017-10-17 13:02                     ` Vlastimil Babka
  -1 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-17 13:02 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML, Michal Hocko

On 10/13/2017 02:00 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Michael has noticed that the memory offline tries to migrate kernel code
> pages when doing
>  echo 0 > /sys/devices/system/memory/memory0/online
> 
> The current implementation will fail the operation after several failed
> page migration attempts but we shouldn't even attempt to migrate
> that memory and fail right away because this memory is clearly not
> migrateable. This will become a real problem when we drop the retry loop
> counter resp. timeout.
> 
> The real problem is in has_unmovable_pages in fact. We should fail if
> there are any non migrateable pages in the area. In orther to guarantee
> that remove the migrate type checks because MIGRATE_MOVABLE is not
> guaranteed to contain only migrateable pages. It is merely a heuristic.
> Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> allocate any non-migrateable pages from the block but CMA allocations
> themselves are unlikely to migrateable. Therefore remove both checks.
> 
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3badcedf96a7..ad0294ab3e4f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
> -	mt = get_pageblock_migratetype(page);
> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> -		return false;
>  
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-17 13:02                     ` Vlastimil Babka
  0 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-17 13:02 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML, Michal Hocko

On 10/13/2017 02:00 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Michael has noticed that the memory offline tries to migrate kernel code
> pages when doing
>  echo 0 > /sys/devices/system/memory/memory0/online
> 
> The current implementation will fail the operation after several failed
> page migration attempts but we shouldn't even attempt to migrate
> that memory and fail right away because this memory is clearly not
> migrateable. This will become a real problem when we drop the retry loop
> counter resp. timeout.
> 
> The real problem is in has_unmovable_pages in fact. We should fail if
> there are any non migrateable pages in the area. In orther to guarantee
> that remove the migrate type checks because MIGRATE_MOVABLE is not
> guaranteed to contain only migrateable pages. It is merely a heuristic.
> Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> allocate any non-migrateable pages from the block but CMA allocations
> themselves are unlikely to migrateable. Therefore remove both checks.
> 
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3badcedf96a7..ad0294ab3e4f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
> -	mt = get_pageblock_migratetype(page);
> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> -		return false;
>  
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
  2017-10-13 12:07                         ` Michal Hocko
@ 2017-10-17 13:03                           ` Vlastimil Babka
  -1 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-17 13:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Igor Mammedov,
	Vitaly Kuznetsov, LKML

On 10/13/2017 02:07 PM, Michal Hocko wrote:
> On Fri 13-10-17 14:04:08, Vlastimil Babka wrote:
>> On 10/13/2017 02:00 PM, Michal Hocko wrote:
>>> From: Michal Hocko <mhocko@suse.com>
>>>
>>> Reserved pages should be completely ignored by the core mm because they
>>> have a special meaning for their owners. has_unmovable_pages doesn't
>>> check those so we rely on other tests (reference count, or PageLRU) to
>>> fail on such pages. Althought this happens to work it is safer to simply
>>> check for those explicitly and do not rely on the owner of the page
>>> to abuse those fields for special purposes.
>>>
>>> Please note that this is more of a further fortification of the code
>>> rahter than a fix of an existing issue.
>>>
>>> Signed-off-by: Michal Hocko <mhocko@suse.com>
>>> ---
>>>  mm/page_alloc.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index ad0294ab3e4f..a8800b0a5619 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>>>  
>>>  		page = pfn_to_page(check);
>>>  
>>> +		if (PageReferenced(page))
>>
>> "Referenced" != "Reserved"
> 
> Dohh, you are right of course. I blame auto-completion ;) but I am lame
> in fact...
> ---
> From 44b20bdb03846bc5fd79c883d16b8f3aa436878f Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 13 Oct 2017 13:55:21 +0200
> Subject: [PATCH] mm, page_alloc: fail has_unmovable_pages when seeing reserved
>  pages
> 
> Reserved pages should be completely ignored by the core mm because they
> have a special meaning for their owners. has_unmovable_pages doesn't
> check those so we rely on other tests (reference count, or PageLRU) to
> fail on such pages. Althought this happens to work it is safer to simply
> check for those explicitly and do not rely on the owner of the page
> to abuse those fields for special purposes.
> 
> Please note that this is more of a further fortification of the code
> rahter than a fix of an existing issue.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ad0294ab3e4f..5b4d85ae445c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  
>  		page = pfn_to_page(check);
>  
> +		if (PageReserved(page))
> +			return true;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
@ 2017-10-17 13:03                           ` Vlastimil Babka
  0 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-17 13:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Igor Mammedov,
	Vitaly Kuznetsov, LKML

On 10/13/2017 02:07 PM, Michal Hocko wrote:
> On Fri 13-10-17 14:04:08, Vlastimil Babka wrote:
>> On 10/13/2017 02:00 PM, Michal Hocko wrote:
>>> From: Michal Hocko <mhocko@suse.com>
>>>
>>> Reserved pages should be completely ignored by the core mm because they
>>> have a special meaning for their owners. has_unmovable_pages doesn't
>>> check those so we rely on other tests (reference count, or PageLRU) to
>>> fail on such pages. Althought this happens to work it is safer to simply
>>> check for those explicitly and do not rely on the owner of the page
>>> to abuse those fields for special purposes.
>>>
>>> Please note that this is more of a further fortification of the code
>>> rahter than a fix of an existing issue.
>>>
>>> Signed-off-by: Michal Hocko <mhocko@suse.com>
>>> ---
>>>  mm/page_alloc.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index ad0294ab3e4f..a8800b0a5619 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>>>  
>>>  		page = pfn_to_page(check);
>>>  
>>> +		if (PageReferenced(page))
>>
>> "Referenced" != "Reserved"
> 
> Dohh, you are right of course. I blame auto-completion ;) but I am lame
> in fact...
> ---
> From 44b20bdb03846bc5fd79c883d16b8f3aa436878f Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 13 Oct 2017 13:55:21 +0200
> Subject: [PATCH] mm, page_alloc: fail has_unmovable_pages when seeing reserved
>  pages
> 
> Reserved pages should be completely ignored by the core mm because they
> have a special meaning for their owners. has_unmovable_pages doesn't
> check those so we rely on other tests (reference count, or PageLRU) to
> fail on such pages. Althought this happens to work it is safer to simply
> check for those explicitly and do not rely on the owner of the page
> to abuse those fields for special purposes.
> 
> Please note that this is more of a further fortification of the code
> rahter than a fix of an existing issue.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ad0294ab3e4f..5b4d85ae445c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7365,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  
>  		page = pfn_to_page(check);
>  
> +		if (PageReserved(page))
> +			return true;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-13 12:00                   ` Michal Hocko
@ 2017-10-19  2:51                     ` Joonsoo Kim
  -1 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-19  2:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

On Fri, Oct 13, 2017 at 02:00:12PM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Michael has noticed that the memory offline tries to migrate kernel code
> pages when doing
>  echo 0 > /sys/devices/system/memory/memory0/online
> 
> The current implementation will fail the operation after several failed
> page migration attempts but we shouldn't even attempt to migrate
> that memory and fail right away because this memory is clearly not
> migrateable. This will become a real problem when we drop the retry loop
> counter resp. timeout.
> 
> The real problem is in has_unmovable_pages in fact. We should fail if
> there are any non migrateable pages in the area. In orther to guarantee
> that remove the migrate type checks because MIGRATE_MOVABLE is not
> guaranteed to contain only migrateable pages. It is merely a heuristic.
> Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> allocate any non-migrateable pages from the block but CMA allocations
> themselves are unlikely to migrateable. Therefore remove both checks.

Hello,

This patch will break the CMA user. As you mentioned, CMA allocation
itself isn't migrateable. So, after a single page is allocated through
CMA allocation, has_unmovable_pages() will return true for this
pageblock. Then, futher CMA allocation request to this pageblock will
fail because it requires isolating the pageblock.

Thanks.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-19  2:51                     ` Joonsoo Kim
  0 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-19  2:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko

On Fri, Oct 13, 2017 at 02:00:12PM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Michael has noticed that the memory offline tries to migrate kernel code
> pages when doing
>  echo 0 > /sys/devices/system/memory/memory0/online
> 
> The current implementation will fail the operation after several failed
> page migration attempts but we shouldn't even attempt to migrate
> that memory and fail right away because this memory is clearly not
> migrateable. This will become a real problem when we drop the retry loop
> counter resp. timeout.
> 
> The real problem is in has_unmovable_pages in fact. We should fail if
> there are any non migrateable pages in the area. In orther to guarantee
> that remove the migrate type checks because MIGRATE_MOVABLE is not
> guaranteed to contain only migrateable pages. It is merely a heuristic.
> Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> allocate any non-migrateable pages from the block but CMA allocations
> themselves are unlikely to migrateable. Therefore remove both checks.

Hello,

This patch will break the CMA user. As you mentioned, CMA allocation
itself isn't migrateable. So, after a single page is allocated through
CMA allocation, has_unmovable_pages() will return true for this
pageblock. Then, futher CMA allocation request to this pageblock will
fail because it requires isolating the pageblock.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-19  2:51                     ` Joonsoo Kim
@ 2017-10-19  7:15                       ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-19  7:15 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> On Fri, Oct 13, 2017 at 02:00:12PM +0200, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Michael has noticed that the memory offline tries to migrate kernel code
> > pages when doing
> >  echo 0 > /sys/devices/system/memory/memory0/online
> > 
> > The current implementation will fail the operation after several failed
> > page migration attempts but we shouldn't even attempt to migrate
> > that memory and fail right away because this memory is clearly not
> > migrateable. This will become a real problem when we drop the retry loop
> > counter resp. timeout.
> > 
> > The real problem is in has_unmovable_pages in fact. We should fail if
> > there are any non migrateable pages in the area. In orther to guarantee
> > that remove the migrate type checks because MIGRATE_MOVABLE is not
> > guaranteed to contain only migrateable pages. It is merely a heuristic.
> > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > allocate any non-migrateable pages from the block but CMA allocations
> > themselves are unlikely to migrateable. Therefore remove both checks.
> 
> Hello,
> 
> This patch will break the CMA user. As you mentioned, CMA allocation
> itself isn't migrateable. So, after a single page is allocated through
> CMA allocation, has_unmovable_pages() will return true for this
> pageblock. Then, futher CMA allocation request to this pageblock will
> fail because it requires isolating the pageblock.

Hmm, does this mean that the CMA allocation path depends on
has_unmovable_pages to return false here even though the memory is not
movable? This sounds really strange to me and kind of abuse of this
function. Which path is that? Can we do the migrate type test theres?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-19  7:15                       ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-19  7:15 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> On Fri, Oct 13, 2017 at 02:00:12PM +0200, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Michael has noticed that the memory offline tries to migrate kernel code
> > pages when doing
> >  echo 0 > /sys/devices/system/memory/memory0/online
> > 
> > The current implementation will fail the operation after several failed
> > page migration attempts but we shouldn't even attempt to migrate
> > that memory and fail right away because this memory is clearly not
> > migrateable. This will become a real problem when we drop the retry loop
> > counter resp. timeout.
> > 
> > The real problem is in has_unmovable_pages in fact. We should fail if
> > there are any non migrateable pages in the area. In orther to guarantee
> > that remove the migrate type checks because MIGRATE_MOVABLE is not
> > guaranteed to contain only migrateable pages. It is merely a heuristic.
> > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > allocate any non-migrateable pages from the block but CMA allocations
> > themselves are unlikely to migrateable. Therefore remove both checks.
> 
> Hello,
> 
> This patch will break the CMA user. As you mentioned, CMA allocation
> itself isn't migrateable. So, after a single page is allocated through
> CMA allocation, has_unmovable_pages() will return true for this
> pageblock. Then, futher CMA allocation request to this pageblock will
> fail because it requires isolating the pageblock.

Hmm, does this mean that the CMA allocation path depends on
has_unmovable_pages to return false here even though the memory is not
movable? This sounds really strange to me and kind of abuse of this
function. Which path is that? Can we do the migrate type test theres?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-19  7:15                       ` Michal Hocko
@ 2017-10-19  7:33                         ` Joonsoo Kim
  -1 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-19  7:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > On Fri, Oct 13, 2017 at 02:00:12PM +0200, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > Michael has noticed that the memory offline tries to migrate kernel code
> > > pages when doing
> > >  echo 0 > /sys/devices/system/memory/memory0/online
> > > 
> > > The current implementation will fail the operation after several failed
> > > page migration attempts but we shouldn't even attempt to migrate
> > > that memory and fail right away because this memory is clearly not
> > > migrateable. This will become a real problem when we drop the retry loop
> > > counter resp. timeout.
> > > 
> > > The real problem is in has_unmovable_pages in fact. We should fail if
> > > there are any non migrateable pages in the area. In orther to guarantee
> > > that remove the migrate type checks because MIGRATE_MOVABLE is not
> > > guaranteed to contain only migrateable pages. It is merely a heuristic.
> > > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > > allocate any non-migrateable pages from the block but CMA allocations
> > > themselves are unlikely to migrateable. Therefore remove both checks.
> > 
> > Hello,
> > 
> > This patch will break the CMA user. As you mentioned, CMA allocation
> > itself isn't migrateable. So, after a single page is allocated through
> > CMA allocation, has_unmovable_pages() will return true for this
> > pageblock. Then, futher CMA allocation request to this pageblock will
> > fail because it requires isolating the pageblock.
> 
> Hmm, does this mean that the CMA allocation path depends on
> has_unmovable_pages to return false here even though the memory is not
> movable? This sounds really strange to me and kind of abuse of this

Your understanding is correct. Perhaps, abuse or wrong function name.

> function. Which path is that? Can we do the migrate type test theres?

alloc_contig_range() -> start_isolate_page_range() ->
set_migratetype_isolate() -> has_unmovable_pages()

We can add one argument, 'XXX' to set_migratetype_isolate() and change
it to check migrate type rather than has_unmovable_pages() if 'XXX' is
specified.

Thanks.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-19  7:33                         ` Joonsoo Kim
  0 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-19  7:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > On Fri, Oct 13, 2017 at 02:00:12PM +0200, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > Michael has noticed that the memory offline tries to migrate kernel code
> > > pages when doing
> > >  echo 0 > /sys/devices/system/memory/memory0/online
> > > 
> > > The current implementation will fail the operation after several failed
> > > page migration attempts but we shouldn't even attempt to migrate
> > > that memory and fail right away because this memory is clearly not
> > > migrateable. This will become a real problem when we drop the retry loop
> > > counter resp. timeout.
> > > 
> > > The real problem is in has_unmovable_pages in fact. We should fail if
> > > there are any non migrateable pages in the area. In orther to guarantee
> > > that remove the migrate type checks because MIGRATE_MOVABLE is not
> > > guaranteed to contain only migrateable pages. It is merely a heuristic.
> > > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > > allocate any non-migrateable pages from the block but CMA allocations
> > > themselves are unlikely to migrateable. Therefore remove both checks.
> > 
> > Hello,
> > 
> > This patch will break the CMA user. As you mentioned, CMA allocation
> > itself isn't migrateable. So, after a single page is allocated through
> > CMA allocation, has_unmovable_pages() will return true for this
> > pageblock. Then, futher CMA allocation request to this pageblock will
> > fail because it requires isolating the pageblock.
> 
> Hmm, does this mean that the CMA allocation path depends on
> has_unmovable_pages to return false here even though the memory is not
> movable? This sounds really strange to me and kind of abuse of this

Your understanding is correct. Perhaps, abuse or wrong function name.

> function. Which path is that? Can we do the migrate type test theres?

alloc_contig_range() -> start_isolate_page_range() ->
set_migratetype_isolate() -> has_unmovable_pages()

We can add one argument, 'XXX' to set_migratetype_isolate() and change
it to check migrate type rather than has_unmovable_pages() if 'XXX' is
specified.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-19  7:33                         ` Joonsoo Kim
@ 2017-10-19  8:20                           ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-19  8:20 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
[...]
> > > Hello,
> > > 
> > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > itself isn't migrateable. So, after a single page is allocated through
> > > CMA allocation, has_unmovable_pages() will return true for this
> > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > fail because it requires isolating the pageblock.
> > 
> > Hmm, does this mean that the CMA allocation path depends on
> > has_unmovable_pages to return false here even though the memory is not
> > movable? This sounds really strange to me and kind of abuse of this
> 
> Your understanding is correct. Perhaps, abuse or wrong function name.
>
> > function. Which path is that? Can we do the migrate type test theres?
> 
> alloc_contig_range() -> start_isolate_page_range() ->
> set_migratetype_isolate() -> has_unmovable_pages()

I see. It seems that the CMA and memory hotplug have a very different
view on what should happen during isolation.
 
> We can add one argument, 'XXX' to set_migratetype_isolate() and change
> it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> specified.

Can we use the migratetype argument and do the special thing for
MIGRATE_CMA? Like the following diff?
---
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index d4cd2014fa6f..fa9db0c7b54e 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -30,7 +30,7 @@ static inline bool is_migrate_isolate(int migratetype)
 #endif
 
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
-			 bool skip_hwpoisoned_pages);
+			 int migratetype, bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc50d746a82f..ad2ea7069d14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7362,6 +7362,7 @@ void *__init alloc_large_system_hash(const char *tablename,
  * race condition. So you can't expect this function should be exact.
  */
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
+			 int migratetype,
 			 bool skip_hwpoisoned_pages)
 {
 	unsigned long pfn, iter, found;
@@ -7373,6 +7374,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
 
+	/*
+	 * CMA allocations (alloc_contig_range) really need to mark isolate
+	 * CMA pageblocks even when they are not movable in fact so consider
+	 * them movable here.
+	 */
+	if (is_migrate_cma(migratetype) &&
+			is_migrate_cma(get_pageblock_migratetype(page)))
+		return false;
+
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
 		unsigned long check = pfn + iter;
@@ -7458,7 +7468,7 @@ bool is_pageblock_removable_nolock(struct page *page)
 	if (!zone_spans_pfn(zone, pfn))
 		return false;
 
-	return !has_unmovable_pages(zone, page, 0, true);
+	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
 }
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 757410d9f758..8616f5332c77 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -14,7 +14,7 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
-static int set_migratetype_isolate(struct page *page,
+static int set_migratetype_isolate(struct page *page, int migratetype,
 				bool skip_hwpoisoned_pages)
 {
 	struct zone *zone;
@@ -51,7 +51,7 @@ static int set_migratetype_isolate(struct page *page,
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
 	 */
-	if (!has_unmovable_pages(zone, page, arg.pages_found,
+	if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype,
 				 skip_hwpoisoned_pages))
 		ret = 0;
 
@@ -63,14 +63,14 @@ static int set_migratetype_isolate(struct page *page,
 out:
 	if (!ret) {
 		unsigned long nr_pages;
-		int migratetype = get_pageblock_migratetype(page);
+		int mt = get_pageblock_migratetype(page);
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
 		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
 									NULL);
 
-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
+		__mod_zone_freepage_state(zone, -nr_pages, mt);
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);
@@ -182,7 +182,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page &&
-		    set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
+		    set_migratetype_isolate(page, migratetype, skip_hwpoisoned_pages)) {
 			undo_pfn = pfn;
 			goto undo;
 		}
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-19  8:20                           ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-19  8:20 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
[...]
> > > Hello,
> > > 
> > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > itself isn't migrateable. So, after a single page is allocated through
> > > CMA allocation, has_unmovable_pages() will return true for this
> > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > fail because it requires isolating the pageblock.
> > 
> > Hmm, does this mean that the CMA allocation path depends on
> > has_unmovable_pages to return false here even though the memory is not
> > movable? This sounds really strange to me and kind of abuse of this
> 
> Your understanding is correct. Perhaps, abuse or wrong function name.
>
> > function. Which path is that? Can we do the migrate type test theres?
> 
> alloc_contig_range() -> start_isolate_page_range() ->
> set_migratetype_isolate() -> has_unmovable_pages()

I see. It seems that the CMA and memory hotplug have a very different
view on what should happen during isolation.
 
> We can add one argument, 'XXX' to set_migratetype_isolate() and change
> it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> specified.

Can we use the migratetype argument and do the special thing for
MIGRATE_CMA? Like the following diff?
---
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index d4cd2014fa6f..fa9db0c7b54e 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -30,7 +30,7 @@ static inline bool is_migrate_isolate(int migratetype)
 #endif
 
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
-			 bool skip_hwpoisoned_pages);
+			 int migratetype, bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc50d746a82f..ad2ea7069d14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7362,6 +7362,7 @@ void *__init alloc_large_system_hash(const char *tablename,
  * race condition. So you can't expect this function should be exact.
  */
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
+			 int migratetype,
 			 bool skip_hwpoisoned_pages)
 {
 	unsigned long pfn, iter, found;
@@ -7373,6 +7374,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
 
+	/*
+	 * CMA allocations (alloc_contig_range) really need to mark isolate
+	 * CMA pageblocks even when they are not movable in fact so consider
+	 * them movable here.
+	 */
+	if (is_migrate_cma(migratetype) &&
+			is_migrate_cma(get_pageblock_migratetype(page)))
+		return false;
+
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
 		unsigned long check = pfn + iter;
@@ -7458,7 +7468,7 @@ bool is_pageblock_removable_nolock(struct page *page)
 	if (!zone_spans_pfn(zone, pfn))
 		return false;
 
-	return !has_unmovable_pages(zone, page, 0, true);
+	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
 }
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 757410d9f758..8616f5332c77 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -14,7 +14,7 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
-static int set_migratetype_isolate(struct page *page,
+static int set_migratetype_isolate(struct page *page, int migratetype,
 				bool skip_hwpoisoned_pages)
 {
 	struct zone *zone;
@@ -51,7 +51,7 @@ static int set_migratetype_isolate(struct page *page,
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
 	 */
-	if (!has_unmovable_pages(zone, page, arg.pages_found,
+	if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype,
 				 skip_hwpoisoned_pages))
 		ret = 0;
 
@@ -63,14 +63,14 @@ static int set_migratetype_isolate(struct page *page,
 out:
 	if (!ret) {
 		unsigned long nr_pages;
-		int migratetype = get_pageblock_migratetype(page);
+		int mt = get_pageblock_migratetype(page);
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
 		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
 									NULL);
 
-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
+		__mod_zone_freepage_state(zone, -nr_pages, mt);
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);
@@ -182,7 +182,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page &&
-		    set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
+		    set_migratetype_isolate(page, migratetype, skip_hwpoisoned_pages)) {
 			undo_pfn = pfn;
 			goto undo;
 		}
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-19  8:20                           ` Michal Hocko
@ 2017-10-19 12:21                             ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-19 12:21 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> [...]
> > > > Hello,
> > > > 
> > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > itself isn't migrateable. So, after a single page is allocated through
> > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > fail because it requires isolating the pageblock.
> > > 
> > > Hmm, does this mean that the CMA allocation path depends on
> > > has_unmovable_pages to return false here even though the memory is not
> > > movable? This sounds really strange to me and kind of abuse of this
> > 
> > Your understanding is correct. Perhaps, abuse or wrong function name.
> >
> > > function. Which path is that? Can we do the migrate type test theres?
> > 
> > alloc_contig_range() -> start_isolate_page_range() ->
> > set_migratetype_isolate() -> has_unmovable_pages()
> 
> I see. It seems that the CMA and memory hotplug have a very different
> view on what should happen during isolation.
>  
> > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > specified.
> 
> Can we use the migratetype argument and do the special thing for
> MIGRATE_CMA? Like the following diff?

And with the full changelog.
---
>From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 19 Oct 2017 14:14:02 +0200
Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
 has_unmovable_pages

Joonsoo has noticed that "mm: drop migrate type checks from
has_unmovable_pages" would break CMA allocator because it relies on
has_unmovable_pages returning false even for CMA pageblocks which in
fact don't have to be movable:
alloc_contig_range
  start_isolate_page_range
    set_migratetype_isolate
      has_unmovable_pages

This is a result of the code sharing between CMA and memory hotplug
while each one has a different idea of what has_unmovable_pages should
return. This is unfortunate but fixing it properly would require a lot
of code duplication.

Fix the issue by introducing the requested migrate type argument
and special case MIGRATE_CMA case where CMA page blocks are handled
properly. This will work for memory hotplug because it requires
MIGRATE_MOVABLE.

Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/page-isolation.h |  2 +-
 mm/page_alloc.c                | 12 +++++++++++-
 mm/page_isolation.c            | 10 +++++-----
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index d4cd2014fa6f..fa9db0c7b54e 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -30,7 +30,7 @@ static inline bool is_migrate_isolate(int migratetype)
 #endif
 
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
-			 bool skip_hwpoisoned_pages);
+			 int migratetype, bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b4d85ae445c..259aeb22462f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7344,6 +7344,7 @@ void *__init alloc_large_system_hash(const char *tablename,
  * race condition. So you can't expect this function should be exact.
  */
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
+			 int migratetype,
 			 bool skip_hwpoisoned_pages)
 {
 	unsigned long pfn, iter, found;
@@ -7356,6 +7357,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
 
+	/*
+	 * CMA allocations (alloc_contig_range) really need to mark isolate
+	 * CMA pageblocks even when they are not movable in fact so consider
+	 * them movable here.
+	 */
+	if (is_migrate_cma(migratetype) &&
+			is_migrate_cma(get_pageblock_migratetype(page)))
+		return false;
+
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
 		unsigned long check = pfn + iter;
@@ -7441,7 +7451,7 @@ bool is_pageblock_removable_nolock(struct page *page)
 	if (!zone_spans_pfn(zone, pfn))
 		return false;
 
-	return !has_unmovable_pages(zone, page, 0, true);
+	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
 }
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 757410d9f758..8616f5332c77 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -14,7 +14,7 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
-static int set_migratetype_isolate(struct page *page,
+static int set_migratetype_isolate(struct page *page, int migratetype,
 				bool skip_hwpoisoned_pages)
 {
 	struct zone *zone;
@@ -51,7 +51,7 @@ static int set_migratetype_isolate(struct page *page,
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
 	 */
-	if (!has_unmovable_pages(zone, page, arg.pages_found,
+	if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype,
 				 skip_hwpoisoned_pages))
 		ret = 0;
 
@@ -63,14 +63,14 @@ static int set_migratetype_isolate(struct page *page,
 out:
 	if (!ret) {
 		unsigned long nr_pages;
-		int migratetype = get_pageblock_migratetype(page);
+		int mt = get_pageblock_migratetype(page);
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
 		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
 									NULL);
 
-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
+		__mod_zone_freepage_state(zone, -nr_pages, mt);
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);
@@ -182,7 +182,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page &&
-		    set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
+		    set_migratetype_isolate(page, migratetype, skip_hwpoisoned_pages)) {
 			undo_pfn = pfn;
 			goto undo;
 		}
-- 
2.14.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-19 12:21                             ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-19 12:21 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> [...]
> > > > Hello,
> > > > 
> > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > itself isn't migrateable. So, after a single page is allocated through
> > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > fail because it requires isolating the pageblock.
> > > 
> > > Hmm, does this mean that the CMA allocation path depends on
> > > has_unmovable_pages to return false here even though the memory is not
> > > movable? This sounds really strange to me and kind of abuse of this
> > 
> > Your understanding is correct. Perhaps, abuse or wrong function name.
> >
> > > function. Which path is that? Can we do the migrate type test theres?
> > 
> > alloc_contig_range() -> start_isolate_page_range() ->
> > set_migratetype_isolate() -> has_unmovable_pages()
> 
> I see. It seems that the CMA and memory hotplug have a very different
> view on what should happen during isolation.
>  
> > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > specified.
> 
> Can we use the migratetype argument and do the special thing for
> MIGRATE_CMA? Like the following diff?

And with the full changelog.
---

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-19 12:21                             ` Michal Hocko
@ 2017-10-20  2:13                               ` Joonsoo Kim
  -1 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-20  2:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > [...]
> > > > > Hello,
> > > > > 
> > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > fail because it requires isolating the pageblock.
> > > > 
> > > > Hmm, does this mean that the CMA allocation path depends on
> > > > has_unmovable_pages to return false here even though the memory is not
> > > > movable? This sounds really strange to me and kind of abuse of this
> > > 
> > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > >
> > > > function. Which path is that? Can we do the migrate type test theres?
> > > 
> > > alloc_contig_range() -> start_isolate_page_range() ->
> > > set_migratetype_isolate() -> has_unmovable_pages()
> > 
> > I see. It seems that the CMA and memory hotplug have a very different
> > view on what should happen during isolation.
> >  
> > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > specified.
> > 
> > Can we use the migratetype argument and do the special thing for
> > MIGRATE_CMA? Like the following diff?
> 
> And with the full changelog.
> ---
> >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 19 Oct 2017 14:14:02 +0200
> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
>  has_unmovable_pages
> 
> Joonsoo has noticed that "mm: drop migrate type checks from
> has_unmovable_pages" would break CMA allocator because it relies on
> has_unmovable_pages returning false even for CMA pageblocks which in
> fact don't have to be movable:
> alloc_contig_range
>   start_isolate_page_range
>     set_migratetype_isolate
>       has_unmovable_pages
> 
> This is a result of the code sharing between CMA and memory hotplug
> while each one has a different idea of what has_unmovable_pages should
> return. This is unfortunate but fixing it properly would require a lot
> of code duplication.
> 
> Fix the issue by introducing the requested migrate type argument
> and special case MIGRATE_CMA case where CMA page blocks are handled
> properly. This will work for memory hotplug because it requires
> MIGRATE_MOVABLE.

Unfortunately, alloc_contig_range() can be called with
MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.

I did a more thinking and found that it's strange to check if there is
unmovable page in the pageblock during the set_migratetype_isolate().
set_migratetype_isolate() should be just for setting the migratetype
of the pageblock. Checking other things should be done by another
place, for example, before calling the start_isolate_page_range() in
__offline_pages().

Thanks.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-20  2:13                               ` Joonsoo Kim
  0 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-20  2:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > [...]
> > > > > Hello,
> > > > > 
> > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > fail because it requires isolating the pageblock.
> > > > 
> > > > Hmm, does this mean that the CMA allocation path depends on
> > > > has_unmovable_pages to return false here even though the memory is not
> > > > movable? This sounds really strange to me and kind of abuse of this
> > > 
> > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > >
> > > > function. Which path is that? Can we do the migrate type test theres?
> > > 
> > > alloc_contig_range() -> start_isolate_page_range() ->
> > > set_migratetype_isolate() -> has_unmovable_pages()
> > 
> > I see. It seems that the CMA and memory hotplug have a very different
> > view on what should happen during isolation.
> >  
> > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > specified.
> > 
> > Can we use the migratetype argument and do the special thing for
> > MIGRATE_CMA? Like the following diff?
> 
> And with the full changelog.
> ---
> >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 19 Oct 2017 14:14:02 +0200
> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
>  has_unmovable_pages
> 
> Joonsoo has noticed that "mm: drop migrate type checks from
> has_unmovable_pages" would break CMA allocator because it relies on
> has_unmovable_pages returning false even for CMA pageblocks which in
> fact don't have to be movable:
> alloc_contig_range
>   start_isolate_page_range
>     set_migratetype_isolate
>       has_unmovable_pages
> 
> This is a result of the code sharing between CMA and memory hotplug
> while each one has a different idea of what has_unmovable_pages should
> return. This is unfortunate but fixing it properly would require a lot
> of code duplication.
> 
> Fix the issue by introducing the requested migrate type argument
> and special case MIGRATE_CMA case where CMA page blocks are handled
> properly. This will work for memory hotplug because it requires
> MIGRATE_MOVABLE.

Unfortunately, alloc_contig_range() can be called with
MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.

I did a more thinking and found that it's strange to check if there is
unmovable page in the pageblock during the set_migratetype_isolate().
set_migratetype_isolate() should be just for setting the migratetype
of the pageblock. Checking other things should be done by another
place, for example, before calling the start_isolate_page_range() in
__offline_pages().

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-20  2:13                               ` Joonsoo Kim
@ 2017-10-20  5:59                                 ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-20  5:59 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > [...]
> > > > > > Hello,
> > > > > > 
> > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > fail because it requires isolating the pageblock.
> > > > > 
> > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > 
> > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > >
> > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > 
> > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > 
> > > I see. It seems that the CMA and memory hotplug have a very different
> > > view on what should happen during isolation.
> > >  
> > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > specified.
> > > 
> > > Can we use the migratetype argument and do the special thing for
> > > MIGRATE_CMA? Like the following diff?
> > 
> > And with the full changelog.
> > ---
> > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> >  has_unmovable_pages
> > 
> > Joonsoo has noticed that "mm: drop migrate type checks from
> > has_unmovable_pages" would break CMA allocator because it relies on
> > has_unmovable_pages returning false even for CMA pageblocks which in
> > fact don't have to be movable:
> > alloc_contig_range
> >   start_isolate_page_range
> >     set_migratetype_isolate
> >       has_unmovable_pages
> > 
> > This is a result of the code sharing between CMA and memory hotplug
> > while each one has a different idea of what has_unmovable_pages should
> > return. This is unfortunate but fixing it properly would require a lot
> > of code duplication.
> > 
> > Fix the issue by introducing the requested migrate type argument
> > and special case MIGRATE_CMA case where CMA page blocks are handled
> > properly. This will work for memory hotplug because it requires
> > MIGRATE_MOVABLE.
> 
> Unfortunately, alloc_contig_range() can be called with
> MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.

Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
understanding is that only CMA allocator really depends on this weird
semantic and that does MIGRATE_CMA unconditionally.

> I did a more thinking and found that it's strange to check if there is
> unmovable page in the pageblock during the set_migratetype_isolate().
> set_migratetype_isolate() should be just for setting the migratetype
> of the pageblock. Checking other things should be done by another
> place, for example, before calling the start_isolate_page_range() in
> __offline_pages().

How do we guarantee the atomicity?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-20  5:59                                 ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-20  5:59 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > [...]
> > > > > > Hello,
> > > > > > 
> > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > fail because it requires isolating the pageblock.
> > > > > 
> > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > 
> > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > >
> > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > 
> > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > 
> > > I see. It seems that the CMA and memory hotplug have a very different
> > > view on what should happen during isolation.
> > >  
> > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > specified.
> > > 
> > > Can we use the migratetype argument and do the special thing for
> > > MIGRATE_CMA? Like the following diff?
> > 
> > And with the full changelog.
> > ---
> > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> >  has_unmovable_pages
> > 
> > Joonsoo has noticed that "mm: drop migrate type checks from
> > has_unmovable_pages" would break CMA allocator because it relies on
> > has_unmovable_pages returning false even for CMA pageblocks which in
> > fact don't have to be movable:
> > alloc_contig_range
> >   start_isolate_page_range
> >     set_migratetype_isolate
> >       has_unmovable_pages
> > 
> > This is a result of the code sharing between CMA and memory hotplug
> > while each one has a different idea of what has_unmovable_pages should
> > return. This is unfortunate but fixing it properly would require a lot
> > of code duplication.
> > 
> > Fix the issue by introducing the requested migrate type argument
> > and special case MIGRATE_CMA case where CMA page blocks are handled
> > properly. This will work for memory hotplug because it requires
> > MIGRATE_MOVABLE.
> 
> Unfortunately, alloc_contig_range() can be called with
> MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.

Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
understanding is that only CMA allocator really depends on this weird
semantic and that does MIGRATE_CMA unconditionally.

> I did a more thinking and found that it's strange to check if there is
> unmovable page in the pageblock during the set_migratetype_isolate().
> set_migratetype_isolate() should be just for setting the migratetype
> of the pageblock. Checking other things should be done by another
> place, for example, before calling the start_isolate_page_range() in
> __offline_pages().

How do we guarantee the atomicity?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-20  5:59                                 ` Michal Hocko
@ 2017-10-20  6:50                                   ` Joonsoo Kim
  -1 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-20  6:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > [...]
> > > > > > > Hello,
> > > > > > > 
> > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > fail because it requires isolating the pageblock.
> > > > > > 
> > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > 
> > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > >
> > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > 
> > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > 
> > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > view on what should happen during isolation.
> > > >  
> > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > specified.
> > > > 
> > > > Can we use the migratetype argument and do the special thing for
> > > > MIGRATE_CMA? Like the following diff?
> > > 
> > > And with the full changelog.
> > > ---
> > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > >  has_unmovable_pages
> > > 
> > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > has_unmovable_pages" would break CMA allocator because it relies on
> > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > fact don't have to be movable:
> > > alloc_contig_range
> > >   start_isolate_page_range
> > >     set_migratetype_isolate
> > >       has_unmovable_pages
> > > 
> > > This is a result of the code sharing between CMA and memory hotplug
> > > while each one has a different idea of what has_unmovable_pages should
> > > return. This is unfortunate but fixing it properly would require a lot
> > > of code duplication.
> > > 
> > > Fix the issue by introducing the requested migrate type argument
> > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > properly. This will work for memory hotplug because it requires
> > > MIGRATE_MOVABLE.
> > 
> > Unfortunately, alloc_contig_range() can be called with
> > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> 
> Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> understanding is that only CMA allocator really depends on this weird
> semantic and that does MIGRATE_CMA unconditionally.

alloc_contig_range() could be called for partial pages in the
pageblock. With your patch, this case also fails unnecessarilly if the
other pages in the pageblock is pinned.

Until now, there is no user calling alloc_contig_range() with partial
pages except CMA allocator but API could support it.

> 
> > I did a more thinking and found that it's strange to check if there is
> > unmovable page in the pageblock during the set_migratetype_isolate().
> > set_migratetype_isolate() should be just for setting the migratetype
> > of the pageblock. Checking other things should be done by another
> > place, for example, before calling the start_isolate_page_range() in
> > __offline_pages().
> 
> How do we guarantee the atomicity?

What atomicity do you mean?

Thanks.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-20  6:50                                   ` Joonsoo Kim
  0 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-20  6:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > [...]
> > > > > > > Hello,
> > > > > > > 
> > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > fail because it requires isolating the pageblock.
> > > > > > 
> > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > 
> > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > >
> > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > 
> > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > 
> > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > view on what should happen during isolation.
> > > >  
> > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > specified.
> > > > 
> > > > Can we use the migratetype argument and do the special thing for
> > > > MIGRATE_CMA? Like the following diff?
> > > 
> > > And with the full changelog.
> > > ---
> > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > >  has_unmovable_pages
> > > 
> > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > has_unmovable_pages" would break CMA allocator because it relies on
> > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > fact don't have to be movable:
> > > alloc_contig_range
> > >   start_isolate_page_range
> > >     set_migratetype_isolate
> > >       has_unmovable_pages
> > > 
> > > This is a result of the code sharing between CMA and memory hotplug
> > > while each one has a different idea of what has_unmovable_pages should
> > > return. This is unfortunate but fixing it properly would require a lot
> > > of code duplication.
> > > 
> > > Fix the issue by introducing the requested migrate type argument
> > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > properly. This will work for memory hotplug because it requires
> > > MIGRATE_MOVABLE.
> > 
> > Unfortunately, alloc_contig_range() can be called with
> > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> 
> Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> understanding is that only CMA allocator really depends on this weird
> semantic and that does MIGRATE_CMA unconditionally.

alloc_contig_range() could be called for partial pages in the
pageblock. With your patch, this case also fails unnecessarilly if the
other pages in the pageblock is pinned.

Until now, there is no user calling alloc_contig_range() with partial
pages except CMA allocator but API could support it.

> 
> > I did a more thinking and found that it's strange to check if there is
> > unmovable page in the pageblock during the set_migratetype_isolate().
> > set_migratetype_isolate() should be just for setting the migratetype
> > of the pageblock. Checking other things should be done by another
> > place, for example, before calling the start_isolate_page_range() in
> > __offline_pages().
> 
> How do we guarantee the atomicity?

What atomicity do you mean?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-20  6:50                                   ` Joonsoo Kim
@ 2017-10-20  7:02                                     ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-20  7:02 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > [...]
> > > > > > > > Hello,
> > > > > > > > 
> > > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > 
> > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > > 
> > > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > > >
> > > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > > 
> > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > 
> > > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > > view on what should happen during isolation.
> > > > >  
> > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > > specified.
> > > > > 
> > > > > Can we use the migratetype argument and do the special thing for
> > > > > MIGRATE_CMA? Like the following diff?
> > > > 
> > > > And with the full changelog.
> > > > ---
> > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > >  has_unmovable_pages
> > > > 
> > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > has_unmovable_pages" would break CMA allocator because it relies on
> > > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > > fact don't have to be movable:
> > > > alloc_contig_range
> > > >   start_isolate_page_range
> > > >     set_migratetype_isolate
> > > >       has_unmovable_pages
> > > > 
> > > > This is a result of the code sharing between CMA and memory hotplug
> > > > while each one has a different idea of what has_unmovable_pages should
> > > > return. This is unfortunate but fixing it properly would require a lot
> > > > of code duplication.
> > > > 
> > > > Fix the issue by introducing the requested migrate type argument
> > > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > > properly. This will work for memory hotplug because it requires
> > > > MIGRATE_MOVABLE.
> > > 
> > > Unfortunately, alloc_contig_range() can be called with
> > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > 
> > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > understanding is that only CMA allocator really depends on this weird
> > semantic and that does MIGRATE_CMA unconditionally.
> 
> alloc_contig_range() could be called for partial pages in the
> pageblock. With your patch, this case also fails unnecessarilly if the
> other pages in the pageblock is pinned.

Is this really the case for GB pages? Do we really want to mess those
with CMA blocks and make those blocks basically unusable because GB
pages are rarely (if at all migrateable)?

> Until now, there is no user calling alloc_contig_range() with partial
> pages except CMA allocator but API could support it.

I disagree. If this is a CMA thing it should stay that way. The semantic
is quite confusing already, please let's not make it even worse.

> > > I did a more thinking and found that it's strange to check if there is
> > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > set_migratetype_isolate() should be just for setting the migratetype
> > > of the pageblock. Checking other things should be done by another
> > > place, for example, before calling the start_isolate_page_range() in
> > > __offline_pages().
> > 
> > How do we guarantee the atomicity?
> 
> What atomicity do you mean?

Currently we are checking and isolating pages under zone lock. If we
split that we are losing atomicity, aren't we.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-20  7:02                                     ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-20  7:02 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > [...]
> > > > > > > > Hello,
> > > > > > > > 
> > > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > 
> > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > > 
> > > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > > >
> > > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > > 
> > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > 
> > > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > > view on what should happen during isolation.
> > > > >  
> > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > > specified.
> > > > > 
> > > > > Can we use the migratetype argument and do the special thing for
> > > > > MIGRATE_CMA? Like the following diff?
> > > > 
> > > > And with the full changelog.
> > > > ---
> > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > >  has_unmovable_pages
> > > > 
> > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > has_unmovable_pages" would break CMA allocator because it relies on
> > > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > > fact don't have to be movable:
> > > > alloc_contig_range
> > > >   start_isolate_page_range
> > > >     set_migratetype_isolate
> > > >       has_unmovable_pages
> > > > 
> > > > This is a result of the code sharing between CMA and memory hotplug
> > > > while each one has a different idea of what has_unmovable_pages should
> > > > return. This is unfortunate but fixing it properly would require a lot
> > > > of code duplication.
> > > > 
> > > > Fix the issue by introducing the requested migrate type argument
> > > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > > properly. This will work for memory hotplug because it requires
> > > > MIGRATE_MOVABLE.
> > > 
> > > Unfortunately, alloc_contig_range() can be called with
> > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > 
> > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > understanding is that only CMA allocator really depends on this weird
> > semantic and that does MIGRATE_CMA unconditionally.
> 
> alloc_contig_range() could be called for partial pages in the
> pageblock. With your patch, this case also fails unnecessarilly if the
> other pages in the pageblock is pinned.

Is this really the case for GB pages? Do we really want to mess those
with CMA blocks and make those blocks basically unusable because GB
pages are rarely (if at all migrateable)?

> Until now, there is no user calling alloc_contig_range() with partial
> pages except CMA allocator but API could support it.

I disagree. If this is a CMA thing it should stay that way. The semantic
is quite confusing already, please let's not make it even worse.

> > > I did a more thinking and found that it's strange to check if there is
> > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > set_migratetype_isolate() should be just for setting the migratetype
> > > of the pageblock. Checking other things should be done by another
> > > place, for example, before calling the start_isolate_page_range() in
> > > __offline_pages().
> > 
> > How do we guarantee the atomicity?
> 
> What atomicity do you mean?

Currently we are checking and isolating pages under zone lock. If we
split that we are losing atomicity, aren't we.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-20  2:13                               ` Joonsoo Kim
@ 2017-10-20  7:22                                 ` Xishi Qiu
  -1 siblings, 0 replies; 102+ messages in thread
From: Xishi Qiu @ 2017-10-20  7:22 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Michal Hocko, linux-mm, Michael Ellerman, Vlastimil Babka,
	Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On 2017/10/20 10:13, Joonsoo Kim wrote:

> On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
>> On Thu 19-10-17 10:20:41, Michal Hocko wrote:
>>> On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
>>>> On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
>>>>> On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
>>> [...]
>>>>>> Hello,
>>>>>>
>>>>>> This patch will break the CMA user. As you mentioned, CMA allocation
>>>>>> itself isn't migrateable. So, after a single page is allocated through
>>>>>> CMA allocation, has_unmovable_pages() will return true for this
>>>>>> pageblock. Then, futher CMA allocation request to this pageblock will
>>>>>> fail because it requires isolating the pageblock.
>>>>>
>>>>> Hmm, does this mean that the CMA allocation path depends on
>>>>> has_unmovable_pages to return false here even though the memory is not
>>>>> movable? This sounds really strange to me and kind of abuse of this
>>>>
>>>> Your understanding is correct. Perhaps, abuse or wrong function name.
>>>>
>>>>> function. Which path is that? Can we do the migrate type test theres?
>>>>
>>>> alloc_contig_range() -> start_isolate_page_range() ->
>>>> set_migratetype_isolate() -> has_unmovable_pages()
>>>
>>> I see. It seems that the CMA and memory hotplug have a very different
>>> view on what should happen during isolation.
>>>  
>>>> We can add one argument, 'XXX' to set_migratetype_isolate() and change
>>>> it to check migrate type rather than has_unmovable_pages() if 'XXX' is
>>>> specified.
>>>
>>> Can we use the migratetype argument and do the special thing for
>>> MIGRATE_CMA? Like the following diff?
>>
>> And with the full changelog.
>> ---
>> >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
>> From: Michal Hocko <mhocko@suse.com>
>> Date: Thu, 19 Oct 2017 14:14:02 +0200
>> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
>>  has_unmovable_pages
>>
>> Joonsoo has noticed that "mm: drop migrate type checks from
>> has_unmovable_pages" would break CMA allocator because it relies on
>> has_unmovable_pages returning false even for CMA pageblocks which in
>> fact don't have to be movable:
>> alloc_contig_range
>>   start_isolate_page_range
>>     set_migratetype_isolate
>>       has_unmovable_pages
>>
>> This is a result of the code sharing between CMA and memory hotplug
>> while each one has a different idea of what has_unmovable_pages should
>> return. This is unfortunate but fixing it properly would require a lot
>> of code duplication.
>>
>> Fix the issue by introducing the requested migrate type argument
>> and special case MIGRATE_CMA case where CMA page blocks are handled
>> properly. This will work for memory hotplug because it requires
>> MIGRATE_MOVABLE.
> 
> Unfortunately, alloc_contig_range() can be called with
> MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> 
> I did a more thinking and found that it's strange to check if there is
> unmovable page in the pageblock during the set_migratetype_isolate().
> set_migratetype_isolate() should be just for setting the migratetype
> of the pageblock. Checking other things should be done by another
> place, for example, before calling the start_isolate_page_range() in
> __offline_pages().
> 
> Thanks.
> 

Hi Joonsoo,

How about add a flag to skip or not has_unmovable_pages() in set_migratetype_isolate()?
Something like the skip_hwpoisoned_pages.

Thanks,
Xishi Qiu

> 
> .
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-20  7:22                                 ` Xishi Qiu
  0 siblings, 0 replies; 102+ messages in thread
From: Xishi Qiu @ 2017-10-20  7:22 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Michal Hocko, linux-mm, Michael Ellerman, Vlastimil Babka,
	Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On 2017/10/20 10:13, Joonsoo Kim wrote:

> On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
>> On Thu 19-10-17 10:20:41, Michal Hocko wrote:
>>> On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
>>>> On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
>>>>> On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
>>> [...]
>>>>>> Hello,
>>>>>>
>>>>>> This patch will break the CMA user. As you mentioned, CMA allocation
>>>>>> itself isn't migrateable. So, after a single page is allocated through
>>>>>> CMA allocation, has_unmovable_pages() will return true for this
>>>>>> pageblock. Then, futher CMA allocation request to this pageblock will
>>>>>> fail because it requires isolating the pageblock.
>>>>>
>>>>> Hmm, does this mean that the CMA allocation path depends on
>>>>> has_unmovable_pages to return false here even though the memory is not
>>>>> movable? This sounds really strange to me and kind of abuse of this
>>>>
>>>> Your understanding is correct. Perhaps, abuse or wrong function name.
>>>>
>>>>> function. Which path is that? Can we do the migrate type test theres?
>>>>
>>>> alloc_contig_range() -> start_isolate_page_range() ->
>>>> set_migratetype_isolate() -> has_unmovable_pages()
>>>
>>> I see. It seems that the CMA and memory hotplug have a very different
>>> view on what should happen during isolation.
>>>  
>>>> We can add one argument, 'XXX' to set_migratetype_isolate() and change
>>>> it to check migrate type rather than has_unmovable_pages() if 'XXX' is
>>>> specified.
>>>
>>> Can we use the migratetype argument and do the special thing for
>>> MIGRATE_CMA? Like the following diff?
>>
>> And with the full changelog.
>> ---
>> >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
>> From: Michal Hocko <mhocko@suse.com>
>> Date: Thu, 19 Oct 2017 14:14:02 +0200
>> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
>>  has_unmovable_pages
>>
>> Joonsoo has noticed that "mm: drop migrate type checks from
>> has_unmovable_pages" would break CMA allocator because it relies on
>> has_unmovable_pages returning false even for CMA pageblocks which in
>> fact don't have to be movable:
>> alloc_contig_range
>>   start_isolate_page_range
>>     set_migratetype_isolate
>>       has_unmovable_pages
>>
>> This is a result of the code sharing between CMA and memory hotplug
>> while each one has a different idea of what has_unmovable_pages should
>> return. This is unfortunate but fixing it properly would require a lot
>> of code duplication.
>>
>> Fix the issue by introducing the requested migrate type argument
>> and special case MIGRATE_CMA case where CMA page blocks are handled
>> properly. This will work for memory hotplug because it requires
>> MIGRATE_MOVABLE.
> 
> Unfortunately, alloc_contig_range() can be called with
> MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> 
> I did a more thinking and found that it's strange to check if there is
> unmovable page in the pageblock during the set_migratetype_isolate().
> set_migratetype_isolate() should be just for setting the migratetype
> of the pageblock. Checking other things should be done by another
> place, for example, before calling the start_isolate_page_range() in
> __offline_pages().
> 
> Thanks.
> 

Hi Joonsoo,

How about add a flag to skip or not has_unmovable_pages() in set_migratetype_isolate()?
Something like the skip_hwpoisoned_pages.

Thanks,
Xishi Qiu

> 
> .
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-20  7:22                                 ` Xishi Qiu
@ 2017-10-20  8:17                                   ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-20  8:17 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Joonsoo Kim, linux-mm, Michael Ellerman, Vlastimil Babka,
	Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 20-10-17 15:22:14, Xishi Qiu wrote:
> On 2017/10/20 10:13, Joonsoo Kim wrote:
> 
> > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
[...]
> >> >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> >> From: Michal Hocko <mhocko@suse.com>
> >> Date: Thu, 19 Oct 2017 14:14:02 +0200
> >> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> >>  has_unmovable_pages
> >>
> >> Joonsoo has noticed that "mm: drop migrate type checks from
> >> has_unmovable_pages" would break CMA allocator because it relies on
> >> has_unmovable_pages returning false even for CMA pageblocks which in
> >> fact don't have to be movable:
> >> alloc_contig_range
> >>   start_isolate_page_range
> >>     set_migratetype_isolate
> >>       has_unmovable_pages
> >>
> >> This is a result of the code sharing between CMA and memory hotplug
> >> while each one has a different idea of what has_unmovable_pages should
> >> return. This is unfortunate but fixing it properly would require a lot
> >> of code duplication.
> >>
> >> Fix the issue by introducing the requested migrate type argument
> >> and special case MIGRATE_CMA case where CMA page blocks are handled
> >> properly. This will work for memory hotplug because it requires
> >> MIGRATE_MOVABLE.
> > 
> > Unfortunately, alloc_contig_range() can be called with
> > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > 
> > I did a more thinking and found that it's strange to check if there is
> > unmovable page in the pageblock during the set_migratetype_isolate().
> > set_migratetype_isolate() should be just for setting the migratetype
> > of the pageblock. Checking other things should be done by another
> > place, for example, before calling the start_isolate_page_range() in
> > __offline_pages().
> > 
> > Thanks.
> > 
> 
> Hi Joonsoo,
> 
> How about add a flag to skip or not has_unmovable_pages() in set_migratetype_isolate()?
> Something like the skip_hwpoisoned_pages.

I believe this is what Joonsoo was proposing actually. I cannot say I
would like skip_hwpoisoned_pages and adding one more flag is just too
ugly. So I would prefer to have something that would actually make sense
from the semantic POV. If CMA really needs to work with partial CMA
blocks then MIGRATE_CMA as an indicator sounds like the right way to go
to me. But I am not a CMA expert so I might be missing some subtlety
here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-20  8:17                                   ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-20  8:17 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Joonsoo Kim, linux-mm, Michael Ellerman, Vlastimil Babka,
	Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 20-10-17 15:22:14, Xishi Qiu wrote:
> On 2017/10/20 10:13, Joonsoo Kim wrote:
> 
> > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
[...]
> >> >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> >> From: Michal Hocko <mhocko@suse.com>
> >> Date: Thu, 19 Oct 2017 14:14:02 +0200
> >> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> >>  has_unmovable_pages
> >>
> >> Joonsoo has noticed that "mm: drop migrate type checks from
> >> has_unmovable_pages" would break CMA allocator because it relies on
> >> has_unmovable_pages returning false even for CMA pageblocks which in
> >> fact don't have to be movable:
> >> alloc_contig_range
> >>   start_isolate_page_range
> >>     set_migratetype_isolate
> >>       has_unmovable_pages
> >>
> >> This is a result of the code sharing between CMA and memory hotplug
> >> while each one has a different idea of what has_unmovable_pages should
> >> return. This is unfortunate but fixing it properly would require a lot
> >> of code duplication.
> >>
> >> Fix the issue by introducing the requested migrate type argument
> >> and special case MIGRATE_CMA case where CMA page blocks are handled
> >> properly. This will work for memory hotplug because it requires
> >> MIGRATE_MOVABLE.
> > 
> > Unfortunately, alloc_contig_range() can be called with
> > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > 
> > I did a more thinking and found that it's strange to check if there is
> > unmovable page in the pageblock during the set_migratetype_isolate().
> > set_migratetype_isolate() should be just for setting the migratetype
> > of the pageblock. Checking other things should be done by another
> > place, for example, before calling the start_isolate_page_range() in
> > __offline_pages().
> > 
> > Thanks.
> > 
> 
> Hi Joonsoo,
> 
> How about add a flag to skip or not has_unmovable_pages() in set_migratetype_isolate()?
> Something like the skip_hwpoisoned_pages.

I believe this is what Joonsoo was proposing actually. I cannot say I
would like skip_hwpoisoned_pages and adding one more flag is just too
ugly. So I would prefer to have something that would actually make sense
from the semantic POV. If CMA really needs to work with partial CMA
blocks then MIGRATE_CMA as an indicator sounds like the right way to go
to me. But I am not a CMA expert so I might be missing some subtlety
here.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-20  7:02                                     ` Michal Hocko
@ 2017-10-23  5:23                                       ` Joonsoo Kim
  -1 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-23  5:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri, Oct 20, 2017 at 09:02:20AM +0200, Michal Hocko wrote:
> On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> > On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > > [...]
> > > > > > > > > Hello,
> > > > > > > > > 
> > > > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > > 
> > > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > > > 
> > > > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > > > >
> > > > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > > > 
> > > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > > 
> > > > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > > > view on what should happen during isolation.
> > > > > >  
> > > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > > > specified.
> > > > > > 
> > > > > > Can we use the migratetype argument and do the special thing for
> > > > > > MIGRATE_CMA? Like the following diff?
> > > > > 
> > > > > And with the full changelog.
> > > > > ---
> > > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > > >  has_unmovable_pages
> > > > > 
> > > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > > has_unmovable_pages" would break CMA allocator because it relies on
> > > > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > > > fact don't have to be movable:
> > > > > alloc_contig_range
> > > > >   start_isolate_page_range
> > > > >     set_migratetype_isolate
> > > > >       has_unmovable_pages
> > > > > 
> > > > > This is a result of the code sharing between CMA and memory hotplug
> > > > > while each one has a different idea of what has_unmovable_pages should
> > > > > return. This is unfortunate but fixing it properly would require a lot
> > > > > of code duplication.
> > > > > 
> > > > > Fix the issue by introducing the requested migrate type argument
> > > > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > > > properly. This will work for memory hotplug because it requires
> > > > > MIGRATE_MOVABLE.
> > > > 
> > > > Unfortunately, alloc_contig_range() can be called with
> > > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > 
> > > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > > understanding is that only CMA allocator really depends on this weird
> > > semantic and that does MIGRATE_CMA unconditionally.
> > 
> > alloc_contig_range() could be called for partial pages in the
> > pageblock. With your patch, this case also fails unnecessarilly if the
> > other pages in the pageblock is pinned.
> 
> Is this really the case for GB pages? Do we really want to mess those

No, but, as I mentioned already, this API can be called with less
pages. I know that there is no user with less pages at this moment but
I cannot see any point to reduce this API's capability.

> with CMA blocks and make those blocks basically unusable because GB
> pages are rarely (if at all migrateable)?
> 
> > Until now, there is no user calling alloc_contig_range() with partial
> > pages except CMA allocator but API could support it.
> 
> I disagree. If this is a CMA thing it should stay that way. The semantic
> is quite confusing already, please let's not make it even worse.

It is already used by other component.

I'm not sure what is the confusing semantic you mentioned. I think
that set_migratetype_isolate() has confusing semantic and should be
fixed since making the pageblock isolated doesn't need to check if
there is unmovable page or not. Do you think that
set_migratetype_isolate() need to check it? If so, why?

> > > > I did a more thinking and found that it's strange to check if there is
> > > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > > set_migratetype_isolate() should be just for setting the migratetype
> > > > of the pageblock. Checking other things should be done by another
> > > > place, for example, before calling the start_isolate_page_range() in
> > > > __offline_pages().
> > > 
> > > How do we guarantee the atomicity?
> > 
> > What atomicity do you mean?
> 
> Currently we are checking and isolating pages under zone lock. If we
> split that we are losing atomicity, aren't we.

I think that it can be done easily.

set_migratetype_isolate() {
        lock
        __set_migratetype_isolate();
        unlock
}

set_migratetype_isolate_if_no_unmovable_pages() {
        lock
        if (has_unmovable_pages())
                fail
        else
                __set_migratetype_isolate()
        unlock
}

Thanks.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-23  5:23                                       ` Joonsoo Kim
  0 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-23  5:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri, Oct 20, 2017 at 09:02:20AM +0200, Michal Hocko wrote:
> On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> > On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > > [...]
> > > > > > > > > Hello,
> > > > > > > > > 
> > > > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > > 
> > > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > > > 
> > > > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > > > >
> > > > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > > > 
> > > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > > 
> > > > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > > > view on what should happen during isolation.
> > > > > >  
> > > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > > > specified.
> > > > > > 
> > > > > > Can we use the migratetype argument and do the special thing for
> > > > > > MIGRATE_CMA? Like the following diff?
> > > > > 
> > > > > And with the full changelog.
> > > > > ---
> > > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > > >  has_unmovable_pages
> > > > > 
> > > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > > has_unmovable_pages" would break CMA allocator because it relies on
> > > > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > > > fact don't have to be movable:
> > > > > alloc_contig_range
> > > > >   start_isolate_page_range
> > > > >     set_migratetype_isolate
> > > > >       has_unmovable_pages
> > > > > 
> > > > > This is a result of the code sharing between CMA and memory hotplug
> > > > > while each one has a different idea of what has_unmovable_pages should
> > > > > return. This is unfortunate but fixing it properly would require a lot
> > > > > of code duplication.
> > > > > 
> > > > > Fix the issue by introducing the requested migrate type argument
> > > > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > > > properly. This will work for memory hotplug because it requires
> > > > > MIGRATE_MOVABLE.
> > > > 
> > > > Unfortunately, alloc_contig_range() can be called with
> > > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > 
> > > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > > understanding is that only CMA allocator really depends on this weird
> > > semantic and that does MIGRATE_CMA unconditionally.
> > 
> > alloc_contig_range() could be called for partial pages in the
> > pageblock. With your patch, this case also fails unnecessarilly if the
> > other pages in the pageblock is pinned.
> 
> Is this really the case for GB pages? Do we really want to mess those

No, but, as I mentioned already, this API can be called with less
pages. I know that there is no user with less pages at this moment but
I cannot see any point to reduce this API's capability.

> with CMA blocks and make those blocks basically unusable because GB
> pages are rarely (if at all migrateable)?
> 
> > Until now, there is no user calling alloc_contig_range() with partial
> > pages except CMA allocator but API could support it.
> 
> I disagree. If this is a CMA thing it should stay that way. The semantic
> is quite confusing already, please let's not make it even worse.

It is already used by other component.

I'm not sure what is the confusing semantic you mentioned. I think
that set_migratetype_isolate() has confusing semantic and should be
fixed since making the pageblock isolated doesn't need to check if
there is unmovable page or not. Do you think that
set_migratetype_isolate() need to check it? If so, why?

> > > > I did a more thinking and found that it's strange to check if there is
> > > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > > set_migratetype_isolate() should be just for setting the migratetype
> > > > of the pageblock. Checking other things should be done by another
> > > > place, for example, before calling the start_isolate_page_range() in
> > > > __offline_pages().
> > > 
> > > How do we guarantee the atomicity?
> > 
> > What atomicity do you mean?
> 
> Currently we are checking and isolating pages under zone lock. If we
> split that we are losing atomicity, aren't we.

I think that it can be done easily.

set_migratetype_isolate() {
        lock
        __set_migratetype_isolate();
        unlock
}

set_migratetype_isolate_if_no_unmovable_pages() {
        lock
        if (has_unmovable_pages())
                fail
        else
                __set_migratetype_isolate()
        unlock
}

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-20  8:17                                   ` Michal Hocko
@ 2017-10-23  5:26                                     ` Joonsoo Kim
  -1 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-23  5:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Xishi Qiu, linux-mm, Michael Ellerman, Vlastimil Babka,
	Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri, Oct 20, 2017 at 10:17:00AM +0200, Michal Hocko wrote:
> On Fri 20-10-17 15:22:14, Xishi Qiu wrote:
> > On 2017/10/20 10:13, Joonsoo Kim wrote:
> > 
> > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> [...]
> > >> >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > >> From: Michal Hocko <mhocko@suse.com>
> > >> Date: Thu, 19 Oct 2017 14:14:02 +0200
> > >> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > >>  has_unmovable_pages
> > >>
> > >> Joonsoo has noticed that "mm: drop migrate type checks from
> > >> has_unmovable_pages" would break CMA allocator because it relies on
> > >> has_unmovable_pages returning false even for CMA pageblocks which in
> > >> fact don't have to be movable:
> > >> alloc_contig_range
> > >>   start_isolate_page_range
> > >>     set_migratetype_isolate
> > >>       has_unmovable_pages
> > >>
> > >> This is a result of the code sharing between CMA and memory hotplug
> > >> while each one has a different idea of what has_unmovable_pages should
> > >> return. This is unfortunate but fixing it properly would require a lot
> > >> of code duplication.
> > >>
> > >> Fix the issue by introducing the requested migrate type argument
> > >> and special case MIGRATE_CMA case where CMA page blocks are handled
> > >> properly. This will work for memory hotplug because it requires
> > >> MIGRATE_MOVABLE.
> > > 
> > > Unfortunately, alloc_contig_range() can be called with
> > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > 
> > > I did a more thinking and found that it's strange to check if there is
> > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > set_migratetype_isolate() should be just for setting the migratetype
> > > of the pageblock. Checking other things should be done by another
> > > place, for example, before calling the start_isolate_page_range() in
> > > __offline_pages().
> > > 
> > > Thanks.
> > > 
> > 
> > Hi Joonsoo,
> > 
> > How about add a flag to skip or not has_unmovable_pages() in set_migratetype_isolate()?
> > Something like the skip_hwpoisoned_pages.
> 
> I believe this is what Joonsoo was proposing actually. I cannot say I

Yes, I initially suggested this idea but change my mind. Now, I think
that problem is not in has_unmovable_pages() but in
set_migratetype_isolate(). So different solution is needed. See my other
reply to Michal.

Thanks.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-23  5:26                                     ` Joonsoo Kim
  0 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-23  5:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Xishi Qiu, linux-mm, Michael Ellerman, Vlastimil Babka,
	Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri, Oct 20, 2017 at 10:17:00AM +0200, Michal Hocko wrote:
> On Fri 20-10-17 15:22:14, Xishi Qiu wrote:
> > On 2017/10/20 10:13, Joonsoo Kim wrote:
> > 
> > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> [...]
> > >> >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > >> From: Michal Hocko <mhocko@suse.com>
> > >> Date: Thu, 19 Oct 2017 14:14:02 +0200
> > >> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > >>  has_unmovable_pages
> > >>
> > >> Joonsoo has noticed that "mm: drop migrate type checks from
> > >> has_unmovable_pages" would break CMA allocator because it relies on
> > >> has_unmovable_pages returning false even for CMA pageblocks which in
> > >> fact don't have to be movable:
> > >> alloc_contig_range
> > >>   start_isolate_page_range
> > >>     set_migratetype_isolate
> > >>       has_unmovable_pages
> > >>
> > >> This is a result of the code sharing between CMA and memory hotplug
> > >> while each one has a different idea of what has_unmovable_pages should
> > >> return. This is unfortunate but fixing it properly would require a lot
> > >> of code duplication.
> > >>
> > >> Fix the issue by introducing the requested migrate type argument
> > >> and special case MIGRATE_CMA case where CMA page blocks are handled
> > >> properly. This will work for memory hotplug because it requires
> > >> MIGRATE_MOVABLE.
> > > 
> > > Unfortunately, alloc_contig_range() can be called with
> > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > 
> > > I did a more thinking and found that it's strange to check if there is
> > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > set_migratetype_isolate() should be just for setting the migratetype
> > > of the pageblock. Checking other things should be done by another
> > > place, for example, before calling the start_isolate_page_range() in
> > > __offline_pages().
> > > 
> > > Thanks.
> > > 
> > 
> > Hi Joonsoo,
> > 
> > How about add a flag to skip or not has_unmovable_pages() in set_migratetype_isolate()?
> > Something like the skip_hwpoisoned_pages.
> 
> I believe this is what Joonsoo was proposing actually. I cannot say I

Yes, I initially suggested this idea but change my mind. Now, I think
that problem is not in has_unmovable_pages() but in
set_migratetype_isolate(). So different solution is needed. See my other
reply to Michal.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-23  5:23                                       ` Joonsoo Kim
@ 2017-10-23  8:10                                         ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-23  8:10 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Mon 23-10-17 14:23:09, Joonsoo Kim wrote:
> On Fri, Oct 20, 2017 at 09:02:20AM +0200, Michal Hocko wrote:
> > On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> > > On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > > > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > > > [...]
> > > > > > > > > > Hello,
> > > > > > > > > > 
> > > > > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > > > 
> > > > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > > > > 
> > > > > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > > > > >
> > > > > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > > > > 
> > > > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > > > 
> > > > > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > > > > view on what should happen during isolation.
> > > > > > >  
> > > > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > > > > specified.
> > > > > > > 
> > > > > > > Can we use the migratetype argument and do the special thing for
> > > > > > > MIGRATE_CMA? Like the following diff?
> > > > > > 
> > > > > > And with the full changelog.
> > > > > > ---
> > > > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > > > >  has_unmovable_pages
> > > > > > 
> > > > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > > > has_unmovable_pages" would break CMA allocator because it relies on
> > > > > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > > > > fact don't have to be movable:
> > > > > > alloc_contig_range
> > > > > >   start_isolate_page_range
> > > > > >     set_migratetype_isolate
> > > > > >       has_unmovable_pages
> > > > > > 
> > > > > > This is a result of the code sharing between CMA and memory hotplug
> > > > > > while each one has a different idea of what has_unmovable_pages should
> > > > > > return. This is unfortunate but fixing it properly would require a lot
> > > > > > of code duplication.
> > > > > > 
> > > > > > Fix the issue by introducing the requested migrate type argument
> > > > > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > > > > properly. This will work for memory hotplug because it requires
> > > > > > MIGRATE_MOVABLE.
> > > > > 
> > > > > Unfortunately, alloc_contig_range() can be called with
> > > > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > > 
> > > > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > > > understanding is that only CMA allocator really depends on this weird
> > > > semantic and that does MIGRATE_CMA unconditionally.
> > > 
> > > alloc_contig_range() could be called for partial pages in the
> > > pageblock. With your patch, this case also fails unnecessarilly if the
> > > other pages in the pageblock is pinned.
> > 
> > Is this really the case for GB pages? Do we really want to mess those
> 
> No, but, as I mentioned already, this API can be called with less
> pages. I know that there is no user with less pages at this moment but
> I cannot see any point to reduce this API's capability.

I am still confused. So when exactly would you want to use this api for
MIGRATE_MOVABLE and use a partial MIGRATE_CMA pageblock?

> > with CMA blocks and make those blocks basically unusable because GB
> > pages are rarely (if at all migrateable)?
> > 
> > > Until now, there is no user calling alloc_contig_range() with partial
> > > pages except CMA allocator but API could support it.
> > 
> > I disagree. If this is a CMA thing it should stay that way. The semantic
> > is quite confusing already, please let's not make it even worse.
> 
> It is already used by other component.
> 
> I'm not sure what is the confusing semantic you mentioned. I think
> that set_migratetype_isolate() has confusing semantic and should be
> fixed since making the pageblock isolated doesn't need to check if
> there is unmovable page or not. Do you think that
> set_migratetype_isolate() need to check it? If so, why?

My intuitive understanding of set_migratetype_isolate is that it either
suceeds and that means that the given pfn range can be isolated for the
given type of allocation (be it movable or cma). No new pages will be
allocated from this range to allow converging into a free range in a
finit amount of time. At least this is how the hotplug code would like
to use it and I suppose that the alloc_contig_range would like to
guarantee the same to not rely on a fixed amount of migration attempts.

> > > > > I did a more thinking and found that it's strange to check if there is
> > > > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > > > set_migratetype_isolate() should be just for setting the migratetype
> > > > > of the pageblock. Checking other things should be done by another
> > > > > place, for example, before calling the start_isolate_page_range() in
> > > > > __offline_pages().
> > > > 
> > > > How do we guarantee the atomicity?
> > > 
> > > What atomicity do you mean?
> > 
> > Currently we are checking and isolating pages under zone lock. If we
> > split that we are losing atomicity, aren't we.
> 
> I think that it can be done easily.
> 
> set_migratetype_isolate() {
>         lock
>         __set_migratetype_isolate();
>         unlock
> }
> 
> set_migratetype_isolate_if_no_unmovable_pages() {
>         lock
>         if (has_unmovable_pages())
>                 fail
>         else
>                 __set_migratetype_isolate()
>         unlock
> }

So you are essentially suggesting to split the API for
alloc_contig_range and hotplug users? Care to send a patch? It is not
like I would really love this but I would really like to have this issue
addressed because I really do want all other patches which depend on
this to be merged in the next release cycle.

That being said, I would much rather see MIGRATE_CMA case special cased
than duplicate the already confusing API but I will not insist of
course.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-23  8:10                                         ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-23  8:10 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Mon 23-10-17 14:23:09, Joonsoo Kim wrote:
> On Fri, Oct 20, 2017 at 09:02:20AM +0200, Michal Hocko wrote:
> > On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> > > On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > > > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > > > [...]
> > > > > > > > > > Hello,
> > > > > > > > > > 
> > > > > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > > > 
> > > > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > > > > 
> > > > > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > > > > >
> > > > > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > > > > 
> > > > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > > > 
> > > > > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > > > > view on what should happen during isolation.
> > > > > > >  
> > > > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > > > > specified.
> > > > > > > 
> > > > > > > Can we use the migratetype argument and do the special thing for
> > > > > > > MIGRATE_CMA? Like the following diff?
> > > > > > 
> > > > > > And with the full changelog.
> > > > > > ---
> > > > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > > > >  has_unmovable_pages
> > > > > > 
> > > > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > > > has_unmovable_pages" would break CMA allocator because it relies on
> > > > > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > > > > fact don't have to be movable:
> > > > > > alloc_contig_range
> > > > > >   start_isolate_page_range
> > > > > >     set_migratetype_isolate
> > > > > >       has_unmovable_pages
> > > > > > 
> > > > > > This is a result of the code sharing between CMA and memory hotplug
> > > > > > while each one has a different idea of what has_unmovable_pages should
> > > > > > return. This is unfortunate but fixing it properly would require a lot
> > > > > > of code duplication.
> > > > > > 
> > > > > > Fix the issue by introducing the requested migrate type argument
> > > > > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > > > > properly. This will work for memory hotplug because it requires
> > > > > > MIGRATE_MOVABLE.
> > > > > 
> > > > > Unfortunately, alloc_contig_range() can be called with
> > > > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > > 
> > > > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > > > understanding is that only CMA allocator really depends on this weird
> > > > semantic and that does MIGRATE_CMA unconditionally.
> > > 
> > > alloc_contig_range() could be called for partial pages in the
> > > pageblock. With your patch, this case also fails unnecessarilly if the
> > > other pages in the pageblock is pinned.
> > 
> > Is this really the case for GB pages? Do we really want to mess those
> 
> No, but, as I mentioned already, this API can be called with less
> pages. I know that there is no user with less pages at this moment but
> I cannot see any point to reduce this API's capability.

I am still confused. So when exactly would you want to use this api for
MIGRATE_MOVABLE and use a partial MIGRATE_CMA pageblock?

> > with CMA blocks and make those blocks basically unusable because GB
> > pages are rarely (if at all migrateable)?
> > 
> > > Until now, there is no user calling alloc_contig_range() with partial
> > > pages except CMA allocator but API could support it.
> > 
> > I disagree. If this is a CMA thing it should stay that way. The semantic
> > is quite confusing already, please let's not make it even worse.
> 
> It is already used by other component.
> 
> I'm not sure what is the confusing semantic you mentioned. I think
> that set_migratetype_isolate() has confusing semantic and should be
> fixed since making the pageblock isolated doesn't need to check if
> there is unmovable page or not. Do you think that
> set_migratetype_isolate() need to check it? If so, why?

My intuitive understanding of set_migratetype_isolate is that it either
suceeds and that means that the given pfn range can be isolated for the
given type of allocation (be it movable or cma). No new pages will be
allocated from this range to allow converging into a free range in a
finit amount of time. At least this is how the hotplug code would like
to use it and I suppose that the alloc_contig_range would like to
guarantee the same to not rely on a fixed amount of migration attempts.

> > > > > I did a more thinking and found that it's strange to check if there is
> > > > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > > > set_migratetype_isolate() should be just for setting the migratetype
> > > > > of the pageblock. Checking other things should be done by another
> > > > > place, for example, before calling the start_isolate_page_range() in
> > > > > __offline_pages().
> > > > 
> > > > How do we guarantee the atomicity?
> > > 
> > > What atomicity do you mean?
> > 
> > Currently we are checking and isolating pages under zone lock. If we
> > split that we are losing atomicity, aren't we.
> 
> I think that it can be done easily.
> 
> set_migratetype_isolate() {
>         lock
>         __set_migratetype_isolate();
>         unlock
> }
> 
> set_migratetype_isolate_if_no_unmovable_pages() {
>         lock
>         if (has_unmovable_pages())
>                 fail
>         else
>                 __set_migratetype_isolate()
>         unlock
> }

So you are essentially suggesting to split the API for
alloc_contig_range and hotplug users? Care to send a patch? It is not
like I would really love this but I would really like to have this issue
addressed because I really do want all other patches which depend on
this to be merged in the next release cycle.

That being said, I would much rather see MIGRATE_CMA case special cased
than duplicate the already confusing API but I will not insist of
course.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-23  8:10                                         ` Michal Hocko
@ 2017-10-24  4:44                                           ` Joonsoo Kim
  -1 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-24  4:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Mon, Oct 23, 2017 at 10:10:09AM +0200, Michal Hocko wrote:
> On Mon 23-10-17 14:23:09, Joonsoo Kim wrote:
> > On Fri, Oct 20, 2017 at 09:02:20AM +0200, Michal Hocko wrote:
> > > On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> > > > On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > > > > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > > > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > > > > [...]
> > > > > > > > > > > Hello,
> > > > > > > > > > > 
> > > > > > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > > > > 
> > > > > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > > > > > 
> > > > > > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > > > > > >
> > > > > > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > > > > > 
> > > > > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > > > > 
> > > > > > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > > > > > view on what should happen during isolation.
> > > > > > > >  
> > > > > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > > > > > specified.
> > > > > > > > 
> > > > > > > > Can we use the migratetype argument and do the special thing for
> > > > > > > > MIGRATE_CMA? Like the following diff?
> > > > > > > 
> > > > > > > And with the full changelog.
> > > > > > > ---
> > > > > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > > > > >  has_unmovable_pages
> > > > > > > 
> > > > > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > > > > has_unmovable_pages" would break CMA allocator because it relies on
> > > > > > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > > > > > fact don't have to be movable:
> > > > > > > alloc_contig_range
> > > > > > >   start_isolate_page_range
> > > > > > >     set_migratetype_isolate
> > > > > > >       has_unmovable_pages
> > > > > > > 
> > > > > > > This is a result of the code sharing between CMA and memory hotplug
> > > > > > > while each one has a different idea of what has_unmovable_pages should
> > > > > > > return. This is unfortunate but fixing it properly would require a lot
> > > > > > > of code duplication.
> > > > > > > 
> > > > > > > Fix the issue by introducing the requested migrate type argument
> > > > > > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > > > > > properly. This will work for memory hotplug because it requires
> > > > > > > MIGRATE_MOVABLE.
> > > > > > 
> > > > > > Unfortunately, alloc_contig_range() can be called with
> > > > > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > > > 
> > > > > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > > > > understanding is that only CMA allocator really depends on this weird
> > > > > semantic and that does MIGRATE_CMA unconditionally.
> > > > 
> > > > alloc_contig_range() could be called for partial pages in the
> > > > pageblock. With your patch, this case also fails unnecessarilly if the
> > > > other pages in the pageblock is pinned.
> > > 
> > > Is this really the case for GB pages? Do we really want to mess those
> > 
> > No, but, as I mentioned already, this API can be called with less
> > pages. I know that there is no user with less pages at this moment but
> > I cannot see any point to reduce this API's capability.
> 
> I am still confused. So when exactly would you want to use this api for
> MIGRATE_MOVABLE and use a partial MIGRATE_CMA pageblock?
> 
> > > with CMA blocks and make those blocks basically unusable because GB
> > > pages are rarely (if at all migrateable)?
> > > 
> > > > Until now, there is no user calling alloc_contig_range() with partial
> > > > pages except CMA allocator but API could support it.
> > > 
> > > I disagree. If this is a CMA thing it should stay that way. The semantic
> > > is quite confusing already, please let's not make it even worse.
> > 
> > It is already used by other component.
> > 
> > I'm not sure what is the confusing semantic you mentioned. I think
> > that set_migratetype_isolate() has confusing semantic and should be
> > fixed since making the pageblock isolated doesn't need to check if
> > there is unmovable page or not. Do you think that
> > set_migratetype_isolate() need to check it? If so, why?
> 
> My intuitive understanding of set_migratetype_isolate is that it either
> suceeds and that means that the given pfn range can be isolated for the
> given type of allocation (be it movable or cma). No new pages will be
> allocated from this range to allow converging into a free range in a
> finit amount of time. At least this is how the hotplug code would like
> to use it and I suppose that the alloc_contig_range would like to
> guarantee the same to not rely on a fixed amount of migration attempts.

Yes, alloc_contig_range() also want to guarantee the similar thing.
Major difference between them is 'given pfn range'. memory hotplug
works by pageblock unit but alloc_contig_range() doesn't.
alloc_contig_range() works by the page unit. However, there is no easy
way to isolate individual page so it uses pageblock isolation
regardless of 'given pfn range'. In this case, checking movability of
all pages on the pageblock would cause the problem as I mentioned
before.

> 
> > > > > > I did a more thinking and found that it's strange to check if there is
> > > > > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > > > > set_migratetype_isolate() should be just for setting the migratetype
> > > > > > of the pageblock. Checking other things should be done by another
> > > > > > place, for example, before calling the start_isolate_page_range() in
> > > > > > __offline_pages().
> > > > > 
> > > > > How do we guarantee the atomicity?
> > > > 
> > > > What atomicity do you mean?
> > > 
> > > Currently we are checking and isolating pages under zone lock. If we
> > > split that we are losing atomicity, aren't we.
> > 
> > I think that it can be done easily.
> > 
> > set_migratetype_isolate() {
> >         lock
> >         __set_migratetype_isolate();
> >         unlock
> > }
> > 
> > set_migratetype_isolate_if_no_unmovable_pages() {
> >         lock
> >         if (has_unmovable_pages())
> >                 fail
> >         else
> >                 __set_migratetype_isolate()
> >         unlock
> > }
> 
> So you are essentially suggesting to split the API for
> alloc_contig_range and hotplug users? Care to send a patch? It is not
> like I would really love this but I would really like to have this issue
> addressed because I really do want all other patches which depend on
> this to be merged in the next release cycle.
> 
> That being said, I would much rather see MIGRATE_CMA case special cased
> than duplicate the already confusing API but I will not insist of
> course.

Okay. I atteach the patch. Andrew, could you revert Michal's series
and apply this patch first? Perhaps, Michal will resend his series on
top of this one.

Thanks.


--------------->8-------------------
>From e8e6215c4cdadf7f4df7c420349750412d89e99b Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date: Tue, 24 Oct 2017 11:38:22 +0900
Subject: [PATCH] mm/page_isolation: separate the pageblock isolation function

There are two users who use pageblock isolation function,
alloc_contig_range() and memory hotplug. Each one has different purpose
on isolation so they should be treated separately. For example,
alloc_contig_range() doesn't require that all pages on the pageblock are
movable because it could just needs part of pages on the pageblock.
But, memory hotplug does since memory offline works for pageblock unit
or more. Currently, they are distiniguished by migratetype of
the target pageblock but it causes a problem on memory hotplug
so it's better to separate the function completely at this moment.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/page-isolation.h |  3 +-
 mm/page_alloc.c                |  9 +++--
 mm/page_isolation.c            | 76 +++++++++++++++++++++++++++++-------------
 3 files changed, 57 insertions(+), 31 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index d4cd201..614dc00 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -29,8 +29,7 @@ static inline bool is_migrate_isolate(int migratetype)
 }
 #endif
 
-bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
-                        bool skip_hwpoisoned_pages);
+bool has_unremovable_pages(struct zone *zone, struct page *page, int count);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
                                int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1008c58..04f3b36 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7373,7 +7373,7 @@ void *__init alloc_large_system_hash(const char *tablename,
 }
 
 /*
- * This function checks whether pageblock includes unmovable pages or not.
+ * This function checks whether pageblock includes unremovable pages or not.
  * If @count is not zero, it is okay to include less @count unmovable pages
  *
  * PageLRU check without isolation or lru_lock could race so that
@@ -7381,8 +7381,7 @@ void *__init alloc_large_system_hash(const char *tablename,
  * check without lock_page also may miss some movable non-lru pages at
  * race condition. So you can't expect this function should be exact.
  */
-bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
-                        bool skip_hwpoisoned_pages)
+bool has_unremovable_pages(struct zone *zone, struct page *page, int count)
 {
        unsigned long pfn, iter, found;
        int mt;
@@ -7432,7 +7431,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
                 * The HWPoisoned page may be not in buddy system, and
                 * page_count() is not 0.
                 */
-               if (skip_hwpoisoned_pages && PageHWPoison(page))
+               if (PageHWPoison(page))
                        continue;
 
                if (__PageMovable(page))
@@ -7479,7 +7478,7 @@ bool is_pageblock_removable_nolock(struct page *page)
        if (!zone_spans_pfn(zone, pfn))
                return false;
 
-       return !has_unmovable_pages(zone, page, 0, true);
+       return !has_unremovable_pages(zone, page, 0);
 }
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 757410d..1650e01 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -14,8 +14,38 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
-static int set_migratetype_isolate(struct page *page,
-                               bool skip_hwpoisoned_pages)
+/* Should hold the zone lock */
+static void __set_migratetype_isolate(struct zone *zone,
+                               struct page *page, int mt)
+{
+       unsigned long nr_pages;
+
+       set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+       zone->nr_isolate_pageblock++;
+       nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE, NULL);
+       __mod_zone_freepage_state(zone, -nr_pages, mt);
+}
+
+static int isolate_pageblock(struct page *page, int mt)
+{
+       struct zone *zone = page_zone(page);
+       unsigned long flags;
+
+       spin_lock_irqsave(&zone->lock, flags);
+       if (get_pageblock_migratetype(page) != mt) {
+               spin_unlock_irqrestore(&zone->lock, flags);
+               return -EBUSY;
+       }
+
+       __set_migratetype_isolate(zone, page, mt);
+       spin_unlock_irqrestore(&zone->lock, flags);
+
+       drain_all_pages(zone);
+
+       return 0;
+}
+
+static int isolate_pageblock_for_offline(struct page *page)
 {
        struct zone *zone;
        unsigned long flags, pfn;
@@ -46,33 +76,22 @@ static int set_migratetype_isolate(struct page *page,
        notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
        notifier_ret = notifier_to_errno(notifier_ret);
        if (notifier_ret)
-               goto out;
+               goto err;
        /*
         * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
         * We just check MOVABLE pages.
         */
-       if (!has_unmovable_pages(zone, page, arg.pages_found,
-                                skip_hwpoisoned_pages))
-               ret = 0;
-
        /*
         * immobile means "not-on-lru" pages. If immobile is larger than
         * removable-by-driver pages reported by notifier, we'll fail.
         */
+       if (has_unremovable_pages(zone, page, arg.pages_found))
+               goto err;
 
-out:
-       if (!ret) {
-               unsigned long nr_pages;
-               int migratetype = get_pageblock_migratetype(page);
-
-               set_pageblock_migratetype(page, MIGRATE_ISOLATE);
-               zone->nr_isolate_pageblock++;
-               nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
-                                                                       NULL);
-
-               __mod_zone_freepage_state(zone, -nr_pages, migratetype);
-       }
+       __set_migratetype_isolate(zone, page, get_pageblock_migratetype(page));
+       ret = 0;
 
+err:
        spin_unlock_irqrestore(&zone->lock, flags);
        if (!ret)
                drain_all_pages(zone);
@@ -159,6 +178,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
  * @migratetype: migrate type to set in error recovery.
+ * @for_memory_offline: The purpose of the isolation
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -168,7 +188,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-                            unsigned migratetype, bool skip_hwpoisoned_pages)
+                            unsigned int migratetype, bool for_memory_offline)
 {
        unsigned long pfn;
        unsigned long undo_pfn;
@@ -181,11 +201,19 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
             pfn < end_pfn;
             pfn += pageblock_nr_pages) {
                page = __first_valid_page(pfn, pageblock_nr_pages);
-               if (page &&
-                   set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
-                       undo_pfn = pfn;
-                       goto undo;
+               if (!page)
+                       continue;
+
+               if (for_memory_offline) {
+                       if (!isolate_pageblock_for_offline(page))
+                               continue;
+               } else {
+                       if (!isolate_pageblock(page, migratetype))
+                               continue;
                }
+
+               undo_pfn = pfn;
+               goto undo;
        }
        return 0;
 undo:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-24  4:44                                           ` Joonsoo Kim
  0 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-24  4:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Mon, Oct 23, 2017 at 10:10:09AM +0200, Michal Hocko wrote:
> On Mon 23-10-17 14:23:09, Joonsoo Kim wrote:
> > On Fri, Oct 20, 2017 at 09:02:20AM +0200, Michal Hocko wrote:
> > > On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> > > > On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > > > > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > > > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > > > > [...]
> > > > > > > > > > > Hello,
> > > > > > > > > > > 
> > > > > > > > > > > This patch will break the CMA user. As you mentioned, CMA allocation
> > > > > > > > > > > itself isn't migrateable. So, after a single page is allocated through
> > > > > > > > > > > CMA allocation, has_unmovable_pages() will return true for this
> > > > > > > > > > > pageblock. Then, futher CMA allocation request to this pageblock will
> > > > > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > > > > 
> > > > > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > > > > has_unmovable_pages to return false here even though the memory is not
> > > > > > > > > > movable? This sounds really strange to me and kind of abuse of this
> > > > > > > > > 
> > > > > > > > > Your understanding is correct. Perhaps, abuse or wrong function name.
> > > > > > > > >
> > > > > > > > > > function. Which path is that? Can we do the migrate type test theres?
> > > > > > > > > 
> > > > > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > > > > 
> > > > > > > > I see. It seems that the CMA and memory hotplug have a very different
> > > > > > > > view on what should happen during isolation.
> > > > > > > >  
> > > > > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() and change
> > > > > > > > > it to check migrate type rather than has_unmovable_pages() if 'XXX' is
> > > > > > > > > specified.
> > > > > > > > 
> > > > > > > > Can we use the migratetype argument and do the special thing for
> > > > > > > > MIGRATE_CMA? Like the following diff?
> > > > > > > 
> > > > > > > And with the full changelog.
> > > > > > > ---
> > > > > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> > > > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > > > > >  has_unmovable_pages
> > > > > > > 
> > > > > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > > > > has_unmovable_pages" would break CMA allocator because it relies on
> > > > > > > has_unmovable_pages returning false even for CMA pageblocks which in
> > > > > > > fact don't have to be movable:
> > > > > > > alloc_contig_range
> > > > > > >   start_isolate_page_range
> > > > > > >     set_migratetype_isolate
> > > > > > >       has_unmovable_pages
> > > > > > > 
> > > > > > > This is a result of the code sharing between CMA and memory hotplug
> > > > > > > while each one has a different idea of what has_unmovable_pages should
> > > > > > > return. This is unfortunate but fixing it properly would require a lot
> > > > > > > of code duplication.
> > > > > > > 
> > > > > > > Fix the issue by introducing the requested migrate type argument
> > > > > > > and special case MIGRATE_CMA case where CMA page blocks are handled
> > > > > > > properly. This will work for memory hotplug because it requires
> > > > > > > MIGRATE_MOVABLE.
> > > > > > 
> > > > > > Unfortunately, alloc_contig_range() can be called with
> > > > > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > > > 
> > > > > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > > > > understanding is that only CMA allocator really depends on this weird
> > > > > semantic and that does MIGRATE_CMA unconditionally.
> > > > 
> > > > alloc_contig_range() could be called for partial pages in the
> > > > pageblock. With your patch, this case also fails unnecessarilly if the
> > > > other pages in the pageblock is pinned.
> > > 
> > > Is this really the case for GB pages? Do we really want to mess those
> > 
> > No, but, as I mentioned already, this API can be called with less
> > pages. I know that there is no user with less pages at this moment but
> > I cannot see any point to reduce this API's capability.
> 
> I am still confused. So when exactly would you want to use this api for
> MIGRATE_MOVABLE and use a partial MIGRATE_CMA pageblock?
> 
> > > with CMA blocks and make those blocks basically unusable because GB
> > > pages are rarely (if at all migrateable)?
> > > 
> > > > Until now, there is no user calling alloc_contig_range() with partial
> > > > pages except CMA allocator but API could support it.
> > > 
> > > I disagree. If this is a CMA thing it should stay that way. The semantic
> > > is quite confusing already, please let's not make it even worse.
> > 
> > It is already used by other component.
> > 
> > I'm not sure what is the confusing semantic you mentioned. I think
> > that set_migratetype_isolate() has confusing semantic and should be
> > fixed since making the pageblock isolated doesn't need to check if
> > there is unmovable page or not. Do you think that
> > set_migratetype_isolate() need to check it? If so, why?
> 
> My intuitive understanding of set_migratetype_isolate is that it either
> suceeds and that means that the given pfn range can be isolated for the
> given type of allocation (be it movable or cma). No new pages will be
> allocated from this range to allow converging into a free range in a
> finit amount of time. At least this is how the hotplug code would like
> to use it and I suppose that the alloc_contig_range would like to
> guarantee the same to not rely on a fixed amount of migration attempts.

Yes, alloc_contig_range() also want to guarantee the similar thing.
Major difference between them is 'given pfn range'. memory hotplug
works by pageblock unit but alloc_contig_range() doesn't.
alloc_contig_range() works by the page unit. However, there is no easy
way to isolate individual page so it uses pageblock isolation
regardless of 'given pfn range'. In this case, checking movability of
all pages on the pageblock would cause the problem as I mentioned
before.

> 
> > > > > > I did a more thinking and found that it's strange to check if there is
> > > > > > unmovable page in the pageblock during the set_migratetype_isolate().
> > > > > > set_migratetype_isolate() should be just for setting the migratetype
> > > > > > of the pageblock. Checking other things should be done by another
> > > > > > place, for example, before calling the start_isolate_page_range() in
> > > > > > __offline_pages().
> > > > > 
> > > > > How do we guarantee the atomicity?
> > > > 
> > > > What atomicity do you mean?
> > > 
> > > Currently we are checking and isolating pages under zone lock. If we
> > > split that we are losing atomicity, aren't we.
> > 
> > I think that it can be done easily.
> > 
> > set_migratetype_isolate() {
> >         lock
> >         __set_migratetype_isolate();
> >         unlock
> > }
> > 
> > set_migratetype_isolate_if_no_unmovable_pages() {
> >         lock
> >         if (has_unmovable_pages())
> >                 fail
> >         else
> >                 __set_migratetype_isolate()
> >         unlock
> > }
> 
> So you are essentially suggesting to split the API for
> alloc_contig_range and hotplug users? Care to send a patch? It is not
> like I would really love this but I would really like to have this issue
> addressed because I really do want all other patches which depend on
> this to be merged in the next release cycle.
> 
> That being said, I would much rather see MIGRATE_CMA case special cased
> than duplicate the already confusing API but I will not insist of
> course.

Okay. I atteach the patch. Andrew, could you revert Michal's series
and apply this patch first? Perhaps, Michal will resend his series on
top of this one.

Thanks.


--------------->8-------------------

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-24  4:44                                           ` Joonsoo Kim
@ 2017-10-24  7:44                                             ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-24  7:44 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Tue 24-10-17 13:44:23, Joonsoo Kim wrote:
> On Mon, Oct 23, 2017 at 10:10:09AM +0200, Michal Hocko wrote:
[...]
> > My intuitive understanding of set_migratetype_isolate is that it either
> > suceeds and that means that the given pfn range can be isolated for the
> > given type of allocation (be it movable or cma). No new pages will be
> > allocated from this range to allow converging into a free range in a
> > finit amount of time. At least this is how the hotplug code would like
> > to use it and I suppose that the alloc_contig_range would like to
> > guarantee the same to not rely on a fixed amount of migration attempts.
> 
> Yes, alloc_contig_range() also want to guarantee the similar thing.
> Major difference between them is 'given pfn range'. memory hotplug
> works by pageblock unit but alloc_contig_range() doesn't.
> alloc_contig_range() works by the page unit. However, there is no easy
> way to isolate individual page so it uses pageblock isolation
> regardless of 'given pfn range'.

I am still confused. So when is it safe to isolate a page from the CMA
pageblock for something that is not a CMA allocation request? Don't we
lose a CMA guanratee that way? 

[...]
> > That being said, I would much rather see MIGRATE_CMA case special cased
> > than duplicate the already confusing API but I will not insist of
> > course.
> 
> Okay. I atteach the patch. Andrew, could you revert Michal's series
> and apply this patch first? Perhaps, Michal will resend his series on
> top of this one.

I am not convinced about this approach but I will not argue about the
patch though. If this is seen as a right way forward, I will rebase
my patches on top.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-24  7:44                                             ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-24  7:44 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Tue 24-10-17 13:44:23, Joonsoo Kim wrote:
> On Mon, Oct 23, 2017 at 10:10:09AM +0200, Michal Hocko wrote:
[...]
> > My intuitive understanding of set_migratetype_isolate is that it either
> > suceeds and that means that the given pfn range can be isolated for the
> > given type of allocation (be it movable or cma). No new pages will be
> > allocated from this range to allow converging into a free range in a
> > finit amount of time. At least this is how the hotplug code would like
> > to use it and I suppose that the alloc_contig_range would like to
> > guarantee the same to not rely on a fixed amount of migration attempts.
> 
> Yes, alloc_contig_range() also want to guarantee the similar thing.
> Major difference between them is 'given pfn range'. memory hotplug
> works by pageblock unit but alloc_contig_range() doesn't.
> alloc_contig_range() works by the page unit. However, there is no easy
> way to isolate individual page so it uses pageblock isolation
> regardless of 'given pfn range'.

I am still confused. So when is it safe to isolate a page from the CMA
pageblock for something that is not a CMA allocation request? Don't we
lose a CMA guanratee that way? 

[...]
> > That being said, I would much rather see MIGRATE_CMA case special cased
> > than duplicate the already confusing API but I will not insist of
> > course.
> 
> Okay. I atteach the patch. Andrew, could you revert Michal's series
> and apply this patch first? Perhaps, Michal will resend his series on
> top of this one.

I am not convinced about this approach but I will not argue about the
patch though. If this is seen as a right way forward, I will rebase
my patches on top.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-24  4:44                                           ` Joonsoo Kim
@ 2017-10-24  8:12                                             ` Vlastimil Babka
  -1 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-24  8:12 UTC (permalink / raw)
  To: Joonsoo Kim, Michal Hocko
  Cc: linux-mm, Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Igor Mammedov,
	Vitaly Kuznetsov, LKML

On 10/24/2017 06:44 AM, Joonsoo Kim wrote:
>>> I'm not sure what is the confusing semantic you mentioned. I think
>>> that set_migratetype_isolate() has confusing semantic and should be
>>> fixed since making the pageblock isolated doesn't need to check if
>>> there is unmovable page or not. Do you think that
>>> set_migratetype_isolate() need to check it? If so, why?
>>
>> My intuitive understanding of set_migratetype_isolate is that it either
>> suceeds and that means that the given pfn range can be isolated for the
>> given type of allocation (be it movable or cma). No new pages will be
>> allocated from this range to allow converging into a free range in a
>> finit amount of time. At least this is how the hotplug code would like
>> to use it and I suppose that the alloc_contig_range would like to
>> guarantee the same to not rely on a fixed amount of migration attempts.
> 
> Yes, alloc_contig_range() also want to guarantee the similar thing.
> Major difference between them is 'given pfn range'. memory hotplug
> works by pageblock unit but alloc_contig_range() doesn't.
> alloc_contig_range() works by the page unit. However, there is no easy
> way to isolate individual page so it uses pageblock isolation
> regardless of 'given pfn range'. In this case, checking movability of
> all pages on the pageblock would cause the problem as I mentioned
> before.

I couldn't look too closely yet, but do I understand correctly that the
*potential* problem (because as you say there are no such
alloc_contig_range callers) you are describing is not newly introduced
by Michal's series? Then his patch fixing the introduced regression
should be enough for now, and further improvements could be posted on
top, and not vice versa? Please don't take it wrong, I agree the current
state is a bit of a mess and improvements are welcome. Also it seems to
me that Michal is right, and there's nothing preventing
alloc_contig_range() to allocate from CMA pageblocks for non-CMA
purposes (likely not movable), and that should be also fixed?

Vlastimil

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-24  8:12                                             ` Vlastimil Babka
  0 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-24  8:12 UTC (permalink / raw)
  To: Joonsoo Kim, Michal Hocko
  Cc: linux-mm, Michael Ellerman, Andrew Morton, KAMEZAWA Hiroyuki,
	Reza Arbab, Yasuaki Ishimatsu, qiuxishi, Igor Mammedov,
	Vitaly Kuznetsov, LKML

On 10/24/2017 06:44 AM, Joonsoo Kim wrote:
>>> I'm not sure what is the confusing semantic you mentioned. I think
>>> that set_migratetype_isolate() has confusing semantic and should be
>>> fixed since making the pageblock isolated doesn't need to check if
>>> there is unmovable page or not. Do you think that
>>> set_migratetype_isolate() need to check it? If so, why?
>>
>> My intuitive understanding of set_migratetype_isolate is that it either
>> suceeds and that means that the given pfn range can be isolated for the
>> given type of allocation (be it movable or cma). No new pages will be
>> allocated from this range to allow converging into a free range in a
>> finit amount of time. At least this is how the hotplug code would like
>> to use it and I suppose that the alloc_contig_range would like to
>> guarantee the same to not rely on a fixed amount of migration attempts.
> 
> Yes, alloc_contig_range() also want to guarantee the similar thing.
> Major difference between them is 'given pfn range'. memory hotplug
> works by pageblock unit but alloc_contig_range() doesn't.
> alloc_contig_range() works by the page unit. However, there is no easy
> way to isolate individual page so it uses pageblock isolation
> regardless of 'given pfn range'. In this case, checking movability of
> all pages on the pageblock would cause the problem as I mentioned
> before.

I couldn't look too closely yet, but do I understand correctly that the
*potential* problem (because as you say there are no such
alloc_contig_range callers) you are describing is not newly introduced
by Michal's series? Then his patch fixing the introduced regression
should be enough for now, and further improvements could be posted on
top, and not vice versa? Please don't take it wrong, I agree the current
state is a bit of a mess and improvements are welcome. Also it seems to
me that Michal is right, and there's nothing preventing
alloc_contig_range() to allocate from CMA pageblocks for non-CMA
purposes (likely not movable), and that should be also fixed?

Vlastimil

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-24  8:12                                             ` Vlastimil Babka
@ 2017-10-24 12:25                                               ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-24 12:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, linux-mm, Michael Ellerman, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Tue 24-10-17 10:12:58, Vlastimil Babka wrote:
> On 10/24/2017 06:44 AM, Joonsoo Kim wrote:
> >>> I'm not sure what is the confusing semantic you mentioned. I think
> >>> that set_migratetype_isolate() has confusing semantic and should be
> >>> fixed since making the pageblock isolated doesn't need to check if
> >>> there is unmovable page or not. Do you think that
> >>> set_migratetype_isolate() need to check it? If so, why?
> >>
> >> My intuitive understanding of set_migratetype_isolate is that it either
> >> suceeds and that means that the given pfn range can be isolated for the
> >> given type of allocation (be it movable or cma). No new pages will be
> >> allocated from this range to allow converging into a free range in a
> >> finit amount of time. At least this is how the hotplug code would like
> >> to use it and I suppose that the alloc_contig_range would like to
> >> guarantee the same to not rely on a fixed amount of migration attempts.
> > 
> > Yes, alloc_contig_range() also want to guarantee the similar thing.
> > Major difference between them is 'given pfn range'. memory hotplug
> > works by pageblock unit but alloc_contig_range() doesn't.
> > alloc_contig_range() works by the page unit. However, there is no easy
> > way to isolate individual page so it uses pageblock isolation
> > regardless of 'given pfn range'. In this case, checking movability of
> > all pages on the pageblock would cause the problem as I mentioned
> > before.
> 
> I couldn't look too closely yet, but do I understand correctly that the
> *potential* problem (because as you say there are no such
> alloc_contig_range callers) you are describing is not newly introduced
> by Michal's series? Then his patch fixing the introduced regression
> should be enough for now, and further improvements could be posted on
> top, and not vice versa? Please don't take it wrong, I agree the current
> state is a bit of a mess and improvements are welcome. Also it seems to
> me that Michal is right, and there's nothing preventing
> alloc_contig_range() to allocate from CMA pageblocks for non-CMA
> purposes (likely not movable), and that should be also fixed?

OK, it seems I understand Joonsoo's concern more now. And I agree with
Vlastimil, that it is better to plug the immediate regression with a
minimal patch and discuss general improvements of the pfn based
allocator separatelly. There are more things to clear up there,
including the proper API (alloc_contig_range is just too low level for
anybody to use) as well as the MIGRATE_* flags usage (e.g. I am not
really sure GB pages usage of MIGRATE_MOVABLE is really correct).
alloc_contig_range looks like an internal CMA function which has been
(ab)used for a different purpose to me rather than a well thought
through interface. MAP_CONTIG discussion has shown some interest in
an API for large allocations so I _believe_ we should think that through
befire we grow more unexpected users.

I am definitely willing to help there.

Is that something you would agree with Joonsoo?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-24 12:25                                               ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-24 12:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, linux-mm, Michael Ellerman, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Tue 24-10-17 10:12:58, Vlastimil Babka wrote:
> On 10/24/2017 06:44 AM, Joonsoo Kim wrote:
> >>> I'm not sure what is the confusing semantic you mentioned. I think
> >>> that set_migratetype_isolate() has confusing semantic and should be
> >>> fixed since making the pageblock isolated doesn't need to check if
> >>> there is unmovable page or not. Do you think that
> >>> set_migratetype_isolate() need to check it? If so, why?
> >>
> >> My intuitive understanding of set_migratetype_isolate is that it either
> >> suceeds and that means that the given pfn range can be isolated for the
> >> given type of allocation (be it movable or cma). No new pages will be
> >> allocated from this range to allow converging into a free range in a
> >> finit amount of time. At least this is how the hotplug code would like
> >> to use it and I suppose that the alloc_contig_range would like to
> >> guarantee the same to not rely on a fixed amount of migration attempts.
> > 
> > Yes, alloc_contig_range() also want to guarantee the similar thing.
> > Major difference between them is 'given pfn range'. memory hotplug
> > works by pageblock unit but alloc_contig_range() doesn't.
> > alloc_contig_range() works by the page unit. However, there is no easy
> > way to isolate individual page so it uses pageblock isolation
> > regardless of 'given pfn range'. In this case, checking movability of
> > all pages on the pageblock would cause the problem as I mentioned
> > before.
> 
> I couldn't look too closely yet, but do I understand correctly that the
> *potential* problem (because as you say there are no such
> alloc_contig_range callers) you are describing is not newly introduced
> by Michal's series? Then his patch fixing the introduced regression
> should be enough for now, and further improvements could be posted on
> top, and not vice versa? Please don't take it wrong, I agree the current
> state is a bit of a mess and improvements are welcome. Also it seems to
> me that Michal is right, and there's nothing preventing
> alloc_contig_range() to allocate from CMA pageblocks for non-CMA
> purposes (likely not movable), and that should be also fixed?

OK, it seems I understand Joonsoo's concern more now. And I agree with
Vlastimil, that it is better to plug the immediate regression with a
minimal patch and discuss general improvements of the pfn based
allocator separatelly. There are more things to clear up there,
including the proper API (alloc_contig_range is just too low level for
anybody to use) as well as the MIGRATE_* flags usage (e.g. I am not
really sure GB pages usage of MIGRATE_MOVABLE is really correct).
alloc_contig_range looks like an internal CMA function which has been
(ab)used for a different purpose to me rather than a well thought
through interface. MAP_CONTIG discussion has shown some interest in
an API for large allocations so I _believe_ we should think that through
befire we grow more unexpected users.

I am definitely willing to help there.

Is that something you would agree with Joonsoo?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-24  8:12                                             ` Vlastimil Babka
@ 2017-10-26  2:47                                               ` Joonsoo Kim
  -1 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-26  2:47 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michal Hocko, linux-mm, Michael Ellerman, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Tue, Oct 24, 2017 at 10:12:58AM +0200, Vlastimil Babka wrote:
> On 10/24/2017 06:44 AM, Joonsoo Kim wrote:
> >>> I'm not sure what is the confusing semantic you mentioned. I think
> >>> that set_migratetype_isolate() has confusing semantic and should be
> >>> fixed since making the pageblock isolated doesn't need to check if
> >>> there is unmovable page or not. Do you think that
> >>> set_migratetype_isolate() need to check it? If so, why?
> >>
> >> My intuitive understanding of set_migratetype_isolate is that it either
> >> suceeds and that means that the given pfn range can be isolated for the
> >> given type of allocation (be it movable or cma). No new pages will be
> >> allocated from this range to allow converging into a free range in a
> >> finit amount of time. At least this is how the hotplug code would like
> >> to use it and I suppose that the alloc_contig_range would like to
> >> guarantee the same to not rely on a fixed amount of migration attempts.
> > 
> > Yes, alloc_contig_range() also want to guarantee the similar thing.
> > Major difference between them is 'given pfn range'. memory hotplug
> > works by pageblock unit but alloc_contig_range() doesn't.
> > alloc_contig_range() works by the page unit. However, there is no easy
> > way to isolate individual page so it uses pageblock isolation
> > regardless of 'given pfn range'. In this case, checking movability of
> > all pages on the pageblock would cause the problem as I mentioned
> > before.
> 
> I couldn't look too closely yet, but do I understand correctly that the
> *potential* problem (because as you say there are no such
> alloc_contig_range callers) you are describing is not newly introduced
> by Michal's series? Then his patch fixing the introduced regression

This potential problem exists there before Michal's series if the
migratetype of the target pageblock isn't MIGRATE_MOVABLE or MIGRATE_CMA.
However, his series enlarges this potential problem surface. It
would be the problem now even if the migratetype of the target
pageblock is MIGRATE_MOVABLE.

> should be enough for now, and further improvements could be posted on
> top, and not vice versa? Please don't take it wrong, I agree the current
> state is a bit of a mess and improvements are welcome. Also it seems to

I'm not very sensitive that which patch is applied first. I can do
rebase. But, IMHO, correct applying order is my patch first and then
Michal's series.

Anyway, Michal, feel free to do what you think correct.

> me that Michal is right, and there's nothing preventing
> alloc_contig_range() to allocate from CMA pageblocks for non-CMA
> purposes (likely not movable), and that should be also fixed?

I noticed the problem you mentioned now and, yes, it should be fixed.
My patch will naturally fixes this issue, too.

Thanks.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-26  2:47                                               ` Joonsoo Kim
  0 siblings, 0 replies; 102+ messages in thread
From: Joonsoo Kim @ 2017-10-26  2:47 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michal Hocko, linux-mm, Michael Ellerman, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Tue, Oct 24, 2017 at 10:12:58AM +0200, Vlastimil Babka wrote:
> On 10/24/2017 06:44 AM, Joonsoo Kim wrote:
> >>> I'm not sure what is the confusing semantic you mentioned. I think
> >>> that set_migratetype_isolate() has confusing semantic and should be
> >>> fixed since making the pageblock isolated doesn't need to check if
> >>> there is unmovable page or not. Do you think that
> >>> set_migratetype_isolate() need to check it? If so, why?
> >>
> >> My intuitive understanding of set_migratetype_isolate is that it either
> >> suceeds and that means that the given pfn range can be isolated for the
> >> given type of allocation (be it movable or cma). No new pages will be
> >> allocated from this range to allow converging into a free range in a
> >> finit amount of time. At least this is how the hotplug code would like
> >> to use it and I suppose that the alloc_contig_range would like to
> >> guarantee the same to not rely on a fixed amount of migration attempts.
> > 
> > Yes, alloc_contig_range() also want to guarantee the similar thing.
> > Major difference between them is 'given pfn range'. memory hotplug
> > works by pageblock unit but alloc_contig_range() doesn't.
> > alloc_contig_range() works by the page unit. However, there is no easy
> > way to isolate individual page so it uses pageblock isolation
> > regardless of 'given pfn range'. In this case, checking movability of
> > all pages on the pageblock would cause the problem as I mentioned
> > before.
> 
> I couldn't look too closely yet, but do I understand correctly that the
> *potential* problem (because as you say there are no such
> alloc_contig_range callers) you are describing is not newly introduced
> by Michal's series? Then his patch fixing the introduced regression

This potential problem exists there before Michal's series if the
migratetype of the target pageblock isn't MIGRATE_MOVABLE or MIGRATE_CMA.
However, his series enlarges this potential problem surface. It
would be the problem now even if the migratetype of the target
pageblock is MIGRATE_MOVABLE.

> should be enough for now, and further improvements could be posted on
> top, and not vice versa? Please don't take it wrong, I agree the current
> state is a bit of a mess and improvements are welcome. Also it seems to

I'm not very sensitive that which patch is applied first. I can do
rebase. But, IMHO, correct applying order is my patch first and then
Michal's series.

Anyway, Michal, feel free to do what you think correct.

> me that Michal is right, and there's nothing preventing
> alloc_contig_range() to allocate from CMA pageblocks for non-CMA
> purposes (likely not movable), and that should be also fixed?

I noticed the problem you mentioned now and, yes, it should be fixed.
My patch will naturally fixes this issue, too.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-26  2:47                                               ` Joonsoo Kim
@ 2017-10-26  7:41                                                 ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-26  7:41 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Vlastimil Babka, linux-mm, Michael Ellerman, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 26-10-17 11:47:07, Joonsoo Kim wrote:
> On Tue, Oct 24, 2017 at 10:12:58AM +0200, Vlastimil Babka wrote:
> > On 10/24/2017 06:44 AM, Joonsoo Kim wrote:
> > >>> I'm not sure what is the confusing semantic you mentioned. I think
> > >>> that set_migratetype_isolate() has confusing semantic and should be
> > >>> fixed since making the pageblock isolated doesn't need to check if
> > >>> there is unmovable page or not. Do you think that
> > >>> set_migratetype_isolate() need to check it? If so, why?
> > >>
> > >> My intuitive understanding of set_migratetype_isolate is that it either
> > >> suceeds and that means that the given pfn range can be isolated for the
> > >> given type of allocation (be it movable or cma). No new pages will be
> > >> allocated from this range to allow converging into a free range in a
> > >> finit amount of time. At least this is how the hotplug code would like
> > >> to use it and I suppose that the alloc_contig_range would like to
> > >> guarantee the same to not rely on a fixed amount of migration attempts.
> > > 
> > > Yes, alloc_contig_range() also want to guarantee the similar thing.
> > > Major difference between them is 'given pfn range'. memory hotplug
> > > works by pageblock unit but alloc_contig_range() doesn't.
> > > alloc_contig_range() works by the page unit. However, there is no easy
> > > way to isolate individual page so it uses pageblock isolation
> > > regardless of 'given pfn range'. In this case, checking movability of
> > > all pages on the pageblock would cause the problem as I mentioned
> > > before.
> > 
> > I couldn't look too closely yet, but do I understand correctly that the
> > *potential* problem (because as you say there are no such
> > alloc_contig_range callers) you are describing is not newly introduced
> > by Michal's series? Then his patch fixing the introduced regression
> 
> This potential problem exists there before Michal's series if the
> migratetype of the target pageblock isn't MIGRATE_MOVABLE or MIGRATE_CMA.
> However, his series enlarges this potential problem surface. It
> would be the problem now even if the migratetype of the target
> pageblock is MIGRATE_MOVABLE.
> 
> > should be enough for now, and further improvements could be posted on
> > top, and not vice versa? Please don't take it wrong, I agree the current
> > state is a bit of a mess and improvements are welcome. Also it seems to
> 
> I'm not very sensitive that which patch is applied first. I can do
> rebase. But, IMHO, correct applying order is my patch first and then
> Michal's series.
> 
> Anyway, Michal, feel free to do what you think correct.

If you do not mind I would rather go with the simple patch first and
then build on top of that. For two reasons 1) it documents the CMA
requirement and 2) there do not seem to be any real users affected by
the issue you are seeing right now. And 3) I really believe
alloc_contig_range needs a deeper thought to be usable in more general
contexts.

> > me that Michal is right, and there's nothing preventing
> > alloc_contig_range() to allocate from CMA pageblocks for non-CMA
> > purposes (likely not movable), and that should be also fixed?
> 
> I noticed the problem you mentioned now and, yes, it should be fixed.
> My patch will naturally fixes this issue, too.

I really do not see how.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-26  7:41                                                 ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-26  7:41 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Vlastimil Babka, linux-mm, Michael Ellerman, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 26-10-17 11:47:07, Joonsoo Kim wrote:
> On Tue, Oct 24, 2017 at 10:12:58AM +0200, Vlastimil Babka wrote:
> > On 10/24/2017 06:44 AM, Joonsoo Kim wrote:
> > >>> I'm not sure what is the confusing semantic you mentioned. I think
> > >>> that set_migratetype_isolate() has confusing semantic and should be
> > >>> fixed since making the pageblock isolated doesn't need to check if
> > >>> there is unmovable page or not. Do you think that
> > >>> set_migratetype_isolate() need to check it? If so, why?
> > >>
> > >> My intuitive understanding of set_migratetype_isolate is that it either
> > >> suceeds and that means that the given pfn range can be isolated for the
> > >> given type of allocation (be it movable or cma). No new pages will be
> > >> allocated from this range to allow converging into a free range in a
> > >> finit amount of time. At least this is how the hotplug code would like
> > >> to use it and I suppose that the alloc_contig_range would like to
> > >> guarantee the same to not rely on a fixed amount of migration attempts.
> > > 
> > > Yes, alloc_contig_range() also want to guarantee the similar thing.
> > > Major difference between them is 'given pfn range'. memory hotplug
> > > works by pageblock unit but alloc_contig_range() doesn't.
> > > alloc_contig_range() works by the page unit. However, there is no easy
> > > way to isolate individual page so it uses pageblock isolation
> > > regardless of 'given pfn range'. In this case, checking movability of
> > > all pages on the pageblock would cause the problem as I mentioned
> > > before.
> > 
> > I couldn't look too closely yet, but do I understand correctly that the
> > *potential* problem (because as you say there are no such
> > alloc_contig_range callers) you are describing is not newly introduced
> > by Michal's series? Then his patch fixing the introduced regression
> 
> This potential problem exists there before Michal's series if the
> migratetype of the target pageblock isn't MIGRATE_MOVABLE or MIGRATE_CMA.
> However, his series enlarges this potential problem surface. It
> would be the problem now even if the migratetype of the target
> pageblock is MIGRATE_MOVABLE.
> 
> > should be enough for now, and further improvements could be posted on
> > top, and not vice versa? Please don't take it wrong, I agree the current
> > state is a bit of a mess and improvements are welcome. Also it seems to
> 
> I'm not very sensitive that which patch is applied first. I can do
> rebase. But, IMHO, correct applying order is my patch first and then
> Michal's series.
> 
> Anyway, Michal, feel free to do what you think correct.

If you do not mind I would rather go with the simple patch first and
then build on top of that. For two reasons 1) it documents the CMA
requirement and 2) there do not seem to be any real users affected by
the issue you are seeing right now. And 3) I really believe
alloc_contig_range needs a deeper thought to be usable in more general
contexts.

> > me that Michal is right, and there's nothing preventing
> > alloc_contig_range() to allocate from CMA pageblocks for non-CMA
> > purposes (likely not movable), and that should be also fixed?
> 
> I noticed the problem you mentioned now and, yes, it should be fixed.
> My patch will naturally fixes this issue, too.

I really do not see how.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-19 12:21                             ` Michal Hocko
@ 2017-10-26 13:04                               ` Vlastimil Babka
  -1 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-26 13:04 UTC (permalink / raw)
  To: Michal Hocko, Joonsoo Kim, Andrew Morton
  Cc: linux-mm, Michael Ellerman, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML

On 10/19/2017 02:21 PM, Michal Hocko wrote:
> On Thu 19-10-17 10:20:41, Michal Hocko wrote:
>> On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
>>> On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
>>>> On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
>> [...]
>>>>> Hello,
>>>>>
>>>>> This patch will break the CMA user. As you mentioned, CMA allocation
>>>>> itself isn't migrateable. So, after a single page is allocated through
>>>>> CMA allocation, has_unmovable_pages() will return true for this
>>>>> pageblock. Then, futher CMA allocation request to this pageblock will
>>>>> fail because it requires isolating the pageblock.
>>>>
>>>> Hmm, does this mean that the CMA allocation path depends on
>>>> has_unmovable_pages to return false here even though the memory is not
>>>> movable? This sounds really strange to me and kind of abuse of this
>>>
>>> Your understanding is correct. Perhaps, abuse or wrong function name.
>>>
>>>> function. Which path is that? Can we do the migrate type test theres?
>>>
>>> alloc_contig_range() -> start_isolate_page_range() ->
>>> set_migratetype_isolate() -> has_unmovable_pages()
>>
>> I see. It seems that the CMA and memory hotplug have a very different
>> view on what should happen during isolation.
>>  
>>> We can add one argument, 'XXX' to set_migratetype_isolate() and change
>>> it to check migrate type rather than has_unmovable_pages() if 'XXX' is
>>> specified.
>>
>> Can we use the migratetype argument and do the special thing for
>> MIGRATE_CMA? Like the following diff?
> 
> And with the full changelog.
> ---
> From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 19 Oct 2017 14:14:02 +0200
> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
>  has_unmovable_pages
> 
> Joonsoo has noticed that "mm: drop migrate type checks from
> has_unmovable_pages" would break CMA allocator because it relies on
> has_unmovable_pages returning false even for CMA pageblocks which in
> fact don't have to be movable:
> alloc_contig_range
>   start_isolate_page_range
>     set_migratetype_isolate
>       has_unmovable_pages
> 
> This is a result of the code sharing between CMA and memory hotplug
> while each one has a different idea of what has_unmovable_pages should
> return. This is unfortunate but fixing it properly would require a lot
> of code duplication.
> 
> Fix the issue by introducing the requested migrate type argument
> and special case MIGRATE_CMA case where CMA page blocks are handled
> properly. This will work for memory hotplug because it requires
> MIGRATE_MOVABLE.
> 
> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/page-isolation.h |  2 +-
>  mm/page_alloc.c                | 12 +++++++++++-
>  mm/page_isolation.c            | 10 +++++-----
>  3 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index d4cd2014fa6f..fa9db0c7b54e 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -30,7 +30,7 @@ static inline bool is_migrate_isolate(int migratetype)
>  #endif
>  
>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> -			 bool skip_hwpoisoned_pages);
> +			 int migratetype, bool skip_hwpoisoned_pages);
>  void set_pageblock_migratetype(struct page *page, int migratetype);
>  int move_freepages_block(struct zone *zone, struct page *page,
>  				int migratetype, int *num_movable);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5b4d85ae445c..259aeb22462f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7344,6 +7344,7 @@ void *__init alloc_large_system_hash(const char *tablename,
>   * race condition. So you can't expect this function should be exact.
>   */
>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> +			 int migratetype,
>  			 bool skip_hwpoisoned_pages)
>  {
>  	unsigned long pfn, iter, found;
> @@ -7356,6 +7357,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
>  
> +	/*
> +	 * CMA allocations (alloc_contig_range) really need to mark isolate
> +	 * CMA pageblocks even when they are not movable in fact so consider
> +	 * them movable here.
> +	 */
> +	if (is_migrate_cma(migratetype) &&
> +			is_migrate_cma(get_pageblock_migratetype(page)))
> +		return false;
> +
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
>  		unsigned long check = pfn + iter;
> @@ -7441,7 +7451,7 @@ bool is_pageblock_removable_nolock(struct page *page)
>  	if (!zone_spans_pfn(zone, pfn))
>  		return false;
>  
> -	return !has_unmovable_pages(zone, page, 0, true);
> +	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
>  }
>  
>  #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 757410d9f758..8616f5332c77 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -14,7 +14,7 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/page_isolation.h>
>  
> -static int set_migratetype_isolate(struct page *page,
> +static int set_migratetype_isolate(struct page *page, int migratetype,
>  				bool skip_hwpoisoned_pages)
>  {
>  	struct zone *zone;
> @@ -51,7 +51,7 @@ static int set_migratetype_isolate(struct page *page,
>  	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
>  	 * We just check MOVABLE pages.
>  	 */
> -	if (!has_unmovable_pages(zone, page, arg.pages_found,
> +	if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype,
>  				 skip_hwpoisoned_pages))
>  		ret = 0;
>  
> @@ -63,14 +63,14 @@ static int set_migratetype_isolate(struct page *page,
>  out:
>  	if (!ret) {
>  		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
> +		int mt = get_pageblock_migratetype(page);
>  
>  		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>  		zone->nr_isolate_pageblock++;
>  		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
>  									NULL);
>  
> -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> +		__mod_zone_freepage_state(zone, -nr_pages, mt);
>  	}
>  
>  	spin_unlock_irqrestore(&zone->lock, flags);
> @@ -182,7 +182,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	     pfn += pageblock_nr_pages) {
>  		page = __first_valid_page(pfn, pageblock_nr_pages);
>  		if (page &&
> -		    set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
> +		    set_migratetype_isolate(page, migratetype, skip_hwpoisoned_pages)) {
>  			undo_pfn = pfn;
>  			goto undo;
>  		}
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-26 13:04                               ` Vlastimil Babka
  0 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2017-10-26 13:04 UTC (permalink / raw)
  To: Michal Hocko, Joonsoo Kim, Andrew Morton
  Cc: linux-mm, Michael Ellerman, KAMEZAWA Hiroyuki, Reza Arbab,
	Yasuaki Ishimatsu, qiuxishi, Igor Mammedov, Vitaly Kuznetsov,
	LKML

On 10/19/2017 02:21 PM, Michal Hocko wrote:
> On Thu 19-10-17 10:20:41, Michal Hocko wrote:
>> On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
>>> On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
>>>> On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
>> [...]
>>>>> Hello,
>>>>>
>>>>> This patch will break the CMA user. As you mentioned, CMA allocation
>>>>> itself isn't migrateable. So, after a single page is allocated through
>>>>> CMA allocation, has_unmovable_pages() will return true for this
>>>>> pageblock. Then, futher CMA allocation request to this pageblock will
>>>>> fail because it requires isolating the pageblock.
>>>>
>>>> Hmm, does this mean that the CMA allocation path depends on
>>>> has_unmovable_pages to return false here even though the memory is not
>>>> movable? This sounds really strange to me and kind of abuse of this
>>>
>>> Your understanding is correct. Perhaps, abuse or wrong function name.
>>>
>>>> function. Which path is that? Can we do the migrate type test theres?
>>>
>>> alloc_contig_range() -> start_isolate_page_range() ->
>>> set_migratetype_isolate() -> has_unmovable_pages()
>>
>> I see. It seems that the CMA and memory hotplug have a very different
>> view on what should happen during isolation.
>>  
>>> We can add one argument, 'XXX' to set_migratetype_isolate() and change
>>> it to check migrate type rather than has_unmovable_pages() if 'XXX' is
>>> specified.
>>
>> Can we use the migratetype argument and do the special thing for
>> MIGRATE_CMA? Like the following diff?
> 
> And with the full changelog.
> ---
> From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 19 Oct 2017 14:14:02 +0200
> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
>  has_unmovable_pages
> 
> Joonsoo has noticed that "mm: drop migrate type checks from
> has_unmovable_pages" would break CMA allocator because it relies on
> has_unmovable_pages returning false even for CMA pageblocks which in
> fact don't have to be movable:
> alloc_contig_range
>   start_isolate_page_range
>     set_migratetype_isolate
>       has_unmovable_pages
> 
> This is a result of the code sharing between CMA and memory hotplug
> while each one has a different idea of what has_unmovable_pages should
> return. This is unfortunate but fixing it properly would require a lot
> of code duplication.
> 
> Fix the issue by introducing the requested migrate type argument
> and special case MIGRATE_CMA case where CMA page blocks are handled
> properly. This will work for memory hotplug because it requires
> MIGRATE_MOVABLE.
> 
> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/page-isolation.h |  2 +-
>  mm/page_alloc.c                | 12 +++++++++++-
>  mm/page_isolation.c            | 10 +++++-----
>  3 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index d4cd2014fa6f..fa9db0c7b54e 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -30,7 +30,7 @@ static inline bool is_migrate_isolate(int migratetype)
>  #endif
>  
>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> -			 bool skip_hwpoisoned_pages);
> +			 int migratetype, bool skip_hwpoisoned_pages);
>  void set_pageblock_migratetype(struct page *page, int migratetype);
>  int move_freepages_block(struct zone *zone, struct page *page,
>  				int migratetype, int *num_movable);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5b4d85ae445c..259aeb22462f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7344,6 +7344,7 @@ void *__init alloc_large_system_hash(const char *tablename,
>   * race condition. So you can't expect this function should be exact.
>   */
>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> +			 int migratetype,
>  			 bool skip_hwpoisoned_pages)
>  {
>  	unsigned long pfn, iter, found;
> @@ -7356,6 +7357,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
>  
> +	/*
> +	 * CMA allocations (alloc_contig_range) really need to mark isolate
> +	 * CMA pageblocks even when they are not movable in fact so consider
> +	 * them movable here.
> +	 */
> +	if (is_migrate_cma(migratetype) &&
> +			is_migrate_cma(get_pageblock_migratetype(page)))
> +		return false;
> +
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
>  		unsigned long check = pfn + iter;
> @@ -7441,7 +7451,7 @@ bool is_pageblock_removable_nolock(struct page *page)
>  	if (!zone_spans_pfn(zone, pfn))
>  		return false;
>  
> -	return !has_unmovable_pages(zone, page, 0, true);
> +	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
>  }
>  
>  #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 757410d9f758..8616f5332c77 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -14,7 +14,7 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/page_isolation.h>
>  
> -static int set_migratetype_isolate(struct page *page,
> +static int set_migratetype_isolate(struct page *page, int migratetype,
>  				bool skip_hwpoisoned_pages)
>  {
>  	struct zone *zone;
> @@ -51,7 +51,7 @@ static int set_migratetype_isolate(struct page *page,
>  	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
>  	 * We just check MOVABLE pages.
>  	 */
> -	if (!has_unmovable_pages(zone, page, arg.pages_found,
> +	if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype,
>  				 skip_hwpoisoned_pages))
>  		ret = 0;
>  
> @@ -63,14 +63,14 @@ static int set_migratetype_isolate(struct page *page,
>  out:
>  	if (!ret) {
>  		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
> +		int mt = get_pageblock_migratetype(page);
>  
>  		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>  		zone->nr_isolate_pageblock++;
>  		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
>  									NULL);
>  
> -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> +		__mod_zone_freepage_state(zone, -nr_pages, mt);
>  	}
>  
>  	spin_unlock_irqrestore(&zone->lock, flags);
> @@ -182,7 +182,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	     pfn += pageblock_nr_pages) {
>  		page = __first_valid_page(pfn, pageblock_nr_pages);
>  		if (page &&
> -		    set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
> +		    set_migratetype_isolate(page, migratetype, skip_hwpoisoned_pages)) {
>  			undo_pfn = pfn;
>  			goto undo;
>  		}
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-10-19 12:21                             ` Michal Hocko
@ 2017-10-26 13:59                               ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-26 13:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Joonsoo Kim,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 19-10-17 14:21:18, Michal Hocko wrote:
[...]
> From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 19 Oct 2017 14:14:02 +0200
> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
>  has_unmovable_pages
> 
> Joonsoo has noticed that "mm: drop migrate type checks from
> has_unmovable_pages" would break CMA allocator because it relies on
> has_unmovable_pages returning false even for CMA pageblocks which in
> fact don't have to be movable:
> alloc_contig_range
>   start_isolate_page_range
>     set_migratetype_isolate
>       has_unmovable_pages
> 
> This is a result of the code sharing between CMA and memory hotplug
> while each one has a different idea of what has_unmovable_pages should
> return. This is unfortunate but fixing it properly would require a lot
> of code duplication.
> 
> Fix the issue by introducing the requested migrate type argument
> and special case MIGRATE_CMA case where CMA page blocks are handled
> properly. This will work for memory hotplug because it requires
> MIGRATE_MOVABLE.
> 
> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---

Andrew,
could you add this one to the bundle as well? After
mm-drop-migrate-type-checks-from-has_unmovable_pages.patch, please.

Joonsoo would like to see a larger change in this area [1] but I think
we need to think those much more through [2] and Joonsoo agreed to take
the simpler patch first [3].

Thanks!

[1] http://lkml.kernel.org/r/20171024044423.GA31424@js1304-P5Q-DELUXE
[2] http://lkml.kernel.org/r/20171024122526.3kmabkcbmj4johli@dhcp22.suse.cz
[3] http://lkml.kernel.org/r/20171026024707.GA11791@js1304-P5Q-DELUXE

>  include/linux/page-isolation.h |  2 +-
>  mm/page_alloc.c                | 12 +++++++++++-
>  mm/page_isolation.c            | 10 +++++-----
>  3 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index d4cd2014fa6f..fa9db0c7b54e 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -30,7 +30,7 @@ static inline bool is_migrate_isolate(int migratetype)
>  #endif
>  
>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> -			 bool skip_hwpoisoned_pages);
> +			 int migratetype, bool skip_hwpoisoned_pages);
>  void set_pageblock_migratetype(struct page *page, int migratetype);
>  int move_freepages_block(struct zone *zone, struct page *page,
>  				int migratetype, int *num_movable);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5b4d85ae445c..259aeb22462f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7344,6 +7344,7 @@ void *__init alloc_large_system_hash(const char *tablename,
>   * race condition. So you can't expect this function should be exact.
>   */
>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> +			 int migratetype,
>  			 bool skip_hwpoisoned_pages)
>  {
>  	unsigned long pfn, iter, found;
> @@ -7356,6 +7357,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
>  
> +	/*
> +	 * CMA allocations (alloc_contig_range) really need to mark isolate
> +	 * CMA pageblocks even when they are not movable in fact so consider
> +	 * them movable here.
> +	 */
> +	if (is_migrate_cma(migratetype) &&
> +			is_migrate_cma(get_pageblock_migratetype(page)))
> +		return false;
> +
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
>  		unsigned long check = pfn + iter;
> @@ -7441,7 +7451,7 @@ bool is_pageblock_removable_nolock(struct page *page)
>  	if (!zone_spans_pfn(zone, pfn))
>  		return false;
>  
> -	return !has_unmovable_pages(zone, page, 0, true);
> +	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
>  }
>  
>  #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 757410d9f758..8616f5332c77 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -14,7 +14,7 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/page_isolation.h>
>  
> -static int set_migratetype_isolate(struct page *page,
> +static int set_migratetype_isolate(struct page *page, int migratetype,
>  				bool skip_hwpoisoned_pages)
>  {
>  	struct zone *zone;
> @@ -51,7 +51,7 @@ static int set_migratetype_isolate(struct page *page,
>  	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
>  	 * We just check MOVABLE pages.
>  	 */
> -	if (!has_unmovable_pages(zone, page, arg.pages_found,
> +	if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype,
>  				 skip_hwpoisoned_pages))
>  		ret = 0;
>  
> @@ -63,14 +63,14 @@ static int set_migratetype_isolate(struct page *page,
>  out:
>  	if (!ret) {
>  		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
> +		int mt = get_pageblock_migratetype(page);
>  
>  		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>  		zone->nr_isolate_pageblock++;
>  		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
>  									NULL);
>  
> -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> +		__mod_zone_freepage_state(zone, -nr_pages, mt);
>  	}
>  
>  	spin_unlock_irqrestore(&zone->lock, flags);
> @@ -182,7 +182,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	     pfn += pageblock_nr_pages) {
>  		page = __first_valid_page(pfn, pageblock_nr_pages);
>  		if (page &&
> -		    set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
> +		    set_migratetype_isolate(page, migratetype, skip_hwpoisoned_pages)) {
>  			undo_pfn = pfn;
>  			goto undo;
>  		}
> -- 
> 2.14.2
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-10-26 13:59                               ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-10-26 13:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Joonsoo Kim,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 19-10-17 14:21:18, Michal Hocko wrote:
[...]
> From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 19 Oct 2017 14:14:02 +0200
> Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
>  has_unmovable_pages
> 
> Joonsoo has noticed that "mm: drop migrate type checks from
> has_unmovable_pages" would break CMA allocator because it relies on
> has_unmovable_pages returning false even for CMA pageblocks which in
> fact don't have to be movable:
> alloc_contig_range
>   start_isolate_page_range
>     set_migratetype_isolate
>       has_unmovable_pages
> 
> This is a result of the code sharing between CMA and memory hotplug
> while each one has a different idea of what has_unmovable_pages should
> return. This is unfortunate but fixing it properly would require a lot
> of code duplication.
> 
> Fix the issue by introducing the requested migrate type argument
> and special case MIGRATE_CMA case where CMA page blocks are handled
> properly. This will work for memory hotplug because it requires
> MIGRATE_MOVABLE.
> 
> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---

Andrew,
could you add this one to the bundle as well? After
mm-drop-migrate-type-checks-from-has_unmovable_pages.patch, please.

Joonsoo would like to see a larger change in this area [1] but I think
we need to think those much more through [2] and Joonsoo agreed to take
the simpler patch first [3].

Thanks!

[1] http://lkml.kernel.org/r/20171024044423.GA31424@js1304-P5Q-DELUXE
[2] http://lkml.kernel.org/r/20171024122526.3kmabkcbmj4johli@dhcp22.suse.cz
[3] http://lkml.kernel.org/r/20171026024707.GA11791@js1304-P5Q-DELUXE

>  include/linux/page-isolation.h |  2 +-
>  mm/page_alloc.c                | 12 +++++++++++-
>  mm/page_isolation.c            | 10 +++++-----
>  3 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index d4cd2014fa6f..fa9db0c7b54e 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -30,7 +30,7 @@ static inline bool is_migrate_isolate(int migratetype)
>  #endif
>  
>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> -			 bool skip_hwpoisoned_pages);
> +			 int migratetype, bool skip_hwpoisoned_pages);
>  void set_pageblock_migratetype(struct page *page, int migratetype);
>  int move_freepages_block(struct zone *zone, struct page *page,
>  				int migratetype, int *num_movable);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5b4d85ae445c..259aeb22462f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7344,6 +7344,7 @@ void *__init alloc_large_system_hash(const char *tablename,
>   * race condition. So you can't expect this function should be exact.
>   */
>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> +			 int migratetype,
>  			 bool skip_hwpoisoned_pages)
>  {
>  	unsigned long pfn, iter, found;
> @@ -7356,6 +7357,15 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
>  
> +	/*
> +	 * CMA allocations (alloc_contig_range) really need to mark isolate
> +	 * CMA pageblocks even when they are not movable in fact so consider
> +	 * them movable here.
> +	 */
> +	if (is_migrate_cma(migratetype) &&
> +			is_migrate_cma(get_pageblock_migratetype(page)))
> +		return false;
> +
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
>  		unsigned long check = pfn + iter;
> @@ -7441,7 +7451,7 @@ bool is_pageblock_removable_nolock(struct page *page)
>  	if (!zone_spans_pfn(zone, pfn))
>  		return false;
>  
> -	return !has_unmovable_pages(zone, page, 0, true);
> +	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
>  }
>  
>  #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 757410d9f758..8616f5332c77 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -14,7 +14,7 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/page_isolation.h>
>  
> -static int set_migratetype_isolate(struct page *page,
> +static int set_migratetype_isolate(struct page *page, int migratetype,
>  				bool skip_hwpoisoned_pages)
>  {
>  	struct zone *zone;
> @@ -51,7 +51,7 @@ static int set_migratetype_isolate(struct page *page,
>  	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
>  	 * We just check MOVABLE pages.
>  	 */
> -	if (!has_unmovable_pages(zone, page, arg.pages_found,
> +	if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype,
>  				 skip_hwpoisoned_pages))
>  		ret = 0;
>  
> @@ -63,14 +63,14 @@ static int set_migratetype_isolate(struct page *page,
>  out:
>  	if (!ret) {
>  		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
> +		int mt = get_pageblock_migratetype(page);
>  
>  		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>  		zone->nr_isolate_pageblock++;
>  		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
>  									NULL);
>  
> -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> +		__mod_zone_freepage_state(zone, -nr_pages, mt);
>  	}
>  
>  	spin_unlock_irqrestore(&zone->lock, flags);
> @@ -182,7 +182,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	     pfn += pageblock_nr_pages) {
>  		page = __first_valid_page(pfn, pageblock_nr_pages);
>  		if (page &&
> -		    set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
> +		    set_migratetype_isolate(page, migratetype, skip_hwpoisoned_pages)) {
>  			undo_pfn = pfn;
>  			goto undo;
>  		}
> -- 
> 2.14.2
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-11-14  7:06         ` Michal Hocko
@ 2017-11-14  7:45           ` Ran Wang
  -1 siblings, 0 replies; 102+ messages in thread
From: Ran Wang @ 2017-11-14  7:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

Hi Michal,
> -----Original Message-----
> From: Michal Hocko [mailto:mhocko@kernel.org]
> Sent: Tuesday, November 14, 2017 3:07 PM
> To: Ran Wang <ran.wang_1@nxp.com>
> Cc: linux-mm@kvack.org; Michael Ellerman <mpe@ellerman.id.au>; Vlastimil
> Babka <vbabka@suse.cz>; Andrew Morton <akpm@linux-foundation.org>;
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>; Reza Arbab
> <arbab@linux.vnet.ibm.com>; Yasuaki Ishimatsu <yasu.isimatu@gmail.com>;
> qiuxishi@huawei.com; Igor Mammedov <imammedo@redhat.com>; Vitaly
> Kuznetsov <vkuznets@redhat.com>; LKML <linux-kernel@vger.kernel.org>;
> Leo Li <leoyang.li@nxp.com>; Xiaobo Xie <xiaobo.xie@nxp.com>
> Subject: Re: [PATCH 1/2] mm: drop migrate type checks from
> has_unmovable_pages
> 
> On Tue 14-11-17 06:10:00, Ran Wang wrote:
> [...]
> > > > This drop cause DWC3 USB controller fail on initialization with
> > > > Layerscaper processors (such as LS1043A) as below:
> > > >
> > > > [    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered,
> assigned
> > > bus number 1
> > > > [    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > > [    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > > [    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > > [    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > > [    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > > >
> > > > And I notice that someone also reported to you that DWC2 got
> > > > affected recently, so do you have the solution now?
> > >
> > > Yes. It should be in linux-next. Have a look at the following email
> > > thread:
> > >
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> > >
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> > >
> data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> > >
> a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> > >
> 583607877&sdata=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> > > &reserved=0
> 
> I really have no idea where the above link came from because my email had
> a reference to
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> data=02%7C01%7Cran.wang_1%40nxp.com%7C9b452e62f11e446d12b408d5
> 2b2e4014%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63646239
> 9997608449&sdata=S9MPhGyIUiYCJdVYMh3DAHAEytu%2Fu45BB%2BcMhO%
> 2BP3Qo%3D&reserved=0
> Has your email client modified the original email?
> 
> > Thanks for your info, although I fail to open the link you shared, but
> > I got patch from my colleague and the issue got fix on my side, let you know,
> thanks.
> 
> Thanks for your testing anyway. Can I assume your Tested-by?
Yes, please.

BR
Ran
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-11-14  7:45           ` Ran Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Ran Wang @ 2017-11-14  7:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

Hi Michal,
> -----Original Message-----
> From: Michal Hocko [mailto:mhocko@kernel.org]
> Sent: Tuesday, November 14, 2017 3:07 PM
> To: Ran Wang <ran.wang_1@nxp.com>
> Cc: linux-mm@kvack.org; Michael Ellerman <mpe@ellerman.id.au>; Vlastimil
> Babka <vbabka@suse.cz>; Andrew Morton <akpm@linux-foundation.org>;
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>; Reza Arbab
> <arbab@linux.vnet.ibm.com>; Yasuaki Ishimatsu <yasu.isimatu@gmail.com>;
> qiuxishi@huawei.com; Igor Mammedov <imammedo@redhat.com>; Vitaly
> Kuznetsov <vkuznets@redhat.com>; LKML <linux-kernel@vger.kernel.org>;
> Leo Li <leoyang.li@nxp.com>; Xiaobo Xie <xiaobo.xie@nxp.com>
> Subject: Re: [PATCH 1/2] mm: drop migrate type checks from
> has_unmovable_pages
> 
> On Tue 14-11-17 06:10:00, Ran Wang wrote:
> [...]
> > > > This drop cause DWC3 USB controller fail on initialization with
> > > > Layerscaper processors (such as LS1043A) as below:
> > > >
> > > > [    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered,
> assigned
> > > bus number 1
> > > > [    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > > [    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > > [    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > > [    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > > [    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > > >
> > > > And I notice that someone also reported to you that DWC2 got
> > > > affected recently, so do you have the solution now?
> > >
> > > Yes. It should be in linux-next. Have a look at the following email
> > > thread:
> > >
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> > >
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> > >
> data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> > >
> a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> > >
> 583607877&sdata=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> > > &reserved=0
> 
> I really have no idea where the above link came from because my email had
> a reference to
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> data=02%7C01%7Cran.wang_1%40nxp.com%7C9b452e62f11e446d12b408d5
> 2b2e4014%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63646239
> 9997608449&sdata=S9MPhGyIUiYCJdVYMh3DAHAEytu%2Fu45BB%2BcMhO%
> 2BP3Qo%3D&reserved=0
> Has your email client modified the original email?
> 
> > Thanks for your info, although I fail to open the link you shared, but
> > I got patch from my colleague and the issue got fix on my side, let you know,
> thanks.
> 
> Thanks for your testing anyway. Can I assume your Tested-by?
Yes, please.

BR
Ran
> --
> Michal Hocko
> SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-11-14  6:10       ` Ran Wang
@ 2017-11-14  7:06         ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-11-14  7:06 UTC (permalink / raw)
  To: Ran Wang
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

On Tue 14-11-17 06:10:00, Ran Wang wrote:
[...]
> > > This drop cause DWC3 USB controller fail on initialization with
> > > Layerscaper processors (such as LS1043A) as below:
> > >
> > > [    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned
> > bus number 1
> > > [    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > [    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > [    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > [    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > [    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > >
> > > And I notice that someone also reported to you that DWC2 got affected
> > > recently, so do you have the solution now?
> > 
> > Yes. It should be in linux-next. Have a look at the following email
> > thread:
> > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> > kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> > data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> > a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> > 583607877&sdata=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> > &reserved=0

I really have no idea where the above link came from because my email
had a reference to http://lkml.kernel.org/r/20171104082500.qvzbb2kw4suo6cgy@dhcp22.suse.cz
Has your email client modified the original email?

> Thanks for your info, although I fail to open the link you shared, but I got patch
> from my colleague and the issue got fix on my side, let you know, thanks.

Thanks for your testing anyway. Can I assume your Tested-by?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-11-14  7:06         ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-11-14  7:06 UTC (permalink / raw)
  To: Ran Wang
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

On Tue 14-11-17 06:10:00, Ran Wang wrote:
[...]
> > > This drop cause DWC3 USB controller fail on initialization with
> > > Layerscaper processors (such as LS1043A) as below:
> > >
> > > [    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned
> > bus number 1
> > > [    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > [    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > [    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > [    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > [    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > >
> > > And I notice that someone also reported to you that DWC2 got affected
> > > recently, so do you have the solution now?
> > 
> > Yes. It should be in linux-next. Have a look at the following email
> > thread:
> > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> > kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> > data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> > a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> > 583607877&sdata=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> > &reserved=0

I really have no idea where the above link came from because my email
had a reference to http://lkml.kernel.org/r/20171104082500.qvzbb2kw4suo6cgy@dhcp22.suse.cz
Has your email client modified the original email?

> Thanks for your info, although I fail to open the link you shared, but I got patch
> from my colleague and the issue got fix on my side, let you know, thanks.

Thanks for your testing anyway. Can I assume your Tested-by?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-11-13 11:02     ` Michal Hocko
@ 2017-11-14  6:10       ` Ran Wang
  -1 siblings, 0 replies; 102+ messages in thread
From: Ran Wang @ 2017-11-14  6:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

Hi Michal,

> -----Original Message-----
> From: Michal Hocko [mailto:mhocko@kernel.org]
> Sent: Monday, November 13, 2017 7:03 PM
> To: Ran Wang <ran.wang_1@nxp.com>
> Cc: linux-mm@kvack.org; Michael Ellerman <mpe@ellerman.id.au>; Vlastimil
> Babka <vbabka@suse.cz>; Andrew Morton <akpm@linux-foundation.org>;
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>; Reza Arbab
> <arbab@linux.vnet.ibm.com>; Yasuaki Ishimatsu <yasu.isimatu@gmail.com>;
> qiuxishi@huawei.com; Igor Mammedov <imammedo@redhat.com>; Vitaly
> Kuznetsov <vkuznets@redhat.com>; LKML <linux-kernel@vger.kernel.org>;
> Leo Li <leoyang.li@nxp.com>; Xiaobo Xie <xiaobo.xie@nxp.com>
> Subject: Re: [PATCH 1/2] mm: drop migrate type checks from
> has_unmovable_pages
> 
> On Mon 13-11-17 07:33:13, Ran Wang wrote:
> > Hello Michal,
> >
> > <snip>
> >
> > > Date: Fri, 13 Oct 2017 14:00:12 +0200
> > >
> > > From: Michal Hocko <mhocko@suse.com>
> > >
> > > Michael has noticed that the memory offline tries to migrate kernel
> > > code pages when doing  echo 0 >
> > > /sys/devices/system/memory/memory0/online
> > >
> > > The current implementation will fail the operation after several
> > > failed page migration attempts but we shouldn't even attempt to
> > > migrate that memory and fail right away because this memory is
> > > clearly not migrateable. This will become a real problem when we drop
> the retry loop counter resp. timeout.
> > >
> > > The real problem is in has_unmovable_pages in fact. We should fail
> > > if there are any non migrateable pages in the area. In orther to
> > > guarantee that remove the migrate type checks because
> > > MIGRATE_MOVABLE is not guaranteed to contain only migrateable pages.
> It is merely a heuristic.
> > > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > > allocate any non-migrateable pages from the block but CMA
> > > allocations themselves are unlikely to migrateable. Therefore remove
> both checks.
> > >
> > > Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > Tested-by: Michael Ellerman <mpe@ellerman.id.au>
> > > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > > ---
> > >  mm/page_alloc.c | 3 ---
> > >  1 file changed, 3 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c index
> > > 3badcedf96a7..ad0294ab3e4f 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone,
> > > struct page *page, int count,
> > >  	 */
> > >  	if (zone_idx(zone) == ZONE_MOVABLE)
> > >  		return false;
> > > -	mt = get_pageblock_migratetype(page);
> > > -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> > > -		return false;
> >
> > This drop cause DWC3 USB controller fail on initialization with
> > Layerscaper processors (such as LS1043A) as below:
> >
> > [    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned
> bus number 1
> > [    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > [    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > [    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > [    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > [    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> >
> > And I notice that someone also reported to you that DWC2 got affected
> > recently, so do you have the solution now?
> 
> Yes. It should be in linux-next. Have a look at the following email
> thread:
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> 583607877&sdata=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> &reserved=0

Thanks for your info, although I fail to open the link you shared, but I got patch
from my colleague and the issue got fix on my side, let you know, thanks.

Best Regards,
Ran
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-11-14  6:10       ` Ran Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Ran Wang @ 2017-11-14  6:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

Hi Michal,

> -----Original Message-----
> From: Michal Hocko [mailto:mhocko@kernel.org]
> Sent: Monday, November 13, 2017 7:03 PM
> To: Ran Wang <ran.wang_1@nxp.com>
> Cc: linux-mm@kvack.org; Michael Ellerman <mpe@ellerman.id.au>; Vlastimil
> Babka <vbabka@suse.cz>; Andrew Morton <akpm@linux-foundation.org>;
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>; Reza Arbab
> <arbab@linux.vnet.ibm.com>; Yasuaki Ishimatsu <yasu.isimatu@gmail.com>;
> qiuxishi@huawei.com; Igor Mammedov <imammedo@redhat.com>; Vitaly
> Kuznetsov <vkuznets@redhat.com>; LKML <linux-kernel@vger.kernel.org>;
> Leo Li <leoyang.li@nxp.com>; Xiaobo Xie <xiaobo.xie@nxp.com>
> Subject: Re: [PATCH 1/2] mm: drop migrate type checks from
> has_unmovable_pages
> 
> On Mon 13-11-17 07:33:13, Ran Wang wrote:
> > Hello Michal,
> >
> > <snip>
> >
> > > Date: Fri, 13 Oct 2017 14:00:12 +0200
> > >
> > > From: Michal Hocko <mhocko@suse.com>
> > >
> > > Michael has noticed that the memory offline tries to migrate kernel
> > > code pages when doing  echo 0 >
> > > /sys/devices/system/memory/memory0/online
> > >
> > > The current implementation will fail the operation after several
> > > failed page migration attempts but we shouldn't even attempt to
> > > migrate that memory and fail right away because this memory is
> > > clearly not migrateable. This will become a real problem when we drop
> the retry loop counter resp. timeout.
> > >
> > > The real problem is in has_unmovable_pages in fact. We should fail
> > > if there are any non migrateable pages in the area. In orther to
> > > guarantee that remove the migrate type checks because
> > > MIGRATE_MOVABLE is not guaranteed to contain only migrateable pages.
> It is merely a heuristic.
> > > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > > allocate any non-migrateable pages from the block but CMA
> > > allocations themselves are unlikely to migrateable. Therefore remove
> both checks.
> > >
> > > Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > Tested-by: Michael Ellerman <mpe@ellerman.id.au>
> > > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > > ---
> > >  mm/page_alloc.c | 3 ---
> > >  1 file changed, 3 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c index
> > > 3badcedf96a7..ad0294ab3e4f 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone,
> > > struct page *page, int count,
> > >  	 */
> > >  	if (zone_idx(zone) == ZONE_MOVABLE)
> > >  		return false;
> > > -	mt = get_pageblock_migratetype(page);
> > > -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> > > -		return false;
> >
> > This drop cause DWC3 USB controller fail on initialization with
> > Layerscaper processors (such as LS1043A) as below:
> >
> > [    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned
> bus number 1
> > [    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > [    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > [    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > [    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > [    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> >
> > And I notice that someone also reported to you that DWC2 got affected
> > recently, so do you have the solution now?
> 
> Yes. It should be in linux-next. Have a look at the following email
> thread:
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> 583607877&sdata=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> &reserved=0

Thanks for your info, although I fail to open the link you shared, but I got patch
from my colleague and the issue got fix on my side, let you know, thanks.

Best Regards,
Ran
> --
> Michal Hocko
> SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
  2017-11-13  7:33   ` Ran Wang
@ 2017-11-13 11:02     ` Michal Hocko
  -1 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-11-13 11:02 UTC (permalink / raw)
  To: Ran Wang
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

On Mon 13-11-17 07:33:13, Ran Wang wrote:
> Hello Michal,
> 
> <snip>
> 
> > Date: Fri, 13 Oct 2017 14:00:12 +0200
> > 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Michael has noticed that the memory offline tries to migrate kernel code
> > pages when doing  echo 0 > /sys/devices/system/memory/memory0/online
> > 
> > The current implementation will fail the operation after several failed page
> > migration attempts but we shouldn't even attempt to migrate that memory
> > and fail right away because this memory is clearly not migrateable. This will
> > become a real problem when we drop the retry loop counter resp. timeout.
> > 
> > The real problem is in has_unmovable_pages in fact. We should fail if there
> > are any non migrateable pages in the area. In orther to guarantee that
> > remove the migrate type checks because MIGRATE_MOVABLE is not
> > guaranteed to contain only migrateable pages. It is merely a heuristic.
> > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > allocate any non-migrateable pages from the block but CMA allocations
> > themselves are unlikely to migrateable. Therefore remove both checks.
> > 
> > Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > Tested-by: Michael Ellerman <mpe@ellerman.id.au>
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > ---
> >  mm/page_alloc.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c index
> > 3badcedf96a7..ad0294ab3e4f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone,
> > struct page *page, int count,
> >  	 */
> >  	if (zone_idx(zone) == ZONE_MOVABLE)
> >  		return false;
> > -	mt = get_pageblock_migratetype(page);
> > -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> > -		return false;
> 
> This drop cause DWC3 USB controller fail on initialization with Layerscaper processors
> (such as LS1043A) as below:
> 
> [    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
> [    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> [    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> [    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> [    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> [    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> 
> And I notice that someone also reported to you that DWC2 got affected recently,
> so do you have the solution now?

Yes. It should be in linux-next. Have a look at the following email
thread: 
http://lkml.kernel.org/r/20171104082500.qvzbb2kw4suo6cgy@dhcp22.suse.cz

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-11-13 11:02     ` Michal Hocko
  0 siblings, 0 replies; 102+ messages in thread
From: Michal Hocko @ 2017-11-13 11:02 UTC (permalink / raw)
  To: Ran Wang
  Cc: linux-mm, Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

On Mon 13-11-17 07:33:13, Ran Wang wrote:
> Hello Michal,
> 
> <snip>
> 
> > Date: Fri, 13 Oct 2017 14:00:12 +0200
> > 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Michael has noticed that the memory offline tries to migrate kernel code
> > pages when doing  echo 0 > /sys/devices/system/memory/memory0/online
> > 
> > The current implementation will fail the operation after several failed page
> > migration attempts but we shouldn't even attempt to migrate that memory
> > and fail right away because this memory is clearly not migrateable. This will
> > become a real problem when we drop the retry loop counter resp. timeout.
> > 
> > The real problem is in has_unmovable_pages in fact. We should fail if there
> > are any non migrateable pages in the area. In orther to guarantee that
> > remove the migrate type checks because MIGRATE_MOVABLE is not
> > guaranteed to contain only migrateable pages. It is merely a heuristic.
> > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > allocate any non-migrateable pages from the block but CMA allocations
> > themselves are unlikely to migrateable. Therefore remove both checks.
> > 
> > Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > Tested-by: Michael Ellerman <mpe@ellerman.id.au>
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > ---
> >  mm/page_alloc.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c index
> > 3badcedf96a7..ad0294ab3e4f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone,
> > struct page *page, int count,
> >  	 */
> >  	if (zone_idx(zone) == ZONE_MOVABLE)
> >  		return false;
> > -	mt = get_pageblock_migratetype(page);
> > -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> > -		return false;
> 
> This drop cause DWC3 USB controller fail on initialization with Layerscaper processors
> (such as LS1043A) as below:
> 
> [    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
> [    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> [    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> [    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> [    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> [    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> 
> And I notice that someone also reported to you that DWC2 got affected recently,
> so do you have the solution now?

Yes. It should be in linux-next. Have a look at the following email
thread: 
http://lkml.kernel.org/r/20171104082500.qvzbb2kw4suo6cgy@dhcp22.suse.cz

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
       [not found] <AM3PR04MB14892A9D6D2FBCE21B8C1F0FF12B0@AM3PR04MB1489.eurprd04.prod.outlook.com>
@ 2017-11-13  7:33   ` Ran Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Ran Wang @ 2017-11-13  7:33 UTC (permalink / raw)
  To: linux-mm, Michal Hocko
  Cc: Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

Hello Michal,

<snip>

> Date: Fri, 13 Oct 2017 14:00:12 +0200
> 
> From: Michal Hocko <mhocko@suse.com>
> 
> Michael has noticed that the memory offline tries to migrate kernel code
> pages when doing  echo 0 > /sys/devices/system/memory/memory0/online
> 
> The current implementation will fail the operation after several failed page
> migration attempts but we shouldn't even attempt to migrate that memory
> and fail right away because this memory is clearly not migrateable. This will
> become a real problem when we drop the retry loop counter resp. timeout.
> 
> The real problem is in has_unmovable_pages in fact. We should fail if there
> are any non migrateable pages in the area. In orther to guarantee that
> remove the migrate type checks because MIGRATE_MOVABLE is not
> guaranteed to contain only migrateable pages. It is merely a heuristic.
> Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> allocate any non-migrateable pages from the block but CMA allocations
> themselves are unlikely to migrateable. Therefore remove both checks.
> 
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> Tested-by: Michael Ellerman <mpe@ellerman.id.au>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_alloc.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c index
> 3badcedf96a7..ad0294ab3e4f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone,
> struct page *page, int count,
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
> -	mt = get_pageblock_migratetype(page);
> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> -		return false;

This drop cause DWC3 USB controller fail on initialization with Layerscaper processors
(such as LS1043A) as below:

[    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
[    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
[    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
[    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
[    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller

And I notice that someone also reported to you that DWC2 got affected recently,
so do you have the solution now?

Best regards

Ran
> 
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages
@ 2017-11-13  7:33   ` Ran Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Ran Wang @ 2017-11-13  7:33 UTC (permalink / raw)
  To: linux-mm, Michal Hocko
  Cc: Michael Ellerman, Vlastimil Babka, Andrew Morton,
	KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Igor Mammedov, Vitaly Kuznetsov, LKML, Leo Li, Xiaobo Xie

Hello Michal,

<snip>

> Date: Fri, 13 Oct 2017 14:00:12 +0200
> 
> From: Michal Hocko <mhocko@suse.com>
> 
> Michael has noticed that the memory offline tries to migrate kernel code
> pages when doing  echo 0 > /sys/devices/system/memory/memory0/online
> 
> The current implementation will fail the operation after several failed page
> migration attempts but we shouldn't even attempt to migrate that memory
> and fail right away because this memory is clearly not migrateable. This will
> become a real problem when we drop the retry loop counter resp. timeout.
> 
> The real problem is in has_unmovable_pages in fact. We should fail if there
> are any non migrateable pages in the area. In orther to guarantee that
> remove the migrate type checks because MIGRATE_MOVABLE is not
> guaranteed to contain only migrateable pages. It is merely a heuristic.
> Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> allocate any non-migrateable pages from the block but CMA allocations
> themselves are unlikely to migrateable. Therefore remove both checks.
> 
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> Tested-by: Michael Ellerman <mpe@ellerman.id.au>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_alloc.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c index
> 3badcedf96a7..ad0294ab3e4f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone,
> struct page *page, int count,
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
> -	mt = get_pageblock_migratetype(page);
> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> -		return false;

This drop cause DWC3 USB controller fail on initialization with Layerscaper processors
(such as LS1043A) as below:

[    2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[    2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
[    2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
[    2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
[    2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
[    2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller

And I notice that someone also reported to you that DWC2 got affected recently,
so do you have the solution now?

Best regards

Ran
> 
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2017-11-14  7:45 UTC | newest]

Thread overview: 102+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-18  7:08 [PATCH v2 0/2] mm, memory_hotplug: redefine memory offline retry logic Michal Hocko
2017-09-18  7:08 ` Michal Hocko
2017-09-18  7:08 ` [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early Michal Hocko
2017-09-18  7:08   ` Michal Hocko
2017-10-10 12:05   ` Michael Ellerman
2017-10-10 12:05     ` Michael Ellerman
2017-10-10 12:27     ` Michal Hocko
2017-10-10 12:27       ` Michal Hocko
2017-10-11  2:37       ` Michael Ellerman
2017-10-11  2:37         ` Michael Ellerman
2017-10-11  5:19         ` Michael Ellerman
2017-10-11  5:19           ` Michael Ellerman
2017-10-11 14:05           ` Anshuman Khandual
2017-10-11 14:05             ` Anshuman Khandual
2017-10-11 14:16             ` Michal Hocko
2017-10-11 14:16               ` Michal Hocko
2017-10-11  6:51         ` Michal Hocko
2017-10-11  6:51           ` Michal Hocko
2017-10-11  8:04           ` Vlastimil Babka
2017-10-11  8:04             ` Vlastimil Babka
2017-10-11  8:13             ` Michal Hocko
2017-10-11  8:13               ` Michal Hocko
2017-10-11 11:17               ` Vlastimil Babka
2017-10-11 11:17                 ` Vlastimil Babka
2017-10-11 11:24                 ` Michal Hocko
2017-10-11 11:24                   ` Michal Hocko
2017-10-13 11:42             ` Michael Ellerman
2017-10-13 11:42               ` Michael Ellerman
2017-10-13 11:58               ` Michal Hocko
2017-10-13 11:58                 ` Michal Hocko
2017-10-13 12:00                 ` [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages Michal Hocko
2017-10-13 12:00                   ` Michal Hocko
2017-10-13 12:00                   ` [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages Michal Hocko
2017-10-13 12:00                     ` Michal Hocko
2017-10-13 12:04                     ` Vlastimil Babka
2017-10-13 12:04                       ` Vlastimil Babka
2017-10-13 12:07                       ` Michal Hocko
2017-10-13 12:07                         ` Michal Hocko
2017-10-17 13:03                         ` Vlastimil Babka
2017-10-17 13:03                           ` Vlastimil Babka
2017-10-17 11:41                   ` [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages Michael Ellerman
2017-10-17 11:41                     ` Michael Ellerman
2017-10-17 12:03                     ` Michal Hocko
2017-10-17 12:03                       ` Michal Hocko
2017-10-17 13:02                   ` Vlastimil Babka
2017-10-17 13:02                     ` Vlastimil Babka
2017-10-19  2:51                   ` Joonsoo Kim
2017-10-19  2:51                     ` Joonsoo Kim
2017-10-19  7:15                     ` Michal Hocko
2017-10-19  7:15                       ` Michal Hocko
2017-10-19  7:33                       ` Joonsoo Kim
2017-10-19  7:33                         ` Joonsoo Kim
2017-10-19  8:20                         ` Michal Hocko
2017-10-19  8:20                           ` Michal Hocko
2017-10-19 12:21                           ` Michal Hocko
2017-10-19 12:21                             ` Michal Hocko
2017-10-20  2:13                             ` Joonsoo Kim
2017-10-20  2:13                               ` Joonsoo Kim
2017-10-20  5:59                               ` Michal Hocko
2017-10-20  5:59                                 ` Michal Hocko
2017-10-20  6:50                                 ` Joonsoo Kim
2017-10-20  6:50                                   ` Joonsoo Kim
2017-10-20  7:02                                   ` Michal Hocko
2017-10-20  7:02                                     ` Michal Hocko
2017-10-23  5:23                                     ` Joonsoo Kim
2017-10-23  5:23                                       ` Joonsoo Kim
2017-10-23  8:10                                       ` Michal Hocko
2017-10-23  8:10                                         ` Michal Hocko
2017-10-24  4:44                                         ` Joonsoo Kim
2017-10-24  4:44                                           ` Joonsoo Kim
2017-10-24  7:44                                           ` Michal Hocko
2017-10-24  7:44                                             ` Michal Hocko
2017-10-24  8:12                                           ` Vlastimil Babka
2017-10-24  8:12                                             ` Vlastimil Babka
2017-10-24 12:25                                             ` Michal Hocko
2017-10-24 12:25                                               ` Michal Hocko
2017-10-26  2:47                                             ` Joonsoo Kim
2017-10-26  2:47                                               ` Joonsoo Kim
2017-10-26  7:41                                               ` Michal Hocko
2017-10-26  7:41                                                 ` Michal Hocko
2017-10-20  7:22                               ` Xishi Qiu
2017-10-20  7:22                                 ` Xishi Qiu
2017-10-20  8:17                                 ` Michal Hocko
2017-10-20  8:17                                   ` Michal Hocko
2017-10-23  5:26                                   ` Joonsoo Kim
2017-10-23  5:26                                     ` Joonsoo Kim
2017-10-26 13:04                             ` Vlastimil Babka
2017-10-26 13:04                               ` Vlastimil Babka
2017-10-26 13:59                             ` Michal Hocko
2017-10-26 13:59                               ` Michal Hocko
2017-09-18  7:08 ` [PATCH 2/2] mm, memory_hotplug: remove timeout from __offline_memory Michal Hocko
2017-09-18  7:08   ` Michal Hocko
     [not found] <AM3PR04MB14892A9D6D2FBCE21B8C1F0FF12B0@AM3PR04MB1489.eurprd04.prod.outlook.com>
2017-11-13  7:33 ` [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages Ran Wang
2017-11-13  7:33   ` Ran Wang
2017-11-13 11:02   ` Michal Hocko
2017-11-13 11:02     ` Michal Hocko
2017-11-14  6:10     ` Ran Wang
2017-11-14  6:10       ` Ran Wang
2017-11-14  7:06       ` Michal Hocko
2017-11-14  7:06         ` Michal Hocko
2017-11-14  7:45         ` Ran Wang
2017-11-14  7:45           ` Ran Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.