All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] few memory offlining enhancements
@ 2018-11-20 13:43 ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, David Hildenbrand,
	LKML, Jan Kara, Kirill A. Shutemov, Michal Hocko

I have been chasing memory offlining not making progress recently. On
the way I have noticed few weird decisions in the code. The migration
itself is restricted without a reasonable justification and the retry
loop around the migration is quite messy. This is addressed by patch 1
and patch 2.

Patch 3 is targeting on the faultaround code which has been a hot
candidate for the initial issue reported upstream [1] and that I am
debugging internally. It turned out to be not the main contributor
in the end but I believe we should address it regardless. See the patch
description for more details.

[1] http://lkml.kernel.org/r/20181114070909.GB2653@MiWiFi-R3L-srv



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 0/3] few memory offlining enhancements
@ 2018-11-20 13:43 ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, David Hildenbrand,
	LKML, Jan Kara, Kirill A. Shutemov, Michal Hocko

I have been chasing memory offlining not making progress recently. On
the way I have noticed few weird decisions in the code. The migration
itself is restricted without a reasonable justification and the retry
loop around the migration is quite messy. This is addressed by patch 1
and patch 2.

Patch 3 is targeting on the faultaround code which has been a hot
candidate for the initial issue reported upstream [1] and that I am
debugging internally. It turned out to be not the main contributor
in the end but I believe we should address it regardless. See the patch
description for more details.

[1] http://lkml.kernel.org/r/20181114070909.GB2653@MiWiFi-R3L-srv

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages
  2018-11-20 13:43 ` Michal Hocko
@ 2018-11-20 13:43   ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, David Hildenbrand,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

do_migrate_range has been limiting the number of pages to migrate to 256
for some reason which is not documented. Even if the limit made some
sense back then when it was introduced it doesn't really serve a good
purpose these days. If the range contains huge pages then
we break out of the loop too early and go through LRU and pcp
caches draining and scan_movable_pages is quite suboptimal.

The only reason to limit the number of pages I can think of is to reduce
the potential time to react on the fatal signal. But even then the
number of pages is a questionable metric because even a single page
might migration block in a non-killable state (e.g. __unmap_and_move).

Remove the limit and offline the full requested range (this is one
membblock worth of pages with the current code). Should we ever get a
report that offlining takes too long to react on fatal signal then we
should rather fix the core migration to use killable waits and bailout
on a signal.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c82193db4be6..6263c8cd4491 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
 	return new_page_nodemask(page, nid, &nmask);
 }
 
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
 	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
 	int not_managed = 0;
 	int ret = 0;
 	LIST_HEAD(source);
 
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
@@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 				ret = -EBUSY;
 				break;
 			}
-			if (isolate_huge_page(page, &source))
-				move_pages -= 1 << compound_order(head);
+			isolate_huge_page(page, &source);
 			continue;
 		} else if (PageTransHuge(page))
 			pfn = page_to_pfn(compound_head(page))
@@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		if (!ret) { /* Success */
 			put_page(page);
 			list_add_tail(&page->lru, &source);
-			move_pages--;
 			if (!__PageMovable(page))
 				inc_node_page_state(page, NR_ISOLATED_ANON +
 						    page_is_file_cache(page));
-- 
2.19.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages
@ 2018-11-20 13:43   ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, David Hildenbrand,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

do_migrate_range has been limiting the number of pages to migrate to 256
for some reason which is not documented. Even if the limit made some
sense back then when it was introduced it doesn't really serve a good
purpose these days. If the range contains huge pages then
we break out of the loop too early and go through LRU and pcp
caches draining and scan_movable_pages is quite suboptimal.

The only reason to limit the number of pages I can think of is to reduce
the potential time to react on the fatal signal. But even then the
number of pages is a questionable metric because even a single page
might migration block in a non-killable state (e.g. __unmap_and_move).

Remove the limit and offline the full requested range (this is one
membblock worth of pages with the current code). Should we ever get a
report that offlining takes too long to react on fatal signal then we
should rather fix the core migration to use killable waits and bailout
on a signal.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c82193db4be6..6263c8cd4491 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
 	return new_page_nodemask(page, nid, &nmask);
 }
 
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
 	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
 	int not_managed = 0;
 	int ret = 0;
 	LIST_HEAD(source);
 
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
@@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 				ret = -EBUSY;
 				break;
 			}
-			if (isolate_huge_page(page, &source))
-				move_pages -= 1 << compound_order(head);
+			isolate_huge_page(page, &source);
 			continue;
 		} else if (PageTransHuge(page))
 			pfn = page_to_pfn(compound_head(page))
@@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		if (!ret) { /* Success */
 			put_page(page);
 			list_add_tail(&page->lru, &source);
-			move_pages--;
 			if (!__PageMovable(page))
 				inc_node_page_state(page, NR_ISOLATED_ANON +
 						    page_is_file_cache(page));
-- 
2.19.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining
  2018-11-20 13:43 ` Michal Hocko
@ 2018-11-20 13:43   ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, David Hildenbrand,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Memory migration might fail during offlining and we keep retrying in
that case. This is currently obfuscate by goto retry loop. The code
is hard to follow and as a result it is even suboptimal becase each
retry round scans the full range from start_pfn even though we have
successfully scanned/migrated [start_pfn, pfn] range already. This
is all only because check_pages_isolated failure has to rescan the full
range again.

De-obfuscate the migration retry loop by promoting it to a real for
loop. In fact remove the goto altogether by making it a proper double
loop (yeah, gotos are nasty in this specific case). In the end we
will get a slightly more optimal code which is better readable.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 60 +++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6263c8cd4491..9cd161db3061 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1591,38 +1591,40 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal_isolated;
 	}
 
-	pfn = start_pfn;
-repeat:
-	/* start memory hot removal */
-	ret = -EINTR;
-	if (signal_pending(current)) {
-		reason = "signal backoff";
-		goto failed_removal_isolated;
-	}
+	do {
+		for (pfn = start_pfn; pfn;)
+		{
+			/* start memory hot removal */
+			ret = -EINTR;
+			if (signal_pending(current)) {
+				reason = "signal backoff";
+				goto failed_removal_isolated;
+			}
 
-	cond_resched();
-	lru_add_drain_all();
-	drain_all_pages(zone);
+			cond_resched();
+			lru_add_drain_all();
+			drain_all_pages(zone);
 
-	pfn = scan_movable_pages(start_pfn, end_pfn);
-	if (pfn) { /* We have movable pages */
-		ret = do_migrate_range(pfn, end_pfn);
-		goto repeat;
-	}
+			pfn = scan_movable_pages(pfn, end_pfn);
+			if (pfn) {
+				/* TODO fatal migration failures should bail out */
+				do_migrate_range(pfn, end_pfn);
+			}
+		}
+
+		/*
+		 * dissolve free hugepages in the memory block before doing offlining
+		 * actually in order to make hugetlbfs's object counting consistent.
+		 */
+		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
+		if (ret) {
+			reason = "failure to dissolve huge pages";
+			goto failed_removal_isolated;
+		}
+		/* check again */
+		offlined_pages = check_pages_isolated(start_pfn, end_pfn);
+	} while (offlined_pages < 0);
 
-	/*
-	 * dissolve free hugepages in the memory block before doing offlining
-	 * actually in order to make hugetlbfs's object counting consistent.
-	 */
-	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
-	if (ret) {
-		reason = "failure to dissolve huge pages";
-		goto failed_removal_isolated;
-	}
-	/* check again */
-	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
-	if (offlined_pages < 0)
-		goto repeat;
 	pr_info("Offlined Pages %ld\n", offlined_pages);
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */
-- 
2.19.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining
@ 2018-11-20 13:43   ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, David Hildenbrand,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Memory migration might fail during offlining and we keep retrying in
that case. This is currently obfuscate by goto retry loop. The code
is hard to follow and as a result it is even suboptimal becase each
retry round scans the full range from start_pfn even though we have
successfully scanned/migrated [start_pfn, pfn] range already. This
is all only because check_pages_isolated failure has to rescan the full
range again.

De-obfuscate the migration retry loop by promoting it to a real for
loop. In fact remove the goto altogether by making it a proper double
loop (yeah, gotos are nasty in this specific case). In the end we
will get a slightly more optimal code which is better readable.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 60 +++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6263c8cd4491..9cd161db3061 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1591,38 +1591,40 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal_isolated;
 	}
 
-	pfn = start_pfn;
-repeat:
-	/* start memory hot removal */
-	ret = -EINTR;
-	if (signal_pending(current)) {
-		reason = "signal backoff";
-		goto failed_removal_isolated;
-	}
+	do {
+		for (pfn = start_pfn; pfn;)
+		{
+			/* start memory hot removal */
+			ret = -EINTR;
+			if (signal_pending(current)) {
+				reason = "signal backoff";
+				goto failed_removal_isolated;
+			}
 
-	cond_resched();
-	lru_add_drain_all();
-	drain_all_pages(zone);
+			cond_resched();
+			lru_add_drain_all();
+			drain_all_pages(zone);
 
-	pfn = scan_movable_pages(start_pfn, end_pfn);
-	if (pfn) { /* We have movable pages */
-		ret = do_migrate_range(pfn, end_pfn);
-		goto repeat;
-	}
+			pfn = scan_movable_pages(pfn, end_pfn);
+			if (pfn) {
+				/* TODO fatal migration failures should bail out */
+				do_migrate_range(pfn, end_pfn);
+			}
+		}
+
+		/*
+		 * dissolve free hugepages in the memory block before doing offlining
+		 * actually in order to make hugetlbfs's object counting consistent.
+		 */
+		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
+		if (ret) {
+			reason = "failure to dissolve huge pages";
+			goto failed_removal_isolated;
+		}
+		/* check again */
+		offlined_pages = check_pages_isolated(start_pfn, end_pfn);
+	} while (offlined_pages < 0);
 
-	/*
-	 * dissolve free hugepages in the memory block before doing offlining
-	 * actually in order to make hugetlbfs's object counting consistent.
-	 */
-	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
-	if (ret) {
-		reason = "failure to dissolve huge pages";
-		goto failed_removal_isolated;
-	}
-	/* check again */
-	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
-	if (offlined_pages < 0)
-		goto repeat;
 	pr_info("Offlined Pages %ld\n", offlined_pages);
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */
-- 
2.19.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-20 13:43 ` Michal Hocko
@ 2018-11-20 13:43   ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, David Hildenbrand,
	LKML, Michal Hocko, Kirill A. Shutemov

From: Michal Hocko <mhocko@suse.com>

filemap_map_pages takes a speculative reference to each page in the
range before it tries to lock that page. While this is correct it
also can influence page migration which will bail out when seeing
an elevated reference count. The faultaround code would bail on
seeing a locked page so we can pro-actively check the PageLocked
bit before page_cache_get_speculative and prevent from pointless
reference count churn.

Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/filemap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8ee02c..c76d6a251770 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2553,6 +2553,9 @@ void filemap_map_pages(struct vm_fault *vmf,
 			goto next;
 
 		head = compound_head(page);
+
+		if (PageLocked(head))
+			goto next;
 		if (!page_cache_get_speculative(head))
 			goto next;
 
-- 
2.19.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
@ 2018-11-20 13:43   ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, David Hildenbrand,
	LKML, Michal Hocko, Kirill A. Shutemov

From: Michal Hocko <mhocko@suse.com>

filemap_map_pages takes a speculative reference to each page in the
range before it tries to lock that page. While this is correct it
also can influence page migration which will bail out when seeing
an elevated reference count. The faultaround code would bail on
seeing a locked page so we can pro-actively check the PageLocked
bit before page_cache_get_speculative and prevent from pointless
reference count churn.

Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/filemap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8ee02c..c76d6a251770 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2553,6 +2553,9 @@ void filemap_map_pages(struct vm_fault *vmf,
 			goto next;
 
 		head = compound_head(page);
+
+		if (PageLocked(head))
+			goto next;
 		if (!page_cache_get_speculative(head))
 			goto next;
 
-- 
2.19.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-20 13:43   ` Michal Hocko
  (?)
@ 2018-11-20 14:07   ` Kirill A. Shutemov
  2018-11-20 14:12     ` Michal Hocko
  -1 siblings, 1 reply; 31+ messages in thread
From: Kirill A. Shutemov @ 2018-11-20 14:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin,
	David Hildenbrand, LKML, Michal Hocko

On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> filemap_map_pages takes a speculative reference to each page in the
> range before it tries to lock that page. While this is correct it
> also can influence page migration which will bail out when seeing
> an elevated reference count. The faultaround code would bail on
> seeing a locked page so we can pro-actively check the PageLocked
> bit before page_cache_get_speculative and prevent from pointless
> reference count churn.

Looks fine to me.

But please drop a line of comment in the code. As is it might be confusing
for a reader.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-20 14:07   ` Kirill A. Shutemov
@ 2018-11-20 14:12     ` Michal Hocko
  2018-11-20 14:17       ` Kirill A. Shutemov
  2018-11-21  4:51       ` William Kucharski
  0 siblings, 2 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 14:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin,
	David Hildenbrand, LKML

On Tue 20-11-18 17:07:15, Kirill A. Shutemov wrote:
> On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > filemap_map_pages takes a speculative reference to each page in the
> > range before it tries to lock that page. While this is correct it
> > also can influence page migration which will bail out when seeing
> > an elevated reference count. The faultaround code would bail on
> > seeing a locked page so we can pro-actively check the PageLocked
> > bit before page_cache_get_speculative and prevent from pointless
> > reference count churn.
> 
> Looks fine to me.

Thanks for the review.

> But please drop a line of comment in the code. As is it might be confusing
> for a reader.

This?

diff --git a/mm/filemap.c b/mm/filemap.c
index c76d6a251770..7c4e439a2e85 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2554,6 +2554,10 @@ void filemap_map_pages(struct vm_fault *vmf,
 
 		head = compound_head(page);
 
+		/*
+		 * Check the locked pages before taking a reference to not
+		 * go in the way of migration.
+		 */
 		if (PageLocked(head))
 			goto next;
 		if (!page_cache_get_speculative(head))
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-20 14:12     ` Michal Hocko
@ 2018-11-20 14:17       ` Kirill A. Shutemov
  2018-11-20 14:25         ` Michal Hocko
  2018-11-21  4:51       ` William Kucharski
  1 sibling, 1 reply; 31+ messages in thread
From: Kirill A. Shutemov @ 2018-11-20 14:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin,
	David Hildenbrand, LKML

On Tue, Nov 20, 2018 at 03:12:07PM +0100, Michal Hocko wrote:
> On Tue 20-11-18 17:07:15, Kirill A. Shutemov wrote:
> > On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > filemap_map_pages takes a speculative reference to each page in the
> > > range before it tries to lock that page. While this is correct it
> > > also can influence page migration which will bail out when seeing
> > > an elevated reference count. The faultaround code would bail on
> > > seeing a locked page so we can pro-actively check the PageLocked
> > > bit before page_cache_get_speculative and prevent from pointless
> > > reference count churn.
> > 
> > Looks fine to me.
> 
> Thanks for the review.
> 
> > But please drop a line of comment in the code. As is it might be confusing
> > for a reader.
> 
> This?

Yep.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c76d6a251770..7c4e439a2e85 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2554,6 +2554,10 @@ void filemap_map_pages(struct vm_fault *vmf,
>  
>  		head = compound_head(page);
>  
> +		/*
> +		 * Check the locked pages before taking a reference to not
> +		 * go in the way of migration.
> +		 */
>  		if (PageLocked(head))
>  			goto next;
>  		if (!page_cache_get_speculative(head))
> -- 
> Michal Hocko
> SUSE Labs

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages
  2018-11-20 13:43   ` Michal Hocko
  (?)
@ 2018-11-20 14:18   ` David Hildenbrand
  2018-11-20 14:25     ` Michal Hocko
  -1 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2018-11-20 14:18 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, LKML, Michal Hocko

On 20.11.18 14:43, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> do_migrate_range has been limiting the number of pages to migrate to 256
> for some reason which is not documented. Even if the limit made some
> sense back then when it was introduced it doesn't really serve a good
> purpose these days. If the range contains huge pages then
> we break out of the loop too early and go through LRU and pcp
> caches draining and scan_movable_pages is quite suboptimal.
> 
> The only reason to limit the number of pages I can think of is to reduce
> the potential time to react on the fatal signal. But even then the
> number of pages is a questionable metric because even a single page
> might migration block in a non-killable state (e.g. __unmap_and_move).
> 
> Remove the limit and offline the full requested range (this is one
> membblock worth of pages with the current code). Should we ever get a
> report that offlining takes too long to react on fatal signal then we
> should rather fix the core migration to use killable waits and bailout
> on a signal.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memory_hotplug.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c82193db4be6..6263c8cd4491 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
>  	return new_page_nodemask(page, nid, &nmask);
>  }
>  
> -#define NR_OFFLINE_AT_ONCE_PAGES	(256)
>  static int
>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  {
>  	unsigned long pfn;
>  	struct page *page;
> -	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>  	int not_managed = 0;
>  	int ret = 0;
>  	LIST_HEAD(source);
>  
> -	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>  		if (!pfn_valid(pfn))
>  			continue;
>  		page = pfn_to_page(pfn);
> @@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  				ret = -EBUSY;
>  				break;
>  			}
> -			if (isolate_huge_page(page, &source))
> -				move_pages -= 1 << compound_order(head);
> +			isolate_huge_page(page, &source);
>  			continue;
>  		} else if (PageTransHuge(page))
>  			pfn = page_to_pfn(compound_head(page))
> @@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		if (!ret) { /* Success */
>  			put_page(page);
>  			list_add_tail(&page->lru, &source);
> -			move_pages--;
>  			if (!__PageMovable(page))
>  				inc_node_page_state(page, NR_ISOLATED_ANON +
>  						    page_is_file_cache(page));
> 

Yes, there is basically no statement why it was done that way. If it is
important, there should be one.

(we could also check for pending signals inside that function if really
required)

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages
  2018-11-20 14:18   ` David Hildenbrand
@ 2018-11-20 14:25     ` Michal Hocko
  2018-11-20 14:27       ` David Hildenbrand
  0 siblings, 1 reply; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 14:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin, LKML

On Tue 20-11-18 15:18:41, David Hildenbrand wrote:
[...]
> (we could also check for pending signals inside that function if really
> required)

do_migrate_pages is not the proper layer to check signals. Because the
loop only isolates pages and that is not expensive. The most expensive
part is deeper down in the migration core. We wait for page lock or
writeback and that can take a long. None of that is killable wait which
is a larger surgery but something that we should consider should there
be any need to address this.

> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-20 14:17       ` Kirill A. Shutemov
@ 2018-11-20 14:25         ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 14:25 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin,
	David Hildenbrand, LKML

On Tue 20-11-18 17:17:00, Kirill A. Shutemov wrote:
> On Tue, Nov 20, 2018 at 03:12:07PM +0100, Michal Hocko wrote:
> > On Tue 20-11-18 17:07:15, Kirill A. Shutemov wrote:
> > > On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > filemap_map_pages takes a speculative reference to each page in the
> > > > range before it tries to lock that page. While this is correct it
> > > > also can influence page migration which will bail out when seeing
> > > > an elevated reference count. The faultaround code would bail on
> > > > seeing a locked page so we can pro-actively check the PageLocked
> > > > bit before page_cache_get_speculative and prevent from pointless
> > > > reference count churn.
> > > 
> > > Looks fine to me.
> > 
> > Thanks for the review.
> > 
> > > But please drop a line of comment in the code. As is it might be confusing
> > > for a reader.
> > 
> > This?
> 
> Yep.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Cool, thanks! I will wait some more time for review feedback for other
patches and then repost with this folded in.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining
  2018-11-20 13:43   ` Michal Hocko
  (?)
@ 2018-11-20 14:26   ` David Hildenbrand
  2018-11-20 14:34     ` Michal Hocko
  -1 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2018-11-20 14:26 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, LKML, Michal Hocko

On 20.11.18 14:43, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Memory migration might fail during offlining and we keep retrying in
> that case. This is currently obfuscate by goto retry loop. The code
> is hard to follow and as a result it is even suboptimal becase each
> retry round scans the full range from start_pfn even though we have
> successfully scanned/migrated [start_pfn, pfn] range already. This
> is all only because check_pages_isolated failure has to rescan the full
> range again.
> 
> De-obfuscate the migration retry loop by promoting it to a real for
> loop. In fact remove the goto altogether by making it a proper double
> loop (yeah, gotos are nasty in this specific case). In the end we
> will get a slightly more optimal code which is better readable.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memory_hotplug.c | 60 +++++++++++++++++++++++----------------------
>  1 file changed, 31 insertions(+), 29 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6263c8cd4491..9cd161db3061 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1591,38 +1591,40 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  		goto failed_removal_isolated;
>  	}
>  
> -	pfn = start_pfn;
> -repeat:
> -	/* start memory hot removal */
> -	ret = -EINTR;
> -	if (signal_pending(current)) {
> -		reason = "signal backoff";
> -		goto failed_removal_isolated;
> -	}
> +	do {
> +		for (pfn = start_pfn; pfn;)
> +		{

{ on a new line looks weird.

> +			/* start memory hot removal */
> +			ret = -EINTR;

I think we can move that into the "if (signal_pending(current))"

(if my eyes are not wrong, this will not be touched otherwise)

> +			if (signal_pending(current)) {
> +				reason = "signal backoff";
> +				goto failed_removal_isolated;
> +			}
>  
> -	cond_resched();
> -	lru_add_drain_all();
> -	drain_all_pages(zone);
> +			cond_resched();
> +			lru_add_drain_all();
> +			drain_all_pages(zone);
>  
> -	pfn = scan_movable_pages(start_pfn, end_pfn);
> -	if (pfn) { /* We have movable pages */
> -		ret = do_migrate_range(pfn, end_pfn);
> -		goto repeat;
> -	}
> +			pfn = scan_movable_pages(pfn, end_pfn);
> +			if (pfn) {
> +				/* TODO fatal migration failures should bail out */
> +				do_migrate_range(pfn, end_pfn);

Right, that return value was always ignored.

> +			}
> +		}
> +
> +		/*
> +		 * dissolve free hugepages in the memory block before doing offlining
> +		 * actually in order to make hugetlbfs's object counting consistent.
> +		 */
> +		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> +		if (ret) {
> +			reason = "failure to dissolve huge pages";
> +			goto failed_removal_isolated;
> +		}
> +		/* check again */
> +		offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> +	} while (offlined_pages < 0);
>  
> -	/*
> -	 * dissolve free hugepages in the memory block before doing offlining
> -	 * actually in order to make hugetlbfs's object counting consistent.
> -	 */
> -	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> -	if (ret) {
> -		reason = "failure to dissolve huge pages";
> -		goto failed_removal_isolated;
> -	}
> -	/* check again */
> -	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> -	if (offlined_pages < 0)
> -		goto repeat;
>  	pr_info("Offlined Pages %ld\n", offlined_pages);
>  	/* Ok, all of our target is isolated.
>  	   We cannot do rollback at this point. */
> 

Looks much better to me.


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages
  2018-11-20 14:25     ` Michal Hocko
@ 2018-11-20 14:27       ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2018-11-20 14:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin, LKML

On 20.11.18 15:25, Michal Hocko wrote:
> On Tue 20-11-18 15:18:41, David Hildenbrand wrote:
> [...]
>> (we could also check for pending signals inside that function if really
>> required)
> 
> do_migrate_pages is not the proper layer to check signals. Because the
> loop only isolates pages and that is not expensive. The most expensive
> part is deeper down in the migration core. We wait for page lock or
> writeback and that can take a long. None of that is killable wait which
> is a larger surgery but something that we should consider should there
> be any need to address this.

Indeed, that makes sense.

> 
>> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> Thanks!
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-20 13:43   ` Michal Hocko
  (?)
  (?)
@ 2018-11-20 14:33   ` David Hildenbrand
  -1 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2018-11-20 14:33 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Andrew Morton, Oscar Salvador, Pavel Tatashin, LKML,
	Michal Hocko, Kirill A. Shutemov

On 20.11.18 14:43, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> filemap_map_pages takes a speculative reference to each page in the
> range before it tries to lock that page. While this is correct it
> also can influence page migration which will bail out when seeing
> an elevated reference count. The faultaround code would bail on
> seeing a locked page so we can pro-actively check the PageLocked
> bit before page_cache_get_speculative and prevent from pointless
> reference count churn.
> 
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/filemap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 81adec8ee02c..c76d6a251770 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2553,6 +2553,9 @@ void filemap_map_pages(struct vm_fault *vmf,
>  			goto next;
>  
>  		head = compound_head(page);
> +
> +		if (PageLocked(head))
> +			goto next;
>  		if (!page_cache_get_speculative(head))
>  			goto next;
>  
> 

Right, no sense in referencing a page if we know we will drop the
reference right away.

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages
  2018-11-20 13:43   ` Michal Hocko
  (?)
  (?)
@ 2018-11-20 14:33   ` Pavel Tatashin
  -1 siblings, 0 replies; 31+ messages in thread
From: Pavel Tatashin @ 2018-11-20 14:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin,
	David Hildenbrand, LKML, Michal Hocko, Kamezawa Hiroyuki

On 18-11-20 14:43:21, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> do_migrate_range has been limiting the number of pages to migrate to 256
> for some reason which is not documented. Even if the limit made some
> sense back then when it was introduced it doesn't really serve a good
> purpose these days. If the range contains huge pages then
> we break out of the loop too early and go through LRU and pcp
> caches draining and scan_movable_pages is quite suboptimal.
> 
> The only reason to limit the number of pages I can think of is to reduce
> the potential time to react on the fatal signal. But even then the
> number of pages is a questionable metric because even a single page
> might migration block in a non-killable state (e.g. __unmap_and_move).
> 
> Remove the limit and offline the full requested range (this is one
> membblock worth of pages with the current code). Should we ever get a
> report that offlining takes too long to react on fatal signal then we
> should rather fix the core migration to use killable waits and bailout
> on a signal.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Looks good to me, I also do not see a reason for 256 pages limit.

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>

Added Kame to CC, who introduced page offlining, and this limit, but as
far as I can tell the last time he was active on LKML was in 2016.

Pasha

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining
  2018-11-20 14:26   ` David Hildenbrand
@ 2018-11-20 14:34     ` Michal Hocko
  2018-11-20 14:34       ` David Hildenbrand
  0 siblings, 1 reply; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 14:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin, LKML

On Tue 20-11-18 15:26:43, David Hildenbrand wrote:
[...]
> > +	do {
> > +		for (pfn = start_pfn; pfn;)
> > +		{
> 
> { on a new line looks weird.
> 
> > +			/* start memory hot removal */
> > +			ret = -EINTR;
> 
> I think we can move that into the "if (signal_pending(current))"
> 
> (if my eyes are not wrong, this will not be touched otherwise)

Better?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9cd161db3061..6bc3aee30f5e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1592,11 +1592,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	}
 
 	do {
-		for (pfn = start_pfn; pfn;)
-		{
+		for (pfn = start_pfn; pfn;) {
 			/* start memory hot removal */
-			ret = -EINTR;
 			if (signal_pending(current)) {
+				ret = -EINTR;
 				reason = "signal backoff";
 				goto failed_removal_isolated;
 			}
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining
  2018-11-20 14:34     ` Michal Hocko
@ 2018-11-20 14:34       ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2018-11-20 14:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin, LKML

On 20.11.18 15:34, Michal Hocko wrote:
> On Tue 20-11-18 15:26:43, David Hildenbrand wrote:
> [...]
>>> +	do {
>>> +		for (pfn = start_pfn; pfn;)
>>> +		{
>>
>> { on a new line looks weird.
>>
>>> +			/* start memory hot removal */
>>> +			ret = -EINTR;
>>
>> I think we can move that into the "if (signal_pending(current))"
>>
>> (if my eyes are not wrong, this will not be touched otherwise)
> 
> Better?
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9cd161db3061..6bc3aee30f5e 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1592,11 +1592,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	}
>  
>  	do {
> -		for (pfn = start_pfn; pfn;)
> -		{
> +		for (pfn = start_pfn; pfn;) {
>  			/* start memory hot removal */
> -			ret = -EINTR;
>  			if (signal_pending(current)) {
> +				ret = -EINTR;
>  				reason = "signal backoff";
>  				goto failed_removal_isolated;
>  			}
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

:)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages
  2018-11-20 13:43   ` Michal Hocko
                     ` (2 preceding siblings ...)
  (?)
@ 2018-11-20 14:51   ` osalvador
  2018-11-20 15:00     ` Michal Hocko
  -1 siblings, 1 reply; 31+ messages in thread
From: osalvador @ 2018-11-20 14:51 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Andrew Morton, Pavel Tatashin, David Hildenbrand, LKML, Michal Hocko

On Tue, 2018-11-20 at 14:43 +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> do_migrate_range has been limiting the number of pages to migrate to
> 256
> for some reason which is not documented. 

When looking back at old memory-hotplug commits one feels pretty sad
about the brevity of the changelogs.

> Signed-off-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages
  2018-11-20 14:51   ` osalvador
@ 2018-11-20 15:00     ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 15:00 UTC (permalink / raw)
  To: osalvador
  Cc: linux-mm, Andrew Morton, David Hildenbrand, LKML, Pavel Tatashin

On Tue 20-11-18 15:51:32, Oscar Salvador wrote:
> On Tue, 2018-11-20 at 14:43 +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > do_migrate_range has been limiting the number of pages to migrate to
> > 256
> > for some reason which is not documented. 
> 
> When looking back at old memory-hotplug commits one feels pretty sad
> about the brevity of the changelogs.

Well, things evolve and we've become much more careful about changelogs
over time. It still gets quite a lot of time to push back on changelogs
even these days though. People still keep forgetting that "what" is not
as important as "why" because the former is usually quite easy to
understand from reading the diff. The intention behind is usually what
gets forgotten after years. I guess people realize this much more after
few excavation git blame tours.

> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining
  2018-11-20 13:43   ` Michal Hocko
  (?)
  (?)
@ 2018-11-20 15:13   ` osalvador
  2018-11-20 15:18     ` Michal Hocko
  -1 siblings, 1 reply; 31+ messages in thread
From: osalvador @ 2018-11-20 15:13 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Andrew Morton, Pavel Tatashin, David Hildenbrand, LKML, Michal Hocko


> Signed-off-by: Michal Hocko <mhocko@suse.com>
[...]
> +	do {
> +		for (pfn = start_pfn; pfn;)
> +		{
> +			/* start memory hot removal */

Should we change thAT comment? I mean, this is not really the hot-
removal stage.

Maybe "start memory migration" suits better? or memory offlining?

> +			ret = -EINTR;
> +			if (signal_pending(current)) {
> +				reason = "signal backoff";
> +				goto failed_removal_isolated;
> +			}
>  
> -	cond_resched();
> -	lru_add_drain_all();
> -	drain_all_pages(zone);
> +			cond_resched();
> +			lru_add_drain_all();
> +			drain_all_pages(zone);
>  
> -	pfn = scan_movable_pages(start_pfn, end_pfn);
> -	if (pfn) { /* We have movable pages */
> -		ret = do_migrate_range(pfn, end_pfn);
> -		goto repeat;
> -	}
> +			pfn = scan_movable_pages(pfn, end_pfn);
> +			if (pfn) {
> +				/* TODO fatal migration failures
> should bail out */
> +				do_migrate_range(pfn, end_pfn);
> +			}
> +		}
> +
> +		/*
> +		 * dissolve free hugepages in the memory block
> before doing offlining
> +		 * actually in order to make hugetlbfs's object
> counting consistent.
> +		 */
> +		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> +		if (ret) {
> +			reason = "failure to dissolve huge pages";
> +			goto failed_removal_isolated;
> +		}
> +		/* check again */
> +		offlined_pages = check_pages_isolated(start_pfn,
> end_pfn);
> +	} while (offlined_pages < 0);
>  
> -	/*
> -	 * dissolve free hugepages in the memory block before doing
> offlining
> -	 * actually in order to make hugetlbfs's object counting
> consistent.
> -	 */
> -	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> -	if (ret) {
> -		reason = "failure to dissolve huge pages";
> -		goto failed_removal_isolated;
> -	}
> -	/* check again */
> -	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> -	if (offlined_pages < 0)
> -		goto repeat;

This indeed looks much nicer and it is easier to follow.
With the changes commented by David:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining
  2018-11-20 15:13   ` osalvador
@ 2018-11-20 15:18     ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-20 15:18 UTC (permalink / raw)
  To: osalvador
  Cc: linux-mm, Andrew Morton, Pavel Tatashin, David Hildenbrand, LKML

On Tue 20-11-18 16:13:35, Oscar Salvador wrote:
> 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> [...]
> > +	do {
> > +		for (pfn = start_pfn; pfn;)
> > +		{
> > +			/* start memory hot removal */
> 
> Should we change thAT comment? I mean, this is not really the hot-
> removal stage.
> 
> Maybe "start memory migration" suits better? or memory offlining?

I will just drop it. It doesn't really explain anything.
[...]
> 
> This indeed looks much nicer and it is easier to follow.
> With the changes commented by David:
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-20 13:43   ` Michal Hocko
                     ` (2 preceding siblings ...)
  (?)
@ 2018-11-21  1:47   ` Hugh Dickins
  2018-11-21  7:11     ` Michal Hocko
  -1 siblings, 1 reply; 31+ messages in thread
From: Hugh Dickins @ 2018-11-21  1:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin,
	David Hildenbrand, LKML, Michal Hocko, Kirill A. Shutemov

On Tue, 20 Nov 2018, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> filemap_map_pages takes a speculative reference to each page in the
> range before it tries to lock that page. While this is correct it
> also can influence page migration which will bail out when seeing
> an elevated reference count. The faultaround code would bail on
> seeing a locked page so we can pro-actively check the PageLocked
> bit before page_cache_get_speculative and prevent from pointless
> reference count churn.
> 
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Hugh Dickins <hughd@google.com>

though I think this patch is more useful to the avoid atomic ops,
and unnecessary dirtying of the cacheline, than to avoid the very
transient elevation of refcount, which will not affect page migration
very much.

> ---
>  mm/filemap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 81adec8ee02c..c76d6a251770 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2553,6 +2553,9 @@ void filemap_map_pages(struct vm_fault *vmf,
>  			goto next;
>  
>  		head = compound_head(page);
> +
> +		if (PageLocked(head))
> +			goto next;
>  		if (!page_cache_get_speculative(head))
>  			goto next;
>  
> -- 
> 2.19.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-20 14:12     ` Michal Hocko
  2018-11-20 14:17       ` Kirill A. Shutemov
@ 2018-11-21  4:51       ` William Kucharski
  2018-11-21  7:07         ` Michal Hocko
  1 sibling, 1 reply; 31+ messages in thread
From: William Kucharski @ 2018-11-21  4:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kirill A. Shutemov, Linux-MM, Andrew Morton, Oscar Salvador,
	Pavel Tatashin, David Hildenbrand, LKML



> On Nov 20, 2018, at 7:12 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> +		/*
> +		 * Check the locked pages before taking a reference to not
> +		 * go in the way of migration.
> +		 */

Could you make this a tiny bit more explanative, something like:

+		/*
+		 * Check for a locked page first, as a speculative
+		 * reference may adversely influence page migration.
+		 */

Reviewed-by: William Kucharski <william.kucharski@oracle.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-21  4:51       ` William Kucharski
@ 2018-11-21  7:07         ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2018-11-21  7:07 UTC (permalink / raw)
  To: William Kucharski
  Cc: Kirill A. Shutemov, Linux-MM, Andrew Morton, Oscar Salvador,
	Pavel Tatashin, David Hildenbrand, LKML

On Tue 20-11-18 21:51:39, William Kucharski wrote:
> 
> 
> > On Nov 20, 2018, at 7:12 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > +		/*
> > +		 * Check the locked pages before taking a reference to not
> > +		 * go in the way of migration.
> > +		 */
> 
> Could you make this a tiny bit more explanative, something like:
> 
> +		/*
> +		 * Check for a locked page first, as a speculative
> +		 * reference may adversely influence page migration.
> +		 */

sure

> 
> Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-21  1:47   ` Hugh Dickins
@ 2018-11-21  7:11     ` Michal Hocko
  2018-11-22  2:27       ` Hugh Dickins
  0 siblings, 1 reply; 31+ messages in thread
From: Michal Hocko @ 2018-11-21  7:11 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin,
	David Hildenbrand, LKML, Kirill A. Shutemov

On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> On Tue, 20 Nov 2018, Michal Hocko wrote:
> 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > filemap_map_pages takes a speculative reference to each page in the
> > range before it tries to lock that page. While this is correct it
> > also can influence page migration which will bail out when seeing
> > an elevated reference count. The faultaround code would bail on
> > seeing a locked page so we can pro-actively check the PageLocked
> > bit before page_cache_get_speculative and prevent from pointless
> > reference count churn.
> > 
> > Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Acked-by: Hugh Dickins <hughd@google.com>

Thanks!

> though I think this patch is more useful to the avoid atomic ops,
> and unnecessary dirtying of the cacheline, than to avoid the very
> transient elevation of refcount, which will not affect page migration
> very much.

Are you sure it would really be transient? In other words is it possible
that the fault around can block migration repeatedly under refault heavy
workload? I just couldn't convince myself, to be honest.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-21  7:11     ` Michal Hocko
@ 2018-11-22  2:27       ` Hugh Dickins
  2018-11-22  9:05         ` Michal Hocko
  0 siblings, 1 reply; 31+ messages in thread
From: Hugh Dickins @ 2018-11-22  2:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Hugh Dickins, linux-mm, Andrew Morton, Oscar Salvador,
	Pavel Tatashin, David Hildenbrand, LKML, Kirill A. Shutemov

On Wed, 21 Nov 2018, Michal Hocko wrote:
> On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> > On Tue, 20 Nov 2018, Michal Hocko wrote:
> > 
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > filemap_map_pages takes a speculative reference to each page in the
> > > range before it tries to lock that page. While this is correct it
> > > also can influence page migration which will bail out when seeing
> > > an elevated reference count. The faultaround code would bail on
> > > seeing a locked page so we can pro-actively check the PageLocked
> > > bit before page_cache_get_speculative and prevent from pointless
> > > reference count churn.
> > > 
> > > Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> > > Suggested-by: Jan Kara <jack@suse.cz>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > 
> > Acked-by: Hugh Dickins <hughd@google.com>
> 
> Thanks!
> 
> > though I think this patch is more useful to the avoid atomic ops,
> > and unnecessary dirtying of the cacheline, than to avoid the very
> > transient elevation of refcount, which will not affect page migration
> > very much.
> 
> Are you sure it would really be transient? In other words is it possible
> that the fault around can block migration repeatedly under refault heavy
> workload? I just couldn't convince myself, to be honest.

I don't deny that it is possible: I expect that, using fork() (which does
not copy the ptes in a shared file vma), you can construct a test case
where each child faults one or another page near a page of no interest,
and that page of no interest is a target of migration perpetually
frustrated by filemap_map_pages()'s briefly raised refcount.

But I suggest that's a third-order effect: well worth fixing because
it's easily and uncontroversially dealt with, as you have; but not of
great importance.

The first-order effect is migration conspiring to defeat itself: that's
what my put_and_wait_on_page_locked() patch, in other thread, is about.

The second order effect is when a page that is really wanted is waited
on - the target of a fault, for which page refcount is raised maybe
long before it finally gets into the page table (whereupon it becomes
visible to try_to_unmap(), and its mapcount matches refcount so that
migration can fully account for the page).  One class of that can be
well dealt with by using put_and_wait_on_page_locked_killable() in
lock_page_or_retry(), but I was keeping that as a future instalment.

But I shouldn't denigrate the transient case by referring so lightly
to migrate_pages()' 10 attempts: each of those failed attempts can
be very expensive, unmapping and TLB flushing (including IPIs) and
remapping. It may well be that 2 or 3 would be a more cost-effective
number of attempts, at least when the page is mapped.

Hugh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-22  2:27       ` Hugh Dickins
@ 2018-11-22  9:05         ` Michal Hocko
  2018-11-23 20:22           ` Hugh Dickins
  0 siblings, 1 reply; 31+ messages in thread
From: Michal Hocko @ 2018-11-22  9:05 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Andrew Morton, Oscar Salvador, Pavel Tatashin,
	David Hildenbrand, LKML, Kirill A. Shutemov

On Wed 21-11-18 18:27:11, Hugh Dickins wrote:
> On Wed, 21 Nov 2018, Michal Hocko wrote:
> > On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> > > On Tue, 20 Nov 2018, Michal Hocko wrote:
> > > 
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > filemap_map_pages takes a speculative reference to each page in the
> > > > range before it tries to lock that page. While this is correct it
> > > > also can influence page migration which will bail out when seeing
> > > > an elevated reference count. The faultaround code would bail on
> > > > seeing a locked page so we can pro-actively check the PageLocked
> > > > bit before page_cache_get_speculative and prevent from pointless
> > > > reference count churn.
> > > > 
> > > > Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> > > > Suggested-by: Jan Kara <jack@suse.cz>
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > 
> > > Acked-by: Hugh Dickins <hughd@google.com>
> > 
> > Thanks!
> > 
> > > though I think this patch is more useful to the avoid atomic ops,
> > > and unnecessary dirtying of the cacheline, than to avoid the very
> > > transient elevation of refcount, which will not affect page migration
> > > very much.
> > 
> > Are you sure it would really be transient? In other words is it possible
> > that the fault around can block migration repeatedly under refault heavy
> > workload? I just couldn't convince myself, to be honest.
> 
> I don't deny that it is possible: I expect that, using fork() (which does
> not copy the ptes in a shared file vma), you can construct a test case
> where each child faults one or another page near a page of no interest,
> and that page of no interest is a target of migration perpetually
> frustrated by filemap_map_pages()'s briefly raised refcount.

The other issue I am debugging and which very likely has the same
underlying issue in the end has shown
[  883.930477] rac1 kernel: page:ffffea2084bf5cc0 count:1889 mapcount:1887 mapping:ffff8833c82c9ad8 index:0x6b
[  883.930485] rac1 kernel: ext4_da_aops [ext4]
[  883.930497] rac1 kernel: name:"libc-2.22.so"
[  883.931241] rac1 kernel: do_migrate_range done ret=23

pattern. After we have disabled the faultaround the failure has moved to
a different page but libc hasn't shown up again. This might be a matter
of (bad)luck and timing. But we thought that it is not too unlikely for
faultaround on such a shared page to strike in.

> But I suggest that's a third-order effect: well worth fixing because
> it's easily and uncontroversially dealt with, as you have; but not of
> great importance.
> 
> The first-order effect is migration conspiring to defeat itself: that's
> what my put_and_wait_on_page_locked() patch, in other thread, is about.

yes. That is obviously a much more effective fix.

> The second order effect is when a page that is really wanted is waited
> on - the target of a fault, for which page refcount is raised maybe
> long before it finally gets into the page table (whereupon it becomes
> visible to try_to_unmap(), and its mapcount matches refcount so that
> migration can fully account for the page).  One class of that can be
> well dealt with by using put_and_wait_on_page_locked_killable() in
> lock_page_or_retry(), but I was keeping that as a future instalment.
> 
> But I shouldn't denigrate the transient case by referring so lightly
> to migrate_pages()' 10 attempts: each of those failed attempts can
> be very expensive, unmapping and TLB flushing (including IPIs) and
> remapping. It may well be that 2 or 3 would be a more cost-effective
> number of attempts, at least when the page is mapped.

If you want some update to the comment in this function or to the
changelog, I am open of course. Right now I have
+                * Check for a locked page first, as a speculative
+                * reference may adversely influence page migration.
as suggested by William.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-11-22  9:05         ` Michal Hocko
@ 2018-11-23 20:22           ` Hugh Dickins
  0 siblings, 0 replies; 31+ messages in thread
From: Hugh Dickins @ 2018-11-23 20:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Hugh Dickins, linux-mm, Andrew Morton, Oscar Salvador,
	Pavel Tatashin, David Hildenbrand, LKML, Kirill A. Shutemov

On Thu, 22 Nov 2018, Michal Hocko wrote:
> 
> If you want some update to the comment in this function or to the
> changelog, I am open of course. Right now I have
> +                * Check for a locked page first, as a speculative
> +                * reference may adversely influence page migration.
> as suggested by William.

I ought to care, since I challenged the significance of this aspect
in the first place, but find I don't care enough - I much prefer the
patch to the comments on and in it, but have not devised any wording
that I'd prefer to see instead - sorry.

Hugh

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-11-23 20:22 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-20 13:43 [RFC PATCH 0/3] few memory offlining enhancements Michal Hocko
2018-11-20 13:43 ` Michal Hocko
2018-11-20 13:43 ` [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages Michal Hocko
2018-11-20 13:43   ` Michal Hocko
2018-11-20 14:18   ` David Hildenbrand
2018-11-20 14:25     ` Michal Hocko
2018-11-20 14:27       ` David Hildenbrand
2018-11-20 14:33   ` Pavel Tatashin
2018-11-20 14:51   ` osalvador
2018-11-20 15:00     ` Michal Hocko
2018-11-20 13:43 ` [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining Michal Hocko
2018-11-20 13:43   ` Michal Hocko
2018-11-20 14:26   ` David Hildenbrand
2018-11-20 14:34     ` Michal Hocko
2018-11-20 14:34       ` David Hildenbrand
2018-11-20 15:13   ` osalvador
2018-11-20 15:18     ` Michal Hocko
2018-11-20 13:43 ` [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page Michal Hocko
2018-11-20 13:43   ` Michal Hocko
2018-11-20 14:07   ` Kirill A. Shutemov
2018-11-20 14:12     ` Michal Hocko
2018-11-20 14:17       ` Kirill A. Shutemov
2018-11-20 14:25         ` Michal Hocko
2018-11-21  4:51       ` William Kucharski
2018-11-21  7:07         ` Michal Hocko
2018-11-20 14:33   ` David Hildenbrand
2018-11-21  1:47   ` Hugh Dickins
2018-11-21  7:11     ` Michal Hocko
2018-11-22  2:27       ` Hugh Dickins
2018-11-22  9:05         ` Michal Hocko
2018-11-23 20:22           ` Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.