Linux-mm Archive on lore.kernel.org
* [PATCH 0/3] few memory offlining enhancements
@ 2018-12-11 14:27 Michal Hocko
  2018-12-11 14:27 ` [PATCH 1/3] mm, memory_hotplug: try to migrate full pfn range Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Michal Hocko @ 2018-12-11 14:27 UTC
  To: Andrew Morton
  Cc: linux-mm, LKML, David Hildenbrand, Hugh Dickins, Jan Kara,
	Kirill A. Shutemov, Michal Hocko, Oscar Salvador, Pavel Tatashin,
	William Kucharski

This has been posted as an RFC [1]. There was general agreement on
these patches. I hope I have addressed all the review feedback.

Original cover:
I have recently been chasing memory offlining not making progress. Along
the way I noticed a few weird decisions in the code. The migration
itself is restricted without a reasonable justification and the retry
loop around the migration is quite messy. This is addressed by patch 1
and patch 2.

Patch 3 targets the faultaround code, which was a hot candidate for the
initial issue reported upstream [2] and which I am debugging
internally. It turned out not to be the main contributor in the end,
but I believe we should address it regardless. See the patch
description for more details.

[1] http://lkml.kernel.org/r/20181120134323.13007-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20181114070909.GB2653@MiWiFi-R3L-srv

* [PATCH 1/3] mm, memory_hotplug: try to migrate full pfn range
  2018-12-11 14:27 [PATCH 0/3] few memory offlining enhancements Michal Hocko
@ 2018-12-11 14:27 ` Michal Hocko
  2018-12-12  1:12   ` Wei Yang
  2018-12-11 14:27 ` [PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining Michal Hocko
  2018-12-11 14:27 ` [PATCH 3/3] mm, fault_around: do not take a reference to a locked page Michal Hocko
  2 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2018-12-11 14:27 UTC
  To: Andrew Morton
  Cc: linux-mm, LKML, Michal Hocko, David Hildenbrand, Oscar Salvador,
	Pavel Tatashin

From: Michal Hocko <mhocko@suse.com>

do_migrate_range has been limiting the number of pages to migrate to 256
for some reason which is not documented. Even if the limit made some
sense back when it was introduced, it doesn't really serve a good
purpose these days. If the range contains huge pages then we break out
of the loop too early, and going through LRU and pcp cache draining and
scan_movable_pages again is quite suboptimal.

The only reason to limit the number of pages I can think of is to reduce
the potential time to react to a fatal signal. But even then the
number of pages is a questionable metric because even a single page
might block migration in a non-killable state (e.g. __unmap_and_move).

Remove the limit and offline the full requested range (this is one
membblock worth of pages with the current code). Should we ever get a
report that offlining takes too long to react to a fatal signal then we
should rather fix the core migration to use killable waits and bail out
on a signal.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
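As a rough illustration of the cost described above (not part of the
patch), the toy model below counts how many drain + scan + migrate
rounds the caller needs for one range. It assumes every page is movable
and migrates on the first try; offline_rounds() and its parameters are
hypothetical names, and the 32768-page figure in the note after the code
assumes a 128MiB memory block with 4KiB pages.

```c
#include <assert.h>

/*
 * Toy model, not kernel code: every page in the range is movable and
 * migrates on the first attempt. `limit` mimics the removed
 * NR_OFFLINE_AT_ONCE_PAGES cap; 0 means "no limit".
 *
 * Returns the number of rounds the caller has to go through. In the
 * kernel each round pays for lru_add_drain_all(), drain_all_pages()
 * and scan_movable_pages() before do_migrate_range() runs.
 */
static unsigned long offline_rounds(unsigned long nr_pages, unsigned long limit)
{
	unsigned long migrated = 0;
	unsigned long rounds = 0;

	while (migrated < nr_pages) {
		unsigned long left = nr_pages - migrated;
		unsigned long batch = (limit && limit < left) ? limit : left;

		assert(batch > 0);	/* every round makes progress */
		rounds++;
		migrated += batch;
	}

	return rounds;
}
```

Under these assumptions offline_rounds(32768, 256) needs 128 rounds,
while offline_rounds(32768, 0) needs a single one.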
 mm/memory_hotplug.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c82193db4be6..6263c8cd4491 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
 	return new_page_nodemask(page, nid, &nmask);
 }
 
-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
 	struct page *page;
-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
 	int not_managed = 0;
 	int ret = 0;
 	LIST_HEAD(source);
 
-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
@@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 				ret = -EBUSY;
 				break;
 			}
-			if (isolate_huge_page(page, &source))
-				move_pages -= 1 << compound_order(head);
+			isolate_huge_page(page, &source);
 			continue;
 		} else if (PageTransHuge(page))
 			pfn = page_to_pfn(compound_head(page))
@@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		if (!ret) { /* Success */
 			put_page(page);
 			list_add_tail(&page->lru, &source);
-			move_pages--;
 			if (!__PageMovable(page))
 				inc_node_page_state(page, NR_ISOLATED_ANON +
 						    page_is_file_cache(page));
-- 
2.19.2

* [PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining
  2018-12-11 14:27 [PATCH 0/3] few memory offlining enhancements Michal Hocko
  2018-12-11 14:27 ` [PATCH 1/3] mm, memory_hotplug: try to migrate full pfn range Michal Hocko
@ 2018-12-11 14:27 ` Michal Hocko
  2018-12-11 14:27 ` [PATCH 3/3] mm, fault_around: do not take a reference to a locked page Michal Hocko
  2 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2018-12-11 14:27 UTC
  To: Andrew Morton
  Cc: linux-mm, LKML, Michal Hocko, David Hildenbrand, Oscar Salvador

From: Michal Hocko <mhocko@suse.com>

Memory migration might fail during offlining and we keep retrying in
that case. This is currently obfuscated by a goto retry loop. The code
is hard to follow and as a result it is even suboptimal because each
retry round scans the full range from start_pfn even though we have
already successfully scanned/migrated the [start_pfn, pfn] range. This
is all only because a check_pages_isolated failure has to rescan the
full range again.

De-obfuscate the migration retry loop by promoting it to a real for
loop. In fact remove the goto altogether by making it a proper double
loop (yeah, gotos are nasty in this specific case). In the end we
get slightly more optimal code which is also more readable.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
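A minimal user-space sketch of the resulting control flow, for
illustration only. The toy_* helpers are hypothetical stand-ins for
scan_movable_pages, do_migrate_range and check_pages_isolated; signal
handling, cache draining and hugepage dissolving are left out, and
migration is assumed to always succeed.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PAGES 8

/* Hypothetical toy state: true means the page still has to be migrated. */
struct toy_range {
	bool movable[NR_PAGES + 1];	/* index 0 unused, pfn 0 means "none" */
	int scanned;			/* pages inspected by toy_scan_movable() */
};

/* Stand-in for scan_movable_pages(): first movable pfn in [start, end) or 0. */
static unsigned long toy_scan_movable(struct toy_range *r,
				      unsigned long start, unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn++) {
		r->scanned++;
		if (r->movable[pfn])
			return pfn;
	}
	return 0;
}

/* Stand-in for do_migrate_range(): here migration always succeeds. */
static void toy_migrate(struct toy_range *r,
			unsigned long start, unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn++)
		r->movable[pfn] = false;
}

/* Stand-in for check_pages_isolated(): page count on success, -1 otherwise. */
static long toy_check_isolated(const struct toy_range *r,
			       unsigned long start, unsigned long end)
{
	for (unsigned long pfn = start; pfn < end; pfn++)
		if (r->movable[pfn])
			return -1;
	return (long)(end - start);
}

/* Double loop: the inner scan resumes at pfn instead of restarting at start. */
static long toy_offline(struct toy_range *r,
			unsigned long start, unsigned long end)
{
	unsigned long pfn;
	long offlined;

	assert(start > 0);	/* pfn 0 is the "nothing found" sentinel */

	do {
		for (pfn = start; pfn;) {
			pfn = toy_scan_movable(r, pfn, end);
			if (pfn)
				toy_migrate(r, pfn, end);
		}
		offlined = toy_check_isolated(r, start, end);
	} while (offlined < 0);

	return offlined;
}
```

For the pfn range [1, 9) with only pfn 5 movable, the inner loop
inspects 5 pages to find it and then 4 more starting from pfn 5, 9 in
total; restarting each scan from the start of the range would have
inspected 13.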
 mm/memory_hotplug.c | 58 ++++++++++++++++++++++-----------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6263c8cd4491..c6c42a7425e5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1591,38 +1591,38 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal_isolated;
 	}
 
-	pfn = start_pfn;
-repeat:
-	/* start memory hot removal */
-	ret = -EINTR;
-	if (signal_pending(current)) {
-		reason = "signal backoff";
-		goto failed_removal_isolated;
-	}
+	do {
+		for (pfn = start_pfn; pfn;) {
+			if (signal_pending(current)) {
+				ret = -EINTR;
+				reason = "signal backoff";
+				goto failed_removal_isolated;
+			}
 
-	cond_resched();
-	lru_add_drain_all();
-	drain_all_pages(zone);
+			cond_resched();
+			lru_add_drain_all();
+			drain_all_pages(zone);
 
-	pfn = scan_movable_pages(start_pfn, end_pfn);
-	if (pfn) { /* We have movable pages */
-		ret = do_migrate_range(pfn, end_pfn);
-		goto repeat;
-	}
+			pfn = scan_movable_pages(pfn, end_pfn);
+			if (pfn) {
+				/* TODO fatal migration failures should bail out */
+				do_migrate_range(pfn, end_pfn);
+			}
+		}
+
+		/*
+		 * dissolve free hugepages in the memory block before doing offlining
+		 * actually in order to make hugetlbfs's object counting consistent.
+		 */
+		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
+		if (ret) {
+			reason = "failure to dissolve huge pages";
+			goto failed_removal_isolated;
+		}
+		/* check again */
+		offlined_pages = check_pages_isolated(start_pfn, end_pfn);
+	} while (offlined_pages < 0);
 
-	/*
-	 * dissolve free hugepages in the memory block before doing offlining
-	 * actually in order to make hugetlbfs's object counting consistent.
-	 */
-	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
-	if (ret) {
-		reason = "failure to dissolve huge pages";
-		goto failed_removal_isolated;
-	}
-	/* check again */
-	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
-	if (offlined_pages < 0)
-		goto repeat;
 	pr_info("Offlined Pages %ld\n", offlined_pages);
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */
-- 
2.19.2

* [PATCH 3/3] mm, fault_around: do not take a reference to a locked page
  2018-12-11 14:27 [PATCH 0/3] few memory offlining enhancements Michal Hocko
  2018-12-11 14:27 ` [PATCH 1/3] mm, memory_hotplug: try to migrate full pfn range Michal Hocko
  2018-12-11 14:27 ` [PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining Michal Hocko
@ 2018-12-11 14:27 ` Michal Hocko
  2 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2018-12-11 14:27 UTC
  To: Andrew Morton
  Cc: linux-mm, LKML, Michal Hocko, David Hildenbrand, Hugh Dickins,
	Jan Kara, Kirill A. Shutemov, William Kucharski

From: Michal Hocko <mhocko@suse.com>

filemap_map_pages takes a speculative reference to each page in the
range before it tries to lock that page. While this is correct, it
can also influence page migration, which will bail out when seeing
an elevated reference count. The faultaround code would bail on
seeing a locked page, so we can proactively check the PageLocked
bit before page_cache_get_speculative and avoid pointless
reference count churn.

Suggested-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
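To illustrate the ordering argument (not part of the patch), here is a
small user-space sketch using C11 atomics. struct toy_page and the toy_*
helpers are hypothetical stand-ins for struct page,
page_cache_get_speculative, trylock_page and put_page; the ref_churn
counter exists only to make the wasted reference visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: a refcount and a lock bit. */
struct toy_page {
	atomic_int refcount;
	atomic_bool locked;
	int ref_churn;		/* number of references ever taken here */
};

/* Rough analogue of page_cache_get_speculative(): fails on a freed page. */
static bool toy_get_speculative(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old > 0) {
		/* on failure `old` is reloaded with the current value */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1)) {
			p->ref_churn++;
			return true;
		}
	}
	return false;
}

static bool toy_trylock(struct toy_page *p)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&p->locked, &expected, true);
}

static void toy_put(struct toy_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

/*
 * One faultaround step. Returns true if the page would have been mapped;
 * in that case the reference is kept, as a mapped page holds one.
 */
static bool toy_map_page(struct toy_page *p, bool check_lock_first)
{
	assert(p);

	if (check_lock_first && atomic_load(&p->locked))
		return false;	/* bail out before touching the refcount */
	if (!toy_get_speculative(p))
		return false;
	if (!toy_trylock(p)) {
		toy_put(p);	/* reference was taken and dropped for nothing */
		return false;
	}
	/* the real code would set up the page table entry here */
	atomic_store(&p->locked, false);
	return true;
}
```

On an already locked page, toy_map_page(p, false) bumps and drops the
refcount before giving up, and a concurrent migration could observe the
elevated count in between; toy_map_page(p, true) leaves the refcount
untouched.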
 mm/filemap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8ee02c..a87f71fff879 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2553,6 +2553,13 @@ void filemap_map_pages(struct vm_fault *vmf,
 			goto next;
 
 		head = compound_head(page);
+
+		/*
+		 * Check for a locked page first, as a speculative
+		 * reference may adversely influence page migration.
+		 */
+		if (PageLocked(head))
+			goto next;
 		if (!page_cache_get_speculative(head))
 			goto next;
 
-- 
2.19.2

* Re: [PATCH 1/3] mm, memory_hotplug: try to migrate full pfn range
  2018-12-11 14:27 ` [PATCH 1/3] mm, memory_hotplug: try to migrate full pfn range Michal Hocko
@ 2018-12-12  1:12   ` Wei Yang
  0 siblings, 0 replies; 5+ messages in thread
From: Wei Yang @ 2018-12-12  1:12 UTC
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, LKML, Michal Hocko, David Hildenbrand,
	Oscar Salvador, Pavel Tatashin

On Tue, Dec 11, 2018 at 03:27:39PM +0100, Michal Hocko wrote:
>From: Michal Hocko <mhocko@suse.com>
>
>do_migrate_range has been limiting the number of pages to migrate to 256
>for some reason which is not documented. Even if the limit made some
>sense back when it was introduced, it doesn't really serve a good
>purpose these days. If the range contains huge pages then we break out
>of the loop too early, and going through LRU and pcp cache draining and
>scan_movable_pages again is quite suboptimal.
>
>The only reason to limit the number of pages I can think of is to reduce
>the potential time to react to a fatal signal. But even then the
>number of pages is a questionable metric because even a single page
>might block migration in a non-killable state (e.g. __unmap_and_move).
>
>Remove the limit and offline the full requested range (this is one
>membblock worth of pages with the current code). Should we ever get a

s/membblock/memblock/

Or would "memory block" be more accurate? "memblock" may be confused
with the lower-level facility of that name.

>report that offlining takes too long to react to a fatal signal then we
>should rather fix the core migration to use killable waits and bail out
>on a signal.
>
>Reviewed-by: David Hildenbrand <david@redhat.com>
>Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
>Reviewed-by: Oscar Salvador <osalvador@suse.de>
>Signed-off-by: Michal Hocko <mhocko@suse.com>
>---
> mm/memory_hotplug.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index c82193db4be6..6263c8cd4491 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
> 	return new_page_nodemask(page, nid, &nmask);
> }
> 
>-#define NR_OFFLINE_AT_ONCE_PAGES	(256)
> static int
> do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> {
> 	unsigned long pfn;
> 	struct page *page;
>-	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
> 	int not_managed = 0;
> 	int ret = 0;
> 	LIST_HEAD(source);
> 
>-	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> 		if (!pfn_valid(pfn))
> 			continue;
> 		page = pfn_to_page(pfn);
>@@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> 				ret = -EBUSY;
> 				break;
> 			}
>-			if (isolate_huge_page(page, &source))
>-				move_pages -= 1 << compound_order(head);
>+			isolate_huge_page(page, &source);
> 			continue;
> 		} else if (PageTransHuge(page))
> 			pfn = page_to_pfn(compound_head(page))
>@@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> 		if (!ret) { /* Success */
> 			put_page(page);
> 			list_add_tail(&page->lru, &source);
>-			move_pages--;
> 			if (!__PageMovable(page))
> 				inc_node_page_state(page, NR_ISOLATED_ANON +
> 						    page_is_file_cache(page));
>-- 
>2.19.2

-- 
Wei Yang
Help you, Help me
