linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
@ 2019-10-21 17:23 David Hildenbrand
  2019-10-21 17:23 ` [PATCH v2 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-10-21 17:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Michal Hocko,
	Mike Rapoport, Mike Rapoport, Oscar Salvador, Pavel Tatashin,
	Pavel Tatashin, Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

Two cleanups that popped up while working on (and discussing) virtio-mem:
 https://lkml.org/lkml/2019/9/19/463

Tested with DIMMs on x86.

As discussed with michal in v1, I'll soon look into removing the use
of PG_reserved during memory onlining completely - most probably
disallowing to offline memory blocks with holes, cleaning up the
onlining+offlining code.

v1 -> v2:
- "mm/page_alloc.c: Don't set pages PageReserved() when offlining"
-- Fixup one comment
- "mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE"
-- Use parenthesis around checks
- Added ACKs

David Hildenbrand (2):
  mm/page_alloc.c: Don't set pages PageReserved() when offlining
  mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE

 include/linux/page-isolation.h |  4 ++--
 mm/memory_hotplug.c            | 12 ++++++------
 mm/page_alloc.c                |  9 +++------
 mm/page_isolation.c            | 12 ++++++------
 4 files changed, 17 insertions(+), 20 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining
  2019-10-21 17:23 [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand
@ 2019-10-21 17:23 ` David Hildenbrand
  2019-10-21 17:23 ` [PATCH v2 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE David Hildenbrand
  2019-10-22  6:52 ` [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand
  2 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-10-21 17:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Michal Hocko,
	Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport,
	Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual,
	Pavel Tatashin

We call __offline_isolated_pages() from __offline_pages() after all
pages were isolated and are either free (PageBuddy()) or PageHWPoison.
Nothing can stop us from offlining memory at this point.

In __offline_isolated_pages() we first set all affected memory sections
offline (offline_mem_sections(pfn, end_pfn)), to mark the memmap as
invalid (pfn_to_online_page() will no longer succeed), and then walk over
all pages to pull the free pages from the free lists (to the isolated
free lists, to be precise).

Note that re-onlining a memory block will result in the whole memmap
getting reinitialized, overwriting any old state. We already poison the
memmap when offlining is complete to find any access to
stale/uninitialized memmaps.

So, setting the pages PageReserved() is not helpful. The memmap is marked
offline and all pageblocks are isolated. As soon as the memory is offline,
the memmap is stale either way.

This looks like a leftover from ancient times, where we initialized the
memmap when adding memory and not when onlining it (the pages were set
PageReserved so re-onlining would work as expected).
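The walk described above can be sketched as a toy userspace model (an illustration only — the states, struct, and counting are simplified stand-ins, not the kernel code):

```c
#include <assert.h>

/* Toy model: each PFN is either the head of a free buddy chunk of some
 * order, or a single hwpoisoned page. The real code walks struct page. */
enum toy_state { TOY_BUDDY, TOY_HWPOISON };

struct toy_page {
	enum toy_state state;
	unsigned int order;	/* TOY_BUDDY: chunk spans 1 << order pages */
};

/* Mirrors the loop in __offline_isolated_pages(): hwpoisoned pages are
 * skipped one by one, free buddy chunks leave the free lists whole.
 * After the patch, neither path sets PageReserved() anymore. */
static unsigned long toy_offline_range(const struct toy_page *map,
				       unsigned long start_pfn,
				       unsigned long end_pfn)
{
	unsigned long pfn = start_pfn, offlined_pages = 0;

	while (pfn < end_pfn) {
		const struct toy_page *page = &map[pfn];

		if (page->state == TOY_HWPOISON) {
			pfn++;
			offlined_pages++;
			continue;
		}
		offlined_pages += 1UL << page->order;
		pfn += 1UL << page->order;
	}
	return offlined_pages;
}
```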

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 4 +---
 mm/page_alloc.c     | 5 +----
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5e6b2a312362..3524e2e1a9df 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1346,9 +1346,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 	return ret;
 }
 
-/*
- * remove from free_area[] and mark all as Reserved.
- */
+/* Mark all sections offline and remove all free pages from the buddy. */
 static int
 offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages,
 			void *data)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ed8884dc0c47..bf6b21f02154 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8667,7 +8667,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct page *page;
 	struct zone *zone;
-	unsigned int order, i;
+	unsigned int order;
 	unsigned long pfn;
 	unsigned long flags;
 	unsigned long offlined_pages = 0;
@@ -8695,7 +8695,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		 */
 		if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
 			pfn++;
-			SetPageReserved(page);
 			offlined_pages++;
 			continue;
 		}
@@ -8709,8 +8708,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			pfn, 1 << order, end_pfn);
 #endif
 		del_page_from_free_area(page, &zone->free_area[order]);
-		for (i = 0; i < (1 << order); i++)
-			SetPageReserved((page+i));
 		pfn += (1 << order);
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
-- 
2.21.0



* [PATCH v2 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE
  2019-10-21 17:23 [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand
  2019-10-21 17:23 ` [PATCH v2 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining David Hildenbrand
@ 2019-10-21 17:23 ` David Hildenbrand
  2019-10-22  6:52 ` [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand
  2 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-10-21 17:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Michal Hocko, Oscar Salvador,
	Andrew Morton, Anshuman Khandual, Pingfan Liu, Qian Cai,
	Pavel Tatashin, Dan Williams, Vlastimil Babka, Mel Gorman,
	Mike Rapoport, Alexander Duyck

We have two types of users of page isolation:
1. Memory offlining: Offline memory so it can be unplugged. Memory won't
		     be touched.
2. Memory allocation: Allocate memory (e.g., alloc_contig_range()) to
		      become the owner of the memory and make use of it.

For example, in case we want to offline memory, we can ignore (skip over)
PageHWPoison() pages, as the memory won't get used; we can allow the
memory to be offlined. In contrast, we don't want to allocate such
memory.

Let's generalize the approach so we can special case other types of
pages we want to skip over in case we offline memory. While at it, also
pass the same flags to test_pages_isolated().
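The conversion above replaces an opaque bool with a named flag. A minimal illustration (plain C, not kernel code — the flag values mirror the patch, the helper is made up):

```c
#include <stdbool.h>

/* Flag values as in include/linux/page-isolation.h after the patch. */
#define MEMORY_OFFLINE	0x1
#define REPORT_FAILURE	0x2

/*
 * With "bool skip_hwpoisoned_pages", a caller passing "true" does not say
 * *why* poisoned pages may be skipped. A flags mask names the use case and
 * leaves room to skip other page types when offlining. Hypothetical helper:
 */
static bool can_skip_page(int isol_flags, bool page_hwpoison)
{
	return (isol_flags & MEMORY_OFFLINE) && page_hwpoison;
}
```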

Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/page-isolation.h |  4 ++--
 mm/memory_hotplug.c            |  8 +++++---
 mm/page_alloc.c                |  4 ++--
 mm/page_isolation.c            | 12 ++++++------
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 1099c2fee20f..6861df759fad 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -30,7 +30,7 @@ static inline bool is_migrate_isolate(int migratetype)
 }
 #endif
 
-#define SKIP_HWPOISON	0x1
+#define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
 
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
@@ -58,7 +58,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
-			bool skip_hwpoisoned_pages);
+			int isol_flags);
 
 struct page *alloc_migrate_target(struct page *page, unsigned long private);
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3524e2e1a9df..561371ead39a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1149,7 +1149,8 @@ static bool is_pageblock_removable_nolock(unsigned long pfn)
 	if (!zone_spans_pfn(zone, pfn))
 		return false;
 
-	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, SKIP_HWPOISON);
+	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE,
+				    MEMORY_OFFLINE);
 }
 
 /* Checks if this range of memory is likely to be hot-removable. */
@@ -1364,7 +1365,8 @@ static int
 check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
 			void *data)
 {
-	return test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
+	return test_pages_isolated(start_pfn, start_pfn + nr_pages,
+				   MEMORY_OFFLINE);
 }
 
 static int __init cmdline_parse_movable_node(char *p)
@@ -1475,7 +1477,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
-				       SKIP_HWPOISON | REPORT_FAILURE);
+				       MEMORY_OFFLINE | REPORT_FAILURE);
 	if (ret < 0) {
 		reason = "failure to isolate range";
 		goto failed_removal;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf6b21f02154..e153280bde9a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8270,7 +8270,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 * The HWPoisoned page may be not in buddy system, and
 		 * page_count() is not 0.
 		 */
-		if ((flags & SKIP_HWPOISON) && PageHWPoison(page))
+		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
 			continue;
 
 		if (__PageMovable(page))
@@ -8486,7 +8486,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	}
 
 	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, false)) {
+	if (test_pages_isolated(outer_start, end, 0)) {
 		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
 			__func__, outer_start, end);
 		ret = -EBUSY;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 89c19c0feadb..04ee1663cdbe 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -168,7 +168,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * @migratetype:	Migrate type to set in error recovery.
  * @flags:		The following flags are allowed (they can be combined in
  *			a bit mask)
- *			SKIP_HWPOISON - ignore hwpoison pages
+ *			MEMORY_OFFLINE - isolate to offline (!allocate) memory
+ *					 e.g., skip over PageHWPoison() pages
  *			REPORT_FAILURE - report details about the failure to
  *			isolate the range
  *
@@ -257,7 +258,7 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  */
 static unsigned long
 __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
-				  bool skip_hwpoisoned_pages)
+				  int flags)
 {
 	struct page *page;
 
@@ -274,7 +275,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			 * simple way to verify that as VM_BUG_ON(), though.
 			 */
 			pfn += 1 << page_order(page);
-		else if (skip_hwpoisoned_pages && PageHWPoison(page))
+		else if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
 			/* A HWPoisoned page cannot be also PageBuddy */
 			pfn++;
 		else
@@ -286,7 +287,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 
 /* Caller should ensure that requested range is in a single zone */
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
-			bool skip_hwpoisoned_pages)
+			int isol_flags)
 {
 	unsigned long pfn, flags;
 	struct page *page;
@@ -308,8 +309,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	/* Check all pages are free or marked as ISOLATED */
 	zone = page_zone(page);
 	spin_lock_irqsave(&zone->lock, flags);
-	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
-						skip_hwpoisoned_pages);
+	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn, isol_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	trace_test_pages_isolated(start_pfn, end_pfn, pfn);
-- 
2.21.0



* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-21 17:23 [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand
  2019-10-21 17:23 ` [PATCH v2 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining David Hildenbrand
  2019-10-21 17:23 ` [PATCH v2 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE David Hildenbrand
@ 2019-10-22  6:52 ` David Hildenbrand
  2019-10-22  8:08   ` Michal Hocko
  2 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2019-10-22  6:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Alexander Duyck, Andrew Morton, Anshuman Khandual,
	Dan Williams, Mel Gorman, Michal Hocko, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On 21.10.19 19:23, David Hildenbrand wrote:
> Two cleanups that popped up while working on (and discussing) virtio-mem:
>   https://lkml.org/lkml/2019/9/19/463
> 
> Tested with DIMMs on x86.
> 
> As discussed with michal in v1, I'll soon look into removing the use
> of PG_reserved during memory onlining completely - most probably
> disallowing to offline memory blocks with holes, cleaning up the
> onlining+offlining code.

BTW, I remember that ZONE_DEVICE pages are still required to be set 
PG_reserved. That has to be sorted out first. I remember that somebody 
was working on it a while ago but didn't hear about that again. Will 
look into that as well - should be as easy as adding a zone check (if 
there isn't a pfn_to_online_page() check already). But of course, there 
might be special cases ....


-- 

Thanks,

David / dhildenb



* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-22  6:52 ` [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand
@ 2019-10-22  8:08   ` Michal Hocko
  2019-10-22  8:15     ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2019-10-22  8:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On Tue 22-10-19 08:52:28, David Hildenbrand wrote:
> On 21.10.19 19:23, David Hildenbrand wrote:
> > Two cleanups that popped up while working on (and discussing) virtio-mem:
> >   https://lkml.org/lkml/2019/9/19/463
> > 
> > Tested with DIMMs on x86.
> > 
> > As discussed with michal in v1, I'll soon look into removing the use
> > of PG_reserved during memory onlining completely - most probably
> > disallowing to offline memory blocks with holes, cleaning up the
> > onlining+offlining code.
> 
> BTW, I remember that ZONE_DEVICE pages are still required to be set
> PG_reserved. That has to be sorted out first.

Do they?

> I remember that somebody was
> working on it a while ago but didn't hear about that again. Will look into
> that as well - should be as easy as adding a zone check (if there isn't a
> pfn_to_online_page() check already). But of course, there might be special
> cases ....

I remember Alexander didn't want to change the PageReserved handling
because he was worried about unforeseeable side effects. I have a vague
recollection he (or maybe Dan) has promised some follow up clean ups
which didn't seem to materialize.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-22  8:08   ` Michal Hocko
@ 2019-10-22  8:15     ` David Hildenbrand
  2019-10-22  8:21       ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2019-10-22  8:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On 22.10.19 10:08, Michal Hocko wrote:
> On Tue 22-10-19 08:52:28, David Hildenbrand wrote:
>> On 21.10.19 19:23, David Hildenbrand wrote:
>>> Two cleanups that popped up while working on (and discussing) virtio-mem:
>>>    https://lkml.org/lkml/2019/9/19/463
>>>
>>> Tested with DIMMs on x86.
>>>
>>> As discussed with michal in v1, I'll soon look into removing the use
>>> of PG_reserved during memory onlining completely - most probably
>>> disallowing to offline memory blocks with holes, cleaning up the
>>> onlining+offlining code.
>>
>> BTW, I remember that ZONE_DEVICE pages are still required to be set
>> PG_reserved. That has to be sorted out first.
> 
> Do they?

Yes, especially KVM code :/

> 
>> I remember that somebody was
>> working on it a while ago but didn't hear about that again. Will look into
>> that as well - should be as easy as adding a zone check (if there isn't a
>> pfn_to_online_page() check already). But of course, there might be special
>> cases ....
> 
> I remember Alexander didn't want to change the PageReserved handling
> because he was worried about unforeseeable side effects. I have a vague
> recollection he (or maybe Dan) has promissed some follow up clean ups
> which didn't seem to materialize.

I'm looking into it right now, especially the KVM part.

-- 

Thanks,

David / dhildenb



* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-22  8:15     ` David Hildenbrand
@ 2019-10-22  8:21       ` Michal Hocko
  2019-10-22  8:32         ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2019-10-22  8:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On Tue 22-10-19 10:15:07, David Hildenbrand wrote:
> On 22.10.19 10:08, Michal Hocko wrote:
> > On Tue 22-10-19 08:52:28, David Hildenbrand wrote:
> > > On 21.10.19 19:23, David Hildenbrand wrote:
> > > > Two cleanups that popped up while working on (and discussing) virtio-mem:
> > > >    https://lkml.org/lkml/2019/9/19/463
> > > > 
> > > > Tested with DIMMs on x86.
> > > > 
> > > > As discussed with michal in v1, I'll soon look into removing the use
> > > > of PG_reserved during memory onlining completely - most probably
> > > > disallowing to offline memory blocks with holes, cleaning up the
> > > > onlining+offlining code.
> > > 
> > > BTW, I remember that ZONE_DEVICE pages are still required to be set
> > > PG_reserved. That has to be sorted out first.
> > 
> > Do they?
> 
> Yes, especially KVM code :/

Details please?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-22  8:21       ` Michal Hocko
@ 2019-10-22  8:32         ` David Hildenbrand
  2019-10-22  8:38           ` David Hildenbrand
  2019-10-22  9:14           ` Michal Hocko
  0 siblings, 2 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-10-22  8:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On 22.10.19 10:21, Michal Hocko wrote:
> On Tue 22-10-19 10:15:07, David Hildenbrand wrote:
>> On 22.10.19 10:08, Michal Hocko wrote:
>>> On Tue 22-10-19 08:52:28, David Hildenbrand wrote:
>>>> On 21.10.19 19:23, David Hildenbrand wrote:
>>>>> Two cleanups that popped up while working on (and discussing) virtio-mem:
>>>>>     https://lkml.org/lkml/2019/9/19/463
>>>>>
>>>>> Tested with DIMMs on x86.
>>>>>
>>>>> As discussed with michal in v1, I'll soon look into removing the use
>>>>> of PG_reserved during memory onlining completely - most probably
>>>>> disallowing to offline memory blocks with holes, cleaning up the
>>>>> onlining+offlining code.
>>>>
>>>> BTW, I remember that ZONE_DEVICE pages are still required to be set
>>>> PG_reserved. That has to be sorted out first.
>>>
>>> Do they?
>>
>> Yes, especially KVM code :/
> 
> Details please?
> 


E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn()

And I currently have


From 55606751b67989bd06d17844a6bcfbf85d44ee69 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Tue, 22 Oct 2019 10:08:04 +0200
Subject: [PATCH v1] KVM: x86/mmu: Prepare kvm_is_mmio_pfn() for PG_reserved
 changes

Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
change that in the future.

KVM has this weird use case that you can map anything from /dev/mem
into the guest. pfn_valid() is not a reliable check whether the memmap
was initialized and can be touched. pfn_to_online_page() makes sure
that we have an initialized memmap - however, there is no reliable and
fast check to detect memmaps that were initialized and are ZONE_DEVICE.

Let's rewrite kvm_is_mmio_pfn() so we really only touch initialized
memmaps that are guaranteed to not contain garbage. Make sure that
RAM without a memmap is still not detected as MMIO and that ZONE_DEVICE
that is not UC/UC-/WC is not detected as MMIO.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/x86/kvm/mmu.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 24c23c66b226..c91c9a5d14dc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2962,23 +2962,29 @@ static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 
 static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 {
-	if (pfn_valid(pfn))
-		return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)) &&
-			/*
-			 * Some reserved pages, such as those from NVDIMM
-			 * DAX devices, are not for MMIO, and can be mapped
-			 * with cached memory type for better performance.
-			 * However, the above check misconceives those pages
-			 * as MMIO, and results in KVM mapping them with UC
-			 * memory type, which would hurt the performance.
-			 * Therefore, we check the host memory type in addition
-			 * and only treat UC/UC-/WC pages as MMIO.
-			 */
-			(!pat_enabled() || pat_pfn_immune_to_uc_mtrr(pfn));
+	struct page *page = pfn_to_online_page(pfn);
+
+	/*
+	 * Online pages consist of pages managed by the buddy. Especially,
+	 * ZONE_DEVICE pages are never online. Online pages that are reserved
+	 * indicate the zero page and MMIO pages.
+	 */
+	if (page)
+		return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn));
 
-	return !e820__mapped_raw_any(pfn_to_hpa(pfn),
-				     pfn_to_hpa(pfn + 1) - 1,
-				     E820_TYPE_RAM);
+	/*
+	 * Any RAM that is not online (e.g., mapped via /dev/mem without
+	 * a memmap or with an uninitialized memmap) is not MMIO.
+	 */
+	if (e820__mapped_raw_any(pfn_to_hpa(pfn), pfn_to_hpa(pfn + 1) - 1,
+				 E820_TYPE_RAM))
+		return false;
+
+	/*
+	 * Finally, anything with a valid memmap could be ZONE_DEVICE - or the
+	 * memmap could be uninitialized. Treat only UC/UC-/WC pages as MMIO.
+	 */
+	return pfn_valid(pfn) && (!pat_enabled() || pat_pfn_immune_to_uc_mtrr(pfn));
 }
 
 /* Bits which may be returned by set_spte() */
-- 
2.21.0




And also virt/kvm/kvm_main.c:kvm_is_reserved_pfn()


From 928be02b293750e6076c8622268ba41ecd8819e9 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Tue, 22 Oct 2019 10:26:46 +0200
Subject: [PATCH v1] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes

Right now, ZONE_DEVICE memory is always set PG_reserved. We want to
change that in the future.

KVM has this weird use case that you can map anything from /dev/mem
into the guest. pfn_valid() is not a reliable check whether the memmap
was initialized and can be touched. pfn_to_online_page() makes sure
that we have an initialized memmap. Note that ZONE_DEVICE memory is
never online (IOW, managed by the buddy).

Switching to pfn_to_online_page() keeps the existing behavior for
PFNs without a memmap and for ZONE_DEVICE memory. They are treated as
reserved and the page is not touched (e.g., to set it dirty or accessed).

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 virt/kvm/kvm_main.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 66a977472a1c..b98d5d44c2b8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -151,9 +151,15 @@ __weak int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 {
-	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn));
+	struct page *page = pfn_to_online_page(pfn);
 
+	/*
+	 * We treat any pages that are not online (not managed by the buddy)
+	 * as reserved - this includes ZONE_DEVICE pages and pages without
+	 * a memmap (e.g., mapped via /dev/mem).
+	 */
+	if (page)
+		return PageReserved(pfn_to_page(pfn));
 	return true;
 }
 
-- 
2.21.0



I'd like to note that the pfn_valid() check in __kvm_map_gfn() is also bogus,
but switching to pfn_to_online_page() is not possible there, as we would then
treat ZONE_DEVICE memory differently.
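The rewritten kvm_is_mmio_pfn() boils down to a three-step decision ladder. A toy sketch (plain C with stubbed-out boolean inputs, not the real KVM code):

```c
#include <stdbool.h>

/* The inputs stand in for pfn_to_online_page(), PageReserved(),
 * e820__mapped_raw_any(..., E820_TYPE_RAM), and the PAT UC/UC-/WC test;
 * the real code additionally special-cases the zero pfn. */
static bool toy_is_mmio_pfn(bool online, bool reserved, bool e820_ram,
			    bool uc_memtype)
{
	/* 1. Online page: the memmap is initialized, trust PageReserved(). */
	if (online)
		return reserved;
	/* 2. RAM without a safe-to-touch memmap is never MMIO. */
	if (e820_ram)
		return false;
	/* 3. Possibly ZONE_DEVICE or uninitialized: go by memory type. */
	return uc_memtype;
}
```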

-- 

Thanks,

David / dhildenb



* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-22  8:32         ` David Hildenbrand
@ 2019-10-22  8:38           ` David Hildenbrand
  2019-10-22  9:14           ` Michal Hocko
  1 sibling, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-10-22  8:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On 22.10.19 10:32, David Hildenbrand wrote:
> On 22.10.19 10:21, Michal Hocko wrote:
>> On Tue 22-10-19 10:15:07, David Hildenbrand wrote:
>>> On 22.10.19 10:08, Michal Hocko wrote:
>>>> On Tue 22-10-19 08:52:28, David Hildenbrand wrote:
>>>>> On 21.10.19 19:23, David Hildenbrand wrote:
>>>>>> Two cleanups that popped up while working on (and discussing) virtio-mem:
>>>>>>      https://lkml.org/lkml/2019/9/19/463
>>>>>>
>>>>>> Tested with DIMMs on x86.
>>>>>>
>>>>>> As discussed with michal in v1, I'll soon look into removing the use
>>>>>> of PG_reserved during memory onlining completely - most probably
>>>>>> disallowing to offline memory blocks with holes, cleaning up the
>>>>>> onlining+offlining code.
>>>>>
>>>>> BTW, I remember that ZONE_DEVICE pages are still required to be set
>>>>> PG_reserved. That has to be sorted out first.
>>>>
>>>> Do they?
>>>
>>> Yes, especially KVM code :/
>>
>> Details please?
>>

Oh, and I think you might be wondering "how can we have RAM without a 
memmap in the guest", see

https://lwn.net/Articles/778240/

-- 

Thanks,

David / dhildenb



* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-22  8:32         ` David Hildenbrand
  2019-10-22  8:38           ` David Hildenbrand
@ 2019-10-22  9:14           ` Michal Hocko
  2019-10-22  9:17             ` David Hildenbrand
  1 sibling, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2019-10-22  9:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On Tue 22-10-19 10:32:11, David Hildenbrand wrote:
[...]
> E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn()

Thanks for these references. I am not really familiar with kvm so I
cannot really comment on the specific code but I am wondering why
it simply doesn't check for ZONE_DEVICE explicitly? Also we do care
about holes in RAM (from the early boot), those should be reserved
already AFAIR. So we are left with hotplugged memory with holes and
I am not really sure we should bother with this until there is a clear
usecase in sight.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-22  9:14           ` Michal Hocko
@ 2019-10-22  9:17             ` David Hildenbrand
  2019-10-22  9:24               ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2019-10-22  9:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On 22.10.19 11:14, Michal Hocko wrote:
> On Tue 22-10-19 10:32:11, David Hildenbrand wrote:
> [...]
>> E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn()
> 
> Thanks for these references. I am not really familiar with kvm so I
> cannot really comment on the specific code but I am wondering why
> it simply doesn't check for ZONE_DEVICE explicitly? Also we do care
> about holes in RAM (from the early boot), those should be reserved
> already AFAIR. So we are left with hotplugged memory with holes and
> I am not really sure we should bother with this until there is a clear
> usecase in sight.

Well, checking for ZONE_DEVICE is only possible if you have an 
initialized memmap. And that is not guaranteed when you start mapping 
random stuff into your guest via /dev/mem.

I am reworking these patches right now and audit the whole kernel for 
PageReserved() checks that might affect ZONE_DEVICE. I'll send the 
collection of patches as RFC.

-- 

Thanks,

David / dhildenb



* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-22  9:17             ` David Hildenbrand
@ 2019-10-22  9:24               ` Michal Hocko
  2019-10-22  9:27                 ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2019-10-22  9:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On Tue 22-10-19 11:17:24, David Hildenbrand wrote:
> On 22.10.19 11:14, Michal Hocko wrote:
> > On Tue 22-10-19 10:32:11, David Hildenbrand wrote:
> > [...]
> > > E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn()
> > 
> > Thanks for these references. I am not really familiar with kvm so I
> > cannot really comment on the specific code but I am wondering why
> > it simply doesn't check for ZONE_DEVICE explicitly? Also we do care
> > about holes in RAM (from the early boot), those should be reserved
> > already AFAIR. So we are left with hotplugged memory with holes and
> > I am not really sure we should bother with this until there is a clear
> > usecase in sight.
> 
> Well, checking for ZONE_DEVICE is only possible if you have an initialized
> memmap. And that is not guaranteed when you start mapping random stuff into
> your guest via /dev/mem.

Yes, I can understand that part but checking PageReserved on an
uninitialized memmap is pointless as well. So if you can test for it you
can very well test for ZONE_DEVICE as well. PageReserved -> ZONE_DEVICE
is a terrible assumption.

> I am reworking these patches right now and auditing the whole kernel for
> PageReserved() checks that might affect ZONE_DEVICE. I'll send the
> collection of patches as an RFC.

Thanks a lot!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups
  2019-10-22  9:24               ` Michal Hocko
@ 2019-10-22  9:27                 ` David Hildenbrand
  0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2019-10-22  9:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Anshuman Khandual, Dan Williams, Mel Gorman, Mike Rapoport,
	Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin,
	Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang

On 22.10.19 11:24, Michal Hocko wrote:
> On Tue 22-10-19 11:17:24, David Hildenbrand wrote:
>> On 22.10.19 11:14, Michal Hocko wrote:
>>> On Tue 22-10-19 10:32:11, David Hildenbrand wrote:
>>> [...]
>>>> E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn()
>>>
>>> Thanks for these references. I am not really familiar with kvm so I
>>> cannot really comment on the specific code but I am wondering why
>>> it simply doesn't check for ZONE_DEVICE explicitly? Also we do care
>>> about holes in RAM (from the early boot), those should be reserved
>>> already AFAIR. So we are left with hotplugged memory with holes and
>>> I am not really sure we should bother with this until there is a clear
>>> usecase in sight.
>>
>> Well, checking for ZONE_DEVICE is only possible if you have an initialized
>> memmap. And that is not guaranteed when you start mapping random stuff into
>> your guest via /dev/mem.
> 
> Yes, I can understand that part but checking PageReserved on an
> uninitialized memmap is pointless as well. So if you can test for it you

That's why I add pfn_to_online_page() :)

> can very well test for ZONE_DEVICE as well. PageReserved -> ZONE_DEVICE
> is a terrible assumption.

Indeed, it is. But there are more parts in the kernel that I'll be 
fixing. Stay tuned.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-10-22  9:27 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-21 17:23 [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand
2019-10-21 17:23 ` [PATCH v2 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining David Hildenbrand
2019-10-21 17:23 ` [PATCH v2 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE David Hildenbrand
2019-10-22  6:52 ` [PATCH v2 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand
2019-10-22  8:08   ` Michal Hocko
2019-10-22  8:15     ` David Hildenbrand
2019-10-22  8:21       ` Michal Hocko
2019-10-22  8:32         ` David Hildenbrand
2019-10-22  8:38           ` David Hildenbrand
2019-10-22  9:14           ` Michal Hocko
2019-10-22  9:17             ` David Hildenbrand
2019-10-22  9:24               ` Michal Hocko
2019-10-22  9:27                 ` David Hildenbrand
