linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE
@ 2020-07-30  9:34 David Hildenbrand
  2020-07-30  9:34 ` [PATCH v2 1/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate() David Hildenbrand
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-07-30  9:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, David Hildenbrand, Andrew Morton,
	Baoquan He, Jason Wang, Michael S. Tsirkin, Michal Hocko,
	Mike Kravetz, Mike Rapoport, Pankaj Gupta, Qian Cai

@Andrew, @Mst, I suggest the whole series (including the virtio-mem
change) goes via the -mm tree.

Currently, virtio-mem does not really support ZONE_MOVABLE. While it allows
to online fully plugged memory blocks to ZONE_MOVABLE, it does not allow
to online partially-plugged memory blocks to ZONE_MOVABLE and will never
consider such memory blocks when unplugging memory. This might be
surprising for users (especially, if onlining suddenly fails).

Let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
partially plugged memory blocks to be online to ZONE_MOVABLE and also
unplugging from such memory blocks.

This is especially helpful for testing, but also paves the way for
virtio-mem optimizations, allowing more memory to get reliably unplugged.

Cleanup has_unmovable_pages() and set_migratetype_isolate(), providing
better documentation of how ZONE_MOVABLE interacts with different kind of
unmovable pages (memory offlining vs. alloc_contig_range()).

v1 -> v2:
- "mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()"
-- Move to position 1, add Fixes: tag
-- Drop unused "out:" label
- "mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()"
-- Keep curly braces on "else" case
- Replace "[PATCH v1 5/6] mm/page_alloc: restrict ZONE_MOVABLE optimization
           in has_unmovable_pages() to memory offlining"
  by "mm: document semantics of ZONE_MOVABLE"
-- Brain dump of what I know about ZONE_MOVABLE :)

David Hildenbrand (6):
  mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
  mm/page_alloc: tweak comments in has_unmovable_pages()
  mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
  mm/page_isolation: cleanup set_migratetype_isolate()
  virtio-mem: don't special-case ZONE_MOVABLE
  mm: document semantics of ZONE_MOVABLE

 drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
 include/linux/mmzone.h      | 34 +++++++++++++++++++++++++++
 mm/page_alloc.c             | 22 +++++------------
 mm/page_isolation.c         | 39 ++++++++++++++----------------
 4 files changed, 65 insertions(+), 77 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
  2020-07-30  9:34 [PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE David Hildenbrand
@ 2020-07-30  9:34 ` David Hildenbrand
  2020-07-30  9:34 ` [PATCH v2 2/6] mm/page_alloc: tweak comments in has_unmovable_pages() David Hildenbrand
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-07-30  9:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, David Hildenbrand, Baoquan He,
	Pankaj Gupta, Mike Kravetz, Andrew Morton, Michal Hocko,
	Michael S . Tsirkin, Qian Cai

Right now, if we have two isolations racing, we might trigger the
WARN_ON_ONCE() and to dump_page(NULL), dereferencing NULL. Let's just
return directly.

In the future, we might want to report -EAGAIN to the caller instead, as
this could indicate a temporary isolation failure only.

Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Fixes: 4a55c0474a92 ("mm/hotplug: silence a lockdep splat with printk()")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_isolation.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f6d07c5f0d34d..7d7d263ce7f4b 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -29,10 +29,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	/*
 	 * We assume the caller intended to SET migrate type to isolate.
 	 * If it is already set, then someone else must have raced and
-	 * set it before us.  Return -EBUSY
+	 * set it before us.
 	 */
-	if (is_migrate_isolate_page(page))
-		goto out;
+	if (is_migrate_isolate_page(page)) {
+		spin_unlock_irqrestore(&zone->lock, flags);
+		return -EBUSY;
+	}
 
 	/*
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
@@ -52,7 +54,6 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 		ret = 0;
 	}
 
-out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 	if (!ret) {
 		drain_all_pages(zone);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/6] mm/page_alloc: tweak comments in has_unmovable_pages()
  2020-07-30  9:34 [PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE David Hildenbrand
  2020-07-30  9:34 ` [PATCH v2 1/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate() David Hildenbrand
@ 2020-07-30  9:34 ` David Hildenbrand
  2020-07-30  9:34 ` [PATCH v2 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate() David Hildenbrand
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-07-30  9:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, David Hildenbrand, Baoquan He,
	Andrew Morton, Michal Hocko, Michael S . Tsirkin, Mike Kravetz,
	Pankaj Gupta

Let's move the split comment regarding bootmem allocations and memory
holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
check.

Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e028b87ce2942..042ba09d70c5d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 	unsigned long iter = 0;
 	unsigned long pfn = page_to_pfn(page);
 
-	/*
-	 * TODO we could make this much more efficient by not checking every
-	 * page in the range if we know all of them are in MOVABLE_ZONE and
-	 * that the movable zone guarantees that pages are migratable but
-	 * the later is not the case right now unfortunatelly. E.g. movablecore
-	 * can still lead to having bootmem allocations in zone_movable.
-	 */
-
 	if (is_migrate_cma_page(page)) {
 		/*
 		 * CMA allocations (alloc_contig_range) really need to mark
@@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 
 		page = pfn_to_page(pfn + iter);
 
+		/*
+		 * Both, bootmem allocations and memory holes are marked
+		 * PG_reserved and are unmovable. We can even have unmovable
+		 * allocations inside ZONE_MOVABLE, for example when
+		 * specifying "movablecore".
+		 */
 		if (PageReserved(page))
 			return page;
 
@@ -8306,14 +8304,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		 * it.  But now, memory offline itself doesn't call
 		 * shrink_node_slabs() and it still to be fixed.
 		 */
-		/*
-		 * If the page is not RAM, page_count()should be 0.
-		 * we don't need more check. This is an _used_ not-movable page.
-		 *
-		 * The problematic thing here is PG_reserved pages. PG_reserved
-		 * is set to both of a memory hole page and a _used_ kernel
-		 * page at boot.
-		 */
 		return page;
 	}
 	return NULL;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
  2020-07-30  9:34 [PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE David Hildenbrand
  2020-07-30  9:34 ` [PATCH v2 1/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate() David Hildenbrand
  2020-07-30  9:34 ` [PATCH v2 2/6] mm/page_alloc: tweak comments in has_unmovable_pages() David Hildenbrand
@ 2020-07-30  9:34 ` David Hildenbrand
  2020-07-30  9:34 ` [PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate() David Hildenbrand
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-07-30  9:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, David Hildenbrand, Baoquan He,
	Andrew Morton, Michal Hocko, Michael S . Tsirkin, Mike Kravetz,
	Pankaj Gupta

Inside has_unmovable_pages(), we have a comment describing how unmovable
data could end up in ZONE_MOVABLE - via "movable_core". Also, besides
checking if the first page in the pageblock is reserved, we don't
perform any further checks in case of ZONE_MOVABLE.

In case of memory offlining, we set REPORT_FAILURE, properly
dump_page() the page and handle the error gracefully.
alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
pages only, which will never end up on the MOVABLE zone
(see htlb_alloc_mask()).

Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_isolation.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 7d7d263ce7f4b..d099aac479601 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -57,15 +57,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	spin_unlock_irqrestore(&zone->lock, flags);
 	if (!ret) {
 		drain_all_pages(zone);
-	} else {
-		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
-
-		if ((isol_flags & REPORT_FAILURE) && unmovable)
-			/*
-			 * printk() with zone->lock held will likely trigger a
-			 * lockdep splat, so defer it here.
-			 */
-			dump_page(unmovable, "unmovable page");
+	} else if ((isol_flags & REPORT_FAILURE) && unmovable) {
+		/*
+		 * printk() with zone->lock held will likely trigger a
+		 * lockdep splat, so defer it here.
+		 */
+		dump_page(unmovable, "unmovable page");
 	}
 
 	return ret;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate()
  2020-07-30  9:34 [PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-07-30  9:34 ` [PATCH v2 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate() David Hildenbrand
@ 2020-07-30  9:34 ` David Hildenbrand
  2020-08-06 13:35   ` Vlastimil Babka
  2020-07-30  9:34 ` [PATCH v2 5/6] virtio-mem: don't special-case ZONE_MOVABLE David Hildenbrand
  2020-07-30  9:34 ` [PATCH v2 6/6] mm: document semantics of ZONE_MOVABLE David Hildenbrand
  5 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-07-30  9:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, David Hildenbrand, Baoquan He,
	Pankaj Gupta, Andrew Morton, Michal Hocko, Michael S . Tsirkin,
	Mike Kravetz

Let's clean it up a bit, simplifying error handling and getting rid of
the label.

Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_isolation.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d099aac479601..e65fe5d770849 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -17,12 +17,9 @@
 
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
-	struct page *unmovable = NULL;
-	struct zone *zone;
+	struct zone *zone = page_zone(page);
+	struct page *unmovable;
 	unsigned long flags;
-	int ret = -EBUSY;
-
-	zone = page_zone(page);
 
 	spin_lock_irqsave(&zone->lock, flags);
 
@@ -51,13 +48,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 									NULL);
 
 		__mod_zone_freepage_state(zone, -nr_pages, mt);
-		ret = 0;
+		spin_unlock_irqrestore(&zone->lock, flags);
+		drain_all_pages(zone);
+		return 0;
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);
-	if (!ret) {
-		drain_all_pages(zone);
-	} else if ((isol_flags & REPORT_FAILURE) && unmovable) {
+	if (isol_flags & REPORT_FAILURE) {
 		/*
 		 * printk() with zone->lock held will likely trigger a
 		 * lockdep splat, so defer it here.
@@ -65,7 +62,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 		dump_page(unmovable, "unmovable page");
 	}
 
-	return ret;
+	return -EBUSY;
 }
 
 static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 5/6] virtio-mem: don't special-case ZONE_MOVABLE
  2020-07-30  9:34 [PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-07-30  9:34 ` [PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate() David Hildenbrand
@ 2020-07-30  9:34 ` David Hildenbrand
  2020-07-30  9:34 ` [PATCH v2 6/6] mm: document semantics of ZONE_MOVABLE David Hildenbrand
  5 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-07-30  9:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, David Hildenbrand, Andrew Morton,
	Michal Hocko, Michael S . Tsirkin, Jason Wang, Mike Kravetz,
	Pankaj Gupta, Baoquan He

Let's allow to online partially plugged memory blocks to ZONE_MOVABLE
and also consider memory blocks that were onlined to ZONE_MOVABLE when
unplugging memory. While unplugged memory blocks are, in general,
unmovable, they can be skipped when offlining memory.

virtio-mem only unplugs fairly big chunks (in the megabyte range) and
rather tries to shrink the memory region than randomly choosing memory. In
theory, if all other pages in the movable zone would be movable, virtio-mem
would only shrink that zone and not create any kind of fragmentation.

In the future, we might want to remember the zone again and use the
information when (un)plugging memory. For now, let's keep it simple.

Note: Support for defragmentation is planned, to deal with fragmentation
after unplug due to memory chunks within memory blocks that could not
get unplugged before (e.g., somebody pinning pages within ZONE_MOVABLE
for a longer time).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
 1 file changed, 8 insertions(+), 39 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f26f5f64ae822..2ddfc4a0e2ee0 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -36,18 +36,10 @@ enum virtio_mem_mb_state {
 	VIRTIO_MEM_MB_STATE_OFFLINE,
 	/* Partially plugged, fully added to Linux, offline. */
 	VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
-	/* Fully plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+	/* Fully plugged, fully added to Linux, online. */
 	VIRTIO_MEM_MB_STATE_ONLINE,
-	/* Partially plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+	/* Partially plugged, fully added to Linux, online. */
 	VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
-	/*
-	 * Fully plugged, fully added to Linux, online (ZONE_MOVABLE).
-	 * We are not allowed to allocate (unplug) parts of this block that
-	 * are not movable (similar to gigantic pages). We will never allow
-	 * to online OFFLINE_PARTIAL to ZONE_MOVABLE (as they would contain
-	 * unmovable parts).
-	 */
-	VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE,
 	VIRTIO_MEM_MB_STATE_COUNT
 };
 
@@ -526,21 +518,10 @@ static bool virtio_mem_owned_mb(struct virtio_mem *vm, unsigned long mb_id)
 }
 
 static int virtio_mem_notify_going_online(struct virtio_mem *vm,
-					  unsigned long mb_id,
-					  enum zone_type zone)
+					  unsigned long mb_id)
 {
 	switch (virtio_mem_mb_get_state(vm, mb_id)) {
 	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-		/*
-		 * We won't allow to online a partially plugged memory block
-		 * to the MOVABLE zone - it would contain unmovable parts.
-		 */
-		if (zone == ZONE_MOVABLE) {
-			dev_warn_ratelimited(&vm->vdev->dev,
-					     "memory block has holes, MOVABLE not supported\n");
-			return NOTIFY_BAD;
-		}
-		return NOTIFY_OK;
 	case VIRTIO_MEM_MB_STATE_OFFLINE:
 		return NOTIFY_OK;
 	default:
@@ -560,7 +541,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
 					VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
 		break;
 	case VIRTIO_MEM_MB_STATE_ONLINE:
-	case VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE:
 		virtio_mem_mb_set_state(vm, mb_id,
 					VIRTIO_MEM_MB_STATE_OFFLINE);
 		break;
@@ -579,24 +559,17 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
 	virtio_mem_retry(vm);
 }
 
-static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id,
-				     enum zone_type zone)
+static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
 {
 	unsigned long nb_offline;
 
 	switch (virtio_mem_mb_get_state(vm, mb_id)) {
 	case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-		BUG_ON(zone == ZONE_MOVABLE);
 		virtio_mem_mb_set_state(vm, mb_id,
 					VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
 		break;
 	case VIRTIO_MEM_MB_STATE_OFFLINE:
-		if (zone == ZONE_MOVABLE)
-			virtio_mem_mb_set_state(vm, mb_id,
-					    VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE);
-		else
-			virtio_mem_mb_set_state(vm, mb_id,
-						VIRTIO_MEM_MB_STATE_ONLINE);
+		virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
 		break;
 	default:
 		BUG();
@@ -675,7 +648,6 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 	const unsigned long start = PFN_PHYS(mhp->start_pfn);
 	const unsigned long size = PFN_PHYS(mhp->nr_pages);
 	const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
-	enum zone_type zone;
 	int rc = NOTIFY_OK;
 
 	if (!virtio_mem_overlaps_range(vm, start, size))
@@ -717,8 +689,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		}
 		vm->hotplug_active = true;
-		zone = page_zonenum(pfn_to_page(mhp->start_pfn));
-		rc = virtio_mem_notify_going_online(vm, mb_id, zone);
+		rc = virtio_mem_notify_going_online(vm, mb_id);
 		break;
 	case MEM_OFFLINE:
 		virtio_mem_notify_offline(vm, mb_id);
@@ -726,8 +697,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
 	case MEM_ONLINE:
-		zone = page_zonenum(pfn_to_page(mhp->start_pfn));
-		virtio_mem_notify_online(vm, mb_id, zone);
+		virtio_mem_notify_online(vm, mb_id);
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
@@ -1906,8 +1876,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
 	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
 	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL] ||
-	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE]) {
+	    vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
 		dev_warn(&vdev->dev, "device still has system memory added\n");
 	} else {
 		virtio_mem_delete_resource(vm);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 6/6] mm: document semantics of ZONE_MOVABLE
  2020-07-30  9:34 [PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-07-30  9:34 ` [PATCH v2 5/6] virtio-mem: don't special-case ZONE_MOVABLE David Hildenbrand
@ 2020-07-30  9:34 ` David Hildenbrand
  5 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-07-30  9:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, David Hildenbrand, Andrew Morton,
	Michal Hocko, Michael S . Tsirkin, Mike Kravetz, Mike Rapoport,
	Pankaj Gupta, Baoquan He

Let's document what ZONE_MOVABLE means, how it's used, and which special
cases we have regarding unmovable pages (memory offlining vs. migration /
allocations).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mmzone.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f6f884970511d..b8c49b2aff684 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -372,6 +372,40 @@ enum zone_type {
 	 */
 	ZONE_HIGHMEM,
 #endif
+	/*
+	 * ZONE_MOVABLE is similar to ZONE_NORMAL, except that it *primarily*
+	 * only contains movable pages. Main use cases are to make memory
+	 * offlining more likely to succeed, and to locally limit unmovable
+	 * allocations - e.g., to increase the number of THP/huge pages.
+	 * Notable special cases:
+	 *
+	 * 1. Pinned pages: (Long-term) pinning of movable pages might
+	 *    essentially turn such pages unmovable. Memory offlining might
+	 *    retry a long time.
+	 * 2. memblock allocations: kernelcore/movablecore setups might create
+	 *    situations where ZONE_MOVABLE contains unmovable allocations
+	 *    after boot. Memory offlining and allocations fail early.
+	 * 3. Memory holes: Such pages cannot be allocated. Applies only to
+	 *    boot memory, not hotplugged memory. Memory offlining and
+	 *    allocations fail early.
+	 * 4. PG_hwpoison pages: While poisoned pages can be skipped during
+	 *    memory offlining, such pages cannot be allocated.
+	 * 5. Unmovable PG_offline pages: In paravirtualized environments,
+	 *    hotplugged memory blocks might only partially be managed by the
+	 *    buddy (e.g., via XEN-balloon, Hyper-V balloon, virtio-mem). The
+	 *    parts not manged by the buddy are unmovable PG_offline pages. In
+	 *    some cases (virtio-mem), such pages can be skipped during
+	 *    memory offlining, however, cannot be moved/allcoated. These
+	 *    techniques might use alloc_contig_range() to hide previously
+	 *    exposed pages from the buddy again (e.g., to implement some sort
+	 *    of memory unplug in virtio-mem).
+	 *
+	 * In general, no unmovable allocations that degrade memory offlining
+	 * should end up in ZONE_MOVABLE. Allocators (like alloc_contig_range())
+	 * have to expect that migrating pages in ZONE_MOVABLE can fail (even
+	 * if has_unmovable_pages() states that there are no unmovable pages,
+	 * there can be false negatives).
+	 */
 	ZONE_MOVABLE,
 #ifdef CONFIG_ZONE_DEVICE
 	ZONE_DEVICE,
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate()
  2020-07-30  9:34 ` [PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate() David Hildenbrand
@ 2020-08-06 13:35   ` Vlastimil Babka
  2020-08-06 14:19     ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2020-08-06 13:35 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: virtualization, linux-mm, Baoquan He, Pankaj Gupta,
	Andrew Morton, Michal Hocko, Michael S . Tsirkin, Mike Kravetz

On 7/30/20 11:34 AM, David Hildenbrand wrote:
> Let's clean it up a bit, simplifying error handling and getting rid of
> the label.

Nit: the label was already removed by patch 1/6?

> Reviewed-by: Baoquan He <bhe@redhat.com>
> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/page_isolation.c | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index d099aac479601..e65fe5d770849 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -17,12 +17,9 @@
>  
>  static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>  {
> -	struct page *unmovable = NULL;
> -	struct zone *zone;
> +	struct zone *zone = page_zone(page);
> +	struct page *unmovable;
>  	unsigned long flags;
> -	int ret = -EBUSY;
> -
> -	zone = page_zone(page);
>  
>  	spin_lock_irqsave(&zone->lock, flags);
>  
> @@ -51,13 +48,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>  									NULL);
>  
>  		__mod_zone_freepage_state(zone, -nr_pages, mt);
> -		ret = 0;
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +		drain_all_pages(zone);
> +		return 0;
>  	}
>  
>  	spin_unlock_irqrestore(&zone->lock, flags);
> -	if (!ret) {
> -		drain_all_pages(zone);
> -	} else if ((isol_flags & REPORT_FAILURE) && unmovable) {
> +	if (isol_flags & REPORT_FAILURE) {
>  		/*
>  		 * printk() with zone->lock held will likely trigger a
>  		 * lockdep splat, so defer it here.
> @@ -65,7 +62,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>  		dump_page(unmovable, "unmovable page");
>  	}
>  
> -	return ret;
> +	return -EBUSY;
>  }
>  
>  static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate()
  2020-08-06 13:35   ` Vlastimil Babka
@ 2020-08-06 14:19     ` David Hildenbrand
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-08-06 14:19 UTC (permalink / raw)
  To: Vlastimil Babka, linux-kernel
  Cc: virtualization, linux-mm, Baoquan He, Pankaj Gupta,
	Andrew Morton, Michal Hocko, Michael S . Tsirkin, Mike Kravetz

On 06.08.20 15:35, Vlastimil Babka wrote:
> On 7/30/20 11:34 AM, David Hildenbrand wrote:
>> Let's clean it up a bit, simplifying error handling and getting rid of
>> the label.
> 
> Nit: the label was already removed by patch 1/6?
> 

Ack, leftover from reshuffling - thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-08-06 14:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-30  9:34 [PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE David Hildenbrand
2020-07-30  9:34 ` [PATCH v2 1/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate() David Hildenbrand
2020-07-30  9:34 ` [PATCH v2 2/6] mm/page_alloc: tweak comments in has_unmovable_pages() David Hildenbrand
2020-07-30  9:34 ` [PATCH v2 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate() David Hildenbrand
2020-07-30  9:34 ` [PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate() David Hildenbrand
2020-08-06 13:35   ` Vlastimil Babka
2020-08-06 14:19     ` David Hildenbrand
2020-07-30  9:34 ` [PATCH v2 5/6] virtio-mem: don't special-case ZONE_MOVABLE David Hildenbrand
2020-07-30  9:34 ` [PATCH v2 6/6] mm: document semantics of ZONE_MOVABLE David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).