From: "Michael S. Tsirkin" <mst@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Alexander Duyck <alexander.h.duyck@linux.intel.com>,
Michal Hocko <mhocko@suse.com>, Juergen Gross <jgross@suse.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Pavel Tatashin <pavel.tatashin@microsoft.com>,
Vlastimil Babka <vbabka@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
Oscar Salvador <osalvador@suse.de>,
Mel Gorman <mgorman@techsingularity.net>,
Mike Rapoport <rppt@linux.ibm.com>,
Dan Williams <dan.j.williams@intel.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Qian Cai <cai@lca.pw>, Pingfan Liu <kernelfans@gmail.com>
Subject: Re: [PATCH v2 05/10] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
Date: Tue, 14 Apr 2020 12:34:26 -0400
Message-ID: <20200414123334-mutt-send-email-mst@kernel.org>
In-Reply-To: <20200311171422.10484-6-david@redhat.com>
On Wed, Mar 11, 2020 at 06:14:17PM +0100, David Hildenbrand wrote:
> virtio-mem wants to offline memory blocks of which some parts were
> unplugged (allocated via alloc_contig_range()), in particular to later
> offline and remove completely unplugged memory blocks. The important part
> is that PageOffline() has to remain set until the section is offline, so
> these pages will never get accessed (e.g., when dumping). The pages should
> not be handed back to the buddy (which would require clearing PageOffline()
> and result in issues if offlining fails and the pages are suddenly in the
> buddy).
>
> Let's make that possible by allowing any PageOffline() page to be isolated
> when offlining. This way, we can reach the memory hotplug notifier
> MEM_GOING_OFFLINE, where the driver can signal that it is fine with
> offlining this page by dropping its reference count. PageOffline() pages
> with a reference count of 0 can then be skipped when offlining the
> pages (as if they were free, although they are not in the buddy).
>
> Anybody who uses PageOffline() pages and does not agree to offline them
> (e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
> decrement the reference count, which makes offlining fail when trying to
> migrate such an unmovable page. So there should be no observable change.
> The same applies to balloon compaction users (movable PageOffline() pages):
> the pages will simply be migrated.
>
> Note 1: If offlining fails, a driver has to increment the reference
> count again in MEM_CANCEL_OFFLINE.
>
> Note 2: A driver that makes use of this has to be aware that re-onlining
> the memory block has to be handled by hooking into onlining code
> (online_page_callback_t), re-setting PageOffline() on the pages and
> not giving them to the buddy.
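
For readers following along, the reference-count handshake described above (drop in MEM_GOING_OFFLINE, re-take in MEM_CANCEL_OFFLINE) can be sketched as a small user-space model. This is not kernel code: struct model_page, enum model_action and both functions are hypothetical names made up for the illustration, and the model assumes a single driver owns every PageOffline() page in the range.

```c
/* User-space model only; nothing here is a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the state the protocol uses. */
struct model_page {
	bool offline;	/* models PageOffline() */
	int refcount;	/* models page_count() */
};

/* Hypothetical stand-ins for the two memory notifier actions. */
enum model_action { MODEL_GOING_OFFLINE, MODEL_CANCEL_OFFLINE };

/*
 * Models a cooperating driver's notifier: drop the reference on every
 * PageOffline() page when offlining starts, take it back if offlining
 * is cancelled.
 */
static void model_driver_notify(struct model_page *pages, size_t n,
				enum model_action action)
{
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].offline)
			continue;
		if (action == MODEL_GOING_OFFLINE)
			pages[i].refcount--;
		else
			pages[i].refcount++;
	}
}

/* True if no PageOffline() page in the range still holds a reference. */
static bool model_offline_pages_skippable(const struct model_page *pages,
					  size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i].offline && pages[i].refcount > 0)
			return false;
	return true;
}
```

A driver that does not cooperate simply never calls the equivalent of model_driver_notify(), so its pages keep a non-zero count and offlining fails as before.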
>
> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Pingfan Liu <kernelfans@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Andrew, could you please ack merging this through the vhost tree
together with the rest of the patches?
> ---
>  include/linux/page-flags.h | 10 +++++++++
>  mm/memory_hotplug.c        | 44 +++++++++++++++++++++++++++++---------
>  mm/page_alloc.c            | 24 +++++++++++++++++++++
>  mm/page_isolation.c        |  9 ++++++++
>  4 files changed, 77 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 49c2697046b9..fd6d4670ccc3 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -772,6 +772,16 @@ PAGE_TYPE_OPS(Buddy, buddy)
>   * not onlined when onlining the section).
>   * The content of these pages is effectively stale. Such pages should not
>   * be touched (read/write/dump/save) except by their owner.
> + *
> + * If a driver wants to allow to offline unmovable PageOffline() pages without
> + * putting them back to the buddy, it can do so via the memory notifier by
> + * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
> + * reference count in MEM_CANCEL_OFFLINE. When offlining, the PageOffline()
> + * pages (now with a reference count of zero) are treated like free pages,
> + * allowing the containing memory block to get offlined. A driver that
> + * relies on this feature is aware that re-onlining the memory block will
> + * require to re-set the pages PageOffline() and not giving them to the
> + * buddy via online_page_callback_t.
>   */
>  PAGE_TYPE_OPS(Offline, offline)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1a00b5a37ef6..ab1c31e67fd1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1221,11 +1221,17 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,
>
>  /*
>   * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
> - * non-lru movable pages and hugepages). We scan pfn because it's much
> - * easier than scanning over linked list. This function returns the pfn
> - * of the first found movable page if it's found, otherwise 0.
> + * non-lru movable pages and hugepages). Will skip over most unmovable
> + * pages (esp., pages that can be skipped when offlining), but bail out on
> + * definitely unmovable pages.
> + *
> + * Returns:
> + *	0 in case a movable page is found and movable_pfn was updated.
> + *	-ENOENT in case no movable page was found.
> + *	-EBUSY in case a definitely unmovable page was found.
>   */
> -static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> +static int scan_movable_pages(unsigned long start, unsigned long end,
> +			      unsigned long *movable_pfn)
>  {
>  	unsigned long pfn;
>
> @@ -1237,18 +1243,30 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>  			continue;
>  		page = pfn_to_page(pfn);
>  		if (PageLRU(page))
> -			return pfn;
> +			goto found;
>  		if (__PageMovable(page))
> -			return pfn;
> +			goto found;
> +
> +		/*
> +		 * PageOffline() pages that are not marked __PageMovable() and
> +		 * have a reference count > 0 (after MEM_GOING_OFFLINE) are
> +		 * definitely unmovable. If their reference count would be 0,
> +		 * they could at least be skipped when offlining memory.
> +		 */
> +		if (PageOffline(page) && page_count(page))
> +			return -EBUSY;
>
>  		if (!PageHuge(page))
>  			continue;
>  		head = compound_head(page);
>  		if (page_huge_active(head))
> -			return pfn;
> +			goto found;
>  		skip = compound_nr(head) - (page - head);
>  		pfn += skip - 1;
>  	}
> +	return -ENOENT;
> +found:
> +	*movable_pfn = pfn;
>  	return 0;
>  }
>
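
To make the new three-way return contract of scan_movable_pages() and its caller loop easier to follow, here is a rough user-space model. It is only a sketch under stated assumptions: struct scan_page, model_scan_movable() and model_migrate_all() are hypothetical names, "migration" is reduced to clearing flags, and hugepage handling is left out entirely.

```c
/* User-space model only; nothing here is a kernel API. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct scan_page {
	bool movable;	/* models PageLRU() || __PageMovable() */
	bool offline;	/* models PageOffline() */
	int refcount;	/* models page_count() */
};

/*
 * Models the contract from the hunk above:
 *   0       a movable page was found, *movable_idx updated
 *   -ENOENT no movable page was found
 *   -EBUSY  a PageOffline() page with a reference count > 0 was found
 */
static int model_scan_movable(const struct scan_page *pages, size_t start,
			      size_t end, size_t *movable_idx)
{
	for (size_t i = start; i < end; i++) {
		if (pages[i].movable) {
			*movable_idx = i;
			return 0;
		}
		if (pages[i].offline && pages[i].refcount > 0)
			return -EBUSY;
	}
	return -ENOENT;
}

/* Models the caller: keep "migrating" until the scan stops returning 0. */
static int model_migrate_all(struct scan_page *pages, size_t n)
{
	size_t idx = 0;
	int ret;

	do {
		ret = model_scan_movable(pages, idx, n, &idx);
		if (!ret) {
			/* Stand-in for do_migrate_range(): the page is now free. */
			pages[idx].movable = false;
			pages[idx].offline = false;
			pages[idx].refcount = 0;
		}
	} while (!ret);

	/* -ENOENT is the success case: nothing movable is left. */
	return ret == -ENOENT ? 0 : ret;
}
```

The point of the model is that -ENOENT, not 0, now terminates the loop successfully, while any other error means offlining has to be aborted.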
> @@ -1515,7 +1533,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	}
>
>  	do {
> -		for (pfn = start_pfn; pfn;) {
> +		pfn = start_pfn;
> +		do {
>  			if (signal_pending(current)) {
>  				ret = -EINTR;
>  				reason = "signal backoff";
> @@ -1525,14 +1544,19 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  			cond_resched();
>  			lru_add_drain_all();
>
> -			pfn = scan_movable_pages(pfn, end_pfn);
> -			if (pfn) {
> +			ret = scan_movable_pages(pfn, end_pfn, &pfn);
> +			if (!ret) {
>  				/*
>  				 * TODO: fatal migration failures should bail
>  				 * out
>  				 */
>  				do_migrate_range(pfn, end_pfn);
>  			}
> +		} while (!ret);
> +
> +		if (ret != -ENOENT) {
> +			reason = "unmovable page";
> +			goto failed_removal_isolated;
>  		}
>
>  		/*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8d7be3f33e26..baa60222215f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8366,6 +8366,19 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>  		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
>  			continue;
>
> +		/*
> +		 * We treat all PageOffline() pages as movable when offlining
> +		 * to give drivers a chance to decrement their reference count
> +		 * in MEM_GOING_OFFLINE in order to indicate that these pages
> +		 * can be offlined as there are no direct references anymore.
> +		 * For actually unmovable PageOffline() where the driver does
> +		 * not support this, we will fail later when trying to actually
> +		 * move these pages that still have a reference count > 0.
> +		 * (false negatives in this function only)
> +		 */
> +		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
> +			continue;
> +
>  		if (__PageMovable(page) || PageLRU(page))
>  			continue;
>
> @@ -8786,6 +8799,17 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  			offlined_pages++;
>  			continue;
>  		}
> +		/*
> +		 * At this point all remaining PageOffline() pages have a
> +		 * reference count of 0 and can simply be skipped.
> +		 */
> +		if (PageOffline(page)) {
> +			BUG_ON(page_count(page));
> +			BUG_ON(PageBuddy(page));
> +			pfn++;
> +			offlined_pages++;
> +			continue;
> +		}
>
>  		BUG_ON(page_count(page));
>  		BUG_ON(!PageBuddy(page));
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 2c11a38d6e87..f6d07c5f0d34 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -151,6 +151,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   *					 a bit mask)
>   *			MEMORY_OFFLINE - isolate to offline (!allocate) memory
>   *					 e.g., skip over PageHWPoison() pages
> + *					 and PageOffline() pages.
>   *			REPORT_FAILURE - report details about the failure to
>   *			isolate the range
>   *
> @@ -259,6 +260,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>  		else if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
>  			/* A HWPoisoned page cannot be also PageBuddy */
>  			pfn++;
> +		else if ((flags & MEMORY_OFFLINE) && PageOffline(page) &&
> +			 !page_count(page))
> +			/*
> +			 * The responsible driver agreed to skip PageOffline()
> +			 * pages when offlining memory by dropping its
> +			 * reference in MEM_GOING_OFFLINE.
> +			 */
> +			pfn++;
>  		else
>  			break;
>  	}
> --
> 2.24.1
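
As a rough illustration of the final isolation test in the last hunk, the check can be modelled in user space as below. The names struct iso_page and model_first_unisolated() are hypothetical, and the model treats every buddy page as order 0, whereas the kernel advances by the buddy order.

```c
/* User-space model only; nothing here is a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct iso_page {
	bool buddy;	/* models PageBuddy() */
	bool hwpoison;	/* models PageHWPoison() */
	bool offline;	/* models PageOffline() */
	int refcount;	/* models page_count() */
};

/*
 * Returns the index of the first page that does not count as isolated,
 * or n if the whole range passes. memory_offline models the
 * MEMORY_OFFLINE flag.
 */
static size_t model_first_unisolated(const struct iso_page *pages, size_t n,
				     bool memory_offline)
{
	size_t i = 0;

	while (i < n) {
		const struct iso_page *p = &pages[i];

		if (p->buddy)
			i++;	/* free page (order 0 in this model) */
		else if (memory_offline && p->hwpoison)
			i++;	/* poisoned pages are skipped when offlining */
		else if (memory_offline && p->offline && p->refcount == 0)
			i++;	/* driver dropped its reference */
		else
			break;
	}
	return i;
}
```

A PageOffline() page whose owner kept its reference stops the walk, which is what makes offlining fail for drivers that do not opt in.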