Date: Tue, 14 Apr 2020 12:34:26 -0400
From: "Michael S. Tsirkin"
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	virtio-dev@lists.oasis-open.org, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org, Michal Hocko, Andrew Morton, Alexander Duyck,
	Michal Hocko, Juergen Gross, Konrad Rzeszutek Wilk, Pavel Tatashin,
	Vlastimil Babka, Johannes Weiner, Anthony Yznaga, Oscar Salvador,
	Mel Gorman, Mike Rapoport, Dan Williams, Anshuman Khandual, Qian Cai,
	Pingfan Liu
Subject: Re: [PATCH v2 05/10] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
Message-ID: <20200414123334-mutt-send-email-mst@kernel.org>
References: <20200311171422.10484-1-david@redhat.com>
	<20200311171422.10484-6-david@redhat.com>
In-Reply-To: <20200311171422.10484-6-david@redhat.com>

On Wed, Mar 11, 2020 at 06:14:17PM +0100, David Hildenbrand wrote:
> virtio-mem wants to allow offlining memory blocks of which some parts
> were unplugged (allocated via alloc_contig_range()), especially to later
> offline and remove completely unplugged memory blocks. The important part
> is that PageOffline() has to remain set until the section is offline, so
> these pages will never get accessed (e.g., when dumping). The pages should
> not be handed back to the buddy (which would require clearing PageOffline()
> and result in issues if offlining fails and the pages are suddenly in the
> buddy).
>
> Let's allow that by allowing any PageOffline() page to be isolated
> when offlining. This way, we can reach the memory hotplug notifier
> MEM_GOING_OFFLINE, where the driver can signal that it is fine with
> offlining this page by dropping its reference count.
> PageOffline() pages with a reference count of 0 can then be skipped when
> offlining the pages (as if they were free, although they are not in the
> buddy).
>
> Anybody who uses PageOffline() pages and does not agree to offline them
> (e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
> decrement the reference count, so offlining fails when trying to migrate
> such an unmovable page. There should therefore be no observable change.
> The same applies to balloon compaction users (movable PageOffline() pages):
> their pages will simply be migrated.
>
> Note 1: If offlining fails, a driver has to increment the reference
> 	count again in MEM_CANCEL_OFFLINE.
>
> Note 2: A driver that makes use of this has to be aware that re-onlining
> 	the memory block has to be handled by hooking into onlining code
> 	(online_page_callback_t), setting PageOffline() on the pages again
> 	and not giving them to the buddy.
>
> Reviewed-by: Alexander Duyck
> Acked-by: Michal Hocko
> Cc: Andrew Morton
> Cc: Juergen Gross
> Cc: Konrad Rzeszutek Wilk
> Cc: Pavel Tatashin
> Cc: Alexander Duyck
> Cc: Vlastimil Babka
> Cc: Johannes Weiner
> Cc: Anthony Yznaga
> Cc: Michal Hocko
> Cc: Oscar Salvador
> Cc: Mel Gorman
> Cc: Mike Rapoport
> Cc: Dan Williams
> Cc: Anshuman Khandual
> Cc: Qian Cai
> Cc: Pingfan Liu
> Signed-off-by: David Hildenbrand

Andrew, could you please ack merging this through the vhost tree together
with the rest of the patches?

> ---
>  include/linux/page-flags.h | 10 +++++++++
>  mm/memory_hotplug.c        | 44 +++++++++++++++++++++++++++++---------
>  mm/page_alloc.c            | 24 +++++++++++++++++++++
>  mm/page_isolation.c        |  9 ++++++++
>  4 files changed, 77 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 49c2697046b9..fd6d4670ccc3 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -772,6 +772,16 @@ PAGE_TYPE_OPS(Buddy, buddy)
>   * not onlined when onlining the section).
>   * The content of these pages is effectively stale. Such pages should not
>   * be touched (read/write/dump/save) except by their owner.
> + *
> + * If a driver wants to allow to offline unmovable PageOffline() pages without
> + * putting them back to the buddy, it can do so via the memory notifier by
> + * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
> + * reference count in MEM_CANCEL_OFFLINE. When offlining, the PageOffline()
> + * pages (now with a reference count of zero) are treated like free pages,
> + * allowing the containing memory block to get offlined. A driver that
> + * relies on this feature is aware that re-onlining the memory block will
> + * require to re-set the pages PageOffline() and not giving them to the
> + * buddy via online_page_callback_t.
>   */
>  PAGE_TYPE_OPS(Offline, offline)
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1a00b5a37ef6..ab1c31e67fd1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1221,11 +1221,17 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,
>  
>  /*
>   * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
> - * non-lru movable pages and hugepages). We scan pfn because it's much
> - * easier than scanning over linked list. This function returns the pfn
> - * of the first found movable page if it's found, otherwise 0.
> + * non-lru movable pages and hugepages). Will skip over most unmovable
> + * pages (esp., pages that can be skipped when offlining), but bail out on
> + * definitely unmovable pages.
> + *
> + * Returns:
> + *	0 in case a movable page is found and movable_pfn was updated.
> + *	-ENOENT in case no movable page was found.
> + *	-EBUSY in case a definitely unmovable page was found.
>   */
> -static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> +static int scan_movable_pages(unsigned long start, unsigned long end,
> +			      unsigned long *movable_pfn)
>  {
>  	unsigned long pfn;
>  
> @@ -1237,18 +1243,30 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>  			continue;
>  		page = pfn_to_page(pfn);
>  		if (PageLRU(page))
> -			return pfn;
> +			goto found;
>  		if (__PageMovable(page))
> -			return pfn;
> +			goto found;
> +
> +		/*
> +		 * PageOffline() pages that are not marked __PageMovable() and
> +		 * have a reference count > 0 (after MEM_GOING_OFFLINE) are
> +		 * definitely unmovable. If their reference count would be 0,
> +		 * they could at least be skipped when offlining memory.
> +		 */
> +		if (PageOffline(page) && page_count(page))
> +			return -EBUSY;
>  
>  		if (!PageHuge(page))
>  			continue;
>  		head = compound_head(page);
>  		if (page_huge_active(head))
> -			return pfn;
> +			goto found;
>  		skip = compound_nr(head) - (page - head);
>  		pfn += skip - 1;
>  	}
> +	return -ENOENT;
> +found:
> +	*movable_pfn = pfn;
>  	return 0;
>  }
>  
> @@ -1515,7 +1533,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	}
>  
>  	do {
> -		for (pfn = start_pfn; pfn;) {
> +		pfn = start_pfn;
> +		do {
>  			if (signal_pending(current)) {
>  				ret = -EINTR;
>  				reason = "signal backoff";
> @@ -1525,14 +1544,19 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  			cond_resched();
>  			lru_add_drain_all();
>  
> -			pfn = scan_movable_pages(pfn, end_pfn);
> -			if (pfn) {
> +			ret = scan_movable_pages(pfn, end_pfn, &pfn);
> +			if (!ret) {
>  				/*
>  				 * TODO: fatal migration failures should bail
>  				 * out
>  				 */
>  				do_migrate_range(pfn, end_pfn);
>  			}
> +		} while (!ret);
> +
> +		if (ret != -ENOENT) {
> +			reason = "unmovable page";
> +			goto failed_removal_isolated;
>  		}
>  
>  		/*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8d7be3f33e26..baa60222215f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8366,6 +8366,19 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>  		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
>  			continue;
>  
> +		/*
> +		 * We treat all PageOffline() pages as movable when offlining
> +		 * to give drivers a chance to decrement their reference count
> +		 * in MEM_GOING_OFFLINE in order to indicate that these pages
> +		 * can be offlined as there are no direct references anymore.
> +		 * For actually unmovable PageOffline() where the driver does
> +		 * not support this, we will fail later when trying to actually
> +		 * move these pages that still have a reference count > 0.
> +		 * (false negatives in this function only)
> +		 */
> +		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
> +			continue;
> +
>  		if (__PageMovable(page) || PageLRU(page))
>  			continue;
>  
> @@ -8786,6 +8799,17 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  			offlined_pages++;
>  			continue;
>  		}
> +		/*
> +		 * At this point all remaining PageOffline() pages have a
> +		 * reference count of 0 and can simply be skipped.
> +		 */
> +		if (PageOffline(page)) {
> +			BUG_ON(page_count(page));
> +			BUG_ON(PageBuddy(page));
> +			pfn++;
> +			offlined_pages++;
> +			continue;
> +		}
>  
>  		BUG_ON(page_count(page));
>  		BUG_ON(!PageBuddy(page));
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 2c11a38d6e87..f6d07c5f0d34 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -151,6 +151,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   *			a bit mask)
>   *			MEMORY_OFFLINE - isolate to offline (!allocate) memory
>   *					 e.g., skip over PageHWPoison() pages
> + *					 and PageOffline() pages.
>   *			REPORT_FAILURE - report details about the failure to
>   *			isolate the range
>   *
> @@ -259,6 +260,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>  		else if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
>  			/* A HWPoisoned page cannot be also PageBuddy */
>  			pfn++;
> +		else if ((flags & MEMORY_OFFLINE) && PageOffline(page) &&
> +			 !page_count(page))
> +			/*
> +			 * The responsible driver agreed to skip PageOffline()
> +			 * pages when offlining memory by dropping its
> +			 * reference in MEM_GOING_OFFLINE.
> +			 */
> +			pfn++;
>  		else
>  			break;
>  	}
> -- 
> 2.24.1
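The reference-count handshake in the commit message can be illustrated with a small user-space model. In this handshake a driver drops its reference in MEM_GOING_OFFLINE and takes it back in MEM_CANCEL_OFFLINE. PageOffline() pages with a zero count are skipped, and any that still hold a reference make the scan return -EBUSY. This is a sketch under stated assumptions, not kernel code. struct model_page, model_scan(), model_notify() and model_offline() are hypothetical stand-ins for struct page, scan_movable_pages(), the memory notifier and __offline_pages(). The model also assumes that each PageOffline() page holds exactly one driver reference and that migration always succeeds, so do_migrate_range() reduces to clearing a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not a kernel type. */
struct model_page {
	bool movable;     /* models PageLRU() || __PageMovable() */
	bool offline;     /* models PageOffline() */
	bool cooperative; /* owner drops its reference in MEM_GOING_OFFLINE */
	int refcount;     /* models page_count() */
};

/* Same return contract as the patched scan_movable_pages(). */
static int model_scan(const struct model_page *p, unsigned long start,
		      unsigned long end, unsigned long *movable_pfn)
{
	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (p[pfn].movable) {
			*movable_pfn = pfn;
			return 0;
		}
		/*
		 * A PageOffline() page that still holds a reference is
		 * definitely unmovable; with a refcount of 0 it is skipped.
		 */
		if (p[pfn].offline && p[pfn].refcount)
			return -EBUSY;
	}
	return -ENOENT;
}

/* Driver side: delta is -1 for MEM_GOING_OFFLINE, +1 for MEM_CANCEL_OFFLINE. */
static void model_notify(struct model_page *p, unsigned long n, int delta)
{
	for (unsigned long i = 0; i < n; i++) {
		if (p[i].offline && p[i].cooperative) {
			p[i].refcount += delta;
			assert(p[i].refcount >= 0);
		}
	}
}

/* Simplified model of the offlining flow around the new scan loop. */
int model_offline(struct model_page *p, unsigned long n)
{
	unsigned long pfn = 0;
	int ret;

	model_notify(p, n, -1);                 /* MEM_GOING_OFFLINE */
	do {
		ret = model_scan(p, pfn, n, &pfn);
		if (!ret)
			p[pfn].movable = false; /* "migrate" the page away */
	} while (!ret);

	if (ret != -ENOENT) {
		model_notify(p, n, +1);         /* MEM_CANCEL_OFFLINE (Note 1) */
		return ret;
	}
	return 0;
}
```

Two cases show the behaviour:

- **Cooperative page:** with one movable page and one cooperative PageOffline() page, model_offline() "migrates" the movable page, skips the PageOffline() page and returns 0.
- **Non-cooperative page:** add a PageOffline() page whose owner does not take part (like the balloon drivers mentioned above). It keeps its reference, so the scan returns -EBUSY, and the cooperative page gets its reference back, as Note 1 requires.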