From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, virtio-dev@lists.oasis-open.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	Michal Hocko, Andrew Morton, "Michael S. Tsirkin",
	David Hildenbrand, Michal Hocko, Jason Wang, Oscar Salvador,
	Igor Mammedov, Dave Young, Dan Williams, Pavel Tatashin,
	Stefan Hajnoczi, Vlastimil Babka, Mel Gorman, Mike Rapoport,
	Alexander Duyck, Alexander Potapenko
Subject: [PATCH v2 04/10] virtio-mem: Paravirtualized memory hotunplug part 2
Date: Wed, 11 Mar 2020 18:14:16 +0100
Message-Id: <20200311171422.10484-5-david@redhat.com>
In-Reply-To: <20200311171422.10484-1-david@redhat.com>
References: <20200311171422.10484-1-david@redhat.com>
MIME-Version: 1.0

We also want to unplug online memory (contained in online memory blocks
and, therefore, managed by the buddy), and eventually replug it later.

When requested to unplug memory, we use alloc_contig_range() to allocate
subblocks in online memory blocks (so we are the owner) and send them to
our hypervisor. When requested to plug memory, we can replug such memory
using free_contig_range() after asking our hypervisor.

We also want to mark all allocated pages PG_offline, so nobody will
touch them.
To differentiate pages that were never onlined when onlining the memory
block from pages allocated via alloc_contig_range(), we use PageDirty().
Based on this flag, virtio_mem_fake_online() can either online the pages
for the first time or use free_contig_range().

It is worth noting that there are no guarantees on how much memory can
actually get unplugged again. All device memory might be completely
fragmented with unmovable data, such that no subblock can get unplugged.

We are not touching ZONE_MOVABLE. If memory is onlined to ZONE_MOVABLE,
it can only get unplugged after that memory was offlined manually by
user space. In normal operation, virtio-mem memory is suggested to be
onlined to ZONE_NORMAL. In the future, we will try to make unplug more
likely to succeed.

Add a module parameter to control if online memory shall be touched.

As we want to access alloc_contig_range()/free_contig_range() from
kernel module context, export the symbols.

Note: Whenever virtio-mem uses alloc_contig_range(), all affected pages
are on the same node, in the same zone, and contain no holes.

Acked-by: Michal Hocko # to export contig range allocator API
Cc: "Michael S. Tsirkin"
Cc: Jason Wang
Cc: Oscar Salvador
Cc: Michal Hocko
Cc: Igor Mammedov
Cc: Dave Young
Cc: Andrew Morton
Cc: Dan Williams
Cc: Pavel Tatashin
Cc: Stefan Hajnoczi
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: Mike Rapoport
Cc: Alexander Duyck
Cc: Alexander Potapenko
Signed-off-by: David Hildenbrand
---
 drivers/virtio/Kconfig      |   1 +
 drivers/virtio/virtio_mem.c | 157 ++++++++++++++++++++++++++++++++----
 mm/page_alloc.c             |   2 +
 3 files changed, 146 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 95ea2094a6b4..6af35ffb9796 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -72,6 +72,7 @@ config VIRTIO_MEM
 	depends on VIRTIO
 	depends on MEMORY_HOTPLUG_SPARSE
 	depends on MEMORY_HOTREMOVE
+	select CONTIG_ALLOC
 	help
 	 This driver provides access to virtio-mem paravirtualized memory
 	 devices, allowing to hotplug and hotunplug memory.

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index c1fc7f9c4acf..5b26d57be551 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -22,6 +22,10 @@
 
 #include 
 
+static bool unplug_online = true;
+module_param(unplug_online, bool, 0644);
+MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
+
 enum virtio_mem_mb_state {
 	/* Unplugged, not added to Linux. Can be reused later. */
 	VIRTIO_MEM_MB_STATE_UNUSED = 0,
@@ -652,23 +656,35 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 }
 
 /*
- * Set a range of pages PG_offline.
+ * Set a range of pages PG_offline. Remember pages that were never onlined
+ * (via generic_online_page()) using PageDirty().
  */
 static void virtio_mem_set_fake_offline(unsigned long pfn,
-					unsigned int nr_pages)
+					unsigned int nr_pages, bool onlined)
 {
-	for (; nr_pages--; pfn++)
-		__SetPageOffline(pfn_to_page(pfn));
+	for (; nr_pages--; pfn++) {
+		struct page *page = pfn_to_page(pfn);
+
+		__SetPageOffline(page);
+		if (!onlined)
+			SetPageDirty(page);
+	}
 }
 
 /*
- * Clear PG_offline from a range of pages.
+ * Clear PG_offline from a range of pages. If the pages were never onlined
+ * (via generic_online_page()), clear PageDirty().
  */
 static void virtio_mem_clear_fake_offline(unsigned long pfn,
-					  unsigned int nr_pages)
+					  unsigned int nr_pages, bool onlined)
 {
-	for (; nr_pages--; pfn++)
-		__ClearPageOffline(pfn_to_page(pfn));
+	for (; nr_pages--; pfn++) {
+		struct page *page = pfn_to_page(pfn);
+
+		__ClearPageOffline(page);
+		if (!onlined)
+			ClearPageDirty(page);
+	}
 }
 
 /*
@@ -684,10 +700,26 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned int nr_pages)
 	 * We are always called with subblock granularity, which is at least
 	 * aligned to MAX_ORDER - 1.
 	 */
-	virtio_mem_clear_fake_offline(pfn, nr_pages);
+	for (i = 0; i < nr_pages; i += 1 << order) {
+		struct page *page = pfn_to_page(pfn + i);
 
-	for (i = 0; i < nr_pages; i += 1 << order)
-		generic_online_page(pfn_to_page(pfn + i), order);
+		/*
+		 * If the page is PageDirty(), it was kept fake-offline when
+		 * onlining the memory block. Otherwise, it was allocated
+		 * using alloc_contig_range(). All pages in a subblock are
+		 * alike.
+		 */
+		if (PageDirty(page)) {
+			virtio_mem_clear_fake_offline(pfn + i, 1 << order,
+						      false);
+			generic_online_page(page, order);
+		} else {
+			virtio_mem_clear_fake_offline(pfn + i, 1 << order,
+						      true);
+			free_contig_range(pfn + i, 1 << order);
+			adjust_managed_page_count(page, 1 << order);
+		}
+	}
 }
 
 static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
@@ -716,7 +748,8 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 		if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
 			generic_online_page(page, order);
 		else
-			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order);
+			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
+						    false);
 		rcu_read_unlock();
 		return;
 	}
@@ -1184,6 +1217,72 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
 	return 0;
 }
 
+/*
+ * Unplug the desired number of plugged subblocks of an online memory block.
+ * Will skip subblocks that are busy.
+ *
+ * Will modify the state of the memory block.
+ *
+ * Note: Can fail after some subblocks were successfully unplugged. Can
+ *       return 0 even if subblocks were busy and could not get unplugged.
+ */
+static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
+					      unsigned long mb_id,
+					      uint64_t *nb_sb)
+{
+	const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
+	unsigned long start_pfn;
+	int rc, sb_id;
+
+	/*
+	 * TODO: To increase the performance we want to try bigger, consecutive
+	 *       subblocks first before falling back to single subblocks. Also,
+	 *       we should sense via something like is_mem_section_removable()
+	 *       first if it makes sense to go ahead and try to allocate.
+	 */
+	for (sb_id = 0; sb_id < vm->nb_sb_per_mb && *nb_sb; sb_id++) {
+		/* Find the next candidate subblock */
+		while (sb_id < vm->nb_sb_per_mb &&
+		       !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
+			sb_id++;
+		if (sb_id >= vm->nb_sb_per_mb)
+			break;
+
+		start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
+				     sb_id * vm->subblock_size);
+		rc = alloc_contig_range(start_pfn, start_pfn + nr_pages,
+					MIGRATE_MOVABLE, GFP_KERNEL);
+		if (rc == -ENOMEM)
+			/* whoops, out of memory */
+			return rc;
+		if (rc)
+			/* memory busy, we can't unplug this chunk */
+			continue;
+
+		/* Mark it as fake-offline before unplugging it */
+		virtio_mem_set_fake_offline(start_pfn, nr_pages, true);
+		adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
+
+		/* Try to unplug the allocated memory */
+		rc = virtio_mem_mb_unplug_sb(vm, mb_id, sb_id, 1);
+		if (rc) {
+			/* Return the memory to the buddy. */
+			virtio_mem_fake_online(start_pfn, nr_pages);
+			return rc;
+		}
+
+		virtio_mem_mb_set_state(vm, mb_id,
+					VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
+		*nb_sb -= 1;
+	}
+
+	/*
+	 * TODO: Once all subblocks of a memory block were unplugged, we want
+	 *       to offline the memory block and remove it.
+	 */
+	return 0;
+}
+
 /*
  * Try to unplug the requested amount of memory.
  */
@@ -1223,8 +1322,37 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 		cond_resched();
 	}
 
+	if (!unplug_online) {
+		mutex_unlock(&vm->hotplug_mutex);
+		return 0;
+	}
+
+	/* Try to unplug subblocks of partially plugged online blocks. */
+	virtio_mem_for_each_mb_state_rev(vm, mb_id,
+					 VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) {
+		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
+							&nb_sb);
+		if (rc || !nb_sb)
+			goto out_unlock;
+		mutex_unlock(&vm->hotplug_mutex);
+		cond_resched();
+		mutex_lock(&vm->hotplug_mutex);
+	}
+
+	/* Try to unplug subblocks of plugged online blocks. */
+	virtio_mem_for_each_mb_state_rev(vm, mb_id,
+					 VIRTIO_MEM_MB_STATE_ONLINE) {
+		rc = virtio_mem_mb_unplug_any_sb_online(vm, mb_id,
+							&nb_sb);
+		if (rc || !nb_sb)
+			goto out_unlock;
+		mutex_unlock(&vm->hotplug_mutex);
+		cond_resched();
+		mutex_lock(&vm->hotplug_mutex);
+	}
+
 	mutex_unlock(&vm->hotplug_mutex);
-	return 0;
+	return nb_sb ? -EBUSY : 0;
 out_unlock:
 	mutex_unlock(&vm->hotplug_mutex);
 	return rc;
@@ -1330,7 +1458,8 @@ static void virtio_mem_run_wq(struct work_struct *work)
 	case -EBUSY:
 		/*
 		 * The hypervisor cannot process our request right now
-		 * (e.g., out of memory, migrating).
+		 * (e.g., out of memory, migrating) or we cannot free up
+		 * any memory to unplug it (all plugged memory is busy).
 		 */
 	case -ENOMEM:
 		/* Out of memory, try again later. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 79e950d76ffc..8d7be3f33e26 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8597,6 +8597,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 				pfn_max_align_up(end), migratetype);
 	return ret;
 }
+EXPORT_SYMBOL(alloc_contig_range);
 
 static int __alloc_contig_pages(unsigned long start_pfn,
 				unsigned long nr_pages, gfp_t gfp_mask)
@@ -8712,6 +8713,7 @@ void free_contig_range(unsigned long pfn, unsigned int nr_pages)
 	}
 	WARN(count != 0, "%d pages are still in use!\n", count);
 }
+EXPORT_SYMBOL(free_contig_range);
 
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
-- 
2.24.1