From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org, Andrew Morton, "Michael S. Tsirkin", David Hildenbrand, Jason Wang, Pankaj Gupta, Michal Hocko, Oscar Salvador, Wei Yang
Subject: [PATCH v1 25/29] virtio-mem: Big Block Mode (BBM) memory hotplug
Date: Mon, 12 Oct 2020 14:53:19 +0200
Message-Id: <20201012125323.17509-26-david@redhat.com>
In-Reply-To: <20201012125323.17509-1-david@redhat.com>
References: <20201012125323.17509-1-david@redhat.com>
MIME-Version: 1.0

Currently, we do not support device block sizes that exceed the Linux
memory block size. For example, having a device block size of 1 GiB (e.g.,
gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
blocks.

Let's implement Big Block Mode (BBM), whereby we add/remove at least one
Linux memory block at a time. With a 1 GiB device block size, a Big Block
(BB) will cover 8 Linux memory blocks.

We'll keep registering the online_page_callback machinery; it will be used
for safe memory hotunplug in BBM next.

Note: BBM is properly prepared for variable-sized Linux memory blocks that
we might see in the future. So we don't care how many Linux memory blocks a
big block actually spans, or how the memory notifier is called.

Cc: "Michael S. Tsirkin"
Cc: Jason Wang
Cc: Pankaj Gupta
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Wei Yang
Cc: Andrew Morton
Signed-off-by: David Hildenbrand
---
 drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
 1 file changed, 402 insertions(+), 82 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index e68d0d99590c..4d396ef98a92 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
 /*
  * virtio-mem currently supports the following modes of operation:
  *
- * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
+ * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
  *   size of a Sub Block (SB) is determined based on the device block size, the
  *   pageblock size, and the maximum allocation granularity of the buddy.
  *   Subblocks within a Linux memory block might either be plugged or unplugged.
  *   Memory is added/removed to Linux MM in Linux memory block granularity.
  *
+ * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
+ *   Memory is added/removed to Linux MM in Big Block granularity.
+ *
+ * The mode is determined automatically based on the Linux memory block size
+ * and the device block size.
+ *
  * User space / core MM (auto onlining) is responsible for onlining added
  * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
  * always onlined separately, and all memory within a Linux memory block is
@@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
 	VIRTIO_MEM_SBM_MB_COUNT
 };
 
+/*
+ * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
+ */
+enum virtio_mem_bbm_bb_state {
+	/* Unplugged, not added to Linux. Can be reused later. */
+	VIRTIO_MEM_BBM_BB_UNUSED = 0,
+	/* Plugged, not added to Linux. Error on add_memory(). */
+	VIRTIO_MEM_BBM_BB_PLUGGED,
+	/* Plugged and added to Linux. */
+	VIRTIO_MEM_BBM_BB_ADDED,
+	VIRTIO_MEM_BBM_BB_COUNT
+};
+
 struct virtio_mem {
 	struct virtio_device *vdev;
 
@@ -113,6 +132,9 @@ struct virtio_mem {
 	atomic64_t offline_size;
 	uint64_t offline_threshold;
 
+	/* If set, the driver is in SBM, otherwise in BBM. */
+	bool in_sbm;
+
 	struct {
 		/* Id of the first memory block of this device. */
 		unsigned long first_mb_id;
@@ -151,9 +173,27 @@ struct virtio_mem {
 		unsigned long *sb_states;
 	} sbm;
 
+	struct {
+		/* Id of the first big block of this device. */
+		unsigned long first_bb_id;
+		/* Id of the last usable big block of this device. */
+		unsigned long last_usable_bb_id;
+		/* Id of the next big block to prepare when needed. */
+		unsigned long next_bb_id;
+
+		/* Summary of all big block states. */
+		unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
+
+		/* One byte state per big block. See sbm.mb_states. */
+		uint8_t *bb_states;
+
+		/* The block size used for (un)plugging, adding/removing. */
+		uint64_t bb_size;
+	} bbm;
+
 	/*
-	 * Mutex that protects the sbm.mb_count, sbm.mb_states, and
-	 * sbm.sb_states.
+	 * Mutex that protects the sbm.mb_count, sbm.mb_states,
+	 * sbm.sb_states, bbm.bb_count, and bbm.bb_states.
 	 *
 	 * When this lock is held the pointers can't change, ONLINE and
 	 * OFFLINE blocks can't change the state and no subblocks will get
@@ -247,6 +287,24 @@ static unsigned long virtio_mem_mb_id_to_phys(unsigned long mb_id)
 	return mb_id * memory_block_size_bytes();
 }
 
+/*
+ * Calculate the big block id of a given address.
+ */
+static unsigned long virtio_mem_phys_to_bb_id(struct virtio_mem *vm,
+					      uint64_t addr)
+{
+	return addr / vm->bbm.bb_size;
+}
+
+/*
+ * Calculate the physical start address of a given big block id.
+ */
+static uint64_t virtio_mem_bb_id_to_phys(struct virtio_mem *vm,
+					 unsigned long bb_id)
+{
+	return bb_id * vm->bbm.bb_size;
+}
+
 /*
  * Calculate the subblock id of a given address.
  */
@@ -259,6 +317,67 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
 	return (addr - mb_addr) / vm->sbm.sb_size;
 }
 
+/*
+ * Set the state of a big block, taking care of the state counter.
+ */
+static void virtio_mem_bbm_set_bb_state(struct virtio_mem *vm,
+					unsigned long bb_id,
+					enum virtio_mem_bbm_bb_state state)
+{
+	const unsigned long idx = bb_id - vm->bbm.first_bb_id;
+	enum virtio_mem_bbm_bb_state old_state;
+
+	old_state = vm->bbm.bb_states[idx];
+	vm->bbm.bb_states[idx] = state;
+
+	BUG_ON(vm->bbm.bb_count[old_state] == 0);
+	vm->bbm.bb_count[old_state]--;
+	vm->bbm.bb_count[state]++;
+}
+
+/*
+ * Get the state of a big block.
+ */
+static enum virtio_mem_bbm_bb_state virtio_mem_bbm_get_bb_state(struct virtio_mem *vm,
+								unsigned long bb_id)
+{
+	return vm->bbm.bb_states[bb_id - vm->bbm.first_bb_id];
+}
+
+/*
+ * Prepare the big block state array for the next big block.
+ */
+static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
+{
+	unsigned long old_bytes = vm->bbm.next_bb_id - vm->bbm.first_bb_id;
+	unsigned long new_bytes = old_bytes + 1;
+	int old_pages = PFN_UP(old_bytes);
+	int new_pages = PFN_UP(new_bytes);
+	uint8_t *new_array;
+
+	if (vm->bbm.bb_states && old_pages == new_pages)
+		return 0;
+
+	new_array = vzalloc(new_pages * PAGE_SIZE);
+	if (!new_array)
+		return -ENOMEM;
+
+	mutex_lock(&vm->hotplug_mutex);
+	if (vm->bbm.bb_states)
+		memcpy(new_array, vm->bbm.bb_states, old_pages * PAGE_SIZE);
+	vfree(vm->bbm.bb_states);
+	vm->bbm.bb_states = new_array;
+	mutex_unlock(&vm->hotplug_mutex);
+
+	return 0;
+}
+
+#define virtio_mem_bbm_for_each_bb(_vm, _bb_id, _state) \
+	for (_bb_id = _vm->bbm.first_bb_id; \
+	     _bb_id < _vm->bbm.next_bb_id && _vm->bbm.bb_count[_state]; \
+	     _bb_id++) \
+		if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
+
 /*
  * Set the state of a memory block, taking care of the state counter.
  */
@@ -504,6 +623,17 @@ static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
 	return virtio_mem_add_memory(vm, addr, size);
 }
 
+/*
+ * See virtio_mem_add_memory(): Try adding a big block.
+ */
+static int virtio_mem_bbm_add_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_add_memory(vm, addr, size);
+}
+
 /*
  * Try removing memory from Linux. Will only fail if memory blocks aren't
  * offline.
@@ -731,20 +861,33 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 	struct memory_notify *mhp = arg;
 	const unsigned long start = PFN_PHYS(mhp->start_pfn);
 	const unsigned long size = PFN_PHYS(mhp->nr_pages);
-	const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
 	int rc = NOTIFY_OK;
+	unsigned long id;
 
 	if (!virtio_mem_overlaps_range(vm, start, size))
 		return NOTIFY_DONE;
 
-	/*
-	 * Memory is onlined/offlined in memory block granularity. We cannot
-	 * cross virtio-mem device boundaries and memory block boundaries. Bail
-	 * out if this ever changes.
-	 */
-	if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
-			 !IS_ALIGNED(start, memory_block_size_bytes())))
-		return NOTIFY_BAD;
+	if (vm->in_sbm) {
+		id = virtio_mem_phys_to_mb_id(start);
+		/*
+		 * In SBM, we add memory in separate memory blocks - we expect
+		 * it to be onlined/offlined in the same granularity. Bail out
+		 * if this ever changes.
+		 */
+		if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
+				 !IS_ALIGNED(start, memory_block_size_bytes())))
+			return NOTIFY_BAD;
+	} else {
+		id = virtio_mem_phys_to_bb_id(vm, start);
+		/*
+		 * In BBM, we only care about onlining/offlining happening
+		 * within a single big block; we don't care about the
+		 * actual granularity as we don't track individual Linux
+		 * memory blocks.
+		 */
+		if (WARN_ON_ONCE(id != virtio_mem_phys_to_bb_id(vm, start + size - 1)))
+			return NOTIFY_BAD;
+	}
 
 	/*
 	 * Avoid circular locking lockdep warnings. We lock the mutex
@@ -763,7 +906,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		}
 		vm->hotplug_active = true;
-		virtio_mem_sbm_notify_going_offline(vm, mb_id);
+		if (vm->in_sbm)
+			virtio_mem_sbm_notify_going_offline(vm, id);
 		break;
 	case MEM_GOING_ONLINE:
 		mutex_lock(&vm->hotplug_mutex);
@@ -773,10 +917,12 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 			break;
 		}
 		vm->hotplug_active = true;
-		rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
+		if (vm->in_sbm)
+			rc = virtio_mem_sbm_notify_going_online(vm, id);
 		break;
 	case MEM_OFFLINE:
-		virtio_mem_sbm_notify_offline(vm, mb_id);
+		if (vm->in_sbm)
+			virtio_mem_sbm_notify_offline(vm, id);
 
 		atomic64_add(size, &vm->offline_size);
 		/*
@@ -790,7 +936,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
 	case MEM_ONLINE:
-		virtio_mem_sbm_notify_online(vm, mb_id);
+		if (vm->in_sbm)
+			virtio_mem_sbm_notify_online(vm, id);
 
 		atomic64_sub(size, &vm->offline_size);
 		/*
@@ -809,7 +956,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
 	case MEM_CANCEL_OFFLINE:
 		if (!vm->hotplug_active)
 			break;
-		virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
+		if (vm->in_sbm)
+			virtio_mem_sbm_notify_cancel_offline(vm, id);
 		vm->hotplug_active = false;
 		mutex_unlock(&vm->hotplug_mutex);
 		break;
@@ -980,27 +1128,29 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
 static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
 {
 	const unsigned long addr = page_to_phys(page);
-	const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
+	unsigned long id, sb_id;
 	struct virtio_mem *vm;
-	int sb_id;
+	bool do_online;
 
-	/*
-	 * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
-	 * size/alignment and that this callback is is called with such a
-	 * size/alignment. So we cannot cross subblocks and therefore
-	 * also not memory blocks.
-	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
 		if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
 			continue;
 
-		sb_id = virtio_mem_phys_to_sb_id(vm, addr);
-		/*
-		 * If plugged, online the pages, otherwise, set them fake
-		 * offline (PageOffline).
-		 */
-		if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
+		if (vm->in_sbm) {
+			/*
+			 * We exploit here that subblocks have at least
+			 * MAX_ORDER_NR_PAGES size/alignment - so we cannot
+			 * cross subblocks within one call.
+			 */
+			id = virtio_mem_phys_to_mb_id(addr);
+			sb_id = virtio_mem_phys_to_sb_id(vm, addr);
+			do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
+								   sb_id, 1);
+		} else {
+			do_online = true;
+		}
+		if (do_online)
 			generic_online_page(page, order);
 		else
 			virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
@@ -1180,6 +1330,32 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
 	return rc;
 }
 
+/*
+ * Request to unplug a big block.
+ *
+ * Will not modify the state of the big block.
+ */
+static int virtio_mem_bbm_unplug_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_send_unplug_request(vm, addr, size);
+}
+
+/*
+ * Request to plug a big block.
+ *
+ * Will not modify the state of the big block.
+ */
+static int virtio_mem_bbm_plug_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+	const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+	const uint64_t size = vm->bbm.bb_size;
+
+	return virtio_mem_send_plug_request(vm, addr, size);
+}
+
 /*
  * Unplug the desired number of plugged subblocks of a offline or not-added
  * memory block. Will fail if any subblock cannot get unplugged (instead of
@@ -1365,10 +1541,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
 	return 0;
 }
 
-/*
- * Try to plug the requested amount of memory.
- */
-static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
+static int virtio_mem_sbm_plug_request(struct virtio_mem *vm, uint64_t diff)
 {
 	uint64_t nb_sb = diff / vm->sbm.sb_size;
 	unsigned long mb_id;
@@ -1435,6 +1608,112 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
 	return rc;
 }
 
+/*
+ * Plug a big block and add it to Linux.
+ *
+ * Will modify the state of the big block.
+ */
+static int virtio_mem_bbm_plug_and_add_bb(struct virtio_mem *vm,
+					  unsigned long bb_id)
+{
+	int rc;
+
+	if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+			 VIRTIO_MEM_BBM_BB_UNUSED))
+		return -EINVAL;
+
+	rc = virtio_mem_bbm_plug_bb(vm, bb_id);
+	if (rc)
+		return rc;
+	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
+
+	rc = virtio_mem_bbm_add_bb(vm, bb_id);
+	if (rc) {
+		if (!virtio_mem_bbm_unplug_bb(vm, bb_id))
+			virtio_mem_bbm_set_bb_state(vm, bb_id,
+						    VIRTIO_MEM_BBM_BB_UNUSED);
+		else
+			/* Retry from the main loop. */
+			virtio_mem_bbm_set_bb_state(vm, bb_id,
+						    VIRTIO_MEM_BBM_BB_PLUGGED);
+		return rc;
+	}
+	return 0;
+}
+
+/*
+ * Prepare tracking data for the next big block.
+ */
+static int virtio_mem_bbm_prepare_next_bb(struct virtio_mem *vm,
+					  unsigned long *bb_id)
+{
+	int rc;
+
+	if (vm->bbm.next_bb_id > vm->bbm.last_usable_bb_id)
+		return -ENOSPC;
+
+	/* Resize the big block state array if required. */
+	rc = virtio_mem_bbm_bb_states_prepare_next_bb(vm);
+	if (rc)
+		return rc;
+
+	vm->bbm.bb_count[VIRTIO_MEM_BBM_BB_UNUSED]++;
+	*bb_id = vm->bbm.next_bb_id;
+	vm->bbm.next_bb_id++;
+	return 0;
+}
+
+static int virtio_mem_bbm_plug_request(struct virtio_mem *vm, uint64_t diff)
+{
+	uint64_t nb_bb = diff / vm->bbm.bb_size;
+	unsigned long bb_id;
+	int rc;
+
+	if (!nb_bb)
+		return 0;
+
+	/* Try to plug and add unused big blocks */
+	virtio_mem_bbm_for_each_bb(vm, bb_id, VIRTIO_MEM_BBM_BB_UNUSED) {
+		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
+			return -ENOSPC;
+
+		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
+		if (!rc)
+			nb_bb--;
+		if (rc || !nb_bb)
+			return rc;
+		cond_resched();
+	}
+
+	/* Try to prepare, plug and add new big blocks */
+	while (nb_bb) {
+		if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
+			return -ENOSPC;
+
+		rc = virtio_mem_bbm_prepare_next_bb(vm, &bb_id);
+		if (rc)
+			return rc;
+		rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
+		if (!rc)
+			nb_bb--;
+		if (rc)
+			return rc;
+		cond_resched();
+	}
+
+	return 0;
+}
+
+/*
+ * Try to plug the requested amount of memory.
+ */
+static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
+{
+	if (vm->in_sbm)
+		return virtio_mem_sbm_plug_request(vm, diff);
+	return virtio_mem_bbm_plug_request(vm, diff);
+}
+
 /*
  * Unplug the desired number of plugged subblocks of an offline memory block.
  * Will fail if any subblock cannot get unplugged (instead of skipping it).
@@ -1573,10 +1852,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
 	return 0;
 }
 
-/*
- * Try to unplug the requested amount of memory.
- */
-static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
+static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
 {
 	uint64_t nb_sb = diff / vm->sbm.sb_size;
 	unsigned long mb_id;
@@ -1642,20 +1918,42 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
 	return rc;
 }
 
+/*
+ * Try to unplug the requested amount of memory.
+ */
+static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
+{
+	if (vm->in_sbm)
+		return virtio_mem_sbm_unplug_request(vm, diff);
+	return -EBUSY;
+}
+
 /*
  * Try to unplug all blocks that couldn't be unplugged before, for example,
  * because the hypervisor was busy.
  */
 static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
 {
-	unsigned long mb_id;
+	unsigned long id;
 	int rc;
 
-	virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
-		rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
+	if (!vm->in_sbm) {
+		virtio_mem_bbm_for_each_bb(vm, id,
+					   VIRTIO_MEM_BBM_BB_PLUGGED) {
+			rc = virtio_mem_bbm_unplug_bb(vm, id);
+			if (rc)
+				return rc;
+			virtio_mem_bbm_set_bb_state(vm, id,
+						    VIRTIO_MEM_BBM_BB_UNUSED);
+		}
+		return 0;
+	}
+
+	virtio_mem_sbm_for_each_mb(vm, id, VIRTIO_MEM_SBM_MB_PLUGGED) {
+		rc = virtio_mem_sbm_unplug_mb(vm, id);
 		if (rc)
 			return rc;
-		virtio_mem_sbm_set_mb_state(vm, mb_id,
+		virtio_mem_sbm_set_mb_state(vm, id,
 					    VIRTIO_MEM_SBM_MB_UNUSED);
 	}
 
@@ -1681,7 +1979,13 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
 			usable_region_size, &usable_region_size);
 	end_addr = vm->addr + usable_region_size;
 	end_addr = min(end_addr, phys_limit);
-	vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
+
+	if (vm->in_sbm)
+		vm->sbm.last_usable_mb_id =
+				virtio_mem_phys_to_mb_id(end_addr) - 1;
+	else
+		vm->bbm.last_usable_bb_id =
+				virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
 
 	/* see if there is a request to change the size */
 	virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
@@ -1804,6 +2108,7 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
 static int virtio_mem_init(struct virtio_mem *vm)
 {
 	const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
+	uint64_t sb_size, addr;
 	uint16_t node_id;
 
 	if (!vm->vdev->config->get) {
@@ -1836,16 +2141,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
 	if (vm->nid == NUMA_NO_NODE)
 		vm->nid = memory_add_physaddr_to_nid(vm->addr);
 
-	/*
-	 * We always hotplug memory in memory block granularity. This way,
-	 * we have to wait for exactly one memory block to online.
-	 */
-	if (vm->device_block_size > memory_block_size_bytes()) {
-		dev_err(&vm->vdev->dev,
-			"The block size is not supported (too big).\n");
-		return -EINVAL;
-	}
-
 	/* bad device setup - warn only */
 	if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
 		dev_warn(&vm->vdev->dev,
@@ -1865,20 +2160,35 @@ static int virtio_mem_init(struct virtio_mem *vm)
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-	vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-				pageblock_nr_pages) * PAGE_SIZE;
-	vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
-				vm->sbm.sb_size);
-	vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
+	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
+			pageblock_nr_pages) * PAGE_SIZE;
+	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
+
+	if (sb_size < memory_block_size_bytes()) {
+		/* SBM: At least two subblocks per Linux memory block. */
+		vm->in_sbm = true;
+		vm->sbm.sb_size = sb_size;
+		vm->sbm.sbs_per_mb = memory_block_size_bytes() /
+				     vm->sbm.sb_size;
+
+		/* Round up to the next full memory block */
+		addr = vm->addr + memory_block_size_bytes() - 1;
+		vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
+		vm->sbm.next_mb_id = vm->sbm.first_mb_id;
+	} else {
+		/* BBM: At least one Linux memory block. */
+		vm->bbm.bb_size = vm->device_block_size;
 
-	/* Round up to the next full memory block */
-	vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
-						       memory_block_size_bytes());
-	vm->sbm.next_mb_id = vm->sbm.first_mb_id;
+		vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);
+		vm->bbm.next_bb_id = vm->bbm.first_bb_id;
+	}
 
 	/* Prepare the offline threshold - make sure we can add two blocks. */
 	vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
+	/* In BBM, we also want at least two big blocks. */
+	vm->offline_threshold = max_t(uint64_t, 2 * vm->bbm.bb_size,
+				      vm->offline_threshold);
 
 	dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
 	dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
@@ -1886,8 +2196,12 @@ static int virtio_mem_init(struct virtio_mem *vm)
 		 (unsigned long long)vm->device_block_size);
 	dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
 		 memory_block_size_bytes());
-	dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
-		 (unsigned long long)vm->sbm.sb_size);
+	if (vm->in_sbm)
+		dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
+			 (unsigned long long)vm->sbm.sb_size);
+	else
+		dev_info(&vm->vdev->dev, "big block size: 0x%llx",
+			 (unsigned long long)vm->bbm.bb_size);
 	if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
 		dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
 
@@ -2044,22 +2358,24 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	cancel_work_sync(&vm->wq);
 	hrtimer_cancel(&vm->retry_timer);
 
-	/*
-	 * After we unregistered our callbacks, user space can online partially
-	 * plugged offline blocks. Make sure to remove them.
-	 */
-	virtio_mem_sbm_for_each_mb(vm, mb_id,
-				   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
-		rc = virtio_mem_sbm_remove_mb(vm, mb_id);
-		BUG_ON(rc);
-		virtio_mem_sbm_set_mb_state(vm, mb_id,
-					    VIRTIO_MEM_SBM_MB_UNUSED);
+	if (vm->in_sbm) {
+		/*
+		 * After we unregistered our callbacks, user space can online
+		 * partially plugged offline blocks. Make sure to remove them.
+		 */
+		virtio_mem_sbm_for_each_mb(vm, mb_id,
+					   VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
+			rc = virtio_mem_sbm_remove_mb(vm, mb_id);
+			BUG_ON(rc);
+			virtio_mem_sbm_set_mb_state(vm, mb_id,
+						    VIRTIO_MEM_SBM_MB_UNUSED);
+		}
+		/*
+		 * After we unregistered our callbacks, user space can no longer
+		 * offline partially plugged online memory blocks. No need to
+		 * worry about them.
+		 */
 	}
-	/*
-	 * After we unregistered our callbacks, user space can no longer
-	 * offline partially plugged online memory blocks. No need to worry
-	 * about them.
-	 */
 
 	/* unregister callbacks */
 	unregister_virtio_mem_device(vm);
@@ -2078,8 +2394,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 	}
 
 	/* remove all tracking data - no locking needed */
-	vfree(vm->sbm.mb_states);
-	vfree(vm->sbm.sb_states);
+	if (vm->in_sbm) {
+		vfree(vm->sbm.mb_states);
+		vfree(vm->sbm.sb_states);
+	} else {
+		vfree(vm->bbm.bb_states);
+	}
 
 	/* reset the device and cleanup the queues */
 	vdev->config->reset(vdev);
-- 
2.26.2