Date: Tue, 4 Apr 2023 17:40:48 +0300
From: Zhi Wang
To: Michael Roth
Cc: Vishal Annapurve
Subject: Re: [PATCH RFC v8 04/56] KVM: Add HVA range operator
Message-ID: <20230404174048.00005ef9.zhi.wang.linux@gmail.com>
In-Reply-To: <20230327003444.lqfrididd4gavomb@amd.com>
References: <20230220183847.59159-1-michael.roth@amd.com>
 <20230220183847.59159-5-michael.roth@amd.com>
 <20230220233709.00006dfc@gmail.com>
 <20230327003444.lqfrididd4gavomb@amd.com>

On Sun, 26 Mar 2023 19:34:44 -0500
Michael Roth wrote:

> On Mon, Feb 20, 2023 at 11:37:09PM +0200, Zhi Wang wrote:
> > On Mon, 20 Feb 2023 12:37:55 -0600
> > Michael Roth wrote:
> >
> > > From: Vishal Annapurve
> > >
> > > Introduce HVA range operator so that other KVM subsystems
> > > can operate on HVA range.
> > >
> > > Signed-off-by: Vishal Annapurve
> > > [mdr: minor checkpatch alignment fixups]
> > > Signed-off-by: Michael Roth
> > > ---
> > >  include/linux/kvm_host.h |  6 +++++
> > >  virt/kvm/kvm_main.c      | 48 ++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 54 insertions(+)
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 4d542060cd93..c615650ed256 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -1402,6 +1402,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm);
> > >  void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
> > >  void kvm_mmu_invalidate_end(struct kvm *kvm);
> > >
> > > +typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
> > > +				struct kvm_gfn_range *range, void *data);
> > > +
> > > +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> > > +			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
> > > +
> > >  long kvm_arch_dev_ioctl(struct file *filp,
> > >  			unsigned int ioctl, unsigned long arg);
> > >  long kvm_arch_vcpu_ioctl(struct file *filp,
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index f7e00593cc5d..4ccd655dd5af 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -642,6 +642,54 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
> > >  	return (int)ret;
> > >  }
> > >
> > >
> >
> > Below function seems a reduced duplicate of __kvm_handle_hva_range()
> > in virt/kvm/kvm_main.c. It would be nice to factor __kvm_handle_hva_range().
>
> A few differences make it difficult to refactor this clearly:
>
> - This handler is mainly used for loading initial contents into guest
>   image before booting and doesn't rely on the MMU lock being held.
>   It also *can't* be called with MMU lock held because it suffers from
>   the same issue with mem_attr_update() hook where it needs to take a
>   mutex as part of unmapping from directmap when transitioning page to
>   private state in RMP table
> - This handler wants to return an error code, as opposed to existing
>   handlers which return a true/false value which is passed along to
>   MMU notifier call-site and handled differently.
> - This handler wants to terminate iterating through memslots as soon
>   as it encounters the first failure, whereas the existing handlers
>   expect to be called for each slot regardless of return value.
>
> So it's a pretty different use-case that adds enough complexity to
> __kvm_handle_hva_range() that it might not be worth refactoring it,
> since it complicates some bits that are closely tied to dealing with
> invalidations where the extra complexity probably needs to be
> well-warranted.
>
> I took a stab at it here for reference, but even with what seems to be
> the minimal set of changes it doesn't save on any code and ultimately I
> think it makes it harder to make sense of what's going on:
>
> https://github.com/mdroth/linux/commit/976c5fb708f7babe899fd80e27e19f8ba3f6818d
>
> Is there a better approach?
>

Those requirements look like a reasonable fit for kvm_handle_hva_range().
I guess we just need to extend the iterator a little bit. My ideas:

1) Add a lock flag to struct kvm_hva_range to indicate whether the MMU
   lock is required during the iteration. Check the flag with
   if (!locked && hva_range.need_lock). Then the unlock part can be left
   untouched, since it only runs when locked is set.

2) Add an error code to struct kvm_gfn_range. The handler can set it so
   that __kvm_handle_hva_range() can check gfn_range.err after
   ret |= handler(xxx); if err is set, bail out.

3) Return gfn_range.err to the caller. The caller can decide how to
   convert it (to a boolean, or keep it as is).

4) Set hva_range.need_lock in the existing callers and in the new caller.

How about this?
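
To make the four points above a bit more concrete, here is a rough
userspace model of the extended iterator. It is only a sketch under
assumptions, not kernel code: every type and function name below
(gfn_range_model, hva_range_model, slot_model, handle_range_model,
model_lock/model_unlock) is a hypothetical stand-in, the memslot lookup is
reduced to a plain array walk, and a counter plays the role of the lock.
Only the need_lock flag and the err field correspond to what I am
proposing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real KVM structures. */
struct gfn_range_model {
	unsigned long start, end;	/* [start, end) in guest frame numbers */
	int err;			/* idea 2: handler sets this to request bail-out */
};

typedef bool (*range_handler_t)(struct gfn_range_model *range, void *data);

struct hva_range_model {
	range_handler_t handler;
	bool need_lock;			/* idea 1: does this caller want the lock taken? */
	void *data;
};

struct slot_model {
	unsigned long base_gfn, npages;
};

/* A counter stands in for the lock so the behaviour can be observed. */
int lock_depth;
int lock_acquisitions;

void model_lock(void)   { assert(lock_depth == 0); lock_depth++; lock_acquisitions++; }
void model_unlock(void) { assert(lock_depth == 1); lock_depth--; }

/*
 * Walk all slots, OR-ing the boolean results together as the existing
 * handlers expect, but stop at the first slot whose handler sets err.
 * Returns the error code (idea 3); the boolean goes out via ret_out.
 */
int handle_range_model(const struct slot_model *slots, size_t nslots,
		       const struct hva_range_model *range, bool *ret_out)
{
	bool locked = false, ret = false;
	int err = 0;
	size_t i;

	for (i = 0; i < nslots; i++) {
		struct gfn_range_model gfn_range = {
			.start = slots[i].base_gfn,
			.end = slots[i].base_gfn + slots[i].npages,
			.err = 0,
		};

		/* Lock lazily, and only for callers that asked for it. */
		if (!locked && range->need_lock) {
			locked = true;
			model_lock();
		}

		ret |= range->handler(&gfn_range, range->data);
		if (gfn_range.err) {
			err = gfn_range.err;
			break;
		}
	}

	/* Unchanged unlock path: runs only if the lock was actually taken. */
	if (locked)
		model_unlock();

	if (ret_out)
		*ret_out = ret;
	return err;
}

/* Example handler: fails on the fail_on-th call (never if fail_on == 0). */
struct fail_at {
	int calls;
	int fail_on;
};

bool failing_handler(struct gfn_range_model *range, void *data)
{
	struct fail_at *st = data;

	if (++st->calls == st->fail_on)
		range->err = -5;	/* arbitrary negative error for the model */
	return false;
}
```

In this model a caller with need_lock = false and a handler that fails on
the second slot gets the error back after exactly two handler calls and
without the lock ever being taken, while a caller with need_lock = true
and a handler that never sets err still visits every slot with the lock
taken once.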
> Thanks,
>
> -Mike
>
> >
> > > +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> > > +			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
> > > +{
> > > +	int ret = 0;
> > > +	struct kvm_gfn_range gfn_range;
> > > +	struct kvm_memory_slot *slot;
> > > +	struct kvm_memslots *slots;
> > > +	int i, idx;
> > > +
> > > +	if (WARN_ON_ONCE(hva_end <= hva_start))
> > > +		return -EINVAL;
> > > +
> > > +	idx = srcu_read_lock(&kvm->srcu);
> > > +
> > > +	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> > > +		struct interval_tree_node *node;
> > > +
> > > +		slots = __kvm_memslots(kvm, i);
> > > +		kvm_for_each_memslot_in_hva_range(node, slots,
> > > +						  hva_start, hva_end - 1) {
> > > +			unsigned long start, end;
> > > +
> > > +			slot = container_of(node, struct kvm_memory_slot,
> > > +					    hva_node[slots->node_idx]);
> > > +			start = max(hva_start, slot->userspace_addr);
> > > +			end = min(hva_end, slot->userspace_addr +
> > > +						  (slot->npages << PAGE_SHIFT));
> > > +
> > > +			/*
> > > +			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
> > > +			 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
> > > +			 */
> > > +			gfn_range.start = hva_to_gfn_memslot(start, slot);
> > > +			gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
> > > +			gfn_range.slot = slot;
> > > +
> > > +			ret = handler(kvm, &gfn_range, data);
> > > +			if (ret)
> > > +				goto e_ret;
> > > +		}
> > > +	}
> > > +
> > > +e_ret:
> > > +	srcu_read_unlock(&kvm->srcu, idx);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > >  static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
> > >  						unsigned long start,
> > >  						unsigned long end,
> >
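
As a side note on the rounding in the quoted hunk: the HVA range is
clamped to the slot first, and the exclusive end is rounded up by
PAGE_SIZE - 1 so that a partially covered last page is still included.
The small userspace model below walks through that arithmetic. It is an
illustration under assumptions: 4 KiB pages are assumed, memslot_model,
gfn_span, model_hva_to_gfn and clamp_to_slot are hypothetical names, and
the conversion helper is written the way I understand
hva_to_gfn_memslot() to work (slot base gfn plus the page offset from
userspace_addr).

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SHIFT depends on the arch. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Hypothetical stand-in carrying only the memslot fields used here. */
struct memslot_model {
	unsigned long userspace_addr;	/* first HVA backed by the slot */
	unsigned long base_gfn;		/* gfn that userspace_addr maps to */
	unsigned long npages;
};

struct gfn_span {
	unsigned long start, end;	/* [start, end), end exclusive */
};

/* Page offset from the start of the slot, added to the slot's base gfn. */
unsigned long model_hva_to_gfn(unsigned long hva, const struct memslot_model *slot)
{
	return slot->base_gfn + ((hva - slot->userspace_addr) >> MODEL_PAGE_SHIFT);
}

/* Same clamping and rounding as in the quoted loop body. */
struct gfn_span clamp_to_slot(unsigned long hva_start, unsigned long hva_end,
			      const struct memslot_model *slot)
{
	unsigned long slot_end = slot->userspace_addr +
				 (slot->npages << MODEL_PAGE_SHIFT);
	unsigned long start = hva_start > slot->userspace_addr ?
			      hva_start : slot->userspace_addr;
	unsigned long end = hva_end < slot_end ? hva_end : slot_end;
	struct gfn_span span;

	assert(start < end);	/* the caller only passes overlapping slots */

	span.start = model_hva_to_gfn(start, slot);
	/* Round the exclusive end up so a partial last page is covered. */
	span.end = model_hva_to_gfn(end + MODEL_PAGE_SIZE - 1, slot);
	return span;
}
```

With a slot at userspace_addr 0x10000, base_gfn 0x100 and 4 pages, the
HVA range [0x10800, 0x12001) touches three pages and yields gfns
[0x100, 0x103), while [0x10800, 0x12000) ends on a page boundary and
yields [0x100, 0x102).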