From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vishal Annapurve
Date: Fri, 14 Jul 2023 16:09:50 -0700
Subject: Re: Rename restrictedmem => guardedmem?
 (was: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM)
To: Sean Christopherson
Cc: Ackerley Tng, david@redhat.com, chao.p.peng@linux.intel.com,
    pbonzini@redhat.com, vkuznets@redhat.com, jmattson@google.com,
    joro@8bytes.org, mail@maciej.szmigiero.name, vbabka@suse.cz,
    yu.c.zhang@linux.intel.com, kirill.shutemov@linux.intel.com,
    dhildenb@redhat.com, qperret@google.com, tabba@google.com,
    michael.roth@amd.com, wei.w.wang@intel.com, rppt@kernel.org,
    liam.merwick@oracle.com, isaku.yamahata@gmail.com, jarkko@kernel.org,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hughd@google.com,
    brauner@kernel.org

On Fri, Jul 14, 2023 at 12:29 PM Sean Christopherson wrote:
> ...
> And _if_ there is a VMM that instantiates memory before KVM_CREATE_VM, IMO making
> the ioctl() /dev/kvm scoped would have no meaningful impact on adapting userspace
> to play nice with the required ordering. If userspace can get at /dev/kvm, then
> it can do KVM_CREATE_VM, because the only input to KVM_CREATE_VM is the type, i.e.
> the only dependencies for KVM_CREATE_VM should be known/resolved long before the
> VMM knows it wants to use gmem.

I am not sure about the benefits of tying gmem creation to any given
KVM instance. I think the most important requirement here is that a
given gmem range is always tied to a single VM; this can be enforced
when memslots are bound to the gmem files. I believe the "required
ordering" is that gmem files are created first and then supplied when
creating the memslots whose GPA ranges can generate private memory
accesses. Is there any other ordering we want to enforce here?

> ...
> Practically, I think that gives us a clean, intuitive way to handle intra-host
> migration. Rather than transfer ownership of the file, instantiate a new file
> for the target VM, using the gmem inode from the source VM, i.e. create a hard
> link. That'd probably require new uAPI, but I don't think that will be hugely
> problematic. KVM would need to ensure the new VM's guest_memfd can't be mapped
> until KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM (which would also need to verify the
> memslots/bindings are identical), but that should be easy enough to enforce.
>
> That way, a VM, its memslots, and its SPTEs are tied to the file, while allowing
> the memory and the *contents* of memory to outlive the VM, i.e. be effectively
> transferred to the new target VM. And we'll maintain the invariant that each
> guest_memfd is bound 1:1 with a single VM.
>
> As above, that should also help us draw the line between mapping memory into a
> VM (file), and freeing/reclaiming the memory (inode).
>
> There will be extra complexity/overhead as we'll have to play nice with the
> possibility of multiple files per inode, e.g. to zap mappings across all files
> when punching a hole, but the extra complexity is quite small, e.g. we can use
> address_space.private_list to keep track of the guest_memfd instances associated
> with the inode.

Are we talking about a use case for sharing a gmem fd across VMs other
than intra-host migration? If not, ideally only one of the files should
be catering to the guest memory mappings at any given time, i.e.
any inode should ideally be bound (through the file) to a single KVM
instance, as we are planning to ensure that guest_memfd can't be mapped
until KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM is invoked on the target side.
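
For the ordering above, the flow I have in mind is roughly the sketch
below. To be clear, the ioctl names/numbers, struct layouts and the
flag are placeholders invented purely for illustration (assuming a
VM-scoped creation ioctl along the lines Sean described), not the
actual uAPI being proposed:

/*
 * Illustrative only: GMEM_CREATE_PLACEHOLDER, GMEM_SET_SLOT_PLACEHOLDER,
 * the structs and GMEM_SLOT_PRIVATE are made-up stand-ins for whatever
 * the final gmem uAPI ends up looking like. Only the ordering matters.
 */
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

struct gmem_create {            /* hypothetical */
        uint64_t size;
        uint64_t flags;
};

struct gmem_memslot {           /* hypothetical */
        uint32_t slot;
        uint32_t flags;
        uint64_t guest_phys_addr;
        uint64_t memory_size;
        uint64_t gmem_offset;
        int32_t  gmem_fd;
};

#define GMEM_CREATE_PLACEHOLDER   _IOW(KVMIO, 0xd0, struct gmem_create)
#define GMEM_SET_SLOT_PLACEHOLDER _IOW(KVMIO, 0xd1, struct gmem_memslot)
#define GMEM_SLOT_PRIVATE         (1u << 0)     /* placeholder flag */

static int setup_private_slot(int kvm_fd, uint64_t gpa, uint64_t size)
{
        struct gmem_create create = { .size = size };
        struct gmem_memslot slot = {
                .slot            = 0,
                .flags           = GMEM_SLOT_PRIVATE,
                .guest_phys_addr = gpa,
                .memory_size     = size,
        };
        int vm_fd, gmem_fd;

        /* 1) The VM comes first; its only input is the VM type. */
        vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
        if (vm_fd < 0)
                return -1;

        /* 2) Create the gmem file, scoped to (and so bound to) this VM. */
        gmem_fd = ioctl(vm_fd, GMEM_CREATE_PLACEHOLDER, &create);
        if (gmem_fd < 0)
                return -1;

        /*
         * 3) Only now bind the gmem range to a memslot whose GPA range
         *    can generate private memory accesses.
         */
        slot.gmem_fd     = gmem_fd;
        slot.gmem_offset = 0;
        if (ioctl(vm_fd, GMEM_SET_SLOT_PLACEHOLDER, &slot) < 0)
                return -1;

        return vm_fd;
}

int main(void)
{
        int kvm_fd = open("/dev/kvm", O_RDWR);

        if (kvm_fd < 0)
                return 1;

        /* 1 GiB GPA base and a 2 MiB private slot, purely as an example. */
        return setup_private_slot(kvm_fd, 1ull << 30, 2ull << 20) < 0 ? 1 : 0;
}

The only thing the sketch is meant to pin down is the ordering: the VM
fd exists before the gmem file, and the gmem file exists before any
memslot that references it. Intra-host migration would then only need a
way to instantiate a second file for the target VM from the same inode
before KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM is invoked.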