From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51783C25B0E for ; Fri, 19 Aug 2022 00:20:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241338AbiHSAUm (ORCPT ); Thu, 18 Aug 2022 20:20:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43552 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240030AbiHSAUf (ORCPT ); Thu, 18 Aug 2022 20:20:35 -0400 Received: from mail-pj1-x1033.google.com (mail-pj1-x1033.google.com [IPv6:2607:f8b0:4864:20::1033]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 96C42CE317 for ; Thu, 18 Aug 2022 17:20:33 -0700 (PDT) Received: by mail-pj1-x1033.google.com with SMTP id o14-20020a17090a0a0e00b001fabfd3369cso3359379pjo.5 for ; Thu, 18 Aug 2022 17:20:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc; bh=pEgXnTJKFJrTL3CTka16XSJbg6kxwB4cnfOI/Ofjlz4=; b=jQOCWpbvee03T0P8xijjOYN3zJfdX6BqWq3GiUDY54GeiB2kYOpPOdypH/gNOYaODr ix4Q5zM5HRdkrH8VNhhYaqC4bGm3TWrsr3zotxk6UhP/oyxM/RNGcU3wIqmpKytWvU4a BTREDH/HNF59rmTBvSdbCesgF6GRyz5K9//B1DM9fRziLLn/FELyxzGGJ6BTGO4HJJrl o0UAtZhP3o05JqW/WffSD/W7KBey1RzgtF0ZEqIKFPy09PHnweIeH6czpkeJBsdR0KF6 StCY9jf29Uto9xBg0n+/eHlcuPQ9Rk2xvxAOK7+9vgIk182YafaVSJIp8WTK07eIFh8d mv/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc; bh=pEgXnTJKFJrTL3CTka16XSJbg6kxwB4cnfOI/Ofjlz4=; b=RwxPAzs5r1e/Jhsr/gLm3tFQ17yIcNuDXMfT710+1Sr+jpnhgkjk09ECy+jHeOz5ks 5pHShPoXeV8Mz1bgd7Abk1u+N6o4ydWa2PryNsePKsnDgKT3D2ks710wB10JSAK/e/Vx WgenNz9SbaS+jffdVJbj4AMtpAFwk3Lp6Pa9m2Gy/gHDAGid03VuDI3ftyU6ESzjiegx ADk18bVicp54OXchHtnYHwhPEoDRH+l2iYKTSnGIykuSsl9xakpyRqR2OFYhxK/MnxaF U0CNZj/0iUGecZBcCVash5TxcL2w13KXcyshOMJ5Ok6F+suRB6XqZeW0ILg+KiVaCDPf DsBQ== X-Gm-Message-State: ACgBeo198dROpSc6H/Uu2lEWKrr+o4moVZJcEZPBCNz45bygOQ+MBXG0 1tmXbYbw+pd9VVPikLAnG5O2IQ== X-Google-Smtp-Source: AA6agR7cPKIHabvzPHesQ7mysYM4f98Not26hLuTLYt6okFI7nMH1sNNM2K+QGI7RxRYoX8JypicJA== X-Received: by 2002:a17:902:7208:b0:172:a9d6:527 with SMTP id ba8-20020a170902720800b00172a9d60527mr4900702plb.32.1660868432853; Thu, 18 Aug 2022 17:20:32 -0700 (PDT) Received: from google.com (7.104.168.34.bc.googleusercontent.com. [34.168.104.7]) by smtp.gmail.com with ESMTPSA id l4-20020a170903244400b0016bedcced2fsm1984501pls.35.2022.08.18.17.20.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Aug 2022 17:20:32 -0700 (PDT) Date: Fri, 19 Aug 2022 00:20:28 +0000 From: Sean Christopherson To: "Kirill A . Shutemov" Cc: Hugh Dickins , Chao Peng , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org, linux-kselftest@vger.kernel.org, Paolo Bonzini , Jonathan Corbet , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Shuah Khan , Mike Rapoport , Steven Price , "Maciej S . Szmigiero" , Vlastimil Babka , Vishal Annapurve , Yu Zhang , luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret , Michael Roth , mhocko@suse.com, Muchun Song , "Gupta, Pankaj" Subject: Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory Message-ID: References: <20220706082016.2603916-1-chao.p.peng@linux.intel.com> <20220818132421.6xmjqduempmxnnu2@box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220818132421.6xmjqduempmxnnu2@box> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 18, 2022, Kirill A . Shutemov wrote: > On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote: > > On Wed, 6 Jul 2022, Chao Peng wrote: > > But since then, TDX in particular has forced an effort into preventing > > (by flags, seals, notifiers) almost everything that makes it shmem/tmpfs. > > > > Are any of the shmem.c mods useful to existing users of shmem.c? No. > > Is MFD_INACCESSIBLE useful or comprehensible to memfd_create() users? No. But QEMU and other VMMs are users of shmem and memfd. The new features certainly aren't useful for _all_ existing users, but I don't think it's fair to say that they're not useful for _any_ existing users. > > What use do you have for a filesystem here? Almost none. > > IIUC, what you want is an fd through which QEMU can allocate kernel > > memory, selectively free that memory, and communicate fd+offset+length > > to KVM. And perhaps an interface to initialize a little of that memory > > from a template (presumably copied from a real file on disk somewhere). > > > > You don't need shmem.c or a filesystem for that! > > > > If your memory could be swapped, that would be enough of a good reason > > to make use of shmem.c: but it cannot be swapped; and although there > > are some references in the mailthreads to it perhaps being swappable > > in future, I get the impression that will not happen soon if ever. > > > > If your memory could be migrated, that would be some reason to use > > filesystem page cache (because page migration happens to understand > > that type of memory): but it cannot be migrated. > > Migration support is in pipeline. It is part of TDX 1.5 [1]. And this isn't intended for just TDX (or SNP, or pKVM). We're not _that_ far off from being able to use UPM for "regular" VMs as a way to provide defense-in-depth without having to take on the overhead of confidential VMs. At that point, migration and probably even swap are on the table. > And swapping theoretically possible, but I'm not aware of any plans as of > now. Ya, I highly doubt confidential VMs will ever bother with swap. > > I'm afraid of the special demands you may make of memory allocation > > later on - surprised that huge pages are not mentioned already; > > gigantic contiguous extents? secretmem removed from direct map? > > The design allows for extension to hugetlbfs if needed. Combination of > MFD_INACCESSIBLE | MFD_HUGETLB should route this way. There should be zero > implications for shmem. It is going to be separate struct memfile_backing_store. > > I'm not sure secretmem is a fit here as we want to extend MFD_INACCESSIBLE > to be movable if platform supports it and secretmem is not migratable by > design (without direct mapping fragmentations). But secretmem _could_ be a fit. If a use case wants to unmap guest private memory from both userspace and the kernel then KVM should absolutely be able to support that, but at the same time I don't want to have to update KVM to enable secretmem (and I definitely don't want KVM poking into the directmap itself). MFD_INACCESSIBLE should only say "this memory can't be mapped into userspace", any other properties should be completely separate, e.g. the inability to migrate pages is effective a restriction from KVM (acting on behalf of TDX/SNP), it's not a fundamental property of MFD_INACCESSIBLE.