Date: Fri, 19 Aug 2022 00:20:28 +0000
From: Sean Christopherson
To: "Kirill A . Shutemov"
Cc: Hugh Dickins, Chao Peng, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
    linux-kselftest@vger.kernel.org, Paolo Bonzini, Jonathan Corbet,
    Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
    "H . Peter Anvin", Jeff Layton, "J . Bruce Fields", Andrew Morton,
    Shuah Khan, Mike Rapoport, Steven Price, "Maciej S . Szmigiero",
    Vlastimil Babka, Vishal Annapurve, Yu Zhang, luto@kernel.org,
    jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com,
    david@redhat.com, aarcange@redhat.com, ddutile@redhat.com,
    dhildenb@redhat.com, Quentin Perret, Michael Roth, mhocko@suse.com,
    Muchun Song, "Gupta, Pankaj"
Subject: Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
In-Reply-To: <20220818132421.6xmjqduempmxnnu2@box>
References: <20220706082016.2603916-1-chao.p.peng@linux.intel.com>
    <20220818132421.6xmjqduempmxnnu2@box>

On Thu, Aug 18, 2022, Kirill A . Shutemov wrote:
> On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote:
> > On Wed, 6 Jul 2022, Chao Peng wrote:
> > But since then, TDX in particular has forced an effort into preventing
> > (by flags, seals, notifiers) almost everything that makes it shmem/tmpfs.
> >
> > Are any of the shmem.c mods useful to existing users of shmem.c? No.
> > Is MFD_INACCESSIBLE useful or comprehensible to memfd_create() users? No.

But QEMU and other VMMs are users of shmem and memfd. The new features certainly
aren't useful for _all_ existing users, but I don't think it's fair to say that
they're not useful for _any_ existing users.

> > What use do you have for a filesystem here? Almost none.
> > IIUC, what you want is an fd through which QEMU can allocate kernel
> > memory, selectively free that memory, and communicate fd+offset+length
> > to KVM. And perhaps an interface to initialize a little of that memory
> > from a template (presumably copied from a real file on disk somewhere).
> >
> > You don't need shmem.c or a filesystem for that!
> >
> > If your memory could be swapped, that would be enough of a good reason
> > to make use of shmem.c: but it cannot be swapped; and although there
> > are some references in the mailthreads to it perhaps being swappable
> > in future, I get the impression that will not happen soon if ever.
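Hugh's summary of the interface (an fd through which the VMM allocates memory, selectively frees it, and hands fd+offset+length to KVM) can be approximated with calls that already exist for an ordinary memfd. The C below is a minimal sketch under stated assumptions: it uses plain memfd_create() rather than MFD_INACCESSIBLE, which exists only in the proposed series, and struct guest_mem_range, range_is_valid() and the guest_mem_*() helpers are hypothetical names, not the series' uAPI. Selective freeing relies on fallocate() with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, which releases the pages of a shmem-backed file without changing its size.

```c
#define _GNU_SOURCE		/* memfd_create(), fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>	/* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>		/* memfd_create(), MFD_CLOEXEC */
#include <sys/types.h>
#include <unistd.h>		/* ftruncate(), close() */

/*
 * Hypothetical descriptor for the fd+offset+length triple a VMM would pass
 * to KVM. Illustration only; this is NOT the uAPI from the series.
 */
struct guest_mem_range {
	int fd;
	off_t offset;
	uint64_t length;
};

/* Accept only non-empty, page-aligned ranges that lie inside the file. */
bool range_is_valid(const struct guest_mem_range *r, off_t file_size,
		    long page_size)
{
	assert(page_size > 0);
	if (r->fd < 0 || r->offset < 0 || r->length == 0 || file_size < 0)
		return false;
	if (r->offset % page_size != 0 || r->length % (uint64_t)page_size != 0)
		return false;
	/* Compare without computing offset + length, which could overflow. */
	return r->length <= (uint64_t)file_size &&
	       (uint64_t)r->offset <= (uint64_t)file_size - r->length;
}

/* Create an anonymous, memory-backed file of 'size' bytes; -1 on error. */
int guest_mem_create(off_t size)
{
	int fd = memfd_create("guest-mem", MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}

/* Allocate the pages backing a range up front (fallocate mode 0). */
int guest_mem_populate(const struct guest_mem_range *r)
{
	return fallocate(r->fd, 0, r->offset, (off_t)r->length);
}

/* Selectively free a range: its pages are released, the file size is kept. */
int guest_mem_discard(const struct guest_mem_range *r)
{
	return fallocate(r->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 r->offset, (off_t)r->length);
}
```

In this sketch a VMM would call guest_mem_create() once, check each range with range_is_valid() before handing it to KVM, and call guest_mem_discard() when it no longer needs a range's memory.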
> >
> > If your memory could be migrated, that would be some reason to use
> > filesystem page cache (because page migration happens to understand
> > that type of memory): but it cannot be migrated.
>
> Migration support is in pipeline. It is part of TDX 1.5 [1].

And this isn't intended for just TDX (or SNP, or pKVM). We're not _that_ far off
from being able to use UPM for "regular" VMs as a way to provide defense-in-depth
without having to take on the overhead of confidential VMs. At that point,
migration and probably even swap are on the table.

> And swapping theoretically possible, but I'm not aware of any plans as of
> now.

Ya, I highly doubt confidential VMs will ever bother with swap.

> > I'm afraid of the special demands you may make of memory allocation
> > later on - surprised that huge pages are not mentioned already;
> > gigantic contiguous extents? secretmem removed from direct map?
>
> The design allows for extension to hugetlbfs if needed. Combination of
> MFD_INACCESSIBLE | MFD_HUGETLB should route this way. There should be zero
> implications for shmem. It is going to be separate struct memfile_backing_store.
>
> I'm not sure secretmem is a fit here as we want to extend MFD_INACCESSIBLE
> to be movable if platform supports it and secretmem is not migratable by
> design (without direct mapping fragmentations).

But secretmem _could_ be a fit. If a use case wants to unmap guest private memory
from both userspace and the kernel, then KVM should absolutely be able to support
that, but at the same time I don't want to have to update KVM to enable secretmem
(and I definitely don't want KVM poking into the direct map itself).

MFD_INACCESSIBLE should only say "this memory can't be mapped into userspace";
any other properties should be completely separate. For example, the inability
to migrate pages is effectively a restriction from KVM (acting on behalf of
TDX/SNP); it's not a fundamental property of MFD_INACCESSIBLE.
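The argument that MFD_INACCESSIBLE should mean only "no userspace mapping", with every other property requested separately, can be illustrated with a small flag model. Everything in it is hypothetical: the MEM_PROP_* bits, struct backing_store_sketch and the helper functions are invented for illustration and are neither kernel flags nor the series' struct memfile_backing_store. The sketch keeps three ideas apart: the backing store is chosen from backing properties (huge pages, secretmem-style removal from the direct map), the "inaccessible" bit plays no part in that choice, and unmovability is added by the consumer (KVM acting for TDX/SNP) only when the platform cannot migrate pages.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical property bits, for illustration only. These are NOT kernel
 * flags and do not mirror the series' uAPI.
 */
enum mem_prop {
	MEM_PROP_NO_USER_MAP   = 1 << 0, /* the one promise MFD_INACCESSIBLE makes */
	MEM_PROP_NO_DIRECT_MAP = 1 << 1, /* secretmem-style: not in the direct map */
	MEM_PROP_HUGE          = 1 << 2, /* backed by huge pages (cf. MFD_HUGETLB) */
	MEM_PROP_UNMOVABLE     = 1 << 3, /* pages must not be migrated */
};

/* Hypothetical backing-store descriptor; not struct memfile_backing_store. */
struct backing_store_sketch {
	const char *name;
};

static const struct backing_store_sketch shmem_store   = { "shmem" };
static const struct backing_store_sketch hugetlb_store = { "hugetlbfs" };
static const struct backing_store_sketch secret_store  = { "secretmem" };

/*
 * Pick a backing store from backing properties only. NO_USER_MAP plays no
 * part in the choice, so "inaccessible" combines with any store.
 */
const struct backing_store_sketch *select_store(unsigned int file_props)
{
	if (file_props & MEM_PROP_NO_DIRECT_MAP)
		return &secret_store;
	if (file_props & MEM_PROP_HUGE)
		return &hugetlb_store;
	return &shmem_store;
}

/*
 * Restriction added by the consumer (KVM acting for TDX/SNP), not by the
 * file: pages are pinned only when the platform cannot migrate them.
 */
unsigned int consumer_restrictions(bool platform_can_migrate)
{
	return platform_can_migrate ? 0u : (unsigned int)MEM_PROP_UNMOVABLE;
}

/* Effective behaviour: what the file promises plus what the consumer demands. */
unsigned int effective_props(unsigned int file_props, bool platform_can_migrate)
{
	/* In this sketch the file itself never claims to be unmovable. */
	assert((file_props & MEM_PROP_UNMOVABLE) == 0);
	return file_props | consumer_restrictions(platform_can_migrate);
}
```

For example, effective_props(MEM_PROP_NO_USER_MAP, false) yields MEM_PROP_NO_USER_MAP | MEM_PROP_UNMOVABLE, while the same file on a platform that supports migration stays merely inaccessible; neither case changes what the file itself promises.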