Date: Wed, 17 Aug 2022 22:40:12 -0700 (PDT)
From: Hugh Dickins
To: Chao Peng
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
    qemu-devel@nongnu.org, linux-kselftest@vger.kernel.org,
    Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
    Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
    "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields",
    Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price,
    "Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang,
    "Kirill A . Shutemov", luto@kernel.org, jun.nakajima@intel.com,
    dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
    aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
    Quentin Perret, Michael Roth, mhocko@suse.com, Muchun Song,
    "Gupta, Pankaj"
Subject: Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting
    KVM guest private memory
In-Reply-To: <20220706082016.2603916-1-chao.p.peng@linux.intel.com>
References: <20220706082016.2603916-1-chao.p.peng@linux.intel.com>

On Wed, 6 Jul 2022, Chao Peng wrote:
> This is the v7 of this series which tries to implement the fd-based KVM
> guest private memory.

Here at last are my reluctant thoughts on this patchset.

fd-based approach for supporting KVM guest private memory: fine.

Use or abuse of memfd and shmem.c: mistaken.

memfd_create() was an excellent way to put together the initial prototype.

But since then, TDX in particular has forced an effort into preventing
(by flags, seals, notifiers) almost everything that makes it shmem/tmpfs.

Are any of the shmem.c mods useful to existing users of shmem.c? No.
Is MFD_INACCESSIBLE useful or comprehensible to memfd_create() users? No.

What use do you have for a filesystem here? Almost none.
IIUC, what you want is an fd through which QEMU can allocate kernel
memory, selectively free that memory, and communicate fd+offset+length
to KVM.  And perhaps an interface to initialize a little of that memory
from a template (presumably copied from a real file on disk somewhere).

You don't need shmem.c or a filesystem for that!

If your memory could be swapped, that would be enough of a good reason
to make use of shmem.c: but it cannot be swapped; and although there
are some references in the mailthreads to it perhaps being swappable
in future, I get the impression that will not happen soon if ever.

If your memory could be migrated, that would be some reason to use
filesystem page cache (because page migration happens to understand
that type of memory): but it cannot be migrated.

Some of these impressions may come from earlier iterations of the
patchset (v7 looks better in several ways than v5).
I am probably underestimating the extent to which you have taken on
board other usages beyond TDX and SEV private memory, and rightly want
to serve them all with similar interfaces: perhaps there is enough
justification for shmem there, but I don't see it.  There was mention
of userfaultfd in one link: does that provide the justification for
using shmem?

I'm afraid of the special demands you may make of memory allocation
later on - surprised that huge pages are not mentioned already;
gigantic contiguous extents? secretmem removed from direct map?

Here's what I would prefer, and imagine much easier for you to maintain;
but I'm no system designer, and may be misunderstanding throughout.

QEMU gets fd from opening /dev/kvm_something, uses ioctls (or perhaps
the fallocate syscall interface itself) to allocate and free the memory,
ioctl for initializing some of it too.  KVM in control of whether that
fd can be read or written or mmap'ed or whatever, no need to prevent it
in shmem.c, no need for flags, seals, notifications to and fro because
KVM is already in control and knows the history.  If shmem actually has
value, call into it underneath - somewhat like SysV SHM, and /dev/zero
mmap, and i915/gem make use of it underneath.  If shmem has nothing to
add, just allocate and free kernel memory directly, recorded in your
own xarray.

With that /dev/kvm_something subject to access controls and LSMs -
which I cannot find for memfd_create().

Full marks for including the MFD_INACCESSIBLE manpage update, and for
Cc'ing linux-api: but I'd have expected some doubts from that direction
already.

Hugh
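
To make the bookkeeping in the preferred design concrete, here is a
minimal userspace model, not kernel code and not taken from the
patchset. It only illustrates the four operations the message names:
allocate a range, selectively free a range, initialize some of it from
a template, and resolve an offset+length handed to the consumer. The
class name, method names, and the 4096-byte page size are all
assumptions made for illustration; a plain dict keyed by page index
stands in for the xarray.

```python
PAGE_SIZE = 4096  # assumed page size for this model


class PrivateMemModel:
    """Toy per-fd page store keyed by page index (dict stands in for an xarray).

    Every name here is hypothetical; this is not an existing kernel interface.
    """

    def __init__(self):
        self.pages = {}  # page index -> bytearray(PAGE_SIZE)

    @staticmethod
    def _page_range(offset, length):
        # Only whole, page-aligned ranges are accepted in this model.
        if offset % PAGE_SIZE or length % PAGE_SIZE or length <= 0:
            raise ValueError("offset/length must be page-aligned, length > 0")
        first = offset // PAGE_SIZE
        return range(first, first + length // PAGE_SIZE)

    def allocate(self, offset, length):
        """Back [offset, offset+length) with zeroed pages; keep existing ones."""
        for idx in self._page_range(offset, length):
            self.pages.setdefault(idx, bytearray(PAGE_SIZE))

    def free(self, offset, length):
        """Selectively free a range, analogous to punching a hole."""
        for idx in self._page_range(offset, length):
            self.pages.pop(idx, None)

    def init_from_template(self, offset, data):
        """Copy template bytes into pages that were already allocated."""
        length = -(-len(data) // PAGE_SIZE) * PAGE_SIZE  # round up to pages
        for n, idx in enumerate(self._page_range(offset, length)):
            chunk = data[n * PAGE_SIZE:(n + 1) * PAGE_SIZE]
            self.pages[idx][:len(chunk)] = chunk  # KeyError if not allocated

    def resolve(self, offset, length):
        """Consumer side of (fd, offset, length): KeyError if a page is missing."""
        return [self.pages[idx] for idx in self._page_range(offset, length)]
```

In this model the owner of the store decides what each operation may
do, so nothing outside it needs flags, seals, or notifiers to learn
which ranges are populated: a `resolve()` over a freed range simply
fails.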