From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Vlastimil Babka
Cc: "Huang, Kai", chao.p.peng@linux.intel.com, tglx@linutronix.de, linux-arch@vger.kernel.org, kvm@vger.kernel.org, jmattson@google.com, "Hocko, Michal", pbonzini@redhat.com, ak@linux.intel.com, "Lutomirski, Andy", linux-fsdevel@vger.kernel.org, tabba@google.com, david@redhat.com, michael.roth@amd.com, kirill.shutemov@linux.intel.com, corbet@lwn.net, qemu-devel@nongnu.org, dhildenb@redhat.com, bfields@fieldses.org, linux-kernel@vger.kernel.org, x86@kernel.org, bp@alien8.de, ddutile@redhat.com, rppt@kernel.org, shuah@kernel.org, vkuznets@redhat.com, mail@maciej.szmigiero.name, naoya.horiguchi@nec.com, qperret@google.com, arnd@arndb.de, linux-api@vger.kernel.org, yu.c.zhang@linux.intel.com, "Christopherson, Sean", wanpengli@tencent.com, vannapurve@google.com, hughd@google.com, aarcange@redhat.com, mingo@redhat.com, hpa@zytor.com, "Nakajima, Jun", jlayton@kernel.org, joro@8bytes.org, linux-mm@kvack.org, "Wang, Wei W", steven.price@arm.com, linux-doc@vger.kernel.org, "Hansen, Dave", akpm@linux-foundation.org, linmiaohe@huawei.com
Subject: Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory
Date: Mon, 23 Jan 2023 18:18:03 +0300
Message-ID: <20230123151803.lwbjug6fm45olmru@box>
In-Reply-To: <010a330c-a4d5-9c1a-3212-f9107d1c5f4e@suse.cz>
References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com> <20221202061347.1070246-2-chao.p.peng@linux.intel.com> <5c6e2e516f19b0a030eae9bf073d555c57ca1f21.camel@intel.com> <20221219075313.GB1691829@chaop.bj.intel.com> <20221220072228.GA1724933@chaop.bj.intel.com> <126046ce506df070d57e6fe5ab9c92cdaf4cf9b7.camel@intel.com> <20221221133905.GA1766136@chaop.bj.intel.com> <010a330c-a4d5-9c1a-3212-f9107d1c5f4e@suse.cz>
On Mon, Jan 23, 2023 at 03:03:45PM +0100, Vlastimil Babka wrote:
> On 12/22/22 01:37, Huang, Kai wrote:
> >>> I argue that this page pinning (or page migration prevention) is not
> >>> tied to where the page comes from, instead related to how the page will
> >>> be used.
> >>> Whether the page is restrictedmem backed or GUP() backed, once
> >>> it's used by the current version of TDX then the page pinning is
> >>> needed. So such page migration prevention is really a TDX thing,
> >>> not even a KVM-generic thing (that's why I think we don't need to
> >>> change the existing logic of kvm_release_pfn_clean()).
> >>>
> > This essentially boils down to who "owns" page migration handling, and
> > sadly, page migration is kinda "owned" by the core kernel, i.e. KVM
> > cannot handle page migration by itself -- it's just a passive receiver.
> >
> > For normal pages, page migration is totally done by the core kernel
> > (i.e. it unmaps the page from the VMA, allocates a new page, and uses
> > migrate_page() or a_ops->migrate_page() to actually migrate the page).
> >
> > In the sense of TDX, conceptually it should be done in the same way. The
> > more important thing is: yes, KVM can use get_page() to prevent page
> > migration, but when KVM wants to support it, KVM cannot just remove
> > get_page(), as the core kernel will still just do migrate_page(), which
> > won't work for TDX (given restricted_memfd doesn't have
> > a_ops->migrate_page() implemented).
> >
> > So I think the restricted_memfd filesystem should own page migration
> > handling (i.e. by implementing a_ops->migrate_page() to either just
> > reject page migration or somehow support it).
>
> While this thread seems to be settled on refcounts already, just wanted
> to point out that it wouldn't be ideal to prevent migrations by
> a_ops->migrate_page() rejecting them. It would mean cputime wasted (i.e.
> by memory compaction) by isolating the pages for migration and then
> releasing them after the callback rejects it (at least we wouldn't waste
> time creating and undoing migration entries in the userspace page tables
> as there's no mmap). Elevated refcount on the other hand is detected
> very early in compaction so no isolation is attempted, so from that
> aspect it's optimal.

Hm.
Do we need a new hook in a_ops to check if the page is migratable before
going down the longer path to migrate_page()? Or maybe add AS_UNMOVABLE?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
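[Editor's sketch, for context: the rejection approach discussed above could look
roughly like the following. This is hypothetical illustration, not code from the
patchset; the callback is called migrate_page here to match the thread's naming,
though the exact a_ops field name and signature vary across kernel versions.]

```c
/*
 * Hypothetical sketch: a restrictedmem address_space_operations
 * implementation that refuses migration outright.  Note the downside
 * Vlastimil points out: by the time this callback runs, compaction has
 * already isolated the page, so that isolation work is wasted on every
 * attempt.
 */
static int restrictedmem_migrate_page(struct address_space *mapping,
				      struct page *newpage,
				      struct page *page,
				      enum migrate_mode mode)
{
	/* Pages handed to TDX cannot be relocated; refuse migration. */
	return -EBUSY;
}

static const struct address_space_operations restrictedmem_aops = {
	.migrate_page	= restrictedmem_migrate_page,
};
```

[By contrast, an elevated refcount, e.g. an extra get_page() taken when the page
is given to TDX, is caught by compaction's early isolation checks before any
isolation happens, which is the advantage Vlastimil describes.]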