Message-ID: <010a330c-a4d5-9c1a-3212-f9107d1c5f4e@suse.cz>
Date: Mon, 23 Jan 2023 15:03:45 +0100
From: Vlastimil Babka
Subject: Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory
To: "Huang, Kai", chao.p.peng@linux.intel.com
Cc: tglx@linutronix.de, linux-arch@vger.kernel.org, kvm@vger.kernel.org,
    jmattson@google.com, "Hocko, Michal", pbonzini@redhat.com,
    ak@linux.intel.com, "Lutomirski, Andy", linux-fsdevel@vger.kernel.org,
    tabba@google.com, david@redhat.com, michael.roth@amd.com,
    kirill.shutemov@linux.intel.com, corbet@lwn.net, qemu-devel@nongnu.org,
    dhildenb@redhat.com, bfields@fieldses.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, bp@alien8.de, ddutile@redhat.com, rppt@kernel.org,
    shuah@kernel.org, vkuznets@redhat.com, mail@maciej.szmigiero.name,
    naoya.horiguchi@nec.com, qperret@google.com, arnd@arndb.de,
    linux-api@vger.kernel.org, yu.c.zhang@linux.intel.com,
    "Christopherson,, Sean", wanpengli@tencent.com, vannapurve@google.com,
    hughd@google.com, aarcange@redhat.com, mingo@redhat.com, hpa@zytor.com,
    "Nakajima, Jun", jlayton@kernel.org, joro@8bytes.org, linux-mm@kvack.org,
    "Wang, Wei W", steven.price@arm.com, linux-doc@vger.kernel.org,
    "Hansen, Dave", akpm@linux-foundation.org, linmiaohe@huawei.com
References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com>
    <20221202061347.1070246-2-chao.p.peng@linux.intel.com>
    <5c6e2e516f19b0a030eae9bf073d555c57ca1f21.camel@intel.com>
    <20221219075313.GB1691829@chaop.bj.intel.com>
    <20221220072228.GA1724933@chaop.bj.intel.com>
    <126046ce506df070d57e6fe5ab9c92cdaf4cf9b7.camel@intel.com>
    <20221221133905.GA1766136@chaop.bj.intel.com>

On 12/22/22 01:37, Huang, Kai wrote:
>>> I argue that this page pinning (or page migration prevention) is not
>>> tied to where the page comes from, but rather to how the page will
>>> be used. Whether the page is restrictedmem-backed or GUP()-backed, once
>>> it's used by the current version of TDX, page pinning is needed. So
>>> such page migration prevention is really a TDX thing, not even a
>>> generic KVM thing (that's why I think we don't need to change the
>>> existing logic of kvm_release_pfn_clean()).
>>>
> This essentially boils down to who "owns" page migration handling, and sadly,
> page migration is kinda "owned" by the core-kernel, i.e. KVM cannot handle page
> migration by itself; it's just a passive receiver.
>
> For normal pages, page migration is done entirely by the core-kernel (i.e. it
> unmaps the page from the VMA, allocates a new page, and uses migrate_page() or
> a_ops->migrate_page() to actually migrate the page).
> For TDX, conceptually it should be done in the same way. The more
> important thing is: yes, KVM can use get_page() to prevent page migration, but
> when KVM wants to support migration, KVM cannot just remove get_page(), as the
> core-kernel will still just do migrate_page(), which won't work for TDX (given
> restricted_memfd doesn't have a_ops->migrate_page() implemented).
>
> So I think the restricted_memfd filesystem should own page migration handling
> (i.e. by implementing a_ops->migrate_page() to either just reject page migration
> or somehow support it).

While this thread seems to have settled on refcounts already, I just wanted
to point out that it wouldn't be ideal to prevent migrations by having
a_ops->migrate_page() reject them. It would mean wasted CPU time (i.e. in
memory compaction) isolating the pages for migration and then releasing
them after the callback rejects the migration (at least we wouldn't waste
time creating and undoing migration entries in the userspace page tables,
as there's no mmap). An elevated refcount, on the other hand, is detected
very early in compaction, so no isolation is attempted; from that aspect
it's optimal.