Date: Tue, 20 Dec 2022 15:22:28 +0800
From: Chao Peng
To: "Huang, Kai"
Cc: "tglx@linutronix.de", "linux-arch@vger.kernel.org", "kvm@vger.kernel.org",
 "Wang, Wei W", "jmattson@google.com", "Lutomirski, Andy",
 "ak@linux.intel.com", "kirill.shutemov@linux.intel.com", "david@redhat.com",
 "qemu-devel@nongnu.org", "tabba@google.com", "Hocko, Michal",
 "michael.roth@amd.com", "corbet@lwn.net", "linux-fsdevel@vger.kernel.org",
 "dhildenb@redhat.com", "bfields@fieldses.org", "linux-kernel@vger.kernel.org",
 "x86@kernel.org", "bp@alien8.de", "vannapurve@google.com", "rppt@kernel.org",
 "shuah@kernel.org", "vkuznets@redhat.com", "vbabka@suse.cz",
 "mail@maciej.szmigiero.name", "linux-api@vger.kernel.org",
 "qperret@google.com", "arnd@arndb.de", "pbonzini@redhat.com",
 "ddutile@redhat.com", "naoya.horiguchi@nec.com", "Christopherson,, Sean",
 "wanpengli@tencent.com", "yu.c.zhang@linux.intel.com", "hughd@google.com",
 "aarcange@redhat.com", "mingo@redhat.com", "hpa@zytor.com",
 "Nakajima, Jun", "jlayton@kernel.org", "joro@8bytes.org",
 "linux-mm@kvack.org", "steven.price@arm.com", "Hansen, Dave",
 "linux-doc@vger.kernel.org", "akpm@linux-foundation.org",
 "linmiaohe@huawei.com"
Subject: Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory
Message-ID: <20221220072228.GA1724933@chaop.bj.intel.com>
Reply-To: Chao Peng
References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com>
 <20221202061347.1070246-2-chao.p.peng@linux.intel.com>
 <5c6e2e516f19b0a030eae9bf073d555c57ca1f21.camel@intel.com>
 <20221219075313.GB1691829@chaop.bj.intel.com>
X-Mailing-List: linux-api@vger.kernel.org

On Mon, Dec 19, 2022 at 08:48:10AM +0000, Huang, Kai wrote:
> On Mon, 2022-12-19 at 15:53 +0800, Chao
Peng wrote:
> >
> > [...]
> >
> > > > +
> > > > +	/*
> > > > +	 * These pages are currently unmovable so don't place them into movable
> > > > +	 * pageblocks (e.g. CMA and ZONE_MOVABLE).
> > > > +	 */
> > > > +	mapping = memfd->f_mapping;
> > > > +	mapping_set_unevictable(mapping);
> > > > +	mapping_set_gfp_mask(mapping,
> > > > +			     mapping_gfp_mask(mapping) & ~__GFP_MOVABLE);
> > >
> > > But, IIUC, removing the __GFP_MOVABLE flag here only makes page allocation
> > > come from non-movable zones; it doesn't necessarily prevent the page from
> > > being migrated.  My first glance is that you need to either implement
> > > a_ops->migrate_folio() or just call get_page() after faulting in the page
> > > to prevent that.
> >
> > The current API restrictedmem_get_page() already does this: after the
> > caller invokes it, the caller holds a reference to the page, and then
> > decides when to call put_page() appropriately.
>
> I tried to dig up some history.  Perhaps I am missing something, but it seems
> Kirill said in v9 that this code doesn't prevent page migration, and that we
> need to increase the page refcount in restrictedmem_get_page():
>
> https://lore.kernel.org/linux-mm/20221129112139.usp6dqhbih47qpjl@box.shutemov.name/
>
> But looking at this series it seems restrictedmem_get_page() in this v10 is
> identical to the one in v9 (except v10 uses 'folio' instead of 'page')?

restrictedmem_get_page() was changed to increase the page refcount several
versions ago, so no change is needed in v10.  You probably missed my reply:

https://lore.kernel.org/linux-mm/20221129135844.GA902164@chaop.bj.intel.com/

The current solution is clear: unless we have a better approach, we will let
the restrictedmem user (KVM in this case) hold the refcount to prevent page
migration.

Thanks,
Chao

> Anyway, if this is not fixed, then it should be fixed.  Otherwise, a comment
> at the place where the page refcount is increased would help people
> understand that page migration is actually prevented.