From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28B7DC433DB for ; Tue, 26 Jan 2021 09:20:32 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B12D923109 for ; Tue, 26 Jan 2021 09:20:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B12D923109 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 82DAE100EBB95; Tue, 26 Jan 2021 01:20:31 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=198.145.29.99; helo=mail.kernel.org; envelope-from=rppt@kernel.org; receiver= Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 7250D100EBB94 for ; Tue, 26 Jan 2021 01:20:29 -0800 (PST) Received: by mail.kernel.org (Postfix) with ESMTPSA id E799F23104; Tue, 26 Jan 2021 09:20:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1611652828; bh=bRC3SR9LJ18SWaWmVRerAdHAiKyQlnV1tKBjwhOnPS8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=aMudgeWS8EFR4uZOodLooioJQHSO0U8pMo3JByI0WkzeCLds2rL8dDBKPdMIET6wU 
NikHEJLu6mMPo3tSd4JLNAkh4YqDqIqAawEF4n7/5E0DpJcUj6JlJzfXjlARC2mrq+ CruSYC+Gj4FeoDXNsygqJGE7OxywyVzS/Nn8h+qL2ObhBOyIjfhuWkMbt9RXhvqlR9 sIvn3CEW/XJ7CY9dZPZIilb4D2ZP3U4aA+EJ7+3rtsmP4/R1ZB/lnI26RwNUiGucdL kTLxonDh/eLTOLxPU2kPm5Hg1de9FuFM6fyMMsw0V869lvcbyEy9tpZfl7U/HLU3MC 6HV9bZva2SpZg== Date: Tue, 26 Jan 2021 11:20:11 +0200 From: Mike Rapoport To: Michal Hocko Subject: Re: [PATCH v16 06/11] mm: introduce memfd_secret system call to create "secret" memory areas Message-ID: <20210126092011.GP6332@kernel.org> References: <20210121122723.3446-1-rppt@kernel.org> <20210121122723.3446-7-rppt@kernel.org> <20210125170122.GU827@dhcp22.suse.cz> <20210125213618.GL6332@kernel.org> <20210126071614.GX827@dhcp22.suse.cz> <20210126083311.GN6332@kernel.org> <20210126090013.GF827@dhcp22.suse.cz> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20210126090013.GF827@dhcp22.suse.cz> Message-ID-Hash: 2TA6WM3YDYQGXXWUOHLJCTMTYQQSSG3K X-Message-ID-Hash: 2TA6WM3YDYQGXXWUOHLJCTMTYQQSSG3K X-MailFrom: rppt@kernel.org X-Mailman-Rule-Hits: nonmember-moderation X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation CC: Andrew Morton , Alexander Viro , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Catalin Marinas , Christopher Lameter , Dave Hansen , David Hildenbrand , Elena Reshetova , "H. Peter Anvin" , Ingo Molnar , James Bottomley , "Kirill A. 
Shutemov" , Matthew Wilcox , Mark Rutland , Mike Rapoport , Michael Kerrisk , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , Rick Edgecombe , Roman Gushchin , Shakeel Butt , Shuah Khan , Thomas Gleixner , Tycho Andersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer , Palmer Dabbelt
X-Mailman-Version: 3.1.1
Precedence: list
List-Id: "Linux-nvdimm developer list."
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Tue, Jan 26, 2021 at 10:00:13AM +0100, Michal Hocko wrote:
> On Tue 26-01-21 10:33:11, Mike Rapoport wrote:
> > On Tue, Jan 26, 2021 at 08:16:14AM +0100, Michal Hocko wrote:
> > > On Mon 25-01-21 23:36:18, Mike Rapoport wrote:
> > > > On Mon, Jan 25, 2021 at 06:01:22PM +0100, Michal Hocko wrote:
> > > > > On Thu 21-01-21 14:27:18, Mike Rapoport wrote:
> > > > > > From: Mike Rapoport
> > > > > >
> > > > > > Introduce "memfd_secret" system call with the ability to create memory
> > > > > > areas visible only in the context of the owning process and mapped
> > > > > > neither to other processes nor in the kernel page tables.
> > > > > >
> > > > > > The user will create a file descriptor using the memfd_secret() system
> > > > > > call. The memory areas created by mmap() calls from this file descriptor
> > > > > > will be unmapped from the kernel direct map and they will be only mapped in
> > > > > > the page table of the owning mm.
> > > > > >
> > > > > > The secret memory remains accessible in the process context using uaccess
> > > > > > primitives, but it is not accessible using direct/linear map addresses.
> > > > > >
> > > > > > Functions in the follow_page()/get_user_page() family will refuse to return
> > > > > > a page that belongs to the secret memory area.
> > > > > >
> > > > > > A page that was a part of the secret memory area is cleared when it is
> > > > > > freed.
> > > > > >
> > > > > > The following example demonstrates creation of a secret mapping (error
> > > > > > handling is omitted):
> > > > > >
> > > > > > 	fd = memfd_secret(0);
> > > > > > 	ftruncate(fd, MAP_SIZE);
> > > > > > 	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > > > >
> > > > > I do not see any access control or permission model for this feature.
> > > > > Is this feature generally safe for anybody?
> > > >
> > > > The mappings obey the memlock limit. Besides, this feature should be enabled
> > > > explicitly at boot with the kernel parameter that says what is the maximal
> > > > memory size secretmem can consume.
> > >
> > > Why is such a model sufficient and future proof? I mean even when it has
> > > to be enabled by an admin it is still an all-or-nothing approach. Mlock
> > > limit is not really useful because it is per mm rather than per user.
> > >
> > > Is there any reason why this is allowed for non-privileged processes?
> > > Maybe this has been discussed in the past but is there any reason why
> > > this cannot be done by a special device which would allow providing at
> > > least some permission policy?
> >
> > Why should this not be allowed for non-privileged processes? This behaves
> > similarly to mlocked memory, so I don't see a reason why secretmem should
> > have a different permission model.
>
> Because apart from the reclaim aspect it fragments the direct mapping
> IIUC. That might have an impact on all others, right?

It does fragment the direct map, but first, it only splits 1G pages into 2M
pages and, as was discussed several times already, it's not that clear which
page size in the direct map is the best; this is very much workload
dependent.
These are the results of the benchmarks I've run with the default direct
mapping covered with 1G pages, with disabled 1G pages using "nogbpages" in
the kernel command line and with the entire direct map forced to use 4K
pages using a simple patch to arch/x86/mm/init.c.

https://docs.google.com/spreadsheets/d/1tdD-cu8e93vnfGsTFxZ5YdaEfs2E1GELlvWNOGkJV2U/edit?usp=sharing

-- 
Sincerely yours,
Mike.
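
For readers who want to try the three-call example quoted from the patch
description, here is a minimal sketch with the error handling filled in. It
is not code from the series. The helper name secret_map() is hypothetical,
and the fallback syscall number 447 is an assumption that matches the number
later assigned in mainline for x86 and the generic syscall table; prefer the
SYS_memfd_secret definition from your own headers when it exists. As noted
in the thread, the feature may have to be enabled at boot, so a failure such
as ENOSYS is an expected outcome on many systems.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: fallback number for headers that predate memfd_secret. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/*
 * Hypothetical helper: create a secret memory area of `size` bytes and
 * map it read/write. Returns the mapping on success, or NULL with errno
 * set by the step that failed.
 */
void *secret_map(size_t size)
{
	void *ptr;
	int saved;
	int fd;

	/* Step 1: create the file descriptor; no flags are passed. */
	fd = (int)syscall(SYS_memfd_secret, 0U);
	if (fd < 0)
		return NULL;	/* e.g. ENOSYS if secretmem is unavailable */

	/* Step 2: size the area before mapping it. */
	if (ftruncate(fd, (off_t)size) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return NULL;
	}

	/* Step 3: map it; the mapping counts against the memlock limit. */
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);		/* an established mapping outlives the fd */
	if (ptr == MAP_FAILED) {
		errno = saved;
		return NULL;
	}
	return ptr;
}
```

A caller would store key material through the returned pointer, wipe it
with memset() when done, and release the area with munmap(ptr, size).
Returning NULL rather than MAP_FAILED is only a convenience of this sketch.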