From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60A33C433E0 for ; Tue, 9 Feb 2021 13:17:17 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D940264ECF for ; Tue, 9 Feb 2021 13:17:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D940264ECF Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 98D2E100EAB61; Tue, 9 Feb 2021 05:17:16 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=195.135.220.15; helo=mx2.suse.de; envelope-from=mhocko@suse.com; receiver= Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 9474B100EAB4A for ; Tue, 9 Feb 2021 05:17:14 -0800 (PST) X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1612876633; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=hscVwXWbZS59k2Mrx2EQB55XPrmNAr7Nf/1W616aiRA=; 
b=phJYgl2FDdsaxXzglirq+9E/PFvQ09SmMOXMZhmQCkX3xnUui6gyQgU8pq6VCgFZRP1Nj7 4ywYD8QcYFjpmTT73fbyJ99EI01XOD13cuJJ02YYNRiktCaeLBu0orc/YuPfY4sb+PIX+n y3DWzu+Qklmtk5MHIoO/Fx8xciQgqDo= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 9362BAD6A; Tue, 9 Feb 2021 13:17:12 +0000 (UTC) Date: Tue, 9 Feb 2021 14:17:11 +0100 From: Michal Hocko To: Mike Rapoport Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas Message-ID: References: <20210208084920.2884-1-rppt@kernel.org> <20210208084920.2884-8-rppt@kernel.org> <20210208212605.GX242749@kernel.org> <20210209090938.GP299309@linux.ibm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20210209090938.GP299309@linux.ibm.com> Message-ID-Hash: 3RLR2N6RTIZ2KHLLB65GTMIL74EQVI6T X-Message-ID-Hash: 3RLR2N6RTIZ2KHLLB65GTMIL74EQVI6T X-MailFrom: mhocko@suse.com X-Mailman-Rule-Hits: nonmember-moderation X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation CC: Mike Rapoport , Andrew Morton , Alexander Viro , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Catalin Marinas , Christopher Lameter , Dave Hansen , David Hildenbrand , Elena Reshetova , "H. Peter Anvin" , Ingo Molnar , James Bottomley , "Kirill A. Shutemov" , Matthew Wilcox , Mark Rutland , Michael Kerrisk , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , Rick Edgecombe , Roman Gushchin , Shakeel Butt , Shuah Khan , Thomas Gleixner , Tycho And ersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer , Palmer Dabbelt X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." 
Archived-At:
List-Archive:
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Tue 09-02-21 11:09:38, Mike Rapoport wrote:
> On Tue, Feb 09, 2021 at 09:47:08AM +0100, Michal Hocko wrote:
> > On Mon 08-02-21 23:26:05, Mike Rapoport wrote:
> > > On Mon, Feb 08, 2021 at 11:49:22AM +0100, Michal Hocko wrote:
> > > > On Mon 08-02-21 10:49:17, Mike Rapoport wrote:
> > [...]
> > > > > The file descriptor based memory has several advantages over the
> > > > > "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). It
> > > > > paves the way for VMMs to remove the secret memory range from the process;
> > > >
> > > > I do not understand how it helps to remove the memory from the process
> > > > as the interface explicitly allows to add a memory that is removed from
> > > > all other processes via direct map.
> > >
> > > The current implementation does not help to remove the memory from the
> > > process, but using fd-backed memory seems a better interface to remove
> > > guest memory from host mappings than mmap. As Andy nicely put it:
> > >
> > > "Getting fd-backed memory into a guest will take some possibly major work in
> > > the kernel, but getting vma-backed memory into a guest without mapping it
> > > in the host user address space seems much, much worse."
> >
> > OK, so IIUC this means that the model is to hand over memory from host
> > to guest. I thought the guest would be under control of its address
> > space and therefore it operates on the VMAs. This would benefit from
> > an additional and more specific clarification.
>
> How guest would operate on VMAs if the interface between host and guest is
> virtual hardware?

I have to say that I am not really familiar with this area so my view
might be misleading or completely wrong. I thought that the HW address
ranges are mapped to the guest process and therefore have a VMA.

> If you mean qemu (or any other userspace part of VMM that uses KVM), so one
> of the points Andy mentioned back than is to remove mappings of the guest
> memory from the qemu process.
>
> > > > > As secret memory implementation is not an extension of tmpfs or hugetlbfs,
> > > > > usage of a dedicated system call rather than hooking new functionality into
> > > > > memfd_create(2) emphasises that memfd_secret(2) has different semantics and
> > > > > allows better upwards compatibility.
> > > >
> > > > What is this supposed to mean? What are differences?
> > >
> > > Well, the phrasing could be better indeed. That supposed to mean that
> > > they differ in the semantics behind the file descriptor: memfd_create
> > > implements sealing for shmem and hugetlbfs while memfd_secret implements
> > > memory hidden from the kernel.
> >
> > Right but why memfd_create model is not sufficient for the usecase?
> > Please note that I am arguing against. To be honest I do not really care
> > much. Using an existing scheme is usually preferable from my POV but
> > there might be real reasons why shmem as a backing "storage" is not
> > appropriate.
>
> Citing my older email:
>
> I've hesitated whether to continue to use new flags to memfd_create() or to
> add a new system call and I've decided to use a new system call after I've
> started to look into man pages update. There would have been two completely
> independent descriptions and I think it would have been very confusing.

Could you elaborate? Unmapping from the kernel address space can work
both for sealed or hugetlb memfds, no? Those features are completely
orthogonal AFAICS. With a dedicated syscall you will need to introduce
this functionality on top if that is required. Have you considered that?
I mean hugetlb pages are used to back guest memory very often. Is this
something that will be a secret memory usecase?

Please be really specific when giving arguments to back a new syscall
decision.
-- 
Michal Hocko
SUSE Labs