From: David Hildenbrand
Organization: Red Hat GmbH
To: Michal Hocko, Mike Rapoport
Cc: Mike Rapoport, Andrew Morton, Alexander Viro, Andy Lutomirski,
 Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christopher Lameter,
 Dan Williams, Dave Hansen, Elena Reshetova, "H. Peter Anvin",
 Ingo Molnar, James Bottomley, "Kirill A. Shutemov", Matthew Wilcox,
 Mark Rutland, Michael Kerrisk, Palmer Dabbelt, Paul Walmsley,
 Peter Zijlstra, Rick Edgecombe, Roman Gushchin, Shakeel Butt,
 Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer,
 Palmer Dabbelt
Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Thu, 11 Feb 2021 10:01:32 +0100
Message-ID: <0d66baec-1898-987b-7eaf-68a015c027ff@redhat.com>
References: <20210208084920.2884-1-rppt@kernel.org>
 <20210208084920.2884-8-rppt@kernel.org>
 <20210208212605.GX242749@kernel.org>
 <20210209090938.GP299309@linux.ibm.com>
 <20210211071319.GF242749@kernel.org>

On 11.02.21 09:39, Michal Hocko wrote:
> On Thu 11-02-21 09:13:19, Mike Rapoport wrote:
>> On Tue, Feb 09, 2021 at 02:17:11PM +0100, Michal Hocko wrote:
>>> On Tue 09-02-21 11:09:38, Mike Rapoport wrote:
> [...]
>>>> Citing my older email:
>>>>
>>>>     I've hesitated whether to continue to use new flags to memfd_create() or to
>>>>     add a new system call and I've decided to use a new system call after I've
>>>>     started to look into man pages update.
>>>>     There would have been two completely
>>>>     independent descriptions and I think it would have been very confusing.
>>>
>>> Could you elaborate? Unmapping from the kernel address space can work
>>> both for sealed or hugetlb memfds, no? Those features are completely
>>> orthogonal AFAICS. With a dedicated syscall you will need to introduce
>>> this functionality on top if that is required. Have you considered that?
>>> I mean hugetlb pages are used to back guest memory very often. Is this
>>> something that will be a secret memory usecase?
>>>
>>> Please be really specific when giving arguments to back a new syscall
>>> decision.
>>
>> Isn't "syscalls have completely independent description" specific enough?
>
> No, it's not as you can see from questions I've had above. More on that
> below.
>
>> We are talking about API here, not the implementation details whether
>> secretmem supports large pages or not.
>>
>> The purpose of memfd_create() is to create a file-like access to memory.
>> The purpose of memfd_secret() is to create a way to access memory hidden
>> from the kernel.
>>
>> I don't think overloading memfd_create() with the secretmem flags because
>> they happen to return a file descriptor will be better for users, but
>> rather will be more confusing.
>
> This is quite a subjective conclusion. I could very well argue that it
> would be much better to have a single syscall to get a fd backed memory
> with specific requirements (sealing, unmapping from the kernel address
> space). Neither of us would be clearly right or wrong. A more important
> point is a future extensibility and usability, though. So let's just
> think of a few usecases I have outlined above. Is it unrealistic to expect
> that secret memory should be sealable? What about hugetlb? Because if
> the answer is no then a new API is a clear win as the combination of
> flags would never work and then we would just suffer from the syscall
> multiplexing without much gain.
> On the other hand if a combination of the
> functionality is to be expected then you will have to jam it into
> memfd_create and copy the interface likely causing more confusion. See
> what I mean?
>
> I by no means insist one way or the other but from what I have
> seen so far I have a feeling that the interface hasn't been thought
> through enough. Sure you have landed with fd based approach and that
> seems fair. But how to get that fd seems to still have some gaps IMHO.
>

I agree with Michal. This has been raised by different people already,
including on LWN (https://lwn.net/Articles/835342/).

I can follow Mike's reasoning (man page), and I am also fine if there is
a valid reason. However, IMHO the basic description seems to match quite well:

        memfd_create() creates an anonymous file and returns a file
        descriptor that refers to it. The file behaves like a regular
        file, and so can be modified, truncated, memory-mapped, and so
        on. However, unlike a regular file, it lives in RAM and has a
        volatile backing storage. Once all references to the file are
        dropped, it is automatically released. Anonymous memory is used
        for all backing pages of the file. Therefore, files created by
        memfd_create() have the same semantics as other anonymous memory
        allocations such as those allocated using mmap(2) with the
        MAP_ANONYMOUS flag.

AFAIKS, we would need MFD_SECRET and disallow
MFD_ALLOW_SEALING and MFD_HUGETLB.

In addition, we could add MFD_SECRET_NEVER_MAP, which could disallow any kind of
temporary mappings (e.g., for migration). TBC.

---

Some random thoughts regarding files.

What is the page size of secretmem memory? Sometimes we use huge pages,
sometimes we fall back to 4k pages. So I assume huge pages in general?

What are semantics of MADV()/FALLOCATE() etc on such files?
I assume PUNCH_HOLE fails in a nice way? Does it work?
Does mremap()/mremap(FIXED) work/is it blocked?
Does mprotect() fail in a nice way?
Is userfaultfd() properly fenced?
Or does it even work (doubt)?
How does it behave if I mmap(FIXED) something in between?
In which granularity can I do that (->page-size?)?
What are other granularity restrictions (->page size)?

Don't want to open a big discussion here, just some random thoughts.
Maybe it has all been already figured out and most of the answers
above are "Fails with -EINVAL".

-- 
Thanks,

David / dhildenb
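
For illustration only, the flag rules suggested above (MFD_SECRET excluding
MFD_ALLOW_SEALING and MFD_HUGETLB, plus an optional MFD_SECRET_NEVER_MAP)
could be sketched as a small validity check. This is a rough userspace model
in Python, not kernel code. MFD_SECRET and MFD_SECRET_NEVER_MAP are
hypothetical: they exist in no kernel and their bit values are placeholders.
The helper name check_memfd_flags() is also made up, and the rule that
MFD_SECRET_NEVER_MAP requires MFD_SECRET is an assumption, not something
stated in the thread.

```python
import errno

# Real memfd_create(2) flag values, as defined in <linux/memfd.h>.
MFD_CLOEXEC = 0x0001
MFD_ALLOW_SEALING = 0x0002
MFD_HUGETLB = 0x0004

# HYPOTHETICAL flags from the discussion above. They exist in no kernel;
# the bit values are placeholders chosen only for this sketch.
MFD_SECRET = 0x0100
MFD_SECRET_NEVER_MAP = 0x0200

# The sketch ignores the huge page size bits that the real syscall also
# accepts together with MFD_HUGETLB.
KNOWN_FLAGS = (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB |
               MFD_SECRET | MFD_SECRET_NEVER_MAP)


def check_memfd_flags(flags: int) -> int:
    """Return 0 if the flag combination is acceptable, else -EINVAL."""
    # Reject any bit this model does not know about.
    if flags & ~KNOWN_FLAGS:
        return -errno.EINVAL

    secret = bool(flags & MFD_SECRET)

    # Secret memory may be combined with neither sealing nor hugetlb.
    if secret and flags & (MFD_ALLOW_SEALING | MFD_HUGETLB):
        return -errno.EINVAL

    # Assumption: the "never map" modifier only makes sense for secret memory.
    if flags & MFD_SECRET_NEVER_MAP and not secret:
        return -errno.EINVAL

    return 0
```

With such rules, check_memfd_flags(MFD_SECRET | MFD_CLOEXEC) is accepted,
while MFD_SECRET | MFD_HUGETLB is one of the "Fails with -EINVAL" cases.
If sealing or hugetlb support were wanted later, only this check would have
to be relaxed, which is the extensibility point Michal raises.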