linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christopher Lameter <cl@linux.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	James Bottomley <jejb@linux.ibm.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org, Hagen Paul Pfeifer <hagen@jauu.net>,
	Palmer Dabbelt <palmerdabbelt@google.com>
Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Thu, 11 Feb 2021 10:01:32 +0100	[thread overview]
Message-ID: <0d66baec-1898-987b-7eaf-68a015c027ff@redhat.com> (raw)
In-Reply-To: <YCTtSrCEvuBug2ap@dhcp22.suse.cz>

On 11.02.21 09:39, Michal Hocko wrote:
> On Thu 11-02-21 09:13:19, Mike Rapoport wrote:
>> On Tue, Feb 09, 2021 at 02:17:11PM +0100, Michal Hocko wrote:
>>> On Tue 09-02-21 11:09:38, Mike Rapoport wrote:
> [...]
>>>> Citing my older email:
>>>>
>>>>      I've hesitated whether to continue to use new flags to memfd_create() or to
>>>>      add a new system call and I've decided to use a new system call after I've
>>>>      started to look into man pages update. There would have been two completely
>>>>      independent descriptions and I think it would have been very confusing.
>>>
>>> Could you elaborate? Unmapping from the kernel address space can work
>>> both for sealed or hugetlb memfds, no? Those features are completely
>>> orthogonal AFAICS. With a dedicated syscall you will need to introduce
>>> this functionality on top if that is required. Have you considered that?
>>> I mean hugetlb pages are used to back guest memory very often. Is this
>>> something that will be a secret memory usecase?
>>>
>>> Please be really specific when giving arguments to back a new syscall
>>> decision.
>>
>> Isn't "syscalls have completely independent description" specific enough?
> 
> No, it's not as you can see from questions I've had above. More on that
> below.
> 
>> We are talking about API here, not the implementation details whether
>> secretmem supports large pages or not.
>>
>> The purpose of memfd_create() is to create a file-like access to memory.
>> The purpose of memfd_secret() is to create a way to access memory hidden
>> from the kernel.
>>
>> I don't think overloading memfd_create() with the secretmem flags because
>> they happen to return a file descriptor will be better for users, but
>> rather will be more confusing.
> 
> This is quite a subjective conclusion. I could very well argue that it
> would be much better to have a single syscall to get a fd backed memory
> with spedific requirements (sealing, unmapping from the kernel address
> space). Neither of us would be clearly right or wrong. A more important
> point is a future extensibility and usability, though. So let's just
> think of few usecases I have outlined above. Is it unrealistic to expect
> that secret memory should be sealable? What about hugetlb? Because if
> the answer is no then a new API is a clear win as the combination of
> flags would never work and then we would just suffer from the syscall
> multiplexing without much gain. On the other hand if combination of the
> functionality is to be expected then you will have to jam it into
> memfd_create and copy the interface likely causing more confusion. See
> what I mean?
> 
> I by no means do not insist one way or the other but from what I have
> seen so far I have a feeling that the interface hasn't been thought
> through enough. Sure you have landed with fd based approach and that
> seems fair. But how to get that fd seems to still have some gaps IMHO.
> 

I agree with Michal. This has been raised by different
people already, including on LWN (https://lwn.net/Articles/835342/).

I can follow Mike's reasoning (man page), and I am also fine if there is
a valid reason. However, IMHO the basic description seems to match quite good:

        memfd_create() creates an anonymous file and returns a file descriptor that refers to it.  The
        file behaves like a regular file, and so can be modified, truncated, memory-mapped, and so on.
        However,  unlike a regular file, it lives in RAM and has a volatile backing storage.  Once all
        references to the file are dropped, it is automatically released.  Anonymous  memory  is  used
        for  all  backing pages of the file.  Therefore, files created by memfd_create() have the same
        semantics as other anonymous memory allocations such as those allocated using mmap(2) with the
        MAP_ANONYMOUS flag.

AFAIKS, we would need MFD_SECRET and disallow
MFD_ALLOW_SEALING and MFD_HUGETLB.

In addition, we could add MFD_SECRET_NEVER_MAP, which could disallow any kind of
temporary mappings (eor migration). TBC.

---

Some random thoughts regarding files.

What is the page size of secretmem memory? Sometimes we use huge pages,
sometimes we fallback to 4k pages. So I assume huge pages in general?

What are semantics of MADV()/FALLOCATE() etc on such files?
I assume PUNCH_HOLE fails in a nice way? does it work?

Does mremap()/mremap(FIXED) work/is it blocked?

Does mprotect() fail in a nice way?

Is userfaultfd() properly fenced? Or does it even work (doubt)?

How does it behave if I mmap(FIXED) something in between?
In which granularity can I do that (->page-size?)?

What are other granularity restrictions (->page size)?

Don't want to open a big discussion here, just some random thoughts.
Maybe it has all been already figured out and most of the answers
above are "Fails with -EINVAL".

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2021-02-11  9:15 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08  8:49 [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 01/10] mm: add definition of PMD_PAGE_ORDER Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 02/10] mmap: make mlock_future_check() global Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 03/10] riscv/Kconfig: make direct map manipulation options depend on MMU Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 04/10] set_memory: allow set_direct_map_*_noflush() for multiple pages Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 05/10] set_memory: allow querying whether set_direct_map_*() is actually enabled Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 06/10] arm64: kfence: fix header inclusion Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-02-08 10:49   ` Michal Hocko
2021-02-08 21:26     ` Mike Rapoport
2021-02-09  8:47       ` Michal Hocko
2021-02-09  9:09         ` Mike Rapoport
2021-02-09 13:17           ` Michal Hocko
2021-02-11  7:13             ` Mike Rapoport
2021-02-11  8:39               ` Michal Hocko
2021-02-11  9:01                 ` David Hildenbrand [this message]
2021-02-11  9:38                   ` Michal Hocko
2021-02-11  9:48                     ` David Hildenbrand
2021-02-11 10:02                     ` David Hildenbrand
2021-02-11 11:29                       ` Mike Rapoport
2021-02-11 11:27                   ` Mike Rapoport
2021-02-11 12:07                     ` David Hildenbrand
2021-02-11 23:09                       ` Mike Rapoport
2021-02-12  9:18                         ` David Hildenbrand
2021-02-14  9:19                           ` Mike Rapoport
2021-02-14  9:58                             ` David Hildenbrand
2021-02-14 19:21                               ` James Bottomley
2021-02-15  9:13                                 ` Michal Hocko
2021-02-15 18:14                                   ` James Bottomley
2021-02-15 19:20                                     ` Michal Hocko
2021-02-16 16:25                                       ` James Bottomley
2021-02-16 16:34                                         ` David Hildenbrand
2021-02-16 16:44                                           ` James Bottomley
2021-02-16 17:16                                             ` David Hildenbrand
2021-02-17 16:19                                               ` James Bottomley
2021-02-22  9:38                                                 ` David Hildenbrand
2021-02-22 10:50                                                   ` David Hildenbrand
2021-02-16 16:51                                         ` Michal Hocko
2021-02-11 11:20                 ` Mike Rapoport
2021-02-11 12:30                   ` Michal Hocko
2021-02-11 22:59                     ` Mike Rapoport
2021-02-12  9:02                       ` Michal Hocko
2021-02-08  8:49 ` [PATCH v17 08/10] PM: hibernate: disable when there are active secretmem users Mike Rapoport
2021-02-08 10:18   ` Michal Hocko
2021-02-08 10:32     ` David Hildenbrand
2021-02-08 10:51       ` Michal Hocko
2021-02-08 10:53         ` David Hildenbrand
2021-02-08 10:57           ` Michal Hocko
2021-02-08 11:13             ` David Hildenbrand
2021-02-08 11:14               ` David Hildenbrand
2021-02-08 11:26                 ` David Hildenbrand
2021-02-08 12:17                   ` Michal Hocko
2021-02-08 13:34                     ` Michal Hocko
2021-02-08 13:40                     ` David Hildenbrand
2021-02-08 21:28     ` Mike Rapoport
2021-02-22  7:34   ` Matthew Garrett
2021-02-22 10:23     ` Mike Rapoport
2021-02-22 18:27       ` Matthew Garrett
2021-02-22 19:17       ` Dan Williams
2021-02-22 19:21         ` James Bottomley
2021-02-08  8:49 ` [PATCH v17 09/10] arch, mm: wire up memfd_secret system call where relevant Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 10/10] secretmem: test: add basic selftest for memfd_secret(2) Mike Rapoport
2021-02-08  9:27 ` [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas David Hildenbrand
2021-02-08 21:13   ` Mike Rapoport
2021-02-08 21:38     ` David Hildenbrand
2021-02-09  8:59       ` Michal Hocko
2021-02-09  9:15         ` David Hildenbrand
2021-02-09  9:53           ` Michal Hocko
2021-02-09 10:23             ` David Hildenbrand
2021-02-09 10:30               ` David Hildenbrand
2021-02-09 13:25               ` Michal Hocko
2021-02-09 16:17                 ` David Hildenbrand
2021-02-09 20:08                   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0d66baec-1898-987b-7eaf-68a015c027ff@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=elena.reshetova@intel.com \
    --cc=guro@fb.com \
    --cc=hagen@jauu.net \
    --cc=hpa@zytor.com \
    --cc=jejb@linux.ibm.com \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=palmerdabbelt@google.com \
    --cc=paul.walmsley@sifive.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=rppt@linux.ibm.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).