Date: Sun, 16 Feb 2020 08:35:04 +0200
From: Mike Rapoport
To: "Kirill A. Shutemov"
Cc: Mike Rapoport, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] Restricted kernel address spaces
Message-ID: <20200216063504.GA22092@hump.haifa.ibm.com>
References: <20200206165900.GD17499@linux.ibm.com>
 <20200207173909.e5gtjys7q4ieh2fv@box>
 <20200211172047.GA24237@hump>
 <20200211215334.bftqnru57mv5bcza@box>
In-Reply-To: <20200211215334.bftqnru57mv5bcza@box>

On Wed, Feb 12, 2020 at 12:53:34AM +0300, Kirill A. Shutemov wrote:
> On Tue, Feb 11, 2020 at 07:20:47PM +0200, Mike Rapoport wrote:
> > On Fri, Feb 07, 2020 at 08:39:09PM +0300, Kirill A. Shutemov wrote:
> > > On Thu, Feb 06, 2020 at 06:59:00PM +0200, Mike Rapoport wrote:
> > > >
> > > > * "Secret" memory userspace APIs
> > > >
> > > > Should such an API follow "native" MM interfaces like mmap(),
> > > > mprotect() and madvise(), or would it be better to use a file
> > > > descriptor, e.g. like memfd_create() does?
> > >
> > > I don't really see a point in such a file descriptor. It is supposed
> > > to be very private secret data. What functionality provided by a file
> > > descriptor do you see as valuable in this scenario?
> > >
> > > A file descriptor makes it easier to spill the secrets to another
> > > process: over fork(), a UNIX socket or via /proc/PID/fd/.
> >
> > On the other hand, it may be desirable to share a secret between several
> > processes. Then a UNIX socket or fork() actually becomes handy.
>
> If more than one knows, it is secret no longer :P

But even cryptographers define "shared secret" ;-)

> > > > MM "native" APIs would require a VM_something flag and probably a
> > > > page flag or page_ext. With a file descriptor, VM_SPECIAL and custom
> > > > implementations of .mmap() and .fault() would suffice. On the other
> > > > hand, mmap() and mprotect() seem a better fit semantically and they
> > > > could be more easily adopted by userspace.
> > >
> > > You mix up implementation and interface. You can provide an interface
> > > which doesn't require a file descriptor, but still use a magic file
> > > internally to make the VMA distinct.
> >
> > If I understand correctly, if we go with the mmap(MAP_SECRET) example,
> > the mmap() would implicitly create a magic file whose .mmap() and
> > .fault() implement the protection? That's a possibility. But then, if we
> > already have a file, why not let the user get a handle for it and allow
> > fine-grained control over its sharing between processes?
>
> A proper file descriptor would have wider exposure, with security
> implications. It has to be at least scoped properly.

Agree.
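
To make the two interface styles concrete, here is a rough userspace
sketch. Neither interface exists today: MAP_SECRET comes from the
discussion above but is not a real mmap() flag, and secret_memfd() is just
a placeholder name for a memfd_create()-like call; the flag value and
syscall number below are made up.

/*
 * Illustration only: MAP_SECRET and secret_memfd() do not exist; the
 * flag value and syscall number are placeholders for the two interface
 * styles discussed above.
 */
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MAP_SECRET		0x4000000	/* hypothetical mmap() flag */
#define __NR_secret_memfd	1000		/* hypothetical syscall number */

int main(void)
{
	size_t len = 4096;

	/* Style 1: a "native" MM interface, just one more mmap() flag. */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_SECRET, -1, 0);
	if (p != MAP_FAILED)
		memset(p, 0, len);

	/*
	 * Style 2: a memfd_create()-like descriptor; sharing it with
	 * another process (fork(), SCM_RIGHTS) is an explicit action,
	 * and not sharing it is equally explicit.
	 */
	int fd = syscall(__NR_secret_memfd, 0);
	if (fd >= 0) {
		char *q = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (q != MAP_FAILED)
			memset(q, 0, len);
	}

	return 0;
}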

> > > > * Direct/linear map fragmentation
> > > >
> > > > Whenever we want to drop some mappings from the direct map, or even
> > > > change the protection bits for some memory area, the gigantic and
> > > > huge pages that comprise the direct map need to be broken up, and
> > > > there is no THP for kernel page tables to collapse them back.
> > > > Moreover, the existing API defined in <asm/set_memory.h> by several
> > > > architectures does not really presume it would be widely used.
> > > >
> > > > For the "secret" memory use-case the fragmentation can be minimized
> > > > by caching large pages, using them to satisfy smaller "secret"
> > > > allocations, and then collapsing them back once the "secret" memory
> > > > is freed. Another possibility is to pre-allocate physical memory at
> > > > boot time.
> > >
> > > I would rather go with the pre-allocation path, at least at first. We
> > > can always come up with a more dynamic and complicated solution later
> > > if the interface is widely adopted.
> >
> > We still must manage the "secret" allocations, so I don't think that the
> > dynamic solution will be much more complicated.
>
> Okay.
>
> BTW, with the clarified scope of the AMD erratum, I believe we can
> implement "collapse" for the direct mapping. Willing to try?

My initial plan was to use a pool of large pages to satisfy "secret"
allocation requests. Whenever a new large page is allocated for that pool,
it is removed from the direct map without being split into small pages, so
when it is reinstated there is no need to collapse it.

> > > > Yet another idea is to make the page allocator aware of the direct
> > > > map layout.

> --
>  Kirill A. Shutemov

-- 
Sincerely yours,
Mike.
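
For illustration only, a minimal sketch of that large page pool idea. The
gen_pool_* calls are the generic genalloc API, but secretmem_pool_init(),
secretmem_pool_refill(), remove_from_direct_map() and restore_direct_map()
are made-up names standing in for whatever arch-level interface
(set_memory/set_direct_map style helpers) ends up being used.

/*
 * Rough sketch, not existing kernel code.  A PMD-sized page is pulled
 * out of the direct map whole, so the direct map is never split, and
 * small "secret" allocations are carved out of it with genalloc.
 */
#include <linux/genalloc.h>
#include <linux/gfp.h>
#include <linux/mm.h>

#define SECRET_POOL_ORDER	(PMD_SHIFT - PAGE_SHIFT)

static struct gen_pool *secretmem_pool;

/* placeholders for the arch-specific direct map manipulation */
int remove_from_direct_map(struct page *page, unsigned int nr_pages);
void restore_direct_map(struct page *page, unsigned int nr_pages);

static int secretmem_pool_init(void)
{
	secretmem_pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
	return secretmem_pool ? 0 : -ENOMEM;
}

static int secretmem_pool_refill(void)
{
	/* Take a whole PMD-sized page so the direct map is never split. */
	struct page *page = alloc_pages(GFP_KERNEL, SECRET_POOL_ORDER);
	int err;

	if (!page)
		return -ENOMEM;

	/* The large page leaves the direct map in one piece... */
	err = remove_from_direct_map(page, 1 << SECRET_POOL_ORDER);
	if (err) {
		__free_pages(page, SECRET_POOL_ORDER);
		return err;
	}

	/* ...and smaller "secret" allocations are served from it. */
	err = gen_pool_add(secretmem_pool,
			   (unsigned long)page_address(page),
			   PMD_SIZE, NUMA_NO_NODE);
	if (err) {
		restore_direct_map(page, 1 << SECRET_POOL_ORDER);
		__free_pages(page, SECRET_POOL_ORDER);
	}
	return err;
}

static void *secretmem_alloc(size_t size)
{
	unsigned long addr = gen_pool_alloc(secretmem_pool, size);

	if (!addr && !secretmem_pool_refill())
		addr = gen_pool_alloc(secretmem_pool, size);

	return (void *)addr;
}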