MIME-Version: 1.0
References: <20190508144422.13171-1-kirill.shutemov@linux.intel.com>
 <20190508144422.13171-46-kirill.shutemov@linux.intel.com>
 <3c658cce-7b7e-7d45-59a0-e17dae986713@intel.com>
 <5cbfa2da-ba2e-ed91-d0e8-add67753fc12@intel.com>
In-Reply-To: <5cbfa2da-ba2e-ed91-d0e8-add67753fc12@intel.com>
From: Andy Lutomirski
Date: Mon, 17 Jun 2019 12:12:20 -0700
Subject: Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
To: Dave Hansen
Cc: Andy Lutomirski, "Kirill A. Shutemov", Andrew Morton, X86 ML,
 Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Borislav Petkov,
 Peter Zijlstra, David Howells, Kees Cook, Kai Huang, Jacob Pan,
 Alison Schofield, Linux-MM, kvm list, keyrings@vger.kernel.org, LKML,
 Tom Lendacky
Content-Type: text/plain; charset="UTF-8"

On Mon, Jun 17, 2019 at 11:37 AM Dave Hansen wrote:
>
> Tom Lendacky, could you take a look down in the message to the talk of
> SEV?  I want to make sure I'm not misrepresenting what it does today.
> ...
>
>
> >> I actually don't care all that much which one we end up with.  It's not
> >> like the extra syscall in the second option means much.
> >
> > The benefit of the second one is that, if sys_encrypt is absent, it
> > just works.  In the first model, programs need a fallback because
> > they'll segfault if mprotect_encrypt() gets ENOSYS.
>
> Well, by the time they get here, they would have already had to allocate
> and set up the encryption key.  I don't think this would really be the
> "normal" malloc() path, for instance.
>
> >> How do we
> >> eventually stack it on top of persistent memory filesystems or Device
> >> DAX?
> >
> > How do we stack anonymous memory on top of persistent memory or Device
> > DAX?  I'm confused.
>
> If our interface to MKTME is:
>
>         fd = open("/dev/mktme");
>         ptr = mmap(fd);
>
> Then it's hard to combine with an interface which is:
>
>         fd = open("/dev/dax123");
>         ptr = mmap(fd);
>
> Where if we have something like mprotect() (or madvise() or something
> else taking a pointer), we can just do:
>
>         fd = open("/dev/anything987");
>         ptr = mmap(fd);
>         sys_encrypt(ptr);

I'm having a hard time imagining that ever working -- wouldn't it blow
up if someone did:

        fd = open("/dev/anything987");
        ptr1 = mmap(fd);
        ptr2 = mmap(fd);
        sys_encrypt(ptr1);

So I think it really has to be:

        fd = open("/dev/anything987");
        ioctl(fd, ENCRYPT_ME);
        mmap(fd);

But I really expect that the encryption of a DAX device will actually
be a block device setting and won't look like this at all.  It'll be
more like dm-crypt except without device mapper.

>
> Now, we might not *do* it that way for dax, for instance, but I'm just
> saying that if we go the /dev/mktme route, we never get a choice.
>
> > I think that, in the long run, we're going to have to either expand
> > the core mm's concept of what "memory" is or just have a whole
> > parallel set of mechanisms for memory that doesn't work like memory.
> ...
> > I expect that some day normal memory will be able to be repurposed as
> > SGX pages on the fly, and that will also look a lot more like SEV or
> > XPFO than like this model of MKTME.
>
> I think you're drawing the line at pages where the kernel can manage
> contents vs. not manage contents.  I'm not sure that's the right
> distinction to make, though.  The thing that is important is whether the
> kernel can manage the lifetime and location of the data in the page.

The kernel can manage the location of EPC pages, for example, but only
under extreme constraints right now.  The draft SGX driver can and
does swap them out and swap them back in, potentially at a different
address.

>
> Basically: Can the kernel choose where the page comes from and get the
> page back when it wants?
>
> I really don't like the current state of things like with SEV or with
> KVM direct device assignment where the physical location is quite locked
> down and the kernel really can't manage the memory.  I'm trying really
> hard to make sure future hardware is more permissive about such things.
> My hope is that these are a temporary blip and not the new normal.
>
> > So, if we upstream MKTME as anonymous memory with a magic config
> > syscall, I predict that, in a few years, it will end up inheriting
> > all downsides of both approaches with few of the upsides.  Programs
> > like QEMU will need to learn to manipulate pages that can't be
> > accessed outside the VM without special VM buy-in, so the fact that
> > MKTME pages are fully functional and can be GUP-ed won't be very
> > useful.  And the VM will learn about all these things, but MKTME won't
> > really fit in.
>
> Kai Huang (who is on cc) has been doing the QEMU enabling and might want
> to weigh in.  I'd also love to hear from the AMD folks in case I'm not
> grokking some aspect of SEV.
>
> But, my understanding is that, even today, neither QEMU nor the kernel
> can see SEV-encrypted guest memory.  So QEMU should already understand
> how to not interact with guest memory.  I _assume_ it's also already
> doing this with anonymous memory, without needing /dev/sme or something.

Let's find out!

If it's using anonymous memory, it must be very strange, since it
would more or less have to be mmapped PROT_NONE.  The thing that makes
anonymous memory in particular seem so awkward to me is that it's
fundamentally identified by its *address*, which means it makes no
sense if it's not mapped.

>
> > And, one of these days, someone will come up with a version of XPFO
> > that could actually be upstreamed, and it seems entirely plausible
> > that it will be totally incompatible with MKTME-as-anonymous-memory
> > and that users of MKTME will actually get *worse* security.
>
> I'm not following here.
> XPFO just means that we don't keep the direct
> map around all the time for all memory.  If XPFO and
> MKTME-as-anonymous-memory were both in play, I think we'd just be
> creating/destroying the MKTME-enlightened direct map instead of a
> vanilla one.

What I'm saying is that I can imagine XPFO also wanting to be
something other than anonymous memory.  I don't think we'll ever want
regular MAP_ANONYMOUS to enable XPFO by default because the
performance will suck.  Doing this seems odd:

        ptr = mmap(MAP_ANONYMOUS);
        sys_xpfo_a_pointer(ptr);

So I could imagine:

        ptr = mmap(MAP_ANONYMOUS | MAP_XPFO);

or

        fd = open("/dev/xpfo");  (or fd = memfd_create(..., XPFO);)
        ptr = mmap(fd);

I'm thinking that XPFO is a *lot* simpler under the hood if we just
straight-up don't support GUP on it.  Maybe we should call this
"strong XPFO".  Similarly, the kinds of things that want MKTME may
also want the memory to be entirely absent from the direct map.  And
the things that use SEV (as I understand it) *can't* usefully use the
memory for normal IO via GUP or copy_to/from_user(), so these things
all have a decent amount in common.

Another downside of anonymous memory (in my head, anyway -- QEMU
people should chime in) is that it seems awkward to use it for IO
techniques in which the back-end isn't in the QEMU process.  If
there's an fd involved, you can pass it around, feed it to things like
vfio, etc.  If there's no fd, it's stuck in the creating process.

And another silly argument: if we had /dev/mktme, then we could
possibly get away with avoiding all the keyring stuff entirely.
Instead, you open /dev/mktme and you get your own key under the hood.
If you want two keys, you open /dev/mktme twice.  If you want some
other program to be able to see your memory, you pass it the fd.

I hope this email isn't too rambling :)

--Andy
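
A minimal sketch of the aliasing concern raised above for the
per-pointer model (two mmap() calls on one fd, with only ptr1 handed to
a sys_encrypt()-style call).  On Linux, two MAP_SHARED mappings of the
same fd are backed by the same pages, so an attribute attached to one
pointer would apply to memory that is still reachable through the
other.  The function name shared_mappings_alias(), the scratch path and
the 4096-byte length are illustrative assumptions.  The example uses an
ordinary temporary file and no encryption interface at all, since
/dev/mktme and sys_encrypt() are only proposals in this thread.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a store made through one MAP_SHARED mapping of an fd is
 * visible through a second, distinct mapping of the same fd, 0 if it is
 * not, and -1 if the setup fails.
 */
int shared_mappings_alias(void)
{
    char path[] = "/tmp/alias-demo-XXXXXX"; /* hypothetical scratch file */
    const size_t len = 4096;                /* assumed mapping length */
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the open fd keeps the file alive */

    if (ftruncate(fd, (off_t)len) == 0) {
        /* Same fd, same offset, mapped twice. */
        char *ptr1 = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        char *ptr2 = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);

        if (ptr1 != MAP_FAILED && ptr2 != MAP_FAILED) {
            /* Write via ptr1, then read the same bytes via ptr2. */
            strcpy(ptr1, "written via ptr1");
            result = (ptr1 != ptr2 &&
                      strcmp(ptr2, "written via ptr1") == 0);
        }
        if (ptr1 != MAP_FAILED)
            munmap(ptr1, len);
        if (ptr2 != MAP_FAILED)
            munmap(ptr2, len);
    }
    close(fd);
    return result;
}
```

The function returns 1 when the store through ptr1 shows up through
ptr2 at a different virtual address.  That is the case in which
"encrypt ptr1" has no obvious meaning, whereas a setting applied to the
fd before mmap() would cover both mappings.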