From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20190612170834.14855-1-mhillenb@amazon.de> <459e2273-bc27-f422-601b-2d6cdaf06f84@amazon.com>
In-Reply-To: <459e2273-bc27-f422-601b-2d6cdaf06f84@amazon.com>
From: Andy Lutomirski
Date: Thu, 13 Jun 2019 09:13:19 -0700
Subject: Re: [RFC 00/10] Process-local memory allocations for hiding KVM secrets
To: Alexander Graf, Nadav Amit
Cc: Andy Lutomirski, Dave Hansen, Marius Hillenbrand, kvm list, LKML,
 Kernel Hardening, Linux-MM, Alexander Graf, David Woodhouse,
 the arch/x86 maintainers, Peter Zijlstra
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 13, 2019 at 12:53 AM Alexander Graf wrote:
>
>
> On 13.06.19 03:30, Andy Lutomirski wrote:
> > On Wed, Jun 12, 2019 at 1:27 PM Andy Lutomirski wrote:
> >>
> >>
> >>> On Jun 12, 2019, at 12:55 PM, Dave Hansen wrote:
> >>>
> >>>> On 6/12/19 10:08 AM, Marius Hillenbrand wrote:
> >>>> This patch series proposes to introduce a region for what we call
> >>>> process-local memory into the kernel's virtual address space.
> >>> It might be fun to cc some x86 folks on this series. They might have
> >>> some relevant opinions.
> >>> ;)
> >>>
> >>> A few high-level questions:
> >>>
> >>> Why go to all this trouble to hide guest state like registers if all the
> >>> guest data itself is still mapped?
> >>>
> >>> Where's the context-switching code? Did I just miss it?
> >>>
> >>> We've discussed having per-cpu page tables where a given PGD is only in
> >>> use from one CPU at a time. I *think* this scheme still works in such a
> >>> case, it just adds one more PGD entry that would have to be context-switched.
> >> Fair warning: Linus is on record as absolutely hating this idea. He might change his mind, but it's an uphill battle.
> > I looked at the patch, and it (sensibly) has nothing to do with
> > per-cpu PGDs. So it's in great shape!
>
>
> Thanks a lot for the very timely review!
> >
> >
> > Seriously, though, here are some very high-level review comments:
> >
> > Please don't call it "process local", since "process" is meaningless.
> > Call it "mm local" or something like that.
>
>
> Naming is hard, yes :). Is "mmlocal" obvious enough to most readers? I'm
> not fully convinced, but I don't find it better or worse than proclocal.
> So whatever flies with the majority works for me :).

My objection to "proc" is that we have many concepts of "process" in the
kernel: task, mm, signal handling context, etc. These memory ranges are
specifically local to the mm. Admittedly, it would be very surprising to
have memory that is local to a signal handling context, but still.

> >
> > We already have a per-mm kernel mapping: the LDT. So please nix all
> > the code that adds a new VA region, etc, except to the extent that
> > some of it consists of valid cleanups in and of itself. Instead,
> > please refactor the LDT code (arch/x86/kernel/ldt.c, mainly) to make
> > it use a more general "mm local" address range, and then reuse the
> > same infrastructure for other fancy things. The code that makes it
>
>
> I don't fully understand how those two are related. Are you referring to
> the KPTI enabling code in there? That just maps the LDT at the same
> address in both kernel and user mappings, no?

The relevance here is that, when KPTI is on, the exact same address
refers to a different LDT in different mms, so it's genuinely an
mm-local mapping. It works just like yours: a whole top-level paging
entry is reserved for it.

What I'm suggesting is that, when you're all done, the LDT should be
more or less just one more mm-local mapping, with two caveats. First,
the LDT needs special KPTI handling, but that's fine. Second, the LDT
address is visible to user code on non-UMIP systems, so you'll have to
decide if that's okay. My suggestion is to have the LDT be the very
first address in the mm-local range and then to randomize everything
else in the mm-local range.

>
> So you're suggesting we use the new mm local address as LDT address
> instead and have that mapped in both kernel and user space? This patch
> set today maps "mm local" data only in kernel space, not in user space,
> as it's meant for kernel data structures.

Yes, exactly.

>
> So I'm not really seeing the path to adapt any of the LDT logic to this.
> Could you please elaborate?
> >
> > KASLR-able should be in its very own patch that applies *after* the
> > code that makes it all work so that, when the KASLR part causes a
> > crash, we can bisect it.
>
>
> That sounds very reasonable, yes.
> >
> >
> > +       /*
> > +        * Faults in process-local memory may be caused by process-local
> > +        * addresses leaking into other contexts.
> > +        * tbd: warn and handle gracefully.
> > +        */
> > +       if (unlikely(fault_in_process_local(address))) {
> > +               pr_err("page fault in PROCLOCAL at %lx", address);
> > +               force_sig_fault(SIGSEGV, SEGV_MAPERR, (void __user *)address, current);
> > +       }
> > +
> >
> > Huh? Either it's an OOPS or you shouldn't print any special
> > debugging. As it is, you're just blatantly leaking the address of the
> > mm-local range to malicious user programs.
>
>
> Yes, this is a left over bit from an idea that we discussed and rejected
> yesterday. The idea was to have a DEBUG config option that allows
> proclocal memory to leak into other processes, but print debug output so
> that it's easier to catch bugs. After discussion, I think we managed to
> convince everyone that an OOPS is the better tool to find bugs :).
>
> Any trace of this will disappear in the next version.
>
>
> >
> > Also, you should IMO consider using this mechanism for kmap_atomic().
>
>
> It might make sense to use it for kmap_atomic() for debug purposes, as
> it ensures that other users can no longer access the same mapping
> through the linear map. However, it does come at quite a big cost, as we
> need to shoot down the TLB of all other threads in the system. So I'm
> not sure it's of general value?

What I meant was that kmap_atomic() could use mm-local memory so that
it doesn't need to do a global shootdown. But I guess it's not actually
used for real on 64-bit, so this is mostly moot. Are you planning to
support mm-local on 32-bit?

--Andy

>
>
> Alex

Hi, Nadav!