From: Nicolas Saenz Julienne
To: Sean Christopherson
CC: Chao Peng, Paolo Bonzini, Jonathan Corbet, "Vitaly Kuznetsov",
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Arnd Bergmann, "Naoya Horiguchi", Miaohe Lin,
 "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields",
 Andrew Morton, "Shuah Khan", Mike Rapoport, Steven Price,
 "Maciej S . Szmigiero", Vlastimil Babka, "Vishal Annapurve", Yu Zhang,
 "Kirill A . Shutemov", Quentin Perret, Michael Roth
Subject: Re: [PATCH v10 2/9] KVM: Introduce per-page memory attributes
Date: Tue, 23 May 2023 18:59:47 +0000
References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com>
 <20221202061347.1070246-3-chao.p.peng@linux.intel.com>
X-Mailing-List: linux-api@vger.kernel.org

Hi Sean,

On Fri May 19, 2023 at 6:23 PM UTC, Sean Christopherson wrote:
> On Fri, May 19, 2023, Nicolas Saenz Julienne wrote:
> > Hi,
> >
> > On Fri Dec 2, 2022 at 6:13 AM UTC, Chao Peng wrote:

[...]

> > VSM introduces isolated guest execution contexts called Virtual Trust
> > Levels (VTL) [2]. Each VTL has its own memory access protections,
> > virtual processors states, interrupt controllers and overlay pages. VTLs
> > are hierarchical and might enforce memory protections on less privileged
> > VTLs. Memory protections are enforced on a per-GPA granularity.
> >
> > We implemented this in the past by using a separate address space per
> > VTL and updating memory regions on protection changes. But having to
> > update the memory slot layout for every permission change scales poorly,
> > especially as we have to perform 100.000s of these operations at boot
> > (see [1] for a little more context).
> >
> > I believe the biggest barrier for us to use memory attributes is not
> > having the ability to target specific address spaces, or to the very
> > least having some mechanism to maintain multiple independent layers of
> > attributes.
>
> Can you elaborate on "specific address spaces"? In KVM, that usually means SMM,
> but the VTL comment above makes me think you're talking about something entirely
> different. E.g. can you provide a brief summary of the requirements/expectations?

Let me refresh some concepts first. VTLs are vCPU modes implemented by
the hypervisor. Lower VTLs switch into higher VTLs [1] through a
hypercall or asynchronously through interrupts. Each VTL has its own CPU
architectural state, lapic and MSR state (applies to only some MSRs).
These are saved/restored when switching VTLs [2]. Additionally, VTLs
share a common GPA->HPA mapping, but protection bits differ depending on
which VTL the CPU is on. Privileged VTLs might revoke R/W/X(+MBEC,
optional) access bits from lower VTLs on a per-GPA basis.

In order to deal with the per-VTL memory protection bits, we extended
the number of KVM address spaces and assigned one to each VTL. The
hypervisor initializes all VTLs' address spaces with the same mappings
and protections; they are expected to diverge at runtime. Operations
that rely on memory slots for GPA->HPA/HVA translations (including page
faults) are already address space aware, so adding VTL support was
fairly simple.

Ultimately, when a privileged VTL enforces memory protections on lower
VTLs, we update the affected VTL's address space memory regions to
reflect them.

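To make the cost of the address-space-per-VTL approach a bit more
concrete, here is a minimal user-space sketch of what a single
protection change does to a region list. It is not KVM code: struct
region, struct vtl_as, as_add() and as_protect() are hypothetical names
invented for illustration, the region table is a fixed-size unsorted
array, and overflow of start + npages is not checked.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGIONS 16

/* Hypothetical access bits, not KVM's. */
#define VTL_PROT_R 0x1u
#define VTL_PROT_W 0x2u
#define VTL_PROT_X 0x4u

/* Hypothetical stand-in for one memory region in a VTL's address space. */
struct region {
	uint64_t start;		/* first guest page frame covered */
	uint64_t npages;	/* number of pages covered */
	uint32_t prot;		/* VTL_PROT_* bits this VTL is granted */
};

/* One address space per VTL: an unsorted table of regions. */
struct vtl_as {
	struct region regions[MAX_REGIONS];
	int nr;
};

/* Append a region; returns 0 on success, -1 if the table is full. */
static int as_add(struct vtl_as *as, uint64_t start, uint64_t npages,
		  uint32_t prot)
{
	if (as->nr >= MAX_REGIONS)
		return -1;
	as->regions[as->nr++] = (struct region){ start, npages, prot };
	return 0;
}

/*
 * Apply 'prot' to [start, start + npages). The range must lie inside a
 * single existing region, which is split into up to three pieces: the
 * re-protected middle plus whatever is left before and after it.
 */
static int as_protect(struct vtl_as *as, uint64_t start, uint64_t npages,
		      uint32_t prot)
{
	uint64_t end = start + npages;

	if (npages == 0)
		return -1;

	for (int i = 0; i < as->nr; i++) {
		struct region old = as->regions[i];
		uint64_t old_end = old.start + old.npages;
		int extra;

		if (start < old.start || end > old_end)
			continue;

		/* Make sure the leftover pieces fit before changing anything. */
		extra = (start > old.start) + (end < old_end);
		if (as->nr + extra > MAX_REGIONS)
			return -1;

		/* Slot i is reused for the re-protected middle piece. */
		as->regions[i] = (struct region){ start, npages, prot };
		if (start > old.start)
			as_add(as, old.start, start - old.start, old.prot);
		if (end < old_end)
			as_add(as, end, old_end - end, old.prot);
		assert(as->nr <= MAX_REGIONS);
		return 0;
	}
	return -1;
}
```

In this sketch, starting from a single 1024-page RWX region and revoking
write access on 4 pages in the middle leaves three regions behind. That
growth of the region layout on each permission change is the scaling
problem described above.
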
Protection changes are requested through a hypercall, which expects the
new protection to be visible system-wide by the time it returns. These
hypercalls happen 100,000+ times during boot, so we introduced an
"atomic memory slot update" API similar to Emanuele's [3] that allows
splitting memory regions/changing permissions concurrently with other
vCPUs.

Now, if we had a way to map memory attributes to specific VTLs, we could
use that instead. Actually, we wouldn't need to extend address spaces at
all to support this (we might still need them to support Overlay Pages,
but that's another story).

Hope it makes a little more sense now. :)

Nicolas

[1] In practice we've only seen VTL0 and VTL1 being used. The spec
    supports up to 16 VTLs.

[2] One can draw an analogy with Arm's TrustZone. The hypervisor plays
    the role of EL3. Windows (VTL0) runs in Non-Secure (EL0/EL1) and the
    secure kernel (VTL1) in Secure World (EL1s/EL0s).

[3] https://lore.kernel.org/all/20220909104506.738478-1-eesposit@redhat.com/
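
P.S. A rough sketch of what "memory attributes mapped to specific VTLs"
could look like, to illustrate the idea of independent attribute layers.
This is not the API from the patch series nor anything in KVM today:
struct vtl_attrs, attrs_init(), attrs_set() and attrs_allows() are
hypothetical names, the flat per-page array and the toy guest size are
assumptions made to keep the example short, and MBEC is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_VTLS  2	/* only VTL0 and VTL1 have been seen in practice */
#define NR_PAGES 1024	/* toy guest size, purely illustrative */

/* Hypothetical per-page access bits. */
#define ATTR_R 0x1u
#define ATTR_W 0x2u
#define ATTR_X 0x4u

/* One independent attribute layer per VTL, indexed by guest page frame. */
struct vtl_attrs {
	uint8_t attr[NR_VTLS][NR_PAGES];
};

/* Every VTL starts with full access to every page. */
static void attrs_init(struct vtl_attrs *a)
{
	assert(a != NULL);
	memset(a->attr, ATTR_R | ATTR_W | ATTR_X, sizeof(a->attr));
}

/*
 * 'caller_vtl' sets the access bits 'target_vtl' keeps for
 * [gfn, gfn + npages). Only a strictly higher VTL may restrict a lower
 * one. Returns 0 on success, -1 on an invalid request.
 */
static int attrs_set(struct vtl_attrs *a, unsigned int caller_vtl,
		     unsigned int target_vtl, uint64_t gfn, uint64_t npages,
		     uint8_t attr)
{
	if (caller_vtl >= NR_VTLS || target_vtl >= caller_vtl)
		return -1;
	if (gfn >= NR_PAGES || npages > NR_PAGES - gfn)
		return -1;

	for (uint64_t i = 0; i < npages; i++)
		a->attr[target_vtl][gfn + i] = attr;
	return 0;
}

/* Returns 1 if 'vtl' holds every bit in 'access' for page 'gfn', else 0. */
static int attrs_allows(const struct vtl_attrs *a, unsigned int vtl,
			uint64_t gfn, uint8_t access)
{
	if (vtl >= NR_VTLS || gfn >= NR_PAGES)
		return 0;
	return (a->attr[vtl][gfn] & access) == access;
}
```

The point of the layout is that a protection change only touches the
target VTL's layer: VTL1 can revoke write access from VTL0 on a handful
of pages without affecting its own view or any memory slot, and the
GPA->HPA mapping stays shared between VTLs.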