From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 293DAC433ED for ; Mon, 10 May 2021 22:40:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0AB2F6143B for ; Mon, 10 May 2021 22:40:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231308AbhEJWla (ORCPT ); Mon, 10 May 2021 18:41:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229944AbhEJWlZ (ORCPT ); Mon, 10 May 2021 18:41:25 -0400 Received: from mail-pf1-x42e.google.com (mail-pf1-x42e.google.com [IPv6:2607:f8b0:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9DA5C061574 for ; Mon, 10 May 2021 15:40:19 -0700 (PDT) Received: by mail-pf1-x42e.google.com with SMTP id p4so14692108pfo.3 for ; Mon, 10 May 2021 15:40:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=hbH0C8P8Jk3egpmCRvdSA5W5Q4AjZh56OZVlG79oi5c=; b=A+xlxTb/7lkPj03icNYGlzQpfUJGeogr1P2hfAIiS6/3JQyu0vO6r5DAucRJeq+enP i5zbzHECRyMxzPr7MQM53STbDIDcceKOeZnybqIVnfY5VVMNvIBU8HjZZNJeVXWimXLU V8yj3USNQudRfzWVfXAt5oJApFpDAjuFAZTjzTMumcnMe0+zAdEjooDffz+mstlPIGbC Kzyt4DCWTF+Fm3+QXFn+FQ9XNwLMJaZoM69SB9YAYPOUqsgUYp1IZcFR06wS86LWGTlH iQcGS6l5IzM2NO+SI6sCQO38MN5gIjZwIchV7zNf0SSqu4InPIWUdBH3VZWo6RlrM1xl KQcw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=hbH0C8P8Jk3egpmCRvdSA5W5Q4AjZh56OZVlG79oi5c=; b=Wqb6Zk/VQQSOdhslccUwVnLvCz0pQKsga0hbaWs1aKBvrzkbmzXWhzTmg27G8T9fdu qVY9v6jw5SOmcSSl08+REeioVTpI0autTZc0/F9fAzirf7CVvmhSxyCSMnhfRMTjhNQ6 CKNx/mQXpIJb33Wlxgi/HD4dX0SQ02ZdWFZaVI4n3zI731jTvJ2C/Z9c2GZrBXl26T3r V6e2NNwB/xUZJrFevmJk2w1+iYikvnrcjbAcB7noToV8mEKHz7qB+JciTxNQTZjISWEH QeX783t0ZhfSQ+9BVSaz7ajknhRVqLc4BPHKLevg+wDChU1jtyC+oOQ5Y8Pp8f/zecqW KmvQ== X-Gm-Message-State: AOAM533oTyDIkC5aDWZYeOz0M7kQIS6UaM//FIkVE+NbPXZQ4OlaollP URxEMEc+/Tm6nkRt/7p7BlX9Aw== X-Google-Smtp-Source: ABdhPJwf9X8xH3jIgjeORQIx2SfTico/ajOH9CLwd2EbFrzvR88MdP2uV0vBGdNelb+tLqnIvu7CRQ== X-Received: by 2002:aa7:904e:0:b029:28f:da01:1a5f with SMTP id n14-20020aa7904e0000b029028fda011a5fmr26907997pfo.67.1620686419178; Mon, 10 May 2021 15:40:19 -0700 (PDT) Received: from google.com (240.111.247.35.bc.googleusercontent.com. [35.247.111.240]) by smtp.gmail.com with ESMTPSA id z29sm12097398pga.52.2021.05.10.15.40.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 May 2021 15:40:18 -0700 (PDT) Date: Mon, 10 May 2021 22:40:14 +0000 From: Sean Christopherson To: Tom Lendacky Cc: Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Peter Gonda , Maxim Levitsky Subject: Re: [PATCH 2/2] KVM: x86: Allow userspace to update tracked sregs for protected guests Message-ID: References: <20210507165947.2502412-1-seanjc@google.com> <20210507165947.2502412-3-seanjc@google.com> <5f084672-5c0d-a6f3-6dcf-38dd76e0bde0@amd.com> <12fe8f83-49b4-1a22-7903-84e45f16c372@amd.com> <26da40a0-c9b4-f517-94a6-5d3d69c4a207@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <26da40a0-c9b4-f517-94a6-5d3d69c4a207@amd.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 10, 2021, Tom Lendacky wrote: > On 5/10/21 4:02 PM, Sean Christopherson wrote: > > On Mon, May 10, 2021, Tom Lendacky wrote: > >> On 5/10/21 11:10 AM, Sean Christopherson wrote: > >>> On Fri, May 07, 2021, Tom Lendacky wrote: > >>>> On 5/7/21 11:59 AM, Sean Christopherson wrote: > >>>>> Allow userspace to set CR0, CR4, CR8, and EFER via KVM_SET_SREGS for > >>>>> protected guests, e.g. for SEV-ES guests with an encrypted VMSA. KVM > >>>>> tracks the aforementioned registers by trapping guest writes, and also > >>>>> exposes the values to userspace via KVM_GET_SREGS. Skipping the regs > >>>>> in KVM_SET_SREGS prevents userspace from updating KVM's CPU model to > >>>>> match the known hardware state. > >>>> > >>>> This is very similar to the original patch I had proposed that you were > >>>> against :) > >>> > >>> I hope/think my position was that it should be unnecessary for KVM to need to > >>> know the guest's CR0/4/0 and EFER values, i.e. even the trapping is unnecessary. > >>> I was going to say I had a change of heart, as EFER.LMA in particular could > >>> still be required to identify 64-bit mode, but that's wrong; EFER.LMA only gets > >>> us long mode, the full is_64_bit_mode() needs access to cs.L, which AFAICT isn't > >>> provided by #VMGEXIT or trapping. > >> > >> Right, that one is missing. If you take a VMGEXIT that uses the GHCB, then > >> I think you can assume we're in 64-bit mode. > > > > But that's not technically guaranteed. The GHCB even seems to imply that there > > are scenarios where it's legal/expected to do VMGEXIT with a valid GHCB outside > > of 64-bit mode: > > > > However, instead of issuing a HLT instruction, the AP will issue a VMGEXIT > > with SW_EXITCODE of 0x8000_0004 ((this implies that the GHCB was updated prior > > to leaving 64-bit long mode). > > Right, but in order to fill in the GHCB so that the hypervisor can read > it, the guest had to have been in 64-bit mode. Otherwise, whatever the > guest wrote will be seen as encrypted data and make no sense to the > hypervisor anyway. Having once been in 64-bit is not the same thing as currently being in 64-bit. E.g. if the VMM _knows_ that the vCPU is not in 64-bit, e.g. because it traps writes to CR0, then ignoring bits 63:32 of GPRs would be entirely reasonable. In practice it probably won't matter since the guest will have to go out of its way to do VMGEXIT outside of 64-bit mode with valid data in the GHCB, but ideally KVM will conform to a spec, not implement a best guess as to what will/won't work for the guest. > >>> I.e. either the ghcb_cpl_is_valid() check should be nuked, or more likely KVM > >> > >> The ghcb_cpl_is_valid() is still needed to see whether the VMMCALL was > >> from userspace or not (a VMMCALL will generate a #VC). > > > > Blech. I get that the GHCB spec says CPL must be provided/checked for VMMCALL, > > but IMO that makes no sense whatsover. > > > > If the guest restricts the GHCB to CPL0, then the CPL field is pointless because > > the VMGEXIT will only ever come from CPL0. Yes, technically the guest kernel > > can proxy a VMMCALL from userspace to the host, but the guest kernel _must_ be > > the one to enforce any desired CPL checks because the VMM is untrusted, at least > > once you get to SNP. > > > > If the guest exposes the GHCB to any CPL, then the CPL check is worthless because > > The GHCB itself is not exposed to any CPL. That is a guest decision. Nothing in the GHCB spec requires the GHCB to be accessible only in CPL0. I'm not arguing that any sane guest will actually map the GHCB into userspace, but from an architectural perspective it's not a given. > A VMMCALL will generate a #VC. But that's irrevelant from an architectural perspective. It is 100% legal to do VMGEXIT(VMMCALL) without first doing VMMCALL and taking a #VC. > The guest #VC handler will extract the CPL level from the context that > generated the #VC (see vc_handle_vmmcall() in arch/x86/kernel/sev-es.c), > so that a VMMCALL from userspace will have the proper CPL value in the > GHCB when the #VC handler issues the VMGEXIT instruction. I get how the CPL ended up as a requirement for VMMCALL in the GHCB, I'm arguing that it's useless information. From the guest's perspective, it must handle privilege checks for those checks to have any meaning with respect to security. >From the VMM's perspective, the checks have zero meaning whatsoever because the CPL field is software controlled. The fact that the VMM got a VMGEXIT with valid data in the GHCB is proof enough that the guest blessed the VMGEXIT. It's kind of a moot point because the extra CPL doesn't break anything, but it gives a false sense of security since it implies the guest can/should rely on the VMM to do privilege enforcement. Actually, after looking at the Linux guest code, the entire VMCALL->#VC->VMGEXIT concept is flawed. E.g. kvm_sev_es_hcall_prepare() copies over all _possible_ GPRs. That means users of kvm_hypercall{1,2,3}() are leaking random register state to the VMM. That can be "fixed" by zeroing unused registers or reflecting on regs->ax, but at that point it's far easier and more performant to simply do VMGEXIT(VMCALL) directly and skip the #VC. As for VMMCALL from userspace, IMO the kernel's default should be to reject them unconditionally. KVM and Hyper-V APIs do not allow hypercalls from CPL>0. At a glance, neither does Xen (Xen skips the CPL check, but AFAICT only when the vCPU is in real mode and thus the CPL check is meaningless). No idea about VMware. If there is a use case for allowing VMMCALL from userspace, then it absolutely needs to be handled in a x86_hyper_runtime hook, because otherwise the kernel has zero clue whether or not the hypercall should be allowed, and if it's allowed, precisely which registers to expose. I.e. the hook would have to expose only the registers needed for that exact hypercall, otherwise the kernel would unintentionally leak userspace register state.