From: Tom Lendacky <firstname.lastname@example.org>
To: Sean Christopherson <email@example.com>
Cc: Paolo Bonzini <firstname.lastname@example.org>,
Vitaly Kuznetsov <email@example.com>,
Wanpeng Li <firstname.lastname@example.org>,
Jim Mattson <email@example.com>, Joerg Roedel <firstname.lastname@example.org>,
Peter Gonda <email@example.com>,
Maxim Levitsky <firstname.lastname@example.org>
Subject: Re: [PATCH 2/2] KVM: x86: Allow userspace to update tracked sregs for protected guests
Date: Mon, 10 May 2021 16:23:37 -0500
Message-ID: <email@example.com>
On 5/10/21 4:02 PM, Sean Christopherson wrote:
> On Mon, May 10, 2021, Tom Lendacky wrote:
>> On 5/10/21 11:10 AM, Sean Christopherson wrote:
>>> On Fri, May 07, 2021, Tom Lendacky wrote:
>>>> On 5/7/21 11:59 AM, Sean Christopherson wrote:
>>>>> Allow userspace to set CR0, CR4, CR8, and EFER via KVM_SET_SREGS for
>>>>> protected guests, e.g. for SEV-ES guests with an encrypted VMSA. KVM
>>>>> tracks the aforementioned registers by trapping guest writes, and also
>>>>> exposes the values to userspace via KVM_GET_SREGS. Skipping the regs
>>>>> in KVM_SET_SREGS prevents userspace from updating KVM's CPU model to
>>>>> match the known hardware state.
>>>> This is very similar to the original patch I had proposed that you were
>>>> against :)
>>> I hope/think my position was that it should be unnecessary for KVM to need to
>>> know the guest's CR0/4/8 and EFER values, i.e. even the trapping is unnecessary.
>>> I was going to say I had a change of heart, as EFER.LMA in particular could
>>> still be required to identify 64-bit mode, but that's wrong; EFER.LMA only gets
>>> us long mode, the full is_64_bit_mode() needs access to cs.L, which AFAICT isn't
>>> provided by #VMGEXIT or trapping.
>> Right, that one is missing. If you take a VMGEXIT that uses the GHCB, then
>> I think you can assume we're in 64-bit mode.
> But that's not technically guaranteed. The GHCB even seems to imply that there
> are scenarios where it's legal/expected to do VMGEXIT with a valid GHCB outside
> of 64-bit mode:
> However, instead of issuing a HLT instruction, the AP will issue a VMGEXIT
> with SW_EXITCODE of 0x8000_0004 (this implies that the GHCB was updated prior
> to leaving 64-bit long mode).
Right, but in order to fill in the GHCB so that the hypervisor can read
it, the guest had to have been in 64-bit mode. Otherwise, whatever the
guest wrote will be seen as encrypted data and make no sense to the
hypervisor.
> In practice, assuming the guest is in 64-bit mode will likely work, especially
> since the MSR-based protocol is extremely limited, but ideally there should be
> stronger language in the GHCB to define the exact VMM assumptions/behaviors.
> On the flip side, that assumption and the limited exposure through the MSR
> protocol means trapping CR0, CR4, and EFER is pointless. I don't see how KVM
> can do anything useful with that information outside of VMGEXITs. Page tables
> are encrypted and GPRs are stale; what else could KVM possibly do with
> identifying protected mode, paging, and/or 64-bit?
>>> Unless I'm missing something, that means that VMGEXIT(VMMCALL) is broken since
>>> KVM will incorrectly crush (or preserve) bits 63:32 of GPRs. I'm guessing no
>>> one has reported a bug because either (a) no one has tested a hypercall that
>>> requires bits 63:32 in a GPR or (b) the guest just happens to be in 64-bit mode
>>> when KVM_SEV_LAUNCH_UPDATE_VMSA is invoked and so the segment registers are
>>> frozen to make it appear as if the guest is perpetually in 64-bit mode.
>> I don't think it's (b) since the LAUNCH_UPDATE_VMSA is done against reset-
>> state vCPUs.
>>> I see that sev_es_validate_vmgexit() checks ghcb_cpl_is_valid(), but isn't that
>>> either pointless or indicative of a much, much bigger problem? If VMGEXIT is
>> It is needed for the VMMCALL exit.
>>> restricted to CPL0, then the check is pointless. If VMGEXIT isn't restricted to
>>> CPL0, then KVM has a big gaping hole that allows a malicious/broken guest
>>> userspace to crash the VM simply by executing VMGEXIT. Since valid_bitmap is
>>> cleared during VMGEXIT handling, I don't think guest userspace can attack/corrupt
>>> the guest kernel by doing a replay attack, but it does all but guarantee a
>>> VMGEXIT at CPL>0 will be fatal since the required valid bits won't be set.
>> Right, so I think some cleanup is needed there, both for the guest and the
>> hypervisor:
>> - For the guest, we could just clear the valid bitmask before leaving the
>> #VC handler/releasing the GHCB. Userspace can't update the GHCB, so any
>> VMGEXIT from userspace would just look like a no-op with the below
>> change to KVM.
> Ah, right, the exit_code and exit infos need to be valid.
>> - For KVM, instead of returning -EINVAL from sev_es_validate_vmgexit(), we
>> return the #GP action through the GHCB and continue running the guest.
> Agreed, KVM should never kill the guest in response to a bad VMGEXIT. That
> should always be a guest decision.
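To make the KVM-side idea concrete, here is a minimal sketch of reporting a
bad VMGEXIT back to the guest instead of failing the vCPU. The struct and
function names are hypothetical and are not the actual KVM code. The encoding
is an assumption too: it supposes "the #GP action" means SW_EXITINFO1 = 1 with
SW_EXITINFO2 holding an event-injection value (vector 13, exception type,
valid bit), which is how I read the GHCB error-return convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed event-injection encoding: vector in bits 7:0, type in 10:8, valid in 31. */
#define SKETCH_TRAP_GP            13u
#define SKETCH_EVTINJ_TYPE_EXEPT  (3u << 8)
#define SKETCH_EVTINJ_VALID       (1u << 31)

/* Hypothetical, trimmed-down view of the GHCB fields used here. */
struct sketch_ghcb_result {
	bool     required_fields_valid; /* stand-in for the valid_bitmap checks */
	uint64_t sw_exit_info_1;
	uint64_t sw_exit_info_2;
};

/*
 * Returns true when the request may be handled. On a malformed request it
 * records a #GP for the guest in the GHCB and returns false; in both cases
 * the caller is expected to resume the guest rather than return -EINVAL.
 */
static bool sketch_validate_vmgexit(struct sketch_ghcb_result *ghcb)
{
	if (ghcb->required_fields_valid)
		return true;

	ghcb->sw_exit_info_1 = 1;
	ghcb->sw_exit_info_2 = SKETCH_TRAP_GP |
			       SKETCH_EVTINJ_TYPE_EXEPT |
			       SKETCH_EVTINJ_VALID;
	return false;
}
```

With something along these lines, a VMGEXIT that arrives with a cleared valid
bitmask only produces an error indication in the GHCB, and the guest decides
what to do with it.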
>>> Sadly, the APM doesn't describe the VMGEXIT behavior, nor does any of the SEV-ES
>>> documentation I have. I assume VMGEXIT is recognized at CPL>0 since it morphs
>>> to VMMCALL when SEV-ES isn't active.
>>> I.e. either the ghcb_cpl_is_valid() check should be nuked, or more likely KVM
>> The ghcb_cpl_is_valid() is still needed to see whether the VMMCALL was
>> from userspace or not (a VMMCALL will generate a #VC).
> Blech. I get that the GHCB spec says CPL must be provided/checked for VMMCALL,
> but IMO that makes no sense whatsoever.
> If the guest restricts the GHCB to CPL0, then the CPL field is pointless because
> the VMGEXIT will only ever come from CPL0. Yes, technically the guest kernel
> can proxy a VMMCALL from userspace to the host, but the guest kernel _must_ be
> the one to enforce any desired CPL checks because the VMM is untrusted, at least
> once you get to SNP.
> If the guest exposes the GHCB to any CPL, then the CPL check is worthless because
The GHCB itself is not exposed to any CPL. A VMMCALL will generate a #VC.
The guest #VC handler will extract the CPL from the context that
generated the #VC (see vc_handle_vmmcall() in arch/x86/kernel/sev-es.c),
so that a VMMCALL from userspace will have the proper CPL value in the
GHCB when the #VC handler issues the VMGEXIT instruction.
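Roughly, the guest side of that flow looks like the sketch below. This is a
simplified, self-contained model rather than the code in sev-es.c: the struct
layouts and the sketch_* names are hypothetical stand-ins for pt_regs and the
GHCB accessors, and only the VMMCALL exit code (0x81) is taken from the SVM
definitions.

```c
#include <stdint.h>

#define SKETCH_EXIT_VMMCALL 0x81u /* SVM exit code for VMMCALL */

/* Hypothetical stand-in for the register context saved at #VC entry. */
struct sketch_regs {
	uint64_t ax;
	uint64_t cs; /* code segment selector of the interrupted context */
};

/* Hypothetical stand-in for the GHCB fields relevant to a VMMCALL exit. */
struct sketch_ghcb {
	uint64_t rax;
	uint8_t  cpl;
	uint64_t sw_exit_code;
};

/* The low two bits of CS give the privilege level; report 3 for user, 0 for kernel. */
static uint8_t sketch_cpl_from_regs(const struct sketch_regs *regs)
{
	return (regs->cs & 3) ? 3 : 0;
}

/*
 * Runs in the #VC handler at CPL0, so only the kernel ever writes the GHCB.
 * The CPL recorded here is derived from the interrupted context, not
 * supplied by userspace.
 */
static void sketch_prepare_vmmcall(struct sketch_ghcb *ghcb,
				   const struct sketch_regs *regs)
{
	ghcb->rax = regs->ax;
	ghcb->cpl = sketch_cpl_from_regs(regs);
	ghcb->sw_exit_code = SKETCH_EXIT_VMMCALL;
	/* The real handler would issue VMGEXIT after this point. */
}
```

So a VMMCALL executed in guest userspace reaches the hypervisor with CPL 3 in
the GHCB, and one executed in the guest kernel reaches it with CPL 0.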
> guest userspace can simply lie about the CPL. And exposing the GHCB to userspace
> completely undermines guest privilege separation since hardware doesn't provide
> the real CPL, i.e. the VMM, even if it were trusted, can't determine the origin of
> the VMGEXIT.