Re: [PATCH v2 07/11] KVM: x86: define standard behavior for bits 0/1 of VM type

From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	michael.roth@amd.com,  aik@amd.com
Subject: Re: [PATCH v2 07/11] KVM: x86: define standard behavior for bits 0/1 of VM type
Date: Fri, 23 Feb 2024 08:46:39 -0800	[thread overview]
Message-ID: <ZdjL783FazB6V6Cy@google.com> (raw)
In-Reply-To: <20240223104009.632194-8-pbonzini@redhat.com>

On Fri, Feb 23, 2024, Paolo Bonzini wrote:
> Some VM types have characteristics in common; in fact, the only use
> of VM types right now is kvm_arch_has_private_mem and it assumes that
> _all_ VM types have private memory.
> 
> So, let the low bits specify the characteristics of the VM type.  As
> of we have two special things: whether memory is private, and whether
> guest state is protected.  The latter is similar to
> kvm->arch.guest_state_protected, but the latter is only set on a fully
> initialized VM.  If both are set, ioctls to set registers will cause
> an error---SEV-ES did not do so, which is a problematic API.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Message-Id: <20240209183743.22030-7-pbonzini@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  9 +++-
>  arch/x86/kvm/x86.c              | 93 +++++++++++++++++++++++++++------
>  2 files changed, 85 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0bcd9ae16097..15db2697863c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2135,8 +2135,15 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
>  void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>  		       int tdp_max_root_level, int tdp_huge_page_level);
>  
> +
> +/* Low bits of VM types provide confidential computing capabilities.  */
> +#define __KVM_X86_PRIVATE_MEM_TYPE	1
> +#define __KVM_X86_PROTECTED_STATE_TYPE	2
> +#define __KVM_X86_VM_TYPE_FEATURES	3
> +static_assert((KVM_X86_SW_PROTECTED_VM & __KVM_X86_VM_TYPE_FEATURES) == __KVM_X86_PRIVATE_MEM_TYPE);

Aliasing bit 0 to KVM_X86_SW_PROTECTED_VM is gross, e.g. this

 #define KVM_X86_DEFAULT_VM     0
 #define KVM_X86_SW_PROTECTED_VM        1
+#define KVM_X86_SEV_VM         8
+#define KVM_X86_SEV_ES_VM      10
 
is _super_ confusing and bound to cause problems.

Oh, good gravy, you're also aliasing __KVM_X86_PROTECTED_STATE_TYPE into SEV_ES_VM.
Curse you and your Rami Code induced decimal-based bitwise shenanigans!!!

I don't see any reason to bleed the flags into KVM's ABI.  Even shoving the flags
into kvm->arch.vm_type is unncessary.  Aha!  As is storing vm_type as an "unsigned
long", since (a) it can't ever be bigger than u32, and (b) in practice a u8 will
suffice since as Mike pointed out we're effectively limited to 31 types before
kvm_vm_ioctl_check_extension() starts returning error codes.

So I vote to skip the aliasing and simply do:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff23712f1f3f..27265ff5fd29 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1279,7 +1279,9 @@ enum kvm_apicv_inhibit {
 };
 
 struct kvm_arch {
-       unsigned long vm_type;
+       u8 vm_type;
+       bool has_private_memory;
+       bool has_protected_state;
        unsigned long n_used_mmu_pages;
        unsigned long n_requested_mmu_pages;
        unsigned long n_max_mmu_pages;

and then just use straight incrementing values for types, i.e.

#define KVM_X86_DEFAULT_VM	0
#define KVM_X86_SW_PROTECTED_VM	1
#define KVM_X86_SEV_VM		2
#define KVM_X86_SEV_ES_VM	3

It'll require a bit of extra work in kvm_arch_init_vm(), but I think the end result
will be signifincatly easier to follow.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8746530930d5..e634e5b67516 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5526,21 +5526,30 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>  
> -static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
> -					     struct kvm_debugregs *dbgregs)
> +static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
> +					    struct kvm_debugregs *dbgregs)
>  {
>  	unsigned long val;
>  
> +	if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	memset(dbgregs, 0, sizeof(*dbgregs));
>  	memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db));
>  	kvm_get_dr(vcpu, 6, &val);
>  	dbgregs->dr6 = val;
>  	dbgregs->dr7 = vcpu->arch.dr7;
> +	return 0;
>  }

>  static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
> @@ -5622,6 +5645,10 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
>  {
>  	int i, r = 0;
>  
> +	if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	if (!boot_cpu_has(X86_FEATURE_XSAVE))
>  		return -EINVAL;
>  
> @@ -6010,7 +6037,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  	case KVM_GET_DEBUGREGS: {
>  		struct kvm_debugregs dbgregs;
>  
> -		kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
> +		r = kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
> +		if (r < 0)

I would strongly prefer to just do

		if (r)

as "r < 0" implies that postive return values are possible/allowed.

That said, rather than a mix of open coding checks in kvm_arch_vcpu_ioctl() versus
burying checks in helpers, what about adding a dedicated helper to take care of
everything in one shot?  E.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bc69b0c9822..f5ca78e75eec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5864,6 +5864,16 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
        }
 }
 
+static bool kvm_ioctl_accesses_guest_state(unsigned int ioctl)
+{
+       switch (ioctl) {
+       case <...>:
+               return true;
+       default:
+               return false:
+       }
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
                         unsigned int ioctl, unsigned long arg)
 {
@@ -5878,6 +5888,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
                void *buffer;
        } u;
 
+       if (vcpu->kvm->arch.has_protected_state &&
+           vcpu->arch.guest_state_protected &&
+           kvm_ioctl_accesses_guest_state(ioctl))
+               return -EINVAL;
+
        vcpu_load(vcpu);
 
        u.buffer = NULL;

It'll be a much smaller diff, and hopefully easier to audit, too.