RE: [RFC] KVM: x86: Support KVM VMs sharing SEV context

From: "Kalra, Ashish" <Ashish.Kalra@amd.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Tobin Feldman-Fitzthum <tobin@linux.ibm.com>,
	"natet@google.com" <natet@google.com>
Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>,
	"Lendacky, Thomas" <Thomas.Lendacky@amd.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"srutherford@google.com" <srutherford@google.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"rientjes@google.com" <rientjes@google.com>,
	"Singh, Brijesh" <brijesh.singh@amd.com>,
	Laszlo Ersek <lersek@redhat.com>,
	James Bottomley <jejb@linux.ibm.com>,
	Hubertus Franke <frankeh@us.ibm.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: RE: [RFC] KVM: x86: Support KVM VMs sharing SEV context
Date: Thu, 27 May 2021 15:51:29 +0000	[thread overview]
Message-ID: <SN6PR12MB276726B5FEF171E1099B91AC8E239@SN6PR12MB2767.namprd12.prod.outlook.com> (raw)
In-Reply-To: <SN6PR12MB276780007C17ADD273EB70FC8E269@SN6PR12MB2767.namprd12.prod.outlook.com>

[AMD Public Use]

Looking at kvm selftests in the kernel, I think the other alternative is to :

Maintain separate data structures like struct kvm_vm, struct vcpu for the mirror VM, but then that means quite a bit of 
the KVM code in Qemu for the mirror VM will be duplicated. 

For example, this will add separate and duplicated functionality for : 

vm_check_cap/vm_enable_cap,
vm_create,
vcpu_run,
vcpu_get_regs, vcpu_sregs_get/set,
vcpu_ioctl,
vm_ioctl, etc., etc.

Also I think that once the mirror VM starts booting and running the UEFI code, it might be only during the PEI or DXE phase where
it will start actually running the MH code, so mirror VM probably still need to handles KVM_EXIT_IO, when SEC phase does I/O,
I can see PIC accesses and Debug Agent initialization stuff in Sec startup code.

Thanks,
Ashish

-----Original Message-----
From: Kalra, Ashish 
Sent: Monday, May 24, 2021 4:29 PM
To: Paolo Bonzini <pbonzini@redhat.com>; Tobin Feldman-Fitzthum <tobin@linux.ibm.com>; natet@google.com
Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>; Lendacky, Thomas <Thomas.Lendacky@amd.com>; x86@kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; srutherford@google.com; seanjc@google.com; rientjes@google.com; Singh, Brijesh <brijesh.singh@amd.com>; Laszlo Ersek <lersek@redhat.com>; James Bottomley <jejb@linux.ibm.com>; Hubertus Franke <frankeh@us.ibm.com>; qemu-devel@nongnu.org
Subject: RE: [RFC] KVM: x86: Support KVM VMs sharing SEV context

[AMD Public Use]

Hello Paolo,

I am working on prototype code in qemu to start a mirror VM running in parallel to the primary VM. Initially I had an idea of a running a completely parallel VM like using the qemu’s microvm machine/platform, but the main issue with this idea is the difficulty in sharing the memory of primary VM with it.

Hence, I started exploring running an internal thread like the current per-vCPU thread(s) in qemu. The main issue is that qemu has a lot of global state, especially the KVMState structure which is per-VM, and all the KVM vCPUs are very tightly tied into it. It does not make sense to add a completely new KVMState structure instance for the mirror VM as then the mirror VM does not remain lightweight at all. 

Hence, the mirror VM i am adding, has to integrate with the current KVMState structure and the “global” KVM state in qemu, this required adding some parallel KVM code in qemu, for example to do ioctl's on the mirror VM, similar to the primary VM. Details below :

The mirror_vm_fd is added to the KVMState structure itself. 

The parallel code I mentioned is like the following :

#define kvm_mirror_vm_enable_cap(s, capability, cap_flags, ...)      \
    ({                                                               \
        struct kvm_enable_cap cap = {                                \
            .cap = capability,                                       \
            .flags = cap_flags,                                      \
        };                                                           \
        uint64_t args_tmp[] = { __VA_ARGS__ };                       \
        size_t n = MIN(ARRAY_SIZE(args_tmp), ARRAY_SIZE(cap.args));  \
        memcpy(cap.args, args_tmp, n * sizeof(cap.args[0]));         \
        kvm_mirror_vm_ioctl(s, KVM_ENABLE_CAP, &cap);                \
    })

+int kvm_mirror_vm_ioctl(KVMState *s, int type, ...) {
+    int ret;
+    void *arg;
+    va_list ap;
+
+    va_start(ap, type);
+    arg = va_arg(ap, void *);
+    va_end(ap);
+
+    trace_kvm_vm_ioctl(type, arg);
+    ret = ioctl(s->mirror_vm_fd, type, arg);
+    if (ret == -1) {
+        ret = -errno;
+    }
+    return ret;
+}
+

The vcpu ioctl code works as it is. 

The kvm_arch_put_registers() also needed a mirror VM variant kvm_arch_mirror_put_registers(), for reasons such as saving MSRs on the mirror VM required enabling the in-kernel irqchip support on the mirror VM, otherwise, kvm_put_msrs() fails. Hence, kvm_arch_mirror_put_registers() makes the mirror VM simpler by not saving any MSRs and not needing the in-kernel irqchip support.

I had lot of issues in dynamically adding a new vCPU, i.e., the CPUState structure due to qemu's object model (QOM) which requires that every qemu structure/object has to contain the parent/base class/object and then all the derived classes after that. It was difficult to add a new CPU object dynamically, hence I have to reuse one of the “-smp”  cpus passed on qemu command line as the mirror vCPU. This also assists in having the X86CPU "backing" structure for the mirror vCPU’s CPU object, which allows using most of the KVM code in qemu for the mirror vCPU. Also the mirror vCPU CPU object will have the CPUX86State structure embedded which contains the cpu register state for the mirror vCPU. 

The mirror vCPU is now running a simpler KVM run loop, it does not have any in-kernel irqchip (interrupt controller) or any other kvmapic interrupt controller supported and enabled for it. As of now it is still doing both I/O and MMIO handling.

Looking fwd. to comments, feedback, thoughts on the above approach.

Thanks,
Ashish

-----Original Message-----
From: Paolo Bonzini <pbonzini@redhat.com>
Sent: Thursday, March 11, 2021 10:30 AM
To: Tobin Feldman-Fitzthum <tobin@linux.ibm.com>; natet@google.com
Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>; Lendacky, Thomas <Thomas.Lendacky@amd.com>; x86@kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; srutherford@google.com; seanjc@google.com; rientjes@google.com; Singh, Brijesh <brijesh.singh@amd.com>; Kalra, Ashish <Ashish.Kalra@amd.com>; Laszlo Ersek <lersek@redhat.com>; James Bottomley <jejb@linux.ibm.com>; Hubertus Franke <frankeh@us.ibm.com>
Subject: Re: [RFC] KVM: x86: Support KVM VMs sharing SEV context

On 11/03/21 16:30, Tobin Feldman-Fitzthum wrote:
> I am not sure how the mirror VM will be supported in QEMU. Usually 
> there is one QEMU process per-vm. Now we would need to run a second VM 
> and communicate with it during migration. Is there a way to do this 
> without adding significant complexity?

I can answer this part.  I think this will actually be simpler than with auxiliary vCPUs.  There will be a separate pair of VM+vCPU file descriptors within the same QEMU process, and some code to set up the memory map using KVM_SET_USER_MEMORY_REGION.

However, the code to run this VM will be very small as the VM does not have to do MMIO, interrupts, live migration (of itself), etc.  It just starts up and communicates with QEMU using a mailbox at a predetermined address.

I also think (but I'm not 100% sure) that the auxiliary VM does not have to watch changes in the primary VM's memory map (e.g. mapping and unmapping of BARs).  In QEMU terms, the auxiliary VM's memory map tracks RAMBlocks, not MemoryRegions, which makes things much simpler.

There are already many examples of mini VMMs running special purpose VMs in the kernel's tools/testing/selftests/kvm directory, and I don't think the QEMU code would be any more complex than that.

Paolo