kvmarm.lists.cs.columbia.edu archive mirror
* [PATCH 0/2] Expose KVM API to Linux Kernel
@ 2020-05-18  6:58 Anastassios Nanos
  2020-05-18  6:58 ` [PATCH 1/2] KVMM: export needed symbols Anastassios Nanos
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Anastassios Nanos @ 2020-05-18  6:58 UTC
  To: kvm, kvmarm, linux-kernel
  Cc: Wanpeng Li, Marc Zyngier, Joerg Roedel, x86, H. Peter Anvin,
	Sean Christopherson, Ingo Molnar, Catalin Marinas,
	Borislav Petkov, Paolo Bonzini, Vitaly Kuznetsov, Will Deacon,
	Thomas Gleixner, Jim Mattson

To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
QEMU, or some other kind of user-space VM monitor, to host the vCPU
threads, I/O threads and various other book-keeping/management
mechanisms. This is perfectly fine for a large number of use cases: for
instance, running generic VMs, or running general-purpose operating
systems that need some kind of emulation for legacy boot/hardware.

What if we wanted to execute a small piece of code as a guest instance,
without involving user-space at all? The KVM functions already do what
they should: VM and vCPU setup is already part of the kernel; the only
missing piece is memory handling.

With this series, (a) we expose to the Linux kernel the bare minimum of
KVM API functions needed to spawn a guest instance without the
intervention of user-space; and (b) we tweak the memory handling code of
KVM-related functions to account for another kind of guest, one spawned
in kernel-space.

PATCH #1 exposes the needed wrapper functions, whereas PATCH #2
introduces the changes to the KVM memory handling code for x86_64 and
aarch64.

A usage example, based on kvmtest.c [https://lwn.net/Articles/658512/],
is available at https://github.com/cloudkernels/kvmmtest; a minimal
sketch of the call sequence follows below.
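
For a rough idea of what this enables, here is a minimal, hypothetical
sketch of an in-kernel user (error handling, guest code loading and
vCPU register setup are omitted; the kvmm_* symbols are the wrappers
exported by PATCH #1, everything else is illustrative):

    #include <linux/err.h>
    #include <linux/kvm_host.h>
    #include <linux/sizes.h>
    #include <linux/vmalloc.h>

    static int kvmm_demo(void)
    {
            struct kvm_userspace_memory_region region = { 0 };
            struct kvm_vcpu *vcpu;
            struct kvm *kvm;
            void *guest_mem;
            int ret;

            /* type == 1 marks the VM as kernel-spawned, see PATCH #2 */
            kvm = kvmm_kvm_create_vm(1);
            if (IS_ERR(kvm))
                    return PTR_ERR(kvm);

            /* back guest "physical" memory with vmalloc'ed kernel memory */
            guest_mem = vmalloc(SZ_2M);
            region.guest_phys_addr = 0;
            region.memory_size = SZ_2M;
            region.userspace_addr = (unsigned long)guest_mem;
            ret = kvmm_kvm_vm_ioctl_set_memory_region(kvm, &region);

            ret = kvmm_kvm_vm_ioctl_create_vcpu(kvm, 0);
            /* assumption: fetch the first vCPU straight from the array */
            vcpu = kvm->vcpus[0];

            /* ... copy guest code into guest_mem, set registers ... */

            ret = kvmm_kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);

            kvmm_kvm_destroy_vm(kvm);
            vfree(guest_mem);
            return ret;
    }

The kvmmtest repository above implements the same flow in full.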

Anastassios Nanos (2):
  KVMM: export needed symbols
  KVMM: Memory and interface related changes

 arch/arm64/include/asm/kvm_host.h   |   6 ++
 arch/arm64/kvm/fpsimd.c             |   8 +-
 arch/arm64/kvm/guest.c              |  48 +++++++++++
 arch/x86/include/asm/fpu/internal.h |  10 ++-
 arch/x86/kvm/cpuid.c                |  25 ++++++
 arch/x86/kvm/emulate.c              |   3 +-
 arch/x86/kvm/vmx/vmx.c              |   3 +-
 arch/x86/kvm/x86.c                  |  38 ++++++++-
 include/linux/kvm_host.h            |  36 +++++++++
 virt/kvm/arm/arm.c                  |  18 +++++
 virt/kvm/arm/mmu.c                  |  34 +++++---
 virt/kvm/async_pf.c                 |   4 +-
 virt/kvm/coalesced_mmio.c           |   6 ++
 virt/kvm/kvm_main.c                 | 120 ++++++++++++++++++++++------
 14 files changed, 316 insertions(+), 43 deletions(-)

-- 
2.20.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

* [PATCH 1/2] KVMM: export needed symbols
  2020-05-18  6:58 [PATCH 0/2] Expose KVM API to Linux Kernel Anastassios Nanos
@ 2020-05-18  6:58 ` Anastassios Nanos
  2020-05-18  7:41   ` Marc Zyngier
  2020-05-18  7:01 ` [PATCH 2/2] KVMM: Memory and interface related changes Anastassios Nanos
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Anastassios Nanos @ 2020-05-18  6:58 UTC
  To: kvm, kvmarm, linux-kernel
  Cc: Wanpeng Li, Marc Zyngier, Joerg Roedel, x86, H. Peter Anvin,
	Sean Christopherson, Ingo Molnar, Catalin Marinas,
	Borislav Petkov, Paolo Bonzini, Vitaly Kuznetsov, Will Deacon,
	Thomas Gleixner, Jim Mattson

Expose a set of KVM functions to the kernel, in order to be
able to spawn a VM instance without assistance from user-space.
To handle a guest instance, the system needs access to the following
functions:

    kvm_arch_vcpu_run_map_fp
    kvm_arch_vcpu_ioctl_get_regs
    kvm_arch_vcpu_ioctl_set_regs
    kvm_arm_get_reg
    kvm_arm_set_reg
    kvm_arch_vcpu_ioctl_get_sregs
    kvm_arch_vcpu_ioctl_set_sregs
    kvm_vcpu_preferred_target
    kvm_vcpu_ioctl_set_cpuid2
    kvm_vcpu_ioctl_get_cpuid2
    kvm_dev_ioctl_get_cpuid
    kvm_arch_vcpu_ioctl_run
    kvm_arch_vcpu_ioctl_get_regs
    kvm_arch_vcpu_ioctl_set_regs
    kvm_arch_vcpu_ioctl_get_sregs
    kvm_arch_vcpu_ioctl_set_sregs
    kvm_vcpu_initialized
    kvm_arch_vcpu_ioctl_run
    kvm_arch_vcpu_ioctl_vcpu_init
    kvm_coalesced_mmio_init
    kvm_create_vm
    kvm_destroy_vm
    kvm_vm_ioctl_set_memory_region
    kvm_vm_ioctl_create_vcpu
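
Each of these is exported through a thin kvmm_-prefixed wrapper around
the existing (frequently static) implementation; for instance, from the
kvm_main.c hunk below:

    int kvmm_kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
    {
            return kvm_vm_ioctl_create_vcpu(kvm, id);
    }
    EXPORT_SYMBOL(kvmm_kvm_vm_ioctl_create_vcpu);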

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Signed-off-by: Charalampos Mainas <cmainas@nubificus.co.uk>
Signed-off-by: Konstantinos Papazafeiropoulos <kostis@nubificus.co.uk>
Signed-off-by: Stratos Psomadakis <psomas@nubificus.co.uk>
---
 arch/arm64/include/asm/kvm_host.h |  6 ++++
 arch/arm64/kvm/fpsimd.c           |  6 ++++
 arch/arm64/kvm/guest.c            | 48 +++++++++++++++++++++++++++++++
 arch/x86/kvm/cpuid.c              | 25 ++++++++++++++++
 arch/x86/kvm/x86.c                | 31 ++++++++++++++++++++
 include/linux/kvm_host.h          | 24 ++++++++++++++++
 virt/kvm/arm/arm.c                | 18 ++++++++++++
 virt/kvm/coalesced_mmio.c         |  6 ++++
 virt/kvm/kvm_main.c               | 23 +++++++++++++++
 9 files changed, 187 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 32c8a675e5a4..79b51a4eeaac 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -602,6 +602,7 @@ static inline void __cpu_init_stage2(void) {}
 
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
+int kvmm_kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
@@ -617,6 +618,11 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 	return kvm_arch_vcpu_run_map_fp(vcpu);
 }
 
+static inline int kvmm_kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
+{
+	return kvmm_kvm_arch_vcpu_run_map_fp(vcpu);
+}
+
 void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
 void kvm_clr_pmu_events(u32 clr);
 
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index e329a36b2bee..274f8c47b22c 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -48,6 +48,12 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+int kvmm_kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
+{
+	return kvm_arch_vcpu_run_map_fp(vcpu);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_run_map_fp);
+
 /*
  * Prepare vcpu for saving the host's FPSIMD state and loading the guest's.
  * The actual loading is done by the FPSIMD access trap taken to hyp.
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 50a279d3ddd7..f64fe3a999ac 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -443,11 +443,26 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	return -EINVAL;
 }
 
+int kvmm_kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu,
+				struct kvm_regs *regs)
+{
+	return kvm_arch_vcpu_ioctl_get_regs(vcpu, regs);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_get_regs);
+
 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
 	return -EINVAL;
 }
 
+int kvmm_kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu,
+				struct kvm_regs *regs)
+{
+	return kvm_arch_vcpu_ioctl_set_regs(vcpu, regs);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_set_regs);
+
+
 static int copy_core_reg_indices(const struct kvm_vcpu *vcpu,
 				 u64 __user *uindices)
 {
@@ -678,6 +693,12 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	return kvm_arm_sys_reg_get_reg(vcpu, reg);
 }
 
+int kvmm_kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	return kvm_arm_get_reg(vcpu, reg);
+}
+EXPORT_SYMBOL(kvmm_kvm_arm_get_reg);
+
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 {
 	/* We currently use nothing arch-specific in upper 32 bits */
@@ -696,18 +717,39 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	return kvm_arm_sys_reg_set_reg(vcpu, reg);
 }
 
+int kvmm_kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	return kvm_arm_set_reg(vcpu, reg);
+}
+EXPORT_SYMBOL(kvmm_kvm_arm_set_reg);
+
 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 				  struct kvm_sregs *sregs)
 {
 	return -EINVAL;
 }
 
+
+int kvmm_kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				struct kvm_sregs *sregs)
+{
+	return kvm_arch_vcpu_ioctl_get_sregs(vcpu, sregs);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_get_sregs);
+
 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 				  struct kvm_sregs *sregs)
 {
 	return -EINVAL;
 }
 
+int kvmm_kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				struct kvm_sregs *sregs)
+{
+	return kvm_arch_vcpu_ioctl_set_sregs(vcpu, sregs);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_set_sregs);
+
 int __kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
 			      struct kvm_vcpu_events *events)
 {
@@ -801,6 +843,12 @@ int kvm_vcpu_preferred_target(struct kvm_vcpu_init *init)
 	return 0;
 }
 
+int kvmm_kvm_vcpu_preferred_target(struct kvm_vcpu_init *init)
+{
+	return kvm_vcpu_preferred_target(init);
+}
+EXPORT_SYMBOL(kvmm_kvm_vcpu_preferred_target);
+
 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 {
 	return -EINVAL;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 901cd1fdecd9..c7d2071da47a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -238,6 +238,14 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
 	return r;
 }
 
+int kvmm_kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
+			      struct kvm_cpuid2 *cpuid,
+			      struct kvm_cpuid_entry2 __user *entries)
+{
+	return kvm_vcpu_ioctl_set_cpuid2(vcpu, cpuid, entries);
+}
+EXPORT_SYMBOL(kvmm_kvm_vcpu_ioctl_set_cpuid2);
+
 int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 			      struct kvm_cpuid2 *cpuid,
 			      struct kvm_cpuid_entry2 __user *entries)
@@ -258,6 +266,14 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 	return r;
 }
 
+int kvmm_kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
+			      struct kvm_cpuid2 *cpuid,
+			      struct kvm_cpuid_entry2 __user *entries)
+{
+	return kvm_vcpu_ioctl_get_cpuid2(vcpu, cpuid, entries);
+}
+EXPORT_SYMBOL(kvmm_kvm_vcpu_ioctl_get_cpuid2);
+
 static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
 {
 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
@@ -900,6 +916,15 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
 	return r;
 }
 
+int kvmm_kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
+			    struct kvm_cpuid_entry2 __user *entries,
+			    unsigned int type)
+{
+	return kvm_dev_ioctl_get_cpuid(cpuid, entries, type);
+
+}
+EXPORT_SYMBOL(kvmm_kvm_dev_ioctl_get_cpuid);
+
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
 					      u32 function, u32 index)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bf2ecafd027..b4039402aa7d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8776,6 +8776,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	return r;
 }
 
+int kvmm_kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+	return kvm_arch_vcpu_ioctl_run(vcpu, kvm_run);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_run);
+
+
 static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
 	if (vcpu->arch.emulate_regs_need_sync_to_vcpu) {
@@ -8819,6 +8826,12 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	vcpu_put(vcpu);
 	return 0;
 }
+int kvmm_kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu,
+				struct kvm_regs *regs)
+{
+	return kvm_arch_vcpu_ioctl_get_regs(vcpu, regs);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_get_regs);
 
 static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
@@ -8859,6 +8872,12 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	vcpu_put(vcpu);
 	return 0;
 }
+int kvmm_kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu,
+				struct kvm_regs *regs)
+{
+	return kvm_arch_vcpu_ioctl_set_regs(vcpu, regs);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_set_regs);
 
 void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
 {
@@ -8914,6 +8933,12 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 	vcpu_put(vcpu);
 	return 0;
 }
+int kvmm_kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return kvm_arch_vcpu_ioctl_get_sregs(vcpu, sregs);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_get_sregs);
 
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
@@ -9113,6 +9138,12 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	vcpu_put(vcpu);
 	return ret;
 }
+int kvmm_kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return kvm_arch_vcpu_ioctl_set_sregs(vcpu, sregs);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_set_sregs);
 
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 					struct kvm_guest_debug *dbg)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6d58beb65454..794b97c4ddf9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1408,11 +1408,16 @@ int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 
 #ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
 int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
+int kvmm_kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
 #else
 static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 {
 	return 0;
 }
+static inline int kvmm_kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
 #endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
 
 typedef int (*kvm_vm_thread_fn_t)(struct kvm *kvm, uintptr_t data);
@@ -1421,4 +1426,23 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
 				uintptr_t data, const char *name,
 				struct task_struct **thread_ptr);
 
+/* KVMM related functions */
+struct kvm *kvmm_kvm_create_vm(unsigned long type);
+int kvmm_kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id);
+void kvmm_kvm_destroy_vm(struct kvm *kvm);
+int kvmm_kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
+				 struct kvm_run *kvm_run);
+int kvmm_kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				       struct kvm_sregs *sregs);
+int kvmm_kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				       struct kvm_sregs *sregs);
+int kvmm_kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu,
+				      struct kvm_regs *regs);
+int kvmm_kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu,
+				      struct kvm_regs *regs);
+int kvmm_kvm_coalesced_mmio_init(struct kvm *kvm);
+int kvmm_kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
+				   struct kvm_userspace_memory_region *mem);
+
+
 #endif
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..d33bbb64b515 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -607,6 +607,11 @@ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.target >= 0;
 }
+int kvmm_kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
+{
+	return kvm_vcpu_initialized(vcpu);
+}
+EXPORT_SYMBOL(kvmm_kvm_vcpu_initialized);
 
 static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 {
@@ -830,6 +835,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return ret;
 }
 
+int kvmm_kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return kvm_arch_vcpu_ioctl_run(vcpu, run);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_run);
+
 static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
 {
 	int bit_index;
@@ -1000,6 +1011,13 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+int kvmm_kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
+					 struct kvm_vcpu_init *init)
+{
+	return kvm_arch_vcpu_ioctl_vcpu_init(vcpu, init);
+}
+EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_ioctl_vcpu_init);
+
 static int kvm_arm_vcpu_set_attr(struct kvm_vcpu *vcpu,
 				 struct kvm_device_attr *attr)
 {
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 00c747dbc82e..ad7e540dcbd7 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -128,6 +128,12 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
 	return 0;
 }
 
+int kvmm_kvm_coalesced_mmio_init(struct kvm *kvm)
+{
+	return kvm_coalesced_mmio_init(kvm);
+}
+EXPORT_SYMBOL(kvmm_kvm_coalesced_mmio_init);
+
 void kvm_coalesced_mmio_free(struct kvm *kvm)
 {
 	if (kvm->coalesced_mmio_ring)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 74bdb7bf3295..dcbecc2ab370 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -761,6 +761,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	mmdrop(current->mm);
 	return ERR_PTR(r);
 }
+struct kvm *kvmm_kvm_create_vm(unsigned long type)
+{
+	return kvm_create_vm(type);
+}
+EXPORT_SYMBOL(kvmm_kvm_create_vm);
 
 static void kvm_destroy_devices(struct kvm *kvm)
 {
@@ -816,6 +821,12 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	mmdrop(mm);
 }
 
+void kvmm_kvm_destroy_vm(struct kvm *kvm)
+{
+	kvm_destroy_vm(kvm);
+}
+EXPORT_SYMBOL(kvmm_kvm_destroy_vm);
+
 void kvm_get_kvm(struct kvm *kvm)
 {
 	refcount_inc(&kvm->users_count);
@@ -1332,6 +1343,12 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 
 	return kvm_set_memory_region(kvm, mem);
 }
+int kvmm_kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
+				  struct kvm_userspace_memory_region *mem)
+{
+	return kvm_vm_ioctl_set_memory_region(kvm, mem);
+}
+EXPORT_SYMBOL_GPL(kvmm_kvm_vm_ioctl_set_memory_region);
 
 #ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
 /**
@@ -3078,6 +3095,12 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	return r;
 }
 
+int kvmm_kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
+{
+	return kvm_vm_ioctl_create_vcpu(kvm, id);
+}
+EXPORT_SYMBOL(kvmm_kvm_vm_ioctl_create_vcpu);
+
 static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
 {
 	if (sigset) {
-- 
2.20.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

* [PATCH 2/2] KVMM: Memory and interface related changes
  2020-05-18  6:58 [PATCH 0/2] Expose KVM API to Linux Kernel Anastassios Nanos
  2020-05-18  6:58 ` [PATCH 1/2] KVMM: export needed symbols Anastassios Nanos
@ 2020-05-18  7:01 ` Anastassios Nanos
  2020-05-18  6:59   ` Anastassios Nanos
                     ` (3 more replies)
  2020-05-18  7:50 ` [PATCH 0/2] Expose KVM API to Linux Kernel Marc Zyngier
  2020-05-18  8:42 ` Thomas Gleixner
  3 siblings, 4 replies; 17+ messages in thread
From: Anastassios Nanos @ 2020-05-18  7:01 UTC
  To: kvm, kvmarm, linux-kernel
  Cc: Wanpeng Li, Marc Zyngier, Joerg Roedel, x86, H. Peter Anvin,
	Sean Christopherson, Ingo Molnar, Catalin Marinas,
	Borislav Petkov, Paolo Bonzini, Vitaly Kuznetsov, Will Deacon,
	Thomas Gleixner, Jim Mattson

To run KVMM guests, we need to extend the KVM memory subsystem
to support kernel-allocated memory. To do this, we allow vmalloc'ed
space to be used as KVM guest addresses.

Additionally, since current->mm is NULL in kernel context (that is, in
kthreads), we check whether mm refers to a kthread or a userspace
process, and use current->active_mm or current->mm accordingly, as
sketched below.

Finally, when pages need to be pinned (where get_user_pages would
normally be used), we simply return the vmalloc_to_page result and
derive the pfn from it, instead of going through get_user_pages.
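
The mm selection boils down to the following recurring pattern
(kvmm_mm() is a hypothetical helper name; the patch open-codes this
ternary at each call site):

    static inline struct mm_struct *kvmm_mm(struct kvm *kvm)
    {
            /* kthreads have no mm of their own; borrow the active one */
            return kvm->is_kvmm_vm ? current->active_mm : current->mm;
    }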

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Signed-off-by: Konstantinos Papazafeiropoulos <kostis@nubificus.co.uk>
Signed-off-by: Charalampos Mainas <cmainas@nubificus.co.uk>
Signed-off-by: Stratos Psomadakis <psomas@nubificus.co.uk>
---
 arch/arm64/kvm/fpsimd.c             |  2 +-
 arch/x86/include/asm/fpu/internal.h | 10 ++-
 arch/x86/kvm/emulate.c              |  3 +-
 arch/x86/kvm/vmx/vmx.c              |  3 +-
 arch/x86/kvm/x86.c                  |  7 ++-
 include/linux/kvm_host.h            | 12 ++++
 virt/kvm/arm/mmu.c                  | 34 ++++++----
 virt/kvm/async_pf.c                 |  4 +-
 virt/kvm/kvm_main.c                 | 97 ++++++++++++++++++++++-------
 9 files changed, 129 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 274f8c47b22c..411c9ae1f12a 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -66,7 +66,7 @@ EXPORT_SYMBOL(kvmm_kvm_arch_vcpu_run_map_fp);
  */
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 {
-	BUG_ON(!current->mm);
+	BUG_ON(!(vcpu->kvm->is_kvmm_vm ? current->active_mm : current->mm));
 
 	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
 			      KVM_ARM64_HOST_SVE_IN_USE |
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 44c48e34d799..f9e70df3b268 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -536,7 +536,8 @@ static inline void __fpregs_load_activate(void)
 	struct fpu *fpu = &current->thread.fpu;
 	int cpu = smp_processor_id();
 
-	if (WARN_ON_ONCE(current->flags & PF_KTHREAD))
+	/* don't do anything if we're in the kernel */
+	if (current->flags & PF_KTHREAD)
 		return;
 
 	if (!fpregs_state_valid(fpu, cpu)) {
@@ -571,7 +572,8 @@ static inline void __fpregs_load_activate(void)
  */
 static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU) && !(current->flags & PF_KTHREAD)) {
+	if (static_cpu_has(X86_FEATURE_FPU) &&
+	    !(current->flags & PF_KTHREAD)) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
 			old_fpu->last_cpu = -1;
 		else
@@ -598,6 +600,10 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 	if (!static_cpu_has(X86_FEATURE_FPU))
 		return;
 
+	/* don't do anything if we're in the kernel */
+	if (current->flags & PF_KTHREAD)
+		return;
+
 	set_thread_flag(TIF_NEED_FPU_LOAD);
 
 	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index bddaba9c68dd..0f6bc2404d0d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1086,7 +1086,8 @@ static void emulator_get_fpu(void)
 	fpregs_lock();
 
 	fpregs_assert_state_consistent();
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+	if (test_thread_flag(TIF_NEED_FPU_LOAD) &&
+	    !(current->flags & PF_KTHREAD))
 		switch_fpu_return();
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 83050977490c..2e34cf5829f2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1166,7 +1166,8 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	savesegment(es, host_state->es_sel);
 
 	gs_base = cpu_kernelmode_gs_base(cpu);
-	if (likely(is_64bit_mm(current->mm))) {
+	if (likely(is_64bit_mm(vcpu->kvm->is_kvmm_vm ?
+	    current->active_mm : current->mm))) {
 		save_fsgs_for_kvm();
 		fs_sel = current->thread.fsindex;
 		gs_sel = current->thread.gsindex;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4039402aa7d..42451faa6c0b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8390,7 +8390,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	guest_enter_irqoff();
 
 	fpregs_assert_state_consistent();
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+	if (test_thread_flag(TIF_NEED_FPU_LOAD) &&
+	    !(current->flags & PF_KTHREAD))
 		switch_fpu_return();
 
 	if (unlikely(vcpu->arch.switch_db_regs)) {
@@ -9903,11 +9904,13 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-	if (current->mm == kvm->mm) {
+	if (current->mm == kvm->mm || kvm->is_kvmm_vm) {
 		/*
 		 * Free memory regions allocated on behalf of userspace,
 		 * unless the the memory map has changed due to process exit
 		 * or fd copying.
+		 * In case it is a kvmm VM, then we need to invalidate the
+		 * regions allocated.
 		 */
 		mutex_lock(&kvm->slots_lock);
 		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 794b97c4ddf9..c4d23bbfff60 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -121,8 +121,19 @@ static inline bool is_noslot_pfn(kvm_pfn_t pfn)
 #define KVM_HVA_ERR_BAD		(PAGE_OFFSET)
 #define KVM_HVA_ERR_RO_BAD	(PAGE_OFFSET + PAGE_SIZE)
 
+static inline bool kvmm_valid_addr(unsigned long addr)
+{
+	if (is_vmalloc_addr((void *)addr))
+		return true;
+	else
+		return false;
+}
+
 static inline bool kvm_is_error_hva(unsigned long addr)
 {
+	if (kvmm_valid_addr(addr))
+		return false;
+
 	return addr >= PAGE_OFFSET;
 }
 
@@ -503,6 +514,7 @@ struct kvm {
 	struct srcu_struct srcu;
 	struct srcu_struct irq_srcu;
 	pid_t userspace_pid;
+	bool is_kvmm_vm;
 };
 
 #define kvm_err(fmt, ...) \
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index e3b9ee268823..c53f3b4b1b28 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -926,6 +926,8 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
 	phys_addr_t size = PAGE_SIZE * memslot->npages;
 	hva_t reg_end = hva + size;
+	struct mm_struct *mm = kvm->is_kvmm_vm ?
+				current->active_mm : current->mm;
 
 	/*
 	 * A memory region could potentially cover multiple VMAs, and any holes
@@ -940,7 +942,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 	 *     +--------------------------------------------+
 	 */
 	do {
-		struct vm_area_struct *vma = find_vma(current->mm, hva);
+		struct vm_area_struct *vma = find_vma(mm, hva);
 		hva_t vm_start, vm_end;
 
 		if (!vma || vma->vm_start >= reg_end)
@@ -972,9 +974,11 @@ void stage2_unmap_vm(struct kvm *kvm)
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
 	int idx;
+	struct mm_struct *mm = kvm->is_kvmm_vm ?
+				current->active_mm : current->mm;
 
 	idx = srcu_read_lock(&kvm->srcu);
-	down_read(&current->mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 	spin_lock(&kvm->mmu_lock);
 
 	slots = kvm_memslots(kvm);
@@ -982,7 +986,7 @@ void stage2_unmap_vm(struct kvm *kvm)
 		stage2_unmap_memslot(kvm, memslot);
 
 	spin_unlock(&kvm->mmu_lock);
-	up_read(&current->mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	srcu_read_unlock(&kvm->srcu, idx);
 }
 
@@ -1673,6 +1677,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	pgprot_t mem_type = PAGE_S2;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long vma_pagesize, flags = 0;
+	struct mm_struct *mm = kvm->is_kvmm_vm ?
+				current->active_mm : current->mm;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1684,15 +1690,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	/* Let's check if we will get back a huge page backed by hugetlbfs */
-	down_read(&current->mm->mmap_sem);
-	vma = find_vma_intersection(current->mm, hva, hva + 1);
-	if (unlikely(!vma)) {
+	down_read(&mm->mmap_sem);
+	vma = find_vma_intersection(mm, hva, hva + 1);
+	/* vma is invalid for a KVMM guest */
+	if (unlikely(!vma) && !kvm->is_kvmm_vm) {
 		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
-		up_read(&current->mm->mmap_sem);
+		up_read(&mm->mmap_sem);
 		return -EFAULT;
 	}
 
-	if (is_vm_hugetlb_page(vma))
+	/* FIXME: not sure how to handle hugetlb in KVMM, skip for now */
+	if (is_vm_hugetlb_page(vma) && !kvm->is_kvmm_vm)
 		vma_shift = huge_page_shift(hstate_vma(vma));
 	else
 		vma_shift = PAGE_SHIFT;
@@ -1715,7 +1723,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (vma_pagesize == PMD_SIZE ||
 	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
 		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
-	up_read(&current->mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 
 	/* We need minimum second+third level pages */
 	ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
@@ -2278,6 +2286,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	hva_t reg_end = hva + mem->memory_size;
 	bool writable = !(mem->flags & KVM_MEM_READONLY);
 	int ret = 0;
+	struct mm_struct *mm = kvm->is_kvmm_vm ?
+				current->active_mm : current->mm;
 
 	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
 			change != KVM_MR_FLAGS_ONLY)
@@ -2291,7 +2301,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	    (kvm_phys_size(kvm) >> PAGE_SHIFT))
 		return -EFAULT;
 
-	down_read(&current->mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 	/*
 	 * A memory region could potentially cover multiple VMAs, and any holes
 	 * between them, so iterate over all of them to find out if we can map
@@ -2305,7 +2315,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	 *     +--------------------------------------------+
 	 */
 	do {
-		struct vm_area_struct *vma = find_vma(current->mm, hva);
+		struct vm_area_struct *vma = find_vma(mm, hva);
 		hva_t vm_start, vm_end;
 
 		if (!vma || vma->vm_start >= reg_end)
@@ -2350,7 +2360,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		stage2_flush_memslot(kvm, memslot);
 	spin_unlock(&kvm->mmu_lock);
 out:
-	up_read(&current->mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	return ret;
 }
 
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 15e5b037f92d..d0b23e1afb87 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -175,7 +175,9 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	work->cr2_or_gpa = cr2_or_gpa;
 	work->addr = hva;
 	work->arch = *arch;
-	work->mm = current->mm;
+	/* if we're a KVMM VM, then current->mm is null */
+	work->mm = work->vcpu->kvm->is_kvmm_vm ?
+			current->active_mm : current->mm;
 	mmget(work->mm);
 	kvm_get_kvm(work->vcpu->kvm);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dcbecc2ab370..5e1929f7f282 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -541,6 +541,9 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
 
 static int kvm_init_mmu_notifier(struct kvm *kvm)
 {
+	/* FIXME: ignore mmu_notifiers for KVMM guests, for now */
+	if (kvm->is_kvmm_vm)
+		return 0;
 	kvm->mmu_notifier.ops = &kvm_mmu_notifier_ops;
 	return mmu_notifier_register(&kvm->mmu_notifier, current->mm);
 }
@@ -672,13 +675,22 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	struct kvm *kvm = kvm_arch_alloc_vm();
 	int r = -ENOMEM;
 	int i;
+	/* if we're a KVMM guest, then current->mm is NULL */
+	struct mm_struct *mm = (type == 1) ? current->active_mm : current->mm;
 
 	if (!kvm)
 		return ERR_PTR(-ENOMEM);
 
+	if (type == 1) {
+		kvm->is_kvmm_vm = true;
+		type = 0;
+	} else {
+		kvm->is_kvmm_vm = false;
+	}
+
 	spin_lock_init(&kvm->mmu_lock);
-	mmgrab(current->mm);
-	kvm->mm = current->mm;
+	mmgrab(mm);
+	kvm->mm = mm;
 	kvm_eventfd_init(kvm);
 	mutex_init(&kvm->lock);
 	mutex_init(&kvm->irq_lock);
@@ -741,7 +753,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 out_err:
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 	if (kvm->mmu_notifier.ops)
-		mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
+		mmu_notifier_unregister(&kvm->mmu_notifier, mm);
 #endif
 out_err_no_mmu_notifier:
 	hardware_disable_all();
@@ -758,7 +770,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	cleanup_srcu_struct(&kvm->srcu);
 out_err_no_srcu:
 	kvm_arch_free_vm(kvm);
-	mmdrop(current->mm);
+	mmdrop(mm);
 	return ERR_PTR(r);
 }
 struct kvm *kvmm_kvm_create_vm(unsigned long type)
@@ -1228,8 +1240,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	/* We can read the guest memory with __xxx_user() later on. */
 	if ((id < KVM_USER_MEM_SLOTS) &&
 	    ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
-	     !access_ok((void __user *)(unsigned long)mem->userspace_addr,
-			mem->memory_size)))
+	     !(access_ok((void __user *)(unsigned long)mem->userspace_addr,
+			mem->memory_size) || kvm->is_kvmm_vm)))
 		return -EINVAL;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
 		return -EINVAL;
@@ -1484,7 +1496,6 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	return 0;
 }
 
-
 /**
  * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
  * @kvm: kvm instance
@@ -1636,6 +1647,8 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	struct vm_area_struct *vma;
 	unsigned long addr, size;
+	struct mm_struct *mm = vcpu->kvm->is_kvmm_vm ?
+				current->active_mm : current->mm;
 
 	size = PAGE_SIZE;
 
@@ -1643,15 +1656,15 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 	if (kvm_is_error_hva(addr))
 		return PAGE_SIZE;
 
-	down_read(&current->mm->mmap_sem);
-	vma = find_vma(current->mm, addr);
+	down_read(&mm->mmap_sem);
+	vma = find_vma(mm, addr);
 	if (!vma)
 		goto out;
 
 	size = vma_kernel_pagesize(vma);
 
 out:
-	up_read(&current->mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 
 	return size;
 }
@@ -1760,8 +1773,12 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 	 */
 	if (!(write_fault || writable))
 		return false;
-
-	npages = __get_user_pages_fast(addr, 1, 1, page);
+	if (kvmm_valid_addr(addr)) {
+		npages = 1;
+		page[0] = vmalloc_to_page((void *)addr);
+	} else {
+		npages = __get_user_pages_fast(addr, 1, 1, page);
+	}
 	if (npages == 1) {
 		*pfn = page_to_pfn(page[0]);
 
@@ -1794,15 +1811,28 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 	if (async)
 		flags |= FOLL_NOWAIT;
 
-	npages = get_user_pages_unlocked(addr, 1, &page, flags);
+	if (kvmm_valid_addr(addr)) {
+		npages = 1;
+		page = vmalloc_to_page((void *)addr);
+	} else {
+		npages = get_user_pages_unlocked(addr, 1, &page, flags);
+	}
 	if (npages != 1)
 		return npages;
 
 	/* map read fault as writable if possible */
 	if (unlikely(!write_fault) && writable) {
 		struct page *wpage;
+		int lnpages = 0;
+
+		if (kvmm_valid_addr(addr)) {
+			lnpages = 1;
+			wpage = vmalloc_to_page((void *)addr);
+		} else {
+			lnpages = __get_user_pages_fast(addr, 1, 1, &wpage);
+		}
 
-		if (__get_user_pages_fast(addr, 1, 1, &wpage) == 1) {
+		if (lnpages == 1) {
 			*writable = true;
 			put_page(page);
 			page = wpage;
@@ -1830,6 +1860,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 {
 	unsigned long pfn;
 	int r;
+	struct mm_struct *mm = kvmm_valid_addr(addr) ?
+				current->active_mm : current->mm;
 
 	r = follow_pfn(vma, addr, &pfn);
 	if (r) {
@@ -1838,7 +1870,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		 * not call the fault handler, so do it here.
 		 */
 		bool unlocked = false;
-		r = fixup_user_fault(current, current->mm, addr,
+		r = fixup_user_fault(current, mm, addr,
 				     (write_fault ? FAULT_FLAG_WRITE : 0),
 				     &unlocked);
 		if (unlocked)
@@ -1892,6 +1924,8 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	struct vm_area_struct *vma;
 	kvm_pfn_t pfn = 0;
 	int npages, r;
+	struct mm_struct *mm = kvmm_valid_addr(addr) ?
+				current->active_mm : current->mm;
 
 	/* we can do it either atomically or asynchronously, not both */
 	BUG_ON(atomic && async);
@@ -1906,7 +1940,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	if (npages == 1)
 		return pfn;
 
-	down_read(&current->mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 	if (npages == -EHWPOISON ||
 	      (!async && check_user_page_hwpoison(addr))) {
 		pfn = KVM_PFN_ERR_HWPOISON;
@@ -1914,7 +1948,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	}
 
 retry:
-	vma = find_vma_intersection(current->mm, addr, addr + 1);
+	vma = find_vma_intersection(mm, addr, addr + 1);
 
 	if (vma == NULL)
 		pfn = KVM_PFN_ERR_FAULT;
@@ -1930,7 +1964,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 		pfn = KVM_PFN_ERR_FAULT;
 	}
 exit:
-	up_read(&current->mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	return pfn;
 }
 
@@ -2208,6 +2242,9 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn)
 {
+	/* rely on the fact that we're in the kernel */
+	if (current->flags & PF_KTHREAD)
+		return;
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
 		put_page(pfn_to_page(pfn));
 }
@@ -2244,6 +2281,9 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
 
 void kvm_get_pfn(kvm_pfn_t pfn)
 {
+	/* rely on the fact that we're in the kernel */
+	if (current->flags & PF_KTHREAD)
+		return;
 	if (!kvm_is_reserved_pfn(pfn))
 		get_page(pfn_to_page(pfn));
 }
@@ -3120,8 +3160,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
 	int r;
 	struct kvm_fpu *fpu = NULL;
 	struct kvm_sregs *kvm_sregs = NULL;
+	struct mm_struct *mm = vcpu->kvm->is_kvmm_vm ?
+				current->active_mm : current->mm;
 
-	if (vcpu->kvm->mm != current->mm)
+	if (vcpu->kvm->mm != mm)
 		return -EIO;
 
 	if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
@@ -3327,9 +3369,12 @@ static long kvm_vcpu_compat_ioctl(struct file *filp,
 	struct kvm_vcpu *vcpu = filp->private_data;
 	void __user *argp = compat_ptr(arg);
 	int r;
+	struct mm_struct *mm = vcpu->kvm->is_kvmm_vm ?
+				       current->active_mm : current->mm;
 
-	if (vcpu->kvm->mm != current->mm)
+	if (vcpu->kvm->mm != mm) {
 		return -EIO;
+	}
 
 	switch (ioctl) {
 	case KVM_SET_SIGNAL_MASK: {
@@ -3392,8 +3437,10 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
 			     unsigned long arg)
 {
 	struct kvm_device *dev = filp->private_data;
+	struct mm_struct *mm = dev->kvm->is_kvmm_vm ?
+				       current->active_mm : current->mm;
 
-	if (dev->kvm->mm != current->mm)
+	if (dev->kvm->mm != mm)
 		return -EIO;
 
 	switch (ioctl) {
@@ -3600,8 +3647,10 @@ static long kvm_vm_ioctl(struct file *filp,
 	struct kvm *kvm = filp->private_data;
 	void __user *argp = (void __user *)arg;
 	int r;
+	struct mm_struct *mm = kvm->is_kvmm_vm ?
+				       current->active_mm : current->mm;
 
-	if (kvm->mm != current->mm)
+	if (kvm->mm != mm)
 		return -EIO;
 	switch (ioctl) {
 	case KVM_CREATE_VCPU:
@@ -3797,9 +3846,11 @@ static long kvm_vm_compat_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
 	struct kvm *kvm = filp->private_data;
+	struct mm_struct *mm = kvm->is_kvmm_vm ?
+				       current->active_mm : current->mm;
 	int r;
 
-	if (kvm->mm != current->mm)
+	if (kvm->mm != mm)
 		return -EIO;
 	switch (ioctl) {
 	case KVM_GET_DIRTY_LOG: {
-- 
2.20.1
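
For context: kvmm_valid_addr() is used throughout the hunks above but its
definition is not part of this excerpt. Since the patch resolves KVMM guest
addresses with vmalloc_to_page(), and falls back to current->active_mm (the
address space a kernel thread borrows, given that a kernel thread's
current->mm is NULL), a minimal sketch of what the helper could look like
follows; this is an assumption, not the patch author's actual code:

  #include <linux/mm.h>

  /* Sketch: a host "virtual address" belonging to a kernel-spawned (KVMM)
   * guest would live in the vmalloc area rather than in a user mapping. */
  static inline bool kvmm_valid_addr(unsigned long addr)
  {
          return is_vmalloc_addr((void *)addr);
  }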


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] KVMM: export needed symbols
  2020-05-18  6:58 ` [PATCH 1/2] KVMM: export needed symbols Anastassios Nanos
@ 2020-05-18  7:41   ` Marc Zyngier
  0 siblings, 0 replies; 17+ messages in thread
From: Marc Zyngier @ 2020-05-18  7:41 UTC (permalink / raw)
  To: Anastassios Nanos
  Cc: Thomas Gleixner, Wanpeng Li, kvm, Catalin Marinas, Joerg Roedel,
	x86, linux-kernel, Sean Christopherson, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Paolo Bonzini, Vitaly Kuznetsov,
	Will Deacon, kvmarm, Jim Mattson

On 2020-05-18 07:58, Anastassios Nanos wrote:
> Expose a set of KVM functions to the kernel, in order to be
> able to spawn a VM instance without assistance from user-space.
> To handle a guest instance, the system needs access to the following
> functions:
> 
>     kvm_arch_vcpu_run_map_fp
>     kvm_arch_vcpu_ioctl_get_regs
>     kvm_arch_vcpu_ioctl_set_regs
>     kvm_arm_get_reg
>     kvm_arm_set_reg
>     kvm_arch_vcpu_ioctl_get_sregs
>     kvm_arch_vcpu_ioctl_set_sregs
>     kvm_vcpu_preferred_target
>     kvm_vcpu_ioctl_set_cpuid2
>     kvm_vcpu_ioctl_get_cpuid2
>     kvm_dev_ioctl_get_cpuid
>     kvm_arch_vcpu_ioctl_run
>     kvm_arch_vcpu_ioctl_get_regs
>     kvm_arch_vcpu_ioctl_set_regs
>     kvm_arch_vcpu_ioctl_get_sregs
>     kvm_arch_vcpu_ioctl_set_sregs
>     kvm_vcpu_initialized
>     kvm_arch_vcpu_ioctl_run
>     kvm_arch_vcpu_ioctl_vcpu_init
>     kvm_coalesced_mmio_init
>     kvm_create_vm
>     kvm_destroy_vm
>     kvm_vm_ioctl_set_memory_region
>     kvm_vm_ioctl_create_vcpu
> 
> Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
> Signed-off-by: Charalampos Mainas <cmainas@nubificus.co.uk>
> Signed-off-by: Konstantinos Papazafeiropoulos <kostis@nubificus.co.uk>
> Signed-off-by: Stratos Psomadakis <psomas@nubificus.co.uk>
> ---
>  arch/arm64/include/asm/kvm_host.h |  6 ++++
>  arch/arm64/kvm/fpsimd.c           |  6 ++++
>  arch/arm64/kvm/guest.c            | 48 +++++++++++++++++++++++++++++++
>  arch/x86/kvm/cpuid.c              | 25 ++++++++++++++++
>  arch/x86/kvm/x86.c                | 31 ++++++++++++++++++++
>  include/linux/kvm_host.h          | 24 ++++++++++++++++
>  virt/kvm/arm/arm.c                | 18 ++++++++++++
>  virt/kvm/coalesced_mmio.c         |  6 ++++
>  virt/kvm/kvm_main.c               | 23 +++++++++++++++
>  9 files changed, 187 insertions(+)

In general, we don't export symbols without a user in the tree.
And if/when we do, the sensible thing to do would be to export
them as GPL only.

         M.
-- 
Jazz is not dead. It just smells funny...
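
For reference, the distinction Marc draws is made at the symbol's definition
site. A minimal sketch, with kvmm_run_guest() as a made-up example symbol:

  #include <linux/export.h>
  #include <linux/kvm_host.h>

  int kvmm_run_guest(struct kvm_vcpu *vcpu)
  {
          /* ... enter the guest via the in-kernel KVM entry points ... */
          return 0;
  }
  /* Restricts use of the symbol to modules with a GPL-compatible
   * MODULE_LICENSE(); plain EXPORT_SYMBOL() places no such restriction. */
  EXPORT_SYMBOL_GPL(kvmm_run_guest);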

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] Expose KVM API to Linux Kernel
  2020-05-18  6:58 [PATCH 0/2] Expose KVM API to Linux Kernel Anastassios Nanos
  2020-05-18  6:58 ` [PATCH 1/2] KVMM: export needed symbols Anastassios Nanos
  2020-05-18  7:01 ` [PATCH 2/2] KVMM: Memory and interface related changes Anastassios Nanos
@ 2020-05-18  7:50 ` Marc Zyngier
       [not found]   ` <CALRTab90UyMq2hMxCdCmC3GwPWFn2tK_uKMYQP2YBRcHwzkEUQ@mail.gmail.com>
  2020-05-18  8:42 ` Thomas Gleixner
  3 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2020-05-18  7:50 UTC (permalink / raw)
  To: Anastassios Nanos
  Cc: Thomas Gleixner, Wanpeng Li, kvm, Catalin Marinas, Joerg Roedel,
	x86, linux-kernel, Sean Christopherson, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Paolo Bonzini, Vitaly Kuznetsov,
	Will Deacon, kvmarm, Jim Mattson

On 2020-05-18 07:58, Anastassios Nanos wrote:
> To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
> QEMU, or some other kind of VM monitor in user-space to host the vCPU
> threads, I/O threads and various other book-keeping/management 
> mechanisms.
> This is perfectly fine for a large number of reasons and use cases: for
> instance, running generic VMs, running general purpose Operating 
> systems
> that need some kind of emulation for legacy boot/hardware etc.
> 
> What if we wanted to execute a small piece of code as a guest instance,
> without the involvement of user-space? The KVM functions are already 
> doing
> what they should: VM and vCPU setup is already part of the kernel, the 
> only
> missing piece is memory handling.
> 
> With these series, (a) we expose to the Linux Kernel the bare minimum 
> KVM
> API functions in order to spawn a guest instance without the 
> intervention
> of user-space; and (b) we tweak the memory handling code of KVM-related
> functions to account for another kind of guest, spawned in 
> kernel-space.
> 
> PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces 
> the
> changes in the KVM memory handling code for x86_64 and aarch64.
> 
> An example of use is provided based on kvmtest.c
> [https://lwn.net/Articles/658512/] at
> https://github.com/cloudkernels/kvmmtest

You don't explain *why* we would want this. What is the overhead of having
a userspace if your guest doesn't need any userspace handling? The kvmtest
example indeed shows that the KVM userspace API is usable without any form
of emulation, hence has almost no cost.

Without a clear description of the advantages of your solution, as well
as a full-featured in-tree use case, I find it pretty hard to support this.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] Expose KVM API to Linux Kernel
  2020-05-18  6:58 [PATCH 0/2] Expose KVM API to Linux Kernel Anastassios Nanos
                   ` (2 preceding siblings ...)
  2020-05-18  7:50 ` [PATCH 0/2] Expose KVM API to Linux Kernel Marc Zyngier
@ 2020-05-18  8:42 ` Thomas Gleixner
       [not found]   ` <CALRTab-mEYtRG4zQbSGoAri+jg8xNL-imODv=MWE330Hkt_t+Q@mail.gmail.com>
  3 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2020-05-18  8:42 UTC (permalink / raw)
  To: Anastassios Nanos, kvm, kvmarm, linux-kernel
  Cc: Wanpeng Li, Marc Zyngier, Joerg Roedel, x86, H. Peter Anvin,
	Sean Christopherson, Ingo Molnar, Catalin Marinas,
	Borislav Petkov, Paolo Bonzini, Vitaly Kuznetsov, Will Deacon,
	Jim Mattson

Anastassios Nanos <ananos@nubificus.co.uk> writes:
> To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
> QEMU, or some other kind of VM monitor in user-space to host the vCPU
> threads, I/O threads and various other book-keeping/management mechanisms.
> This is perfectly fine for a large number of reasons and use cases: for
> instance, running generic VMs, running general purpose Operating systems
> that need some kind of emulation for legacy boot/hardware etc.
>
> What if we wanted to execute a small piece of code as a guest instance,
> without the involvement of user-space? The KVM functions are already doing
> what they should: VM and vCPU setup is already part of the kernel, the only
> missing piece is memory handling.
>
> With these series, (a) we expose to the Linux Kernel the bare minimum KVM
> API functions in order to spawn a guest instance without the intervention
> of user-space; and (b) we tweak the memory handling code of KVM-related
> functions to account for another kind of guest, spawned in kernel-space.
>
> PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces the
> changes in the KVM memory handling code for x86_64 and aarch64.
>
> An example of use is provided based on kvmtest.c
> [https://lwn.net/Articles/658512/] at

And this clearly shows how simple the user space required to do that is.
So why on earth would we want to have all of that in the kernel?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] KVMM: Memory and interface related changes
  2020-05-18  7:01 ` [PATCH 2/2] KVMM: Memory and interface related changes Anastassios Nanos
  2020-05-18  6:59   ` Anastassios Nanos
@ 2020-05-18  9:13   ` kbuild test robot
  2020-05-18  9:28   ` kbuild test robot
  2020-05-18 10:16   ` kbuild test robot
  3 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2020-05-18  9:13 UTC (permalink / raw)
  To: Anastassios Nanos, kvm, kvmarm, linux-kernel
  Cc: kbuild-all, Catalin Marinas, Sean Christopherson, Marc Zyngier,
	Paolo Bonzini, Vitaly Kuznetsov, Will Deacon

[-- Attachment #1: Type: text/plain, Size: 2958 bytes --]

Hi Anastassios,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.7-rc6]
[cannot apply to kvm/linux-next kvmarm/next tip/x86/core next-20200515]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Anastassios-Nanos/Expose-KVM-API-to-Linux-Kernel/20200518-150402
base:    b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce
config: i386-randconfig-a016-20200518 (attached as .config)
compiler: gcc-5 (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function 'hva_to_pfn_slow':
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1837:4: error: 'npages_local' undeclared (first use in this function)
npages_local = 1;
^
arch/x86/kvm/../../../virt/kvm/kvm_main.c:1837:4: note: each undeclared identifier is reported only once for each function it appears in

vim +/npages_local +1837 arch/x86/kvm/../../../virt/kvm/kvm_main.c

  1800	
  1801	/*
  1802	 * The slow path to get the pfn of the specified host virtual address,
  1803	 * 1 indicates success, -errno is returned if error is detected.
  1804	 */
  1805	static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
  1806				   bool *writable, kvm_pfn_t *pfn)
  1807	{
  1808		unsigned int flags = FOLL_HWPOISON;
  1809		struct page *page;
  1810		int npages = 0;
  1811	
  1812		might_sleep();
  1813	
  1814		if (writable)
  1815			*writable = write_fault;
  1816	
  1817		if (write_fault)
  1818			flags |= FOLL_WRITE;
  1819		if (async)
  1820			flags |= FOLL_NOWAIT;
  1821	
  1822		if (kvmm_valid_addr(addr)) {
  1823			npages = 1;
  1824			page = vmalloc_to_page((void *)addr);
  1825		} else {
  1826			npages = get_user_pages_unlocked(addr, 1, &page, flags);
  1827		}
  1828		if (npages != 1)
  1829			return npages;
  1830	
  1831		/* map read fault as writable if possible */
  1832		if (unlikely(!write_fault) && writable) {
  1833			struct page *wpage;
  1834			int lnpages = 0;
  1835	
  1836			if (kvmm_valid_addr(addr)) {
> 1837				npages_local = 1;
  1838				wpage = vmalloc_to_page((void *)addr);
  1839			} else {
  1840				lnpages = __get_user_pages_fast(addr, 1, 1, &wpage);
  1841			}
  1842	
  1843			if (lnpages == 1) {
  1844				*writable = true;
  1845				put_page(page);
  1846				page = wpage;
  1847			}
  1848		}
  1849		*pfn = page_to_pfn(page);
  1850		return npages;
  1851	}
  1852	
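
For context: 'lnpages' is declared at line 1834 of the listing above and
tested at line 1843, so the undeclared 'npages_local' looks like a leftover
from a rename; the intended branch is presumably:

	if (kvmm_valid_addr(addr)) {
		lnpages = 1;	/* rather than the undeclared 'npages_local' */
		wpage = vmalloc_to_page((void *)addr);
	} else {
		lnpages = __get_user_pages_fast(addr, 1, 1, &wpage);
	}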

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31979 bytes --]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] Expose KVM API to Linux Kernel
       [not found]   ` <CALRTab-mEYtRG4zQbSGoAri+jg8xNL-imODv=MWE330Hkt_t+Q@mail.gmail.com>
@ 2020-05-18  9:18     ` Vitaly Kuznetsov
  2020-05-18  9:38     ` Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-18  9:18 UTC (permalink / raw)
  To: Anastassios Nanos
  Cc: Wanpeng Li, kvm, Marc Zyngier, Joerg Roedel, x86, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Ingo Molnar, Catalin Marinas,
	Borislav Petkov, Paolo Bonzini, Thomas Gleixner, Will Deacon,
	kvmarm, Jim Mattson

Anastassios Nanos <ananos@nubificus.co.uk> writes:

> Moreover, it doesn't involve *any* mode switch at all while printing
> out the result of the addition of these two registers -- which I
> guess for a simple use-case like this it isn't much.
> But if we were to scale this to a large number of exits (and their
> respective handling in user-space) that would incur significant
> overhead.

Eliminating frequent exits to userspace when the guest is already
running is absolutely fine, but eliminating userspace completely, even
for the creation of the guest, is dubious. To create a simple guest you
need just a dozen IOCTLs, and you'll have to find a really, really good
showcase where it makes a difference.

E.g. I can imagine the following use-case: you need to create a lot of
guests with the same (or almost the same) memory contents and allocating
and populating this memory in userspace takes time. But even in this
use-case, why do you need to terminate your userspace? Or would it be
possible to create guests from shared memory? (we may not have
copy-on-write capabilities in KVM currently but this doesn't mean they
can't be added).

Alternatively, you may want to mangle vmexit handling somehow and
exiting to userspace seems slow. Fine, let's add eBPF attach points to
KVM and an API to attach eBPF code there.

I'm, however, just guessing. I understand you may not want to reveal
your original idea for some reason but without us understanding what's
really needed I don't see how the change can be reviewed.

-- 
Vitaly
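
For context, the "dozen IOCTLs" correspond to the minimal userspace sequence
from the kvmtest.c example cited in the cover letter. Roughly, with error
handling omitted:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/kvm.h>

  int main(void)
  {
          int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
          int vmfd = ioctl(kvm, KVM_CREATE_VM, 0);

          /* Back 4 KiB of guest-physical memory with an anonymous mapping. */
          void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
          struct kvm_userspace_memory_region region = {
                  .slot = 0,
                  .guest_phys_addr = 0x1000,
                  .memory_size = 0x1000,
                  .userspace_addr = (unsigned long)mem,
          };
          ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);

          int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
          /* ... copy guest code into mem, set up registers with
           * KVM_SET_SREGS/KVM_SET_REGS, mmap the kvm_run structure,
           * then KVM_RUN in a loop ... */
          (void)vcpufd;
          return 0;
  }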


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] KVMM: Memory and interface related changes
  2020-05-18  7:01 ` [PATCH 2/2] KVMM: Memory and interface related changes Anastassios Nanos
  2020-05-18  6:59   ` Anastassios Nanos
  2020-05-18  9:13   ` kbuild test robot
@ 2020-05-18  9:28   ` kbuild test robot
  2020-05-18 10:16   ` kbuild test robot
  3 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2020-05-18  9:28 UTC (permalink / raw)
  To: Anastassios Nanos, kvm, kvmarm, linux-kernel
  Cc: kbuild-all, Marc Zyngier, Sean Christopherson, clang-built-linux,
	Catalin Marinas, Paolo Bonzini, Vitaly Kuznetsov, Will Deacon

[-- Attachment #1: Type: text/plain, Size: 3122 bytes --]

Hi Anastassios,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.7-rc6]
[cannot apply to kvm/linux-next kvmarm/next tip/x86/core next-20200515]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Anastassios-Nanos/Expose-KVM-API-to-Linux-Kernel/20200518-150402
base:    b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce
config: x86_64-randconfig-r006-20200518 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 135b877874fae96b4372c8a3fbfaa8ff44ff86e3)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1837:4: error: use of undeclared identifier 'npages_local'
npages_local = 1;
^
1 error generated.

vim +/npages_local +1837 arch/x86/kvm/../../../virt/kvm/kvm_main.c

  1800	
  1801	/*
  1802	 * The slow path to get the pfn of the specified host virtual address,
  1803	 * 1 indicates success, -errno is returned if error is detected.
  1804	 */
  1805	static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
  1806				   bool *writable, kvm_pfn_t *pfn)
  1807	{
  1808		unsigned int flags = FOLL_HWPOISON;
  1809		struct page *page;
  1810		int npages = 0;
  1811	
  1812		might_sleep();
  1813	
  1814		if (writable)
  1815			*writable = write_fault;
  1816	
  1817		if (write_fault)
  1818			flags |= FOLL_WRITE;
  1819		if (async)
  1820			flags |= FOLL_NOWAIT;
  1821	
  1822		if (kvmm_valid_addr(addr)) {
  1823			npages = 1;
  1824			page = vmalloc_to_page((void *)addr);
  1825		} else {
  1826			npages = get_user_pages_unlocked(addr, 1, &page, flags);
  1827		}
  1828		if (npages != 1)
  1829			return npages;
  1830	
  1831		/* map read fault as writable if possible */
  1832		if (unlikely(!write_fault) && writable) {
  1833			struct page *wpage;
  1834			int lnpages = 0;
  1835	
  1836			if (kvmm_valid_addr(addr)) {
> 1837				npages_local = 1;
  1838				wpage = vmalloc_to_page((void *)addr);
  1839			} else {
  1840				lnpages = __get_user_pages_fast(addr, 1, 1, &wpage);
  1841			}
  1842	
  1843			if (lnpages == 1) {
  1844				*writable = true;
  1845				put_page(page);
  1846				page = wpage;
  1847			}
  1848		}
  1849		*pfn = page_to_pfn(page);
  1850		return npages;
  1851	}
  1852	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34336 bytes --]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] Expose KVM API to Linux Kernel
       [not found]   ` <CALRTab-mEYtRG4zQbSGoAri+jg8xNL-imODv=MWE330Hkt_t+Q@mail.gmail.com>
  2020-05-18  9:18     ` Vitaly Kuznetsov
@ 2020-05-18  9:38     ` Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2020-05-18  9:38 UTC (permalink / raw)
  To: Anastassios Nanos
  Cc: Wanpeng Li, kvm, Marc Zyngier, Joerg Roedel, x86, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Ingo Molnar, Catalin Marinas,
	Borislav Petkov, Paolo Bonzini, Vitaly Kuznetsov, Will Deacon,
	kvmarm, Jim Mattson

Anastassios Nanos <ananos@nubificus.co.uk> writes:
> On Mon, May 18, 2020 at 11:43 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> And this shows clearly how simple the user space is which is required to
>> do that. So why on earth would we want to have all of that in the
>> kernel?
>>
> well, the main idea is that all this functionality is already in the
> kernel. My view is that kvmmtest is as simple as kvmtest.

That still does not explain the purpose, the advantage, or any reason
why this should be merged.

> Moreover, it doesn't involve *any* mode switch at all while printing
> out the result of the addition of these two registers -- which I guess
> for a simple use-case like this it isn't much.  But if we were to
> scale this to a large number of exits (and their respective handling
> in user-space) that would incur significant overhead. Don't you agree?

No. I still do not see the real world use case you are trying to
solve. We are not going to accept changes like this which have no proper
justification, real world use cases and proper numbers backing it up.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] KVMM: Memory and interface related changes
  2020-05-18  7:01 ` [PATCH 2/2] KVMM: Memory and interface related changes Anastassios Nanos
                     ` (2 preceding siblings ...)
  2020-05-18  9:28   ` kbuild test robot
@ 2020-05-18 10:16   ` kbuild test robot
  3 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2020-05-18 10:16 UTC (permalink / raw)
  To: Anastassios Nanos, kvm, kvmarm, linux-kernel
  Cc: kbuild-all, Catalin Marinas, Sean Christopherson, Marc Zyngier,
	Paolo Bonzini, Vitaly Kuznetsov, Will Deacon

[-- Attachment #1: Type: text/plain, Size: 3175 bytes --]

Hi Anastassios,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.7-rc6]
[cannot apply to kvm/linux-next kvmarm/next tip/x86/core next-20200515]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Anastassios-Nanos/Expose-KVM-API-to-Linux-Kernel/20200518-150402
base:    b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

arch/powerpc/kvm/../../../virt/kvm/kvm_main.c: In function 'hva_to_pfn_slow':
>> arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:1837:4: error: 'npages_local' undeclared (first use in this function)
1837 |    npages_local = 1;
|    ^~~~~~~~~~~~
arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:1837:4: note: each undeclared identifier is reported only once for each function it appears in

vim +/npages_local +1837 arch/powerpc/kvm/../../../virt/kvm/kvm_main.c

  1800	
  1801	/*
  1802	 * The slow path to get the pfn of the specified host virtual address,
  1803	 * 1 indicates success, -errno is returned if error is detected.
  1804	 */
  1805	static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
  1806				   bool *writable, kvm_pfn_t *pfn)
  1807	{
  1808		unsigned int flags = FOLL_HWPOISON;
  1809		struct page *page;
  1810		int npages = 0;
  1811	
  1812		might_sleep();
  1813	
  1814		if (writable)
  1815			*writable = write_fault;
  1816	
  1817		if (write_fault)
  1818			flags |= FOLL_WRITE;
  1819		if (async)
  1820			flags |= FOLL_NOWAIT;
  1821	
  1822		if (kvmm_valid_addr(addr)) {
  1823			npages = 1;
  1824			page = vmalloc_to_page((void *)addr);
  1825		} else {
  1826			npages = get_user_pages_unlocked(addr, 1, &page, flags);
  1827		}
  1828		if (npages != 1)
  1829			return npages;
  1830	
  1831		/* map read fault as writable if possible */
  1832		if (unlikely(!write_fault) && writable) {
  1833			struct page *wpage;
  1834			int lnpages = 0;
  1835	
  1836			if (kvmm_valid_addr(addr)) {
> 1837				npages_local = 1;
  1838				wpage = vmalloc_to_page((void *)addr);
  1839			} else {
  1840				lnpages = __get_user_pages_fast(addr, 1, 1, &wpage);
  1841			}
  1842	
  1843			if (lnpages == 1) {
  1844				*writable = true;
  1845				put_page(page);
  1846				page = wpage;
  1847			}
  1848		}
  1849		*pfn = page_to_pfn(page);
  1850		return npages;
  1851	}
  1852	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26114 bytes --]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] Expose KVM API to Linux Kernel
       [not found]   ` <CALRTab90UyMq2hMxCdCmC3GwPWFn2tK_uKMYQP2YBRcHwzkEUQ@mail.gmail.com>
@ 2020-05-18 11:18     ` Paolo Bonzini
  2020-05-18 11:34       ` Maxim Levitsky
  2020-05-18 20:45     ` Andy Lutomirski
  1 sibling, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2020-05-18 11:18 UTC (permalink / raw)
  To: Anastassios Nanos, Marc Zyngier
  Cc: Thomas Gleixner, Wanpeng Li, kvm, Catalin Marinas, Joerg Roedel,
	x86, linux-kernel, Sean Christopherson, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Vitaly Kuznetsov, Will Deacon,
	kvmarm, Jim Mattson

On 18/05/20 10:45, Anastassios Nanos wrote:
> Being in the kernel saves us from doing unnecessary mode switches.
> Of course there are optimizations for handling I/O on QEMU/KVM VMs
> (virtio/vhost), but essentially what happens is removing mode-switches (and
> exits) for I/O operations -- is there a good reason not to address that
> directly? a guest running in the kernel exits because of an I/O request,
> which gets processed and forwarded directly to the relevant subsystem *in*
> the kernel (net/block etc.).

In high-performance configurations, most of the time virtio devices are
processed in another thread that polls on the virtio rings.  In this
setup, the rings are configured to not cause a vmexit at all; this has
much smaller latency than even a lightweight (kernel-only) vmexit,
basically corresponding to writing an L1 cache line back to L2.

Paolo
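
For context, the exit-free setup Paolo describes relies on the ring's
notification-suppression flags. A rough sketch, assuming the legacy
split-ring layout from <linux/virtio_ring.h> (byte-order handling elided):

  #include <linux/virtio_ring.h>

  /* A polling backend tells the guest driver not to kick the device: with
   * no kick there is no write to the notification ioeventfd and hence no
   * vmexit; the backend thread simply watches avail->idx instead. */
  static void backend_start_polling(struct vring *vr)
  {
          vr->used->flags |= VRING_USED_F_NO_NOTIFY;
  }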


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] Expose KVM API to Linux Kernel
  2020-05-18 11:18     ` Paolo Bonzini
@ 2020-05-18 11:34       ` Maxim Levitsky
  2020-05-18 11:51         ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Maxim Levitsky @ 2020-05-18 11:34 UTC (permalink / raw)
  To: Paolo Bonzini, Anastassios Nanos, Marc Zyngier
  Cc: Thomas Gleixner, Wanpeng Li, kvm, Catalin Marinas, Joerg Roedel,
	x86, linux-kernel, Sean Christopherson, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Vitaly Kuznetsov, Will Deacon,
	kvmarm, Jim Mattson

On Mon, 2020-05-18 at 13:18 +0200, Paolo Bonzini wrote:
> On 18/05/20 10:45, Anastassios Nanos wrote:
> > Being in the kernel saves us from doing unnecessary mode switches.
> > Of course there are optimizations for handling I/O on QEMU/KVM VMs
> > (virtio/vhost), but essentially what happens is removing mode-switches (and
> > exits) for I/O operations -- is there a good reason not to address that
> > directly? a guest running in the kernel exits because of an I/O request,
> > which gets processed and forwarded directly to the relevant subsystem *in*
> > the kernel (net/block etc.).
> 
> In high-performance configurations, most of the time virtio devices are
> processed in another thread that polls on the virtio rings.  In this
> setup, the rings are configured to not cause a vmexit at all; this has
> much smaller latency than even a lightweight (kernel-only) vmexit,
> basically corresponding to writing an L1 cache line back to L2.
> 
> Paolo
> 
This can be used to run kernel drivers inside a very thin VM, IMHO, to break the stigma
that a kernel driver is always a bad thing and should by all means be replaced by a userspace driver,
something I see a lot lately, and which was the grounds for rejection of my nvme-mdev proposal.


Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] Expose KVM API to Linux Kernel
  2020-05-18 11:34       ` Maxim Levitsky
@ 2020-05-18 11:51         ` Paolo Bonzini
  2020-05-18 12:12           ` Maxim Levitsky
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2020-05-18 11:51 UTC (permalink / raw)
  To: Maxim Levitsky, Anastassios Nanos, Marc Zyngier
  Cc: Thomas Gleixner, Wanpeng Li, kvm, Catalin Marinas, Joerg Roedel,
	x86, linux-kernel, Sean Christopherson, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Vitaly Kuznetsov, Will Deacon,
	kvmarm, Jim Mattson

On 18/05/20 13:34, Maxim Levitsky wrote:
>> In high-performance configurations, most of the time virtio devices are
>> processed in another thread that polls on the virtio rings.  In this
>> setup, the rings are configured to not cause a vmexit at all; this has
>> much smaller latency than even a lightweight (kernel-only) vmexit,
>> basically corresponding to writing an L1 cache line back to L2.
>
> This can be used to run kernel drivers inside a very thin VM, IMHO, to break the stigma
> that a kernel driver is always a bad thing and should by all means be replaced by a userspace driver,
> something I see a lot lately, and which was the grounds for rejection of my nvme-mdev proposal.

It's a tough design decision between speeding up a kernel driver with
something like eBPF and moving everything to userspace.

Networking has moved more towards the first because there are many more
opportunities for NIC-based acceleration, while storage has moved
towards the latter with things such as io_uring.  That said, I don't see
why in-kernel NVMeoF drivers would be acceptable for anything but Fibre
Channel (and that's only because FC HBAs try hard to hide most of the
SAN layers).

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] Expose KVM API to Linux Kernel
  2020-05-18 11:51         ` Paolo Bonzini
@ 2020-05-18 12:12           ` Maxim Levitsky
  0 siblings, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2020-05-18 12:12 UTC (permalink / raw)
  To: Paolo Bonzini, Anastassios Nanos, Marc Zyngier
  Cc: Thomas Gleixner, Wanpeng Li, kvm, Catalin Marinas, Joerg Roedel,
	x86, linux-kernel, Sean Christopherson, Ingo Molnar,
	H. Peter Anvin, Borislav Petkov, Vitaly Kuznetsov, Will Deacon,
	kvmarm, Jim Mattson

On Mon, 2020-05-18 at 13:51 +0200, Paolo Bonzini wrote:
> On 18/05/20 13:34, Maxim Levitsky wrote:
> > > In high-performance configurations, most of the time virtio devices are
> > > processed in another thread that polls on the virtio rings.  In this
> > > setup, the rings are configured to not cause a vmexit at all; this has
> > > much smaller latency than even a lightweight (kernel-only) vmexit,
> > > basically corresponding to writing an L1 cache line back to L2.
> > 
> > This can be used to run kernel drivers inside a very thin VM, IMHO, to break the stigma
> > that a kernel driver is always a bad thing and should by all means be replaced by a userspace driver,
> > something I see a lot lately, and which was the grounds for rejection of my nvme-mdev proposal.
> 
> It's a tough design decision between speeding up a kernel driver with
> something like eBPF and moving everything to userspace.
> 
> Networking has moved more towards the first because there are many more
> opportunities for NIC-based acceleration, while storage has moved
> towards the latter with things such as io_uring.  That said, I don't see
> why in-kernel NVMeoF drivers would be acceptable for anything but Fibre
> Channel (and that's only because FC HBAs try hard to hide most of the
> SAN layers).
> 
> Paolo
> 

Note that these days storage is as fast as or even faster than many types of networking,
and there are also opportunities for acceleration (like p2p PCI DMA) that are more
natural to do in the kernel.

io_uring is actually not about moving everything to userspace IMHO, but rather the opposite:
it allows userspace to access the kernel block subsystem in a very efficient way, which
is the right thing to do.

Sadly it doesn't help much with fast NVMe virtualization because the bottleneck moves
to the communication with the guest.

I guess this is getting off-topic, so I won't continue this discussion here;
I just wanted to voice my opinion on this matter.

Another thing that comes to mind (not that it has to be done in the kernel)
is that AMD's AVIC allows peer-to-peer interrupts between guests, which can
in theory allow running a 'driver' in a special guest and letting it communicate
with a normal guest using interrupts bi-directionally, finally removing the
need to waste a core in a busy-wait loop.

The only catch is that the 'special guest' has to run 100% of the time,
thus it still can't share a core with other kernel/userspace tasks, but at least it can be in a
sleeping state most of the time, and it can itself run various tasks that serve various needs.

In other words, I don't have any objection to allowing part of the host kernel to run in VMX/SVM
guest mode. This can be a very interesting thing.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] Expose KVM API to Linux Kernel
       [not found]   ` <CALRTab90UyMq2hMxCdCmC3GwPWFn2tK_uKMYQP2YBRcHwzkEUQ@mail.gmail.com>
  2020-05-18 11:18     ` Paolo Bonzini
@ 2020-05-18 20:45     ` Andy Lutomirski
  1 sibling, 0 replies; 17+ messages in thread
From: Andy Lutomirski @ 2020-05-18 20:45 UTC (permalink / raw)
  To: Anastassios Nanos
  Cc: Thomas Gleixner, Wanpeng Li, kvm list, Marc Zyngier,
	Joerg Roedel, X86 ML, H. Peter Anvin, LKML, Sean Christopherson,
	Ingo Molnar, Catalin Marinas, Borislav Petkov, Paolo Bonzini,
	Vitaly Kuznetsov, Will Deacon, kvmarm, Jim Mattson

On Mon, May 18, 2020 at 1:50 AM Anastassios Nanos
<ananos@nubificus.co.uk> wrote:
>
> On Mon, May 18, 2020 at 10:50 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On 2020-05-18 07:58, Anastassios Nanos wrote:
> > > To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
> > > QEMU, or some other kind of VM monitor in user-space to host the vCPU
> > > threads, I/O threads and various other book-keeping/management
> > > mechanisms.
> > > This is perfectly fine for a large number of reasons and use cases: for
> > > instance, running generic VMs, running general purpose Operating
> > > systems
> > > that need some kind of emulation for legacy boot/hardware etc.
> > >
> > > What if we wanted to execute a small piece of code as a guest instance,
> > > without the involvement of user-space? The KVM functions are already
> > > doing
> > > what they should: VM and vCPU setup is already part of the kernel, the
> > > only
> > > missing piece is memory handling.
> > >
> > > With these series, (a) we expose to the Linux Kernel the bare minimum
> > > KVM
> > > API functions in order to spawn a guest instance without the
> > > intervention
> > > of user-space; and (b) we tweak the memory handling code of KVM-related
> > > functions to account for another kind of guest, spawned in
> > > kernel-space.
> > >
> > > PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces
> > > the
> > > changes in the KVM memory handling code for x86_64 and aarch64.
> > >
> > > An example of use is provided based on kvmtest.c
> > > [https://lwn.net/Articles/658512/] at
> > > https://github.com/cloudkernels/kvmmtest
>
> Hi Marc,
>
> thanks for taking the time to check this!
>
> >
> > You don't explain *why* we would want this. What is the overhead of
> > having
> > a userspace if your guest doesn't need any userspace handling? The
> > kvmtest
> > example indeed shows that the KVM userspace API is usable  without any
> > form
> > of emulation, hence has almost no cost.
>
> The rationale behind such an approach is two-fold:
> (a) we are able to ditch any user-space involvement in the creation and
> spawning of a KVM guest. This is particularly interesting in use-cases
> where short-lived tasks are spawned on demand.  Think of a scenario where
> an ABI-compatible binary is loaded in memory. Spawning it as a guest from
> userspace would incur a number of IOCTLs. Doing the same from the kernel
> would be the same number of IOCTLs but now these are function calls;
> additionally, memory handling is kind of simplified.
>
> (b) I agree that the userspace KVM API is usable without emulation for a
> simple task, written in bytecode, adding two registers. But what about
> something more complicated? Something that needs I/O? For most use-cases,
> I/O happens between the guest and some hardware device (network/storage
> etc.). Being in the kernel saves us from doing unnecessary mode switches.
> Of course there are optimizations for handling I/O on QEMU/KVM VMs
> (virtio/vhost), but essentially what happens is removing mode-switches (and
> exits) for I/O operations -- is there a good reason not to address that
> directly? a guest running in the kernel exits because of an I/O request,
> which gets processed and forwarded directly to the relevant subsystem *in*
> the kernel (net/block etc.).
>
> We work on both directions with a particular focus on (a) -- device I/O could
> be handled with other mechanisms as well (VFs for instance).
>
> > Without a clear description of the advantages of your solution, as well
> > as a full featured in-tree use case, I find it pretty hard to support
> > this.
>
> Totally understand that -- please keep in mind that this is a first (baby)
> step for what we call KVMM (kernel virtual machine monitor). We presented
> the architecture at FOSDEM and some preliminary results regarding I/O. Of
> course, this is WiP, and far from being upstreamable. Hence the kvmmtest
> example showcasing the potential use-case.
>
> To be honest my main question is whether we are interested in such an
> approach in the first place, and then try to work on any rough edges. As
> far as I understand, you're not in favor of this approach.

The usual answer here is that the kernel is not in favor of adding
in-kernel functionality that is not used in the upstream kernel.  If
you come up with a real use case, and that use case is GPL and has
plans for upstreaming, and that use case has a real benefit
(dramatically faster than user code could likely be, does something
new and useful, etc), then it may well be mergeable.
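
For point (a) quoted above, the in-kernel analogue of the userspace ioctl
sequence would call the symbols exported in PATCH 1/2 directly. A sketch,
under the assumption that the exported signatures mirror their ioctl
counterparts (kvmm_spawn() is a made-up caller, not code from the series):

  #include <linux/kvm_host.h>
  #include <linux/vmalloc.h>

  static int kvmm_spawn(void)
  {
          struct kvm *kvm = kvm_create_vm(0);             /* KVM_CREATE_VM */
          struct kvm_userspace_memory_region region = {
                  .slot = 0,
                  .guest_phys_addr = 0x1000,
                  .memory_size = 0x1000,
                  /* guest memory is vmalloc'ed in kernel space */
                  .userspace_addr = (unsigned long)vmalloc(0x1000),
          };

          kvm_vm_ioctl_set_memory_region(kvm, &region);   /* KVM_SET_USER_MEMORY_REGION */
          kvm_vm_ioctl_create_vcpu(kvm, 0);               /* KVM_CREATE_VCPU */
          /* ... set regs/sregs via the exported helpers, then
           * kvm_arch_vcpu_ioctl_run() as the KVM_RUN equivalent ... */
          return 0;
  }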

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-05-19  8:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-18  6:58 [PATCH 0/2] Expose KVM API to Linux Kernel Anastassios Nanos
2020-05-18  6:58 ` [PATCH 1/2] KVMM: export needed symbols Anastassios Nanos
2020-05-18  7:41   ` Marc Zyngier
2020-05-18  7:01 ` [PATCH 2/2] KVMM: Memory and interface related changes Anastassios Nanos
2020-05-18  6:59   ` Anastassios Nanos
2020-05-18  9:13   ` kbuild test robot
2020-05-18  9:28   ` kbuild test robot
2020-05-18 10:16   ` kbuild test robot
2020-05-18  7:50 ` [PATCH 0/2] Expose KVM API to Linux Kernel Marc Zyngier
     [not found]   ` <CALRTab90UyMq2hMxCdCmC3GwPWFn2tK_uKMYQP2YBRcHwzkEUQ@mail.gmail.com>
2020-05-18 11:18     ` Paolo Bonzini
2020-05-18 11:34       ` Maxim Levitsky
2020-05-18 11:51         ` Paolo Bonzini
2020-05-18 12:12           ` Maxim Levitsky
2020-05-18 20:45     ` Andy Lutomirski
2020-05-18  8:42 ` Thomas Gleixner
     [not found]   ` <CALRTab-mEYtRG4zQbSGoAri+jg8xNL-imODv=MWE330Hkt_t+Q@mail.gmail.com>
2020-05-18  9:18     ` Vitaly Kuznetsov
2020-05-18  9:38     ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).