* [RFC PATCH 00/15] acrn: add the ACRN driver module
@ 2019-08-16  2:25 Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 01/15] x86/acrn: Report X2APIC for ACRN guest Zhao Yakui
                   ` (15 more replies)
  0 siblings, 16 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui

ACRN is a flexible, lightweight reference hypervisor, built with real-time
and safety-criticality in mind, optimized to streamline embedded development
through an open source platform. It is built for embedded IoT with a small
footprint and real-time features. More details can be found
in https://projectacrn.org/

This patch set adds the ACRN driver module for an ACRN guest, which
acts as the router that communicates with the ACRN hypervisor.
User-space applications can use the provided ACRN ioctls to
interact with the ACRN hypervisor through different hypercalls. After the
ACRN module is loaded, the device file /dev/acrn_hsm can be accessed from
user space. The module covers the management of virtualized CPU/memory/
device/interrupt/MMIO emulation for other ACRN guests.

The first three patches are changes under x86/acrn, which add the
APIs required by the driver and report the X2APIC capability.
The remaining patches add the ACRN driver module, which accepts ioctls
from user space and then communicates with the low-level ACRN hypervisor
through hypercalls.


Zhao Yakui (15):
  x86/acrn: Report X2APIC for ACRN guest
  x86/acrn: Add two APIs to add/remove driver-specific upcall ISR handler
  x86/acrn: Add hypercall for ACRN guest
  drivers/acrn: add the basic framework of acrn char device driver
  drivers/acrn: add driver-specific hypercall for ACRN_HSM
  drivers/acrn: add the support of querying ACRN api version
  drivers/acrn: add acrn vm/vcpu management for ACRN_HSM char device
  drivers/acrn: add VM memory management for ACRN char device
  drivers/acrn: add passthrough device support
  drivers/acrn: add interrupt injection support
  drivers/acrn: add the support of handling emulated ioreq
  drivers/acrn: add driver-specific IRQ handle to dispatch IO_REQ request
  drivers/acrn: add service to obtain Power data transition
  drivers/acrn: add the support of irqfd and eventfd
  drivers/acrn: add the support of offline SOS cpu

 arch/x86/include/asm/acrn.h               |  57 ++
 arch/x86/kernel/cpu/acrn.c                |  20 +-
 drivers/staging/Kconfig                   |   2 +
 drivers/staging/Makefile                  |   1 +
 drivers/staging/acrn/Kconfig              |  18 +
 drivers/staging/acrn/Makefile             |   9 +
 drivers/staging/acrn/acrn_dev.c           | 727 +++++++++++++++++++++++
 drivers/staging/acrn/acrn_drv_internal.h  | 186 ++++++
 drivers/staging/acrn/acrn_hv_defs.h       |  65 +++
 drivers/staging/acrn/acrn_hypercall.c     | 136 +++++
 drivers/staging/acrn/acrn_hypercall.h     | 132 +++++
 drivers/staging/acrn/acrn_ioeventfd.c     | 407 +++++++++++++
 drivers/staging/acrn/acrn_ioreq.c         | 937 ++++++++++++++++++++++++++++++
 drivers/staging/acrn/acrn_irqfd.c         | 339 +++++++++++
 drivers/staging/acrn/acrn_mm.c            | 227 ++++++++
 drivers/staging/acrn/acrn_mm_hugetlb.c    | 281 +++++++++
 drivers/staging/acrn/acrn_vm_mngt.c       | 116 ++++
 include/linux/acrn/acrn_drv.h             | 200 +++++++
 include/uapi/linux/acrn/acrn_common_def.h | 201 +++++++
 include/uapi/linux/acrn/acrn_ioctl_defs.h | 345 +++++++++++
 20 files changed, 4400 insertions(+), 6 deletions(-)
 create mode 100644 drivers/staging/acrn/Kconfig
 create mode 100644 drivers/staging/acrn/Makefile
 create mode 100644 drivers/staging/acrn/acrn_dev.c
 create mode 100644 drivers/staging/acrn/acrn_drv_internal.h
 create mode 100644 drivers/staging/acrn/acrn_hv_defs.h
 create mode 100644 drivers/staging/acrn/acrn_hypercall.c
 create mode 100644 drivers/staging/acrn/acrn_hypercall.h
 create mode 100644 drivers/staging/acrn/acrn_ioeventfd.c
 create mode 100644 drivers/staging/acrn/acrn_ioreq.c
 create mode 100644 drivers/staging/acrn/acrn_irqfd.c
 create mode 100644 drivers/staging/acrn/acrn_mm.c
 create mode 100644 drivers/staging/acrn/acrn_mm_hugetlb.c
 create mode 100644 drivers/staging/acrn/acrn_vm_mngt.c
 create mode 100644 include/linux/acrn/acrn_drv.h
 create mode 100644 include/uapi/linux/acrn/acrn_common_def.h
 create mode 100644 include/uapi/linux/acrn/acrn_ioctl_defs.h

-- 
2.7.4



* [RFC PATCH 01/15] x86/acrn: Report X2APIC for ACRN guest
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 02/15] x86/acrn: Add two APIs to add/remove driver-specific upcall ISR handler Zhao Yakui
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ

After the LAPIC is switched from xAPIC to x2APIC mode, an ACRN guest can
access the local APIC registers through the APIC MSRs. This helps to
remove some traps on LAPIC accesses in the ACRN guest.
Report X2APIC so that the ACRN guest can be switched to x2APIC mode.

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 arch/x86/kernel/cpu/acrn.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index 676022e..95db5c4 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -12,6 +12,7 @@
 #include <linux/interrupt.h>
 #include <asm/acrn.h>
 #include <asm/apic.h>
+#include <asm/cpufeatures.h>
 #include <asm/desc.h>
 #include <asm/hypervisor.h>
 #include <asm/irq_regs.h>
@@ -29,12 +30,7 @@ static void __init acrn_init_platform(void)
 
 static bool acrn_x2apic_available(void)
 {
-	/*
-	 * x2apic is not supported for now. Future enablement will have to check
-	 * X86_FEATURE_X2APIC to determine whether x2apic is supported in the
-	 * guest.
-	 */
-	return false;
+	return boot_cpu_has(X86_FEATURE_X2APIC);
 }
 
 static void (*acrn_intr_handler)(void);
-- 
2.7.4



* [RFC PATCH 02/15] x86/acrn: Add two APIs to add/remove driver-specific upcall ISR handler
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 01/15] x86/acrn: Report X2APIC for ACRN guest Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 03/15] x86/acrn: Add hypercall for ACRN guest Zhao Yakui
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ

When the ACRN hypervisor sends the upcall notification interrupt, the
upcall ISR handler runs. Currently that handler does almost nothing
except ack the EOI.
Add two APIs so that a driver can register and remove a driver-specific
ISR handler, which handles the real notification from the ACRN hypervisor.
This is similar to what Xen and Hyper-V do.

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 arch/x86/include/asm/acrn.h |  3 +++
 arch/x86/kernel/cpu/acrn.c  | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/acrn.h b/arch/x86/include/asm/acrn.h
index 4adb13f..857e6244 100644
--- a/arch/x86/include/asm/acrn.h
+++ b/arch/x86/include/asm/acrn.h
@@ -8,4 +8,7 @@ extern void acrn_hv_callback_vector(void);
 #endif
 
 extern void acrn_hv_vector_handler(struct pt_regs *regs);
+
+extern void acrn_setup_intr_irq(void (*handler)(void));
+extern void acrn_remove_intr_irq(void);
 #endif /* _ASM_X86_ACRN_H */
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index 95db5c4..a1ce52a 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -56,6 +56,18 @@ __visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
 	set_irq_regs(old_regs);
 }
 
+void acrn_setup_intr_irq(void (*handler)(void))
+{
+	acrn_intr_handler = handler;
+}
+EXPORT_SYMBOL_GPL(acrn_setup_intr_irq);
+
+void acrn_remove_intr_irq(void)
+{
+	acrn_intr_handler = NULL;
+}
+EXPORT_SYMBOL_GPL(acrn_remove_intr_irq);
+
 const __initconst struct hypervisor_x86 x86_hyper_acrn = {
 	.name                   = "ACRN",
 	.detect                 = acrn_detect,
-- 
2.7.4
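
The register/remove pattern in this patch can be modelled in plain
user-space C. The sketch below is not kernel code and not the patch's
implementation: all model_* names and the demo handler are hypothetical,
and it assumes that the upcall ISR checks the handler pointer for NULL
before calling it, which is not shown in the hunk above.

```c
#include <stddef.h>

/* Single handler slot, mirroring acrn_intr_handler in the patch. */
static void (*model_intr_handler)(void);
static int model_upcall_count;

/* Model of acrn_setup_intr_irq(): install the driver-specific handler. */
void model_setup_intr_irq(void (*handler)(void))
{
	model_intr_handler = handler;
}

/* Model of acrn_remove_intr_irq(): clear the handler slot. */
void model_remove_intr_irq(void)
{
	model_intr_handler = NULL;
}

/*
 * Stand-in for the upcall ISR. Assumption: the handler is called only
 * when one is registered.
 */
void model_upcall_isr(void)
{
	void (*handler)(void) = model_intr_handler;

	if (handler)
		handler();
}

/* Hypothetical driver handler: counts the notifications it receives. */
void model_demo_handler(void)
{
	model_upcall_count++;
}

int model_demo_count(void)
{
	return model_upcall_count;
}
```

In this model a driver would call model_setup_intr_irq(model_demo_handler)
at load time and model_remove_intr_irq() at unload time; upcalls that
arrive while no handler is registered are simply ignored.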



* [RFC PATCH 03/15] x86/acrn: Add hypercall for ACRN guest
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 01/15] x86/acrn: Report X2APIC for ACRN guest Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 02/15] x86/acrn: Add two APIs to add/remove driver-specific upcall ISR handler Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver Zhao Yakui
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ

When the ACRN hypervisor is detected, hypercalls are needed so that the
ACRN guest can query and configure settings. For example, they can be
used to query the resources in the hypervisor and to manage the
CPU/memory/device/interrupt resources of a guest operating system.

On x86 it is implemented with the VMCALL instruction.

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 arch/x86/include/asm/acrn.h | 54 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/acrn.h b/arch/x86/include/asm/acrn.h
index 857e6244..ab97c3d 100644
--- a/arch/x86/include/asm/acrn.h
+++ b/arch/x86/include/asm/acrn.h
@@ -11,4 +11,58 @@ extern void acrn_hv_vector_handler(struct pt_regs *regs);
 
 extern void acrn_setup_intr_irq(void (*handler)(void));
 extern void acrn_remove_intr_irq(void);
+
+/*
+ * Hypercalls for ACRN guest
+ *
+ * Hypercall number is passed in R8 register.
+ * Up to 2 arguments are passed in RDI, RSI.
+ * Return value will be placed in RAX.
+ */
+static inline long acrn_hypercall0(unsigned long hcall_id)
+{
+	long result;
+
+	/* the hypercall is implemented with the VMCALL instruction.
+	 * volatile qualifier is added to avoid that it is dropped
+	 * because of compiler optimization.
+	 */
+	asm volatile("movq %[hcall_id], %%r8\n\t"
+		     "vmcall\n\t"
+		     : "=a" (result)
+		     : [hcall_id] "g" (hcall_id)
+		     : "r8");
+
+	return result;
+}
+
+static inline long acrn_hypercall1(unsigned long hcall_id,
+				   unsigned long param1)
+{
+	long result;
+
+	asm volatile("movq %[hcall_id], %%r8\n\t"
+		     "vmcall\n\t"
+		     : "=a" (result)
+		     : [hcall_id] "g" (hcall_id), "D" (param1)
+		     : "r8");
+
+	return result;
+}
+
+static inline long acrn_hypercall2(unsigned long hcall_id,
+				   unsigned long param1,
+				   unsigned long param2)
+{
+	long result;
+
+	asm volatile("movq %[hcall_id], %%r8\n\t"
+		     "vmcall\n\t"
+		     : "=a" (result)
+		     : [hcall_id] "g" (hcall_id), "D" (param1), "S" (param2)
+		     : "r8");
+
+	return result;
+}
+
 #endif /* _ASM_X86_ACRN_H */
-- 
2.7.4



* [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (2 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 03/15] x86/acrn: Add hypercall for ACRN guest Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  7:05   ` Greg KH
  2019-08-16 11:28   ` Dan Carpenter
  2019-08-16  2:25 ` [RFC PATCH 05/15] drivers/acrn: add driver-specific hypercall for ACRN_HSM Zhao Yakui
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel
  Cc: Zhao Yakui, Jason Chen CJ, Jack Ren, Mingqiang Chi, Liu Shuo

The ACRN hypervisor service module (HSM) is the middle layer that allows
the Linux kernel to communicate with the ACRN hypervisor. It includes
the management of virtualized CPU/memory/device/interrupt for other ACRN
guests. User-space applications can use the provided ACRN ioctls to
interact with the ACRN hypervisor through different hypercalls.

Add the basic framework first; the following patches add the
corresponding implementations, which include the management of
virtualized CPU/memory/interrupt and the emulation of MMIO/IO/PCI access.
The device file /dev/acrn_hsm can be accessed from user space to
communicate with the ACRN module.

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Co-developed-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Jack Ren <jack.ren@intel.com>
Co-developed-by: Mingqiang Chi <mingqiang.chi@intel.com>
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/Kconfig         |   2 +
 drivers/staging/Makefile        |   1 +
 drivers/staging/acrn/Kconfig    |  18 ++++++
 drivers/staging/acrn/Makefile   |   2 +
 drivers/staging/acrn/acrn_dev.c | 123 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 146 insertions(+)
 create mode 100644 drivers/staging/acrn/Kconfig
 create mode 100644 drivers/staging/acrn/Makefile
 create mode 100644 drivers/staging/acrn/acrn_dev.c

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 7c96a01..0766de5 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -120,4 +120,6 @@ source "drivers/staging/kpc2000/Kconfig"
 
 source "drivers/staging/isdn/Kconfig"
 
+source "drivers/staging/acrn/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index fcaac96..f927eb0 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -49,4 +49,5 @@ obj-$(CONFIG_XIL_AXIS_FIFO)	+= axis-fifo/
 obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_FIELDBUS_DEV)     += fieldbus/
 obj-$(CONFIG_KPC2000)		+= kpc2000/
+obj-$(CONFIG_ACRN_HSM)		+= acrn/
 obj-$(CONFIG_ISDN_CAPI)		+= isdn/
diff --git a/drivers/staging/acrn/Kconfig b/drivers/staging/acrn/Kconfig
new file mode 100644
index 0000000..a047d5f
--- /dev/null
+++ b/drivers/staging/acrn/Kconfig
@@ -0,0 +1,18 @@
+config ACRN_HSM
+	tristate "Intel ACRN Hypervisor service Module"
+	depends on ACRN_GUEST
+	depends on HUGETLBFS
+	depends on PCI_MSI
+	default n
+	help
+	  This is the Hypervisor service Module (ACRN.ko) for ACRN guest
+	  to communicate with ACRN hypervisor. It includes the management
+	  of virtualized CPU/memory/device/interrupt for other ACRN guest.
+
+	  It is required if it needs to manage other ACRN guests. User-guest
+	  OS does not need it.
+
+	  If unsure, say N.
+	  If you wish to work on this driver, to help improve it, or to
+	  report problems you have with them, please use the
+	  acrn-dev@lists.projectacrn.org mailing list.
diff --git a/drivers/staging/acrn/Makefile b/drivers/staging/acrn/Makefile
new file mode 100644
index 0000000..48fca38
--- /dev/null
+++ b/drivers/staging/acrn/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_ACRN_HSM)	:= acrn.o
+acrn-y := acrn_dev.o
diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
new file mode 100644
index 0000000..55a7612
--- /dev/null
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
+/*
+ * ACRN hypervisor service module (HSM): main framework
+ *
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ *
+ * Jason Chen CJ <jason.cj.chen@intel.com>
+ * Zhao Yakui <yakui.zhao@intel.com>
+ * Jack Ren <jack.ren@intel.com>
+ * Mingqiang Chi <mingqiang.chi@intel.com>
+ * Liu Shuo <shuo.a.liu@intel.com>
+ *
+ */
+
+#include <linux/bits.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/kdev_t.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <asm/acrn.h>
+#include <asm/hypervisor.h>
+
+#define  DEVICE_NAME "acrn_hsm"
+#define  CLASS_NAME  "acrn"
+
+static int	acrn_hsm_inited;
+static int	major;
+static struct class	*acrn_class;
+static struct device	*acrn_device;
+
+static
+int acrn_dev_open(struct inode *inodep, struct file *filep)
+{
+	pr_info("%s: opening device node\n", __func__);
+
+	return 0;
+}
+
+static
+long acrn_dev_ioctl(struct file *filep,
+		    unsigned int ioctl_num, unsigned long ioctl_param)
+{
+	long ret = 0;
+
+	return ret;
+}
+
+static int acrn_dev_release(struct inode *inodep, struct file *filep)
+{
+	return 0;
+}
+
+static const struct file_operations fops = {
+	.open = acrn_dev_open,
+	.release = acrn_dev_release,
+	.unlocked_ioctl = acrn_dev_ioctl,
+};
+
+#define EAX_PRIVILEGE_VM	BIT(0)
+
+static int __init acrn_init(void)
+{
+	acrn_hsm_inited = 0;
+	if (x86_hyper_type != X86_HYPER_ACRN)
+		return -ENODEV;
+
+	if (!(cpuid_eax(0x40000001) & EAX_PRIVILEGE_VM))
+		return -EPERM;
+
+	/* Try to dynamically allocate a major number for the device */
+	major = register_chrdev(0, DEVICE_NAME, &fops);
+	if (major < 0) {
+		pr_warn("acrn: failed to register a major number\n");
+		return major;
+	}
+	pr_info("acrn: registered correctly with major number %d\n", major);
+
+	/* Register the device class */
+	acrn_class = class_create(THIS_MODULE, CLASS_NAME);
+	if (IS_ERR(acrn_class)) {
+		unregister_chrdev(major, DEVICE_NAME);
+		pr_warn("acrn: failed to register device class\n");
+		return PTR_ERR(acrn_class);
+	}
+
+	/* Register the device driver */
+	acrn_device = device_create(acrn_class, NULL, MKDEV(major, 0),
+				    NULL, DEVICE_NAME);
+	if (IS_ERR(acrn_device)) {
+		class_destroy(acrn_class);
+		unregister_chrdev(major, DEVICE_NAME);
+		pr_warn("acrn: failed to create the device\n");
+		return PTR_ERR(acrn_device);
+	}
+
+	pr_info("acrn: ACRN Hypervisor service module initialized\n");
+	acrn_hsm_inited = 1;
+	return 0;
+}
+
+static void __exit acrn_exit(void)
+{
+	if (!acrn_hsm_inited)
+		return;
+
+	device_destroy(acrn_class, MKDEV(major, 0));
+	class_unregister(acrn_class);
+	class_destroy(acrn_class);
+	unregister_chrdev(major, DEVICE_NAME);
+	pr_info("acrn: exit\n");
+}
+
+module_init(acrn_init);
+module_exit(acrn_exit);
+
+MODULE_AUTHOR("Intel");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("This is a char device driver, acts as a route\n"
+		"responsible for transferring IO requests from other modules\n"
+		"either in user-space or in kernel to and from ACRN hypervisor\n");
+MODULE_VERSION("0.1");
-- 
2.7.4



* [RFC PATCH 05/15] drivers/acrn: add driver-specific hypercall for ACRN_HSM
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (3 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 06/15] drivers/acrn: add the support of querying ACRN api version Zhao Yakui
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel
  Cc: Zhao Yakui, Jason Chen CJ, Jack Ren, Yin FengWei, Liu Shuo

After user space calls the ioctls, the module calls the corresponding
hypercalls so that the ACRN hypervisor can take the requested action.
This covers vcpu creation, guest memory management, interrupt injection
and pass-through device management.
Add the driver-specific hypercalls available to the ACRN HSM module
so that it can communicate with the low-level ACRN hypervisor.

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Co-developed-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Jack Ren <jack.ren@intel.com>
Co-developed-by: Yin FengWei <fengwei.yin@intel.com>
Signed-off-by: Yin FengWei <fengwei.yin@intel.com>
Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/Makefile         |   3 +-
 drivers/staging/acrn/acrn_hv_defs.h   |  65 ++++++++++++++++
 drivers/staging/acrn/acrn_hypercall.c | 136 ++++++++++++++++++++++++++++++++++
 drivers/staging/acrn/acrn_hypercall.h | 132 +++++++++++++++++++++++++++++++++
 4 files changed, 335 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/acrn/acrn_hv_defs.h
 create mode 100644 drivers/staging/acrn/acrn_hypercall.c
 create mode 100644 drivers/staging/acrn/acrn_hypercall.h

diff --git a/drivers/staging/acrn/Makefile b/drivers/staging/acrn/Makefile
index 48fca38..a58b0d1 100644
--- a/drivers/staging/acrn/Makefile
+++ b/drivers/staging/acrn/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_ACRN_HSM)	:= acrn.o
-acrn-y := acrn_dev.o
+acrn-y := acrn_dev.o \
+	  acrn_hypercall.o
diff --git a/drivers/staging/acrn/acrn_hv_defs.h b/drivers/staging/acrn/acrn_hv_defs.h
new file mode 100644
index 0000000..55417d2
--- /dev/null
+++ b/drivers/staging/acrn/acrn_hv_defs.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
+/*
+ * hypercall ID definition
+ *
+ */
+
+#ifndef _ACRN_HV_DEFS_H
+#define _ACRN_HV_DEFS_H
+
+/*
+ * Common structures for HV/HSM
+ */
+
+#define _HC_ID(x, y) (((x) << 24) | (y))
+
+#define HC_ID 0x80UL
+
+/* general */
+#define HC_ID_GEN_BASE               0x0UL
+#define HC_GET_API_VERSION          _HC_ID(HC_ID, HC_ID_GEN_BASE + 0x00)
+#define HC_SOS_OFFLINE_CPU          _HC_ID(HC_ID, HC_ID_GEN_BASE + 0x01)
+#define HC_GET_PLATFORM_INFO        _HC_ID(HC_ID, HC_ID_GEN_BASE + 0x03)
+
+/* VM management */
+#define HC_ID_VM_BASE               0x10UL
+#define HC_CREATE_VM                _HC_ID(HC_ID, HC_ID_VM_BASE + 0x00)
+#define HC_DESTROY_VM               _HC_ID(HC_ID, HC_ID_VM_BASE + 0x01)
+#define HC_START_VM                 _HC_ID(HC_ID, HC_ID_VM_BASE + 0x02)
+#define HC_PAUSE_VM                 _HC_ID(HC_ID, HC_ID_VM_BASE + 0x03)
+#define HC_CREATE_VCPU              _HC_ID(HC_ID, HC_ID_VM_BASE + 0x04)
+#define HC_RESET_VM                 _HC_ID(HC_ID, HC_ID_VM_BASE + 0x05)
+#define HC_SET_VCPU_REGS            _HC_ID(HC_ID, HC_ID_VM_BASE + 0x06)
+
+/* IRQ and Interrupts */
+#define HC_ID_IRQ_BASE              0x20UL
+#define HC_INJECT_MSI               _HC_ID(HC_ID, HC_ID_IRQ_BASE + 0x03)
+#define HC_VM_INTR_MONITOR          _HC_ID(HC_ID, HC_ID_IRQ_BASE + 0x04)
+#define HC_SET_IRQLINE              _HC_ID(HC_ID, HC_ID_IRQ_BASE + 0x05)
+
+/* DM ioreq management */
+#define HC_ID_IOREQ_BASE            0x30UL
+#define HC_SET_IOREQ_BUFFER         _HC_ID(HC_ID, HC_ID_IOREQ_BASE + 0x00)
+#define HC_NOTIFY_REQUEST_FINISH    _HC_ID(HC_ID, HC_ID_IOREQ_BASE + 0x01)
+
+/* Guest memory management */
+#define HC_ID_MEM_BASE              0x40UL
+#define HC_VM_SET_MEMORY_REGIONS    _HC_ID(HC_ID, HC_ID_MEM_BASE + 0x02)
+#define HC_VM_WRITE_PROTECT_PAGE    _HC_ID(HC_ID, HC_ID_MEM_BASE + 0x03)
+
+/* PCI assignment*/
+#define HC_ID_PCI_BASE              0x50UL
+#define HC_ASSIGN_PTDEV             _HC_ID(HC_ID, HC_ID_PCI_BASE + 0x00)
+#define HC_DEASSIGN_PTDEV           _HC_ID(HC_ID, HC_ID_PCI_BASE + 0x01)
+#define HC_SET_PTDEV_INTR_INFO      _HC_ID(HC_ID, HC_ID_PCI_BASE + 0x03)
+#define HC_RESET_PTDEV_INTR_INFO    _HC_ID(HC_ID, HC_ID_PCI_BASE + 0x04)
+
+/* DEBUG */
+#define HC_ID_DBG_BASE              0x60UL
+
+/* Power management */
+#define HC_ID_PM_BASE               0x80UL
+#define HC_PM_GET_CPU_STATE         _HC_ID(HC_ID, HC_ID_PM_BASE + 0x00)
+#define HC_PM_SET_SSTATE_DATA       _HC_ID(HC_ID, HC_ID_PM_BASE + 0x01)
+
+#endif /* _ACRN_HV_DEFS_H */
diff --git a/drivers/staging/acrn/acrn_hypercall.c b/drivers/staging/acrn/acrn_hypercall.c
new file mode 100644
index 0000000..6d83475
--- /dev/null
+++ b/drivers/staging/acrn/acrn_hypercall.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
+/*
+ * ACRN hypervisor service module (HSM): driver-specific hypercall
+ *
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ * Jason Chen CJ <jason.cj.chen@intel.com>
+ * Zhao Yakui <yakui.zhao@intel.com>
+ * Jack Ren <jack.ren@intel.com>
+ * Yin FengWei <fengwei.yin@intel.com>
+ * Liu Shuo <shuo.a.liu@intel.com>
+ */
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+#include <asm/acrn.h>
+#include "acrn_hv_defs.h"
+#include "acrn_hypercall.h"
+
+/* General */
+long hcall_get_api_version(unsigned long api_version)
+{
+	return acrn_hypercall1(HC_GET_API_VERSION, api_version);
+}
+
+long hcall_sos_offline_cpu(unsigned long cpu)
+{
+	return acrn_hypercall1(HC_SOS_OFFLINE_CPU, cpu);
+}
+
+long hcall_get_platform_info(unsigned long platform_info)
+{
+	return acrn_hypercall1(HC_GET_PLATFORM_INFO, platform_info);
+}
+
+/* VM management */
+long hcall_create_vm(unsigned long vminfo)
+{
+	return acrn_hypercall1(HC_CREATE_VM, vminfo);
+}
+
+long hcall_start_vm(unsigned long vmid)
+{
+	return acrn_hypercall1(HC_START_VM, vmid);
+}
+
+long hcall_pause_vm(unsigned long vmid)
+{
+	return acrn_hypercall1(HC_PAUSE_VM, vmid);
+}
+
+long hcall_reset_vm(unsigned long vmid)
+{
+	return acrn_hypercall1(HC_RESET_VM, vmid);
+}
+
+long hcall_destroy_vm(unsigned long vmid)
+{
+	return acrn_hypercall1(HC_DESTROY_VM, vmid);
+}
+
+long hcall_create_vcpu(unsigned long vmid, unsigned long vcpu)
+{
+	return acrn_hypercall2(HC_CREATE_VCPU, vmid, vcpu);
+}
+
+long hcall_set_vcpu_regs(unsigned long vmid,
+			 unsigned long regs_state)
+{
+	return acrn_hypercall2(HC_SET_VCPU_REGS, vmid, regs_state);
+}
+
+/* IRQ and Interrupts */
+long hcall_inject_msi(unsigned long vmid, unsigned long msi)
+{
+	return acrn_hypercall2(HC_INJECT_MSI, vmid, msi);
+}
+
+long hcall_vm_intr_monitor(unsigned long vmid, unsigned long addr)
+{
+	return acrn_hypercall2(HC_VM_INTR_MONITOR, vmid, addr);
+}
+
+long hcall_set_irqline(unsigned long vmid, unsigned long op)
+{
+	return acrn_hypercall2(HC_SET_IRQLINE, vmid, op);
+}
+
+/* DM ioreq management */
+long hcall_set_ioreq_buffer(unsigned long vmid, unsigned long buffer)
+{
+	return acrn_hypercall2(HC_SET_IOREQ_BUFFER, vmid, buffer);
+}
+
+long hcall_notify_req_finish(unsigned long vmid, unsigned long vcpu)
+{
+	return acrn_hypercall2(HC_NOTIFY_REQUEST_FINISH, vmid, vcpu);
+}
+
+/* Guest memory management */
+long hcall_set_memory_regions(unsigned long pa_regions)
+{
+	return acrn_hypercall1(HC_VM_SET_MEMORY_REGIONS, pa_regions);
+}
+
+long hcall_write_protect_page(unsigned long vmid, unsigned long wp)
+{
+	return acrn_hypercall2(HC_VM_WRITE_PROTECT_PAGE, vmid, wp);
+}
+
+/* PCI device assignment */
+long hcall_assign_ptdev(unsigned long vmid, unsigned long bdf)
+{
+	return acrn_hypercall2(HC_ASSIGN_PTDEV, vmid, bdf);
+}
+
+long hcall_deassign_ptdev(unsigned long vmid, unsigned long bdf)
+{
+	return acrn_hypercall2(HC_DEASSIGN_PTDEV, vmid, bdf);
+}
+
+long hcall_set_ptdev_intr_info(unsigned long vmid, unsigned long pt_irq)
+{
+	return acrn_hypercall2(HC_SET_PTDEV_INTR_INFO, vmid, pt_irq);
+}
+
+long hcall_reset_ptdev_intr_info(unsigned long vmid,
+				 unsigned long pt_irq)
+{
+	return acrn_hypercall2(HC_RESET_PTDEV_INTR_INFO, vmid, pt_irq);
+}
+
+/* Power Management */
+long hcall_get_cpu_state(unsigned long cmd, unsigned long state_pa)
+{
+	return acrn_hypercall2(HC_PM_GET_CPU_STATE, cmd, state_pa);
+}
diff --git a/drivers/staging/acrn/acrn_hypercall.h b/drivers/staging/acrn/acrn_hypercall.h
new file mode 100644
index 0000000..104f84c
--- /dev/null
+++ b/drivers/staging/acrn/acrn_hypercall.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
+/*
+ * ACRN hypervisor service module (HSM): driver-specific hypercall
+ * (header definition)
+ */
+#ifndef _ACRN_HSM_HYPERCALL_H
+#define _ACRN_HSM_HYPERCALL_H
+
+/* General */
+/* notify the hypervisor to offline one vcpu for SOS
+ * cpu is the cpu number that needs to be offlined
+ */
+long hcall_sos_offline_cpu(unsigned long cpu);
+/* return the API_VERSION of hypervisor
+ * api_version points to the gpa of returned info
+ */
+long hcall_get_api_version(unsigned long api_version);
+/* return the platform info of hypervisor
+ * platform_info points to the gpa of returned info
+ */
+long hcall_get_platform_info(unsigned long platform_info);
+
+/* VM management */
+/* ask the hypervisor to create one Guest VM.
+ * vminfo points to the gpa of created VM(in/out)
+ */
+long hcall_create_vm(unsigned long vminfo);
+/* ask the hypervisor to start the given VM based on vmid.
+ * vmid is the identifier for the given VM
+ */
+long hcall_start_vm(unsigned long vmid);
+/* ask the hypervisor to pause the given VM based on vmid.
+ * vmid is the identifier for the given VM
+ */
+long hcall_pause_vm(unsigned long vmid);
+/* ask the hypervisor to release the given VM based on vmid.
+ * vmid is the identifier for the given VM
+ */
+long hcall_destroy_vm(unsigned long vmid);
+/* ask the hypervisor to reset the given VM based on vmid.
+ * vmid is the identifier for the given VM
+ */
+long hcall_reset_vm(unsigned long vmid);
+/* ask the hypervisor to create one vcpu based on vmid.
+ * vmid is the identifier for the given VM
+ * vcpu is the cpu number that needs to be created
+ */
+long hcall_create_vcpu(unsigned long vmid, unsigned long vcpu);
+/* ask the hypervisor to configure the regs_state for one vcpu in VM
+ * vmid is the identifier for the given VM
+ * regs_state points to the gpa of configured register state: cpu_id and
+ *         register value.
+ */
+long hcall_set_vcpu_regs(unsigned long vmid, unsigned long regs_state);
+
+/* IRQ and interrupt management */
+/* notify the hypervisor to deliver MSI interrupt to target vm
+ * vmid is the identifier of target VM
+ * msi points to the gpa of MSI message
+ */
+long hcall_inject_msi(unsigned long vmid, unsigned long msi);
+/* notify the hypervisor to query interrupt_count info for target VM
+ * vmid is the identifier of target VM
+ * addr is the GPA address that points to interrupt_count page of target VM
+ */
+long hcall_vm_intr_monitor(unsigned long vmid, unsigned long addr);
+/* notify the hypervisor to handle the passed irq op
+ * vmid is the identifier of target VM.
+ * op is the defined irq op
+ */
+long hcall_set_irqline(unsigned long vmid, unsigned long op);
+
+/* DM IOREQ management */
+/* ask the hypervisor to setup the shared buffer for IO Request.
+ * vmid is the identifier of target VM
+ * buffer points to the gpa address of the ioreq_buffer structure
+ */
+long hcall_set_ioreq_buffer(unsigned long vmid, unsigned long buffer);
+/* notify that the ioreq on cpu of VMID is done
+ * vmid is the identifier of target VM
+ * vcpu is the vCPU that triggers the iorequest
+ */
+long hcall_notify_req_finish(unsigned long vmid, unsigned long vcpu);
+
+/* Guest Memory management */
+/* ask the hypervisor to setup EPT for the given VM.
+ * pa_regions points to the gpa for memory_region that includes the
+ * mapping between HPA and UOS GPA. The vmid is also included.
+ */
+long hcall_set_memory_regions(unsigned long pa_regions);
+
+/* ask the hypervisor to enable/disable the EPT_WP for one 4K page on
+ *    one given VM
+ * vmid is the identifier of target VM
+ * wp points to the gpa address that contains the wp_data structure
+ */
+long hcall_write_protect_page(unsigned long vmid, unsigned long wp);
+
+/* PCI device assignment */
+/* notify the hypervisor to assign one PCI device to target vm
+ * vmid is the identifier of target VM
+ * bdf is the assigned PCI device(bus:dev:func)
+ */
+long hcall_assign_ptdev(unsigned long vmid, unsigned long bdf);
+/* notify the hypervisor to deassign one PCI device from target vm
+ * vmid is the identifier of target VM
+ * bdf is the deassigned PCI device(bus:dev:func)
+ */
+long hcall_deassign_ptdev(unsigned long vmid, unsigned long bdf);
+/* notify the hypervisor to configure the interrupt_info for the assigned
+ *        PCI device
+ * vmid is the identifier of target VM
+ * pt_irq is the GPA address that points to the pt_irq info
+ */
+long hcall_set_ptdev_intr_info(unsigned long vmid, unsigned long pt_irq);
+/* notify the hypervisor to reset the interrupt_info for the assigned
+ *        PCI device
+ * vmid is the identifier of target VM
+ * pt_irq is the GPA address that points to the pt_irq info
+ */
+long hcall_reset_ptdev_intr_info(unsigned long vmid, unsigned long pt_irq);
+
+/* Debug assignment */
+/* TBD: It will be added when adding debug module */
+
+/* Power management */
+/* get the cpu px/cx state from hypervisor.
+ * state_pa points to the gpa of px/cx state buffer
+ */
+long hcall_get_cpu_state(unsigned long cmd, unsigned long state_pa);
+
+#endif /* __ACRN_HSM_HYPERCALL_H */
-- 
2.7.4



* [RFC PATCH 06/15] drivers/acrn: add the support of querying ACRN api version
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (4 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 05/15] drivers/acrn: add driver-specific hypercall for ACRN_HSM Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 07/15] drivers/acrn: add acrn vm/vcpu management for ACRN_HSM char device Zhao Yakui
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ, Liu Shuo

To ensure that the ACRN module works with the required ACRN hypervisor,
it checks whether the required version is consistent with the version
queried from the ACRN hypervisor. If it is inconsistent, the module
won't continue the initialization of the ACRN_HSM module.
Similarly, the user-space module also needs to check the driver version.
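
The compatibility check described above can be sketched as a comparison of
(major, minor) pairs. This is only an illustration under stated assumptions,
not the code in this patch: struct hv_version and hv_version_at_least() are
hypothetical names, and the policy (accept any hypervisor version at or above
the required one) is an assumption. Comparing the major number first means
that, for example, 2.0 satisfies a requirement of 1.1, whereas testing
"major >= X && minor >= Y" independently would reject it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two-field version record. */
struct hv_version {
	uint32_t major;
	uint32_t minor;
};

/*
 * Return true if 'have' is at least 'need', comparing the major number
 * first and falling back to the minor number only when the majors match.
 */
static bool hv_version_at_least(struct hv_version have, struct hv_version need)
{
	if (have.major != need.major)
		return have.major > need.major;
	return have.minor >= need.minor;
}
```

With the 1.0 requirement used in this patch both forms of the check give the
same answer; they only diverge once the required minor version is non-zero.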

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/acrn_dev.c           | 47 +++++++++++++++++++++++++++++++
 include/uapi/linux/acrn/acrn_ioctl_defs.h | 32 +++++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 include/uapi/linux/acrn/acrn_ioctl_defs.h

diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 55a7612..57cd2bb 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -18,13 +18,22 @@
 #include <linux/kdev_t.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/io.h>
 #include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
 #include <asm/acrn.h>
 #include <asm/hypervisor.h>
+#include <linux/acrn/acrn_ioctl_defs.h>
+
+#include "acrn_hypercall.h"
 
 #define  DEVICE_NAME "acrn_hsm"
 #define  CLASS_NAME  "acrn"
 
+#define ACRN_API_VERSION_MAJOR	1
+#define ACRN_API_VERSION_MINOR	0
+
 static int	acrn_hsm_inited;
 static int	major;
 static struct class	*acrn_class;
@@ -44,6 +53,19 @@ long acrn_dev_ioctl(struct file *filep,
 {
 	long ret = 0;
 
+	if (ioctl_num == IC_GET_API_VERSION) {
+		struct api_version api_version;
+
+		api_version.major_version = ACRN_API_VERSION_MAJOR;
+		api_version.minor_version = ACRN_API_VERSION_MINOR;
+
+		if (copy_to_user((void *)ioctl_param, &api_version,
+				 sizeof(api_version)))
+			return -EFAULT;
+
+		return 0;
+	}
+
 	return ret;
 }
 
@@ -59,9 +81,12 @@ static const struct file_operations fops = {
 };
 
 #define EAX_PRIVILEGE_VM	BIT(0)
+#define SUPPORT_HV_API_VERSION_MAJOR	1
+#define SUPPORT_HV_API_VERSION_MINOR	0
 
 static int __init acrn_init(void)
 {
+	struct api_version *api_version;
 	acrn_hsm_inited = 0;
 	if (x86_hyper_type != X86_HYPER_ACRN)
 		return -ENODEV;
@@ -69,6 +94,28 @@ static int __init acrn_init(void)
 	if (!(cpuid_eax(0x40000001) & EAX_PRIVILEGE_VM))
 		return -EPERM;
 
+	api_version = kmalloc(sizeof(*api_version), GFP_KERNEL);
+	if (!api_version)
+		return -ENOMEM;
+
+	if (hcall_get_api_version(virt_to_phys(api_version)) < 0) {
+		pr_err("acrn: failed to get api version from Hypervisor !\n");
+		kfree(api_version);
+		return -EINVAL;
+	}
+
+	if (api_version->major_version >= SUPPORT_HV_API_VERSION_MAJOR &&
+	    api_version->minor_version >= SUPPORT_HV_API_VERSION_MINOR) {
+		pr_info("acrn: hv api version %d.%d\n",
+			api_version->major_version, api_version->minor_version);
+		kfree(api_version);
+	} else {
+		pr_err("acrn: not support hv api version %d.%d!\n",
+		       api_version->major_version, api_version->minor_version);
+		kfree(api_version);
+		return -EINVAL;
+	}
+
 	/* Try to dynamically allocate a major number for the device */
 	major = register_chrdev(0, DEVICE_NAME, &fops);
 	if (major < 0) {
diff --git a/include/uapi/linux/acrn/acrn_ioctl_defs.h b/include/uapi/linux/acrn/acrn_ioctl_defs.h
new file mode 100644
index 0000000..8dbf69a
--- /dev/null
+++ b/include/uapi/linux/acrn/acrn_ioctl_defs.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
+/**
+ * @file acrn_ioctl_defs.h
+ *
+ * ACRN definition for ioctl to user space
+ */
+
+#ifndef __ACRN_IOCTL_DEFS_H__
+#define __ACRN_IOCTL_DEFS_H__
+
+/**
+ * struct api_version - data structure to track ACRN_SRV API version
+ *
+ * @major_version: major version of ACRN_SRV API
+ * @minor_version: minor version of ACRN_SRV API
+ */
+struct api_version {
+	uint32_t major_version;
+	uint32_t minor_version;
+};
+
+/*
+ * Common IOCTL ID definition for DM
+ */
+#define _IC_ID(x, y) (((x) << 24) | (y))
+#define IC_ID 0x43UL
+
+/* General */
+#define IC_ID_GEN_BASE                  0x0UL
+#define IC_GET_API_VERSION             _IC_ID(IC_ID, IC_ID_GEN_BASE + 0x00)
+
+#endif /* __ACRN_IOCTL_DEFS_H__ */
-- 
2.7.4



* [RFC PATCH 07/15] drivers/acrn: add acrn vm/vcpu management for ACRN_HSM char device
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (5 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 06/15] drivers/acrn: add the support of querying ACRN api version Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN " Zhao Yakui
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ, Liu Shuo

VM management is one important role of the acrn module. It is used to
manage another VM based on the user-space ioctls. It includes the
following VM operations: CREATE/START/PAUSE/RESET/DESTROY VM, CREATE_VCPU
and SET_VCPU_REGS.
acrn_dev_ioctl is provided so that the user of /dev/acrn_hsm can manage
the VM for the given guest. After the ioctl is called, the corresponding
hypercall is issued so that the ACRN hypervisor can manage the
VM and create the required VCPU.

As ACRN is mainly used for embedded IOT usage, no interface is provided
to destroy the vcpu explicitly. When the VM is destroyed, the low-level
ACRN hypervisor will free the corresponding vcpu implicitly.
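
As a rough user-space illustration of the ioctl flow described above, the
sketch below creates a VM on an open /dev/acrn_hsm descriptor and starts it.
The ioctl numbers and struct acrn_create_vm are copied from the uapi header
in this patch; acrn_create_and_start() is a hypothetical helper, and a real
device model would normally also issue IC_CREATE_VCPU and IC_SET_VCPU_REGS
(and set up guest memory) before starting the VM.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* ioctl numbering copied from acrn_ioctl_defs.h in this patch. */
#define _IC_ID(x, y)	(((x) << 24) | (y))
#define IC_ID		0x43UL
#define IC_ID_VM_BASE	0x10UL
#define IC_CREATE_VM	_IC_ID(IC_ID, IC_ID_VM_BASE + 0x00)
#define IC_START_VM	_IC_ID(IC_ID, IC_ID_VM_BASE + 0x02)

/* Layout copied from struct acrn_create_vm in this patch. */
struct acrn_create_vm {
	uint16_t vmid;
	uint16_t reserved0;
	uint16_t vcpu_num;
	uint16_t reserved1;
	uint8_t  GUID[16];
	uint64_t vm_flag;
	uint64_t req_buf;
	uint8_t  reserved2[16];
};

/*
 * Hypothetical helper: create a VM on an already-open /dev/acrn_hsm
 * descriptor and start it. Returns 0 on success, -1 on failure
 * (errno is set by ioctl()).
 */
static int acrn_create_and_start(int fd, struct acrn_create_vm *cv)
{
	/* On success the driver copies the struct back with vmid filled in. */
	if (ioctl(fd, IC_CREATE_VM, cv) < 0)
		return -1;
	/* IC_START_VM takes no argument; the VM is tied to this descriptor. */
	if (ioctl(fd, IC_START_VM, 0) < 0)
		return -1;
	return 0;
}
```

Because the driver stores the VM in filep->private_data, each open of the
device node manages at most one VM, and closing the descriptor destroys it.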

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/Makefile             |   3 +-
 drivers/staging/acrn/acrn_dev.c           | 169 +++++++++++++++++++++++++++++-
 drivers/staging/acrn/acrn_drv_internal.h  |  74 +++++++++++++
 drivers/staging/acrn/acrn_vm_mngt.c       |  72 +++++++++++++
 include/uapi/linux/acrn/acrn_ioctl_defs.h | 126 ++++++++++++++++++++++
 5 files changed, 442 insertions(+), 2 deletions(-)
 create mode 100644 drivers/staging/acrn/acrn_drv_internal.h
 create mode 100644 drivers/staging/acrn/acrn_vm_mngt.c

diff --git a/drivers/staging/acrn/Makefile b/drivers/staging/acrn/Makefile
index a58b0d1..426d6e8 100644
--- a/drivers/staging/acrn/Makefile
+++ b/drivers/staging/acrn/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_ACRN_HSM)	:= acrn.o
 acrn-y := acrn_dev.o \
-	  acrn_hypercall.o
+	  acrn_hypercall.o \
+	  acrn_vm_mngt.o
diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 57cd2bb..7372316 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -27,6 +27,7 @@
 #include <linux/acrn/acrn_ioctl_defs.h>
 
 #include "acrn_hypercall.h"
+#include "acrn_drv_internal.h"
 
 #define  DEVICE_NAME "acrn_hsm"
 #define  CLASS_NAME  "acrn"
@@ -42,8 +43,22 @@ static struct device	*acrn_device;
 static
 int acrn_dev_open(struct inode *inodep, struct file *filep)
 {
-	pr_info("%s: opening device node\n", __func__);
+	struct acrn_vm *vm;
+
+	vm = kzalloc(sizeof(*vm), GFP_KERNEL);
+	if (!vm)
+		return -ENOMEM;
+
+	refcount_set(&vm->refcnt, 1);
+	vm->vmid = ACRN_INVALID_VMID;
+	vm->dev = acrn_device;
 
+	write_lock_bh(&acrn_vm_list_lock);
+	vm_list_add(&vm->list);
+	write_unlock_bh(&acrn_vm_list_lock);
+	filep->private_data = vm;
+
+	pr_info("%s: opening device node\n", __func__);
 	return 0;
 }
 
@@ -52,6 +67,15 @@ long acrn_dev_ioctl(struct file *filep,
 		    unsigned int ioctl_num, unsigned long ioctl_param)
 {
 	long ret = 0;
+	struct acrn_vm *vm;
+
+	pr_debug("[%s] ioctl_num=0x%x\n", __func__, ioctl_num);
+
+	vm = (struct acrn_vm *)filep->private_data;
+	if (!vm) {
+		pr_err("acrn: invalid VM !\n");
+		return -EFAULT;
+	}
 
 	if (ioctl_num == IC_GET_API_VERSION) {
 		struct api_version api_version;
@@ -66,11 +90,154 @@ long acrn_dev_ioctl(struct file *filep,
 		return 0;
 	}
 
+	if ((vm->vmid == ACRN_INVALID_VMID) && (ioctl_num != IC_CREATE_VM)) {
+		pr_err("acrn: invalid VM ID for IOCTL %x!\n", ioctl_num);
+		return -EFAULT;
+	}
+
+	switch (ioctl_num) {
+	case IC_CREATE_VM: {
+		struct acrn_create_vm *created_vm;
+
+		created_vm = kmalloc(sizeof(*created_vm), GFP_KERNEL);
+		if (!created_vm)
+			return -ENOMEM;
+
+		if (copy_from_user(created_vm, (void *)ioctl_param,
+				   sizeof(struct acrn_create_vm))) {
+			kfree(created_vm);
+			return -EFAULT;
+		}
+
+		ret = hcall_create_vm(virt_to_phys(created_vm));
+		if ((ret < 0) || (created_vm->vmid == ACRN_INVALID_VMID)) {
+			pr_err("acrn: failed to create VM from Hypervisor !\n");
+			kfree(created_vm);
+			return -EFAULT;
+		}
+
+		if (copy_to_user((void *)ioctl_param, created_vm,
+				 sizeof(struct acrn_create_vm))) {
+			kfree(created_vm);
+			return -EFAULT;
+		}
+
+		vm->vmid = created_vm->vmid;
+		atomic_set(&vm->vcpu_num, 0);
+
+		pr_info("acrn: VM %d created\n", created_vm->vmid);
+		kfree(created_vm);
+		break;
+	}
+
+	case IC_START_VM: {
+		ret = hcall_start_vm(vm->vmid);
+		if (ret < 0) {
+			pr_err("acrn: failed to start VM %d!\n", vm->vmid);
+			return -EFAULT;
+		}
+		break;
+	}
+
+	case IC_PAUSE_VM: {
+		ret = hcall_pause_vm(vm->vmid);
+		if (ret < 0) {
+			pr_err("acrn: failed to pause VM %d!\n", vm->vmid);
+			return -EFAULT;
+		}
+		break;
+	}
+
+	case IC_RESET_VM: {
+		ret = hcall_reset_vm(vm->vmid);
+		if (ret < 0) {
+			pr_err("acrn: failed to restart VM %d!\n", vm->vmid);
+			return -EFAULT;
+		}
+		break;
+	}
+
+	case IC_DESTROY_VM: {
+		ret = acrn_vm_destroy(vm);
+		break;
+	}
+
+	case IC_CREATE_VCPU: {
+		struct acrn_create_vcpu *cv;
+
+		cv = kmalloc(sizeof(*cv), GFP_KERNEL);
+		if (!cv)
+			return -ENOMEM;
+
+		if (copy_from_user(cv, (void *)ioctl_param,
+				   sizeof(struct acrn_create_vcpu))) {
+			kfree(cv);
+			return -EFAULT;
+		}
+
+		ret = hcall_create_vcpu(vm->vmid, virt_to_phys(cv));
+		if (ret < 0) {
+			pr_err("acrn: failed to create vcpu %d!\n",
+			       cv->vcpu_id);
+			kfree(cv);
+			return -EFAULT;
+		}
+		atomic_inc(&vm->vcpu_num);
+		kfree(cv);
+
+		return ret;
+	}
+
+	case IC_SET_VCPU_REGS: {
+		struct acrn_set_vcpu_regs *cpu_regs;
+
+		cpu_regs = kmalloc(sizeof(*cpu_regs), GFP_KERNEL);
+		if (!cpu_regs)
+			return -ENOMEM;
+
+		if (copy_from_user(cpu_regs, (void *)ioctl_param,
+				   sizeof(*cpu_regs))) {
+			kfree(cpu_regs);
+			return -EFAULT;
+		}
+
+		ret = hcall_set_vcpu_regs(vm->vmid, virt_to_phys(cpu_regs));
+		kfree(cpu_regs);
+		if (ret < 0) {
+			pr_err("acrn: failed to set bsp state of vm %d!\n",
+			       vm->vmid);
+			return -EFAULT;
+		}
+
+		return ret;
+	}
+
+	default:
+		pr_warn("Unknown IOCTL 0x%x\n", ioctl_num);
+		ret = -EFAULT;
+		break;
+	}
+
 	return ret;
 }
 
 static int acrn_dev_release(struct inode *inodep, struct file *filep)
 {
+	struct acrn_vm *vm = filep->private_data;
+
+	if (!vm) {
+		pr_err("acrn: invalid VM !\n");
+		return -EFAULT;
+	}
+	if (vm->vmid != ACRN_INVALID_VMID)
+		acrn_vm_destroy(vm);
+
+	write_lock_bh(&acrn_vm_list_lock);
+	list_del_init(&vm->list);
+	write_unlock_bh(&acrn_vm_list_lock);
+
+	put_vm(vm);
+	filep->private_data = NULL;
 	return 0;
 }
 
diff --git a/drivers/staging/acrn/acrn_drv_internal.h b/drivers/staging/acrn/acrn_drv_internal.h
new file mode 100644
index 0000000..6758dea
--- /dev/null
+++ b/drivers/staging/acrn/acrn_drv_internal.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
+/*
+ * ACRN_HSM : vm management header file.
+ *
+ */
+
+#ifndef __ACRN_VM_MNGT_H
+#define __ACRN_VM_MNGT_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/refcount.h>
+
+#define ACRN_INVALID_VMID (-1)
+
+enum ACRN_VM_FLAGS {
+	ACRN_VM_DESTROYED = 0,
+};
+
+extern struct list_head acrn_vm_list;
+extern rwlock_t acrn_vm_list_lock;
+
+void vm_list_add(struct list_head *list);
+
+/**
+ * struct acrn_vm - data structure to track guest
+ *
+ * @dev: pointer to dev of linux device mode
+ * @list: list of acrn_vm
+ * @vmid: guest vmid
+ * @refcnt: reference count of guest
+ * @max_gfn: maximum guest page frame number
+ * @vcpu_num: vcpu number
+ * @flags: VM flag bits
+ */
+struct acrn_vm {
+	struct device *dev;
+	struct list_head list;
+	unsigned short vmid;
+	refcount_t refcnt;
+	int max_gfn;
+	atomic_t vcpu_num;
+	unsigned long flags;
+};
+
+int acrn_vm_destroy(struct acrn_vm *vm);
+
+/**
+ * find_get_vm() - find and keep guest acrn_vm based on the vmid
+ *
+ * @vmid: guest vmid
+ *
+ * Return: pointer to acrn_vm, NULL if can't find vm matching vmid
+ */
+struct acrn_vm *find_get_vm(unsigned short vmid);
+
+/**
+ * get_vm() - increase the refcnt of acrn_vm
+ * @vm: pointer to acrn_vm which identify specific guest
+ *
+ * Return:
+ */
+void get_vm(struct acrn_vm *vm);
+
+/**
+ * put_vm() - release acrn_vm of guest according to guest vmid
+ * If the latest reference count drops to zero, free acrn_vm as well
+ * @vm: pointer to acrn_vm which identify specific guest
+ *
+ * Return:
+ */
+void put_vm(struct acrn_vm *vm);
+
+#endif
diff --git a/drivers/staging/acrn/acrn_vm_mngt.c b/drivers/staging/acrn/acrn_vm_mngt.c
new file mode 100644
index 0000000..04c551d
--- /dev/null
+++ b/drivers/staging/acrn/acrn_vm_mngt.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
+/*
+ * ACRN_HSM: vm management
+
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ * Jason Chen CJ <jason.cj.chen@intel.com>
+ * Zhao Yakui <yakui.zhao@intel.com>
+ * Liu Shuo <shuo.a.liu@intel.com>
+ */
+
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/rwlock_types.h>
+
+#include "acrn_hypercall.h"
+#include "acrn_drv_internal.h"
+
+LIST_HEAD(acrn_vm_list);
+DEFINE_RWLOCK(acrn_vm_list_lock);
+
+struct acrn_vm *find_get_vm(unsigned short vmid)
+{
+	struct acrn_vm *vm;
+
+	read_lock_bh(&acrn_vm_list_lock);
+	list_for_each_entry(vm, &acrn_vm_list, list) {
+		if (vm->vmid == vmid) {
+			refcount_inc(&vm->refcnt);
+			read_unlock_bh(&acrn_vm_list_lock);
+			return vm;
+		}
+	}
+	read_unlock_bh(&acrn_vm_list_lock);
+	return NULL;
+}
+
+void get_vm(struct acrn_vm *vm)
+{
+	refcount_inc_checked(&vm->refcnt);
+}
+
+void put_vm(struct acrn_vm *vm)
+{
+	if (refcount_dec_and_test(&vm->refcnt)) {
+		kfree(vm);
+		pr_debug("hsm: freed vm\n");
+	}
+}
+
+void vm_list_add(struct list_head *list)
+{
+	list_add(list, &acrn_vm_list);
+}
+
+int acrn_vm_destroy(struct acrn_vm *vm)
+{
+	int ret;
+
+	if (test_and_set_bit(ACRN_VM_DESTROYED, &vm->flags))
+		return 0;
+
+	ret = hcall_destroy_vm(vm->vmid);
+	if (ret < 0) {
+		pr_warn("failed to destroy VM %d\n", vm->vmid);
+		clear_bit(ACRN_VM_DESTROYED, &vm->flags);
+		return -EFAULT;
+	}
+
+	vm->vmid = ACRN_INVALID_VMID;
+	return 0;
+}
diff --git a/include/uapi/linux/acrn/acrn_ioctl_defs.h b/include/uapi/linux/acrn/acrn_ioctl_defs.h
index 8dbf69a..ebcf812 100644
--- a/include/uapi/linux/acrn/acrn_ioctl_defs.h
+++ b/include/uapi/linux/acrn/acrn_ioctl_defs.h
@@ -19,6 +19,122 @@ struct api_version {
 	uint32_t minor_version;
 };
 
+/**
+ * @brief Info to create a VM, the parameter for HC_CREATE_VM hypercall
+ */
+struct acrn_create_vm {
+	/** created vmid return to ACRN. Keep it first field */
+	uint16_t vmid;
+
+	/** Reserved */
+	uint16_t reserved0;
+
+	/** VCPU numbers this VM want to create */
+	uint16_t vcpu_num;
+
+	/** Reserved */
+	uint16_t reserved1;
+
+	/** the GUID of this VM */
+	uint8_t	 GUID[16];
+
+	/* VM flag bits from Guest OS. */
+	uint64_t vm_flag;
+
+	/** guest physical address of VM request_buffer */
+	uint64_t req_buf;
+
+	/** Reserved for future use*/
+	uint8_t  reserved2[16];
+};
+
+/**
+ * @brief Info to create a VCPU
+ *
+ * the parameter for HC_CREATE_VCPU hypercall
+ */
+struct acrn_create_vcpu {
+	/** the virtual CPU ID for the VCPU created */
+	uint16_t vcpu_id;
+
+	/** the physical CPU ID for the VCPU created */
+	uint16_t pcpu_id;
+
+	/** Reserved for future use*/
+	uint8_t reserved[4];
+} __aligned(8);
+
+struct acrn_gp_regs {
+	uint64_t rax;
+	uint64_t rcx;
+	uint64_t rdx;
+	uint64_t rbx;
+	uint64_t rsp;
+	uint64_t rbp;
+	uint64_t rsi;
+	uint64_t rdi;
+	uint64_t r8;
+	uint64_t r9;
+	uint64_t r10;
+	uint64_t r11;
+	uint64_t r12;
+	uint64_t r13;
+	uint64_t r14;
+	uint64_t r15;
+};
+
+struct acrn_descriptor_ptr {
+	uint16_t limit;
+	uint64_t base;
+	uint16_t reserved[3];
+} __packed;
+
+struct acrn_vcpu_regs {
+	struct acrn_gp_regs gprs;
+	struct acrn_descriptor_ptr gdt;
+	struct acrn_descriptor_ptr idt;
+
+	uint64_t        rip;
+	uint64_t        cs_base;
+	uint64_t        cr0;
+	uint64_t        cr4;
+	uint64_t        cr3;
+	uint64_t        ia32_efer;
+	uint64_t        rflags;
+	uint64_t        reserved_64[4];
+
+	uint32_t        cs_ar;
+	uint32_t        reserved_32[4];
+
+	/* don't change the order of following sel */
+	uint16_t        cs_sel;
+	uint16_t        ss_sel;
+	uint16_t        ds_sel;
+	uint16_t        es_sel;
+	uint16_t        fs_sel;
+	uint16_t        gs_sel;
+	uint16_t        ldt_sel;
+	uint16_t        tr_sel;
+
+	uint16_t        reserved_16[4];
+};
+
+/**
+ * @brief Info to set vcpu state
+ *
+ * the parameter for HC_SET_VCPU_REGS
+ */
+struct acrn_set_vcpu_regs {
+	/** the virtual CPU ID for the VCPU */
+	uint16_t vcpu_id;
+
+	/** reserved space to make cpu_state aligned to 8 bytes */
+	uint16_t reserved0[3];
+
+	/** the structure to hold vcpu state */
+	struct acrn_vcpu_regs vcpu_regs;
+};
+
 /*
  * Common IOCTL ID definition for DM
  */
@@ -29,4 +145,14 @@ struct api_version {
 #define IC_ID_GEN_BASE                  0x0UL
 #define IC_GET_API_VERSION             _IC_ID(IC_ID, IC_ID_GEN_BASE + 0x00)
 
+/* VM management */
+#define IC_ID_VM_BASE                  0x10UL
+#define IC_CREATE_VM                   _IC_ID(IC_ID, IC_ID_VM_BASE + 0x00)
+#define IC_DESTROY_VM                  _IC_ID(IC_ID, IC_ID_VM_BASE + 0x01)
+#define IC_START_VM                    _IC_ID(IC_ID, IC_ID_VM_BASE + 0x02)
+#define IC_PAUSE_VM                    _IC_ID(IC_ID, IC_ID_VM_BASE + 0x03)
+#define IC_CREATE_VCPU                 _IC_ID(IC_ID, IC_ID_VM_BASE + 0x04)
+#define IC_RESET_VM                    _IC_ID(IC_ID, IC_ID_VM_BASE + 0x05)
+#define IC_SET_VCPU_REGS               _IC_ID(IC_ID, IC_ID_VM_BASE + 0x06)
+
 #endif /* __ACRN_IOCTL_DEFS_H__ */
-- 
2.7.4



* [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (6 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 07/15] drivers/acrn: add acrn vm/vcpu management for ACRN_HSM char device Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16 12:58   ` Dan Carpenter
  2019-08-16  2:25 ` [RFC PATCH 09/15] drivers/acrn: add passthrough device support Zhao Yakui
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ, Li, Fei, Liu Shuo

In order to launch the ACRN guest system, the mapping between GPA (guest
physical address) and HPA (host physical address) needs to be set up.
This is based on memory virtualization and configured in the EPT table.
The ioctls related to memory management are added, and then the hypercall
is called so that the ACRN hypervisor can help to set up the memory
mapping for the ACRN guest.
1G/2M huge pages are used to optimize the EPT table for the guest VM. This
simplifies the memory allocation and also optimizes TLB usage.
For the MMIO mapping, 4K/2M pages are supported.

IC_SET_MEMSEG: This is used to set up the memory mapping for the memory
of the guest system by using hugetlb (guest physical address and host
virtual address). It is also used to set up the device MMIO mapping for
a PCI device.
IC_UNSET_MEMSEG: This is used to remove the device MMIO mapping for a PCI
device. It is used together with updating the MMIO mapping. As the
ACRN hypervisor is mainly used for embedded IOT devices, it doesn't support
the dynamic removal of memory mapping.
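
To make the page-size discussion above concrete, here is a minimal sketch of
how the largest usable page size for a mapping could be chosen from the
alignment of its addresses and length. It is an illustration only and is not
taken from this patch: largest_page_size() and the SZ_* constants are
hypothetical names defined here, and the rule (a page size is usable only if
gpa, hpa and len are all multiples of it) is an assumption about how such a
mapping is usually validated.

```c
#include <stdint.h>

#define SZ_4K	(1ULL << 12)
#define SZ_2M	(1ULL << 21)
#define SZ_1G	(1ULL << 30)

/*
 * Hypothetical helper: return the largest page size (1G, 2M or 4K) to
 * which gpa, hpa and len are all aligned. Returns 0 if the region is
 * empty or not even 4K aligned.
 */
static uint64_t largest_page_size(uint64_t gpa, uint64_t hpa, uint64_t len)
{
	static const uint64_t sizes[] = { SZ_1G, SZ_2M, SZ_4K };
	/* A low bit set in any of the three values breaks alignment. */
	uint64_t bits = gpa | hpa | len;
	unsigned int i;

	if (len == 0)
		return 0;
	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		if ((bits & (sizes[i] - 1)) == 0)
			return sizes[i];
	}
	return 0;
}
```

Larger pages need fewer EPT entries and fewer TLB entries for the same
amount of guest memory, which is the optimization the text refers to.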

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Co-developed-by: Li, Fei <lei1.li@intel.com>
Signed-off-by: Li, Fei <lei1.li@intel.com>
Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/Makefile             |   4 +-
 drivers/staging/acrn/acrn_dev.c           |  27 +++
 drivers/staging/acrn/acrn_drv_internal.h  |  90 +++++++---
 drivers/staging/acrn/acrn_mm.c            | 227 ++++++++++++++++++++++++
 drivers/staging/acrn/acrn_mm_hugetlb.c    | 281 ++++++++++++++++++++++++++++++
 drivers/staging/acrn/acrn_vm_mngt.c       |   2 +
 include/linux/acrn/acrn_drv.h             |  86 +++++++++
 include/uapi/linux/acrn/acrn_common_def.h |  25 +++
 include/uapi/linux/acrn/acrn_ioctl_defs.h |  41 +++++
 9 files changed, 759 insertions(+), 24 deletions(-)
 create mode 100644 drivers/staging/acrn/acrn_mm.c
 create mode 100644 drivers/staging/acrn/acrn_mm_hugetlb.c
 create mode 100644 include/linux/acrn/acrn_drv.h
 create mode 100644 include/uapi/linux/acrn/acrn_common_def.h

diff --git a/drivers/staging/acrn/Makefile b/drivers/staging/acrn/Makefile
index 426d6e8..ec62afe 100644
--- a/drivers/staging/acrn/Makefile
+++ b/drivers/staging/acrn/Makefile
@@ -1,4 +1,6 @@
 obj-$(CONFIG_ACRN_HSM)	:= acrn.o
 acrn-y := acrn_dev.o \
 	  acrn_hypercall.o \
-	  acrn_vm_mngt.o
+	  acrn_vm_mngt.o \
+	  acrn_mm.o \
+	  acrn_mm_hugetlb.o
diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 7372316..cb62819 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -44,6 +44,7 @@ static
 int acrn_dev_open(struct inode *inodep, struct file *filep)
 {
 	struct acrn_vm *vm;
+	int i;
 
 	vm = kzalloc(sizeof(*vm), GFP_KERNEL);
 	if (!vm)
@@ -53,6 +54,10 @@ int acrn_dev_open(struct inode *inodep, struct file *filep)
 	vm->vmid = ACRN_INVALID_VMID;
 	vm->dev = acrn_device;
 
+	for (i = 0; i < HUGEPAGE_HLIST_ARRAY_SIZE; i++)
+		INIT_HLIST_HEAD(&vm->hugepage_hlist[i]);
+	mutex_init(&vm->hugepage_lock);
+
 	write_lock_bh(&acrn_vm_list_lock);
 	vm_list_add(&vm->list);
 	write_unlock_bh(&acrn_vm_list_lock);
@@ -212,6 +217,28 @@ long acrn_dev_ioctl(struct file *filep,
 		return ret;
 	}
 
+	case IC_SET_MEMSEG: {
+		struct vm_memmap memmap;
+
+		if (copy_from_user(&memmap, (void *)ioctl_param,
+				   sizeof(memmap)))
+			return -EFAULT;
+
+		ret = map_guest_memseg(vm, &memmap);
+		break;
+	}
+
+	case IC_UNSET_MEMSEG: {
+		struct vm_memmap memmap;
+
+		if (copy_from_user(&memmap, (void *)ioctl_param,
+				   sizeof(memmap)))
+			return -EFAULT;
+
+		ret = unmap_guest_memseg(vm, &memmap);
+		break;
+	}
+
 	default:
 		pr_warn("Unknown IOCTL 0x%x\n", ioctl_num);
 		ret = -EFAULT;
diff --git a/drivers/staging/acrn/acrn_drv_internal.h b/drivers/staging/acrn/acrn_drv_internal.h
index 6758dea..5098765 100644
--- a/drivers/staging/acrn/acrn_drv_internal.h
+++ b/drivers/staging/acrn/acrn_drv_internal.h
@@ -11,6 +11,57 @@
 #include <linux/list.h>
 #include <linux/refcount.h>
 
+struct vm_memory_region {
+#define MR_ADD		0
+#define MR_DEL		2
+	u32 type;
+
+	/* IN: mem attr */
+	u32 prot;
+
+	/* IN: beginning guest GPA to map */
+	u64 gpa;
+
+	/* IN: VM0's GPA which foreign gpa will be mapped to */
+	u64 vm0_gpa;
+
+	/* IN: size of the region */
+	u64 size;
+};
+
+struct set_regions {
+	/*IN: vmid for this hypercall */
+	u16 vmid;
+
+	/** Reserved */
+	u16 reserved[3];
+
+	/* IN: multi memmaps numbers */
+	u32 mr_num;
+
+	/** Reserved */
+	u32 reserved1;
+	/* IN:
+	 * the gpa of memmaps buffer, point to the memmaps array:
+	 * struct memory_map memmap_array[memmaps_num]
+	 * the max buffer size is one page.
+	 */
+	u64 regions_gpa;
+};
+
+struct wp_data {
+	/** set page write protect permission.
+	 *  true: set the wp; false: clear the wp
+	 */
+	u8 set;
+
+	/** Reserved */
+	u8 reserved[7];
+
+	/** the guest physical address of the page to change */
+	u64 gpa;
+};
+
 #define ACRN_INVALID_VMID (-1)
 
 enum ACRN_VM_FLAGS {
@@ -22,6 +73,10 @@ extern rwlock_t acrn_vm_list_lock;
 
 void vm_list_add(struct list_head *list);
 
+#define HUGEPAGE_2M_HLIST_ARRAY_SIZE	32
+#define HUGEPAGE_1G_HLIST_ARRAY_SIZE	1
+#define HUGEPAGE_HLIST_ARRAY_SIZE	(HUGEPAGE_2M_HLIST_ARRAY_SIZE + \
+					 HUGEPAGE_1G_HLIST_ARRAY_SIZE)
 /**
  * struct acrn_vm - data structure to track guest
  *
@@ -32,6 +87,7 @@ void vm_list_add(struct list_head *list);
  * @max_gfn: maximum guest page frame number
  * @vcpu_num: vcpu number
  * @flags: VM flag bits
+ * @hugepage_hlist: hash list of hugepage
  */
 struct acrn_vm {
 	struct device *dev;
@@ -41,34 +97,22 @@ struct acrn_vm {
 	int max_gfn;
 	atomic_t vcpu_num;
 	unsigned long flags;
+	/* mutex to protect hugepage_hlist */
+	struct mutex hugepage_lock;
+	struct hlist_head hugepage_hlist[HUGEPAGE_HLIST_ARRAY_SIZE];
 };
 
 int acrn_vm_destroy(struct acrn_vm *vm);
 
-/**
- * find_get_vm() - find and keep guest acrn_vm based on the vmid
- *
- * @vmid: guest vmid
- *
- * Return: pointer to acrn_vm, NULL if can't find vm matching vmid
- */
 struct acrn_vm *find_get_vm(unsigned short vmid);
-
-/**
- * get_vm() - increase the refcnt of acrn_vm
- * @vm: pointer to acrn_vm which identify specific guest
- *
- * Return:
- */
 void get_vm(struct acrn_vm *vm);
-
-/**
- * put_vm() - release acrn_vm of guest according to guest vmid
- * If the latest reference count drops to zero, free acrn_vm as well
- * @vm: pointer to acrn_vm which identify specific guest
- *
- * Return:
- */
 void put_vm(struct acrn_vm *vm);
-
+void free_guest_mem(struct acrn_vm *vm);
+int map_guest_memseg(struct acrn_vm *vm, struct vm_memmap *memmap);
+int unmap_guest_memseg(struct acrn_vm *vm, struct vm_memmap *memmap);
+int hugepage_map_guest(struct acrn_vm *vm, struct vm_memmap *memmap);
+void hugepage_free_guest(struct acrn_vm *vm);
+void *hugepage_map_guest_phys(struct acrn_vm *vm, u64 guest_phys, size_t size);
+int hugepage_unmap_guest_phys(struct acrn_vm *vm, u64 guest_phys);
+int set_memory_regions(struct set_regions *regions);
 #endif
diff --git a/drivers/staging/acrn/acrn_mm.c b/drivers/staging/acrn/acrn_mm.c
new file mode 100644
index 0000000..4a7af7e
--- /dev/null
+++ b/drivers/staging/acrn/acrn_mm.c
@@ -0,0 +1,227 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
+/*
+ * ACRN: memory map management for each VM
+ *
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ *
+ * Jason Chen CJ <jason.cj.chen@intel.com>
+ * Zhao Yakui <yakui.zhao@intel.com>
+ * Li Fei <lei1.li@intel.com>
+ * Liu Shuo <shuo.a.liu@intel.com>
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/uaccess.h>
+#include <linux/io.h>
+
+#include <linux/acrn/acrn_ioctl_defs.h>
+#include <linux/acrn/acrn_drv.h>
+
+#include "acrn_drv_internal.h"
+#include "acrn_hypercall.h"
+
+static int set_memory_region(unsigned short vmid,
+			     struct vm_memory_region *region)
+{
+	struct set_regions *regions;
+	int ret;
+
+	regions = kzalloc(sizeof(*regions), GFP_KERNEL);
+	if (!regions)
+		return -ENOMEM;
+
+	regions->vmid = vmid;
+	regions->mr_num = 1;
+	regions->regions_gpa = virt_to_phys(region);
+
+	ret = set_memory_regions(regions);
+	kfree(regions);
+	if (ret < 0) {
+		pr_err("acrn: failed to set memory region for vm[%d]!\n",
+		       vmid);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+int acrn_add_memory_region(unsigned short vmid, unsigned long gpa,
+			   unsigned long host_gpa, unsigned long size,
+			   unsigned int mem_type, unsigned int mem_access_right)
+{
+	struct vm_memory_region *region;
+	int ret = 0;
+
+	region = kzalloc(sizeof(*region), GFP_KERNEL);
+	if (!region)
+		return -ENOMEM;
+
+	region->type = MR_ADD;
+	region->gpa = gpa;
+	region->vm0_gpa = host_gpa;
+	region->size = size;
+	region->prot = ((mem_type & MEM_TYPE_MASK) |
+			(mem_access_right & MEM_ACCESS_RIGHT_MASK));
+	ret = set_memory_region(vmid, region);
+	kfree(region);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(acrn_add_memory_region);
+
+int acrn_del_memory_region(unsigned short vmid, unsigned long gpa,
+			   unsigned long size)
+{
+	struct vm_memory_region *region;
+	int ret = 0;
+
+	region = kzalloc(sizeof(*region), GFP_KERNEL);
+	if (!region)
+		return -ENOMEM;
+
+	region->type = MR_DEL;
+	region->gpa = gpa;
+	region->vm0_gpa = 0;
+	region->size = size;
+	region->prot = 0;
+
+	ret = set_memory_region(vmid, region);
+	kfree(region);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(acrn_del_memory_region);
+
+int set_memory_regions(struct set_regions *regions)
+{
+	if (!regions)
+		return -EINVAL;
+
+	if (regions->mr_num > 0) {
+		if (hcall_set_memory_regions(virt_to_phys(regions)) < 0) {
+			pr_err("acrn: failed to set memory regions!\n");
+			return -EFAULT;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * when set is true, set page write protection,
+ * else clear page write protection.
+ */
+int acrn_write_protect_page(unsigned short vmid,
+			    unsigned long gpa, unsigned char set)
+{
+	struct wp_data *wp;
+	int ret = 0;
+
+	wp = kzalloc(sizeof(*wp), GFP_KERNEL);
+	if (!wp)
+		return -ENOMEM;
+
+	wp->set = set;
+	wp->gpa = gpa;
+	ret = hcall_write_protect_page(vmid, virt_to_phys(wp));
+	kfree(wp);
+
+	if (ret < 0) {
+		pr_err("acrn: vm[%d] %s failed !\n", vmid, __func__);
+		return -EFAULT;
+	}
+
+	pr_debug("VHM: %s, gpa: 0x%lx, set: %d\n", __func__, gpa, set);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(acrn_write_protect_page);
+
+int map_guest_memseg(struct acrn_vm *vm, struct vm_memmap *memmap)
+{
+	/* hugetlb use vma to do the mapping */
+	if (memmap->type == VM_MEMMAP_SYSMEM && memmap->using_vma)
+		return hugepage_map_guest(vm, memmap);
+
+	/* mmio */
+	if (memmap->type != VM_MEMMAP_MMIO) {
+		pr_err("acrn: %s invalid memmap type: %d\n",
+		       __func__, memmap->type);
+		return -EINVAL;
+	}
+
+	if (acrn_add_memory_region(vm->vmid, memmap->gpa,
+				   memmap->hpa, memmap->len,
+				   MEM_TYPE_UC, memmap->prot) < 0) {
+		pr_err("acrn: failed to set memory region %d!\n", vm->vmid);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+int unmap_guest_memseg(struct acrn_vm *vm, struct vm_memmap *memmap)
+{
+	/* only handle mmio */
+	if (memmap->type != VM_MEMMAP_MMIO) {
+		pr_err("hsm: %s invalid memmap type: %d for unmap\n",
+		       __func__, memmap->type);
+		return -EINVAL;
+	}
+
+	if (acrn_del_memory_region(vm->vmid, memmap->gpa, memmap->len) < 0) {
+		pr_err("hsm: failed to del memory region %d!\n", vm->vmid);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+void free_guest_mem(struct acrn_vm *vm)
+{
+	hugepage_free_guest(vm);
+}
+
+void *acrn_map_guest_phys(unsigned short vmid, u64 guest_phys, size_t size)
+{
+	struct acrn_vm *vm;
+	void *ret;
+
+	vm = find_get_vm(vmid);
+	if (!vm)
+		return NULL;
+
+	ret = hugepage_map_guest_phys(vm, guest_phys, size);
+
+	put_vm(vm);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(acrn_map_guest_phys);
+
+int acrn_unmap_guest_phys(unsigned short vmid, u64 guest_phys)
+{
+	struct acrn_vm *vm;
+	int ret;
+
+	vm = find_get_vm(vmid);
+	if (!vm) {
+		pr_warn("vm_list corrupted\n");
+		return -ESRCH;
+	}
+
+	ret = hugepage_unmap_guest_phys(vm, guest_phys);
+
+	put_vm(vm);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(acrn_unmap_guest_phys);
diff --git a/drivers/staging/acrn/acrn_mm_hugetlb.c b/drivers/staging/acrn/acrn_mm_hugetlb.c
new file mode 100644
index 0000000..69c5e02
--- /dev/null
+++ b/drivers/staging/acrn/acrn_mm_hugetlb.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
+/*
+ * ACRN: VM memory map based on hugetlb.
+ *
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ *
+ * Jason Chen CJ <jason.cj.chen@intel.com>
+ * Zhao Yakui <yakui.zhao@intel.com>
+ * Li Fei <lei1.li@intel.com>
+ * Liu Shuo <shuo.a.liu@intel.com>
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+
+#include <linux/acrn/acrn_ioctl_defs.h>
+#include <linux/acrn/acrn_drv.h>
+#include "acrn_drv_internal.h"
+#include "acrn_hypercall.h"
+
+#define HUGEPAGE_2M_SHIFT	21
+#define HUGEPAGE_1G_SHIFT	30
+
+#define HUGEPAGE_1G_HLIST_IDX	(HUGEPAGE_HLIST_ARRAY_SIZE - 1)
+
+struct hugepage_map {
+	struct hlist_node hlist;
+	u64 vm0_gpa;
+	size_t size;
+	u64 guest_gpa;
+	atomic_t refcount;
+};
+
+static inline
+struct hlist_head *hlist_2m_hash(struct acrn_vm *vm,
+				 unsigned long guest_gpa)
+{
+	return &vm->hugepage_hlist[guest_gpa >> HUGEPAGE_2M_SHIFT &
+			(HUGEPAGE_2M_HLIST_ARRAY_SIZE - 1)];
+}
+
+static int add_guest_map(struct acrn_vm *vm, unsigned long vm0_gpa,
+			 unsigned long guest_gpa, unsigned long size)
+{
+	struct hugepage_map *map;
+	int max_gfn;
+
+	map = kzalloc(sizeof(*map), GFP_KERNEL);
+	if (!map)
+		return -ENOMEM;
+
+	map->vm0_gpa = vm0_gpa;
+	map->guest_gpa = guest_gpa;
+	map->size = size;
+	atomic_set(&map->refcount, 1);
+
+	INIT_HLIST_NODE(&map->hlist);
+
+	max_gfn = (map->guest_gpa + map->size) >> PAGE_SHIFT;
+	if (vm->max_gfn < max_gfn)
+		vm->max_gfn = max_gfn;
+
+	pr_debug("HSM: add hugepage with size=0x%lx,vm0_hpa=0x%llx and its guest gpa = 0x%llx\n",
+		 map->size, map->vm0_gpa, map->guest_gpa);
+
+	mutex_lock(&vm->hugepage_lock);
+	/* 1G hugepage? */
+	if (map->size == (1UL << HUGEPAGE_1G_SHIFT))
+		hlist_add_head(&map->hlist,
+			       &vm->hugepage_hlist[HUGEPAGE_1G_HLIST_IDX]);
+	else
+		hlist_add_head(&map->hlist,
+			       hlist_2m_hash(vm, map->guest_gpa));
+	mutex_unlock(&vm->hugepage_lock);
+
+	return 0;
+}
+
+int hugepage_map_guest(struct acrn_vm *vm, struct vm_memmap *memmap)
+{
+	struct page *page = NULL, *regions_buf_pg = NULL;
+	unsigned long len, guest_gpa, vma;
+	struct vm_memory_region *region_array;
+	struct set_regions *regions;
+	int max_size = PAGE_SIZE / sizeof(struct vm_memory_region);
+	int ret;
+
+	if (!vm || !memmap)
+		return -EINVAL;
+
+	len = memmap->len;
+	vma = memmap->vma_base;
+	guest_gpa = memmap->gpa;
+
+	/* prepare set_memory_regions info */
+	regions_buf_pg = alloc_page(GFP_KERNEL);
+	if (!regions_buf_pg)
+		return -ENOMEM;
+
+	regions = kzalloc(sizeof(*regions), GFP_KERNEL);
+	if (!regions) {
+		__free_page(regions_buf_pg);
+		return -ENOMEM;
+	}
+	regions->mr_num = 0;
+	regions->vmid = vm->vmid;
+	regions->regions_gpa = page_to_phys(regions_buf_pg);
+	region_array = page_to_virt(regions_buf_pg);
+
+	while (len > 0) {
+		unsigned long vm0_gpa, pagesize;
+
+		ret = get_user_pages_fast(vma, 1, 1, &page);
+		if (unlikely(ret != 1) || (!page)) {
+			pr_err("failed to pin huge page!\n");
+			ret = -ENOMEM;
+			goto err;
+		}
+
+		vm0_gpa = page_to_phys(page);
+		pagesize = PAGE_SIZE << compound_order(page);
+
+		ret = add_guest_map(vm, vm0_gpa, guest_gpa, pagesize);
+		if (ret < 0) {
+			pr_err("failed to add memseg for huge page!\n");
+			goto err;
+		}
+
+		/* fill each memory region into region_array */
+		region_array[regions->mr_num].type = MR_ADD;
+		region_array[regions->mr_num].gpa = guest_gpa;
+		region_array[regions->mr_num].vm0_gpa = vm0_gpa;
+		region_array[regions->mr_num].size = pagesize;
+		region_array[regions->mr_num].prot =
+				(MEM_TYPE_WB & MEM_TYPE_MASK) |
+				(memmap->prot & MEM_ACCESS_RIGHT_MASK);
+		regions->mr_num++;
+		if (regions->mr_num == max_size) {
+			pr_debug("region buffer full, set & renew regions!\n");
+			ret = set_memory_regions(regions);
+			if (ret < 0) {
+				pr_err("failed to set regions,ret=%d!\n", ret);
+				goto err;
+			}
+			regions->mr_num = 0;
+		}
+
+		len -= pagesize;
+		vma += pagesize;
+		guest_gpa += pagesize;
+	}
+
+	ret = set_memory_regions(regions);
+	if (ret < 0) {
+		pr_err("failed to set regions, ret=%d!\n", ret);
+		goto err;
+	}
+
+	__free_page(regions_buf_pg);
+	kfree(regions);
+
+	return 0;
+err:
+	if (regions_buf_pg)
+		__free_page(regions_buf_pg);
+	if (page)
+		put_page(page);
+	kfree(regions);
+	return ret;
+}
+
+void hugepage_free_guest(struct acrn_vm *vm)
+{
+	struct hlist_node *htmp;
+	struct hugepage_map *map;
+	int i;
+
+	mutex_lock(&vm->hugepage_lock);
+	for (i = 0; i < HUGEPAGE_HLIST_ARRAY_SIZE; i++) {
+		if (!hlist_empty(&vm->hugepage_hlist[i])) {
+			hlist_for_each_entry_safe(map, htmp,
+						  &vm->hugepage_hlist[i],
+						  hlist) {
+				hlist_del(&map->hlist);
+				/* put_page to unpin huge page */
+				put_page(pfn_to_page(PHYS_PFN(map->vm0_gpa)));
+				if (!atomic_dec_and_test(&map->refcount)) {
+					pr_warn("failed to unmap for gpa %llx in vm %d\n",
+						map->guest_gpa, vm->vmid);
+				}
+				kfree(map);
+			}
+		}
+	}
+	mutex_unlock(&vm->hugepage_lock);
+}
+
+void *hugepage_map_guest_phys(struct acrn_vm *vm, u64 guest_phys, size_t size)
+{
+	struct hlist_node *htmp;
+	struct hugepage_map *map;
+	struct hlist_head *hpage_head;
+
+	mutex_lock(&vm->hugepage_lock);
+	/* check 1G hlist first */
+	if (!hlist_empty(&vm->hugepage_hlist[HUGEPAGE_1G_HLIST_IDX])) {
+		hpage_head = &vm->hugepage_hlist[HUGEPAGE_1G_HLIST_IDX];
+		hlist_for_each_entry_safe(map, htmp, hpage_head, hlist) {
+			if (guest_phys < map->guest_gpa ||
+			    guest_phys >= (map->guest_gpa + map->size))
+				continue;
+
+			if (guest_phys + size > map->guest_gpa + map->size)
+				goto err;
+
+			atomic_inc(&map->refcount);
+			mutex_unlock(&vm->hugepage_lock);
+			return phys_to_virt(map->vm0_gpa +
+					    guest_phys - map->guest_gpa);
+		}
+	}
+
+	/* check 2m hlist */
+	hlist_for_each_entry_safe(map, htmp,
+				  hlist_2m_hash(vm, guest_phys), hlist) {
+		if (guest_phys < map->guest_gpa ||
+		    guest_phys >= (map->guest_gpa + map->size))
+			continue;
+
+		if (guest_phys + size > map->guest_gpa + map->size)
+			goto err;
+
+		atomic_inc(&map->refcount);
+		mutex_unlock(&vm->hugepage_lock);
+		return phys_to_virt(map->vm0_gpa +
+				    guest_phys - map->guest_gpa);
+	}
+
+err:
+	mutex_unlock(&vm->hugepage_lock);
+	pr_warn("incorrect mem map, input %llx,size %lx\n",
+		guest_phys, size);
+	return NULL;
+}
+
+int hugepage_unmap_guest_phys(struct acrn_vm *vm, u64 guest_phys)
+{
+	struct hlist_node *htmp;
+	struct hugepage_map *map;
+	struct hlist_head *hpage_head;
+
+	mutex_lock(&vm->hugepage_lock);
+	/* check 1G hlist first */
+	if (!hlist_empty(&vm->hugepage_hlist[HUGEPAGE_1G_HLIST_IDX])) {
+		hpage_head = &vm->hugepage_hlist[HUGEPAGE_1G_HLIST_IDX];
+		hlist_for_each_entry_safe(map, htmp, hpage_head, hlist) {
+			if (guest_phys >= map->guest_gpa &&
+			    guest_phys < (map->guest_gpa + map->size)) {
+				atomic_dec(&map->refcount);
+				mutex_unlock(&vm->hugepage_lock);
+				return 0;
+			}
+		}
+	}
+	/* check 2m hlist */
+	hlist_for_each_entry_safe(map, htmp,
+				  hlist_2m_hash(vm, guest_phys), hlist) {
+		if (guest_phys >= map->guest_gpa &&
+		    guest_phys < (map->guest_gpa + map->size)) {
+			atomic_dec(&map->refcount);
+			mutex_unlock(&vm->hugepage_lock);
+			return 0;
+		}
+	}
+	mutex_unlock(&vm->hugepage_lock);
+	return -ESRCH;
+}
diff --git a/drivers/staging/acrn/acrn_vm_mngt.c b/drivers/staging/acrn/acrn_vm_mngt.c
index 04c551d..9c6dd6d 100644
--- a/drivers/staging/acrn/acrn_vm_mngt.c
+++ b/drivers/staging/acrn/acrn_vm_mngt.c
@@ -12,6 +12,7 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/rwlock_types.h>
+#include <linux/acrn/acrn_ioctl_defs.h>
 
 #include "acrn_hypercall.h"
 #include "acrn_drv_internal.h"
@@ -43,6 +44,7 @@ void get_vm(struct acrn_vm *vm)
 void put_vm(struct acrn_vm *vm)
 {
 	if (refcount_dec_and_test(&vm->refcnt)) {
+		free_guest_mem(vm);
 		kfree(vm);
 		pr_debug("hsm: freed vm\n");
 	}
diff --git a/include/linux/acrn/acrn_drv.h b/include/linux/acrn/acrn_drv.h
new file mode 100644
index 0000000..62b03f0
--- /dev/null
+++ b/include/linux/acrn/acrn_drv.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
+/**
+ * @file acrn_drv.h
+ *
+ * ACRN HSM exported API for other modules.
+ */
+
+#ifndef _ACRN_DRV_H
+#define _ACRN_DRV_H
+
+#include <linux/acrn/acrn_common_def.h>
+
+/**
+ * acrn_map_guest_phys - map guest physical address to SOS kernel
+ *			 virtual address
+ *
+ * @vmid: guest vmid
+ * @uos_phys: physical address in guest
+ * @size: the memory size mapped
+ *
+ * Return: SOS kernel virtual address, NULL on error
+ */
+extern void *acrn_map_guest_phys(unsigned short vmid, u64 uos_phys,
+				 size_t size);
+
+/**
+ * acrn_unmap_guest_phys - unmap guest physical address
+ *
+ * @vmid: guest vmid
+ * @uos_phys: physical address in guest
+ *
+ * Return: 0 on success, <0 for error.
+ */
+extern int acrn_unmap_guest_phys(unsigned short vmid, u64 uos_phys);
+
+/**
+ * acrn_add_memory_region - add a guest memory region
+ *
+ * @vmid: guest vmid
+ * @gpa: gpa of UOS
+ * @host_gpa: gpa of SOS
+ * @size: memory region size
+ * @mem_type: memory mapping type. Possible value could be:
+ *                    MEM_TYPE_WB
+ *                    MEM_TYPE_WT
+ *                    MEM_TYPE_UC
+ *                    MEM_TYPE_WC
+ *                    MEM_TYPE_WP
+ * @mem_access_right: memory mapping access. Possible value could be:
+ *                    MEM_ACCESS_READ
+ *                    MEM_ACCESS_WRITE
+ *                    MEM_ACCESS_EXEC
+ *                    MEM_ACCESS_RWX
+ *
+ * Return: 0 on success, <0 for error.
+ */
+extern int acrn_add_memory_region(unsigned short vmid, unsigned long gpa,
+				  unsigned long host_gpa, unsigned long size,
+				  unsigned int mem_type,
+				  unsigned int mem_access_right);
+
+/**
+ * acrn_del_memory_region - delete a guest memory region
+ *
+ * @vmid: guest vmid
+ * @gpa: gpa of UOS
+ * @size: memory region size
+ *
+ * Return: 0 on success, <0 for error.
+ */
+extern int acrn_del_memory_region(unsigned short vmid, unsigned long gpa,
+			   unsigned long size);
+
+/**
+ * acrn_write_protect_page - change the write protection of one page
+ *
+ * @vmid: guest vmid
+ * @gpa: gpa in guest vmid
+ * @set: set or clear page write protection
+ *
+ * Return: 0 on success, <0 for error.
+ */
+extern int acrn_write_protect_page(unsigned short vmid, unsigned long gpa,
+				   unsigned char set);
+
+#endif
diff --git a/include/uapi/linux/acrn/acrn_common_def.h b/include/uapi/linux/acrn/acrn_common_def.h
new file mode 100644
index 0000000..a0f90a3
--- /dev/null
+++ b/include/uapi/linux/acrn/acrn_common_def.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
+/**
+ * @file acrn_common_def.h
+ *
+ * Common structure/definitions for acrn_ioctl/acrn_drv
+ */
+
+#ifndef _ACRN_COMMON_DEF_H
+#define _ACRN_COMMON_DEF_H
+
+/* Generic memory attributes */
+#define	MEM_ACCESS_READ                 0x00000001
+#define	MEM_ACCESS_WRITE                0x00000002
+#define	MEM_ACCESS_EXEC	                0x00000004
+#define	MEM_ACCESS_RWX			(MEM_ACCESS_READ | MEM_ACCESS_WRITE | \
+						MEM_ACCESS_EXEC)
+#define MEM_ACCESS_RIGHT_MASK           0x00000007
+#define	MEM_TYPE_WB                     0x00000040
+#define	MEM_TYPE_WT                     0x00000080
+#define	MEM_TYPE_UC                     0x00000100
+#define	MEM_TYPE_WC                     0x00000200
+#define	MEM_TYPE_WP                     0x00000400
+#define MEM_TYPE_MASK                   0x000007C0
+
+#endif /* _ACRN_COMMON_DEF_H */
diff --git a/include/uapi/linux/acrn/acrn_ioctl_defs.h b/include/uapi/linux/acrn/acrn_ioctl_defs.h
index ebcf812..76e358f 100644
--- a/include/uapi/linux/acrn/acrn_ioctl_defs.h
+++ b/include/uapi/linux/acrn/acrn_ioctl_defs.h
@@ -8,6 +8,8 @@
 #ifndef __ACRN_IOCTL_DEFS_H__
 #define __ACRN_IOCTL_DEFS_H__
 
+#include "acrn_common_def.h"
+
 /**
  * struct api_version - data structure to track ACRN_SRV API version
  *
@@ -135,6 +137,39 @@ struct acrn_set_vcpu_regs {
 	struct acrn_vcpu_regs vcpu_regs;
 };
 
+#define VM_MEMMAP_SYSMEM       0
+#define VM_MEMMAP_MMIO         1
+
+/**
+ * struct vm_memmap - EPT memory mapping info for guest
+ */
+struct vm_memmap {
+	/** @type: memory mapping type */
+	uint32_t type;
+	/** @using_vma: using vma_base to get vm0_gpa,
+	 * only for type == VM_MEMMAP_SYSMEM
+	 */
+	uint32_t using_vma;
+	/** @gpa: user OS guest physical start address of memory mapping */
+	uint64_t gpa;
+	/** union */
+	union {
+		/** @hpa: host physical start address of memory,
+		 * only for type == VM_MEMMAP_MMIO
+		 */
+		uint64_t hpa;
+		/** @vma_base: service OS user virtual start address of
+		 * memory, only for type == VM_MEMMAP_SYSMEM &&
+		 * using_vma == true
+		 */
+		uint64_t vma_base;
+	};
+	/** @len: the length of memory range mapped */
+	uint64_t len;	/* mmap length */
+	/** @prot: memory mapping attribute */
+	uint32_t prot;	/* RWX */
+};
+
 /*
  * Common IOCTL ID definition for DM
  */
@@ -155,4 +190,10 @@ struct acrn_set_vcpu_regs {
 #define IC_RESET_VM                    _IC_ID(IC_ID, IC_ID_VM_BASE + 0x05)
 #define IC_SET_VCPU_REGS               _IC_ID(IC_ID, IC_ID_VM_BASE + 0x06)
 
+
+/* Guest memory management */
+#define IC_ID_MEM_BASE                  0x40UL
+#define IC_SET_MEMSEG                   _IC_ID(IC_ID, IC_ID_MEM_BASE + 0x01)
+#define IC_UNSET_MEMSEG                 _IC_ID(IC_ID, IC_ID_MEM_BASE + 0x02)
+
 #endif /* __ACRN_IOCTL_DEFS_H__ */
-- 
2.7.4



* [RFC PATCH 09/15] drivers/acrn: add passthrough device support
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (7 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN " Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16 13:05   ` Dan Carpenter
  2019-08-16  2:25 ` [RFC PATCH 10/15] drivers/acrn: add interrupt injection support Zhao Yakui
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Gao, Shiqing, Jason Chen CJ

A pass-through device is assigned to one guest OS and accessed exclusively
by that guest. This avoids device emulation and gives better performance,
which is critical for performance-sensitive scenarios.
The following ioctls are added to support pass-through devices:
- assign a pass-through device
   IC_ASSIGN_PTDEV: Assign one PCI device to one guest OS.
- deassign a pass-through device
   IC_DEASSIGN_PTDEV: Take the assigned PCI device back from
the guest OS so that it can be assigned to another guest OS.
- set/reset the interrupt info of a pass-through device
   IC_SET_PTDEV_INTR_INFO
   IC_RESET_PTDEV_INTR_INFO: These are used to configure
the interrupt info for the assigned pass-through device so that the
ACRN hypervisor can inject the interrupt into the guest system after
the device interrupt is triggered.
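
For illustration, the assign/deassign ioctls take the device as a 16-bit
BDF value. A minimal sketch of how user space might pack it, assuming the
conventional PCI layout (bus in bits 15:8, device in bits 7:3, function in
bits 2:0); the helper name pci_bdf() is hypothetical and not part of this
patch:

```c
#include <stdint.h>

/*
 * Pack bus/device/function into the conventional 16-bit PCI BDF.
 * Hypothetical helper for illustration only; the patch itself just
 * copies an unsigned short from user space and forwards it.
 */
static inline uint16_t pci_bdf(uint8_t bus, uint8_t dev, uint8_t func)
{
	return (uint16_t)(((uint16_t)bus << 8) |
			  ((uint16_t)(dev & 0x1f) << 3) |
			  (uint16_t)(func & 0x7));
}
```

A device model would then pass the address of such a value as the ioctl
argument of IC_ASSIGN_PTDEV or IC_DEASSIGN_PTDEV.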

Co-developed-by: Gao, Shiqing <shiqing.gao@intel.com>
Signed-off-by: Gao, Shiqing <shiqing.gao@intel.com>
Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/acrn_dev.c           | 77 +++++++++++++++++++++++++++++++
 drivers/staging/acrn/acrn_drv_internal.h  | 25 ++++++++++
 include/uapi/linux/acrn/acrn_ioctl_defs.h | 47 +++++++++++++++++++
 3 files changed, 149 insertions(+)

diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index cb62819..28bbd78 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -239,6 +239,83 @@ long acrn_dev_ioctl(struct file *filep,
 		break;
 	}
 
+	case IC_ASSIGN_PTDEV: {
+		unsigned short bdf;
+
+		if (copy_from_user(&bdf, (void *)ioctl_param,
+				   sizeof(unsigned short)))
+			return -EFAULT;
+
+		ret = hcall_assign_ptdev(vm->vmid, bdf);
+		if (ret < 0) {
+			pr_err("acrn: failed to assign ptdev!\n");
+			return -EFAULT;
+		}
+		break;
+	}
+	case IC_DEASSIGN_PTDEV: {
+		unsigned short bdf;
+
+		if (copy_from_user(&bdf, (void *)ioctl_param,
+				   sizeof(unsigned short)))
+			return -EFAULT;
+
+		ret = hcall_deassign_ptdev(vm->vmid, bdf);
+		if (ret < 0) {
+			pr_err("acrn: failed to deassign ptdev!\n");
+			return -EFAULT;
+		}
+		break;
+	}
+
+	case IC_SET_PTDEV_INTR_INFO: {
+		struct ic_ptdev_irq ic_pt_irq;
+		struct hc_ptdev_irq *hc_pt_irq;
+
+		if (copy_from_user(&ic_pt_irq, (void *)ioctl_param,
+				   sizeof(ic_pt_irq)))
+			return -EFAULT;
+
+		hc_pt_irq = kmalloc(sizeof(*hc_pt_irq), GFP_KERNEL);
+		if (!hc_pt_irq)
+			return -ENOMEM;
+
+		memcpy(hc_pt_irq, &ic_pt_irq, sizeof(*hc_pt_irq));
+
+		ret = hcall_set_ptdev_intr_info(vm->vmid,
+						virt_to_phys(hc_pt_irq));
+		kfree(hc_pt_irq);
+		if (ret < 0) {
+			pr_err("acrn: failed to set intr info for ptdev!\n");
+			return -EFAULT;
+		}
+
+		break;
+	}
+	case IC_RESET_PTDEV_INTR_INFO: {
+		struct ic_ptdev_irq ic_pt_irq;
+		struct hc_ptdev_irq *hc_pt_irq;
+
+		if (copy_from_user(&ic_pt_irq, (void *)ioctl_param,
+				   sizeof(ic_pt_irq)))
+			return -EFAULT;
+
+		hc_pt_irq = kmalloc(sizeof(*hc_pt_irq), GFP_KERNEL);
+		if (!hc_pt_irq)
+			return -ENOMEM;
+
+		memcpy(hc_pt_irq, &ic_pt_irq, sizeof(*hc_pt_irq));
+
+		ret = hcall_reset_ptdev_intr_info(vm->vmid,
+						  virt_to_phys(hc_pt_irq));
+		kfree(hc_pt_irq);
+		if (ret < 0) {
+			pr_err("acrn: failed to reset intr info for ptdev!\n");
+			return -EFAULT;
+		}
+		break;
+	}
+
 	default:
 		pr_warn("Unknown IOCTL 0x%x\n", ioctl_num);
 		ret = -EFAULT;
diff --git a/drivers/staging/acrn/acrn_drv_internal.h b/drivers/staging/acrn/acrn_drv_internal.h
index 5098765..3e633cd 100644
--- a/drivers/staging/acrn/acrn_drv_internal.h
+++ b/drivers/staging/acrn/acrn_drv_internal.h
@@ -115,4 +115,29 @@ void hugepage_free_guest(struct acrn_vm *vm);
 void *hugepage_map_guest_phys(struct acrn_vm *vm, u64 guest_phys, size_t size);
 int hugepage_unmap_guest_phys(struct acrn_vm *vm, u64 guest_phys);
 int set_memory_regions(struct set_regions *regions);
+
+struct hc_ptdev_irq {
+#define IRQ_INTX 0
+#define IRQ_MSI 1
+#define IRQ_MSIX 2
+	u32 type;
+	u16 virt_bdf;	/* IN: Device virtual BDF# */
+	u16 phys_bdf;	/* IN: Device physical BDF# */
+	union {
+		struct {
+			u8 virt_pin;	/* IN: virtual IOAPIC pin */
+			u8 reserved0[3];	/* Reserved */
+			u8 phys_pin;	/* IN: physical IOAPIC pin */
+			u8 reserved1[3];	/* Reserved */
+			bool pic_pin;		/* IN: pin from PIC? */
+			u8 reserved2[3];	/* Reserved */
+		} intx;
+		struct {
+			/* IN: vector count of MSI/MSIX */
+			u32 vector_cnt;
+			u32 reserved0[3];	/* Reserved */
+		} msix;
+	};
+};
+
 #endif
diff --git a/include/uapi/linux/acrn/acrn_ioctl_defs.h b/include/uapi/linux/acrn/acrn_ioctl_defs.h
index 76e358f..ee259c2 100644
--- a/include/uapi/linux/acrn/acrn_ioctl_defs.h
+++ b/include/uapi/linux/acrn/acrn_ioctl_defs.h
@@ -137,6 +137,46 @@ struct acrn_set_vcpu_regs {
 	struct acrn_vcpu_regs vcpu_regs;
 };
 
+/**
+ * struct ic_ptdev_irq - pass thru device irq data structure
+ */
+struct ic_ptdev_irq {
+#define IRQ_INTX 0
+#define IRQ_MSI 1
+#define IRQ_MSIX 2
+	/** @type: irq type */
+	uint32_t type;
+	/** @virt_bdf: virtual bdf description of pass thru device */
+	uint16_t virt_bdf;	/* IN: Device virtual BDF# */
+	/** @phys_bdf: physical bdf description of pass thru device */
+	uint16_t phys_bdf;	/* IN: Device physical BDF# */
+	/** union */
+	union {
+		/** struct intx - info of IOAPIC/PIC interrupt */
+		struct {
+			/** @virt_pin: virtual IOAPIC pin */
+			uint32_t virt_pin;
+			/** @phys_pin: physical IOAPIC pin */
+			uint32_t phys_pin;
+			/** @is_pic_pin: PIC pin */
+			uint32_t is_pic_pin;
+		} intx;
+
+		/** struct msix - info of MSI/MSIX interrupt */
+		struct {
+			/* Keep this field on top of msix */
+			/** @vector_cnt: vector count of MSI/MSIX */
+			uint32_t vector_cnt;
+
+			/** @table_size: size of MSIX table(round up to 4K) */
+			uint32_t table_size;
+
+			/** @table_paddr: physical address of MSIX table */
+			uint64_t table_paddr;
+		} msix;
+	};
+};
+
 #define VM_MEMMAP_SYSMEM       0
 #define VM_MEMMAP_MMIO         1
 
@@ -196,4 +236,11 @@ struct vm_memmap {
 #define IC_SET_MEMSEG                   _IC_ID(IC_ID, IC_ID_MEM_BASE + 0x01)
 #define IC_UNSET_MEMSEG                 _IC_ID(IC_ID, IC_ID_MEM_BASE + 0x02)
 
+/* PCI assignment */
+#define IC_ID_PCI_BASE                  0x50UL
+#define IC_ASSIGN_PTDEV                _IC_ID(IC_ID, IC_ID_PCI_BASE + 0x00)
+#define IC_DEASSIGN_PTDEV              _IC_ID(IC_ID, IC_ID_PCI_BASE + 0x01)
+#define IC_SET_PTDEV_INTR_INFO         _IC_ID(IC_ID, IC_ID_PCI_BASE + 0x03)
+#define IC_RESET_PTDEV_INTR_INFO       _IC_ID(IC_ID, IC_ID_PCI_BASE + 0x04)
+
 #endif /* __ACRN_IOCTL_DEFS_H__ */
-- 
2.7.4



* [RFC PATCH 10/15] drivers/acrn: add interrupt injection support
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (8 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 09/15] drivers/acrn: add passthrough device support Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16 13:12   ` Dan Carpenter
  2019-08-16  2:25 ` [RFC PATCH 11/15] drivers/acrn: add the support of handling emulated ioreq Zhao Yakui
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ, Mingqiang Chi

After the ACRN device model finishes emulating a trapped MMIO/IO/PCICFG
access, it needs to inject an interrupt to signal that the guest can be
resumed.
IC_SET_IRQLINE: inject a virtual IOAPIC GSI interrupt
IC_INJECT_MSI: inject a virtual MSI interrupt into the guest OS
IC_VM_INTR_MONITOR: monitor the interrupt info of one guest OS
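
As a rough sketch of the MSI layout used by IC_INJECT_MSI (destination
vCPU ID in addr[19:12], vector in data[7:0]), the entry could be filled
as below. The struct and helper names are hypothetical, and the
0xFEE00000 base is the usual x86 MSI address window, assumed here rather
than taken from this patch:

```c
#include <stdint.h>

/* Mirrors the two 64-bit fields of struct acrn_msi_entry (sketch only). */
struct msi_entry_sketch {
	uint64_t msi_addr;	/* addr[19:12] carries the dest vCPU ID */
	uint64_t msi_data;	/* data[7:0] carries the vector */
};

/* Assumption: conventional x86 MSI address base. */
#define MSI_ADDR_BASE_SKETCH	0xFEE00000ULL

static inline struct msi_entry_sketch make_msi(uint8_t dest_id, uint8_t vector)
{
	struct msi_entry_sketch e;

	e.msi_addr = MSI_ADDR_BASE_SKETCH | ((uint64_t)dest_id << 12);
	e.msi_data = vector;
	return e;
}
```

An in-kernel caller would pass the same two values to acrn_inject_msi(),
while user space would hand the filled structure to IC_INJECT_MSI.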

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Co-developed-by: Mingqiang Chi <mingqiang.chi@intel.com>
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/acrn_dev.c           | 48 +++++++++++++++++++++++++++++++
 drivers/staging/acrn/acrn_vm_mngt.c       | 28 ++++++++++++++++++
 include/linux/acrn/acrn_drv.h             | 12 ++++++++
 include/uapi/linux/acrn/acrn_ioctl_defs.h | 18 ++++++++++++
 4 files changed, 106 insertions(+)

diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 28bbd78..1476817 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -19,6 +19,7 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/uaccess.h>
 #include <linux/slab.h>
@@ -316,6 +317,53 @@ long acrn_dev_ioctl(struct file *filep,
 		break;
 	}
 
+	case IC_SET_IRQLINE: {
+		ret = hcall_set_irqline(vm->vmid, ioctl_param);
+		if (ret < 0) {
+			pr_err("acrn: failed to set irqline!\n");
+			return -EFAULT;
+		}
+		break;
+	}
+
+	case IC_INJECT_MSI: {
+		struct acrn_msi_entry *msi;
+
+		msi = kmalloc(sizeof(*msi), GFP_KERNEL);
+		if (!msi)
+			return -ENOMEM;
+
+		if (copy_from_user(msi, (void *)ioctl_param, sizeof(*msi))) {
+			kfree(msi);
+			return -EFAULT;
+		}
+
+		ret = hcall_inject_msi(vm->vmid, virt_to_phys(msi));
+		kfree(msi);
+		if (ret < 0) {
+			pr_err("acrn: failed to inject!\n");
+			return -EFAULT;
+		}
+		break;
+	}
+
+	case IC_VM_INTR_MONITOR: {
+		struct page *page;
+
+		ret = get_user_pages_fast(ioctl_param, 1, 1, &page);
+		if (unlikely(ret != 1) || !page) {
+			pr_err("acrn-dev: failed to pin intr hdr buffer!\n");
+			return -ENOMEM;
+		}
+
+		ret = hcall_vm_intr_monitor(vm->vmid, page_to_phys(page));
+		if (ret < 0) {
+			pr_err("acrn-dev: monitor intr data err=%ld\n", ret);
+			return -EFAULT;
+		}
+		break;
+	}
+
 	default:
 		pr_warn("Unknown IOCTL 0x%x\n", ioctl_num);
 		ret = -EFAULT;
diff --git a/drivers/staging/acrn/acrn_vm_mngt.c b/drivers/staging/acrn/acrn_vm_mngt.c
index 9c6dd6d..4287595 100644
--- a/drivers/staging/acrn/acrn_vm_mngt.c
+++ b/drivers/staging/acrn/acrn_vm_mngt.c
@@ -11,8 +11,10 @@
 #include <linux/list.h>
 #include <linux/slab.h>
 #include <linux/init.h>
+#include <linux/io.h>
 #include <linux/rwlock_types.h>
 #include <linux/acrn/acrn_ioctl_defs.h>
+#include <linux/acrn/acrn_drv.h>
 
 #include "acrn_hypercall.h"
 #include "acrn_drv_internal.h"
@@ -72,3 +74,29 @@ int acrn_vm_destroy(struct acrn_vm *vm)
 	vm->vmid = ACRN_INVALID_VMID;
 	return 0;
 }
+
+int acrn_inject_msi(unsigned short vmid, unsigned long msi_addr,
+		    unsigned long msi_data)
+{
+	struct acrn_msi_entry *msi;
+	int ret;
+
+	msi = kzalloc(sizeof(*msi), GFP_KERNEL);
+
+	if (!msi)
+		return -ENOMEM;
+
+	/* msi_addr: addr[19:12] with dest vcpu id */
+	/* msi_data: data[7:0] with vector */
+	msi->msi_addr = msi_addr;
+	msi->msi_data = msi_data;
+	ret = hcall_inject_msi(vmid, virt_to_phys(msi));
+	kfree(msi);
+	if (ret < 0) {
+		pr_err("acrn: failed to inject MSI for vmid %d, msi_addr %lx msi_data %lx!\n",
+		       vmid, msi_addr, msi_data);
+		return -EFAULT;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(acrn_inject_msi);
diff --git a/include/linux/acrn/acrn_drv.h b/include/linux/acrn/acrn_drv.h
index 62b03f0..bcdfcaf 100644
--- a/include/linux/acrn/acrn_drv.h
+++ b/include/linux/acrn/acrn_drv.h
@@ -83,4 +83,16 @@ extern int acrn_del_memory_region(unsigned short vmid, unsigned long gpa,
 extern int acrn_write_protect_page(unsigned short vmid, unsigned long gpa,
 				   unsigned char set);
 
+/**
+ * acrn_inject_msi() - inject MSI interrupt to guest
+ *
+ * @vmid: guest vmid
+ * @msi_addr: MSI addr matches MSI spec
+ * @msi_data: MSI data matches MSI spec
+ *
+ * Return: 0 on success, <0 on error
+ */
+extern int acrn_inject_msi(unsigned short vmid, unsigned long msi_addr,
+			   unsigned long msi_data);
+
 #endif
diff --git a/include/uapi/linux/acrn/acrn_ioctl_defs.h b/include/uapi/linux/acrn/acrn_ioctl_defs.h
index ee259c2..371904c 100644
--- a/include/uapi/linux/acrn/acrn_ioctl_defs.h
+++ b/include/uapi/linux/acrn/acrn_ioctl_defs.h
@@ -210,6 +210,19 @@ struct vm_memmap {
 	uint32_t prot;	/* RWX */
 };
 
+/**
+ * @brief Info to inject a MSI interrupt to VM
+ *
+ * the parameter for HC_INJECT_MSI hypercall
+ */
+struct acrn_msi_entry {
+	/** MSI addr[19:12] with dest VCPU ID */
+	uint64_t msi_addr;
+
+	/** MSI data[7:0] with vector */
+	uint64_t msi_data;
+};
+
 /*
  * Common IOCTL ID definition for DM
  */
@@ -230,6 +243,11 @@ struct vm_memmap {
 #define IC_RESET_VM                    _IC_ID(IC_ID, IC_ID_VM_BASE + 0x05)
 #define IC_SET_VCPU_REGS               _IC_ID(IC_ID, IC_ID_VM_BASE + 0x06)
 
+/* IRQ and Interrupts */
+#define IC_ID_IRQ_BASE                 0x20UL
+#define IC_INJECT_MSI                  _IC_ID(IC_ID, IC_ID_IRQ_BASE + 0x03)
+#define IC_VM_INTR_MONITOR             _IC_ID(IC_ID, IC_ID_IRQ_BASE + 0x04)
+#define IC_SET_IRQLINE                 _IC_ID(IC_ID, IC_ID_IRQ_BASE + 0x05)
 
 /* Guest memory management */
 #define IC_ID_MEM_BASE                  0x40UL
-- 
2.7.4



* [RFC PATCH 11/15] drivers/acrn: add the support of handling emulated ioreq
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (9 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 10/15] drivers/acrn: add interrupt injection support Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16 13:39   ` Dan Carpenter
  2019-08-16  2:25 ` [RFC PATCH 12/15] drivers/acrn: add driver-specific IRQ handle to dispatch IO_REQ request Zhao Yakui
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ, Yin FengWei, Liu Shuo

After the guest UOS has booted, an MMIO/IO access causes it to exit
from the VMX non-root env into the ACRN hypervisor. The ACRN hypervisor
then injects a virtual irq into the Linux guest that runs the ACRN HSM
module. ACRN_HSM handles this virtual irq (which is based on
HYPERVISOR_CALLBACK_VECTOR), parses the corresponding IO request from the
shared IOReq buffer and distributes it to the matching ioreq client. After
the ioreq emulation is finished, it notifies the ACRN hypervisor, which
then resumes the execution of the guest UOS.

The ACRN HSM module groups ranges of emulated MMIO/IO addresses into
one ioreq_client. It determines which ioreq_client should handle an
emulated MMIO/IO request based on the address and then dispatches the
request to that ioreq_client's thread. The user-space device model creates
one default ioreq_client, which handles the emulated MMIO/IO in a
user-space thread.

Add the ioreq service and define the IOReq APIs below:
   int acrn_ioreq_create_client(unsigned long vmid,
		ioreq_handler_t handler,
		void *client_priv,
		char *name);
   void acrn_ioreq_destroy_client(int client_id);
   int acrn_ioreq_add_iorange(int client_id, enum request_type type,
       long start, long end);
   int acrn_ioreq_del_iorange(int client_id, enum request_type type,
       long start, long end);
   struct acrn_request * acrn_ioreq_get_reqbuf(int client_id);
   int acrn_ioreq_attach_client(int client_id);
   int acrn_ioreq_complete_request(int client_id);
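
The address-based dispatch described above can be pictured with a small
stand-alone sketch. It is not the driver's implementation: the struct,
the inclusive range ends and the fallback to a default client are
assumptions made only to illustrate the idea:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat view of the ranges registered by ioreq clients. */
struct io_range_sketch {
	uint64_t start;		/* first address of the range */
	uint64_t end;		/* last address, assumed inclusive */
	int client_id;		/* client that registered the range */
};

/*
 * Return the id of the client whose range contains addr, or
 * fallback_id (standing in for the default client) if none matches.
 */
static int find_client(const struct io_range_sketch *ranges, size_t n,
		       uint64_t addr, int fallback_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (addr >= ranges[i].start && addr <= ranges[i].end)
			return ranges[i].client_id;
	}
	return fallback_id;
}
```

In this picture acrn_ioreq_add_iorange() would add an entry to the table
and acrn_ioreq_del_iorange() would remove it; requests that match no
entry go to the default client created by the device model.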

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Co-developed-by: Yin FengWei <fengwei.yin@intel.com>
Signed-off-by: Yin FengWei <fengwei.yin@intel.com>
Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/Makefile             |   3 +-
 drivers/staging/acrn/acrn_dev.c           |  58 ++
 drivers/staging/acrn/acrn_drv_internal.h  |  33 ++
 drivers/staging/acrn/acrn_ioreq.c         | 937 ++++++++++++++++++++++++++++++
 drivers/staging/acrn/acrn_vm_mngt.c       |   7 +
 include/linux/acrn/acrn_drv.h             | 102 ++++
 include/uapi/linux/acrn/acrn_common_def.h | 176 ++++++
 include/uapi/linux/acrn/acrn_ioctl_defs.h |  20 +
 8 files changed, 1335 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/acrn/acrn_ioreq.c

diff --git a/drivers/staging/acrn/Makefile b/drivers/staging/acrn/Makefile
index ec62afe..a381944 100644
--- a/drivers/staging/acrn/Makefile
+++ b/drivers/staging/acrn/Makefile
@@ -3,4 +3,5 @@ acrn-y := acrn_dev.o \
 	  acrn_hypercall.o \
 	  acrn_vm_mngt.o \
 	  acrn_mm.o \
-	  acrn_mm_hugetlb.o
+	  acrn_mm_hugetlb.o \
+	  acrn_ioreq.o
diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 1476817..28258fb 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -26,6 +26,7 @@
 #include <asm/acrn.h>
 #include <asm/hypervisor.h>
 #include <linux/acrn/acrn_ioctl_defs.h>
+#include <linux/acrn/acrn_drv.h>
 
 #include "acrn_hypercall.h"
 #include "acrn_drv_internal.h"
@@ -59,6 +60,9 @@ int acrn_dev_open(struct inode *inodep, struct file *filep)
 		INIT_HLIST_HEAD(&vm->hugepage_hlist[i]);
 	mutex_init(&vm->hugepage_lock);
 
+	INIT_LIST_HEAD(&vm->ioreq_client_list);
+	spin_lock_init(&vm->ioreq_client_lock);
+
 	write_lock_bh(&acrn_vm_list_lock);
 	vm_list_add(&vm->list);
 	write_unlock_bh(&acrn_vm_list_lock);
@@ -131,9 +135,20 @@ long acrn_dev_ioctl(struct file *filep,
 		vm->vmid = created_vm->vmid;
 		atomic_set(&vm->vcpu_num, 0);
 
+		ret = acrn_ioreq_init(vm, created_vm->req_buf);
+		if (ret < 0)
+			goto ioreq_buf_fail;
+
 		pr_info("acrn: VM %d created\n", created_vm->vmid);
 		kfree(created_vm);
 		break;
+
+ioreq_buf_fail:
+		hcall_destroy_vm(created_vm->vmid);
+		vm->vmid = ACRN_INVALID_VMID;
+		kfree(created_vm);
+		break;
+
 	}
 
 	case IC_START_VM: {
@@ -364,6 +379,47 @@ long acrn_dev_ioctl(struct file *filep,
 		break;
 	}
 
+	case IC_CREATE_IOREQ_CLIENT: {
+		int client_id;
+
+		client_id = acrn_ioreq_create_fallback_client(vm->vmid,
+							      "acrndm");
+		if (client_id < 0)
+			return -EFAULT;
+		return client_id;
+	}
+
+	case IC_DESTROY_IOREQ_CLIENT: {
+		int client = ioctl_param;
+
+		acrn_ioreq_destroy_client(client);
+		break;
+	}
+
+	case IC_ATTACH_IOREQ_CLIENT: {
+		int client = ioctl_param;
+
+		return acrn_ioreq_attach_client(client);
+	}
+
+	case IC_NOTIFY_REQUEST_FINISH: {
+		struct ioreq_notify notify;
+
+		if (copy_from_user(&notify, (void *)ioctl_param,
+				   sizeof(notify)))
+			return -EFAULT;
+
+		ret = acrn_ioreq_complete_request(notify.client_id,
+						  notify.vcpu, NULL);
+		if (ret < 0)
+			return -EFAULT;
+		break;
+	}
+	case IC_CLEAR_VM_IOREQ: {
+		acrn_ioreq_clear_request(vm);
+		break;
+	}
+
 	default:
 		pr_warn("Unknown IOCTL 0x%x\n", ioctl_num);
 		ret = -EFAULT;
@@ -397,6 +453,7 @@ static const struct file_operations fops = {
 	.open = acrn_dev_open,
 	.release = acrn_dev_release,
 	.unlocked_ioctl = acrn_dev_ioctl,
+	.poll = acrn_dev_poll,
 };
 
 #define EAX_PRIVILEGE_VM	BIT(0)
@@ -461,6 +518,7 @@ static int __init acrn_init(void)
 		return PTR_ERR(acrn_device);
 	}
 
+	acrn_ioreq_driver_init();
 	pr_info("acrn: ACRN Hypervisor service module initialized\n");
 	acrn_hsm_inited = 1;
 	return 0;
diff --git a/drivers/staging/acrn/acrn_drv_internal.h b/drivers/staging/acrn/acrn_drv_internal.h
index 3e633cd..7813387 100644
--- a/drivers/staging/acrn/acrn_drv_internal.h
+++ b/drivers/staging/acrn/acrn_drv_internal.h
@@ -10,6 +10,7 @@
 #include <linux/types.h>
 #include <linux/list.h>
 #include <linux/refcount.h>
+#include <linux/poll.h>
 
 struct vm_memory_region {
 #define MR_ADD		0
@@ -66,6 +67,7 @@ struct wp_data {
 
 enum ACRN_VM_FLAGS {
 	ACRN_VM_DESTROYED = 0,
+	ACRN_VM_IOREQ_FREE,
 };
 
 extern struct list_head acrn_vm_list;
@@ -88,6 +90,12 @@ void vm_list_add(struct list_head *list);
  * @vcpu_num: vcpu number
  * @flags: VM flag bits
  * @hugepage_hlist: hash list of hugepage
+ * @ioreq_fallback_client: default ioreq client
+ * @ioreq_client_lock: spinlock to protect ioreq_client_list
+ * @ioreq_client_list: list of ioreq clients
+ * @req_buf: request buffer shared between HV, SOS and UOS
+ * @pg: pointer to linux page which holds req_buf
+ * @pci_conf_addr: the saved pci_conf1_addr for 0xCF8
  */
 struct acrn_vm {
 	struct device *dev;
@@ -100,6 +108,13 @@ struct acrn_vm {
 	/* mutex to protect hugepage_hlist */
 	struct mutex hugepage_lock;
 	struct hlist_head hugepage_hlist[HUGEPAGE_HLIST_ARRAY_SIZE];
+	int ioreq_fallback_client;
+	/* the spin_lock to protect ioreq_client_list */
+	spinlock_t ioreq_client_lock;
+	struct list_head ioreq_client_list;
+	struct acrn_request_buffer *req_buf;
+	struct page *pg;
+	u32 pci_conf_addr;
 };
 
 int acrn_vm_destroy(struct acrn_vm *vm);
@@ -140,4 +155,22 @@ struct hc_ptdev_irq {
 	};
 };
 
+/**
+ * @brief Info to set ioreq buffer for a created VM
+ *
+ * the parameter for HC_SET_IOREQ_BUFFER hypercall
+ */
+struct acrn_set_ioreq_buffer {
+	/** host physical address of VM request_buffer */
+	u64 req_buf;
+};
+
+int acrn_ioreq_init(struct acrn_vm *vm, unsigned long vma);
+void acrn_ioreq_free(struct acrn_vm *vm);
+int acrn_ioreq_create_fallback_client(unsigned short vmid, char *name);
+unsigned int acrn_dev_poll(struct file *filep, poll_table *wait);
+void acrn_ioreq_driver_init(void);
+void acrn_ioreq_clear_request(struct acrn_vm *vm);
+int acrn_ioreq_distribute_request(struct acrn_vm *vm);
+
 #endif
diff --git a/drivers/staging/acrn/acrn_ioreq.c b/drivers/staging/acrn/acrn_ioreq.c
new file mode 100644
index 0000000..a6cf0c1
--- /dev/null
+++ b/drivers/staging/acrn/acrn_ioreq.c
@@ -0,0 +1,937 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
+/*
+ * ACRN_HSM: handle the ioreq_request in ioreq_client
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ *
+ * Jason Chen CJ <jason.cj.chen@intel.com>
+ * Zhao Yakui <yakui.zhao@intel.com>
+ * Jack Ren <jack.ren@intel.com>
+ * FengWei yin <fengwei.yin@intel.com>
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/wait.h>
+#include <linux/freezer.h>
+#include <linux/sched.h>
+#include <linux/kthread.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/poll.h>
+#include <linux/delay.h>
+#include <linux/bitops.h>
+#include <linux/acrn/acrn_ioctl_defs.h>
+#include <linux/acrn/acrn_drv.h>
+
+#include <linux/idr.h>
+#include <linux/refcount.h>
+#include <linux/rwlock_types.h>
+
+#include "acrn_drv_internal.h"
+#include "acrn_hypercall.h"
+
+/* rwlock that is used to protect IDR client */
+static DEFINE_RWLOCK(client_lock);
+static struct idr	idr_client;
+
+struct ioreq_range {
+	struct list_head list;
+	u32 type;
+	long start;
+	long end;
+};
+
+enum IOREQ_CLIENT_BITS {
+	IOREQ_CLIENT_DESTROYING = 0,
+	IOREQ_CLIENT_EXIT,
+};
+
+struct ioreq_client {
+	/* client name */
+	char name[16];
+	/* client id */
+	int id;
+	/* vm this client belongs to */
+	unsigned short vmid;
+	/* list node for this ioreq_client */
+	struct list_head list;
+	/*
+	 * is this client fallback?
+	 * there is only one fallback client in a vm - dm
+	 * a fallback client shares IOReq buffer pages
+	 * a fallback client handles all left IOReq not handled by other clients
+	 * a fallback client does not need add io ranges
+	 * a fallback client handles ioreq in its own context
+	 */
+	bool fallback;
+
+	unsigned long flags;
+
+	/* client covered io ranges - N/A for fallback client */
+	struct list_head range_list;
+	rwlock_t range_lock;
+
+	/*
+	 *   this req records the req number this client need handle
+	 */
+	DECLARE_BITMAP(ioreqs_map, ACRN_REQUEST_MAX);
+
+	/*
+	 * client ioreq handler:
+	 *   if client provides a handler, it means acrn need create a kthread
+	 *   to call the handler while there is ioreq.
+	 *   if client doesn't provide a handler, client should handle ioreq
+	 *   in its own context when calls acrn_ioreq_attach_client.
+	 *
+	 *   NOTE: for fallback client, there is no ioreq handler.
+	 */
+	ioreq_handler_t handler;
+	bool acrn_create_kthread;
+	struct task_struct *thread;
+	wait_queue_head_t wq;
+
+	refcount_t refcnt;
+	/* add the ref_vm of ioreq_client */
+	struct acrn_vm *ref_vm;
+	void *client_priv;
+};
+
+#define MAX_CLIENT 1024
+static void acrn_ioreq_notify_client(struct ioreq_client *client);
+
+static inline bool has_pending_request(struct ioreq_client *client)
+{
+	if (client)
+		return !bitmap_empty(client->ioreqs_map, ACRN_REQUEST_MAX);
+	else
+		return false;
+}
+
+static int alloc_client(void)
+{
+	struct ioreq_client *client;
+	int ret;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return -ENOMEM;
+	refcount_set(&client->refcnt, 1);
+
+	write_lock_bh(&client_lock);
+	ret = idr_alloc_cyclic(&idr_client, client, 1, MAX_CLIENT, GFP_NOWAIT);
+	write_unlock_bh(&client_lock);
+
+	if (ret < 0) {
+		kfree(client);
+		return -EINVAL;
+	}
+
+	client->id = ret;
+
+	return ret;
+}
+
+static struct ioreq_client *acrn_ioreq_get_client(int client_id)
+{
+	struct ioreq_client *obj;
+
+	read_lock_bh(&client_lock);
+	obj = idr_find(&idr_client, client_id);
+	if (obj)
+		refcount_inc(&obj->refcnt);
+	read_unlock_bh(&client_lock);
+
+	return obj;
+}
+
+static void acrn_ioreq_put_client(struct ioreq_client *client)
+{
+	if (refcount_dec_and_test(&client->refcnt)) {
+		struct acrn_vm *ref_vm = client->ref_vm;
+		/* The client should be released when refcnt = 0 */
+		kfree(client);
+		put_vm(ref_vm);
+	}
+}
+
+int acrn_ioreq_create_client(unsigned short vmid,
+			     ioreq_handler_t handler,
+			     void *client_priv,
+			     char *name)
+{
+	struct acrn_vm *vm;
+	struct ioreq_client *client;
+	int client_id;
+
+	might_sleep();
+
+	vm = find_get_vm(vmid);
+	if (unlikely(!vm || !vm->req_buf)) {
+		pr_err("acrn-ioreq: failed to find vm from vmid %d\n", vmid);
+		put_vm(vm);
+		return -EINVAL;
+	}
+
+	client_id = alloc_client();
+	if (unlikely(client_id < 0)) {
+		pr_err("acrn-ioreq: vm[%d] failed to alloc ioreq client\n",
+		       vmid);
+		put_vm(vm);
+		return -EINVAL;
+	}
+
+	client = acrn_ioreq_get_client(client_id);
+	if (unlikely(!client)) {
+		pr_err("failed to get the client.\n");
+		put_vm(vm);
+		return -EINVAL;
+	}
+
+	if (handler) {
+		client->handler = handler;
+		client->acrn_create_kthread = true;
+	}
+
+	client->ref_vm = vm;
+	client->vmid = vmid;
+	client->client_priv = client_priv;
+	if (name)
+		strncpy(client->name, name, sizeof(client->name) - 1);
+	rwlock_init(&client->range_lock);
+	INIT_LIST_HEAD(&client->range_list);
+	init_waitqueue_head(&client->wq);
+
+	/* When it is added to ioreq_client_list, the refcnt is increased */
+	spin_lock_bh(&vm->ioreq_client_lock);
+	list_add(&client->list, &vm->ioreq_client_list);
+	spin_unlock_bh(&vm->ioreq_client_lock);
+
+	pr_info("acrn-ioreq: created ioreq client %d\n", client_id);
+
+	return client_id;
+}
+
+void acrn_ioreq_clear_request(struct acrn_vm *vm)
+{
+	struct ioreq_client *client;
+	struct list_head *pos;
+	bool has_pending = false;
+	int retry_cnt = 10;
+	int bit;
+
+	/*
+	 * Now, ioreq clearing only happens when do VM reset. Current
+	 * implementation is waiting all ioreq clients except the DM
+	 * one have no pending ioreqs in 10ms per loop
+	 */
+
+	do {
+		spin_lock_bh(&vm->ioreq_client_lock);
+		list_for_each(pos, &vm->ioreq_client_list) {
+			client = container_of(pos, struct ioreq_client, list);
+			if (vm->ioreq_fallback_client == client->id)
+				continue;
+			has_pending = has_pending_request(client);
+			if (has_pending)
+				break;
+		}
+		spin_unlock_bh(&vm->ioreq_client_lock);
+
+		if (has_pending)
+			schedule_timeout_interruptible(HZ / 100);
+	} while (has_pending && --retry_cnt > 0);
+
+	if (retry_cnt == 0)
+		pr_warn("ioreq client[%d] cannot flush pending request!\n",
+			client->id);
+
+	/* Clear all ioreqs belong to DM. */
+	if (vm->ioreq_fallback_client > 0) {
+		client = acrn_ioreq_get_client(vm->ioreq_fallback_client);
+		if (!client)
+			return;
+
+		bit = find_next_bit(client->ioreqs_map, ACRN_REQUEST_MAX, 0);
+		while (bit < ACRN_REQUEST_MAX) {
+			acrn_ioreq_complete_request(client->id, bit, NULL);
+			bit = find_next_bit(client->ioreqs_map,
+					    ACRN_REQUEST_MAX,
+					    bit + 1);
+		}
+		acrn_ioreq_put_client(client);
+	}
+}
+
+int acrn_ioreq_create_fallback_client(unsigned short vmid, char *name)
+{
+	struct acrn_vm *vm;
+	int client_id;
+	struct ioreq_client *client;
+
+	vm = find_get_vm(vmid);
+	if (unlikely(!vm)) {
+		pr_err("acrn-ioreq: failed to find vm from vmid %d\n",
+		       vmid);
+		return -EINVAL;
+	}
+
+	if (unlikely(vm->ioreq_fallback_client > 0)) {
+		pr_err("acrn-ioreq: there is already fallback client exist for vm %d\n",
+		       vmid);
+		put_vm(vm);
+		return -EINVAL;
+	}
+
+	client_id = acrn_ioreq_create_client(vmid, NULL, NULL, name);
+	if (unlikely(client_id < 0)) {
+		put_vm(vm);
+		return -EINVAL;
+	}
+
+	client = acrn_ioreq_get_client(client_id);
+	if (unlikely(!client)) {
+		pr_err("failed to get the client.\n");
+		put_vm(vm);
+		return -EINVAL;
+	}
+
+	client->fallback = true;
+	vm->ioreq_fallback_client = client_id;
+
+	acrn_ioreq_put_client(client);
+	put_vm(vm);
+
+	return client_id;
+}
+
+/* When one client is removed from VM, the refcnt is decreased */
+static void acrn_ioreq_remove_client_pervm(struct ioreq_client *client,
+					   struct acrn_vm *vm)
+{
+	struct list_head *pos, *tmp;
+
+	set_bit(IOREQ_CLIENT_DESTROYING, &client->flags);
+
+	if (client->acrn_create_kthread) {
+		/* when the kthread is already started, the kthread_stop is
+		 * used to terminate the ioreq_client_thread
+		 */
+		if (client->thread &&
+		    !test_bit(IOREQ_CLIENT_EXIT, &client->flags))
+			kthread_stop(client->thread);
+
+		/* decrease the refcount as it is increased when creating
+		 * ioreq_client_thread kthread
+		 */
+		acrn_ioreq_put_client(client);
+	} else {
+		set_bit(IOREQ_CLIENT_DESTROYING, &client->flags);
+		acrn_ioreq_notify_client(client);
+	}
+
+	write_lock_bh(&client->range_lock);
+	list_for_each_safe(pos, tmp, &client->range_list) {
+		struct ioreq_range *range =
+			container_of(pos, struct ioreq_range, list);
+		list_del(&range->list);
+		kfree(range);
+	}
+	write_unlock_bh(&client->range_lock);
+
+	spin_lock_bh(&vm->ioreq_client_lock);
+	list_del(&client->list);
+	spin_unlock_bh(&vm->ioreq_client_lock);
+
+	if (client->id == vm->ioreq_fallback_client)
+		vm->ioreq_fallback_client = -1;
+
+	acrn_ioreq_put_client(client);
+}
+
+void acrn_ioreq_destroy_client(int client_id)
+{
+	struct ioreq_client *client;
+
+	if (client_id < 0 || client_id >= MAX_CLIENT) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return;
+	}
+
+	write_lock_bh(&client_lock);
+	client = idr_remove(&idr_client, client_id);
+	write_unlock_bh(&client_lock);
+
+	/* When client_id is released, just keep silence and return */
+	if (!client)
+		return;
+
+	might_sleep();
+
+	acrn_ioreq_remove_client_pervm(client, client->ref_vm);
+	acrn_ioreq_put_client(client);
+}
+
+/*
+ * NOTE: here just add iorange entry directly, no check for the overlap..
+ * please client take care of it
+ */
+int acrn_ioreq_add_iorange(int client_id, uint32_t type,
+			   long start, long end)
+{
+	struct ioreq_client *client;
+	struct ioreq_range *range;
+
+	if (client_id < 0 || client_id >= MAX_CLIENT) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return -EFAULT;
+	}
+	if (end < start) {
+		pr_err("acrn-ioreq: end < start\n");
+		return -EFAULT;
+	}
+
+	client = acrn_ioreq_get_client(client_id);
+	if (!client) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return -EFAULT;
+	}
+
+	might_sleep();
+
+	range = kzalloc(sizeof(*range), GFP_KERNEL);
+	if (!range) {
+		acrn_ioreq_put_client(client);
+		return -ENOMEM;
+	}
+	range->type = type;
+	range->start = start;
+	range->end = end;
+
+	write_lock_bh(&client->range_lock);
+	list_add(&range->list, &client->range_list);
+	write_unlock_bh(&client->range_lock);
+	acrn_ioreq_put_client(client);
+
+	return 0;
+}
+
+int acrn_ioreq_del_iorange(int client_id, uint32_t type,
+			   long start, long end)
+{
+	struct ioreq_client *client;
+	struct ioreq_range *range;
+	struct list_head *pos, *tmp;
+
+	if (client_id < 0 || client_id >= MAX_CLIENT) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return -EFAULT;
+	}
+	if (end < start) {
+		pr_err("acrn-ioreq: end < start\n");
+		return -EFAULT;
+	}
+
+	client = acrn_ioreq_get_client(client_id);
+	if (!client) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return -EFAULT;
+	}
+
+	might_sleep();
+
+	write_lock_bh(&client->range_lock);
+	list_for_each_safe(pos, tmp, &client->range_list) {
+		range = container_of(pos, struct ioreq_range, list);
+		if ((range->type == type) &&
+		    (start == range->start) &&
+		    (end == range->end)) {
+			list_del(&range->list);
+			kfree(range);
+			break;
+		}
+	}
+	write_unlock_bh(&client->range_lock);
+	acrn_ioreq_put_client(client);
+
+	return 0;
+}
+
+static inline bool is_destroying(struct ioreq_client *client)
+{
+	if (client)
+		return test_bit(IOREQ_CLIENT_DESTROYING, &client->flags);
+	else
+		return true;
+}
+
+struct acrn_request *acrn_ioreq_get_reqbuf(int client_id)
+{
+	struct ioreq_client *client;
+	struct acrn_vm *vm;
+
+	if (client_id < 0 || client_id >= MAX_CLIENT) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return NULL;
+	}
+	client = acrn_ioreq_get_client(client_id);
+	if (!client) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return NULL;
+	}
+
+	vm = client->ref_vm;
+	if (unlikely(!vm || !vm->req_buf)) {
+		pr_err("acrn-ioreq: failed to find vm from vmid %d\n",
+		       client->vmid);
+		acrn_ioreq_put_client(client);
+		return NULL;
+	}
+
+	acrn_ioreq_put_client(client);
+	return (struct acrn_request *)vm->req_buf;
+}
+
+static int ioreq_client_thread(void *data)
+{
+	struct ioreq_client *client;
+	int ret;
+	struct acrn_vm *vm;
+
+	client = (struct ioreq_client *)data;
+
+	/* This should never happen */
+	if (unlikely(!client)) {
+		pr_err("acrn-ioreq: pass the NULL parameter\n");
+		return 0;
+	}
+
+	vm = client->ref_vm;
+	if (unlikely(!vm)) {
+		pr_err("acrn-ioreq: failed to find vm from vmid %d\n",
+		       client->vmid);
+		set_bit(IOREQ_CLIENT_EXIT, &client->flags);
+		return -EINVAL;
+	}
+
+	/* add refcnt for client */
+	refcount_inc(&client->refcnt);
+
+	while (!kthread_should_stop()) {
+		if (has_pending_request(client)) {
+			if (client->handler) {
+				ret = client->handler(client->id,
+					client->ioreqs_map,
+					client->client_priv);
+				if (ret < 0) {
+					pr_err("acrn-ioreq: err:%d\n", ret);
+					break;
+				}
+			} else {
+				pr_err("acrn-ioreq: no ioreq handler\n");
+				break;
+			}
+			continue;
+		}
+		wait_event_freezable(client->wq,
+				     (has_pending_request(client) ||
+				      kthread_should_stop()));
+	}
+
+	set_bit(IOREQ_CLIENT_EXIT, &client->flags);
+	acrn_ioreq_put_client(client);
+	return 0;
+}
+
+int acrn_ioreq_attach_client(int client_id)
+{
+	struct ioreq_client *client;
+
+	if (client_id < 0 || client_id >= MAX_CLIENT) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return -EFAULT;
+	}
+	client = acrn_ioreq_get_client(client_id);
+	if (!client) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return -EFAULT;
+	}
+
+	if (client->acrn_create_kthread) {
+		if (client->thread) {
+			pr_warn("acrn-ioreq: kthread already exist for client %s\n",
+				client->name);
+			acrn_ioreq_put_client(client);
+			return 0;
+		}
+		client->thread = kthread_run(ioreq_client_thread,
+					     client,
+					     "ict[%d]:%s",
+					     client->vmid, client->name);
+		if (IS_ERR_OR_NULL(client->thread)) {
+			pr_err("acrn-ioreq: failed to run kthread for client %s\n",
+			       client->name);
+			client->thread = NULL;
+			acrn_ioreq_put_client(client);
+			return -ENOMEM;
+		}
+	} else {
+		wait_event_freezable(client->wq,
+				     (has_pending_request(client) ||
+				      is_destroying(client)));
+
+		if (is_destroying(client)) {
+			acrn_ioreq_put_client(client);
+			return 1;
+		}
+		acrn_ioreq_put_client(client);
+	}
+
+	return 0;
+}
+
+static void acrn_ioreq_notify_client(struct ioreq_client *client)
+{
+	/* if client thread is in waitqueue, wake up it */
+	if (waitqueue_active(&client->wq))
+		wake_up_interruptible(&client->wq);
+}
+
+static int ioreq_complete_request(unsigned short vmid, int vcpu,
+				  struct acrn_request *acrn_req)
+{
+	bool polling_mode;
+
+	/* add barrier before reading the completion mode */
+	smp_rmb();
+	polling_mode = acrn_req->completion_polling;
+	atomic_set(&acrn_req->processed, REQ_STATE_COMPLETE);
+	/*
+	 * In polling mode, HV will poll ioreqs' completion.
+	 * Once marked the ioreq as REQ_STATE_COMPLETE, hypervisor side
+	 * can poll the result and continue the IO flow. Thus, we don't
+	 * need to notify hypervisor by hypercall.
+	 * Please note, we need get completion_polling before set the request
+	 * as complete, or we will race with hypervisor.
+	 */
+	if (!polling_mode) {
+		if (hcall_notify_req_finish(vmid, vcpu) < 0) {
+			pr_err("acrn-ioreq: notify request complete failed!\n");
+			return -EFAULT;
+		}
+	}
+
+	return 0;
+}
+
+static bool req_in_range(struct ioreq_range *range, struct acrn_request *req)
+{
+	bool ret = false;
+
+	if (range->type == req->type) {
+		switch (req->type) {
+		case REQ_MMIO:
+		case REQ_WP:
+		{
+			if (req->reqs.mmio_request.address >= range->start &&
+			    (req->reqs.mmio_request.address +
+			     req->reqs.mmio_request.size) <= range->end)
+				ret = true;
+			break;
+		}
+		case REQ_PORTIO: {
+			if (req->reqs.pio_request.address >= range->start &&
+			    (req->reqs.pio_request.address +
+			     req->reqs.pio_request.size) <= range->end)
+				ret = true;
+			break;
+		}
+
+		default:
+			ret = false;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static bool is_cfg_addr(struct acrn_request *req)
+{
+	return ((req->type == REQ_PORTIO) &&
+		(req->reqs.pio_request.address == 0xcf8));
+}
+
+static bool is_cfg_data(struct acrn_request *req)
+{
+	return (req->type == REQ_PORTIO &&
+		(req->reqs.pio_request.address >= 0xcfc &&
+		 req->reqs.pio_request.address < (0xcfc + 4)));
+}
+
+#define PCI_LOWREG_MASK  0xFC   /* the low 8-bit of supported pci_reg addr.*/
+#define PCI_HIGHREG_MASK 0xF00  /* the high 4-bit of supported pci_reg addr */
+#define PCI_FUNCMAX	7       /* highest supported function number */
+#define PCI_SLOTMAX	31      /* highest supported slot number */
+#define PCI_BUSMAX	255     /* highest supported bus number */
+#define CONF1_ENABLE	0x80000000ul
+static int handle_cf8cfc(struct acrn_vm *vm, struct acrn_request *req, int vcpu)
+{
+	int req_handled = 0;
+	int err = 0;
+
+	/*XXX: like DM, assume cfg address write is size 4 */
+	if (is_cfg_addr(req)) {
+		if (req->reqs.pio_request.direction == REQUEST_WRITE) {
+			if (req->reqs.pio_request.size == 4) {
+				vm->pci_conf_addr = req->reqs.pio_request.value;
+				req_handled = 1;
+			}
+		} else {
+			if (req->reqs.pio_request.size == 4) {
+				req->reqs.pio_request.value = vm->pci_conf_addr;
+				req_handled = 1;
+			}
+		}
+	} else if (is_cfg_data(req)) {
+		if (!(vm->pci_conf_addr & CONF1_ENABLE)) {
+			if (req->reqs.pio_request.direction == REQUEST_READ)
+				req->reqs.pio_request.value = 0xffffffff;
+			req_handled = 1;
+		} else {
+			/* pci request is same as io request at top */
+			int offset = req->reqs.pio_request.address - 0xcfc;
+			int pci_reg;
+			u32 pci_cfg_addr;
+
+			req->type = REQ_PCICFG;
+			pci_cfg_addr = vm->pci_conf_addr;
+			req->reqs.pci_request.bus = (pci_cfg_addr >> 16) &
+						     PCI_BUSMAX;
+			req->reqs.pci_request.dev = (pci_cfg_addr >> 11) &
+						     PCI_SLOTMAX;
+			req->reqs.pci_request.func = (pci_cfg_addr >> 8) &
+						      PCI_FUNCMAX;
+			pci_reg = (pci_cfg_addr & PCI_LOWREG_MASK) +
+				   ((pci_cfg_addr >> 16) & PCI_HIGHREG_MASK);
+			req->reqs.pci_request.reg = pci_reg + offset;
+		}
+	}
+
+	if (req_handled)
+		err = ioreq_complete_request(vm->vmid, vcpu, req);
+
+	return err ? err : req_handled;
+}
+
+static
+struct ioreq_client *find_ioreq_client_by_request(struct acrn_vm *vm,
+						  struct acrn_request *req)
+{
+	struct list_head *pos, *range_pos;
+	struct ioreq_client *client;
+	int target_client, fallback_client;
+	struct ioreq_range *range;
+	bool found = false;
+
+	target_client = 0;
+	fallback_client = 0;
+	spin_lock_bh(&vm->ioreq_client_lock);
+	list_for_each(pos, &vm->ioreq_client_list) {
+		client = container_of(pos, struct ioreq_client, list);
+
+		if (client->fallback) {
+			fallback_client = client->id;
+			continue;
+		}
+
+		read_lock_bh(&client->range_lock);
+		list_for_each(range_pos, &client->range_list) {
+			range =
+			container_of(range_pos, struct ioreq_range, list);
+			if (req_in_range(range, req)) {
+				found = true;
+				target_client = client->id;
+				break;
+			}
+		}
+		read_unlock_bh(&client->range_lock);
+
+		if (found)
+			break;
+	}
+	spin_unlock_bh(&vm->ioreq_client_lock);
+
+	if (target_client > 0)
+		return acrn_ioreq_get_client(target_client);
+
+	if (fallback_client > 0)
+		return acrn_ioreq_get_client(fallback_client);
+
+	return NULL;
+}
+
+int acrn_ioreq_distribute_request(struct acrn_vm *vm)
+{
+	struct acrn_request *req;
+	struct list_head *pos;
+	struct ioreq_client *client;
+	int i, vcpu_num;
+
+	vcpu_num = atomic_read(&vm->vcpu_num);
+	for (i = 0; i < vcpu_num; i++) {
+		req = vm->req_buf->req_queue + i;
+
+		/* This function is called in tasklet only on SOS. Thus it
+		 * is safe to read the state first and update it later as
+		 * long as the update is atomic.
+		 */
+		if (atomic_read(&req->processed) == REQ_STATE_PENDING) {
+			if (handle_cf8cfc(vm, req, i))
+				continue;
+			client = find_ioreq_client_by_request(vm, req);
+			if (!client) {
+				pr_err("acrn-ioreq: failed to find ioreq client\n");
+				return -EINVAL;
+			}
+			req->client = client->id;
+			atomic_set(&req->processed, REQ_STATE_PROCESSING);
+			set_bit(i, client->ioreqs_map);
+			acrn_ioreq_put_client(client);
+		}
+	}
+
+	spin_lock_bh(&vm->ioreq_client_lock);
+	list_for_each(pos, &vm->ioreq_client_list) {
+		client = container_of(pos, struct ioreq_client, list);
+		if (has_pending_request(client))
+			acrn_ioreq_notify_client(client);
+	}
+	spin_unlock_bh(&vm->ioreq_client_lock);
+
+	return 0;
+}
+
+int acrn_ioreq_complete_request(int client_id, uint64_t vcpu,
+				struct acrn_request *acrn_req)
+{
+	struct ioreq_client *client;
+	int ret;
+
+	if (client_id < 0 || client_id >= MAX_CLIENT) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return -EINVAL;
+	}
+	client = acrn_ioreq_get_client(client_id);
+	if (!client) {
+		pr_err("acrn-ioreq: no client for id %d\n", client_id);
+		return -EINVAL;
+	}
+
+	clear_bit(vcpu, client->ioreqs_map);
+	if (!acrn_req) {
+		acrn_req = acrn_ioreq_get_reqbuf(client_id);
+		if (!acrn_req) {
+			acrn_ioreq_put_client(client);
+			return -EINVAL;
+		}
+		acrn_req += vcpu;
+	}
+
+	ret = ioreq_complete_request(client->vmid, vcpu, acrn_req);
+	acrn_ioreq_put_client(client);
+
+	return ret;
+}
+
+unsigned int acrn_dev_poll(struct file *filep, poll_table *wait)
+{
+	struct acrn_vm *vm = filep->private_data;
+	struct ioreq_client *fallback_client;
+	unsigned int ret = 0;
+
+	if (!vm || !vm->req_buf ||
+	    (vm->ioreq_fallback_client <= 0)) {
+		pr_err("acrn: invalid VM !\n");
+		ret = POLLERR;
+		return ret;
+	}
+
+	fallback_client = acrn_ioreq_get_client(vm->ioreq_fallback_client);
+	if (!fallback_client) {
+		pr_err("acrn-ioreq: no client for id %d\n",
+		       vm->ioreq_fallback_client);
+		return POLLERR;
+	}
+
+	poll_wait(filep, &fallback_client->wq, wait);
+	if (has_pending_request(fallback_client) ||
+	    is_destroying(fallback_client))
+		ret = POLLIN | POLLRDNORM;
+
+	acrn_ioreq_put_client(fallback_client);
+
+	return ret;
+}
+
+int acrn_ioreq_init(struct acrn_vm *vm, unsigned long vma)
+{
+	struct acrn_set_ioreq_buffer *set_buffer;
+	struct page *page;
+	int ret;
+
+	if (vm->req_buf)
+		return -EEXIST;
+
+	set_buffer = kmalloc(sizeof(*set_buffer), GFP_KERNEL);
+	if (!set_buffer)
+		return -ENOMEM;
+
+	ret = get_user_pages_fast(vma, 1, 1, &page);
+	if (unlikely(ret != 1) || !page) {
+		pr_err("acrn-ioreq: failed to pin request buffer!\n");
+		kfree(set_buffer);
+		return -ENOMEM;
+	}
+
+	vm->req_buf = page_address(page);
+	vm->pg = page;
+
+	set_buffer->req_buf = page_to_phys(page);
+
+	ret = hcall_set_ioreq_buffer(vm->vmid, virt_to_phys(set_buffer));
+	kfree(set_buffer);
+	if (ret < 0) {
+		pr_err("acrn-ioreq: failed to set request buffer !\n");
+		return -EFAULT;
+	}
+
+	pr_debug("acrn-ioreq: init request buffer @ %p!\n",
+		 vm->req_buf);
+
+	return 0;
+}
+
+void acrn_ioreq_free(struct acrn_vm *vm)
+{
+	struct list_head *pos, *tmp;
+
+	/* When acrn_ioreq_destroy_client is called, it will be released
+	 * and removed from vm->ioreq_client_list.
+	 * The below is used to assure that the client is still released
+	 * even when it is not called.
+	 */
+	if (!test_and_set_bit(ACRN_VM_IOREQ_FREE, &vm->flags)) {
+		get_vm(vm);
+		list_for_each_safe(pos, tmp, &vm->ioreq_client_list) {
+			struct ioreq_client *client =
+				container_of(pos, struct ioreq_client, list);
+			acrn_ioreq_destroy_client(client->id);
+		}
+		put_vm(vm);
+	}
+}
+
+void acrn_ioreq_driver_init(void)
+{
+	idr_init(&idr_client);
+}
diff --git a/drivers/staging/acrn/acrn_vm_mngt.c b/drivers/staging/acrn/acrn_vm_mngt.c
index 4287595..8de380c 100644
--- a/drivers/staging/acrn/acrn_vm_mngt.c
+++ b/drivers/staging/acrn/acrn_vm_mngt.c
@@ -12,6 +12,7 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <linux/rwlock_types.h>
 #include <linux/acrn/acrn_ioctl_defs.h>
 #include <linux/acrn/acrn_drv.h>
@@ -47,6 +48,12 @@ void put_vm(struct acrn_vm *vm)
 {
 	if (refcount_dec_and_test(&vm->refcnt)) {
 		free_guest_mem(vm);
+		if (vm->req_buf && vm->pg) {
+			put_page(vm->pg);
+			vm->pg = NULL;
+			vm->req_buf = NULL;
+		}
+
 		kfree(vm);
 		pr_debug("hsm: freed vm\n");
 	}
diff --git a/include/linux/acrn/acrn_drv.h b/include/linux/acrn/acrn_drv.h
index bcdfcaf..74a9e89 100644
--- a/include/linux/acrn/acrn_drv.h
+++ b/include/linux/acrn/acrn_drv.h
@@ -95,4 +95,106 @@ extern int acrn_write_protect_page(unsigned short vmid, unsigned long gpa,
 extern int acrn_inject_msi(unsigned short vmid, unsigned long msi_addr,
 			   unsigned long msi_data);
 
+
+/* the API related with emulated mmio ioreq */
+typedef	int (*ioreq_handler_t)(int client_id,
+			       unsigned long *ioreqs_map,
+			       void *client_priv);
+
+/**
+ * acrn_ioreq_create_client - create ioreq client
+ *
+ * @vmid: ID to identify guest
+ * @handler: ioreq_handler of ioreq client
+ *           If client wants to handle request in client thread context, set
+ *           this parameter to NULL. If client wants to handle request out of
+ *           client thread context, set handler function pointer of its own.
+ *           acrn_hsm will create kernel thread and call handler to handle
+ *           request(This is recommended).
+ *
+ * @client_priv: the private structure for the given client.
+ *           When handler is not NULL, this is required and used as the
+ *           third argument of ioreq_handler callback
+ *
+ * @name: the name of ioreq client
+ *
+ * Return: client id on success, <0 on error
+ */
+int acrn_ioreq_create_client(unsigned short vmid,
+			     ioreq_handler_t handler,
+			     void *client_priv,
+			     char *name);
+
+/**
+ * acrn_ioreq_destroy_client - destroy ioreq client
+ *
+ * @client_id: client id to identify ioreq client
+ *
+ * Return:
+ */
+void acrn_ioreq_destroy_client(int client_id);
+
+/**
+ * acrn_ioreq_add_iorange - add iorange monitored by ioreq client
+ *
+ * @client_id: client id to identify ioreq client
+ * @type: iorange type
+ * @start: iorange start address
+ * @end: iorange end address
+ *
+ * Return: 0 on success, <0 on error
+ */
+int acrn_ioreq_add_iorange(int client_id, uint32_t type,
+			   long start, long end);
+
+/**
+ * acrn_ioreq_del_iorange - del iorange monitored by ioreq client
+ *
+ * @client_id: client id to identify ioreq client
+ * @type: iorange type
+ * @start: iorange start address
+ * @end: iorange end address
+ *
+ * Return: 0 on success, <0 on error
+ */
+int acrn_ioreq_del_iorange(int client_id, uint32_t type,
+			   long start, long end);
+
+/**
+ * acrn_ioreq_get_reqbuf - get request buffer
+ * request buffer is shared by all clients in one guest
+ *
+ * @client_id: client id to identify ioreq client
+ *
+ * Return: pointer to request buffer, NULL on error
+ */
+struct acrn_request *acrn_ioreq_get_reqbuf(int client_id);
+
+/**
+ * acrn_ioreq_attach_client - start handling requests for ioreq client
+ * If requests are handled outside the client thread context, this
+ * function is called only once to get ready for new requests.
+ *
+ * If requests are handled in the client thread context, this function
+ * must be called again each time the previous request has been handled,
+ * to get ready for the next request.
+ *
+ * @client_id: client id to identify ioreq client
+ *
+ * Return: 0 on success, <0 on error, 1 if the ioreq client is being destroyed
+ */
+int acrn_ioreq_attach_client(int client_id);
+
+/**
+ * acrn_ioreq_complete_request - notify guest request handling is completed
+ *
+ * @client_id: client id to identify ioreq client
+ * @vcpu: identify request submitter
+ * @req: the acrn_request that is marked as completed
+ *
+ * Return: 0 on success, <0 on error
+ */
+int acrn_ioreq_complete_request(int client_id, uint64_t vcpu,
+				struct acrn_request *req);
+
 #endif
diff --git a/include/uapi/linux/acrn/acrn_common_def.h b/include/uapi/linux/acrn/acrn_common_def.h
index a0f90a3..e2ad9b5 100644
--- a/include/uapi/linux/acrn/acrn_common_def.h
+++ b/include/uapi/linux/acrn/acrn_common_def.h
@@ -22,4 +22,180 @@
 #define	MEM_TYPE_WP                     0x00000400
 #define MEM_TYPE_MASK                   0x000007C0
 
+/*
+ * IO request
+ */
+#define ACRN_REQUEST_MAX 16
+
+#define REQ_STATE_PENDING	0
+#define REQ_STATE_COMPLETE	1
+#define REQ_STATE_PROCESSING	2
+#define REQ_STATE_FREE		3
+
+#define REQ_PORTIO	0
+#define REQ_MMIO	1
+#define REQ_PCICFG	2
+#define REQ_WP		3
+
+#define REQUEST_READ	0
+#define REQUEST_WRITE	1
+
+/**
+ * @brief Hypercall
+ *
+ * @addtogroup acrn_hypercall ACRN Hypercall
+ * @{
+ */
+
+struct mmio_request {
+	uint32_t direction;
+	uint32_t reserved;
+	uint64_t address;
+	uint64_t size;
+	uint64_t value;
+};
+
+struct pio_request {
+	uint32_t direction;
+	uint32_t reserved;
+	uint64_t address;
+	uint64_t size;
+	uint32_t value;
+};
+
+struct pci_request {
+	uint32_t direction;
+	uint32_t reserved[3]; /* keep the same header layout as pio_request */
+	int64_t size;
+	int32_t value;
+	int32_t bus;
+	int32_t dev;
+	int32_t func;
+	int32_t reg;
+};
+
+/**
+ * struct acrn_request - 256-byte ACRN request
+ *
+ * The state transitions of an ACRN request are:
+ *
+ *    FREE -> PENDING -> PROCESSING -> COMPLETE -> FREE -> ...
+ *                                \              /
+ *                                 +--> FAILED -+
+ *
+ * When a request is in COMPLETE or FREE state, the request is owned by the
+ * hypervisor. SOS (HSM or DM) shall not read or write the internals of the
+ * request except the state.
+ *
+ * When a request is in PENDING or PROCESSING state, the request is owned by
+ * SOS. The hypervisor shall not read or write the request other than the state.
+ *
+ * Based on the rules above, a typical ACRN request lifecycle should look like
+ * the following.
+ *
+ *                     (assume the initial state is FREE)
+ *
+ *       SOS vCPU 0                SOS vCPU x                    UOS vCPU y
+ *
+ *                                                 hypervisor:
+ *                                                     fill in type, addr, etc.
+ *                                                     pause UOS vcpu y
+ *                                                     set state to PENDING (a)
+ *                                                     fire upcall to SOS vCPU 0
+ *
+ *  HSM:
+ *      scan for pending requests
+ *      set state to PROCESSING (b)
+ *      assign requests to clients (c)
+ *
+ *                            client:
+ *                                scan for assigned requests
+ *                                handle the requests (d)
+ *                                set state to COMPLETE
+ *                                notify the hypervisor
+ *
+ *                            hypervisor:
+ *                                resume UOS vcpu y (e)
+ *
+ *                                                 hypervisor:
+ *                                                     post-work (f)
+ *                                                     set state to FREE
+ *
+ * Note that the following shall hold.
+ *
+ *   1. (a) happens before (b)
+ *   2. (c) happens before (d)
+ *   3. (e) happens before (f)
+ *   4. One vCPU cannot trigger another I/O request before the previous one has
+ *      completed (i.e. the state switched to FREE)
+ *
+ * Accesses to the state of an acrn_request shall be atomic, and proper
+ * barriers are needed to ensure that:
+ *
+ *   1. Setting state to PENDING is the last operation when issuing a request in
+ *      the hypervisor, as the hypervisor shall not access the request any more.
+ *
+ *   2. Due to similar reasons, setting state to COMPLETE is the last operation
+ *      of request handling in HSM or clients in SOS.
+ */
+struct acrn_request {
+	/**
+	 * @type: Type of this request. Byte offset: 0.
+	 */
+	uint32_t type;
+
+	/**
+	 * @completion_polling: Hypervisor will poll completion if set.
+	 *
+	 * Byte offset: 4.
+	 */
+	uint32_t completion_polling;
+
+
+	/**
+	 * @reserved0: Reserved fields. Byte offset: 8.
+	 */
+	uint32_t reserved0[14];
+
+	/**
+	 * @reqs: Details about this request.
+	 *
+	 * For REQ_PORTIO, this has type pio_request. For REQ_MMIO and REQ_WP,
+	 * this has type mmio_request. For REQ_PCICFG, this has type
+	 * pci_request. Byte offset: 64.
+	 */
+	union {
+		struct pio_request pio_request;
+		struct pci_request pci_request;
+		struct mmio_request mmio_request;
+		uint64_t reserved1[8];
+	} reqs;
+
+	/**
+	 * @reserved1: Reserved fields. Byte offset: 128.
+	 */
+	uint32_t reserved1;
+
+	/**
+	 * @client: The client assigned to handle this request.
+	 *
+	 * Accessed by ACRN_HSM only. Byte offset: 132.
+	 */
+	int32_t client;
+
+	/**
+	 * @processed: The status of this request.
+	 *
+	 * Take REQ_STATE_xxx as values. Byte offset: 136.
+	 */
+	atomic_t processed;
+} __aligned(256);
+
+struct acrn_request_buffer {
+	union {
+		struct acrn_request req_queue[ACRN_REQUEST_MAX];
+		uint8_t reserved[4096];
+	};
+};
+
 #endif /* _ACRN_COMMON_DEF_H */
diff --git a/include/uapi/linux/acrn/acrn_ioctl_defs.h b/include/uapi/linux/acrn/acrn_ioctl_defs.h
index 371904c..c3c4f98 100644
--- a/include/uapi/linux/acrn/acrn_ioctl_defs.h
+++ b/include/uapi/linux/acrn/acrn_ioctl_defs.h
@@ -223,6 +223,17 @@ struct acrn_msi_entry {
 	uint64_t msi_data;
 };
 
+/**
+ * struct ioreq_notify - data structure to notify hypervisor ioreq is handled
+ *
+ * @client_id: client id to identify ioreq client
+ * @vcpu: identify the ioreq submitter
+ */
+struct ioreq_notify {
+	int32_t client_id;
+	uint32_t vcpu;
+};
+
 /*
  * Common IOCTL ID definition for DM
  */
@@ -249,6 +260,15 @@ struct acrn_msi_entry {
 #define IC_VM_INTR_MONITOR             _IC_ID(IC_ID, IC_ID_IRQ_BASE + 0x04)
 #define IC_SET_IRQLINE                 _IC_ID(IC_ID, IC_ID_IRQ_BASE + 0x05)
 
+/* DM ioreq management */
+#define IC_ID_IOREQ_BASE                0x30UL
+#define IC_SET_IOREQ_BUFFER             _IC_ID(IC_ID, IC_ID_IOREQ_BASE + 0x00)
+#define IC_NOTIFY_REQUEST_FINISH        _IC_ID(IC_ID, IC_ID_IOREQ_BASE + 0x01)
+#define IC_CREATE_IOREQ_CLIENT          _IC_ID(IC_ID, IC_ID_IOREQ_BASE + 0x02)
+#define IC_ATTACH_IOREQ_CLIENT          _IC_ID(IC_ID, IC_ID_IOREQ_BASE + 0x03)
+#define IC_DESTROY_IOREQ_CLIENT         _IC_ID(IC_ID, IC_ID_IOREQ_BASE + 0x04)
+#define IC_CLEAR_VM_IOREQ               _IC_ID(IC_ID, IC_ID_IOREQ_BASE + 0x05)
+
 /* Guest memory management */
 #define IC_ID_MEM_BASE                  0x40UL
 #define IC_SET_MEMSEG                   _IC_ID(IC_ID, IC_ID_MEM_BASE + 0x01)
-- 
2.7.4



* [RFC PATCH 12/15] drivers/acrn: add driver-specific IRQ handle to dispatch IO_REQ request
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (10 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 11/15] drivers/acrn: add the support of handling emulated ioreq Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 13/15] drivers/acrn: add service to obtain Power data transition Zhao Yakui
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel
  Cc: Zhao Yakui, Jason Chen CJ, Mingqiang Chi, Liu Shuo

After the ACRN hypervisor captures an io_request (MMIO, port I/O or PCI
config access) from a guest OS, it sends an interrupt to the SOS.
The HYPERVISOR_CALLBACK_VECTOR ISR then runs and calls the
driver-specific ISR handler, which dispatches the emulated io_request.
After the emulation of the io_request is finished, the ACRN hypervisor
is notified and can resume the execution of the guest OS.
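
The hand-off between the dispatch path and a client can be sketched in
user-space C as below. This is only an illustration under assumptions:
demo_request, demo_claim() and demo_complete() are hypothetical names that
do not exist in the driver, C11 atomics stand in for the kernel's atomic_t,
and the compare-and-exchange is one possible way to claim a PENDING request,
not necessarily what acrn_hsm does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* State values copied from acrn_common_def.h in this series. */
#define REQ_STATE_PENDING    0
#define REQ_STATE_COMPLETE   1
#define REQ_STATE_PROCESSING 2
#define REQ_STATE_FREE       3

/* Hypothetical stand-in for struct acrn_request; only the state is modeled. */
struct demo_request {
	atomic_int processed;
};

/* Dispatch side: claim a request only if it is currently PENDING. */
static bool demo_claim(struct demo_request *req)
{
	int expected = REQ_STATE_PENDING;

	return atomic_compare_exchange_strong(&req->processed, &expected,
					      REQ_STATE_PROCESSING);
}

/* Client side: marking the request COMPLETE is the last step of handling. */
static void demo_complete(struct demo_request *req)
{
	assert(atomic_load(&req->processed) == REQ_STATE_PROCESSING);
	atomic_store(&req->processed, REQ_STATE_COMPLETE);
}
```

In this sketch a request that is not PENDING cannot be claimed, and COMPLETE
is written last, which mirrors the ordering rules documented for
struct acrn_request.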

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Co-developed-by: Mingqiang Chi <mingqiang.chi@intel.com>
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/acrn_dev.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 28258fb..93f45e3 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -18,6 +18,7 @@
 #include <linux/kdev_t.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/mm.h>
 #include <linux/module.h>
@@ -41,6 +42,7 @@ static int	acrn_hsm_inited;
 static int	major;
 static struct class	*acrn_class;
 static struct device	*acrn_device;
+static struct tasklet_struct acrn_io_req_tasklet;
 
 static
 int acrn_dev_open(struct inode *inodep, struct file *filep)
@@ -416,6 +418,16 @@ long acrn_dev_ioctl(struct file *filep,
 		break;
 	}
 	case IC_CLEAR_VM_IOREQ: {
+		/*
+		 * We need to flush the pending ioreq dispatch tasklet and
+		 * let it finish before clearing all ioreqs of this VM.
+		 * With tasklet_kill() there is still a very rare race that
+		 * might lose one ioreq tasklet run for other VMs, so
+		 * schedule the tasklet once more afterwards. It is harmless.
+		 */
+		tasklet_schedule(&acrn_io_req_tasklet);
+		tasklet_kill(&acrn_io_req_tasklet);
+		tasklet_schedule(&acrn_io_req_tasklet);
 		acrn_ioreq_clear_request(vm);
 		break;
 	}
@@ -449,6 +461,28 @@ static int acrn_dev_release(struct inode *inodep, struct file *filep)
 	return 0;
 }
 
+static void io_req_tasklet(unsigned long data)
+{
+	struct acrn_vm *vm;
+	/* This is already in tasklet. Use read_lock for list_lock */
+
+	read_lock(&acrn_vm_list_lock);
+	list_for_each_entry(vm, &acrn_vm_list, list) {
+		if (!vm || !vm->req_buf)
+			break;
+
+		get_vm(vm);
+		acrn_ioreq_distribute_request(vm);
+		put_vm(vm);
+	}
+	read_unlock(&acrn_vm_list_lock);
+}
+
+static void acrn_intr_handler(void)
+{
+	tasklet_schedule(&acrn_io_req_tasklet);
+}
+
 static const struct file_operations fops = {
 	.open = acrn_dev_open,
 	.release = acrn_dev_release,
@@ -462,6 +496,7 @@ static const struct file_operations fops = {
 
 static int __init acrn_init(void)
 {
+	unsigned long flag;
 	struct api_version *api_version;
 	acrn_hsm_inited = 0;
 	if (x86_hyper_type != X86_HYPER_ACRN)
@@ -518,6 +553,10 @@ static int __init acrn_init(void)
 		return PTR_ERR(acrn_device);
 	}
 
+	tasklet_init(&acrn_io_req_tasklet, io_req_tasklet, 0);
+	local_irq_save(flag);
+	acrn_setup_intr_irq(acrn_intr_handler);
+	local_irq_restore(flag);
 	acrn_ioreq_driver_init();
 	pr_info("acrn: ACRN Hypervisor service module initialized\n");
 	acrn_hsm_inited = 1;
@@ -529,6 +568,8 @@ static void __exit acrn_exit(void)
 	if (!acrn_hsm_inited)
 		return;
 
+	tasklet_kill(&acrn_io_req_tasklet);
+	acrn_remove_intr_irq();
 	device_destroy(acrn_class, MKDEV(major, 0));
 	class_unregister(acrn_class);
 	class_destroy(acrn_class);
-- 
2.7.4



* [RFC PATCH 13/15] drivers/acrn: add service to obtain Power data transition
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (11 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 12/15] drivers/acrn: add driver-specific IRQ handle to dispatch IO_REQ request Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 14/15] drivers/acrn: add the support of irqfd and eventfd Zhao Yakui
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ, Victor Sun

The Px/Cx data is critical to support power state transitions. The DM uses
this data to build the DSDT for the UOS. With this DSDT, the UOS can
control power states if the acpi-cpufreq/idle driver is enabled in its
kernel.
Add the PM ioctl used to obtain the power state information so that the
DM can construct the DSDT with the P-state (frequency) and C-state (idle)
data for the guest system.
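
As a rough illustration of how the command word selects the reply type, the
user-space sketch below maps a command to the size of the data copied back.
The structures, mask and enum are copied from this patch; pm_reply_size() is
a hypothetical helper that is not part of the driver, and the meaning of the
bits above PMCMD_TYPE_MASK is not described here, so the sketch simply
ignores them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Definitions copied from acrn_ioctl_defs.h in this patch. */
struct acrn_generic_address {
	uint8_t		space_id;
	uint8_t		bit_width;
	uint8_t		bit_offset;
	uint8_t		access_size;
	uint64_t	address;
};

struct cpu_cx_data {
	struct acrn_generic_address cx_reg;
	uint8_t		type;
	uint32_t	latency;
	uint64_t	power;
};

struct cpu_px_data {
	uint64_t core_frequency;	/* megahertz */
	uint64_t power;			/* milliWatts */
	uint64_t transition_latency;	/* microseconds */
	uint64_t bus_master_latency;	/* microseconds */
	uint64_t control;		/* control value */
	uint64_t status;		/* success indicator */
};

#define PMCMD_TYPE_MASK		0x000000ff

enum pm_cmd_type {
	PMCMD_GET_PX_CNT,
	PMCMD_GET_PX_DATA,
	PMCMD_GET_CX_CNT,
	PMCMD_GET_CX_DATA,
};

/* Six 64-bit fields, so no padding is expected in cpu_px_data. */
static_assert(sizeof(struct cpu_px_data) == 6 * sizeof(uint64_t),
	      "unexpected cpu_px_data size");

/* Hypothetical helper: bytes returned for a command, 0 if it is unknown. */
static size_t pm_reply_size(uint64_t cmd)
{
	switch (cmd & PMCMD_TYPE_MASK) {
	case PMCMD_GET_PX_CNT:
	case PMCMD_GET_CX_CNT:
		return sizeof(uint64_t);
	case PMCMD_GET_PX_DATA:
		return sizeof(struct cpu_px_data);
	case PMCMD_GET_CX_DATA:
		return sizeof(struct cpu_cx_data);
	default:
		return 0;
	}
}
```

A caller could use such a helper to size its buffer before issuing
IC_PM_GET_CPU_STATE; the ioctl itself reads the command from, and writes the
result to, the same user pointer.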

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Co-developed-by: Victor Sun <victor.sun@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/acrn_dev.c           | 75 +++++++++++++++++++++++++++++++
 include/uapi/linux/acrn/acrn_ioctl_defs.h | 36 +++++++++++++++
 2 files changed, 111 insertions(+)

diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 93f45e3..ef0ec50 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -432,6 +432,81 @@ long acrn_dev_ioctl(struct file *filep,
 		break;
 	}
 
+	case IC_PM_GET_CPU_STATE: {
+		u64 cmd;
+
+		if (copy_from_user(&cmd, (void *)ioctl_param, sizeof(cmd)))
+			return -EFAULT;
+
+		switch (cmd & PMCMD_TYPE_MASK) {
+		case PMCMD_GET_PX_CNT:
+		case PMCMD_GET_CX_CNT: {
+			u64 *pm_info;
+
+			pm_info = kmalloc(sizeof(u64), GFP_KERNEL);
+			if (!pm_info)
+				return -ENOMEM;
+
+			ret = hcall_get_cpu_state(cmd, virt_to_phys(pm_info));
+			if (ret < 0) {
+				kfree(pm_info);
+				return -EFAULT;
+			}
+
+			if (copy_to_user((void *)ioctl_param,
+					 pm_info, sizeof(u64)))
+				ret = -EFAULT;
+
+			kfree(pm_info);
+			break;
+		}
+		case PMCMD_GET_PX_DATA: {
+			struct cpu_px_data *px_data;
+
+			px_data = kmalloc(sizeof(*px_data), GFP_KERNEL);
+			if (!px_data)
+				return -ENOMEM;
+
+			ret = hcall_get_cpu_state(cmd, virt_to_phys(px_data));
+			if (ret < 0) {
+				kfree(px_data);
+				return -EFAULT;
+			}
+
+			if (copy_to_user((void *)ioctl_param,
+					 px_data, sizeof(*px_data)))
+				ret = -EFAULT;
+
+			kfree(px_data);
+			break;
+		}
+		case PMCMD_GET_CX_DATA: {
+			struct cpu_cx_data *cx_data;
+
+			cx_data = kmalloc(sizeof(*cx_data), GFP_KERNEL);
+			if (!cx_data)
+				return -ENOMEM;
+
+			ret = hcall_get_cpu_state(cmd, virt_to_phys(cx_data));
+			if (ret < 0) {
+				kfree(cx_data);
+				return -EFAULT;
+			}
+
+			if (copy_to_user((void *)ioctl_param,
+					 cx_data, sizeof(*cx_data)))
+				ret = -EFAULT;
+			kfree(cx_data);
+			break;
+		}
+		default:
+			ret = -EFAULT;
+			break;
+		}
+
+		break;
+	}
+
 	default:
 		pr_warn("Unknown IOCTL 0x%x\n", ioctl_num);
 		ret = -EFAULT;
diff --git a/include/uapi/linux/acrn/acrn_ioctl_defs.h b/include/uapi/linux/acrn/acrn_ioctl_defs.h
index c3c4f98..c762bd2 100644
--- a/include/uapi/linux/acrn/acrn_ioctl_defs.h
+++ b/include/uapi/linux/acrn/acrn_ioctl_defs.h
@@ -234,6 +234,39 @@ struct ioreq_notify {
 	uint32_t vcpu;
 };
 
+struct acrn_generic_address {
+	uint8_t		space_id;
+	uint8_t		bit_width;
+	uint8_t		bit_offset;
+	uint8_t		access_size;
+	uint64_t	address;
+};
+
+struct cpu_cx_data {
+	struct acrn_generic_address cx_reg;
+	uint8_t		type;
+	uint32_t	latency;
+	uint64_t	power;
+};
+
+struct cpu_px_data {
+	uint64_t core_frequency;	/* megahertz */
+	uint64_t power;			/* milliWatts */
+	uint64_t transition_latency;	/* microseconds */
+	uint64_t bus_master_latency;	/* microseconds */
+	uint64_t control;		/* control value */
+	uint64_t status;		/* success indicator */
+};
+
+#define PMCMD_TYPE_MASK		0x000000ff
+
+enum pm_cmd_type {
+	PMCMD_GET_PX_CNT,
+	PMCMD_GET_PX_DATA,
+	PMCMD_GET_CX_CNT,
+	PMCMD_GET_CX_DATA,
+};
+
 /*
  * Common IOCTL ID definition for DM
  */
@@ -281,4 +314,7 @@ struct ioreq_notify {
 #define IC_SET_PTDEV_INTR_INFO         _IC_ID(IC_ID, IC_ID_PCI_BASE + 0x03)
 #define IC_RESET_PTDEV_INTR_INFO       _IC_ID(IC_ID, IC_ID_PCI_BASE + 0x04)
 
+/* Power management */
+#define IC_ID_PM_BASE                   0x60UL
+#define IC_PM_GET_CPU_STATE            _IC_ID(IC_ID, IC_ID_PM_BASE + 0x00)
 #endif /* __ACRN_IOCTL_DEFS_H__ */
-- 
2.7.4



* [RFC PATCH 14/15] drivers/acrn: add the support of irqfd and eventfd
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (12 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 13/15] drivers/acrn: add service to obtain Power data transition Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-16  2:25 ` [RFC PATCH 15/15] drivers/acrn: add the support of offline SOS cpu Zhao Yakui
  2019-08-16  6:39 ` [RFC PATCH 00/15] acrn: add the ACRN driver module Borislav Petkov
  15 siblings, 0 replies; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Shuo Liu

The ioeventfd/irqfd mechanism based on eventfd is widely used to implement
virtio kernel backend drivers. After an ioreq is triggered by the virtio
frontend driver, eventfd_signal() is called to notify the eventfd so that
the virtio kernel backend driver is woken up to handle the request.
After the request is handled, the backend signals the irqfd to inject an
interrupt into the virtio frontend driver.

Each ioeventfd registered by userspace maps a PIO/MMIO range of the
guest to an eventfd, and the eventfd is signaled when an in-range IO write
arrives from the guest. The other side of the eventfd is then notified
to process the IO request.

As the ioeventfd is only used to listen to a virtqueue's kick register,
some limitations are added:
      1) The length can only be 1, 2, 4 or 8
      2) Only write operations are supported; a read returns 0
      3) A shorter write to the same address is also handled; the match
         compares the full data value
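
The checks and the matching rule above can be sketched in user-space C as
follows. This is a simplified illustration, not the driver code:
demo_ioeventfd and the demo_*() helpers are hypothetical names, and only
the fields needed for matching are kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Request types copied from acrn_common_def.h in this series. */
#define REQ_PORTIO	0
#define REQ_MMIO	1

/* Hypothetical, trimmed-down version of struct acrn_hsm_ioeventfd. */
struct demo_ioeventfd {
	uint64_t addr;		/* start address of the IO range */
	uint64_t data;		/* match data */
	int type;		/* REQ_PORTIO or REQ_MMIO */
	bool wildcard;		/* ignore match data if true */
};

/* Limitation 1: only 1, 2, 4 or 8 byte registers are accepted. */
static bool demo_len_valid(uint32_t len)
{
	return len == 1 || len == 2 || len == 4 || len == 8;
}

/* A range whose end wraps around the 64-bit address space is rejected. */
static bool demo_range_wraps(uint64_t addr, uint32_t len)
{
	return addr + len < addr;
}

/* Limitation 3: the length is not compared, only type, address and data. */
static bool demo_hit(const struct demo_ioeventfd *p, int type,
		     uint64_t addr, uint64_t data)
{
	assert(p);
	return p->type == type && p->addr == addr &&
	       (p->wildcard || p->data == data);
}
```

With an entry registered for PIO address 0x100 and data 0x10, a write of
0x10 to 0x100 hits regardless of its length, while a write of any other
value hits only if the entry was registered without data matching.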

The irqfd based on eventfd provides a pipe for injecting guest interrupts
through a write to a file descriptor. Each irqfd registered by
userspace maps an interrupt of the guest to an eventfd, and a write
on one side of the eventfd triggers the interrupt injection
on the acrn_hsm side.
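
The userspace half of this can be sketched with the standard Linux
eventfd(2) interface, as below. The sketch only shows how a counter write
on an eventfd looks from the signaling side; demo_kick() and demo_drain()
are hypothetical helpers, and registering the descriptor with acrn_hsm
through IC_EVENT_IRQFD is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: add 1 to the eventfd counter, which is how a
 * backend would request an interrupt through an irqfd.
 * Returns 0 on success, -1 on error.
 */
static int demo_kick(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: read the accumulated counter and reset it to 0.
 * Returns 0 on success, -1 on error.
 */
static int demo_drain(int efd, uint64_t *count)
{
	assert(count);
	return read(efd, count, sizeof(*count)) ==
	       (ssize_t)sizeof(*count) ? 0 : -1;
}
```

On a descriptor created with eventfd(0, 0), two calls to demo_kick()
followed by demo_drain() return a count of 2, since writes accumulate until
the counter is read.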

Co-developed-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/Makefile             |   4 +-
 drivers/staging/acrn/acrn_dev.c           |  19 ++
 drivers/staging/acrn/acrn_drv_internal.h  |  10 +
 drivers/staging/acrn/acrn_ioeventfd.c     | 407 ++++++++++++++++++++++++++++++
 drivers/staging/acrn/acrn_irqfd.c         | 339 +++++++++++++++++++++++++
 drivers/staging/acrn/acrn_vm_mngt.c       |   9 +-
 include/uapi/linux/acrn/acrn_ioctl_defs.h |  25 ++
 7 files changed, 811 insertions(+), 2 deletions(-)
 create mode 100644 drivers/staging/acrn/acrn_ioeventfd.c
 create mode 100644 drivers/staging/acrn/acrn_irqfd.c

diff --git a/drivers/staging/acrn/Makefile b/drivers/staging/acrn/Makefile
index a381944..f8d8ee2 100644
--- a/drivers/staging/acrn/Makefile
+++ b/drivers/staging/acrn/Makefile
@@ -4,4 +4,6 @@ acrn-y := acrn_dev.o \
 	  acrn_vm_mngt.o \
 	  acrn_mm.o \
 	  acrn_mm_hugetlb.o \
-	  acrn_ioreq.o
+	  acrn_ioreq.o  \
+	  acrn_ioeventfd.o \
+	  acrn_irqfd.o
diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index ef0ec50..0602125 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -141,6 +141,8 @@ long acrn_dev_ioctl(struct file *filep,
 		if (ret < 0)
 			goto ioreq_buf_fail;
 
+		acrn_ioeventfd_init(vm->vmid);
+		acrn_irqfd_init(vm->vmid);
 		pr_info("acrn: VM %d created\n", created_vm->vmid);
 		kfree(created_vm);
 		break;
@@ -506,6 +508,23 @@ long acrn_dev_ioctl(struct file *filep,
 
 		break;
 	}
+	case IC_EVENT_IOEVENTFD: {
+		struct acrn_ioeventfd args;
+
+		if (copy_from_user(&args, (void *)ioctl_param, sizeof(args)))
+			return -EFAULT;
+		ret = acrn_ioeventfd_config(vm->vmid, &args);
+		break;
+	}
+
+	case IC_EVENT_IRQFD: {
+		struct acrn_irqfd args;
+
+		if (copy_from_user(&args, (void *)ioctl_param, sizeof(args)))
+			return -EFAULT;
+		ret = acrn_irqfd_config(vm->vmid, &args);
+		break;
+	}
 
 	default:
 		pr_warn("Unknown IOCTL 0x%x\n", ioctl_num);
diff --git a/drivers/staging/acrn/acrn_drv_internal.h b/drivers/staging/acrn/acrn_drv_internal.h
index 7813387..b9ded9a 100644
--- a/drivers/staging/acrn/acrn_drv_internal.h
+++ b/drivers/staging/acrn/acrn_drv_internal.h
@@ -173,4 +173,14 @@ void acrn_ioreq_driver_init(void);
 void acrn_ioreq_clear_request(struct acrn_vm *vm);
 int acrn_ioreq_distribute_request(struct acrn_vm *vm);
 
+/* ioeventfd APIs */
+int acrn_ioeventfd_init(unsigned short vmid);
+int acrn_ioeventfd_config(unsigned short vmid, struct acrn_ioeventfd *args);
+void acrn_ioeventfd_deinit(unsigned short vmid);
+
+/* irqfd APIs */
+int acrn_irqfd_init(unsigned short vmid);
+int acrn_irqfd_config(unsigned short vmid, struct acrn_irqfd *args);
+void acrn_irqfd_deinit(unsigned short vmid);
+
 #endif
diff --git a/drivers/staging/acrn/acrn_ioeventfd.c b/drivers/staging/acrn/acrn_ioeventfd.c
new file mode 100644
index 0000000..b330625
--- /dev/null
+++ b/drivers/staging/acrn/acrn_ioeventfd.c
@@ -0,0 +1,407 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
+/*
+ * ACRN hypervisor service module (SRV): ioeventfd based on eventfd
+ *
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ * Liu Shuo <shuo.a.liu@intel.com>
+ * Zhao Yakui <yakui.zhao@intel.com>
+ */
+#include <linux/types.h>
+#include <linux/wait.h>
+#include <linux/poll.h>
+#include <linux/file.h>
+#include <linux/list.h>
+#include <linux/eventfd.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include <linux/acrn/acrn_ioctl_defs.h>
+#include <linux/acrn/acrn_drv.h>
+
+#include "acrn_drv_internal.h"
+#include "acrn_hypercall.h"
+
+static LIST_HEAD(acrn_ioeventfd_clients);
+static DEFINE_MUTEX(acrn_ioeventfds_mutex);
+
+/* use internally to record properties of each ioeventfd */
+struct acrn_hsm_ioeventfd {
+	/* list to link all ioeventfds together */
+	struct list_head list;
+	/* eventfd of this ioeventfd */
+	struct eventfd_ctx *eventfd;
+	/* start address for IO range */
+	u64 addr;
+	/* match data */
+	u64 data;
+	/* length for IO range */
+	int length;
+	/* IO range type, can be REQ_PORTIO and REQ_MMIO */
+	int type;
+	/* ignore match data if true */
+	bool wildcard;
+};
+
+/* instance to bind ioeventfds of each VM */
+struct acrn_ioeventfd_info {
+	struct list_head list;
+	atomic_t refcnt;
+	/* vmid of VM */
+	unsigned short vmid;
+	/* acrn ioreq client for this instance */
+	int acrn_client_id;
+	/* vcpu number of this VM */
+	int vcpu_num;
+	/* ioreq shared buffer of this VM */
+	struct acrn_request *req_buf;
+
+	/* the mutex lock to protect ioeventfd list attached to VM */
+	struct mutex ioeventfds_lock;
+	/* ioeventfds in this instance */
+	struct list_head ioeventfds;
+};
+
+static
+struct acrn_ioeventfd_info *get_ioeventfd_info_by_vm(unsigned short vmid)
+{
+	struct acrn_ioeventfd_info *info = NULL;
+
+	mutex_lock(&acrn_ioeventfds_mutex);
+	list_for_each_entry(info, &acrn_ioeventfd_clients, list) {
+		if (info->vmid == vmid) {
+			atomic_inc(&info->refcnt);
+			mutex_unlock(&acrn_ioeventfds_mutex);
+			return info;
+		}
+	}
+	mutex_unlock(&acrn_ioeventfds_mutex);
+	return NULL;
+}
+
+static void put_ioeventfd_info(struct acrn_ioeventfd_info *info)
+{
+	mutex_lock(&acrn_ioeventfds_mutex);
+	if (atomic_dec_and_test(&info->refcnt)) {
+		list_del(&info->list);
+		mutex_unlock(&acrn_ioeventfds_mutex);
+		kfree(info);
+		return;
+	}
+	mutex_unlock(&acrn_ioeventfds_mutex);
+}
+
+/* assumes info->ioeventfds_lock held */
+static void acrn_ioeventfd_shutdown(struct acrn_hsm_ioeventfd *p)
+{
+	eventfd_ctx_put(p->eventfd);
+	list_del(&p->list);
+	kfree(p);
+}
+
+static inline int ioreq_type_from_flags(int flags)
+{
+	return flags & ACRN_IOEVENTFD_FLAG_PIO ?
+			REQ_PORTIO : REQ_MMIO;
+}
+
+/* assumes info->ioeventfds_lock held */
+static bool acrn_ioeventfd_is_duplicated(struct acrn_ioeventfd_info *info,
+					 struct acrn_hsm_ioeventfd *ioeventfd)
+{
+	struct acrn_hsm_ioeventfd *p;
+
+	/*
+	 * Treat same addr/type/data with different length combination
+	 * as the same one.
+	 *   Register PIO[0x100~0x107] with data 0x10 as ioeventfd A; a later
+	 *   registration of PIO[0x100~0x103] with data 0x10 will fail.
+	 */
+	list_for_each_entry(p, &info->ioeventfds, list)
+		if (p->addr == ioeventfd->addr &&
+		    p->type == ioeventfd->type &&
+		    (p->wildcard || ioeventfd->wildcard ||
+		     p->data == ioeventfd->data))
+			return true;
+
+	return false;
+}
+
+static int acrn_assign_ioeventfd(struct acrn_ioeventfd_info *info,
+				 struct acrn_ioeventfd *args)
+{
+	struct eventfd_ctx *eventfd;
+	struct acrn_hsm_ioeventfd *p;
+	int ret = -ENOENT;
+
+	/* check for range overflow */
+	if (args->addr + args->len < args->addr)
+		return -EINVAL;
+
+	/* Only support 1,2,4,8 width registers */
+	if (!(args->len == 1 || args->len == 2 ||
+	      args->len == 4 || args->len == 8))
+		return -EINVAL;
+
+	eventfd = eventfd_ctx_fdget(args->fd);
+	if (IS_ERR(eventfd))
+		return PTR_ERR(eventfd);
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	INIT_LIST_HEAD(&p->list);
+	p->addr    = args->addr;
+	p->length  = args->len;
+	p->eventfd = eventfd;
+	p->type	   = ioreq_type_from_flags(args->flags);
+
+	/* If datamatch enabled, we compare the data
+	 * otherwise this is a wildcard
+	 */
+	if (args->flags & ACRN_IOEVENTFD_FLAG_DATAMATCH)
+		p->data = args->data;
+	else
+		p->wildcard = true;
+
+	mutex_lock(&info->ioeventfds_lock);
+
+	/* Verify that there isn't a match already */
+	if (acrn_ioeventfd_is_duplicated(info, p)) {
+		ret = -EEXIST;
+		goto unlock_fail;
+	}
+
+	/* register the IO range into acrn client */
+	ret = acrn_ioreq_add_iorange(info->acrn_client_id, p->type,
+				     p->addr, p->addr + p->length - 1);
+	if (ret < 0)
+		goto unlock_fail;
+
+	list_add_tail(&p->list, &info->ioeventfds);
+	mutex_unlock(&info->ioeventfds_lock);
+
+	return 0;
+
+unlock_fail:
+	mutex_unlock(&info->ioeventfds_lock);
+fail:
+	kfree(p);
+	eventfd_ctx_put(eventfd);
+	return ret;
+}
+
+static int acrn_deassign_ioeventfd(struct acrn_ioeventfd_info *info,
+				   struct acrn_ioeventfd *args)
+{
+	struct acrn_hsm_ioeventfd *p, *tmp;
+	struct eventfd_ctx *eventfd;
+	int ret = 0;
+
+	eventfd = eventfd_ctx_fdget(args->fd);
+	if (IS_ERR(eventfd))
+		return PTR_ERR(eventfd);
+
+	mutex_lock(&info->ioeventfds_lock);
+
+	list_for_each_entry_safe(p, tmp, &info->ioeventfds, list) {
+		if (p->eventfd != eventfd)
+			continue;
+
+		ret = acrn_ioreq_del_iorange(info->acrn_client_id, p->type,
+					     p->addr,
+					     p->addr + p->length - 1);
+		if (ret)
+			break;
+		acrn_ioeventfd_shutdown(p);
+		break;
+	}
+
+	mutex_unlock(&info->ioeventfds_lock);
+
+	eventfd_ctx_put(eventfd);
+
+	return ret;
+}
+
+static struct acrn_hsm_ioeventfd *
+acrn_ioeventfd_match(struct acrn_ioeventfd_info *info,
+		     u64 addr, u64 data,
+		     int length, int type)
+{
+	struct acrn_hsm_ioeventfd *p = NULL;
+
+	/*
+	 * Same addr/type/data will be treated as hit, otherwise ignore.
+	 *   Register PIO[0x100~0x107] with data 0x10 as ioeventfd A, later
+	 *   request PIO[0x100~0x103] with data 0x10 will hit A.
+	 */
+	list_for_each_entry(p, &info->ioeventfds, list) {
+		if (p->type == type && p->addr == addr &&
+		    (p->wildcard || p->data == data))
+			return p;
+	}
+
+	return NULL;
+}
+
+static int acrn_ioeventfd_handler(int client_id,
+				  unsigned long *ioreqs_map,
+				  void *client_priv)
+{
+	struct acrn_request *req;
+	struct acrn_hsm_ioeventfd *p;
+	struct acrn_ioeventfd_info *info;
+	u64 addr;
+	u64 val;
+	int size;
+	int vcpu;
+
+	info = (struct acrn_ioeventfd_info *)client_priv;
+	if (!info)
+		return -EINVAL;
+
+	/* get req buf */
+	if (!info->req_buf) {
+		info->req_buf = acrn_ioreq_get_reqbuf(info->acrn_client_id);
+		if (!info->req_buf) {
+			pr_err("Failed to get req_buf for client %d\n",
+			       info->acrn_client_id);
+			return -EINVAL;
+		}
+	}
+
+	while (1) {
+		vcpu = find_first_bit(ioreqs_map, info->vcpu_num);
+		if (vcpu == info->vcpu_num)
+			break;
+		req = &info->req_buf[vcpu];
+		if (atomic_read(&req->processed) == REQ_STATE_PROCESSING &&
+		    req->client == client_id) {
+			if (req->type == REQ_MMIO) {
+				if (req->reqs.mmio_request.direction ==
+						REQUEST_READ) {
+					/* reading does nothing and returns 0 */
+					req->reqs.mmio_request.value = 0;
+					goto next_ioreq;
+				}
+				addr = req->reqs.mmio_request.address;
+				size = req->reqs.mmio_request.size;
+				val = req->reqs.mmio_request.value;
+			} else {
+				if (req->reqs.pio_request.direction ==
+						REQUEST_READ) {
+					/* reading does nothing and returns 0 */
+					req->reqs.pio_request.value = 0;
+					goto next_ioreq;
+				}
+				addr = req->reqs.pio_request.address;
+				size = req->reqs.pio_request.size;
+				val = req->reqs.pio_request.value;
+			}
+
+			mutex_lock(&info->ioeventfds_lock);
+			p = acrn_ioeventfd_match(info, addr, val, size,
+						 req->type);
+			if (p)
+				eventfd_signal(p->eventfd, 1);
+			mutex_unlock(&info->ioeventfds_lock);
+
+next_ioreq:
+			acrn_ioreq_complete_request(client_id, vcpu, req);
+		}
+	}
+
+	return 0;
+}
+
+int acrn_ioeventfd_init(unsigned short vmid)
+{
+	int ret = 0;
+	char name[16];
+	struct acrn_ioeventfd_info *info;
+
+	info = get_ioeventfd_info_by_vm(vmid);
+	if (info) {
+		put_ioeventfd_info(info);
+		return -EEXIST;
+	}
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+	mutex_init(&info->ioeventfds_lock);
+	info->vmid = vmid;
+	atomic_set(&info->refcnt, 1);
+	INIT_LIST_HEAD(&info->ioeventfds);
+	info->vcpu_num = ACRN_REQUEST_MAX;
+
+	snprintf(name, sizeof(name), "ioeventfd-%hu", vmid);
+	info->acrn_client_id = acrn_ioreq_create_client(vmid,
+							acrn_ioeventfd_handler,
+							info, name);
+	if (info->acrn_client_id < 0) {
+		pr_err("Failed to create ioeventfd client for ioreq!\n");
+		ret = -EINVAL;
+		goto fail;
+	}
+
+	ret = acrn_ioreq_attach_client(info->acrn_client_id);
+	if (ret < 0) {
+		pr_err("Failed to attach acrn client %d!\n",
+		       info->acrn_client_id);
+		goto client_fail;
+	}
+
+	mutex_lock(&acrn_ioeventfds_mutex);
+	list_add(&info->list, &acrn_ioeventfd_clients);
+	mutex_unlock(&acrn_ioeventfds_mutex);
+
+	pr_info("ACRN hsm ioeventfd init done!\n");
+	return 0;
+client_fail:
+	acrn_ioreq_destroy_client(info->acrn_client_id);
+fail:
+	kfree(info);
+	return ret;
+}
+
+void acrn_ioeventfd_deinit(unsigned short vmid)
+{
+	struct acrn_hsm_ioeventfd *p, *tmp;
+	struct acrn_ioeventfd_info *info = NULL;
+
+	info = get_ioeventfd_info_by_vm(vmid);
+	if (!info)
+		return;
+
+	acrn_ioreq_destroy_client(info->acrn_client_id);
+	mutex_lock(&info->ioeventfds_lock);
+	list_for_each_entry_safe(p, tmp, &info->ioeventfds, list)
+		acrn_ioeventfd_shutdown(p);
+	mutex_unlock(&info->ioeventfds_lock);
+
+	put_ioeventfd_info(info);
+	/* put one more as we count it in finding */
+	put_ioeventfd_info(info);
+}
+
+int acrn_ioeventfd_config(unsigned short vmid, struct acrn_ioeventfd *args)
+{
+	struct acrn_ioeventfd_info *info = NULL;
+	int ret;
+
+	info = get_ioeventfd_info_by_vm(vmid);
+	if (!info)
+		return -ENOENT;
+
+	if (args->flags & ACRN_IOEVENTFD_FLAG_DEASSIGN)
+		ret = acrn_deassign_ioeventfd(info, args);
+	else
+		ret = acrn_assign_ioeventfd(info, args);
+
+	put_ioeventfd_info(info);
+	return ret;
+}
diff --git a/drivers/staging/acrn/acrn_irqfd.c b/drivers/staging/acrn/acrn_irqfd.c
new file mode 100644
index 0000000..578e05c
--- /dev/null
+++ b/drivers/staging/acrn/acrn_irqfd.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
+/*
+ * ACRN hypervisor service module (SRV): irqfd based on eventfd
+ *
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ * Liu Shuo <shuo.a.liu@intel.com>
+ * Zhao Yakui <yakui.zhao@intel.com>
+ */
+
+#include <linux/device.h>
+#include <linux/wait.h>
+#include <linux/poll.h>
+#include <linux/file.h>
+#include <linux/list.h>
+#include <linux/eventfd.h>
+#include <linux/kernel.h>
+#include <linux/async.h>
+#include <linux/slab.h>
+
+#include <linux/acrn/acrn_ioctl_defs.h>
+#include <linux/acrn/acrn_drv.h>
+
+#include "acrn_drv_internal.h"
+#include "acrn_hypercall.h"
+
+static LIST_HEAD(acrn_irqfd_clients);
+static DEFINE_MUTEX(acrn_irqfds_mutex);
+
+/* instance to bind irqfds of each VM */
+struct acrn_irqfd_info {
+	struct list_head list;
+	int refcnt;
+	/* vmid of VM */
+	unsigned short  vmid;
+	/* workqueue for async shutdown work */
+	struct workqueue_struct *wq;
+
+	/* the lock to protect the irqfds list */
+	spinlock_t irqfds_lock;
+	/* irqfds in this instance */
+	struct list_head irqfds;
+};
+
+/* use internally to record properties of each irqfd */
+struct acrn_hsm_irqfd {
+	/* acrn_irqfd_info which this irqfd belongs to */
+	struct acrn_irqfd_info *info;
+	/* wait queue node */
+	wait_queue_entry_t wait;
+	/* async shutdown work */
+	struct work_struct shutdown;
+	/* eventfd of this irqfd */
+	struct eventfd_ctx *eventfd;
+	/* list to link all irqfds together */
+	struct list_head list;
+	/* poll_table of this irqfd */
+	poll_table pt;
+	/* msi to send when this irqfd is triggered */
+	struct acrn_msi_entry msi;
+};
+
+static struct acrn_irqfd_info *get_irqfd_info_by_vm(uint16_t vmid)
+{
+	struct acrn_irqfd_info *info = NULL;
+
+	mutex_lock(&acrn_irqfds_mutex);
+	list_for_each_entry(info, &acrn_irqfd_clients, list) {
+		if (info->vmid == vmid) {
+			info->refcnt++;
+			mutex_unlock(&acrn_irqfds_mutex);
+			return info;
+		}
+	}
+	mutex_unlock(&acrn_irqfds_mutex);
+	return NULL;
+}
+
+static void put_irqfd_info(struct acrn_irqfd_info *info)
+{
+	mutex_lock(&acrn_irqfds_mutex);
+	info->refcnt--;
+	if (info->refcnt == 0) {
+		list_del(&info->list);
+		kfree(info);
+	}
+	mutex_unlock(&acrn_irqfds_mutex);
+}
+
+static void acrn_irqfd_inject(struct acrn_hsm_irqfd *irqfd)
+{
+	struct acrn_irqfd_info *info = irqfd->info;
+
+	acrn_inject_msi(info->vmid, irqfd->msi.msi_addr,
+			irqfd->msi.msi_data);
+}
+
+/*
+ * Check whether the irqfd is still in the list info->irqfds
+ *
+ * assumes info->irqfds_lock is held
+ */
+static bool acrn_irqfd_is_active(struct acrn_irqfd_info *info,
+				 struct acrn_hsm_irqfd *irqfd)
+{
+	struct acrn_hsm_irqfd *_irqfd;
+
+	list_for_each_entry(_irqfd, &info->irqfds, list)
+		if (_irqfd == irqfd)
+			return true;
+
+	return false;
+}
+
+/*
+ * Remove irqfd and free it.
+ *
+ * assumes info->irqfds_lock is held
+ */
+static void acrn_irqfd_shutdown(struct acrn_hsm_irqfd *irqfd)
+{
+	u64 cnt;
+
+	/* remove from wait queue */
+	list_del_init(&irqfd->list);
+	eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
+	eventfd_ctx_put(irqfd->eventfd);
+	kfree(irqfd);
+}
+
+static void acrn_irqfd_shutdown_work(struct work_struct *work)
+{
+	struct acrn_hsm_irqfd *irqfd =
+		container_of(work, struct acrn_hsm_irqfd, shutdown);
+	struct acrn_irqfd_info *info = irqfd->info;
+
+	spin_lock(&info->irqfds_lock);
+	if (acrn_irqfd_is_active(info, irqfd))
+		acrn_irqfd_shutdown(irqfd);
+	spin_unlock(&info->irqfds_lock);
+}
+
+/*
+ * Called with wqh->lock held and interrupts disabled
+ */
+static int acrn_irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode,
+			     int sync, void *key)
+{
+	struct acrn_hsm_irqfd *irqfd =
+		container_of(wait, struct acrn_hsm_irqfd, wait);
+	unsigned long poll_bits = (unsigned long)key;
+	struct acrn_irqfd_info *info = irqfd->info;
+
+	if (poll_bits & POLLIN)
+		/* An event has been signaled, inject an interrupt */
+		acrn_irqfd_inject(irqfd);
+
+	if (poll_bits & POLLHUP)
+		/* close eventfd asynchronously as shutdown needs to hold wqh->lock */
+		queue_work(info->wq, &irqfd->shutdown);
+
+	return 0;
+}
+
+static void acrn_irqfd_poll_func(struct file *file, wait_queue_head_t *wqh,
+				 poll_table *pt)
+{
+	struct acrn_hsm_irqfd *irqfd =
+		container_of(pt, struct acrn_hsm_irqfd, pt);
+	add_wait_queue(wqh, &irqfd->wait);
+}
+
+static
+int acrn_irqfd_assign(struct acrn_irqfd_info *info, struct acrn_irqfd *args)
+{
+	struct acrn_hsm_irqfd *irqfd, *tmp;
+	struct fd f;
+	struct eventfd_ctx *eventfd = NULL;
+	int ret = 0;
+	unsigned int events;
+
+	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
+	if (!irqfd)
+		return -ENOMEM;
+
+	irqfd->info = info;
+	memcpy(&irqfd->msi, &args->msi, sizeof(args->msi));
+	INIT_LIST_HEAD(&irqfd->list);
+	INIT_WORK(&irqfd->shutdown, acrn_irqfd_shutdown_work);
+
+	f = fdget(args->fd);
+	if (!f.file) {
+		ret = -EBADF;
+		goto out;
+	}
+
+	eventfd = eventfd_ctx_fileget(f.file);
+	if (IS_ERR(eventfd)) {
+		ret = PTR_ERR(eventfd);
+		goto fail;
+	}
+
+	irqfd->eventfd = eventfd;
+
+	/*
+	 * Install our own custom wake-up handling so we are notified via
+	 * a callback whenever someone signals the underlying eventfd
+	 */
+	init_waitqueue_func_entry(&irqfd->wait, acrn_irqfd_wakeup);
+	init_poll_funcptr(&irqfd->pt, acrn_irqfd_poll_func);
+
+	spin_lock(&info->irqfds_lock);
+
+	list_for_each_entry(tmp, &info->irqfds, list) {
+		if (irqfd->eventfd != tmp->eventfd)
+			continue;
+		/* This fd is used for another irq already. */
+		ret = -EBUSY;
+		spin_unlock(&info->irqfds_lock);
+		goto fail;
+	}
+	list_add_tail(&irqfd->list, &info->irqfds);
+
+	spin_unlock(&info->irqfds_lock);
+
+	/* Check the pending event in this stage */
+	events = f.file->f_op->poll(f.file, &irqfd->pt);
+
+	if (events & POLLIN)
+		acrn_irqfd_inject(irqfd);
+
+	fdput(f);
+
+	return 0;
+fail:
+	if (eventfd && !IS_ERR(eventfd))
+		eventfd_ctx_put(eventfd);
+
+	fdput(f);
+out:
+	kfree(irqfd);
+	return ret;
+}
+
+static int acrn_irqfd_deassign(struct acrn_irqfd_info *info,
+			       struct acrn_irqfd *args)
+{
+	struct acrn_hsm_irqfd *irqfd, *tmp;
+	struct eventfd_ctx *eventfd;
+
+	eventfd = eventfd_ctx_fdget(args->fd);
+	if (IS_ERR(eventfd))
+		return PTR_ERR(eventfd);
+
+	spin_lock(&info->irqfds_lock);
+
+	list_for_each_entry_safe(irqfd, tmp, &info->irqfds, list) {
+		if (irqfd->eventfd == eventfd) {
+			acrn_irqfd_shutdown(irqfd);
+			break;
+		}
+	}
+
+	spin_unlock(&info->irqfds_lock);
+	eventfd_ctx_put(eventfd);
+
+	return 0;
+}
+
+int acrn_irqfd_config(unsigned short vmid, struct acrn_irqfd *args)
+{
+	struct acrn_irqfd_info *info;
+	int ret;
+
+	info = get_irqfd_info_by_vm(vmid);
+	if (!info)
+		return -ENOENT;
+
+	if (args->flags & ACRN_IRQFD_FLAG_DEASSIGN)
+		ret = acrn_irqfd_deassign(info, args);
+	else
+		ret = acrn_irqfd_assign(info, args);
+
+	put_irqfd_info(info);
+	return ret;
+}
+
+int acrn_irqfd_init(unsigned short vmid)
+{
+	struct acrn_irqfd_info *info;
+
+	info = get_irqfd_info_by_vm(vmid);
+	if (info) {
+		put_irqfd_info(info);
+		return -EEXIST;
+	}
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+	info->vmid = vmid;
+	info->refcnt = 1;
+	INIT_LIST_HEAD(&info->irqfds);
+	spin_lock_init(&info->irqfds_lock);
+
+	info->wq = alloc_workqueue("acrn_irqfd-%d", 0, 0, vmid);
+	if (!info->wq) {
+		kfree(info);
+		return -ENOMEM;
+	}
+
+	mutex_lock(&acrn_irqfds_mutex);
+	list_add(&info->list, &acrn_irqfd_clients);
+	mutex_unlock(&acrn_irqfds_mutex);
+
+	pr_info("ACRN HSM irqfd init done!\n");
+	return 0;
+}
+
+void acrn_irqfd_deinit(uint16_t vmid)
+{
+	struct acrn_hsm_irqfd *irqfd, *tmp;
+	struct acrn_irqfd_info *info;
+
+	info = get_irqfd_info_by_vm(vmid);
+	if (!info)
+		return;
+
+	put_irqfd_info(info);
+
+	destroy_workqueue(info->wq);
+
+	spin_lock(&info->irqfds_lock);
+	list_for_each_entry_safe(irqfd, tmp, &info->irqfds, list)
+		acrn_irqfd_shutdown(irqfd);
+	spin_unlock(&info->irqfds_lock);
+
+	/* put one more to release it */
+	put_irqfd_info(info);
+}
diff --git a/drivers/staging/acrn/acrn_vm_mngt.c b/drivers/staging/acrn/acrn_vm_mngt.c
index 8de380c..13ed719 100644
--- a/drivers/staging/acrn/acrn_vm_mngt.c
+++ b/drivers/staging/acrn/acrn_vm_mngt.c
@@ -71,6 +71,8 @@ int acrn_vm_destroy(struct acrn_vm *vm)
 	if (test_and_set_bit(ACRN_VM_DESTROYED, &vm->flags))
 		return 0;
 
+	acrn_ioeventfd_deinit(vm->vmid);
+	acrn_irqfd_deinit(vm->vmid);
 	ret = hcall_destroy_vm(vm->vmid);
 	if (ret < 0) {
 		pr_warn("failed to destroy VM %d\n", vm->vmid);
@@ -88,7 +90,12 @@ int acrn_inject_msi(unsigned short vmid, unsigned long msi_addr,
 	struct acrn_msi_entry *msi;
 	int ret;
 
-	msi = kzalloc(sizeof(*msi), GFP_KERNEL);
+	/* acrn_inject_msi is called in acrn_irqfd_inject from eventfd_signal
+	 * with interrupts disabled.
+	 * So GFP_ATOMIC must be used instead of GFP_KERNEL to
+	 * avoid sleeping with interrupts disabled.
+	 */
+	msi = kzalloc(sizeof(*msi), GFP_ATOMIC);
 
 	if (!msi)
 		return -ENOMEM;
diff --git a/include/uapi/linux/acrn/acrn_ioctl_defs.h b/include/uapi/linux/acrn/acrn_ioctl_defs.h
index c762bd2..3a4f7c1 100644
--- a/include/uapi/linux/acrn/acrn_ioctl_defs.h
+++ b/include/uapi/linux/acrn/acrn_ioctl_defs.h
@@ -267,6 +267,25 @@ enum pm_cmd_type {
 	PMCMD_GET_CX_DATA,
 };
 
+#define ACRN_IOEVENTFD_FLAG_PIO		0x01
+#define ACRN_IOEVENTFD_FLAG_DATAMATCH	0x02
+#define ACRN_IOEVENTFD_FLAG_DEASSIGN	0x04
+struct acrn_ioeventfd {
+	int32_t fd;
+	uint32_t flags;
+	uint64_t addr;
+	uint32_t len;
+	uint32_t reserved;
+	uint64_t data;
+};
+
+#define ACRN_IRQFD_FLAG_DEASSIGN	0x01
+struct acrn_irqfd {
+	int32_t fd;
+	uint32_t flags;
+	struct acrn_msi_entry msi;
+};
+
 /*
  * Common IOCTL ID definition for DM
  */
@@ -317,4 +336,10 @@ enum pm_cmd_type {
 /* Power management */
 #define IC_ID_PM_BASE                   0x60UL
 #define IC_PM_GET_CPU_STATE            _IC_ID(IC_ID, IC_ID_PM_BASE + 0x00)
+
+/* VHM eventfd */
+#define IC_ID_EVENT_BASE		0x70UL
+#define IC_EVENT_IOEVENTFD		_IC_ID(IC_ID, IC_ID_EVENT_BASE + 0x00)
+#define IC_EVENT_IRQFD			_IC_ID(IC_ID, IC_ID_EVENT_BASE + 0x01)
+
 #endif /* __ACRN_IOCTL_DEFS_H__ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH 15/15] drivers/acrn: add the support of offline SOS cpu
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (13 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 14/15] drivers/acrn: add the support of irqfd and eventfd Zhao Yakui
@ 2019-08-16  2:25 ` Zhao Yakui
  2019-08-19 10:34   ` Dan Carpenter
  2019-08-16  6:39 ` [RFC PATCH 00/15] acrn: add the ACRN driver module Borislav Petkov
  15 siblings, 1 reply; 40+ messages in thread
From: Zhao Yakui @ 2019-08-16  2:25 UTC (permalink / raw)
  To: x86, linux-kernel, devel; +Cc: Zhao Yakui, Jason Chen CJ

The ACRN hypervisor works in partition mode. In that case the guest OS
and the domain0 kernel run on different CPUs. While booting, the domain0
kernel can use all the available CPUs, which speeds up the boot. After
the boot has finished, it needs to offline the other CPUs so that they
can be allocated to the guest OS.

Add a sysfs attribute "offline_cpu". Use
	echo cpu_id > /sys/class/acrn/acrn_hsm/offline_cpu
to issue the hypercall that offlines/destroys the corresponding vcpu.
Before doing so, the CPU is taken offline with the command below:
	echo 0 > /sys/devices/system/cpu/cpuX/online

Currently this is mainly used by the user-space device model before
booting other ACRN guests.

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/staging/acrn/acrn_dev.c | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 0602125..6868003 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -588,6 +588,41 @@ static const struct file_operations fops = {
 #define SUPPORT_HV_API_VERSION_MAJOR	1
 #define SUPPORT_HV_API_VERSION_MINOR	0
 
+static ssize_t
+offline_cpu_store(struct device *dev,
+			struct device_attribute *attr,
+			const char *buf, size_t count)
+{
+#ifdef CONFIG_X86
+	u64 cpu, lapicid;
+
+	if (kstrtoull(buf, 0, &cpu) < 0)
+		return -EINVAL;
+
+	if (cpu_possible(cpu)) {
+		lapicid = cpu_data(cpu).apicid;
+		pr_info("acrn: try to offline cpu %lld with lapicid %lld\n",
+				cpu, lapicid);
+		if (hcall_sos_offline_cpu(lapicid) < 0) {
+			pr_err("acrn: failed to offline cpu from Hypervisor!\n");
+			return -EINVAL;
+		}
+	}
+#endif
+	return count;
+}
+
+static DEVICE_ATTR(offline_cpu, 00200, NULL, offline_cpu_store);
+
+static struct attribute *acrn_attrs[] = {
+	&dev_attr_offline_cpu.attr,
+	NULL
+};
+
+static struct attribute_group acrn_attr_group = {
+	.attrs = acrn_attrs,
+};
+
 static int __init acrn_init(void)
 {
 	unsigned long flag;
@@ -647,6 +682,15 @@ static int __init acrn_init(void)
 		return PTR_ERR(acrn_device);
 	}
 
+	if (sysfs_create_group(&acrn_device->kobj, &acrn_attr_group)) {
+		pr_warn("acrn: sysfs create failed\n");
+		device_destroy(acrn_class, MKDEV(major, 0));
+		class_unregister(acrn_class);
+		class_destroy(acrn_class);
+		unregister_chrdev(major, DEVICE_NAME);
+		return -EINVAL;
+	}
+
 	tasklet_init(&acrn_io_req_tasklet, io_req_tasklet, 0);
 	local_irq_save(flag);
 	acrn_setup_intr_irq(acrn_intr_handler);
@@ -664,6 +708,7 @@ static void __exit acrn_exit(void)
 
 	tasklet_kill(&acrn_io_req_tasklet);
 	acrn_remove_intr_irq();
+	sysfs_remove_group(&acrn_device->kobj, &acrn_attr_group);
 	device_destroy(acrn_class, MKDEV(major, 0));
 	class_unregister(acrn_class);
 	class_destroy(acrn_class);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/15] acrn: add the ACRN driver module
  2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
                   ` (14 preceding siblings ...)
  2019-08-16  2:25 ` [RFC PATCH 15/15] drivers/acrn: add the support of offline SOS cpu Zhao Yakui
@ 2019-08-16  6:39 ` Borislav Petkov
  2019-08-16  7:03   ` Greg KH
  2019-08-19  1:44   ` Zhao, Yakui
  15 siblings, 2 replies; 40+ messages in thread
From: Borislav Petkov @ 2019-08-16  6:39 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: x86, linux-kernel, devel

On Fri, Aug 16, 2019 at 10:25:41AM +0800, Zhao Yakui wrote:
> The first three patches are the changes under x86/acrn, which adds the
> required APIs for the driver and reports the X2APIC caps. 
> The remaining patches add the ACRN driver module, which accepts the ioctl
> from user-space and then communicate with the low-level ACRN hypervisor
> by using hypercall.

I have a problem with that: you're adding interfaces to arch/x86/ and
its users go into staging. Why? Why not directly put the driver where
it belongs, clean it up properly and submit it like everything else is
submitted?

I don't want to have stuff in arch/x86/ which is used solely by code in
staging and the latter is lingering there indefinitely because no one is
cleaning it up...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/15] acrn: add the ACRN driver module
  2019-08-16  6:39 ` [RFC PATCH 00/15] acrn: add the ACRN driver module Borislav Petkov
@ 2019-08-16  7:03   ` Greg KH
  2019-08-19  2:39     ` Zhao, Yakui
  2019-08-19  1:44   ` Zhao, Yakui
  1 sibling, 1 reply; 40+ messages in thread
From: Greg KH @ 2019-08-16  7:03 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: Borislav Petkov, devel, x86, linux-kernel

On Fri, Aug 16, 2019 at 08:39:25AM +0200, Borislav Petkov wrote:
> On Fri, Aug 16, 2019 at 10:25:41AM +0800, Zhao Yakui wrote:
> > The first three patches are the changes under x86/acrn, which adds the
> > required APIs for the driver and reports the X2APIC caps. 
> > The remaining patches add the ACRN driver module, which accepts the ioctl
> > from user-space and then communicate with the low-level ACRN hypervisor
> > by using hypercall.
> 
> I have a problem with that: you're adding interfaces to arch/x86/ and
> its users go into staging. Why? Why not directly put the driver where
> it belongs, clean it up properly and submit it like everything else is
> submitted?
> 
> I don't want to have stuff in arch/x86/ which is used solely by code in
> staging and the latter is lingering there indefinitely because no one is
> cleaning it up...

I agree, stuff in drivers/staging/ must be self-contained, with no
changes outside of the code's subdirectory needed in order for it to
work.  That way it is trivial for us to delete it when it never gets
cleaned up :)

You never say _why_ this should go into drivers/staging/, nor do you
have a TODO file like all other staging code that explains exactly what
needs to be done to get it out of there.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver
  2019-08-16  2:25 ` [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver Zhao Yakui
@ 2019-08-16  7:05   ` Greg KH
  2019-08-19  4:02     ` Zhao, Yakui
  2019-08-16 11:28   ` Dan Carpenter
  1 sibling, 1 reply; 40+ messages in thread
From: Greg KH @ 2019-08-16  7:05 UTC (permalink / raw)
  To: Zhao Yakui
  Cc: x86, linux-kernel, devel, Mingqiang Chi, Jack Ren, Jason Chen CJ,
	Liu Shuo

On Fri, Aug 16, 2019 at 10:25:45AM +0800, Zhao Yakui wrote:
> ACRN hypervisor service module is the important middle layer that allows
> the Linux kernel to communicate with the ACRN hypervisor. It includes
> the management of virtualized CPU/memory/device/interrupt for other ACRN
> guest. The user-space applications can use the provided ACRN ioctls to
> interact with ACRN hypervisor through different hypercalls.
> 
> Add one basic framework firstly and the following patches will
> add the corresponding implementations, which includes the management of
> virtualized CPU/memory/interrupt and the emulation of MMIO/IO/PCI access.
> The device file of /dev/acrn_hsm can be accessed in user-space to
> communicate with ACRN module.
> 
> Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
> Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
> Co-developed-by: Jack Ren <jack.ren@intel.com>
> Signed-off-by: Jack Ren <jack.ren@intel.com>
> Co-developed-by: Mingqiang Chi <mingqiang.chi@intel.com>
> Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
> Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
> Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
> ---
>  drivers/staging/Kconfig         |   2 +

Also, your subject line for all of these patches is wrong, it is not
drivers/acrn :(

And you forgot to cc: the staging maintainer :(

As I have said with NUMEROUS Intel patches in the past, I now refuse to
take patches from you all WITHOUT having it signed-off-by someone from
the Intel "OTC" group (or whatever the Intel Linux group is called these
days).  They are a resource you can not ignore, and if you do, you just
end up making the rest of the kernel community grumpy by having us do
their work for them :(

Please work with them.

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver
  2019-08-16  2:25 ` [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver Zhao Yakui
  2019-08-16  7:05   ` Greg KH
@ 2019-08-16 11:28   ` Dan Carpenter
  1 sibling, 0 replies; 40+ messages in thread
From: Dan Carpenter @ 2019-08-16 11:28 UTC (permalink / raw)
  To: Zhao Yakui
  Cc: x86, linux-kernel, devel, Mingqiang Chi, Jack Ren, Jason Chen CJ,
	Liu Shuo

On Fri, Aug 16, 2019 at 10:25:45AM +0800, Zhao Yakui wrote:
> +static
> +int acrn_dev_open(struct inode *inodep, struct file *filep)
> +{
> +	pr_info("%s: opening device node\n", __func__);
> +
> +	return 0;
> +}
> +
> +static
> +long acrn_dev_ioctl(struct file *filep,
> +		    unsigned int ioctl_num, unsigned long ioctl_param)
> +{
> +	long ret = 0;
> +
> +	return ret;


This module is mostly stubs and debugging printks...  :(

I looked ahead in the patch series to see if we do something with the
stubs later on and it turns out we do.  Fold the two patches together so
that we don't have to review patches like this one.  Each patch should
do "one thing" which makes sense and can be reviewed independently.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
  2019-08-16  2:25 ` [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN " Zhao Yakui
@ 2019-08-16 12:58   ` Dan Carpenter
  2019-08-19  5:32     ` Zhao, Yakui
  0 siblings, 1 reply; 40+ messages in thread
From: Dan Carpenter @ 2019-08-16 12:58 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: x86, linux-kernel, devel, Li, Fei, Jason Chen CJ, Liu Shuo

On Fri, Aug 16, 2019 at 10:25:49AM +0800, Zhao Yakui wrote:
> +int hugepage_map_guest(struct acrn_vm *vm, struct vm_memmap *memmap)
> +{
> +	struct page *page = NULL, *regions_buf_pg = NULL;
> +	unsigned long len, guest_gpa, vma;
> +	struct vm_memory_region *region_array;
> +	struct set_regions *regions;
> +	int max_size = PAGE_SIZE / sizeof(struct vm_memory_region);
> +	int ret;
> +
> +	if (!vm || !memmap)
> +		return -EINVAL;
> +
> +	len = memmap->len;
> +	vma = memmap->vma_base;
> +	guest_gpa = memmap->gpa;
> +
> +	/* prepare set_memory_regions info */
> +	regions_buf_pg = alloc_page(GFP_KERNEL);
> +	if (!regions_buf_pg)
> +		return -ENOMEM;
> +
> +	regions = kzalloc(sizeof(*regions), GFP_KERNEL);
> +	if (!regions) {
> +		__free_page(regions_buf_pg);
> +		return -ENOMEM;

It's better to do a goto err_free_regions_buf here.  More comments
below.

> +	}
> +	regions->mr_num = 0;
> +	regions->vmid = vm->vmid;
> +	regions->regions_gpa = page_to_phys(regions_buf_pg);
> +	region_array = page_to_virt(regions_buf_pg);
> +
> +	while (len > 0) {
> +		unsigned long vm0_gpa, pagesize;
> +
> +		ret = get_user_pages_fast(vma, 1, 1, &page);
> +		if (unlikely(ret != 1) || (!page)) {
> +			pr_err("failed to pin huge page!\n");
> +			ret = -ENOMEM;
> +			goto err;

goto err is a red flag.  It's better if error labels do one specific
named thing like:

err_regions:
	kfree(regions);
err_free_regions_buf:
	__free_page(regions_buf_pg);

We should unwind in the opposite/mirror order from how things were
allocated.  Then we can remove the if statements in the error handling.

In this situation, say the user triggers an -EFAULT in
get_user_pages_fast() in the second iteration through the loop.  That
means that "page" is the non-NULL page from the previous iteration.  We
have already added it to add_guest_map().  But now we're freeing it
without removing it from the map so probably it leads to a use after
free.

The best way to write the error handling in a loop like this is to
clean up the partial iteration that has succeeded (nothing here), and then
unwind all the successful iterations at the bottom of the function.
"goto unwind_loop;"

> +		}
> +
> +		vm0_gpa = page_to_phys(page);
> +		pagesize = PAGE_SIZE << compound_order(page);
> +
> +		ret = add_guest_map(vm, vm0_gpa, guest_gpa, pagesize);
> +		if (ret < 0) {
> +			pr_err("failed to add memseg for huge page!\n");
> +			goto err;

So then here, it would be:

			pr_err("failed to add memseg for huge page!\n");
			put_page(page);
			goto unwind_loop;

regards,
dan carpenter

> +		}
> +
> +		/* fill each memory region into region_array */
> +		region_array[regions->mr_num].type = MR_ADD;
> +		region_array[regions->mr_num].gpa = guest_gpa;
> +		region_array[regions->mr_num].vm0_gpa = vm0_gpa;
> +		region_array[regions->mr_num].size = pagesize;
> +		region_array[regions->mr_num].prot =
> +				(MEM_TYPE_WB & MEM_TYPE_MASK) |
> +				(memmap->prot & MEM_ACCESS_RIGHT_MASK);
> +		regions->mr_num++;
> +		if (regions->mr_num == max_size) {
> +			pr_debug("region buffer full, set & renew regions!\n");
> +			ret = set_memory_regions(regions);
> +			if (ret < 0) {
> +				pr_err("failed to set regions,ret=%d!\n", ret);
> +				goto err;
> +			}
> +			regions->mr_num = 0;
> +		}
> +
> +		len -= pagesize;
> +		vma += pagesize;
> +		guest_gpa += pagesize;
> +	}
> +
> +	ret = set_memory_regions(regions);
> +	if (ret < 0) {
> +		pr_err("failed to set regions, ret=%d!\n", ret);
> +		goto err;
> +	}
> +
> +	__free_page(regions_buf_pg);
> +	kfree(regions);
> +
> +	return 0;
> +err:
> +	if (regions_buf_pg)
> +		__free_page(regions_buf_pg);
> +	if (page)
> +		put_page(page);
> +	kfree(regions);
> +	return ret;
> +}
> +
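
A minimal userspace sketch of the unwinding pattern described in the review above, under stated assumptions: pin_page(), unpin_page(), map_pages(), unmap_all() and the fail_at parameter are hypothetical stand-ins (malloc/free play the role of get_user_pages_fast()/put_page()), not the driver's code. It only illustrates labels that each undo one thing, released in the reverse order of allocation, with the partial iteration cleaned up before the earlier iterations are unwound.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_PAGES 8

/* Hypothetical bookkeeping: pages that are currently "pinned". */
static void *mapped[MAX_PAGES];
static int nr_mapped;
static int live_pages;

/* Stand-in for get_user_pages_fast(): returns NULL on failure. */
static void *pin_page(void)
{
	void *page = malloc(64);

	if (page)
		live_pages++;
	return page;
}

/* Stand-in for put_page(). */
static void unpin_page(void *page)
{
	free(page);
	live_pages--;
}

/*
 * Map `count` pages. `fail_at` injects a failure in that iteration
 * (pass -1 for none), playing the role of a failing add_guest_map().
 */
int map_pages(int count, int fail_at)
{
	int *regions_buf;
	int *regions;
	int i, ret;

	if (count < 0 || count > MAX_PAGES || nr_mapped != 0)
		return -EINVAL;

	regions_buf = malloc(sizeof(*regions_buf) * MAX_PAGES);
	if (!regions_buf)
		return -ENOMEM;

	regions = calloc(1, sizeof(*regions));
	if (!regions) {
		ret = -ENOMEM;
		goto err_free_regions_buf;
	}

	for (i = 0; i < count; i++) {
		void *page = pin_page();

		if (!page) {
			ret = -ENOMEM;
			goto unwind_loop;
		}
		if (i == fail_at) {
			/* undo the partial iteration first */
			unpin_page(page);
			ret = -EFAULT;
			goto unwind_loop;
		}
		mapped[i] = page;
		regions_buf[i] = i;
		(*regions)++;
	}

	/* success: the temporary buffers go away, the pages stay pinned */
	nr_mapped = count;
	free(regions);
	free(regions_buf);
	return 0;

unwind_loop:
	/* release every page pinned by the iterations that succeeded */
	while (i-- > 0)
		unpin_page(mapped[i]);
	free(regions);
err_free_regions_buf:
	free(regions_buf);
	return ret;
}

/* Release everything a successful map_pages() left pinned. */
void unmap_all(void)
{
	while (nr_mapped > 0)
		unpin_page(mapped[--nr_mapped]);
}
```

Because each label undoes exactly one allocation, the error path needs no "if (ptr)" checks, and a failure in a later loop iteration cannot release a page that is still referenced by an earlier, successful one.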


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 09/15] drivers/acrn: add passthrough device support
  2019-08-16  2:25 ` [RFC PATCH 09/15] drivers/acrn: add passthrough device support Zhao Yakui
@ 2019-08-16 13:05   ` Dan Carpenter
  0 siblings, 0 replies; 40+ messages in thread
From: Dan Carpenter @ 2019-08-16 13:05 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: x86, linux-kernel, devel, Gao, Shiqing, Jason Chen CJ

On Fri, Aug 16, 2019 at 10:25:50AM +0800, Zhao Yakui wrote:
> +	case IC_ASSIGN_PTDEV: {
> +		unsigned short bdf;
> +
> +		if (copy_from_user(&bdf, (void *)ioctl_param,

This casting is ugly and you also need a __user tag for Sparse.  Do
something like "void __user *p = ioctl_param;"

> +				   sizeof(unsigned short)))
> +			return -EFAULT;
> +
> +		ret = hcall_assign_ptdev(vm->vmid, bdf);
> +		if (ret < 0) {
> +			pr_err("acrn: failed to assign ptdev!\n");
> +			return -EFAULT;

Preserve the error code.  "return ret;".

> +		}
> +		break;
> +	}
> +	case IC_DEASSIGN_PTDEV: {
> +		unsigned short bdf;
> +
> +		if (copy_from_user(&bdf, (void *)ioctl_param,
> +				   sizeof(unsigned short)))
> +			return -EFAULT;
> +
> +		ret = hcall_deassign_ptdev(vm->vmid, bdf);
> +		if (ret < 0) {
> +			pr_err("acrn: failed to deassign ptdev!\n");
> +			return -EFAULT;
> +		}
> +		break;
> +	}
> +
> +	case IC_SET_PTDEV_INTR_INFO: {
> +		struct ic_ptdev_irq ic_pt_irq;
> +		struct hc_ptdev_irq *hc_pt_irq;
> +
> +		if (copy_from_user(&ic_pt_irq, (void *)ioctl_param,
> +				   sizeof(ic_pt_irq)))
> +			return -EFAULT;
> +
> +		hc_pt_irq = kmalloc(sizeof(*hc_pt_irq), GFP_KERNEL);
> +		if (!hc_pt_irq)
> +			return -ENOMEM;
> +
> +		memcpy(hc_pt_irq, &ic_pt_irq, sizeof(*hc_pt_irq));

Use memdup_user().

> +
> +		ret = hcall_set_ptdev_intr_info(vm->vmid,
> +						virt_to_phys(hc_pt_irq));
> +		kfree(hc_pt_irq);
> +		if (ret < 0) {
> +			pr_err("acrn: failed to set intr info for ptdev!\n");
> +			return -EFAULT;
> +		}
> +
> +		break;
> +	}

regards,
dan carpenter
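
A small userspace sketch of the "preserve the error code" remark above. fake_hcall_assign() is a hypothetical stand-in for a hypercall wrapper such as hcall_assign_ptdev(), and the specific errno values it returns are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a hypercall wrapper; error values are assumed. */
static long fake_hcall_assign(int bdf)
{
	if (bdf < 0)
		return -ENODEV;	/* no such device */
	if (bdf == 0)
		return -EBUSY;	/* already assigned */
	return 0;
}

/* Lossy: every failure is reported as -EFAULT, as in the quoted hunk. */
long assign_lossy(int bdf)
{
	if (fake_hcall_assign(bdf) < 0)
		return -EFAULT;
	return 0;
}

/* Preserving: the caller sees why the call failed. */
long assign_preserving(int bdf)
{
	long ret = fake_hcall_assign(bdf);

	if (ret < 0)
		return ret;
	return 0;
}
```

With the lossy variant, user space cannot tell a busy device from a missing one; returning ret unchanged keeps that distinction.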

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 10/15] drivers/acrn: add interrupt injection support
  2019-08-16  2:25 ` [RFC PATCH 10/15] drivers/acrn: add interrupt injection support Zhao Yakui
@ 2019-08-16 13:12   ` Dan Carpenter
  2019-08-19  4:59     ` Zhao, Yakui
  0 siblings, 1 reply; 40+ messages in thread
From: Dan Carpenter @ 2019-08-16 13:12 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: x86, linux-kernel, devel, Mingqiang Chi, Jason Chen CJ

On Fri, Aug 16, 2019 at 10:25:51AM +0800, Zhao Yakui wrote:
> +	case IC_VM_INTR_MONITOR: {
> +		struct page *page;
> +
> +		ret = get_user_pages_fast(ioctl_param, 1, 1, &page);
> +		if (unlikely(ret != 1) || !page) {
                                       ^^^^^^^^
Not required.

> +			pr_err("acrn-dev: failed to pin intr hdr buffer!\n");
> +			return -ENOMEM;
> +		}
> +
> +		ret = hcall_vm_intr_monitor(vm->vmid, page_to_phys(page));
> +		if (ret < 0) {
> +			pr_err("acrn-dev: monitor intr data err=%ld\n", ret);
> +			return -EFAULT;
> +		}
> +		break;
> +	}
> +

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 11/15] drivers/acrn: add the support of handling emulated ioreq
  2019-08-16  2:25 ` [RFC PATCH 11/15] drivers/acrn: add the support of handling emulated ioreq Zhao Yakui
@ 2019-08-16 13:39   ` Dan Carpenter
  2019-08-19  4:54     ` Zhao, Yakui
  0 siblings, 1 reply; 40+ messages in thread
From: Dan Carpenter @ 2019-08-16 13:39 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: x86, linux-kernel, devel, Yin FengWei, Jason Chen CJ, Liu Shuo

On Fri, Aug 16, 2019 at 10:25:52AM +0800, Zhao Yakui wrote:
> +int acrn_ioreq_create_client(unsigned short vmid,
> +			     ioreq_handler_t handler,
> +			     void *client_priv,
> +			     char *name)
> +{
> +	struct acrn_vm *vm;
> +	struct ioreq_client *client;
> +	int client_id;
> +
> +	might_sleep();
> +
> +	vm = find_get_vm(vmid);
> +	if (unlikely(!vm || !vm->req_buf)) {
> +		pr_err("acrn-ioreq: failed to find vm from vmid %d\n", vmid);
> +		put_vm(vm);
> +		return -EINVAL;
> +	}
> +
> +	client_id = alloc_client();
> +	if (unlikely(client_id < 0)) {
> +		pr_err("acrn-ioreq: vm[%d] failed to alloc ioreq client\n",
> +		       vmid);
> +		put_vm(vm);
> +		return -EINVAL;
> +	}
> +
> +	client = acrn_ioreq_get_client(client_id);
> +	if (unlikely(!client)) {
> +		pr_err("failed to get the client.\n");
> +		put_vm(vm);
> +		return -EINVAL;

Do we need to clean up the alloc_client() allocation?

regards,
dan carpenter

> +	}
> +
> +	if (handler) {
> +		client->handler = handler;
> +		client->acrn_create_kthread = true;
> +	}
> +
> +	client->ref_vm = vm;
> +	client->vmid = vmid;
> +	client->client_priv = client_priv;
> +	if (name)
> +		strncpy(client->name, name, sizeof(client->name) - 1);
> +	rwlock_init(&client->range_lock);
> +	INIT_LIST_HEAD(&client->range_list);
> +	init_waitqueue_head(&client->wq);
> +
> +	/* When it is added to ioreq_client_list, the refcnt is increased */
> +	spin_lock_bh(&vm->ioreq_client_lock);
> +	list_add(&client->list, &vm->ioreq_client_list);
> +	spin_unlock_bh(&vm->ioreq_client_lock);
> +
> +	pr_info("acrn-ioreq: created ioreq client %d\n", client_id);
> +
> +	return client_id;
> +}


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/15] acrn: add the ACRN driver module
  2019-08-16  6:39 ` [RFC PATCH 00/15] acrn: add the ACRN driver module Borislav Petkov
  2019-08-16  7:03   ` Greg KH
@ 2019-08-19  1:44   ` Zhao, Yakui
  2019-08-19  5:25     ` Greg KH
  2019-08-19  6:18     ` Borislav Petkov
  1 sibling, 2 replies; 40+ messages in thread
From: Zhao, Yakui @ 2019-08-19  1:44 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-kernel, devel



On 2019-08-16 14:39, Borislav Petkov wrote:
> On Fri, Aug 16, 2019 at 10:25:41AM +0800, Zhao Yakui wrote:
>> The first three patches are the changes under x86/acrn, which adds the
>> required APIs for the driver and reports the X2APIC caps.
>> The remaining patches add the ACRN driver module, which accepts the ioctl
>> from user-space and then communicate with the low-level ACRN hypervisor
>> by using hypercall.
> 
> I have a problem with that: you're adding interfaces to arch/x86/ and
> its users go into staging. Why? Why not directly put the driver where
> it belongs, clean it up properly and submit it like everything else is
> submitted?

Thanks for your reply and the concern.

After looking at several driver examples (gma500, android), it seems
that they were first added under drivers/staging/XXX and then moved to
drivers/XXX once the driver matured.
So we followed the same approach to upstream the ACRN driver.

If the new driver can be added without going through staging,
we will refine it and submit it through the normal process.
> 
> I don't want to have stuff in arch/x86/ which is used solely by code in
> staging and the latter is lingering there indefinitely because no one is
> cleaning it up...
> 

The ACRN driver will be the only user of the added APIs in x86/acrn.
Without the APIs in x86/acrn, the driver can't add the driver-specific
upcall notification ISR or make hypercalls.

Could it be sent as two patch sets?
The first would add the required APIs for the ACRN driver.
The second would add the ACRN driver itself.

Thanks
    Yakui

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/15] acrn: add the ACRN driver module
  2019-08-16  7:03   ` Greg KH
@ 2019-08-19  2:39     ` Zhao, Yakui
  2019-08-19  5:25       ` Greg KH
  0 siblings, 1 reply; 40+ messages in thread
From: Zhao, Yakui @ 2019-08-19  2:39 UTC (permalink / raw)
  To: Greg KH; +Cc: Borislav Petkov, devel, x86, linux-kernel



On 2019-08-16 15:03, Greg KH wrote:
> On Fri, Aug 16, 2019 at 08:39:25AM +0200, Borislav Petkov wrote:
>> On Fri, Aug 16, 2019 at 10:25:41AM +0800, Zhao Yakui wrote:
>>> The first three patches are the changes under x86/acrn, which adds the
>>> required APIs for the driver and reports the X2APIC caps.
>>> The remaining patches add the ACRN driver module, which accepts the ioctl
>>> from user-space and then communicate with the low-level ACRN hypervisor
>>> by using hypercall.
>>
>> I have a problem with that: you're adding interfaces to arch/x86/ and
>> its users go into staging. Why? Why not directly put the driver where
>> it belongs, clean it up properly and submit it like everything else is
>> submitted?
>>
>> I don't want to have stuff in arch/x86/ which is used solely by code in
>> staging and the latter is lingering there indefinitely because no one is
>> cleaning it up...
> 
> I agree, stuff in drivers/staging/ must be self-contained, with no
> changes outside of the code's subdirectory needed in order for it to
> work.  That way it is trivial for us to delete it when it never gets
> cleaned up :)

Thanks for pointing out the rule for drivers/staging.
The ACRN staging driver is a self-contained driver, but it depends on
arch/x86/acrn and needs to call the APIs there.

Without the driver, the APIs would have no user and should not be added.
Without the APIs, the driver can't be compiled.
The ACRN driver is new, so it may still have bugs and not be mature.
That is why we wanted to add it under staging.

What is the better approach for handling this scenario?

> 
> You never say _why_ this should go into drivers/staging/, nor do you
> have a TODO file like all other staging code that explains exactly what
> needs to be done to get it out of there.

OK. A TODO file will be added in the next version.


> 
> thanks,
> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver
  2019-08-16  7:05   ` Greg KH
@ 2019-08-19  4:02     ` Zhao, Yakui
  2019-08-19  5:26       ` Greg KH
  0 siblings, 1 reply; 40+ messages in thread
From: Zhao, Yakui @ 2019-08-19  4:02 UTC (permalink / raw)
  To: Greg KH
  Cc: x86, linux-kernel, devel, Mingqiang Chi, Jack Ren, Jason Chen CJ,
	Liu Shuo



On 2019-08-16 15:05, Greg KH wrote:
> On Fri, Aug 16, 2019 at 10:25:45AM +0800, Zhao Yakui wrote:
>> ACRN hypervisor service module is the important middle layer that allows
>> the Linux kernel to communicate with the ACRN hypervisor. It includes
>> the management of virtualized CPU/memory/device/interrupt for other ACRN
>> guest. The user-space applications can use the provided ACRN ioctls to
>> interact with ACRN hypervisor through different hypercalls.
>>
>> Add one basic framework firstly and the following patches will
>> add the corresponding implementations, which includes the management of
>> virtualized CPU/memory/interrupt and the emulation of MMIO/IO/PCI access.
>> The device file of /dev/acrn_hsm can be accessed in user-space to
>> communicate with ACRN module.
>>
>> Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
>> Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
>> Co-developed-by: Jack Ren <jack.ren@intel.com>
>> Signed-off-by: Jack Ren <jack.ren@intel.com>
>> Co-developed-by: Mingqiang Chi <mingqiang.chi@intel.com>
>> Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
>> Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
>> Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
>> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
>> ---
>>   drivers/staging/Kconfig         |   2 +
> 
> Also, your subject line for all of these patches are wrong, it is not
> drivers/acrn :(

Thanks for pointing it out.

It will be fixed.

> 
> And you forgot to cc: the staging maintainer :(

Do you mean that the maintainer of the staging subsystem should also be
added in the patch commit log?


> 
> As I have said with NUMEROUS Intel patches in the past, I now refuse to
> take patches from you all WITHOUT having it signed-off-by someone from
> the Intel "OTC" group (or whatever the Intel Linux group is called these
> days).  They are a resource you can not ignore, and if you do, you just
> end up making the rest of the kernel community grumpy by having us do
> their work for them :(
> 
> Please work with them.

OK. I will work with some people in the OTC group to prepare a better
ACRN driver.

> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 11/15] drivers/acrn: add the support of handling emulated ioreq
  2019-08-16 13:39   ` Dan Carpenter
@ 2019-08-19  4:54     ` Zhao, Yakui
  0 siblings, 0 replies; 40+ messages in thread
From: Zhao, Yakui @ 2019-08-19  4:54 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: linux-kernel, devel, Yin FengWei, Jason Chen CJ, Liu Shuo



On 2019-08-16 21:39, Dan Carpenter wrote:
> On Fri, Aug 16, 2019 at 10:25:52AM +0800, Zhao Yakui wrote:
>> +int acrn_ioreq_create_client(unsigned short vmid,
>> +			     ioreq_handler_t handler,
>> +			     void *client_priv,
>> +			     char *name)
>> +{
>> +	struct acrn_vm *vm;
>> +	struct ioreq_client *client;
>> +	int client_id;
>> +
>> +	might_sleep();
>> +
>> +	vm = find_get_vm(vmid);
>> +	if (unlikely(!vm || !vm->req_buf)) {
>> +		pr_err("acrn-ioreq: failed to find vm from vmid %d\n", vmid);
>> +		put_vm(vm);
>> +		return -EINVAL;
>> +	}
>> +
>> +	client_id = alloc_client();
>> +	if (unlikely(client_id < 0)) {
>> +		pr_err("acrn-ioreq: vm[%d] failed to alloc ioreq client\n",
>> +		       vmid);
>> +		put_vm(vm);
>> +		return -EINVAL;
>> +	}
>> +
>> +	client = acrn_ioreq_get_client(client_id);
>> +	if (unlikely(!client)) {
>> +		pr_err("failed to get the client.\n");
>> +		put_vm(vm);
>> +		return -EINVAL;
> 
> Do we need to clean up the alloc_client() allocation?

Thanks for the review.

The function acrn_ioreq_get_client returns the client for the given
client_id (and also takes a reference on the client). If it returns
NULL, the client has already been released elsewhere.

In acrn_ioreq_create_client we don't need to clean up the
alloc_client allocation, as the client always exists while it is being
created.

> 
> regards,
> dan carpenter
> 
>> +	}
>> +
>> +	if (handler) {
>> +		client->handler = handler;
>> +		client->acrn_create_kthread = true;
>> +	}
>> +
>> +	client->ref_vm = vm;
>> +	client->vmid = vmid;
>> +	client->client_priv = client_priv;
>> +	if (name)
>> +		strncpy(client->name, name, sizeof(client->name) - 1);
>> +	rwlock_init(&client->range_lock);
>> +	INIT_LIST_HEAD(&client->range_list);
>> +	init_waitqueue_head(&client->wq);
>> +
>> +	/* When it is added to ioreq_client_list, the refcnt is increased */
>> +	spin_lock_bh(&vm->ioreq_client_lock);
>> +	list_add(&client->list, &vm->ioreq_client_list);
>> +	spin_unlock_bh(&vm->ioreq_client_lock);
>> +
>> +	pr_info("acrn-ioreq: created ioreq client %d\n", client_id);
>> +
>> +	return client_id;
>> +}
> 
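
For illustration, the cleanup asked about above could look like the
following user-space sketch. It is only a model under stated
assumptions: the client table, the free_client() helper, and the
simulate_failure flag are hypothetical and are not taken from the
posted patch. It shows the general pattern of releasing an allocated
slot when a later lookup fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

#define MAX_CLIENTS 4

/* Hypothetical stand-in for the driver's ioreq client table. */
struct client {
	bool in_use;
	int refcnt;
};

static struct client clients[MAX_CLIENTS];

/* Model of alloc_client(): reserve a free slot and return its id. */
static int alloc_client(void)
{
	for (int i = 0; i < MAX_CLIENTS; i++) {
		if (!clients[i].in_use) {
			clients[i].in_use = true;
			clients[i].refcnt = 0;
			return i;
		}
	}
	return -ENOMEM;
}

/* Hypothetical helper (not in the posted patch): release a slot. */
static void free_client(int id)
{
	clients[id].in_use = false;
}

/* Model of the lookup: take a reference, or return NULL on failure. */
static struct client *get_client(int id, bool simulate_failure)
{
	if (simulate_failure || id < 0 || id >= MAX_CLIENTS ||
	    !clients[id].in_use)
		return NULL;
	clients[id].refcnt++;
	return &clients[id];
}

/* Count slots currently reserved; used only to check for leaks. */
static int clients_in_use(void)
{
	int n = 0;

	for (int i = 0; i < MAX_CLIENTS; i++)
		n += clients[i].in_use ? 1 : 0;
	return n;
}

static int create_client(bool simulate_failure)
{
	struct client *client;
	int id;

	id = alloc_client();
	if (id < 0)
		return id;

	client = get_client(id, simulate_failure);
	if (!client) {
		free_client(id);	/* undo alloc_client() before returning */
		return -EINVAL;
	}
	return id;
}
```

Whether the real driver needs this depends on whether the lookup can
actually fail between allocation and use, which is the point under
discussion in this subthread.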

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 10/15] drivers/acrn: add interrupt injection support
  2019-08-16 13:12   ` Dan Carpenter
@ 2019-08-19  4:59     ` Zhao, Yakui
  0 siblings, 0 replies; 40+ messages in thread
From: Zhao, Yakui @ 2019-08-19  4:59 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: linux-kernel, devel, Mingqiang Chi, Jason Chen CJ



On 2019-08-16 21:12, Dan Carpenter wrote:
> On Fri, Aug 16, 2019 at 10:25:51AM +0800, Zhao Yakui wrote:
>> +	case IC_VM_INTR_MONITOR: {
>> +		struct page *page;
>> +
>> +		ret = get_user_pages_fast(ioctl_param, 1, 1, &page);
>> +		if (unlikely(ret != 1) || !page) {
>                                         ^^^^^^^^
> Not required.

Do you mean that checking "ret != 1" is enough?
OK. The "!page" check will be removed.


> 
>> +			pr_err("acrn-dev: failed to pin intr hdr buffer!\n");
>> +			return -ENOMEM;
>> +		}
>> +
>> +		ret = hcall_vm_intr_monitor(vm->vmid, page_to_phys(page));
>> +		if (ret < 0) {
>> +			pr_err("acrn-dev: monitor intr data err=%ld\n", ret);
>> +			return -EFAULT;
>> +		}
>> +		break;
>> +	}
>> +
> 
> regards,
> dan carpenter
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/15] acrn: add the ACRN driver module
  2019-08-19  1:44   ` Zhao, Yakui
@ 2019-08-19  5:25     ` Greg KH
  2019-08-19  5:39       ` Zhao, Yakui
  2019-08-19  6:18     ` Borislav Petkov
  1 sibling, 1 reply; 40+ messages in thread
From: Greg KH @ 2019-08-19  5:25 UTC (permalink / raw)
  To: Zhao, Yakui; +Cc: Borislav Petkov, devel, x86, linux-kernel

On Mon, Aug 19, 2019 at 09:44:25AM +0800, Zhao, Yakui wrote:
> 
> 
> On 2019-08-16 14:39, Borislav Petkov wrote:
> > On Fri, Aug 16, 2019 at 10:25:41AM +0800, Zhao Yakui wrote:
> > > The first three patches are the changes under x86/acrn, which adds the
> > > required APIs for the driver and reports the X2APIC caps.
> > > The remaining patches add the ACRN driver module, which accepts the ioctl
> > > from user-space and then communicate with the low-level ACRN hypervisor
> > > by using hypercall.
> > 
> > I have a problem with that: you're adding interfaces to arch/x86/ and
> > its users go into staging. Why? Why not directly put the driver where
> > it belongs, clean it up properly and submit it like everything else is
> > submitted?
> 
> Thanks for your reply and the concern.
> 
> After looking at several driver examples (gma500, android), it seems
> that they were first added under drivers/staging/XXX and then moved to
> drivers/XXX once the driver matured.
> So we followed the same approach to upstream the ACRN driver.

Those two examples are probably the worst examples to ever look at :)

The code quality of those submissions was horrible, gma500 took a very
long time to clean up and there are parts of the android code that are
still in staging to this day.

> If the new driver can be added without going through staging,
> we will refine it and submit it through the normal process.

That is the normal process, staging should not be needed at all for any
code.  It is a fall-back for when the company involved has no idea of
how to upstream their code, which should NOT be the case here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/15] acrn: add the ACRN driver module
  2019-08-19  2:39     ` Zhao, Yakui
@ 2019-08-19  5:25       ` Greg KH
  0 siblings, 0 replies; 40+ messages in thread
From: Greg KH @ 2019-08-19  5:25 UTC (permalink / raw)
  To: Zhao, Yakui; +Cc: devel, x86, Borislav Petkov, linux-kernel

On Mon, Aug 19, 2019 at 10:39:32AM +0800, Zhao, Yakui wrote:
> 
> 
> On 2019-08-16 15:03, Greg KH wrote:
> > On Fri, Aug 16, 2019 at 08:39:25AM +0200, Borislav Petkov wrote:
> > > On Fri, Aug 16, 2019 at 10:25:41AM +0800, Zhao Yakui wrote:
> > > > The first three patches are the changes under x86/acrn, which adds the
> > > > required APIs for the driver and reports the X2APIC caps.
> > > > The remaining patches add the ACRN driver module, which accepts the ioctl
> > > > from user-space and then communicate with the low-level ACRN hypervisor
> > > > by using hypercall.
> > > 
> > > I have a problem with that: you're adding interfaces to arch/x86/ and
> > > its users go into staging. Why? Why not directly put the driver where
> > > it belongs, clean it up properly and submit it like everything else is
> > > submitted?
> > > 
> > > I don't want to have stuff in arch/x86/ which is used solely by code in
> > > staging and the latter is lingering there indefinitely because no one is
> > > cleaning it up...
> > 
> > I agree, stuff in drivers/staging/ must be self-contained, with no
> > changes outside of the code's subdirectory needed in order for it to
> > work.  That way it is trivial for us to delete it when it never gets
> > cleaned up :)
> 
> Thanks for pointing out the rule for drivers/staging.
> The ACRN staging driver is a self-contained driver, but it depends on
> arch/x86/acrn and needs to call the APIs there.

Then it should not be in drivers/staging/.  Please work to get this
accepted "normally".

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver
  2019-08-19  4:02     ` Zhao, Yakui
@ 2019-08-19  5:26       ` Greg KH
  0 siblings, 0 replies; 40+ messages in thread
From: Greg KH @ 2019-08-19  5:26 UTC (permalink / raw)
  To: Zhao, Yakui
  Cc: devel, x86, linux-kernel, Jason Chen CJ, Jack Ren, Liu Shuo,
	Mingqiang Chi

On Mon, Aug 19, 2019 at 12:02:33PM +0800, Zhao, Yakui wrote:
> 
> 
> On 2019-08-16 15:05, Greg KH wrote:
> > On Fri, Aug 16, 2019 at 10:25:45AM +0800, Zhao Yakui wrote:
> > > ACRN hypervisor service module is the important middle layer that allows
> > > the Linux kernel to communicate with the ACRN hypervisor. It includes
> > > the management of virtualized CPU/memory/device/interrupt for other ACRN
> > > guest. The user-space applications can use the provided ACRN ioctls to
> > > interact with ACRN hypervisor through different hypercalls.
> > > 
> > > Add one basic framework firstly and the following patches will
> > > add the corresponding implementations, which includes the management of
> > > virtualized CPU/memory/interrupt and the emulation of MMIO/IO/PCI access.
> > > The device file of /dev/acrn_hsm can be accessed in user-space to
> > > communicate with ACRN module.
> > > 
> > > Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
> > > Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
> > > Co-developed-by: Jack Ren <jack.ren@intel.com>
> > > Signed-off-by: Jack Ren <jack.ren@intel.com>
> > > Co-developed-by: Mingqiang Chi <mingqiang.chi@intel.com>
> > > Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
> > > Co-developed-by: Liu Shuo <shuo.a.liu@intel.com>
> > > Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
> > > Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
> > > ---
> > >   drivers/staging/Kconfig         |   2 +
> > 
> > Also, your subject line for all of these patches are wrong, it is not
> > drivers/acrn :(
> 
> > Thanks for pointing it out.
> 
> It will be fixed.
> 
> > 
> > And you forgot to cc: the staging maintainer :(
> 
> > Do you mean that the maintainer of the staging subsystem should also be
> > added in the patch commit log?

Did you not run scripts/get_maintainer.pl on your patches to determine
who to send patches to?  Always do that.

> > As I have said with NUMEROUS Intel patches in the past, I now refuse to
> > take patches from you all WITHOUT having it signed-off-by someone from
> > the Intel "OTC" group (or whatever the Intel Linux group is called these
> > days).  They are a resource you can not ignore, and if you do, you just
> > end up making the rest of the kernel community grumpy by having us do
> > their work for them :(
> > 
> > Please work with them.
> 
> > OK. I will work with some people in the OTC group to prepare a better
> > ACRN driver.

Thank you.

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
  2019-08-16 12:58   ` Dan Carpenter
@ 2019-08-19  5:32     ` Zhao, Yakui
  2019-08-19  7:39       ` Dan Carpenter
  0 siblings, 1 reply; 40+ messages in thread
From: Zhao, Yakui @ 2019-08-19  5:32 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: x86, linux-kernel, devel, Li, Fei, Jason Chen CJ, Liu Shuo



On 2019-08-16 20:58, Dan Carpenter wrote:
> On Fri, Aug 16, 2019 at 10:25:49AM +0800, Zhao Yakui wrote:
>> +int hugepage_map_guest(struct acrn_vm *vm, struct vm_memmap *memmap)
>> +{
>> +	struct page *page = NULL, *regions_buf_pg = NULL;
>> +	unsigned long len, guest_gpa, vma;
>> +	struct vm_memory_region *region_array;
>> +	struct set_regions *regions;
>> +	int max_size = PAGE_SIZE / sizeof(struct vm_memory_region);
>> +	int ret;
>> +
>> +	if (!vm || !memmap)
>> +		return -EINVAL;
>> +
>> +	len = memmap->len;
>> +	vma = memmap->vma_base;
>> +	guest_gpa = memmap->gpa;
>> +
>> +	/* prepare set_memory_regions info */
>> +	regions_buf_pg = alloc_page(GFP_KERNEL);
>> +	if (!regions_buf_pg)
>> +		return -ENOMEM;
>> +
>> +	regions = kzalloc(sizeof(*regions), GFP_KERNEL);
>> +	if (!regions) {
>> +		__free_page(regions_buf_pg);
>> +		return -ENOMEM;
> 
> It's better to do a goto err_free_regions_buf here.  More comments
> below.
> 
>> +	}
>> +	regions->mr_num = 0;
>> +	regions->vmid = vm->vmid;
>> +	regions->regions_gpa = page_to_phys(regions_buf_pg);
>> +	region_array = page_to_virt(regions_buf_pg);
>> +
>> +	while (len > 0) {
>> +		unsigned long vm0_gpa, pagesize;
>> +
>> +		ret = get_user_pages_fast(vma, 1, 1, &page);
>> +		if (unlikely(ret != 1) || (!page)) {
>> +			pr_err("failed to pin huge page!\n");
>> +			ret = -ENOMEM;
>> +			goto err;
> 
> goto err is a red flag.  It's better if error labels do one specific
> named thing like:
> 
> err_regions:
> 	kfree(regions);
> err_free_regions_buf:
> 	__free_page(regions_buf_pg);
> 
> We should unwind in the opposite/mirror order from how things were
> allocated.  Then we can remove the if statements in the error handling.

Thanks for the review.

Will follow your suggestion to unwind the error handling.

> 
> In this situation, say the user triggers an -EFAULT in
> get_user_pages_fast() in the second iteration through the loop.  That
> means that "page" is the non-NULL page from the previous iteration.  We
> have already added it to add_guest_map().  But now we're freeing it
> without removing it from the map so probably it leads to a use after
> free.
> 
> The best way to write the error handling in a loop like this is to
> clean up the partial iteration that has succeed (nothing here), and then
> unwind all the successful iterations at the bottom of the function.
> "goto unwind_loop;"
> 

In theory we should clean up the previous successful iterations when
the current iteration hits an error.
But cleaning up the previous iterations is quite complex:
call set_memory_regions with the MR_DEL op,
call remove_guest_map for the added hash item,
call put_page for the page returned by get_user_pages_fast.

In fact, as this driver is mainly used for embedded IoT, it doesn't
handle the complex cleanup when such an error is encountered. Instead the
cleanup is handled in free_guest_vm.

>> +		}
>> +
>> +		vm0_gpa = page_to_phys(page);
>> +		pagesize = PAGE_SIZE << compound_order(page);
>> +
>> +		ret = add_guest_map(vm, vm0_gpa, guest_gpa, pagesize);
>> +		if (ret < 0) {
>> +			pr_err("failed to add memseg for huge page!\n");
>> +			goto err;
> 
> So then here, it would be:
> 
> 			pr_err("failed to add memseg for huge page!\n");
> 			put_page(page);
> 			goto unwind_loop;
> 
> regards,
> dan carpenter
> 
>> +		}
>> +
>> +		/* fill each memory region into region_array */
>> +		region_array[regions->mr_num].type = MR_ADD;
>> +		region_array[regions->mr_num].gpa = guest_gpa;
>> +		region_array[regions->mr_num].vm0_gpa = vm0_gpa;
>> +		region_array[regions->mr_num].size = pagesize;
>> +		region_array[regions->mr_num].prot =
>> +				(MEM_TYPE_WB & MEM_TYPE_MASK) |
>> +				(memmap->prot & MEM_ACCESS_RIGHT_MASK);
>> +		regions->mr_num++;
>> +		if (regions->mr_num == max_size) {
>> +			pr_debug("region buffer full, set & renew regions!\n");
>> +			ret = set_memory_regions(regions);
>> +			if (ret < 0) {
>> +				pr_err("failed to set regions,ret=%d!\n", ret);
>> +				goto err;
>> +			}
>> +			regions->mr_num = 0;
>> +		}
>> +
>> +		len -= pagesize;
>> +		vma += pagesize;
>> +		guest_gpa += pagesize;
>> +	}
>> +
>> +	ret = set_memory_regions(regions);
>> +	if (ret < 0) {
>> +		pr_err("failed to set regions, ret=%d!\n", ret);
>> +		goto err;
>> +	}
>> +
>> +	__free_page(regions_buf_pg);
>> +	kfree(regions);
>> +
>> +	return 0;
>> +err:
>> +	if (regions_buf_pg)
>> +		__free_page(regions_buf_pg);
>> +	if (page)
>> +		put_page(page);
>> +	kfree(regions);
>> +	return ret;
>> +}
>> +
> 
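
The unwinding style suggested in the review above can be sketched in
user space as follows. This is a simplified model, not the driver code:
get_res()/put_res(), the mapped[] array, and the fail_at parameter are
hypothetical stand-ins for page pinning, add_guest_map(), and fault
injection. The point is the shape: each label undoes one specific
thing, labels run in the reverse order of allocation, and a loop
failure unwinds every iteration that already succeeded.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_PAGES 8

/* Hypothetical bookkeeping standing in for the guest mapping hash. */
static void *mapped[MAX_PAGES];
static int live_allocs;	/* outstanding allocations, for checking only */

static void *get_res(void)
{
	void *p = malloc(16);

	if (p)
		live_allocs++;
	return p;
}

static void put_res(void *p)
{
	free(p);
	live_allocs--;
}

/* fail_at < 0 means no failure is injected. */
static int map_guest(int npages, int fail_at)
{
	void *buf, *regions, *page;
	int i, ret;

	if (npages < 0 || npages > MAX_PAGES)
		return -EINVAL;

	buf = get_res();		/* models alloc_page() */
	if (!buf)
		return -ENOMEM;

	regions = get_res();		/* models kzalloc() */
	if (!regions) {
		ret = -ENOMEM;
		goto err_free_regions_buf;
	}

	for (i = 0; i < npages; i++) {
		/* models pinning one page; may fail part-way through */
		page = (i == fail_at) ? NULL : get_res();
		if (!page) {
			ret = -ENOMEM;
			goto unwind_loop;
		}
		mapped[i] = page;	/* models add_guest_map() */
	}

	/* Success: temporary buffers go away, the mappings stay. */
	put_res(regions);
	put_res(buf);
	return 0;

unwind_loop:
	/* Undo every iteration that fully succeeded, newest first. */
	while (--i >= 0) {
		put_res(mapped[i]);
		mapped[i] = NULL;
	}
	put_res(regions);
err_free_regions_buf:
	put_res(buf);
	return ret;
}
```

In this model a failure in the third of four iterations leaves nothing
allocated, while a successful call leaves only the mapped pages live.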

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/15] acrn: add the ACRN driver module
  2019-08-19  5:25     ` Greg KH
@ 2019-08-19  5:39       ` Zhao, Yakui
  0 siblings, 0 replies; 40+ messages in thread
From: Zhao, Yakui @ 2019-08-19  5:39 UTC (permalink / raw)
  To: Greg KH; +Cc: Borislav Petkov, devel, x86, linux-kernel



On 2019-08-19 13:25, Greg KH wrote:
> On Mon, Aug 19, 2019 at 09:44:25AM +0800, Zhao, Yakui wrote:
>>
>>
>> On 2019-08-16 14:39, Borislav Petkov wrote:
>>> On Fri, Aug 16, 2019 at 10:25:41AM +0800, Zhao Yakui wrote:
>>>> The first three patches are the changes under x86/acrn, which adds the
>>>> required APIs for the driver and reports the X2APIC caps.
>>>> The remaining patches add the ACRN driver module, which accepts the ioctl
>>>> from user-space and then communicate with the low-level ACRN hypervisor
>>>> by using hypercall.
>>>
>>> I have a problem with that: you're adding interfaces to arch/x86/ and
>>> its users go into staging. Why? Why not directly put the driver where
>>> it belongs, clean it up properly and submit it like everything else is
>>> submitted?
>>
>> Thanks for your reply and the concern.
>>
>> After looking at several driver examples (gma500, android), it seems
>> that they were first added under drivers/staging/XXX and then moved to
>> drivers/XXX once the driver matured.
>> So we followed the same approach to upstream the ACRN driver.
> 
> Those two examples are probably the worst examples to ever look at :)
> 
> The code quality of those submissions was horrible, gma500 took a very
> long time to clean up and there are parts of the android code that are
> still in staging to this day.
> 
>> If the new driver can be added without going through staging,
>> we will refine it and submit it through the normal process.
> 
> That is the normal process, staging should not be needed at all for any
> code.  It is a fall-back for when the company involved has no idea of
> how to upstream their code, which should NOT be the case here.

Thanks for your explanation.

OK. We will submit it through the normal process.

> 
> thanks,
> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/15] acrn: add the ACRN driver module
  2019-08-19  1:44   ` Zhao, Yakui
  2019-08-19  5:25     ` Greg KH
@ 2019-08-19  6:18     ` Borislav Petkov
  1 sibling, 0 replies; 40+ messages in thread
From: Borislav Petkov @ 2019-08-19  6:18 UTC (permalink / raw)
  To: Zhao, Yakui; +Cc: x86, linux-kernel, devel

On Mon, Aug 19, 2019 at 09:44:25AM +0800, Zhao, Yakui wrote:
> Could it be sent as two patch sets?
> The first would add the required APIs for the ACRN driver.
> The second would add the ACRN driver itself.

One patchset adding the APIs and its user(s).

And make sure to refresh on

https://www.kernel.org/doc/html/latest/process/submitting-patches.html

before sending.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
  2019-08-19  5:32     ` Zhao, Yakui
@ 2019-08-19  7:39       ` Dan Carpenter
  2019-08-19  7:46         ` Borislav Petkov
  2019-08-20  2:25         ` Zhao, Yakui
  0 siblings, 2 replies; 40+ messages in thread
From: Dan Carpenter @ 2019-08-19  7:39 UTC (permalink / raw)
  To: Zhao, Yakui; +Cc: devel, Li, x86, linux-kernel, Jason Chen CJ, Liu Shuo, Fei

On Mon, Aug 19, 2019 at 01:32:54PM +0800, Zhao, Yakui wrote:
> In fact, as this driver is mainly used for embedded IoT, it doesn't
> handle the complex cleanup when such an error is encountered. Instead the
> cleanup is handled in free_guest_vm.

A use after free here seems like a potential security problem.  Security
matters for IoT...  :(

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
  2019-08-19  7:39       ` Dan Carpenter
@ 2019-08-19  7:46         ` Borislav Petkov
  2019-08-20  2:25         ` Zhao, Yakui
  1 sibling, 0 replies; 40+ messages in thread
From: Borislav Petkov @ 2019-08-19  7:46 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Zhao, Yakui, devel, Li, x86, linux-kernel, Jason Chen CJ, Liu Shuo, Fei

On Mon, Aug 19, 2019 at 10:39:58AM +0300, Dan Carpenter wrote:
> On Mon, Aug 19, 2019 at 01:32:54PM +0800, Zhao, Yakui wrote:
> > In fact, as this driver is mainly used for embedded IoT, it doesn't
> > handle the complex cleanup when such an error is encountered. Instead the
> > cleanup is handled in free_guest_vm.
> 
> A use after free here seems like a potential security problem.  Security
> matters for IoT...  :(

Yeah, the "S" in "IoT" stands for security.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 15/15] drivers/acrn: add the support of offline SOS cpu
  2019-08-16  2:25 ` [RFC PATCH 15/15] drivers/acrn: add the support of offline SOS cpu Zhao Yakui
@ 2019-08-19 10:34   ` Dan Carpenter
  2019-08-20  2:23     ` Zhao, Yakui
  0 siblings, 1 reply; 40+ messages in thread
From: Dan Carpenter @ 2019-08-19 10:34 UTC (permalink / raw)
  To: Zhao Yakui; +Cc: x86, linux-kernel, devel, Jason Chen CJ

On Fri, Aug 16, 2019 at 10:25:56AM +0800, Zhao Yakui wrote:
> diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
> index 0602125..6868003 100644
> --- a/drivers/staging/acrn/acrn_dev.c
> +++ b/drivers/staging/acrn/acrn_dev.c
> @@ -588,6 +588,41 @@ static const struct file_operations fops = {
>  #define SUPPORT_HV_API_VERSION_MAJOR	1
>  #define SUPPORT_HV_API_VERSION_MINOR	0
>  
> +static ssize_t
> +offline_cpu_store(struct device *dev,
> +			struct device_attribute *attr,
> +			const char *buf, size_t count)
> +{
> +#ifdef CONFIG_X86
> +	u64 cpu, lapicid;
> +
> +	if (kstrtoull(buf, 0, &cpu) < 0)
> +		return -EINVAL;

Preserve the error code.

	ret = kstrtoull(buf, 0, &cpu);
	if (ret)
		return ret;

> +
> +	if (cpu_possible(cpu)) {

You can't pass unchecked cpu values to cpu_possible() or it results in
an out of bounds read if cpu is >= nr_cpu_ids.

regards,
dan carpenter
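
Both review comments above can be illustrated with a small user-space
sketch. It is a loose model under stated assumptions, not the driver
code: parse_u64() only approximates kstrtoull() using strtoull(), and
NR_CPU_IDS and cpu_possible_map[] are hypothetical stand-ins for
nr_cpu_ids and the possible-CPU mask. It shows returning the parser's
own error code and range-checking the value before it is used as an
index.

```c
#include <errno.h>
#include <stdlib.h>

#define NR_CPU_IDS 4	/* hypothetical stand-in for nr_cpu_ids */

/* Hypothetical stand-in for the possible-CPU mask: CPU 2 is absent. */
static const unsigned char cpu_possible_map[NR_CPU_IDS] = { 1, 1, 0, 1 };

/*
 * Loose user-space model of kstrtoull(): returns 0 on success and a
 * negative errno on failure. One trailing newline is accepted.
 */
static int parse_u64(const char *s, unsigned long long *res)
{
	char *end;

	errno = 0;
	*res = strtoull(s, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	return 0;
}

static int offline_cpu_store(const char *buf)
{
	unsigned long long cpu;
	int ret;

	ret = parse_u64(buf, &cpu);
	if (ret)
		return ret;		/* preserve the parser's error code */

	if (cpu >= NR_CPU_IDS)		/* bounds check before indexing */
		return -EINVAL;

	if (!cpu_possible_map[cpu])	/* safe: cpu is in range here */
		return -EINVAL;

	return 0;
}
```

With this shape an overflowing input reports -ERANGE instead of being
flattened to -EINVAL, and an out-of-range CPU number never reaches the
mask lookup.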


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 15/15] drivers/acrn: add the support of offline SOS cpu
  2019-08-19 10:34   ` Dan Carpenter
@ 2019-08-20  2:23     ` Zhao, Yakui
  0 siblings, 0 replies; 40+ messages in thread
From: Zhao, Yakui @ 2019-08-20  2:23 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: linux-kernel, devel, Jason Chen CJ



On 2019-08-19 18:34, Dan Carpenter wrote:
> On Fri, Aug 16, 2019 at 10:25:56AM +0800, Zhao Yakui wrote:
>> diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
>> index 0602125..6868003 100644
>> --- a/drivers/staging/acrn/acrn_dev.c
>> +++ b/drivers/staging/acrn/acrn_dev.c
>> @@ -588,6 +588,41 @@ static const struct file_operations fops = {
>>   #define SUPPORT_HV_API_VERSION_MAJOR	1
>>   #define SUPPORT_HV_API_VERSION_MINOR	0
>>   
>> +static ssize_t
>> +offline_cpu_store(struct device *dev,
>> +			struct device_attribute *attr,
>> +			const char *buf, size_t count)
>> +{
>> +#ifdef CONFIG_X86
>> +	u64 cpu, lapicid;
>> +
>> +	if (kstrtoull(buf, 0, &cpu) < 0)
>> +		return -EINVAL;
> 

Thanks for the review.

Makes sense.
The error code will be preserved.

> Preserve the error code.
> 
> 	ret = kstrtoull(buf, 0, &cpu);
> 	if (ret)
> 		return ret;


> 
>> +
>> +	if (cpu_possible(cpu)) {
> 
> You can't pass unchecked cpu values to cpu_possible() or it results in
> an out of bounds read if cpu is >= nr_cpu_ids.
> 

OK. A check of "cpu < num_possible_cpus()" will be added to avoid the
out-of-bounds read.

> regards,
> dan carpenter
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
  2019-08-19  7:39       ` Dan Carpenter
  2019-08-19  7:46         ` Borislav Petkov
@ 2019-08-20  2:25         ` Zhao, Yakui
  1 sibling, 0 replies; 40+ messages in thread
From: Zhao, Yakui @ 2019-08-20  2:25 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: devel, linux-kernel, Jason Chen CJ



On 2019-08-19 15:39, Dan Carpenter wrote:
> On Mon, Aug 19, 2019 at 01:32:54PM +0800, Zhao, Yakui wrote:
>> In fact, as this driver is mainly used for embedded IoT, it doesn't
>> handle the complex cleanup when such an error is encountered. Instead the
>> cleanup is handled in free_guest_vm.
> 
> A use after free here seems like a potential security problem.  Security
> matters for IoT...  :(

Thanks for pointing out the issue.
The cleanup will be considered carefully.

> 
> regards,
> dan carpenter
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2019-08-20  2:32 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-16  2:25 [RFC PATCH 00/15] acrn: add the ACRN driver module Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 01/15] x86/acrn: Report X2APIC for ACRN guest Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 02/15] x86/acrn: Add two APIs to add/remove driver-specific upcall ISR handler Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 03/15] x86/acrn: Add hypercall for ACRN guest Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 04/15] drivers/acrn: add the basic framework of acrn char device driver Zhao Yakui
2019-08-16  7:05   ` Greg KH
2019-08-19  4:02     ` Zhao, Yakui
2019-08-19  5:26       ` Greg KH
2019-08-16 11:28   ` Dan Carpenter
2019-08-16  2:25 ` [RFC PATCH 05/15] drivers/acrn: add driver-specific hypercall for ACRN_HSM Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 06/15] drivers/acrn: add the support of querying ACRN api version Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 07/15] drivers/acrn: add acrn vm/vcpu management for ACRN_HSM char device Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN " Zhao Yakui
2019-08-16 12:58   ` Dan Carpenter
2019-08-19  5:32     ` Zhao, Yakui
2019-08-19  7:39       ` Dan Carpenter
2019-08-19  7:46         ` Borislav Petkov
2019-08-20  2:25         ` Zhao, Yakui
2019-08-16  2:25 ` [RFC PATCH 09/15] drivers/acrn: add passthrough device support Zhao Yakui
2019-08-16 13:05   ` Dan Carpenter
2019-08-16  2:25 ` [RFC PATCH 10/15] drivers/acrn: add interrupt injection support Zhao Yakui
2019-08-16 13:12   ` Dan Carpenter
2019-08-19  4:59     ` Zhao, Yakui
2019-08-16  2:25 ` [RFC PATCH 11/15] drivers/acrn: add the support of handling emulated ioreq Zhao Yakui
2019-08-16 13:39   ` Dan Carpenter
2019-08-19  4:54     ` Zhao, Yakui
2019-08-16  2:25 ` [RFC PATCH 12/15] drivers/acrn: add driver-specific IRQ handle to dispatch IO_REQ request Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 13/15] drivers/acrn: add service to obtain Power data transition Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 14/15] drivers/acrn: add the support of irqfd and eventfd Zhao Yakui
2019-08-16  2:25 ` [RFC PATCH 15/15] drivers/acrn: add the support of offline SOS cpu Zhao Yakui
2019-08-19 10:34   ` Dan Carpenter
2019-08-20  2:23     ` Zhao, Yakui
2019-08-16  6:39 ` [RFC PATCH 00/15] acrn: add the ACRN driver module Borislav Petkov
2019-08-16  7:03   ` Greg KH
2019-08-19  2:39     ` Zhao, Yakui
2019-08-19  5:25       ` Greg KH
2019-08-19  1:44   ` Zhao, Yakui
2019-08-19  5:25     ` Greg KH
2019-08-19  5:39       ` Zhao, Yakui
2019-08-19  6:18     ` Borislav Petkov
