* [PATCH v4 00/10] KVM/ARM Implementation
@ 2011-08-06 10:38 Christoffer Dall
  2011-08-06 10:39 ` [PATCH v4 01/10] ARM: KVM: Initial skeleton to compile KVM support Christoffer Dall
                   ` (10 more replies)
  0 siblings, 11 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:38 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

The following series implements KVM support for ARM processors,
specifically on the Cortex-A15 platform.

The patch series applies to the arm-lpae branch of ARM Ltd's kernel
tree. This is Version 4 of the patch series, but the first two versions
were reviewed outside of the KVM mailing list. Changes can also be
pulled from:
  git://git.ncl.cs.columbia.edu/pub/git/linux-kvm-arm kvm-a15-v4

The implementation is broken up into a logical set of patches. The
first contains a skeleton of files, makefile changes, the basic user
space interface, and KVM architecture-specific stubs; subsequent
patches implement the parts of the system listed below (a short user
space usage sketch follows the list):
 1.  Skeleton
 2.  Identity Mapping for Hyp mode
 3.  Hypervisor initialization
 4.  Hyp mode memory mappings and 2nd stage preparation
 5.  World-switch implementation and Hyp exception vectors
 6.  Emulation framework and CP15 emulation
 7.  Handle guest user memory aborts
 8.  Handle guest MMIO aborts
 9.  Handle userspace IRQ/FIQ injection
 10. Support guest wait-for-interrupt instructions.
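
As a rough illustration of the basic user space interface mentioned
above, the sketch below shows the generic KVM ioctl flow a minimal
user-space launcher would follow (this is the standard KVM API, not
code from this series; error handling omitted):

    /* Minimal sketch of the generic KVM user space flow (illustrative only). */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/kvm.h>

    int main(void)
    {
            int kvm_fd = open("/dev/kvm", O_RDWR);
            int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
            int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);

            /* Guest memory is registered with KVM_SET_USER_MEMORY_REGION
             * and initial register state set with KVM_SET_REGS. */

            int mmap_size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
            struct kvm_run *run = mmap(NULL, mmap_size,
                                       PROT_READ | PROT_WRITE,
                                       MAP_SHARED, vcpu_fd, 0);

            while (ioctl(vcpu_fd, KVM_RUN, 0) == 0) {
                    if (run->exit_reason == KVM_EXIT_MMIO)
                            break;  /* emulate the access in user space */
            }
            return 0;
    }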

Testing:
Testing has been limited, but GCC has been run inside a guest, where
it compiled a small hello-world program that then ran successfully.
Hardware is still unavailable, so all testing has been done on ARM
Fast Models.

For a guide on how to set up a testing environment and try out these
patches, see:
  http://wiki.ncl.cs.columbia.edu/wiki/KVMARM:Guides:Development_Environment

Changes since v3:
 - v4 actually works, fully boots a guest
 - Support compiling as a module
 - Use static inlines instead of macros for vcpu_reg and friends
 - Optimize kvm_vcpu_reg function
 - Use Ftrace for trace capabilities
 - Updated documentation and commenting
 - Use KVM_IRQ_LINE instead of KVM_INTERRUPT (see the sketch below)
 - Emulates load/store instructions not supported through HSR
   syndrome information.
 - Frees 2nd stage translation tables on VM teardown
 - Handles IRQ/FIQ instructions
 - Handles more CP15 accesses
 - Support guest WFI calls
 - Uses debugfs instead of /proc
 - Support compiling in Thumb mode
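
For the KVM_IRQ_LINE item above, user space raises and lowers a guest
interrupt line roughly as follows. This is a hedged sketch of the
generic ioctl; the encoding of the irq field for ARM is defined by the
patches in this series, and the value shown is only a placeholder:

    /* Hedged sketch: assert a guest IRQ line via KVM_IRQ_LINE. */
    struct kvm_irq_level irq = {
            .irq   = 0,     /* line number; ARM encoding per this series */
            .level = 1,     /* 1 = raise the line, 0 = lower it */
    };
    ioctl(vm_fd, KVM_IRQ_LINE, &irq);  /* vm_fd from KVM_CREATE_VM */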

Changes since v2:
 - Performs the world switch
 - Maps guest memory using 2nd stage translation
 - Emulates co-processor 15 instructions
 - Forwards I/O faults to QEMU.

---

Christoffer Dall (10):
      ARM: KVM: Initial skeleton to compile KVM support
      ARM: KVM: Hypervisor identity mapping
      ARM: KVM: Add hypervisor initialization
      ARM: KVM: Memory virtualization setup
      ARM: KVM: Inject IRQs and FIQs from userspace
      ARM: KVM: World-switch implementation
      ARM: KVM: Emulation framework and CP15 emulation
      ARM: KVM: Handle guest faults in KVM
      ARM: KVM: Handle I/O aborts
      ARM: KVM: Guest wait-for-interrupts (WFI) support


 Documentation/kvm/api.txt                   |   11 
 arch/arm/Kconfig                            |    2 
 arch/arm/Makefile                           |    1 
 arch/arm/include/asm/kvm.h                  |   75 +++
 arch/arm/include/asm/kvm_arm.h              |  130 +++++
 arch/arm/include/asm/kvm_asm.h              |   51 ++
 arch/arm/include/asm/kvm_emulate.h          |  100 ++++
 arch/arm/include/asm/kvm_host.h             |  105 ++++
 arch/arm/include/asm/kvm_mmu.h              |   46 ++
 arch/arm/include/asm/kvm_para.h             |    9 
 arch/arm/include/asm/pgtable-3level-hwdef.h |    6 
 arch/arm/include/asm/pgtable-3level.h       |    9 
 arch/arm/include/asm/pgtable.h              |   15 +
 arch/arm/include/asm/unified.h              |   12 
 arch/arm/kernel/armksyms.c                  |    6 
 arch/arm/kernel/asm-offsets.c               |   33 +
 arch/arm/kernel/entry-armv.S                |    1 
 arch/arm/kvm/Kconfig                        |   44 ++
 arch/arm/kvm/Makefile                       |   18 +
 arch/arm/kvm/arm.c                          |  701 +++++++++++++++++++++++++++
 arch/arm/kvm/arm_emulate.c                  |  604 +++++++++++++++++++++++
 arch/arm/kvm/arm_exports.c                  |   26 +
 arch/arm/kvm/arm_guest.c                    |  150 ++++++
 arch/arm/kvm/arm_init.S                     |  115 ++++
 arch/arm/kvm/arm_interrupts.S               |  488 +++++++++++++++++++
 arch/arm/kvm/arm_mmu.c                      |  549 +++++++++++++++++++++
 arch/arm/kvm/debug.c                        |  377 +++++++++++++++
 arch/arm/kvm/debug.h                        |   63 ++
 arch/arm/kvm/trace.h                        |  131 +++++
 arch/arm/mach-vexpress/Kconfig              |    1 
 arch/arm/mm/Kconfig                         |    7 
 arch/arm/mm/idmap.c                         |   52 ++
 arch/arm/mm/mmu.c                           |    3 
 include/linux/kvm.h                         |    1 
 mm/memory.c                                 |    1 
 35 files changed, 3939 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/asm/kvm_mmu.h
 create mode 100644 arch/arm/include/asm/kvm_para.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/arm_emulate.c
 create mode 100644 arch/arm/kvm/arm_exports.c
 create mode 100644 arch/arm/kvm/arm_guest.c
 create mode 100644 arch/arm/kvm/arm_init.S
 create mode 100644 arch/arm/kvm/arm_interrupts.S
 create mode 100644 arch/arm/kvm/arm_mmu.c
 create mode 100644 arch/arm/kvm/debug.c
 create mode 100644 arch/arm/kvm/debug.h
 create mode 100644 arch/arm/kvm/trace.h

-- 


* [PATCH v4 01/10] ARM: KVM: Initial skeleton to compile KVM support
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
@ 2011-08-06 10:39 ` Christoffer Dall
  2011-08-06 10:39 ` [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping Christoffer Dall
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:39 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

Targets KVM support for Cortex-A15 processors.

Contains no real functionality, but provides all the framework
components: makefiles, header files, and some tracing functionality.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
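
To illustrate how the register accessors in kvm_emulate.h are meant to
be used by the emulation code added in later patches, a hypothetical
caller (not part of this patch) might look like:

    /* Hypothetical helper, for illustration only. */
    static void kvm_skip_instr(struct kvm_vcpu *vcpu)
    {
            u32 *pc = vcpu_reg(vcpu, 15);  /* r15 in the vcpu's current mode */

            *pc += 4;  /* step past the trapped 32-bit ARM instruction */
    }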

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/Kconfig                   |    2 
 arch/arm/Makefile                  |    1 
 arch/arm/include/asm/kvm.h         |   66 ++++++
 arch/arm/include/asm/kvm_asm.h     |   28 +++
 arch/arm/include/asm/kvm_emulate.h |   91 +++++++++
 arch/arm/include/asm/kvm_host.h    |   93 +++++++++
 arch/arm/include/asm/kvm_para.h    |    9 +
 arch/arm/include/asm/unified.h     |   12 +
 arch/arm/kvm/Kconfig               |   44 ++++
 arch/arm/kvm/Makefile              |   18 ++
 arch/arm/kvm/arm.c                 |  272 ++++++++++++++++++++++++++
 arch/arm/kvm/arm_emulate.c         |  121 ++++++++++++
 arch/arm/kvm/arm_exports.c         |   16 ++
 arch/arm/kvm/arm_guest.c           |  148 ++++++++++++++
 arch/arm/kvm/arm_init.S            |   17 ++
 arch/arm/kvm/arm_interrupts.S      |   17 ++
 arch/arm/kvm/arm_mmu.c             |   15 +
 arch/arm/kvm/debug.c               |  377 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/debug.h               |   63 ++++++
 arch/arm/kvm/trace.h               |   52 +++++
 arch/arm/mach-vexpress/Kconfig     |    1 
 arch/arm/mm/Kconfig                |    7 +
 22 files changed, 1470 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/asm/kvm_para.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/arm_emulate.c
 create mode 100644 arch/arm/kvm/arm_exports.c
 create mode 100644 arch/arm/kvm/arm_guest.c
 create mode 100644 arch/arm/kvm/arm_init.S
 create mode 100644 arch/arm/kvm/arm_interrupts.S
 create mode 100644 arch/arm/kvm/arm_mmu.c
 create mode 100644 arch/arm/kvm/debug.c
 create mode 100644 arch/arm/kvm/debug.h
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c7b01b0..3cc74c7 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2049,3 +2049,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/arm/kvm/Kconfig"
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index c7d321a..718876d 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -258,6 +258,7 @@ core-$(CONFIG_VFP)		+= arch/arm/vfp/
 
 # If we have a machine-specific directory, then include it in the build.
 core-y				+= arch/arm/kernel/ arch/arm/mm/ arch/arm/common/
+core-y 				+= arch/arm/kvm/
 core-y				+= $(machdirs) $(platdirs)
 
 drivers-$(CONFIG_OPROFILE)      += arch/arm/oprofile/
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
new file mode 100644
index 0000000..87dc33b
--- /dev/null
+++ b/arch/arm/include/asm/kvm.h
@@ -0,0 +1,66 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include <asm/types.h>
+
+/*
+ * Modes used for short-hand mode determination in the world-switch code and
+ * in emulation code.
+ *
+ * Note: These indices do NOT correspond to the value of the CPSR mode bits!
+ */
+#define MODE_FIQ	0
+#define MODE_IRQ	1
+#define MODE_SVC	2
+#define MODE_ABT	3
+#define MODE_UND	4
+#define MODE_USR	5
+#define MODE_SYS	6
+
+struct kvm_regs {
+	__u32 regs0_7[8];	/* Unbanked regs. (r0 - r7)	   */
+	__u32 fiq_regs8_12[5];	/* Banked fiq regs. (r8 - r12)	   */
+	__u32 usr_regs8_12[5];	/* Banked usr registers (r8 - r12) */
+	__u32 reg13[6];		/* Banked r13, indexed by MODE_	   */
+	__u32 reg14[6];		/* Banked r14, indexed by MODE_	   */
+	__u32 reg15;
+	__u32 cpsr;
+	__u32 spsr[5];		/* Banked SPSR,  indexed by MODE_  */
+	struct {
+		__u32 c1_sys;
+		__u32 c2_base0;
+		__u32 c2_base1;
+		__u32 c3_dacr;
+	} cp15;
+
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+};
+
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+#endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
new file mode 100644
index 0000000..c3d4458
--- /dev/null
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -0,0 +1,28 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_ASM_H__
+#define __ARM_KVM_ASM_H__
+
+#define ARM_EXCEPTION_RESET	  0
+#define ARM_EXCEPTION_UNDEFINED   1
+#define ARM_EXCEPTION_SOFTWARE    2
+#define ARM_EXCEPTION_PREF_ABORT  3
+#define ARM_EXCEPTION_DATA_ABORT  4
+#define ARM_EXCEPTION_IRQ	  5
+#define ARM_EXCEPTION_FIQ	  6
+
+#endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
new file mode 100644
index 0000000..91d461a
--- /dev/null
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -0,0 +1,91 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_EMULATE_H__
+#define __ARM_KVM_EMULATE_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+
+u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
+
+static inline unsigned char vcpu_mode(struct kvm_vcpu *vcpu)
+{
+	static const u8 modes_table[16] = {
+		MODE_USR,	/* 0x0 */
+		MODE_FIQ,	/* 0x1 */
+		MODE_IRQ,	/* 0x2 */
+		MODE_SVC,	/* 0x3 */
+		0xf, 0xf, 0xf,
+		MODE_ABT,	/* 0x7 */
+		0xf, 0xf, 0xf,
+		MODE_UND,	/* 0xb */
+		0xf, 0xf, 0xf,
+		MODE_SYS};	/* 0xf */
+
+	BUG_ON(modes_table[vcpu->arch.regs.cpsr & 0xf] == 0xf);
+	return modes_table[vcpu->arch.regs.cpsr & 0xf];
+}
+
+/*
+ * Return the SPSR for the specified mode of the virtual CPU.
+ */
+static inline u32 *kvm_vcpu_spsr(struct kvm_vcpu *vcpu, u32 mode)
+{
+	switch (mode) {
+	case MODE_SVC:
+		return &vcpu->arch.regs.svc_regs[2];
+	case MODE_ABT:
+		return &vcpu->arch.regs.abt_regs[2];
+	case MODE_UND:
+		return &vcpu->arch.regs.und_regs[2];
+	case MODE_IRQ:
+		return &vcpu->arch.regs.irq_regs[2];
+	case MODE_FIQ:
+		return &vcpu->arch.regs.fiq_regs[7];
+	default:
+		BUG();
+	}
+}
+
+/* Get vcpu register for current mode */
+static inline u32 *vcpu_reg(struct kvm_vcpu *vcpu, unsigned long reg_num)
+{
+	return kvm_vcpu_reg(vcpu, reg_num, vcpu_mode(vcpu));
+}
+
+static inline u32 *vcpu_cpsr(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.regs.cpsr;
+}
+
+/* Get vcpu SPSR for current mode */
+static inline u32 *vcpu_spsr(struct kvm_vcpu *vcpu)
+{
+	return kvm_vcpu_spsr(vcpu, vcpu_mode(vcpu));
+}
+
+static inline bool mode_has_spsr(struct kvm_vcpu *vcpu)
+{
+	return (vcpu_mode(vcpu) < MODE_USR);
+}
+
+static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
+{
+	return vcpu_mode(vcpu) != MODE_USR;
+}
+
+#endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
new file mode 100644
index 0000000..b2fcd8a
--- /dev/null
+++ b/arch/arm/include/asm/kvm_host.h
@@ -0,0 +1,93 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_HOST_H__
+#define __ARM_KVM_HOST_H__
+
+#define KVM_MAX_VCPUS 1
+#define KVM_MEMORY_SLOTS 32
+#define KVM_PRIVATE_MEM_SLOTS 4
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+/* We don't currently support large pages. */
+#define KVM_HPAGE_GFN_SHIFT(x)	0
+#define KVM_NR_PAGE_SIZES	1
+#define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
+
+struct kvm_vcpu;
+u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
+
+struct kvm_arch {
+};
+
+#define EXCEPTION_NONE      0
+#define EXCEPTION_RESET     0x80
+#define EXCEPTION_UNDEFINED 0x40
+#define EXCEPTION_SOFTWARE  0x20
+#define EXCEPTION_PREFETCH  0x10
+#define EXCEPTION_DATA      0x08
+#define EXCEPTION_IMPRECISE 0x04
+#define EXCEPTION_IRQ       0x02
+#define EXCEPTION_FIQ       0x01
+
+struct kvm_vcpu_regs {
+	u32 usr_regs[15];	/* R0_usr - R14_usr */
+	u32 svc_regs[3];	/* SP_svc, LR_svc, SPSR_svc */
+	u32 abt_regs[3];	/* SP_abt, LR_abt, SPSR_abt */
+	u32 und_regs[3];	/* SP_und, LR_und, SPSR_und */
+	u32 irq_regs[3];	/* SP_irq, LR_irq, SPSR_irq */
+	u32 fiq_regs[8];	/* R8_fiq - R14_fiq, SPSR_fiq */
+	u32 pc;			/* The program counter (r15) */
+	u32 cpsr;		/* The guest CPSR */
+} __packed;
+
+struct kvm_vcpu_arch {
+	struct kvm_vcpu_regs regs;
+
+	/* System control coprocessor (cp15) */
+	struct {
+		u32 c1_SCTLR;		/* System Control Register */
+		u32 c1_ACTLR;		/* Auxiliary Control Register */
+		u32 c1_CPACR;		/* Coprocessor Access Control */
+		u64 c2_TTBR0;		/* Translation Table Base Register 0 */
+		u64 c2_TTBR1;		/* Translation Table Base Register 1 */
+		u32 c2_TTBCR;		/* Translation Table Base Control R. */
+		u32 c3_DACR;		/* Domain Access Control Register */
+	} cp15;
+
+	u32 virt_irq;		/* HCR exception mask */
+
+	/* Exception Information */
+	u32 hsr;		/* Hyp Syndrome Register */
+	u32 hdfar;		/* Hyp Data Fault Address Register */
+	u32 hifar;		/* Hyp Inst. Fault Address Register */
+	u32 hpfar;		/* Hyp IPA Fault Address Register */
+
+	/* IO related fields */
+	u32 mmio_rd;
+
+	/* Misc. fields */
+	u32 wait_for_interrupts;
+};
+
+struct kvm_vm_stat {
+	u32 remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+};
+
+#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_para.h b/arch/arm/include/asm/kvm_para.h
new file mode 100644
index 0000000..7ce5f1c
--- /dev/null
+++ b/arch/arm/include/asm/kvm_para.h
@@ -0,0 +1,9 @@
+#ifndef _ASM_ARM_KVM_PARA_H
+#define _ASM_ARM_KVM_PARA_H
+
+static inline unsigned int kvm_arch_para_features(void)
+{
+	return 0;
+}
+
+#endif /* _ASM_ARM_KVM_PARA_H */
diff --git a/arch/arm/include/asm/unified.h b/arch/arm/include/asm/unified.h
index bc63116..0d41bde 100644
--- a/arch/arm/include/asm/unified.h
+++ b/arch/arm/include/asm/unified.h
@@ -54,6 +54,18 @@
 
 #endif	/* CONFIG_THUMB2_KERNEL */
 
+#ifdef CONFIG_KVM_ARM_HOST
+#ifdef __ASSEMBLY__
+.arch_extension sec
+.arch_extension virt
+#else
+__asm__(
+"	.arch_extension sec\n"
+"	.arch_extension virt\n"
+);
+#endif
+#endif
+
 #ifndef CONFIG_ARM_ASM_UNIFIED
 
 /*
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
new file mode 100644
index 0000000..ccabbb3
--- /dev/null
+++ b/arch/arm/kvm/Kconfig
@@ -0,0 +1,44 @@
+#
+# KVM configuration
+#
+
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+	bool "Virtualization"
+	---help---
+	  Say Y here to get to see options for using your Linux host to run
+	  other operating systems inside virtual machines (guests).
+	  This option alone does not add any kernel code.
+
+	  If you say N, all options in this submenu will be skipped and
+	  disabled.
+
+if VIRTUALIZATION
+
+config KVM
+	tristate "Kernel-based Virtual Machine (KVM) support"
+	select PREEMPT_NOTIFIERS
+	select ANON_INODES
+	select KVM_ARM_HOST
+	select KVM_MMIO
+	---help---
+	  Support hosting virtualized guest machines. You will also
+	  need to select one or more of the processor modules below.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  If unsure, say N.
+
+config KVM_ARM_HOST
+	bool "KVM host support for ARM cpus."
+	depends on KVM
+	depends on MMU
+	depends on CPU_V7 || ARM_VIRT_EXT
+	---help---
+	  Provides host support for ARM processors.
+
+source drivers/virtio/Kconfig
+
+endif # VIRTUALIZATION
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
new file mode 100644
index 0000000..017c65a
--- /dev/null
+++ b/arch/arm/kvm/Makefile
@@ -0,0 +1,18 @@
+#
+# Makefile for Kernel-based Virtual Machine module
+#
+
+ccflags-y += -Ivirt/kvm -Iarch/arm/kvm
+CFLAGS_arm.o     := -I.
+CFLAGS_arm_mmu.o := -I.
+
+EXTRA_CFLAGS += -Ivirt/kvm -Iarch/arm/kvm
+AFLAGS_arm_interrupts.o := -I$(obj)
+
+obj-$(CONFIG_KVM_ARM_HOST) += arm_init.o arm_interrupts.o arm_exports.o
+
+kvm-arm-y += $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+kvm-arm-y += arm.o arm_guest.o arm_mmu.o arm_emulate.o debug.o
+
+obj-$(CONFIG_KVM) += kvm-arm.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
new file mode 100644
index 0000000..1e7a907
--- /dev/null
+++ b/arch/arm/kvm/arm.c
@@ -0,0 +1,272 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <trace/events/kvm.h>
+
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
+#include <asm/unified.h>
+#include <asm/uaccess.h>
+#include <asm/ptrace.h>
+#include <asm/mman.h>
+
+#include "debug.h"
+
+int kvm_arch_hardware_enable(void *garbage)
+{
+	return 0;
+}
+
+void kvm_arch_hardware_disable(void *garbage)
+{
+}
+
+int kvm_arch_hardware_setup(void)
+{
+	return 0;
+}
+
+void kvm_arch_hardware_unsetup(void)
+{
+}
+
+void kvm_arch_check_processor_compat(void *rtn)
+{
+	*(int *)rtn = 0;
+}
+
+void kvm_arch_sync_events(struct kvm *kvm)
+{
+}
+
+int kvm_arch_init_vm(struct kvm *kvm)
+{
+	return 0;
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm_arch_vcpu_free(kvm->vcpus[i]);
+			kvm->vcpus[i] = NULL;
+		}
+	}
+}
+
+int kvm_dev_ioctl_check_extension(long ext)
+{
+	int r;
+	switch (ext) {
+	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+		r = 1;
+		break;
+	case KVM_CAP_COALESCED_MMIO:
+		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+	return r;
+}
+
+long kvm_arch_dev_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	int ret = 0;
+
+	switch (ioctl) {
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret < 0)
+		printk(KERN_ERR "error processing ARM ioct: %d", ret);
+	return ret;
+}
+
+int kvm_arch_set_memory_region(struct kvm *kvm,
+			       struct kvm_userspace_memory_region *mem,
+			       struct kvm_memory_slot old,
+			       int user_alloc)
+{
+	return 0;
+}
+
+int kvm_arch_prepare_memory_region(struct kvm *kvm,
+				   struct kvm_memory_slot *memslot,
+				   struct kvm_memory_slot old,
+				   struct kvm_userspace_memory_region *mem,
+				   int user_alloc)
+{
+	return 0;
+}
+
+void kvm_arch_commit_memory_region(struct kvm *kvm,
+				   struct kvm_userspace_memory_region *mem,
+				   struct kvm_memory_slot old,
+				   int user_alloc)
+{
+}
+
+void kvm_arch_flush_shadow(struct kvm *kvm)
+{
+}
+
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	int err;
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	if (!vcpu) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	latest_vcpu = vcpu;
+	return vcpu;
+free_vcpu:
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+	return ERR_PTR(err);
+}
+
+void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
+{
+	latest_vcpu = NULL;
+	KVMARM_NOT_IMPLEMENTED();
+}
+
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvm_arch_vcpu_free(vcpu);
+}
+
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	KVMARM_NOT_IMPLEMENTED();
+	return 0;
+}
+
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
+void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	return -EINVAL;
+}
+
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	KVMARM_NOT_IMPLEMENTED();
+	return -EINVAL;
+}
+
+long kvm_arch_vcpu_ioctl(struct file *filp,
+			 unsigned int ioctl, unsigned long arg)
+{
+	kvm_err(-EINVAL, "Unsupported ioctl (%d)", ioctl);
+	return -EINVAL;
+}
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+		       unsigned int ioctl, unsigned long arg)
+{
+	printk(KERN_ERR "kvm_arch_vm_ioctl: Unsupported ioctl (%d)\n", ioctl);
+	return -EINVAL;
+}
+
+int kvm_arch_init(void *opaque)
+{
+	return 0;
+}
+
+void kvm_arch_exit(void)
+{
+}
+
+static int arm_init(void)
+{
+	int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+	if (rc == 0)
+		kvm_arm_debugfs_init();
+	return rc;
+}
+
+static void __exit arm_exit(void)
+{
+	kvm_exit();
+	kvm_arm_debugfs_exit();
+}
+
+module_init(arm_init);
+module_exit(arm_exit);
diff --git a/arch/arm/kvm/arm_emulate.c b/arch/arm/kvm/arm_emulate.c
new file mode 100644
index 0000000..6587dde
--- /dev/null
+++ b/arch/arm/kvm/arm_emulate.c
@@ -0,0 +1,121 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#include <asm/kvm_emulate.h>
+
+#define USR_REG_OFFSET(_reg) \
+	offsetof(struct kvm_vcpu_arch, regs.usr_regs[_reg])
+
+static unsigned long vcpu_reg_offsets[MODE_SYS + 1][16] = {
+	/* FIQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7),
+		offsetof(struct kvm_vcpu_arch, regs.fiq_regs[0]), /* r8 */
+		offsetof(struct kvm_vcpu_arch, regs.fiq_regs[1]), /* r9 */
+		offsetof(struct kvm_vcpu_arch, regs.fiq_regs[2]), /* r10 */
+		offsetof(struct kvm_vcpu_arch, regs.fiq_regs[3]), /* r11 */
+		offsetof(struct kvm_vcpu_arch, regs.fiq_regs[4]), /* r12 */
+		offsetof(struct kvm_vcpu_arch, regs.fiq_regs[5]), /* r13 */
+		offsetof(struct kvm_vcpu_arch, regs.fiq_regs[6]), /* r14 */
+		offsetof(struct kvm_vcpu_arch, regs.pc)		  /* r15 */
+	},
+
+	/* IRQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		offsetof(struct kvm_vcpu_arch, regs.irq_regs[0]), /* r13 */
+		offsetof(struct kvm_vcpu_arch, regs.irq_regs[1]), /* r14 */
+		offsetof(struct kvm_vcpu_arch, regs.pc)	          /* r15 */
+	},
+
+	/* SVC Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		offsetof(struct kvm_vcpu_arch, regs.svc_regs[0]), /* r13 */
+		offsetof(struct kvm_vcpu_arch, regs.svc_regs[1]), /* r14 */
+		offsetof(struct kvm_vcpu_arch, regs.pc)		  /* r15 */
+	},
+
+	/* ABT Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		offsetof(struct kvm_vcpu_arch, regs.abt_regs[0]), /* r13 */
+		offsetof(struct kvm_vcpu_arch, regs.abt_regs[1]), /* r14 */
+		offsetof(struct kvm_vcpu_arch, regs.pc)	          /* r15 */
+	},
+
+	/* UND Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		offsetof(struct kvm_vcpu_arch, regs.und_regs[0]), /* r13 */
+		offsetof(struct kvm_vcpu_arch, regs.und_regs[1]), /* r14 */
+		offsetof(struct kvm_vcpu_arch, regs.pc)	          /* r15 */
+	},
+
+	/* USR Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		offsetof(struct kvm_vcpu_arch, regs.usr_regs[13]), /* r13 */
+		offsetof(struct kvm_vcpu_arch, regs.usr_regs[14]), /* r14 */
+		offsetof(struct kvm_vcpu_arch, regs.pc)	           /* r15 */
+	},
+
+	/* SYS Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		offsetof(struct kvm_vcpu_arch, regs.usr_regs[13]), /* r13 */
+		offsetof(struct kvm_vcpu_arch, regs.usr_regs[14]), /* r14 */
+		offsetof(struct kvm_vcpu_arch, regs.pc)	           /* r15 */
+	},
+};
+
+/*
+ * Return a pointer to the register number valid in the specified mode of
+ * the virtual CPU.
+ */
+u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
+{
+	BUG_ON(reg_num > 15);
+	BUG_ON(mode > MODE_SYS);
+
+	return (u32 *)((void *)&vcpu->arch + vcpu_reg_offsets[mode][reg_num]);
+}
diff --git a/arch/arm/kvm/arm_exports.c b/arch/arm/kvm/arm_exports.c
new file mode 100644
index 0000000..d8a7fd5
--- /dev/null
+++ b/arch/arm/kvm/arm_exports.c
@@ -0,0 +1,16 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/module.h>
diff --git a/arch/arm/kvm/arm_guest.c b/arch/arm/kvm/arm_guest.c
new file mode 100644
index 0000000..94a5c54
--- /dev/null
+++ b/arch/arm/kvm/arm_guest.c
@@ -0,0 +1,148 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <asm/uaccess.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+
+
+#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
+#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ NULL }
+};
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	struct kvm_vcpu_regs *vcpu_regs = &vcpu->arch.regs;
+
+	/*
+	 * GPRs and PSRs
+	 */
+	memcpy(regs->regs0_7, &(vcpu_regs->usr_regs[0]), sizeof(u32) * 8);
+	memcpy(regs->usr_regs8_12, &(vcpu_regs->usr_regs[8]), sizeof(u32) * 5);
+	memcpy(regs->fiq_regs8_12, &(vcpu_regs->fiq_regs[0]), sizeof(u32) * 5);
+	regs->reg13[MODE_FIQ] = vcpu_regs->fiq_regs[5];
+	regs->reg14[MODE_FIQ] = vcpu_regs->fiq_regs[6];
+	regs->reg13[MODE_IRQ] = vcpu_regs->irq_regs[0];
+	regs->reg14[MODE_IRQ] = vcpu_regs->irq_regs[1];
+	regs->reg13[MODE_SVC] = vcpu_regs->svc_regs[0];
+	regs->reg14[MODE_SVC] = vcpu_regs->svc_regs[1];
+	regs->reg13[MODE_ABT] = vcpu_regs->abt_regs[0];
+	regs->reg14[MODE_ABT] = vcpu_regs->abt_regs[1];
+	regs->reg13[MODE_UND] = vcpu_regs->und_regs[0];
+	regs->reg14[MODE_UND] = vcpu_regs->und_regs[1];
+	regs->reg13[MODE_USR] = vcpu_regs->usr_regs[13];
+	regs->reg14[MODE_USR] = vcpu_regs->usr_regs[14];
+
+	regs->spsr[MODE_FIQ]  = vcpu_regs->fiq_regs[7];
+	regs->spsr[MODE_IRQ]  = vcpu_regs->irq_regs[2];
+	regs->spsr[MODE_SVC]  = vcpu_regs->svc_regs[2];
+	regs->spsr[MODE_ABT]  = vcpu_regs->abt_regs[2];
+	regs->spsr[MODE_UND]  = vcpu_regs->und_regs[2];
+
+	regs->reg15 = vcpu_regs->pc;
+	regs->cpsr = vcpu_regs->cpsr;
+
+
+	/*
+	 * Co-processor registers.
+	 */
+	regs->cp15.c1_sys = vcpu->arch.cp15.c1_SCTLR;
+	regs->cp15.c2_base0 = vcpu->arch.cp15.c2_TTBR0;
+	regs->cp15.c2_base1 = vcpu->arch.cp15.c2_TTBR1;
+	regs->cp15.c3_dacr = vcpu->arch.cp15.c3_DACR;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	struct kvm_vcpu_regs *vcpu_regs = &vcpu->arch.regs;
+
+	memcpy(&(vcpu_regs->usr_regs[0]), regs->regs0_7, sizeof(u32) * 8);
+	memcpy(&(vcpu_regs->usr_regs[8]), regs->usr_regs8_12, sizeof(u32) * 5);
+	memcpy(&(vcpu_regs->fiq_regs[0]), regs->fiq_regs8_12, sizeof(u32) * 5);
+
+	vcpu_regs->fiq_regs[5] = regs->reg13[MODE_FIQ];
+	vcpu_regs->fiq_regs[6] = regs->reg14[MODE_FIQ];
+	vcpu_regs->irq_regs[0] = regs->reg13[MODE_IRQ];
+	vcpu_regs->irq_regs[1] = regs->reg14[MODE_IRQ];
+	vcpu_regs->svc_regs[0] = regs->reg13[MODE_SVC];
+	vcpu_regs->svc_regs[1] = regs->reg14[MODE_SVC];
+	vcpu_regs->abt_regs[0] = regs->reg13[MODE_ABT];
+	vcpu_regs->abt_regs[1] = regs->reg14[MODE_ABT];
+	vcpu_regs->und_regs[0] = regs->reg13[MODE_UND];
+	vcpu_regs->und_regs[1] = regs->reg14[MODE_UND];
+	vcpu_regs->usr_regs[13] = regs->reg13[MODE_USR];
+	vcpu_regs->usr_regs[14] = regs->reg14[MODE_USR];
+
+	vcpu_regs->fiq_regs[7] = regs->spsr[MODE_FIQ];
+	vcpu_regs->irq_regs[2] = regs->spsr[MODE_IRQ];
+	vcpu_regs->svc_regs[2] = regs->spsr[MODE_SVC];
+	vcpu_regs->abt_regs[2] = regs->spsr[MODE_ABT];
+	vcpu_regs->und_regs[2] = regs->spsr[MODE_UND];
+
+	/*
+	 * Co-processor registers.
+	 */
+	vcpu->arch.cp15.c1_SCTLR = regs->cp15.c1_sys;
+
+	vcpu_regs->pc = regs->reg15;
+	vcpu_regs->cpsr = regs->cpsr;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+				  struct kvm_translation *tr)
+{
+	return -EINVAL;
+}
diff --git a/arch/arm/kvm/arm_init.S b/arch/arm/kvm/arm_init.S
new file mode 100644
index 0000000..073a494
--- /dev/null
+++ b/arch/arm/kvm/arm_init.S
@@ -0,0 +1,17 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/arm_interrupts.S b/arch/arm/kvm/arm_interrupts.S
new file mode 100644
index 0000000..073a494
--- /dev/null
+++ b/arch/arm/kvm/arm_interrupts.S
@@ -0,0 +1,17 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/arm_mmu.c b/arch/arm/kvm/arm_mmu.c
new file mode 100644
index 0000000..2cccd48
--- /dev/null
+++ b/arch/arm/kvm/arm_mmu.c
@@ -0,0 +1,15 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
diff --git a/arch/arm/kvm/debug.c b/arch/arm/kvm/debug.c
new file mode 100644
index 0000000..c2b213a
--- /dev/null
+++ b/arch/arm/kvm/debug.c
@@ -0,0 +1,377 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <asm/kvm_emulate.h>
+
+#include "debug.h"
+
+static struct dentry *vcpu_debugfs_file;
+static struct dentry *ws_debugfs_file;
+
+/******************************************************************************
+ * World-switch ring-buffer
+ */
+
+#define WS_TRACE_ITEMS 10
+static u32 ws_trace_enter[WS_TRACE_ITEMS];
+static int ws_trace_enter_index;
+static u32 ws_trace_exit[WS_TRACE_ITEMS];
+static int ws_trace_exit_index;
+DEFINE_MUTEX(ws_trace_mutex);
+
+void debug_ws_enter(u32 guest_pc)
+{
+	mutex_lock(&ws_trace_mutex);
+	ws_trace_enter[ws_trace_enter_index++] = guest_pc;
+	if (ws_trace_enter_index >= WS_TRACE_ITEMS)
+		ws_trace_enter_index = 0;
+	mutex_unlock(&ws_trace_mutex);
+}
+
+void debug_ws_exit(u32 guest_pc)
+{
+	mutex_lock(&ws_trace_mutex);
+	ws_trace_exit[ws_trace_exit_index++] = guest_pc;
+	if (ws_trace_exit_index >= WS_TRACE_ITEMS)
+		ws_trace_exit_index = 0;
+	mutex_unlock(&ws_trace_mutex);
+}
+
+void print_ws_trace(void)
+{
+	int i;
+	mutex_lock(&ws_trace_mutex);
+
+	if (ws_trace_enter_index != ws_trace_exit_index) {
+		kvm_msg("enter and exit WS trace count differ");
+		mutex_unlock(&ws_trace_mutex);
+		return;
+	}
+
+	/* Avoid potential endless loop */
+	if (ws_trace_enter_index < 0 ||
+	    ws_trace_enter_index >= WS_TRACE_ITEMS) {
+		kvm_msg("ws_trace_enter_index out of bounds: %d",
+				ws_trace_enter_index);
+		mutex_unlock(&ws_trace_mutex);
+		return;
+	}
+
+	for (i = ws_trace_enter_index - 1; i != ws_trace_enter_index; i--) {
+		if (i < 0) {
+			i = WS_TRACE_ITEMS;
+			continue;
+		}
+
+		printk(KERN_ERR "Enter: %08x    Exit: %08x\n",
+			ws_trace_enter[i],
+			ws_trace_exit[i]);
+	}
+	mutex_unlock(&ws_trace_mutex);
+}
+
+/******************************************************************************
+ * Dump total debug info, or write to debugfs entry
+ */
+
+struct kvm_vcpu *latest_vcpu;
+
+void print_kvm_vcpu_info(int (*print_fn)(print_fn_args), struct seq_file *m)
+{
+	int i;
+	struct kvm_vcpu_regs *regs;
+	char *mode = NULL;
+	struct kvm_vcpu *vcpu = latest_vcpu;
+
+	print_fn(m, "KVM/ARM runtime info\n");
+	print_fn(m, "======================================================");
+	print_fn(m, "\n\n");
+
+	if (vcpu == NULL) {
+		print_fn(m, "No registered VCPU\n");
+		goto out;
+	}
+
+	switch (vcpu_mode(vcpu)) {
+	case MODE_USR:
+		mode = "USR";
+		break;
+	case MODE_FIQ:
+		mode = "FIQ";
+		break;
+	case MODE_IRQ:
+		mode = "IRQ";
+		break;
+	case MODE_SVC:
+		mode = "SVC";
+		break;
+	case MODE_ABT:
+		mode = "ABT";
+		break;
+	case MODE_UND:
+		mode = "UND";
+		break;
+	case MODE_SYS:
+		mode = "SYS";
+		break;
+	}
+
+	vcpu_load(vcpu);
+	regs = &vcpu->arch.regs;
+
+	print_fn(m, "Virtual CPU state:\n\n");
+	print_fn(m, "PC is at: \t%08x\n", vcpu_reg(vcpu, 15));
+	print_fn(m, "CPSR:     \t%08x\n(Mode: %s)  (IRQs: %s)  (FIQs: %s) "
+		      "  (Vec: %s)\n",
+		      regs->cpsr, mode,
+		      (regs->cpsr & PSR_I_BIT) ? "off" : "on",
+		      (regs->cpsr & PSR_F_BIT) ? "off" : "on",
+		      (regs->cpsr & PSR_V_BIT) ? "high" : "low");
+
+	for (i = 0; i <= 12; i++) {
+		if ((i % 4) == 0)
+			print_fn(m, "\nregs[%u]: ", i);
+
+		print_fn(m, "\t0x%08x", *kvm_vcpu_reg(vcpu, i, MODE_USR));
+	}
+
+	print_fn(m, "\n\n");
+	print_fn(m, "Banked registers:  \tr13\t\tr14\t\tspsr\n");
+	print_fn(m, "-------------------\t--------\t--------\t--------\n");
+	print_fn(m, "             USR:  \t%08x\t%08x\t////////\n",
+			*kvm_vcpu_reg(vcpu, 13, MODE_USR),
+			*kvm_vcpu_reg(vcpu, 14, MODE_USR));
+	print_fn(m, "             SVC:  \t%08x\t%08x\t%08x\n",
+			*kvm_vcpu_reg(vcpu, 13, MODE_SVC),
+			*kvm_vcpu_reg(vcpu, 14, MODE_SVC),
+			*kvm_vcpu_spsr(vcpu, MODE_SVC));
+	print_fn(m, "             ABT:  \t%08x\t%08x\t%08x\n",
+			*kvm_vcpu_reg(vcpu, 13, MODE_ABT),
+			*kvm_vcpu_reg(vcpu, 14, MODE_ABT),
+			*kvm_vcpu_spsr(vcpu, MODE_ABT));
+	print_fn(m, "             UND:  \t%08x\t%08x\t%08x\n",
+			*kvm_vcpu_reg(vcpu, 13, MODE_UND),
+			*kvm_vcpu_reg(vcpu, 14, MODE_UND),
+			*kvm_vcpu_spsr(vcpu, MODE_UND));
+	print_fn(m, "             IRQ:  \t%08x\t%08x\t%08x\n",
+			*kvm_vcpu_reg(vcpu, 13, MODE_IRQ),
+			*kvm_vcpu_reg(vcpu, 14, MODE_IRQ),
+			*kvm_vcpu_spsr(vcpu, MODE_IRQ));
+	print_fn(m, "             FIQ:  \t%08x\t%08x\t%08x\n",
+			*kvm_vcpu_reg(vcpu, 13, MODE_FIQ),
+			*kvm_vcpu_reg(vcpu, 14, MODE_FIQ),
+			*kvm_vcpu_spsr(vcpu, MODE_FIQ));
+
+	print_fn(m, "\n");
+	print_fn(m, "fiq regs:\t%08x\t%08x\t%08x\t%08x\n"
+			  "         \t%08x\n",
+			*kvm_vcpu_reg(vcpu, 8, MODE_FIQ),
+			*kvm_vcpu_reg(vcpu, 9, MODE_FIQ),
+			*kvm_vcpu_reg(vcpu, 10, MODE_FIQ),
+			*kvm_vcpu_reg(vcpu, 11, MODE_FIQ),
+			*kvm_vcpu_reg(vcpu, 12, MODE_FIQ));
+
+out:
+	if (vcpu != NULL)
+		vcpu_put(vcpu);
+}
+
+void print_kvm_ws_info(int (*print_fn)(print_fn_args), struct seq_file *m)
+{
+	int i;
+
+	/*
+	 * Print world-switch trace circular buffer
+	 */
+	print_fn(m, "World switch history:\n");
+	print_fn(m, "---------------------\n");
+	mutex_lock(&ws_trace_mutex);
+
+	if (ws_trace_enter_index != ws_trace_exit_index ||
+	    ws_trace_enter_index < 0 ||
+	    ws_trace_enter_index >= WS_TRACE_ITEMS) {
+		mutex_unlock(&ws_trace_mutex);
+		return;
+	}
+
+	for (i = ws_trace_enter_index - 1; i != ws_trace_enter_index; i--) {
+		if (i < 0) {
+			i = WS_TRACE_ITEMS;
+			continue;
+		}
+
+		print_fn(m, "Enter: %08x    Exit: %08x\n",
+			ws_trace_enter[i], ws_trace_exit[i]);
+	}
+	mutex_unlock(&ws_trace_mutex);
+}
+
+static int __printk_relay(struct seq_file *m, const char *fmt, ...)
+{
+	va_list ap;
+	va_start(ap, fmt);
+	vprintk(fmt, ap);
+	va_end(ap);
+	return 0;
+}
+
+void kvm_dump_vcpu_state(void)
+{
+	print_kvm_vcpu_info(&__printk_relay, NULL);
+}
+
+void kvm_arm_trace_init(void)
+{
+
+}
+
+/******************************************************************************
+ * debugfs handling
+ */
+
+static int vcpu_debugfs_show(struct seq_file *m, void *v)
+{
+	print_kvm_vcpu_info(&seq_printf, m);
+	return 0;
+}
+
+static int ws_debugfs_show(struct seq_file *m, void *v)
+{
+	print_kvm_ws_info(&seq_printf, m);
+	return 0;
+}
+
+static void *k_start(struct seq_file *m, loff_t *pos)
+{
+	return *pos < 1 ? (void *)1 : NULL;
+}
+
+static void *k_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	++*pos;
+	return NULL;
+}
+
+static void k_stop(struct seq_file *m, void *v)
+{
+}
+
+static const struct seq_operations vcpu_debugfs_op = {
+	.start	= k_start,
+	.next	= k_next,
+	.stop	= k_stop,
+	.show	= vcpu_debugfs_show
+};
+
+static const struct seq_operations ws_debugfs_op = {
+	.start	= k_start,
+	.next	= k_next,
+	.stop	= k_stop,
+	.show	= ws_debugfs_show
+};
+
+static int vcpu_debugfs_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &vcpu_debugfs_op);
+}
+
+static int ws_debugfs_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &ws_debugfs_op);
+}
+
+static const struct file_operations vcpu_debugfs_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = vcpu_debugfs_open,
+	.read	 = seq_read,
+	.llseek	 = seq_lseek,
+	.release = seq_release,
+};
+
+static const struct file_operations ws_debugfs_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = ws_debugfs_open,
+	.read	 = seq_read,
+	.llseek	 = seq_lseek,
+	.release = seq_release,
+};
+
+/**
+ * kvm_arm_debugfs_init - create debugfs directory and files
+ *
+ * Create the debugfs entries for KVM/ARM
+ */
+void kvm_arm_debugfs_init(void)
+{
+	struct dentry *file;
+
+	file = debugfs_create_file("vcpu", 0444, kvm_debugfs_dir,
+				     NULL, &vcpu_debugfs_fops);
+	if (IS_ERR(file) || !file) {
+		kvm_err(PTR_ERR(file),
+			"cannot create debugfs KVM/ARM vcpu file\n");
+		return;
+	}
+	vcpu_debugfs_file = file;
+
+	file = debugfs_create_file("ws", 0444, kvm_debugfs_dir,
+				     NULL, &ws_debugfs_fops);
+	if (IS_ERR(file) || !file) {
+		kvm_err(PTR_ERR(file),
+			"cannot create debugfs KVM/ARM ws file\n");
+	}
+	ws_debugfs_file = file;
+}
+
+void kvm_arm_debugfs_exit(void)
+{
+	if (vcpu_debugfs_file)
+		debugfs_remove(vcpu_debugfs_file);
+	if (ws_debugfs_file)
+		debugfs_remove(ws_debugfs_file);
+}
+
+
+/******************************************************************************
+ * Printk-log-wrapping functionality
+ */
+
+#define TMP_LOG_LEN 512
+static char __tmp_log_data[TMP_LOG_LEN];
+DEFINE_MUTEX(__tmp_log_lock);
+void __kvm_print_msg(char *fmt, ...)
+{
+	va_list ap;
+	unsigned int size;
+
+	mutex_lock(&__tmp_log_lock);
+
+	va_start(ap, fmt);
+	size = vsnprintf(__tmp_log_data, TMP_LOG_LEN, fmt, ap);
+	va_end(ap);
+
+	if (size >= TMP_LOG_LEN)
+		printk(KERN_ERR "Message exceeded log length!\n");
+	else
+		printk(KERN_INFO "%s", __tmp_log_data);
+
+	mutex_unlock(&__tmp_log_lock);
+}
diff --git a/arch/arm/kvm/debug.h b/arch/arm/kvm/debug.h
new file mode 100644
index 0000000..4f5f381
--- /dev/null
+++ b/arch/arm/kvm/debug.h
@@ -0,0 +1,63 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ *
+ * This file contains debugging and tracing functions and definitions
+ * for KVM/ARM.
+ */
+#ifndef __ARM_KVM_TRACE_H__
+#define __ARM_KVM_TRACE_H__
+
+#include <linux/types.h>
+#include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
+
+extern struct kvm_vcpu *latest_vcpu;
+
+void kvm_dump_vcpu_state(void);
+
+void debug_ws_enter(u32 guest_pc);
+void debug_ws_exit(u32 guest_pc);
+
+#define print_fn_args struct seq_file *, const char *, ...
+void print_kvm_vcpu_info(int (*print_fn)(print_fn_args), struct seq_file *m);
+void print_kvm_ws_info(int (*print_fn)(print_fn_args), struct seq_file *m);
+
+void __kvm_print_msg(char *_fmt, ...);
+
+#define kvm_err(err, fmt, args...) do {			\
+	__kvm_print_msg(KERN_ERR "KVM error [%s:%d]: (%d) ", \
+			__func__, __LINE__, err); \
+	__kvm_print_msg(fmt "\n", ##args); \
+} while (0)
+
+#define __kvm_msg(fmt, args...) do {			\
+	__kvm_print_msg(KERN_ERR "KVM [%s:%d]: ", __func__, __LINE__); \
+	__kvm_print_msg(fmt, ##args); \
+} while (0)
+
+#define kvm_msg(__fmt, __args...) __kvm_msg(__fmt "\n", ##__args)
+
+
+#define KVMARM_NOT_IMPLEMENTED() \
+{ \
+	printk(KERN_ERR "KVM not implemented [%s:%d] in %s\n", \
+			__FILE__, __LINE__, __func__); \
+}
+
+void print_ws_trace(void);
+
+void kvm_arm_debugfs_init(void);
+void kvm_arm_debugfs_exit(void);
+
+#endif  /* __ARM_KVM_TRACE_H__ */
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
new file mode 100644
index 0000000..f8869c1
--- /dev/null
+++ b/arch/arm/kvm/trace.h
@@ -0,0 +1,52 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+/*
+ * Tracepoints for entry/exit to guest
+ */
+TRACE_EVENT(kvm_entry,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+TRACE_EVENT(kvm_exit,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+
+
+#endif /* _TRACE_KVM_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH arch/arm/kvm
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/arch/arm/mach-vexpress/Kconfig b/arch/arm/mach-vexpress/Kconfig
index e8c1111..34febe1 100644
--- a/arch/arm/mach-vexpress/Kconfig
+++ b/arch/arm/mach-vexpress/Kconfig
@@ -36,6 +36,7 @@ config ARCH_VEXPRESS_CA15X4
 	bool "Versatile Express Cortex-A15x4 tile"
 	depends on VEXPRESS_EXTENDED_MEMORY_MAP
 	select CPU_V7
+	select ARM_VIRT_EXT
 	select ARM_GIC
 	select HAVE_ARCH_TIMERS
 
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index afb5231..ab7d8ea 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -636,6 +636,13 @@ config ARM_LPAE
 	  Say Y if you have an ARMv7 processor supporting the LPAE page table
 	  format and you would like to access memory beyond the 4GB limit.
 
+config ARM_VIRT_EXT
+	bool "Support for ARM Virtualization Extensions"
+	depends on ARM_LPAE
+	help
+	  Say Y if you have an ARMv7 processor supporting the ARM hardware
+	  Virtualization extensions.
+
 config ARCH_PHYS_ADDR_T_64BIT
 	def_bool ARM_LPAE
 



* [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
  2011-08-06 10:39 ` [PATCH v4 01/10] ARM: KVM: Initial skeleton to compile KVM support Christoffer Dall
@ 2011-08-06 10:39 ` Christoffer Dall
  2011-08-09  9:20   ` Avi Kivity
  2011-08-06 10:39 ` [PATCH v4 03/10] ARM: KVM: Add hypervisor initialization Christoffer Dall
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:39 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

Extends the identity mapping feature so that KVM can set up an identity
mapping for Hyp mode with the AP[1] bit set, as required by the
architecture, and also supports freeing the created sub-pmds once they
are no longer needed.

These two functions:
 - hyp_identity_mapping_add(pgd, addr, end);
 - hyp_identity_mapping_del(pgd, addr, end);
essentially call the same functions as the non-Hyp versions, but with a
different argument value. KVM calls these functions to set up and tear
down the identity mapping used to initialize the hypervisor.

Note that the Hyp version of the _del function actually frees the pmds
pointed to by the pgd, as opposed to the non-Hyp version, which just
clears them.
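
A hypothetical KVM-side caller, to illustrate the intended pairing (the
pgd and address names below are placeholders, not names from this
patch):

    /* Illustrative sketch: create, use, then tear down the Hyp 1:1 map. */
    pgd_t *boot_pgd = pgd_alloc(&init_mm);

    hyp_identity_mapping_add(boot_pgd, init_start, init_end);
    /* ... enter Hyp mode through this mapping and run the init code ... */
    hyp_identity_mapping_del(boot_pgd, init_start, init_end);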

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/pgtable-3level-hwdef.h |    1 +
 arch/arm/include/asm/pgtable.h              |    6 +++
 arch/arm/mm/idmap.c                         |   47 ++++++++++++++++++++++++++-
 3 files changed, 53 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index 7c238a3..8ed298f 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -49,6 +49,7 @@
 #endif
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
+#define PMD_SECT_AP1		(_AT(pmdval_t, 1) << 6)
 #define PMD_SECT_TEX(x)		(_AT(pmdval_t, 0))
 
 /*
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 9645e52..da74cd1 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -409,6 +409,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 void identity_mapping_add(pgd_t *, unsigned long, unsigned long);
 void identity_mapping_del(pgd_t *, unsigned long, unsigned long);
 
+#ifdef CONFIG_KVM_ARM_HOST
+void hyp_identity_mapping_add(pgd_t *, unsigned long, unsigned long);
+void hyp_identity_mapping_del(pgd_t *pgd, unsigned long addr,
+			      unsigned long end);
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* CONFIG_MMU */
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 24e0655..83dc20d 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -56,11 +56,16 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 	} while (pud++, addr = next, addr != end);
 }
 
-void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+static void __identity_mapping_add(pgd_t *pgd, unsigned long addr,
+				   unsigned long end, bool hyp_mapping)
 {
 	unsigned long prot, next;
 
 	prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
+
+	if (hyp_mapping)
+		prot |= PMD_SECT_AP1;
+
 	if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
 		prot |= PMD_BIT4;
 
@@ -71,6 +76,12 @@ void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
 	} while (pgd++, addr = next, addr != end);
 }
 
+void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+	__identity_mapping_add(pgd, addr, end, false);
+}
+
+
 #ifdef CONFIG_SMP
 static void idmap_del_pmd(pud_t *pud, unsigned long addr, unsigned long end)
 {
@@ -105,6 +116,40 @@ void identity_mapping_del(pgd_t *pgd, unsigned long addr, unsigned long end)
 }
 #endif
 
+#ifdef CONFIG_KVM_ARM_HOST
+void hyp_identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+	__identity_mapping_add(pgd, addr, end, true);
+}
+
+static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
+{
+	pmd_t *pmd;
+
+	pmd = pmd_offset(pgd, addr);
+	pmd_free(NULL, pmd);
+}
+
+/*
+ * This version actually frees the underlying pmds for all pgds in range and
+ * clears the pgds themselves afterwards.
+ */
+void hyp_identity_mapping_del(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	pgd_t *next_pgd;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		next_pgd = pgd + pgd_index(addr);
+		if (!pgd_none_or_clear_bad(next_pgd)) {
+			hyp_idmap_del_pmd(next_pgd, addr);
+			pgd_clear(next_pgd);
+		}
+	} while (addr = next, addr < end);
+}
+#endif
+
 /*
  * In order to soft-boot, we need to insert a 1:1 mapping in place of
  * the user-mode pages.  This will then ensure that we have predictable



* [PATCH v4 03/10] ARM: KVM: Add hypervisor initialization
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
  2011-08-06 10:39 ` [PATCH v4 01/10] ARM: KVM: Initial skeleton to compile KVM support Christoffer Dall
  2011-08-06 10:39 ` [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping Christoffer Dall
@ 2011-08-06 10:39 ` Christoffer Dall
  2011-08-06 10:39 ` [PATCH v4 04/10] ARM: KVM: Memory virtualization setup Christoffer Dall
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:39 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

Sets up the required registers to run code in Hyp mode from the kernel.
No major controversies, but we should consider how to deal with SMP
support for the hypervisor stack page.

By setting the HVBAR the kernel can execute code in Hyp-mode with
the MMU disabled. The HVBAR initially points to initialization code,
which initializes other Hyp-mode registers and enables the MMU
for Hyp-mode. Afterwards, the HVBAR is changed to point to KVM
Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.
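
A condensed sketch of that sequence (lifted from init_hyp_mode() in
this patch; identity-map setup and error handling omitted):

	/* HVBAR := physical address of the init vectors (Hyp MMU off) */
	asm volatile ("mov	r0, %0\n\t"
		      "ldr	r7, =SMCHYP_HVBAR_W\n\t"
		      "smc	#0" : : "r" (init_phys_addr) : "r0", "r7");

	/* HVC into the init code: set HTTBR/HTCR/HSCTLR, enable Hyp MMU */

	/* HVBAR := kernel virtual address of the real Hyp vectors */
	asm volatile ("mov	r0, %0\n\t"
		      "ldr	r7, =SMCHYP_HVBAR_W\n\t"
		      "smc	#0" : : "r" (vector_ptr) : "r0", "r7");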

Also provides memory mapping code that maps the required code pages and
data structures accessed in Hyp mode at the same virtual addresses as in
the host kernel, while conforming to the architectural requirements for
translations in Hyp mode. This interface is added in
arch/arm/kvm/arm_mmu.c and comprises:
 - create_hyp_mappings(hyp_pgd, start, end);
 - remove_hyp_mappings(hyp_pgd, start, end);
 - free_hyp_pmds(hyp_pgd);

See the implementation for more details.
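
As a usage sketch (mirroring what init_hyp_memory() in this patch does
for the world-switch code; error handling abbreviated):

	unsigned long start = (unsigned long)&__kvm_vcpu_run;
	unsigned long end = (unsigned long)&__kvm_vcpu_run_end;
	int err = create_hyp_mappings(kvm_hyp_pgd, start, end);

	if (err)
		free_hyp_pmds(kvm_hyp_pgd);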

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h              |  103 +++++++++++++
 arch/arm/include/asm/kvm_asm.h              |   23 +++
 arch/arm/include/asm/kvm_host.h             |    1 
 arch/arm/include/asm/kvm_mmu.h              |   40 +++++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    5 +
 arch/arm/include/asm/pgtable.h              |    5 +
 arch/arm/kvm/arm.c                          |  169 +++++++++++++++++++++
 arch/arm/kvm/arm_exports.c                  |   10 +
 arch/arm/kvm/arm_init.S                     |   98 ++++++++++++
 arch/arm/kvm/arm_interrupts.S               |   30 ++++
 arch/arm/kvm/arm_mmu.c                      |  213 +++++++++++++++++++++++++++
 arch/arm/mm/idmap.c                         |    7 +
 mm/memory.c                                 |    1 
 13 files changed, 704 insertions(+), 1 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
new file mode 100644
index 0000000..835abd1
--- /dev/null
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -0,0 +1,103 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __KVM_ARM_H__
+#define __KVM_ARM_H__
+
+#include <asm/types.h>
+
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE		(1 << 27)
+#define HCR_TVM		(1 << 26)
+#define HCR_TTLB	(1 << 25)
+#define HCR_TPU		(1 << 24)
+#define HCR_TPC		(1 << 23)
+#define HCR_TSW		(1 << 22)
+#define HCR_TAC		(1 << 21)
+#define HCR_TIDCP	(1 << 20)
+#define HCR_TSC		(1 << 19)
+#define HCR_TID3	(1 << 18)
+#define HCR_TID2	(1 << 17)
+#define HCR_TID1	(1 << 16)
+#define HCR_TID0	(1 << 15)
+#define HCR_TWE		(1 << 14)
+#define HCR_TWI		(1 << 13)
+#define HCR_DC		(1 << 12)
+#define HCR_BSU		(3 << 10)
+#define HCR_FB		(1 << 9)
+#define HCR_VA		(1 << 8)
+#define HCR_VI		(1 << 7)
+#define HCR_VF		(1 << 6)
+#define HCR_AMO		(1 << 5)
+#define HCR_IMO		(1 << 4)
+#define HCR_FMO		(1 << 3)
+#define HCR_PTW		(1 << 2)
+#define HCR_SWIO	(1 << 1)
+#define HCR_VM		1
+#define HCR_GUEST_MASK (HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | HCR_AMO | \
+			HCR_IMO | HCR_FMO | HCR_SWIO)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE	(1 << 30)
+#define HSCTLR_EE	(1 << 25)
+#define HSCTLR_FI	(1 << 21)
+#define HSCTLR_WXN	(1 << 19)
+#define HSCTLR_I	(1 << 12)
+#define HSCTLR_C	(1 << 2)
+#define HSCTLR_A	(1 << 1)
+#define HSCTLR_M	1
+#define HSCTLR_MASK	(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+			 HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE	(1 << 31)
+#define TTBCR_IMP	(1 << 30)
+#define TTBCR_SH1	(3 << 28)
+#define TTBCR_ORGN1	(3 << 26)
+#define TTBCR_IRGN1	(3 << 24)
+#define TTBCR_EPD1	(1 << 23)
+#define TTBCR_A1	(1 << 22)
+#define TTBCR_T1SZ	(3 << 16)
+#define TTBCR_SH0	(3 << 12)
+#define TTBCR_ORGN0	(3 << 10)
+#define TTBCR_IRGN0	(3 << 8)
+#define TTBCR_EPD0	(1 << 7)
+#define TTBCR_T0SZ	3
+#define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
+
+
+/* Virtualization Translation Control Register (VTCR) bits */
+#define VTCR_SH0	(3 << 12)
+#define VTCR_ORGN0	(3 << 10)
+#define VTCR_IRGN0	(3 << 8)
+#define VTCR_SL0	(3 << 6)
+#define VTCR_S		(1 << 4)
+#define VTCR_T0SZ	3
+#define VTCR_MASK	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0 | VTCR_SL0 | \
+			 VTCR_S | VTCR_T0SZ)
+#define VTCR_HTCR_SH	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0)
+#define VTCR_SL_L2	0		/* Starting-level: 2 */
+#define VTCR_SL_L1	(1 << 6)	/* Starting-level: 1 */
+#define VTCR_GUEST_SL	VTCR_SL_L1
+#define VTCR_GUEST_T0SZ	0
+#if VTCR_GUEST_SL == 0
+#define VTTBR_X		(14 - VTCR_GUEST_T0SZ)
+#else
+#define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
+#endif
+
+
+#endif /* __KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index c3d4458..78cf8d3 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -24,5 +24,28 @@
 #define ARM_EXCEPTION_DATA_ABORT  4
 #define ARM_EXCEPTION_IRQ	  5
 #define ARM_EXCEPTION_FIQ	  6
+#define ARM_EXCEPTION_HVC	  7
+
+/*
+ * SMC Hypervisor API call numbers
+ */
+#ifdef __ASSEMBLY__
+.equ SMCHYP_HVBAR_W, 0xfffffff0
+#else /* !__ASSEMBLY__ */
+asm(".equ SMCHYP_HVBAR_W, 0xfffffff0");
+#endif /* __ASSEMBLY__ */
+
+#ifndef __ASSEMBLY__
+struct kvm_vcpu;
+
+extern unsigned long __kvm_hyp_init;
+extern unsigned long __kvm_hyp_init_end;
+
+extern unsigned long __kvm_hyp_vector;
+extern unsigned long __kvm_hyp_vector_end;
+
+extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+extern unsigned long __kvm_vcpu_run_end;
+#endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index b2fcd8a..6a10467 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -31,6 +31,7 @@ struct kvm_vcpu;
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 
 struct kvm_arch {
+	pgd_t *pgd;     /* 1-level 2nd stage table */
 };
 
 #define EXCEPTION_NONE      0
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
new file mode 100644
index 0000000..d22aad0
--- /dev/null
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -0,0 +1,40 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_MMU_H__
+#define __ARM_KVM_MMU_H__
+
+/*
+ * The architecture supports 40-bit IPA as input to the 2nd stage translations
+ * and PTRS_PER_PGD2 could therefore be 1024.
+ *
+ * To save a bit of memory and to avoid alignment issues we assume 39-bit IPA
+ * for now, but remember that the level-1 table must be aligned to its size.
+ */
+#define PTRS_PER_PGD2	512
+#define PGD2_ORDER	get_order(PTRS_PER_PGD2 * sizeof(pgd_t))
+
+extern pgd_t *kvm_hyp_pgd;
+
+int create_hyp_mappings(pgd_t *hyp_pgd,
+			unsigned long start,
+			unsigned long end);
+void remove_hyp_mappings(pgd_t *hyp_pgd,
+			 unsigned long start,
+			 unsigned long end);
+void free_hyp_pmds(pgd_t *hyp_pgd);
+
+#endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index 8ed298f..7dd1dba 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -32,6 +32,9 @@
 #define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)
 #define PMD_BIT4		(_AT(pmdval_t, 0))
 #define PMD_DOMAIN(x)		(_AT(pmdval_t, 0))
+#define PMD_APTABLE_SHIFT	(61)
+#define PMD_APTABLE		(_AT(pgdval_t, 3) << PMD_APTABLE_SHIFT)
+#define PMD_PXNTABLE		(_AT(pgdval_t, 1) << 59)
 
 /*
  *   - section
@@ -44,8 +47,10 @@
 #ifdef __ASSEMBLY__
 /* avoid 'shift count out of range' warning */
 #define PMD_SECT_XN		(0)
+#define PMD_SECT_PXN		(0)
 #else
 #define PMD_SECT_XN		((pmdval_t)1 << 54)
+#define PMD_SECT_PXN		((pmdval_t)1 << 53)
 #endif
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index da74cd1..db3b6e8 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -88,6 +88,7 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
+#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
@@ -223,6 +224,10 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #ifdef CONFIG_ARM_LPAE
 
 #define pmd_bad(pmd)		(!(pmd_val(pmd) & 2))
+#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_TABLE)
+#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_SECT)
 
 #define copy_pmd(pmdpd,pmdps)		\
 	do {				\
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 1e7a907..ccfb225 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -31,9 +31,20 @@
 #include <asm/uaccess.h>
 #include <asm/ptrace.h>
 #include <asm/mman.h>
+#include <asm/tlbflush.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
 
 #include "debug.h"
 
+static void *kvm_arm_hyp_stack_page;
+
+/* The VMID used in the VTTBR */
+#define VMID_SIZE (1<<8)
+static DECLARE_BITMAP(kvm_vmids, VMID_SIZE);
+static DEFINE_MUTEX(kvm_vmids_mutex);
+
 int kvm_arch_hardware_enable(void *garbage)
 {
 	return 0;
@@ -245,13 +256,171 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	return -EINVAL;
 }
 
+/**
+ * init_hyp_mode - initializes Hyp mode on a single CPU
+ */
+static int init_hyp_mode(void)
+{
+	phys_addr_t init_phys_addr, init_end_phys_addr;
+	unsigned long vector_ptr, hyp_stack_ptr;
+	int err = 0;
+
+	/*
+	 * Allocate Hyp level-1 page table
+	 */
+	kvm_hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	if (!kvm_hyp_pgd)
+		return -ENOMEM;
+
+	/*
+	 * Allocate stack page for Hypervisor-mode
+	 */
+	kvm_arm_hyp_stack_page = (void *)__get_free_page(GFP_KERNEL);
+	if (!kvm_arm_hyp_stack_page) {
+		err = -ENOMEM;
+		goto out_free_pgd;
+	}
+
+	hyp_stack_ptr = (unsigned long)kvm_arm_hyp_stack_page + PAGE_SIZE;
+
+	init_phys_addr = virt_to_phys((void *)&__kvm_hyp_init);
+	init_end_phys_addr = virt_to_phys((void *)&__kvm_hyp_init_end);
+
+	/*
+	 * Create identity mapping
+	 */
+	hyp_identity_mapping_add(kvm_hyp_pgd,
+				 (unsigned long)init_phys_addr,
+				 (unsigned long)init_end_phys_addr);
+
+	/*
+	 * Set the HVBAR
+	 */
+	BUG_ON(init_phys_addr & 0x1f);
+	asm volatile (
+		"mov	r0, %[vector_ptr]\n\t"
+		"ldr	r7, =SMCHYP_HVBAR_W\n\t"
+		"smc	#0\n\t" : :
+		[vector_ptr] "r" ((unsigned long)init_phys_addr) :
+		"r0", "r7");
+
+	/*
+	 * Call initialization code
+	 */
+	asm volatile (
+		"mov	r0, %[pgd_ptr]\n\t"
+		"mov	r1, %[stack_ptr]\n\t"
+		"hvc	#0\n\t" : :
+		[pgd_ptr] "r" (virt_to_phys(kvm_hyp_pgd)),
+		[stack_ptr] "r" (hyp_stack_ptr) :
+		"r0", "r1");
+
+	/*
+	 * Unmap the identity mapping
+	 */
+	hyp_identity_mapping_del(kvm_hyp_pgd,
+				 (unsigned long)init_phys_addr,
+				 (unsigned long)init_end_phys_addr);
+
+	/*
+	 * Set the HVBAR to the virtual kernel address
+	 */
+	vector_ptr = (unsigned long)&__kvm_hyp_vector;
+	asm volatile (
+		"mov	r0, %[vector_ptr]\n\t"
+		"ldr	r7, =SMCHYP_HVBAR_W\n\t"
+		"smc	#0\n\t" : :
+		[vector_ptr] "r" ((unsigned long)vector_ptr) :
+		"r0", "r7");
+
+	return err;
+out_free_pgd:
+	kfree(kvm_hyp_pgd);
+	kvm_hyp_pgd = NULL;
+	return err;
+}
+
+/*
+ * Initializes the memory mappings used in Hyp-mode
+ *
+ * Code executed in Hyp-mode and a stack page per cpu must be mapped into the
+ * hypervisor translation tables.
+ *
+ * Currently there is no SMP support so we map only a single stack page on a
+ * single CPU.
+ */
+static int init_hyp_memory(void)
+{
+	int err = 0;
+	unsigned long start, end;
+
+	/*
+	 * Map Hyp exception vectors
+	 */
+	start = (unsigned long)&__kvm_hyp_vector;
+	end = (unsigned long)&__kvm_hyp_vector_end;
+	err = create_hyp_mappings(kvm_hyp_pgd, start, end);
+	if (err) {
+		kvm_err(err, "Cannot map hyp vector");
+		goto out_free_mappings;
+	}
+
+	/*
+	 * Map the world-switch code
+	 */
+	start = (unsigned long)&__kvm_vcpu_run;
+	end = (unsigned long)&__kvm_vcpu_run_end;
+	err = create_hyp_mappings(kvm_hyp_pgd, start, end);
+	if (err) {
+		kvm_err(err, "Cannot map world-switch code");
+		goto out_free_mappings;
+	}
+
+	/*
+	 * Map the Hyp stack page
+	 */
+	start = (unsigned long)kvm_arm_hyp_stack_page;
+	end = start + PAGE_SIZE - 1;
+	err = create_hyp_mappings(kvm_hyp_pgd, start, end);
+	if (err) {
+		kvm_err(err, "Cannot map hyp stack");
+		goto out_free_mappings;
+	}
+
+	return err;
+out_free_mappings:
+	free_hyp_pmds(kvm_hyp_pgd);
+	return err;
+}
+
+/**
+ * Initialize Hyp-mode and memory mappings on all CPUs.
+ */
 int kvm_arch_init(void *opaque)
 {
+	int err;
+
+	err = init_hyp_mode();
+	if (err)
+		goto out_err;
+
+	err = init_hyp_memory();
+	if (err)
+		goto out_err;
+
+	set_bit(0, kvm_vmids);
 	return 0;
+out_err:
+	return err;
 }
 
 void kvm_arch_exit(void)
 {
+	if (kvm_hyp_pgd) {
+		free_hyp_pmds(kvm_hyp_pgd);
+		kfree(kvm_hyp_pgd);
+		kvm_hyp_pgd = NULL;
+	}
 }
 
 static int arm_init(void)
diff --git a/arch/arm/kvm/arm_exports.c b/arch/arm/kvm/arm_exports.c
index d8a7fd5..0fdd5ff 100644
--- a/arch/arm/kvm/arm_exports.c
+++ b/arch/arm/kvm/arm_exports.c
@@ -14,3 +14,13 @@
  */
 
 #include <linux/module.h>
+#include <asm/kvm_asm.h>
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_init);
+EXPORT_SYMBOL_GPL(__kvm_hyp_init_end);
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_vector);
+EXPORT_SYMBOL_GPL(__kvm_hyp_vector_end);
+
+EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
+EXPORT_SYMBOL_GPL(__kvm_vcpu_run_end);
diff --git a/arch/arm/kvm/arm_init.S b/arch/arm/kvm/arm_init.S
index 073a494..5f7e922 100644
--- a/arch/arm/kvm/arm_init.S
+++ b/arch/arm/kvm/arm_init.S
@@ -13,5 +13,103 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  *
  */
+
+#include <linux/linkage.h>
+#include <asm/unified.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor initialization
+@    - should be called with:
+@        r0 = Hypervisor pgd pointer
+@        r1 = top of Hyp stack (kernel VA)
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	.text
+	.arm
+	.align 12
+__kvm_hyp_init:
+	.globl __kvm_hyp_init
+
+	@ Hyp-mode exception vector
+	nop
+	nop
+	nop
+	nop
+	nop
+	b	__do_hyp_init
+	nop
+	nop
+
+__do_hyp_init:
+	@ Set the sp to end of this page and push data for later use
+	mov	sp, pc
+	bic	sp, sp, #0x0ff
+	bic	sp, sp, #0xf00
+	add	sp, sp, #0x1000
+	push	{r1, r2, r12}
+
+	@ Set the HTTBR to point to the hypervisor PGD pointer passed to
+	@ function and set the upper bits equal to the kernel PGD.
+	mrrc	p15, 1, r1, r2, c2
+	mcrr	p15, 4, r0, r2, c2
+
+	@ Set the HTCR and VTCR to the same shareability and cacheability
+	@ settings as the non-secure TTBCR and with T0SZ == 0.
+	mrc	p15, 4, r0, c2, c0, 2	@ HTCR
+	ldr	r12, =HTCR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c2, c0, 2	@ TTBCR
+	and	r1, r1, #(HTCR_MASK & ~TTBCR_T0SZ)
+	orr	r0, r0, r1
+	mcr	p15, 4, r0, c2, c0, 2	@ HTCR
+
+	mrc	p15, 4, r1, c2, c1, 2	@ VTCR
+	bic	r1, r1, #(VTCR_HTCR_SH | VTCR_SL0)
+	bic	r0, r0, #(~VTCR_HTCR_SH)
+	orr	r1, r0, r1
+	orr	r1, r1, #(VTCR_SL_L1 | VTCR_GUEST_T0SZ)
+	mcr	p15, 4, r1, c2, c1, 2	@ VTCR
+
+	@ Use the same memory attributes for hyp. accesses as the kernel
+	@ (copy MAIRx to HMAIRx).
+	mrc	p15, 0, r0, c10, c2, 0
+	mcr	p15, 4, r0, c10, c2, 0
+	mrc	p15, 0, r0, c10, c2, 1
+	mcr	p15, 4, r0, c10, c2, 1
+
+	@ Set the HSCTLR to:
+	@  - ARM/THUMB exceptions: ARM
+	@  - Endianness: Kernel config
+	@  - Fast Interrupt Features: Kernel config
+	@  - Write permission implies XN: disabled
+	@  - Instruction cache: enabled
+	@  - Data/Unified cache: enabled
+	@  - Memory alignment checks: enabled
+	@  - MMU: enabled (this code must be run from an identity mapping)
+	mrc	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	ldr	r12, =HSCTLR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c1, c0, 0	@ SCTLR
+	ldr	r12, =(HSCTLR_EE | HSCTLR_FI)
+	and	r1, r1, r12
+	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I)
+	orr	r1, r1, r12
+	orr	r0, r0, r1
+	isb
+	mcr	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	isb
+
+	@ Set stack pointer and return to the kernel
+	pop	{r1, r2, r12}
+	mov	sp, r1
+	eret
+
+	.ltorg
+
+	.align 12
+
+__kvm_init_sp:
+	.globl __kvm_hyp_init_end
+__kvm_hyp_init_end:
diff --git a/arch/arm/kvm/arm_interrupts.S b/arch/arm/kvm/arm_interrupts.S
index 073a494..2edc49b 100644
--- a/arch/arm/kvm/arm_interrupts.S
+++ b/arch/arm/kvm/arm_interrupts.S
@@ -13,5 +13,35 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  *
  */
+
+#include <linux/linkage.h>
+#include <asm/unified.h>
+#include <asm/page.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor world-switch code
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+	.text
+	.arm
+
+ENTRY(__kvm_vcpu_run)
+THUMB(	orr	lr, lr, #1)
+	mov	pc, lr
+__kvm_vcpu_run_end:
+	.globl __kvm_vcpu_run_end
+
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor exception vector and handlers
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+	.align 5
+__kvm_hyp_vector:
+	.globl __kvm_hyp_vector
+	nop
+__kvm_hyp_vector_end:
+	.globl __kvm_hyp_vector_end
diff --git a/arch/arm/kvm/arm_mmu.c b/arch/arm/kvm/arm_mmu.c
index 2cccd48..8fefda2 100644
--- a/arch/arm/kvm/arm_mmu.c
+++ b/arch/arm/kvm/arm_mmu.c
@@ -13,3 +13,216 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  *
  */
+
+#include <linux/mman.h>
+#include <linux/kvm_host.h>
+#include <asm/pgalloc.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+
+#include "debug.h"
+
+pgd_t *kvm_hyp_pgd;
+
+static void free_ptes(pmd_t *pmd, unsigned long addr)
+{
+	pte_t *pte;
+	unsigned int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			pte_free_kernel(NULL, pte);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * free_hyp_pmds - free Hyp-mode level-2 tables and child level-3 tables
+ * @hyp_pgd:	The Hyp-mode page table pointer
+ *
+ * Assumes this is a page table used strictly in Hyp-mode and therefore contains
+ * only mappings in the kernel memory area, which is above PAGE_OFFSET.
+ */
+void free_hyp_pmds(pgd_t *hyp_pgd)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr, next, end;
+
+	addr = PAGE_OFFSET;
+	end = ~0;
+	do {
+		next = pgd_addr_end(addr, end);
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		BUG_ON(pud_bad(*pud));
+
+		if (pud_none(*pud))
+			continue;
+
+		pmd = pmd_offset(pud, addr);
+		free_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+	} while (addr = next, addr != end);
+}
+
+static void remove_hyp_pte_mappings(pmd_t *pmd, unsigned long addr,
+						unsigned long end)
+{
+	pte_t *pte;
+
+	do {
+		pte = pte_offset_kernel(pmd, addr);
+		pte_clear(NULL, addr, pte);
+	} while (addr += PAGE_SIZE, addr < end);
+}
+
+static void remove_hyp_pmd_mappings(pgd_t *pgd, unsigned long addr,
+					       unsigned long end)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long next;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pud = pud_offset(pgd, addr);
+		pmd = pmd_offset(pud, addr);
+
+		BUG_ON(pmd_sect(*pmd));
+
+		if (!pmd_none(*pmd))
+			remove_hyp_pte_mappings(pmd, addr, next);
+	} while (addr = next, addr < end);
+}
+
+/**
+ * remove_hyp_mappings - clear hypervisor mappings from specified range
+ * @hyp_pgd:	The Hyp-mode page table pointer
+ * @start:	The start virtual address of the area to clear
+ * @end:	The end virtual address of the area to clear
+ *
+ * The page tables aren't actually freed - call free_hyp_pmds to do this.
+ */
+void remove_hyp_mappings(pgd_t *hyp_pgd, unsigned long start,
+					 unsigned long end)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	unsigned long addr, next;
+
+	BUG_ON(start > end);
+	BUG_ON(start < PAGE_OFFSET);
+
+	addr = start;
+	do {
+		next = pgd_addr_end(addr, end);
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		BUG_ON(pud_bad(*pud));
+
+		if (pud_none(*pud))
+			continue;
+
+		remove_hyp_pmd_mappings(pgd, addr, next);
+	} while (addr = next, addr < end);
+}
+
+static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long addr,
+						unsigned long end)
+{
+	pte_t *pte;
+	struct page *page;
+
+	addr &= PAGE_MASK;
+	do {
+		pte = pte_offset_kernel(pmd, addr);
+		BUG_ON(!virt_addr_valid(addr));
+		page = virt_to_page(addr);
+
+		set_pte_ext(pte, mk_pte(page, PAGE_HYP), 0);
+	} while (addr += PAGE_SIZE, addr < end);
+}
+
+static int create_hyp_pmd_mappings(pud_t *pud, unsigned long addr,
+					       unsigned long end)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long next;
+
+	do {
+		next = pmd_addr_end(addr, end);
+		pmd = pmd_offset(pud, addr);
+
+		BUG_ON(pmd_sect(*pmd));
+
+		if (pmd_none(*pmd)) {
+			pte = pte_alloc_one_kernel(NULL, addr);
+			if (!pte) {
+				kvm_err(-ENOMEM, "Cannot allocate Hyp pte");
+				return -ENOMEM;
+			}
+			pmd_populate_kernel(NULL, pmd, pte);
+		}
+
+		create_hyp_pte_mappings(pmd, addr, next);
+	} while (addr = next, addr < end);
+
+	return 0;
+}
+
+/**
+ * create_hyp_mappings - map a kernel virtual address range in Hyp mode
+ * @hyp_pgd:	The allocated hypervisor level-1 table
+ * @start:	The virtual kernel start address of the range
+ * @end:	The virtual kernel end address of the range
+ *
+ * The same virtual address as the kernel virtual address is also used in
+ * Hyp-mode mapping to the same underlying physical pages.
+ */
+int create_hyp_mappings(pgd_t *hyp_pgd, unsigned long start, unsigned long end)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr, next;
+	int err = 0;
+
+	BUG_ON(start > end);
+	if (start < PAGE_OFFSET)
+		return -EINVAL;
+
+	addr = start;
+	do {
+		next = pgd_addr_end(addr, end);
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none_or_clear_bad(pud)) {
+			pmd = pmd_alloc_one(NULL, addr);
+			if (!pmd) {
+				kvm_err(-ENOMEM, "Cannot allocate Hyp pmd");
+				return -ENOMEM;
+			}
+			pud_populate(NULL, pud, pmd);
+		}
+
+		err = create_hyp_pmd_mappings(pud, addr, next);
+		if (err)
+			return err;
+	} while (addr = next, addr < end);
+
+	return err;
+}
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	KVMARM_NOT_IMPLEMENTED();
+	return -EINVAL;
+}
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 83dc20d..37b8b4c 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,5 +1,6 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/module.h>
 #include <linux/kernel.h>
 
 #include <asm/cputype.h>
@@ -121,12 +122,15 @@ void hyp_identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
 {
 	__identity_mapping_add(pgd, addr, end, true);
 }
+EXPORT_SYMBOL_GPL(hyp_identity_mapping_add);
 
 static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
 {
+	pud_t *pud;
 	pmd_t *pmd;
 
-	pmd = pmd_offset(pgd, addr);
+	pud = pud_offset(pgd, addr);
+	pmd = pmd_offset(pud, addr);
 	pmd_free(NULL, pmd);
 }
 
@@ -148,6 +152,7 @@ void hyp_identity_mapping_del(pgd_t *pgd, unsigned long addr, unsigned long end)
 		}
 	} while (addr = next, addr < end);
 }
+EXPORT_SYMBOL_GPL(hyp_identity_mapping_del);
 #endif
 
 /*
diff --git a/mm/memory.c b/mm/memory.c
index 61e66f0..2a99b7b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -204,6 +204,7 @@ void pgd_clear_bad(pgd_t *pgd)
 	pgd_ERROR(*pgd);
 	pgd_clear(pgd);
 }
+EXPORT_SYMBOL_GPL(pgd_clear_bad);
 
 void pud_clear_bad(pud_t *pud)
 {



* [PATCH v4 04/10] ARM: KVM: Memory virtualization setup
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
                   ` (2 preceding siblings ...)
  2011-08-06 10:39 ` [PATCH v4 03/10] ARM: KVM: Add hypervisor initialization Christoffer Dall
@ 2011-08-06 10:39 ` Christoffer Dall
  2011-08-09  9:57   ` Avi Kivity
  2011-08-06 10:39 ` [PATCH v4 05/10] ARM: KVM: Inject IRQs and FIQs from userspace Christoffer Dall
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:39 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/arm_mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Further, each entry in the TLBs and caches is tagged with a VMID
identifier in addition to the ASID. The VMIDs are managed using
a bitmap and assigned when creating the VM in kvm_arch_init_vm()
where the 2nd stage pgd is also allocated. The table is freed in
kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.
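
As a sketch of the VTTBR composition performed in kvm_arch_init_vm()
(this patch): the low bits hold the physical address of the stage-2
pgd, masked to the required alignment, and bits 48-55 hold the VMID:

	pgd_phys = virt_to_phys(kvm->arch.pgd);
	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1) & ~((2 << VTTBR_X) - 1);
	kvm->arch.vttbr |= (u64)vmid << 48;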

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_host.h |    4 ++
 arch/arm/include/asm/kvm_mmu.h  |    5 +++
 arch/arm/kvm/arm.c              |   62 ++++++++++++++++++++++++++++++++++-
 arch/arm/kvm/arm_mmu.c          |   69 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6a10467..06d1263 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -31,7 +31,9 @@ struct kvm_vcpu;
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 
 struct kvm_arch {
-	pgd_t *pgd;     /* 1-level 2nd stage table */
+	u32    vmid;	/* The VMID used for the virt. memory system */
+	pgd_t *pgd;	/* 1-level 2nd stage table */
+	u64    vttbr;	/* VTTBR value associated with above pgd and vmid */
 };
 
 #define EXCEPTION_NONE      0
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index d22aad0..a64ab2d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -37,4 +37,9 @@ void remove_hyp_mappings(pgd_t *hyp_pgd,
 			 unsigned long end);
 void free_hyp_pmds(pgd_t *hyp_pgd);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ccfb225..3db6794 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -72,15 +72,66 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm)
 {
-	return 0;
+	int ret = 0;
+	phys_addr_t pgd_phys;
+	unsigned long vmid;
+	unsigned long start, end;
+
+
+	mutex_lock(&kvm_vmids_mutex);
+	vmid = find_first_zero_bit(kvm_vmids, VMID_SIZE);
+	if (vmid >= VMID_SIZE) {
+		mutex_unlock(&kvm_vmids_mutex);
+		return -EBUSY;
+	}
+	__set_bit(vmid, kvm_vmids);
+	kvm->arch.vmid = vmid;
+	mutex_unlock(&kvm_vmids_mutex);
+
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+
+	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1) & ~((2 << VTTBR_X) - 1);
+	kvm->arch.vttbr |= ((u64)vmid << 48);
+
+	start = (unsigned long)kvm;
+	end = start + sizeof(struct kvm);
+	ret = create_hyp_mappings(kvm_hyp_pgd, start, end);
+	if (ret)
+		goto out_fail_hyp_mappings;
+
+	return ret;
+out_fail_hyp_mappings:
+	remove_hyp_mappings(kvm_hyp_pgd, start, end);
+out_fail_alloc:
+	clear_bit(vmid, kvm_vmids);
+	return ret;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
+	if (kvm->arch.vmid != 0) {
+		mutex_lock(&kvm_vmids_mutex);
+		clear_bit(kvm->arch.vmid, kvm_vmids);
+		mutex_unlock(&kvm_vmids_mutex);
+	}
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -154,6 +205,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 {
 	int err;
 	struct kvm_vcpu *vcpu;
+	unsigned long start, end;
 
 	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
 	if (!vcpu) {
@@ -165,8 +217,16 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	start = (unsigned long)vcpu;
+	end = start + sizeof(struct kvm_vcpu);
+	err = create_hyp_mappings(kvm_hyp_pgd, start, end);
+	if (err)
+		goto out_fail_hyp_mappings;
+
 	latest_vcpu = vcpu;
 	return vcpu;
+out_fail_hyp_mappings:
+	remove_hyp_mappings(kvm_hyp_pgd, start, end);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 out:
diff --git a/arch/arm/kvm/arm_mmu.c b/arch/arm/kvm/arm_mmu.c
index 8fefda2..5af0a7c 100644
--- a/arch/arm/kvm/arm_mmu.c
+++ b/arch/arm/kvm/arm_mmu.c
@@ -221,6 +221,75 @@ int create_hyp_mappings(pgd_t *hyp_pgd, unsigned long start, unsigned long end)
 	return err;
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates only the 1st level table, of the size defined by PGD2_ORDER
+ * (which can support either full 40-bit input addresses or be limited to
+ * 32-bit input addresses). Clears the allocated pages.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+
+	if (kvm->arch.pgd != NULL) {
+		kvm_err(-EINVAL, "kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
+	if (!pgd)
+		return -ENOMEM;
+
+	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
+	kvm->arch.pgd = pgd;
+
+	return 0;
+}
+
+/**
+ * kvm_free_stage2_pgd - free all stage-2 tables
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
+ * underlying level-2 and level-3 tables before freeing the actual level-1 table
+ * and setting the struct pointer to NULL.
+ */
+void kvm_free_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long long i, addr;
+
+	if (kvm->arch.pgd == NULL)
+		return;
+
+	/*
+	 * We do this slightly differently than in other places, since we need
+	 * more than 32 bits and, for instance, pgd_addr_end converts to
+	 * unsigned long.
+	 */
+	addr = 0;
+	for (i = 0; i < PTRS_PER_PGD2; i++) {
+		addr = i * (unsigned long long)PGDIR_SIZE;
+		pgd = kvm->arch.pgd + i;
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none(*pud))
+			continue;
+
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+	}
+
+	free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
+	kvm->arch.pgd = NULL;
+}
+
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	KVMARM_NOT_IMPLEMENTED();



* [PATCH v4 05/10] ARM: KVM: Inject IRQs and FIQs from userspace
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
                   ` (3 preceding siblings ...)
  2011-08-06 10:39 ` [PATCH v4 04/10] ARM: KVM: Memory virtualization setup Christoffer Dall
@ 2011-08-06 10:39 ` Christoffer Dall
  2011-08-09 10:07   ` Avi Kivity
  2011-08-06 10:39 ` [PATCH v4 06/10] ARM: KVM: World-switch implementation Christoffer Dall
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:39 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
This ioctl is used since the semantics are in fact those of two lines
that can be either raised or lowered on the VCPU - the IRQ and FIQ
lines.

KVM needs to know which VCPU it must operate on and whether the FIQ or
IRQ line is raised/lowered. Hence both pieces of information are packed
into the kvm_irq_level->irq field. The irq field value will be:
  IRQ: vcpu_index * 2
  FIQ: (vcpu_index * 2) + 1

This is documented in Documentation/kvm/api.txt.

The effect of the ioctl is simply to raise/lower the corresponding
virt_irq field on the VCPU struct, which will cause the world-switch
code to raise/lower virtual interrupts when running the guest on the
next switch. The wait_for_interrupt flag is also cleared for raised
IRQs, causing an idle VCPU to become active again.
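
A hypothetical userspace sketch (assuming <linux/kvm.h> and
<sys/ioctl.h> are included; vm_fd is an open VM file descriptor and
vcpu_index the target VCPU) that raises and then lowers the IRQ line:

	struct kvm_irq_level irq;

	irq.irq = vcpu_index * 2;	/* (vcpu_index * 2) + 1 for FIQ */
	irq.level = 1;			/* raise ... */
	ioctl(vm_fd, KVM_IRQ_LINE, &irq);
	irq.level = 0;			/* ... and lower */
	ioctl(vm_fd, KVM_IRQ_LINE, &irq);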

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/kvm/api.txt      |   11 ++++++--
 arch/arm/include/asm/kvm.h     |    8 ++++++
 arch/arm/include/asm/kvm_arm.h |    1 +
 arch/arm/kvm/arm.c             |   54 +++++++++++++++++++++++++++++++++++++++-
 arch/arm/kvm/trace.h           |   21 ++++++++++++++++
 include/linux/kvm.h            |    1 +
 6 files changed, 91 insertions(+), 5 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 9bef4e4..1ed5554 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -534,15 +534,20 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
 
 Sets the level of a GSI input to the interrupt controller model in the kernel.
 Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
+KVM_CREATE_IRQCHIP (except for ARM).  Note that edge-triggered interrupts
+require the level to be set to 1 and then back to 0.
+
+ARM uses two types of interrupt lines per CPU, i.e. IRQ and FIQ. The value of
+irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
+FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h for
+convenience macros.
 
 struct kvm_irq_level {
 	union {
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index 87dc33b..8935062 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -20,6 +20,14 @@
 #include <asm/types.h>
 
 /*
+ * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
+ */
+enum KVM_ARM_IRQ_LINE_TYPE {
+	KVM_ARM_IRQ_LINE = 0,
+	KVM_ARM_FIQ_LINE = 1,
+};
+
+/*
  * Modes used for short-hand mode determinition in the world-switch code and
  * in emulation code.
  *
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 835abd1..e378a37 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -49,6 +49,7 @@
 #define HCR_VM		1
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | HCR_AMO | \
 			HCR_IMO | HCR_FMO | HCR_SWIO)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3db6794..071912e 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -297,6 +297,43 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return -EINVAL;
 }
 
+static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
+				      struct kvm_irq_level *irq_level)
+{
+	u32 mask;
+	unsigned int vcpu_idx;
+	struct kvm_vcpu *vcpu;
+
+	vcpu_idx = irq_level->irq / 2;
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+	if (!vcpu)
+		return -EINVAL;
+
+	switch (irq_level->irq % 2) {
+	case KVM_ARM_IRQ_LINE:
+		mask = HCR_VI;
+		break;
+	case KVM_ARM_FIQ_LINE:
+		mask = HCR_VF;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	trace_kvm_irq_line(irq_level->irq % 2, irq_level->level, vcpu_idx);
+
+	if (irq_level->level) {
+		vcpu->arch.virt_irq |= mask;
+		vcpu->arch.wait_for_interrupts = 0;
+	} else
+		vcpu->arch.virt_irq &= ~mask;
+
+	return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -312,8 +349,21 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
-	printk(KERN_ERR "kvm_arch_vm_ioctl: Unsupported ioctl (%d)\n", ioctl);
-	return -EINVAL;
+	struct kvm *kvm = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+	case KVM_IRQ_LINE: {
+		struct kvm_irq_level irq_event;
+
+		if (copy_from_user(&irq_event, argp, sizeof irq_event))
+			return -EFAULT;
+		return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
+	}
+	default:
+		kvm_err(-EINVAL, "Unsupported ioctl (%d)", ioctl);
+		return -EINVAL;
+	}
 }
 
 /**
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index f8869c1..ac64e3a 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -40,6 +40,27 @@ TRACE_EVENT(kvm_exit,
 );
 
 
+TRACE_EVENT(kvm_irq_line,
+	TP_PROTO(unsigned int type, unsigned int level, unsigned int vcpu_idx),
+	TP_ARGS(type, level, vcpu_idx),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	type			)
+		__field(	unsigned int,	level			)
+		__field(	unsigned int,	vcpu_idx		)
+	),
+
+	TP_fast_assign(
+		__entry->type			= type;
+		__entry->level			= level;
+		__entry->vcpu_idx		= vcpu_idx;
+	),
+
+	TP_printk("KVM_IRQ_LINE: type: %s, level: %u, vcpu: %u",
+		(__entry->type == KVM_ARM_IRQ_LINE) ? "IRQ" : "FIQ",
+		__entry->level, __entry->vcpu_idx)
+);
+
 
 #endif /* _TRACE_KVM_H */
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..4e85b4a 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -111,6 +111,7 @@ struct kvm_irq_level {
 	 * ACPI gsi notion of irq.
 	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
 	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
+	 * For ARM: IRQ: irq = (2*vcpu_index). FIQ: irq = (2*vcpu_index + 1).
 	 */
 	union {
 		__u32 irq;



* [PATCH v4 06/10] ARM: KVM: World-switch implementation
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
                   ` (4 preceding siblings ...)
  2011-08-06 10:39 ` [PATCH v4 05/10] ARM: KVM: Inject IRQs and FIQs from userspace Christoffer Dall
@ 2011-08-06 10:39 ` Christoffer Dall
  2011-08-09 11:09   ` Avi Kivity
  2011-08-06 10:39 ` [PATCH v4 07/10] ARM: KVM: Emulation framework and CP15 emulation Christoffer Dall
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:39 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

Provides a complete world-switch implementation to switch to other
guests running in non-secure modes. Includes Hyp exception handlers
that capture the necessary exception information and store it in the
VCPU and KVM structures.

Switching to Hyp mode is done through a simple HVC instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will store the necessary state on the Hyp stack, which will look like
this (see hyp_hvc):
  ...
  Hyp_Sp + 4: lr_usr
  Hyp_Sp    : spsr (Host-SVC cpsr)

When returning from Hyp mode to SVC mode, another HVC instruction is
executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
stack pointer should be where it was left from the above initial call,
since the values on the stack will be used to restore state (see
hyp_svc).

Otherwise, the world-switch is pretty straightforward. All state that
can be modified by the guest is first backed up on the Hyp stack and
the VCPU values are loaded onto the hardware. State which is not
loaded, but theoretically modifiable by the guest, is protected through
the virtualization features so that it generates a trap and causes
software emulation. Upon return from the guest, all state is restored
from the hardware onto the VCPU struct and the original state is
restored from the Hyp stack onto the hardware.

One controversy may be the back-door call to __irq_svc (the host
kernel's own physical IRQ handler) which is called when a physical IRQ
exception is taken in Hyp mode while running in the guest.
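
For illustration, the host side drives the switch roughly like this
(see kvm_arch_vcpu_ioctl_run() in this patch); interrupts are disabled
around the call so the world-switch cannot be preempted on the host:

	local_irq_save(flags);
	ret = __kvm_vcpu_run(vcpu);	/* HVC to Hyp mode, eret into guest */
	local_irq_restore(flags);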

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm.h      |    1 
 arch/arm/include/asm/kvm_arm.h  |   26 ++
 arch/arm/include/asm/kvm_host.h |    8 +
 arch/arm/kernel/armksyms.c      |    6 +
 arch/arm/kernel/asm-offsets.c   |   33 +++
 arch/arm/kernel/entry-armv.S    |    1 
 arch/arm/kvm/arm.c              |   55 ++++-
 arch/arm/kvm/arm_guest.c        |    2 
 arch/arm/kvm/arm_interrupts.S   |  443 +++++++++++++++++++++++++++++++++++++++
 9 files changed, 570 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index 8935062..ff88ca0 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -51,6 +51,7 @@ struct kvm_regs {
 	__u32 cpsr;
 	__u32 spsr[5];		/* Banked SPSR,  indexed by MODE_  */
 	struct {
+		__u32 c0_midr;
 		__u32 c1_sys;
 		__u32 c2_base0;
 		__u32 c2_base1;
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index e378a37..1769187 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -100,5 +100,31 @@
 #define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
 #endif
 
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT	(26)
+#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
+#define HSR_IL		(1U << 25)
+#define HSR_ISS		(HSR_IL - 1)
+#define HSR_ISV_SHIFT	(24)
+#define HSR_ISV		(1U << HSR_ISV_SHIFT)
+
+#define HSR_EC_UNKNOWN	(0x00)
+#define HSR_EC_WFI	(0x01)
+#define HSR_EC_CP15_32	(0x03)
+#define HSR_EC_CP15_64	(0x04)
+#define HSR_EC_CP14_MR	(0x05)
+#define HSR_EC_CP14_LS	(0x06)
+#define HSR_EC_CP_0_13	(0x07)
+#define HSR_EC_CP10_ID	(0x08)
+#define HSR_EC_JAZELLE	(0x09)
+#define HSR_EC_BXJ	(0x0A)
+#define HSR_EC_CP14_64	(0x0C)
+#define HSR_EC_SVC_HYP	(0x11)
+#define HSR_EC_HVC	(0x12)
+#define HSR_EC_SMC	(0x13)
+#define HSR_EC_IABT	(0x20)
+#define HSR_EC_IABT_HYP	(0x21)
+#define HSR_EC_DABT	(0x24)
+#define HSR_EC_DABT_HYP	(0x25)
 
 #endif /* __KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 06d1263..59fcd15 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -62,6 +62,7 @@ struct kvm_vcpu_arch {
 
 	/* System control coprocessor (cp15) */
 	struct {
+		u32 c0_MIDR;		/* Main ID Register */
 		u32 c1_SCTLR;		/* System Control Register */
 		u32 c1_ACTLR;		/* Auxiliary Control Register */
 		u32 c1_CPACR;		/* Coprocessor Access Control */
@@ -69,6 +70,12 @@ struct kvm_vcpu_arch {
 		u64 c2_TTBR1;		/* Translation Table Base Register 1 */
 		u32 c2_TTBCR;		/* Translation Table Base Control R. */
 		u32 c3_DACR;		/* Domain Access Control Register */
+		u32 c10_PRRR;		/* Primary Region Remap Register */
+		u32 c10_NMRR;		/* Normal Memory Remap Register */
+		u32 c13_CID;		/* Context ID Register */
+		u32 c13_TID_URW;	/* Thread ID, User R/W */
+		u32 c13_TID_URO;	/* Thread ID, User R/O */
+		u32 c13_TID_PRIV;	/* Thread ID, Privileged */
 	} cp15;
 
 	u32 virt_irq;		/* HCR exception mask */
@@ -78,6 +85,7 @@ struct kvm_vcpu_arch {
 	u32 hdfar;		/* Hyp Data Fault Address Register */
 	u32 hifar;		/* Hyp Inst. Fault Address Register */
 	u32 hpfar;		/* Hyp IPA Fault Address Register */
+	u64 pc_ipa;		/* IPA for the current PC (VA to PA result) */
 
 	/* IO related fields */
 	u32 mmio_rd;
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index acca35a..819b78c 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -49,6 +49,12 @@ extern void __aeabi_ulcmp(void);
 
 extern void fpundefinstr(void);
 
+#ifdef CONFIG_KVM_ARM_HOST
+/* This is needed for KVM */
+extern void __irq_svc(void);
+
+EXPORT_SYMBOL_GPL(__irq_svc);
+#endif
 
 EXPORT_SYMBOL(__backtrace);
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 927522c..cfa4a52 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -16,6 +16,7 @@
 #include <asm/cacheflush.h>
 #include <asm/glue-df.h>
 #include <asm/glue-pf.h>
+#include <linux/kvm_host.h>
 #include <asm/mach/arch.h>
 #include <asm/thread_info.h>
 #include <asm/memory.h>
@@ -129,5 +130,37 @@ int main(void)
   DEFINE(DMA_BIDIRECTIONAL,	DMA_BIDIRECTIONAL);
   DEFINE(DMA_TO_DEVICE,		DMA_TO_DEVICE);
   DEFINE(DMA_FROM_DEVICE,	DMA_FROM_DEVICE);
+#ifdef CONFIG_KVM_ARM_HOST
+  DEFINE(VCPU_KVM,		offsetof(struct kvm_vcpu, kvm));
+  DEFINE(VCPU_MIDR,		offsetof(struct kvm_vcpu, arch.cp15.c0_MIDR));
+  DEFINE(VCPU_SCTLR,		offsetof(struct kvm_vcpu, arch.cp15.c1_SCTLR));
+  DEFINE(VCPU_CPACR,		offsetof(struct kvm_vcpu, arch.cp15.c1_CPACR));
+  DEFINE(VCPU_TTBR0,		offsetof(struct kvm_vcpu, arch.cp15.c2_TTBR0));
+  DEFINE(VCPU_TTBR1,		offsetof(struct kvm_vcpu, arch.cp15.c2_TTBR1));
+  DEFINE(VCPU_TTBCR,		offsetof(struct kvm_vcpu, arch.cp15.c2_TTBCR));
+  DEFINE(VCPU_DACR,		offsetof(struct kvm_vcpu, arch.cp15.c3_DACR));
+  DEFINE(VCPU_PRRR,		offsetof(struct kvm_vcpu, arch.cp15.c10_PRRR));
+  DEFINE(VCPU_NMRR,		offsetof(struct kvm_vcpu, arch.cp15.c10_NMRR));
+  DEFINE(VCPU_CID,		offsetof(struct kvm_vcpu, arch.cp15.c13_CID));
+  DEFINE(VCPU_TID_URW,		offsetof(struct kvm_vcpu, arch.cp15.c13_TID_URW));
+  DEFINE(VCPU_TID_URO,		offsetof(struct kvm_vcpu, arch.cp15.c13_TID_URO));
+  DEFINE(VCPU_TID_PRIV,		offsetof(struct kvm_vcpu, arch.cp15.c13_TID_PRIV));
+  DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
+  DEFINE(VCPU_USR_REGS,		offsetof(struct kvm_vcpu, arch.regs.usr_regs));
+  DEFINE(VCPU_SVC_REGS,		offsetof(struct kvm_vcpu, arch.regs.svc_regs));
+  DEFINE(VCPU_ABT_REGS,		offsetof(struct kvm_vcpu, arch.regs.abt_regs));
+  DEFINE(VCPU_UND_REGS,		offsetof(struct kvm_vcpu, arch.regs.und_regs));
+  DEFINE(VCPU_IRQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.irq_regs));
+  DEFINE(VCPU_FIQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
+  DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.pc));
+  DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.cpsr));
+  DEFINE(VCPU_VIRT_IRQ,		offsetof(struct kvm_vcpu, arch.virt_irq));
+  DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.hsr));
+  DEFINE(VCPU_HDFAR,		offsetof(struct kvm_vcpu, arch.hdfar));
+  DEFINE(VCPU_HIFAR,		offsetof(struct kvm_vcpu, arch.hifar));
+  DEFINE(VCPU_HPFAR,		offsetof(struct kvm_vcpu, arch.hpfar));
+  DEFINE(VCPU_PC_IPA,		offsetof(struct kvm_vcpu, arch.pc_ipa));
+  DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
+#endif
   return 0; 
 }
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index e8d8856..c40f3b5 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -198,6 +198,7 @@ __dabt_svc:
 ENDPROC(__dabt_svc)
 
 	.align	5
+	.globl __irq_svc
 __irq_svc:
 	svc_entry
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 071912e..196eace 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -235,8 +235,15 @@ out:
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	unsigned long start, end;
+
 	latest_vcpu = NULL;
-	KVMARM_NOT_IMPLEMENTED();
+
+	start = (unsigned long)vcpu,
+	end = start + sizeof(struct kvm_vcpu);
+	remove_hyp_mappings(kvm_hyp_pgd, start, end);
+
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
@@ -251,7 +258,20 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
-	KVMARM_NOT_IMPLEMENTED();
+	unsigned long cpsr;
+	unsigned long sctlr;
+
+	/* Init execution CPSR */
+	asm volatile ("mrs	%[cpsr], cpsr" :
+			[cpsr] "=r" (cpsr));
+	vcpu->arch.regs.cpsr = SVC_MODE | PSR_I_BIT | PSR_F_BIT | PSR_A_BIT |
+				(cpsr & PSR_E_BIT);
+
+	/* Init SCTLR with MMU disabled */
+	asm volatile ("mrc	p15, 0, %[sctlr], c1, c0, 0" :
+			[sctlr] "=r" (sctlr));
+	vcpu->arch.cp15.c1_SCTLR = sctlr & ~1U;
+
 	return 0;
 }
 
@@ -291,10 +311,37 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 	return 0;
 }
 
+/**
+ * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
+ * @vcpu:	The VCPU pointer
+ * @run:	The kvm_run structure pointer used for userspace state exchange
+ *
+ * This function is called through the KVM_RUN ioctl from user space. It
+ * will execute VM code in a loop until the time slice for the process is
+ * used up or some emulation is needed from user space, in which case the
+ * function will return with a return value of 0 and with the kvm_run
+ * structure filled in with the required data for the requested emulation.
+ */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	KVMARM_NOT_IMPLEMENTED();
-	return -EINVAL;
+	unsigned long flags;
+	int ret;
+
+	for (;;) {
+		trace_kvm_entry(vcpu->arch.regs.pc);
+		debug_ws_enter(vcpu->arch.regs.pc);
+		kvm_guest_enter();
+
+		local_irq_save(flags);
+		ret = __kvm_vcpu_run(vcpu);
+		local_irq_restore(flags);
+
+		kvm_guest_exit();
+		debug_ws_exit(vcpu->arch.regs.pc);
+		trace_kvm_exit(vcpu->arch.regs.pc);
+	}
+
+	return ret;
 }
 
 static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
diff --git a/arch/arm/kvm/arm_guest.c b/arch/arm/kvm/arm_guest.c
index 94a5c54..3a23bee 100644
--- a/arch/arm/kvm/arm_guest.c
+++ b/arch/arm/kvm/arm_guest.c
@@ -73,6 +73,7 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	/*
 	 * Co-processor registers.
 	 */
+	regs->cp15.c0_midr = vcpu->arch.cp15.c0_MIDR;
 	regs->cp15.c1_sys = vcpu->arch.cp15.c1_SCTLR;
 	regs->cp15.c2_base0 = vcpu->arch.cp15.c2_TTBR0;
 	regs->cp15.c2_base1 = vcpu->arch.cp15.c2_TTBR1;
@@ -111,6 +112,7 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	/*
 	 * Co-processor registers.
 	 */
+	vcpu->arch.cp15.c0_MIDR = regs->cp15.c0_midr;
 	vcpu->arch.cp15.c1_SCTLR = regs->cp15.c1_sys;
 
 	vcpu_regs->pc = regs->reg15;
diff --git a/arch/arm/kvm/arm_interrupts.S b/arch/arm/kvm/arm_interrupts.S
index 2edc49b..d516bf4 100644
--- a/arch/arm/kvm/arm_interrupts.S
+++ b/arch/arm/kvm/arm_interrupts.S
@@ -21,6 +21,12 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
 
+#define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
+#define VCPU_USR_SP		(VCPU_USR_REG(13))
+#define VCPU_FIQ_REG(_reg_nr)	(VCPU_FIQ_REGS + (_reg_nr * 4))
+#define VCPU_FIQ_SPSR		(VCPU_FIQ_REG(7))
+
+
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor world-switch code
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@ -28,9 +34,317 @@
 	.text
 	.arm
 
+/* These are simply for the macros to work - values don't have meaning */
+.equ usr, 0
+.equ svc, 1
+.equ abt, 2
+.equ und, 3
+.equ irq, 4
+.equ fiq, 5
+
+.macro store_mode_state base_reg, mode
+	.if \mode == usr
+	mrs	r2, SP_usr
+	mov	r3, lr
+	stmdb	\base_reg!, {r2, r3}
+	.elseif \mode != fiq
+	mrs	r2, SP_\mode
+	mrs	r3, LR_\mode
+	mrs	r4, SPSR_\mode
+	stmdb	\base_reg!, {r2, r3, r4}
+	.else
+	mrs	r2, r8_fiq
+	mrs	r3, r9_fiq
+	mrs	r4, r10_fiq
+	mrs	r5, r11_fiq
+	mrs	r6, r12_fiq
+	mrs	r7, SP_fiq
+	mrs	r8, LR_fiq
+	mrs	r9, SPSR_fiq
+	stmdb	\base_reg!, {r2-r9}
+	.endif
+.endm
+
+.macro load_mode_state base_reg, mode
+	.if \mode == usr
+	ldmia	\base_reg!, {r2, r3}
+	msr	SP_usr, r2
+	mov	lr, r3
+	.elseif \mode != fiq
+	ldmia	\base_reg!, {r2, r3, r4}
+	msr	SP_\mode, r2
+	msr	LR_\mode, r3
+	msr	SPSR_\mode, r4
+	.else
+	ldmia	\base_reg!, {r2-r9}
+	msr	r8_fiq, r2
+	msr	r9_fiq, r3
+	msr	r10_fiq, r4
+	msr	r11_fiq, r5
+	msr	r12_fiq, r6
+	msr	SP_fiq, r7
+	msr	LR_fiq, r8
+	msr	SPSR_fiq, r9
+	.endif
+.endm
+
+/* Reads cp15 registers from hardware and stores them in memory
+ * @vcpu:   If 0, registers are written in-order to the stack,
+ * 	    otherwise to the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro read_cp15_state vcpu=0, vcpup
+	mrc	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mrc	p15, 0, r3, c1, c0, 2	@ CPACR
+	mrc	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mrc	p15, 0, r5, c3, c0, 0	@ DACR
+	mrrc	p15, 0, r6, r7, c2	@ TTBR 0
+	mrrc	p15, 1, r8, r9, c2	@ TTBR 1
+	mrc	p15, 0, r10, c10, c2, 0	@ PRRR
+	mrc	p15, 0, r11, c10, c2, 1	@ NMRR
+
+	.if \vcpu == 0
+	push	{r2-r11}		@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_SCTLR]
+	str	r3, [\vcpup, #VCPU_CPACR]
+	str	r4, [\vcpup, #VCPU_TTBCR]
+	str	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	strd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	strd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	str	r10, [\vcpup, #VCPU_PRRR]
+	str	r11, [\vcpup, #VCPU_NMRR]
+	.endif
+
+	mrc	p15, 0, r2, c13, c0, 1	@ CID
+	mrc	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mrc	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mrc	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	.if \vcpu == 0
+	push	{r2-r5}			@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_CID]
+	str	r3, [\vcpup, #VCPU_TID_URW]
+	str	r4, [\vcpup, #VCPU_TID_URO]
+	str	r5, [\vcpup, #VCPU_TID_PRIV]
+	.endif
+.endm
+
+/* Reads cp15 registers from memory and writes them to hardware
+ * @vcpu:   If 0, registers are read in-order from the stack,
+ * 	    otherwise from the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro write_cp15_state vcpu=0, vcpup
+	.if \vcpu == 0
+	pop	{r2-r5}
+	.else
+	ldr	r2, [\vcpup, #VCPU_CID]
+	ldr	r3, [\vcpup, #VCPU_TID_URW]
+	ldr	r4, [\vcpup, #VCPU_TID_URO]
+	ldr	r5, [\vcpup, #VCPU_TID_PRIV]
+	.endif
+
+	mcr	p15, 0, r2, c13, c0, 1	@ CID
+	mcr	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mcr	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mcr	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+
+	.if \vcpu == 0
+	pop	{r2-r11}
+	.else
+	ldr	r2, [\vcpup, #VCPU_SCTLR]
+	ldr	r3, [\vcpup, #VCPU_CPACR]
+	ldr	r4, [\vcpup, #VCPU_TTBCR]
+	ldr	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	ldrd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	ldrd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	ldr	r10, [\vcpup, #VCPU_PRRR]
+	ldr	r11, [\vcpup, #VCPU_NMRR]
+	.endif
+
+	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mcr	p15, 0, r3, c1, c0, 2	@ CPACR
+	mcr	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mcr	p15, 0, r5, c3, c0, 0	@ DACR
+	mcrr	p15, 0, r6, r7, c2	@ TTBR 0
+	mcrr	p15, 1, r8, r9, c2	@ TTBR 1
+	mcr	p15, 0, r10, c10, c2, 0	@ PRRR
+	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
+.endm
+
+/* Configures the HSTR (Hyp System Trap Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hstr entry
+	mrc	p15, 4, r2, c1, c1, 3
+	ldr	r3, =0x9e00
+	.if \entry == 1
+	orr	r2, r2, r3		@ Trap CR{9,10,11,12,15}
+	.else
+	bic	r2, r2, r3		@ Don't trap any CRx accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 3
+.endm
+
+/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi/wfe, trap smc */
+.macro configure_hyp_role entry, vcpu_ptr
+	mrc	p15, 4, r2, c1, c1, 0	@ HCR
+	bic	r2, r2, #HCR_VIRT_EXCP_MASK
+	ldr	r3, =HCR_GUEST_MASK
+	.if \entry == 1
+	orr	r2, r2, r3
+	ldr	r3, [\vcpu_ptr, #VCPU_VIRT_IRQ]
+	orr	r2, r2, r3
+	.else
+	bic	r2, r2, r3
+	.endif
+	mcr	p15, 4, r2, c1, c1, 0
+.endm
+
+@ This must be called from Hyp mode!
+@ Arguments:
+@  r0: pointer to vcpu struct
 ENTRY(__kvm_vcpu_run)
+	hvc	#0			@ Change to Hyp-mode
+
+	@ Now we're in Hyp-mode and lr_usr, spsr_hyp are on the stack
+	mrs	r2, sp_usr
+	push	{r2}			@ Push r13_usr
+	push	{r4-r12}		@ Push r4-r12
+
+	store_mode_state sp, svc
+	store_mode_state sp, abt
+	store_mode_state sp, und
+	store_mode_state sp, irq
+	store_mode_state sp, fiq
+
+	@ Store hardware CP15 state and load guest state
+	read_cp15_state
+	write_cp15_state 1, r0
+
+	push	{r0}			@ Push the VCPU pointer
+
+	@ Set up guest memory translation
+	ldr	r1, [r0, #VCPU_KVM]	@ r1 points to kvm struct
+	ldrd	r2, r3, [r1, #KVM_VTTBR]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Configure Hyp-role
+	configure_hyp_role 1, r0
+
+	@ Trap coprocessor CRx for all x except 2 and 14
+	set_hstr 1
+
+	@ Write standard A-9 CPU id in MIDR
+	ldr	r1, [r0, #VCPU_MIDR]
+	mcr	p15, 4, r1, c0, c0, 0
+
+	@ Load guest registers
+	add	r0, r0, #(VCPU_USR_SP)
+	load_mode_state r0, usr
+	load_mode_state r0, svc
+	load_mode_state r0, abt
+	load_mode_state r0, und
+	load_mode_state r0, irq
+	load_mode_state r0, fiq
+
+	@ Load return state (r0 now points to vcpu->arch.regs.pc)
+	ldmia	r0, {r2, r3}
+	msr	ELR_hyp, r2
+	msr	spsr, r3
+
+	@ Load remaining registers and do the switch
+	sub	r0, r0, #(VCPU_PC - VCPU_USR_REGS)
+	ldmia	r0, {r0-r12}
+	eret
+
+__kvm_vcpu_return:
+	@ Store return state
+	mrs	r2, ELR_hyp
+	mrs	r3, spsr
+	str	r2, [r1, #VCPU_PC]
+	str	r3, [r1, #VCPU_CPSR]
+
+	@ Store guest registers
+	add	r1, r1, #(VCPU_FIQ_SPSR + 4)
+	store_mode_state r1, fiq
+	store_mode_state r1, irq
+	store_mode_state r1, und
+	store_mode_state r1, abt
+	store_mode_state r1, svc
+	store_mode_state r1, usr
+	sub	r1, r1, #(VCPU_USR_REG(13))
+
+	@ Don't trap coprocessor accesses for host kernel
+	set_hstr 0
+
+	@ Reset Hyp-role
+	configure_hyp_role 0, r1
+
+	@ Let guest read hardware MIDR
+	mrc	p15, 0, r2, c0, c0, 0
+	mcr	p15, 4, r2, c0, c0, 0
+
+	@ Set VMID == 0
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Store guest CP15 state and restore host state
+	read_cp15_state 1, r1
+	write_cp15_state
+
+	load_mode_state sp, fiq
+	load_mode_state sp, irq
+	load_mode_state sp, und
+	load_mode_state sp, abt
+	load_mode_state sp, svc
+
+	pop	{r4-r12}		@ Pop r4-r12
+	pop	{r2}			@ Pop r13_usr
+	msr	sp_usr, r2
+
+	hvc	#0
+
+	cmp	r0, #ARM_EXCEPTION_IRQ
+	bne	return_to_ioctl
+
+	/*
+	 * It's time to launch the kernel IRQ handler for IRQ exceptions. This
+	 * requires some manipulation though.
+	 *
+	 *  - The easiest entry point to the host handler is __irq_svc.
+	 *  - The __irq_svc expects to be called from SVC mode, which has been
+	 *    switched to from vector_stub code in entry-armv.S. The __irq_svc
+	 *    calls svc_entry which uses values stored in memory and pointed to
+	 *    by r0 to return from handler. We allocate this memory on the
+	 *    stack, which will contain these values:
+	 *      0x8:   cpsr
+	 *      0x4:   return_address
+	 *      0x0:   r0
+	 */
+	adr	r1, irq_kernel_resume	@ Where to resume
+	mrs	r2, cpsr		@ CPSR when we return
+	push	{r0 - r2}
+	mov	r0, sp
+	b	__irq_svc
+
+irq_kernel_resume:
+	pop	{r0}
+	add	sp, sp, #8
+
+return_to_ioctl:
 THUMB(	orr	lr, lr, #1)
 	mov	pc, lr
+
+	.ltorg
+
 __kvm_vcpu_run_end:
 	.globl __kvm_vcpu_run_end
 
@@ -39,9 +353,136 @@ __kvm_vcpu_run_end:
 @  Hypervisor exception vector and handlers
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+	.text
+	.arm
+
 	.align 5
 __kvm_hyp_vector:
 	.globl __kvm_hyp_vector
-	nop
+
+	@ Hyp-mode exception vector
+	b	hyp_reset
+	b	hyp_undef
+	b	hyp_svc
+	b	hyp_pabt
+	b	hyp_dabt
+	b	hyp_hvc
+	b	hyp_irq
+	b	hyp_fiq
+
+	.align
+hyp_reset:
+	sub	pc, pc, #8
+
+	.align
+hyp_undef:
+	sub	pc, pc, #8
+
+	.align
+hyp_svc:
+	@ Can only get here if HVC or SVC is called from Hyp mode, which means
+	@ we want to change mode back to SVC mode.
+	@ NB: Stack pointer should be where hyp_hvc handler left it!
+	ldr	lr, [sp, #4]
+	msr	spsr, lr
+	ldr	lr, [sp]
+	add	sp, sp, #8
+	eret
+
+	.align
+hyp_pabt:
+	sub	pc, pc, #8
+
+	.align
+hyp_dabt:
+	sub	pc, pc, #8
+
+	.align
+hyp_hvc:
+	@ Getting here is either because of a trap from a guest or from calling
+	@ HVC from the host kernel, which means "switch to Hyp mode".
+	push	{r0, r1, r2}
+
+	@ Check syndrome register
+	mrc	p15, 4, r0, c5, c2, 0	@ HSR
+	lsr	r1, r0, #HSR_EC_SHIFT
+	cmp	r1, #HSR_EC_HVC
+	bne	guest_trap		@ Not HVC instr.
+
+	@ Let's check if the HVC came from VMID 0 and allow simple
+	@ switch to Hyp mode
+	mrrc    p15, 6, r1, r2, c2
+	lsr     r2, r2, #16
+	and     r2, r2, #0xff
+	cmp     r2, #0
+	bne	guest_trap		@ Guest called HVC
+
+	pop	{r0, r1, r2}
+
+	@ Store lr_usr,spsr (svc cpsr) on stack
+	sub	sp, sp, #8
+	str	lr, [sp]
+	mrs	lr, spsr
+	str	lr, [sp, #4]
+
+	@ Return to caller in Hyp mode
+	mrs	lr, ELR_hyp
+	mov	pc, lr
+
+guest_trap:
+	ldr	r1, [sp, #12]		@ Load VCPU pointer
+	str	r0, [r1, #VCPU_HSR]
+	add	r1, r1, #VCPU_USR_REG(3)
+	stmia	r1, {r3-r12}
+	sub	r1, r1, #(VCPU_USR_REG(3) - VCPU_USR_REG(0))
+	pop	{r3, r4, r5}
+	add	sp, sp, #4		@ We loaded the VCPU pointer above
+	stmia	r1, {r3, r4, r5}
+	sub	r1, r1, #VCPU_USR_REG(0)
+
+	@ Check if we need the fault information
+	lsr	r2, r0, #HSR_EC_SHIFT
+	cmp	r2, #HSR_EC_IABT
+	beq	2f
+	cmpne	r2, #HSR_EC_DABT
+	bne	1f
+
+	@ For non-valid data aborts, get the offending instr. PA
+	lsr	r2, r0, #HSR_ISV_SHIFT
+	ands	r2, r2, #1
+	bne	2f
+	mrs	r3, ELR_hyp
+	mcr	p15, 0, r3, c7, c8, 0	@ VA to PA, V2PCWPR
+	mrrc	p15, 0, r4, r5, c7	@ PAR
+	add	r6, r1, #VCPU_PC_IPA
+	strd	r4, r5, [r6]
+
+2:	mrc	p15, 4, r2, c6, c0, 0	@ HDFAR
+	mrc	p15, 4, r3, c6, c0, 2	@ HIFAR
+	mrc	p15, 4, r4, c6, c0, 4	@ HPFAR
+	add	r5, r1, #VCPU_HDFAR
+	stmia	r5, {r2, r3, r4}
+
+1:	mov	r0, #ARM_EXCEPTION_HVC
+	b	__kvm_vcpu_return
+
+	.align
+hyp_irq:
+	push	{r0}
+	ldr	r0, [sp, #4]		@ Load VCPU pointer
+	add	r0, r0, #(VCPU_USR_REG(1))
+	stmia	r0, {r1-r12}
+	pop	{r0, r1}		@ r1 == vcpu pointer
+	str	r0, [r1, #VCPU_USR_REG(0)]
+
+	mov	r0, #ARM_EXCEPTION_IRQ
+	b	__kvm_vcpu_return
+
+	.align
+hyp_fiq:
+	sub	pc, pc, #8
+
+	.ltorg
+
 __kvm_hyp_vector_end:
 	.globl __kvm_hyp_vector_end


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 07/10] ARM: KVM: Emulation framework and CP15 emulation
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
                   ` (5 preceding siblings ...)
  2011-08-06 10:39 ` [PATCH v4 06/10] ARM: KVM: World-switch implementation Christoffer Dall
@ 2011-08-06 10:39 ` Christoffer Dall
  2011-08-09 11:17   ` Avi Kivity
  2011-08-06 10:39 ` [PATCH v4 08/10] ARM: KVM: Handle guest faults in KVM Christoffer Dall
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:39 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

Adds an important new function to the main KVM/ARM code, handle_exit(),
which is called from kvm_arch_vcpu_ioctl_run() on every return from
guest execution. This function examines the Hyp-Syndrome-Register
(HSR), which contains information telling KVM what caused the exit from
the guest.

Some of the reasons for an exit are CP15 accesses, which are not
allowed from the guest; this commit handles these exits by emulating
the intended operation in software and skipping the offending guest
instruction.
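
As a worked example (a sketch, not part of the patch: the field
positions below are the ones kvm_handle_cp15_access() uses, and the EC
value 0x03 at bit 26 for 32-bit CP15 traps is an assumption taken from
the ARMv7 architecture manual), consider a guest read of the PRRR:

	/* guest executes:  mrc p15, 0, r3, c10, c2, 0 */
	u32 hsr = 0x0e002865;             /* hypothetical trapped HSR */
	int is_write = ((hsr & 1) == 0);  /* 0 - an mrc, i.e. a read */
	u32 CRm = (hsr >> 1) & 0xf;       /* 2 */
	u32 Rt1 = (hsr >> 5) & 0xf;       /* 3 - destination register */
	u32 CRn = (hsr >> 10) & 0xf;      /* 10 */
	u32 Op1 = (hsr >> 14) & 0x7;      /* 0 */
	u32 Op2 = (hsr >> 17) & 0x7;      /* 0 */

With CRn == 10, CRm == 2 and Op2 == 0 this ends up in
emulate_cp15_c10_access(), which copies the virtual c10_PRRR value into
guest r3.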

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_emulate.h |    7 +
 arch/arm/kvm/arm.c                 |   77 ++++++++++++++
 arch/arm/kvm/arm_emulate.c         |  195 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h               |   28 +++++
 4 files changed, 307 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 91d461a..af21fd5 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -40,6 +40,13 @@ static inline unsigned char vcpu_mode(struct kvm_vcpu *vcpu)
 	return modes_table[vcpu->arch.regs.cpsr & 0xf];
 }
 
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
  */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 196eace..a28de12 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -35,6 +35,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_emulate.h>
 
 #include "debug.h"
 
@@ -311,6 +312,62 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 	return 0;
 }
 
+static inline int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			      int exception_index)
+{
+	unsigned long hsr_ec;
+
+	if (exception_index == ARM_EXCEPTION_IRQ)
+		return 0;
+
+	if (exception_index != ARM_EXCEPTION_HVC) {
+		kvm_err(-EINVAL, "Unsupported exception type");
+		return -EINVAL;
+	}
+
+	hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;
+	switch (hsr_ec) {
+	case HSR_EC_WFI:
+		return kvm_handle_wfi(vcpu, run);
+	case HSR_EC_CP15_32:
+	case HSR_EC_CP15_64:
+		return kvm_handle_cp15_access(vcpu, run);
+	case HSR_EC_CP14_MR:
+		return kvm_handle_cp14_access(vcpu, run);
+	case HSR_EC_CP14_LS:
+		return kvm_handle_cp14_load_store(vcpu, run);
+	case HSR_EC_CP14_64:
+		return kvm_handle_cp14_access(vcpu, run);
+	case HSR_EC_CP_0_13:
+		return kvm_handle_cp_0_13_access(vcpu, run);
+	case HSR_EC_CP10_ID:
+		return kvm_handle_cp10_id(vcpu, run);
+	case HSR_EC_SVC_HYP:
+		/* SVC called from Hyp mode should never get here */
+		kvm_msg("SVC called from Hyp mode shouldn't go here");
+		BUG();
+	case HSR_EC_HVC:
+		kvm_msg("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
+					     vcpu->arch.regs.pc);
+		kvm_msg("         HSR: %8x", vcpu->arch.hsr);
+		break;
+	case HSR_EC_IABT:
+	case HSR_EC_DABT:
+		return kvm_handle_guest_abort(vcpu, run);
+	case HSR_EC_IABT_HYP:
+	case HSR_EC_DABT_HYP:
+		/* The hypervisor should never cause aborts */
+		kvm_msg("The hypervisor itself shouldn't cause aborts");
+		BUG();
+	default:
+		kvm_msg("Unknown exception class: %08x (%08x)", hsr_ec,
+				vcpu->arch.hsr);
+		BUG();
+	}
+
+	return 0;
+}
+
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu:	The VCPU pointer
@@ -339,6 +396,26 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_guest_exit();
 		debug_ws_exit(vcpu->arch.regs.pc);
 		trace_kvm_exit(vcpu->arch.regs.pc);
+
+		ret = handle_exit(vcpu, run, ret);
+		if (ret) {
+			kvm_err(ret, "Error in handle_exit");
+			break;
+		}
+
+		if (run->exit_reason == KVM_EXIT_MMIO)
+			break;
+
+		if (need_resched()) {
+			vcpu_put(vcpu);
+			schedule();
+			vcpu_load(vcpu);
+		}
+
+		if (signal_pending(current) && !(run->exit_reason)) {
+			run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
+			break;
+		}
 	}
 
 	return ret;
diff --git a/arch/arm/kvm/arm_emulate.c b/arch/arm/kvm/arm_emulate.c
index 6587dde..37fe029 100644
--- a/arch/arm/kvm/arm_emulate.c
+++ b/arch/arm/kvm/arm_emulate.c
@@ -14,7 +14,14 @@
  *
  */
 
+#include <linux/mm.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <trace/events/kvm.h>
+
+#include "debug.h"
+#include "trace.h"
 
 #define USR_REG_OFFSET(_reg) \
 	offsetof(struct kvm_vcpu_arch, regs.usr_regs[_reg])
@@ -119,3 +126,191 @@ u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 
 	return (u32 *)((void *)&vcpu->arch + vcpu_reg_offsets[mode][reg_num]);
 }
+
+/******************************************************************************
+ * Co-processor emulation
+ */
+
+struct coproc_params {
+	unsigned long CRm;
+	unsigned long CRn;
+	unsigned long Op1;
+	unsigned long Op2;
+	unsigned long Rt1;
+	unsigned long Rt2;
+	bool is_64bit;
+	bool is_write;
+};
+
+#define CP15_OP(_vcpu, _params, _cp15_reg) \
+do { \
+	if (_params->is_write) \
+		_vcpu->arch.cp15._cp15_reg = *vcpu_reg(_vcpu, _params->Rt1); \
+	else \
+		*vcpu_reg(_vcpu, _params->Rt1) = _vcpu->arch.cp15._cp15_reg; \
+} while (0)
+
+
+static inline void print_cp_instr(struct coproc_params *p)
+{
+	if (p->is_64bit) {
+		kvm_msg("    %s\tp15, %u, r%u, r%u, c%u",
+				(p->is_write) ? "mcrr" : "mrrc",
+				p->Op1, p->Rt1, p->Rt2, p->CRm);
+	} else {
+		kvm_msg("    %s\tp15, %u, r%u, c%u, c%u, %u",
+				(p->is_write) ? "mcr" : "mrc",
+				p->Op1, p->Rt1, p->CRn, p->CRm, p->Op2);
+	}
+}
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	KVMARM_NOT_IMPLEMENTED();
+	return -EINVAL;
+}
+
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	KVMARM_NOT_IMPLEMENTED();
+	return -EINVAL;
+}
+
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	KVMARM_NOT_IMPLEMENTED();
+	return -EINVAL;
+}
+
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	KVMARM_NOT_IMPLEMENTED();
+	return -EINVAL;
+}
+
+/**
+ * emulate_cp15_c10_access -- emulates cp15 accesses for CRn == 10
+ * @vcpu: The VCPU pointer
+ * @p:    The coprocessor parameters struct pointer holding trap inst. details
+ *
+ * This function may not need to exist - if we can ignore guest attempts to
+ * tamper with TLB lockdowns then it should be enough to store/restore the
+ * host/guest PRRR and NMRR memory remap registers and allow guest direct
+ * access to these registers.
+ */
+static int emulate_cp15_c10_access(struct kvm_vcpu *vcpu,
+				   struct coproc_params *p)
+{
+	BUG_ON(p->CRn != 10);
+	BUG_ON(p->is_64bit);
+
+	if ((p->CRm == 0 || p->CRm == 1 || p->CRm == 4 || p->CRm == 8) &&
+	    (p->Op2 <= 7)) {
+		/* TLB Lockdown operations - ignored */
+		return 0;
+	}
+
+	if (p->CRm == 2 && p->Op2 == 0) {
+		CP15_OP(vcpu, p, c10_PRRR);
+		return 0;
+	}
+
+	if (p->CRm == 2 && p->Op2 == 1) {
+		CP15_OP(vcpu, p, c10_NMRR);
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+/**
+ * emulate_cp15_c15_access -- emulates cp15 accesses for CRn == 15
+ * @vcpu: The VCPU pointer
+ * @p:    The coprocessor parameters struct pointer holding trap inst. details
+ *
+ * The CP15 c15 register is implementation defined, but some guest kernels
+ * attempt to read/write a diagnostics register here. We always return 0 and
+ * ignore writes and hope for the best. This may need to be refined.
+ */
+static int emulate_cp15_c15_access(struct kvm_vcpu *vcpu,
+				   struct coproc_params *p)
+{
+	trace_kvm_emulate_cp15_imp(p->Op1, p->Rt1, p->CRn, p->CRm,
+				   p->Op2, p->is_write);
+
+	if (!p->is_write)
+		*vcpu_reg(vcpu, p->Rt1) = 0;
+
+	return 0;
+}
+
+/**
+ * kvm_handle_cp15_access -- handles a trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ *
+ * Investigates the CRn/CRm and whether this was an mcr/mrc or mcrr/mrrc
+ * access. Errors out if the operation is not supported (should this perhaps
+ * raise an undefined exception to the guest instead?) and otherwise emulates
+ * the access.
+ */
+int kvm_handle_cp15_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	unsigned long hsr_ec, instr_len;
+	struct coproc_params params;
+	int ret = 0;
+
+	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	BUG_ON(params.Rt1 >= 15);
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = (hsr_ec == HSR_EC_CP15_64);
+
+	if (params.is_64bit) {
+		/* mrrc, mcrr operation */
+		params.Op1 = (vcpu->arch.hsr >> 16) & 0xf;
+		params.Op2 = 0;
+		params.Rt2 = (vcpu->arch.hsr >> 10) & 0xf;
+		BUG_ON(params.Rt2 >= 15);
+		params.CRn = 0;
+	} else {
+		params.CRn = (vcpu->arch.hsr >> 10) & 0xf;
+		params.Op1 = (vcpu->arch.hsr >> 14) & 0x7;
+		params.Op2 = (vcpu->arch.hsr >> 17) & 0x7;
+		params.Rt2 = 0;
+	}
+
+	/* So far no mrrc/mcrr accesses are emulated */
+	if (params.is_64bit)
+		goto unsupp_err_out;
+
+	switch (params.CRn) {
+	case 10:
+		ret = emulate_cp15_c10_access(vcpu, &params);
+		break;
+	case 15:
+		ret = emulate_cp15_c15_access(vcpu, &params);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	if (ret)
+		goto unsupp_err_out;
+
+	/* Skip instruction, since it was emulated */
+	instr_len = ((vcpu->arch.hsr >> 25) & 1) ? 4 : 2;
+	*vcpu_reg(vcpu, 15) += instr_len;
+
+	return ret;
+unsupp_err_out:
+	kvm_msg("Unsupported guest CP15 access at: %08x", vcpu->arch.regs.pc);
+	print_cp_instr(&params);
+	return -EINVAL;
+}
+
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return 0;
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index ac64e3a..381ea4a 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,6 +39,34 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_emulate_cp15_imp,
+	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
+		 unsigned long CRm, unsigned long Op2, bool is_write),
+	TP_ARGS(Op1, Rt1, CRn, CRm, Op2, is_write),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	Op1		)
+		__field(	unsigned int,	Rt1		)
+		__field(	unsigned int,	CRn		)
+		__field(	unsigned int,	CRm		)
+		__field(	unsigned int,	Op2		)
+		__field(	bool,		is_write	)
+	),
+
+	TP_fast_assign(
+		__entry->is_write		= is_write;
+		__entry->Op1			= Op1;
+		__entry->Rt1			= Rt1;
+		__entry->CRn			= CRn;
+		__entry->CRm			= CRm;
+		__entry->Op2			= Op2;
+	),
+
+	TP_printk("Implementation defined CP15: %s\tp15, %u, r%u, c%u, c%u, %u",
+			(__entry->is_write) ? "mcr" : "mrc",
+			__entry->Op1, __entry->Rt1, __entry->CRn,
+			__entry->CRm, __entry->Op2)
+);
 
 TRACE_EVENT(kvm_irq_line,
 	TP_PROTO(unsigned int type, unsigned int level, unsigned int vcpu_idx),


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 08/10] ARM: KVM: Handle guest faults in KVM
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
                   ` (6 preceding siblings ...)
  2011-08-06 10:39 ` [PATCH v4 07/10] ARM: KVM: Emulation framework and CP15 emulation Christoffer Dall
@ 2011-08-06 10:39 ` Christoffer Dall
  2011-08-09 11:24   ` Avi Kivity
  2011-08-06 10:40 ` [PATCH v4 09/10] ARM: KVM: Handle I/O aborts Christoffer Dall
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:39 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

Handles guest faults in KVM by mapping in the corresponding user pages
in the 2nd stage page tables.

Introduces a new ARM-specific kernel memory type, PAGE_KVM_GUEST, and a
pgprot_guest variable, both used to map 2nd stage memory for KVM guests.
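
As context for the fault path below, a minimal user-space sketch
(standard KVM ioctls; the addresses, size and slot number are made up
for illustration) of registering a region as guest RAM so that faults
on it resolve to host pages:

	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <linux/kvm.h>

	static int register_guest_ram(int vm_fd)
	{
		size_t size = 64 << 20;		/* 64 MB of guest RAM */
		void *host_mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
				      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		struct kvm_userspace_memory_region mem = {
			.slot            = 0,
			.guest_phys_addr = 0x80000000,	/* guest IPA base */
			.memory_size     = size,
			.userspace_addr  = (unsigned long)host_mem,
		};
		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
	}

Guest faults on IPAs inside such a slot are resolved by
user_mem_abort() below; faults outside any slot are treated as I/O
(handled in a later patch).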

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/pgtable-3level.h |    9 +++
 arch/arm/include/asm/pgtable.h        |    4 +
 arch/arm/kvm/arm_mmu.c                |  107 ++++++++++++++++++++++++++++++++-
 arch/arm/mm/mmu.c                     |    3 +
 4 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index a6261f5..d8c5c14 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -104,4 +104,13 @@
  */
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
 
+/*
+ * 2-nd stage PTE definitions for LPAE.
+ */
+#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
+#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */
+#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
+#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */
+
+
 #endif /* _ASM_PGTABLE_3LEVEL_H */
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index db3b6e8..0e0ca21 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -76,6 +76,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 
 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_guest;
 
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
@@ -89,6 +90,9 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
 #define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
+#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
+					  L_PTE2_WRITE | L_PTE2_NORM_WB | \
+					  L_PTE2_INNER_WB)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/kvm/arm_mmu.c b/arch/arm/kvm/arm_mmu.c
index 5af0a7c..6040aff 100644
--- a/arch/arm/kvm/arm_mmu.c
+++ b/arch/arm/kvm/arm_mmu.c
@@ -290,8 +290,111 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 	kvm->arch.pgd = NULL;
 }
 
+static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			  gfn_t gfn, struct kvm_memory_slot *memslot)
+{
+	pfn_t pfn;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte, new_pte;
+
+	pfn = gfn_to_pfn(vcpu->kvm, gfn);
+
+	if (is_error_pfn(pfn)) {
+		kvm_err(-EFAULT, "Guest gfn %u (0x%08lx) does not have "
+				"corresponding host mapping",
+				gfn, gfn << PAGE_SHIFT);
+		return -EFAULT;
+	}
+
+	/* Create 2nd stage page table mapping - Level 1 */
+	pgd = vcpu->kvm->arch.pgd + pgd_index(fault_ipa);
+	pud = pud_offset(pgd, fault_ipa);
+	if (pud_none(*pud)) {
+		pmd = pmd_alloc_one(NULL, fault_ipa);
+		if (!pmd) {
+			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pmd");
+			return -ENOMEM;
+		}
+		pud_populate(NULL, pud, pmd);
+		pmd += pmd_index(fault_ipa);
+	} else
+		pmd = pmd_offset(pud, fault_ipa);
+
+	/* Create 2nd stage page table mapping - Level 2 */
+	if (pmd_none(*pmd)) {
+		pte = pte_alloc_one_kernel(NULL, fault_ipa);
+		if (!pte) {
+			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pte");
+			return -ENOMEM;
+		}
+		pmd_populate_kernel(NULL, pmd, pte);
+		pte += pte_index(fault_ipa);
+	} else
+		pte = pte_offset_kernel(pmd, fault_ipa);
+
+	/* Create 2nd stage page table mapping - Level 3 */
+	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
+	set_pte_ext(pte, new_pte, 0);
+
+	return 0;
+}
+
+#define HSR_ABT_FS	(0x3f)
+#define HPFAR_MASK	(~0xf)
+
+/**
+ * kvm_handle_guest_abort - handles all 2nd stage aborts
+ * @vcpu:	the VCPU pointer
+ * @run:	the kvm_run structure
+ *
+ * Any abort that gets to the host is almost guaranteed to be caused by a
+ * missing second stage translation table entry, which can mean that either the
+ * guest simply needs more memory and we must allocate an appropriate page or it
+ * can mean that the guest tried to access I/O memory, which is emulated by user
+ * space. The distinction is based on the IPA causing the fault and whether this
+ * memory region has been registered as standard RAM by user space.
+ */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	KVMARM_NOT_IMPLEMENTED();
-	return -EINVAL;
+	unsigned long hsr_ec;
+	unsigned long fault_status;
+	phys_addr_t fault_ipa;
+	struct kvm_memory_slot *memslot = NULL;
+	bool is_iabt;
+	gfn_t gfn;
+
+	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
+	is_iabt = (hsr_ec == HSR_EC_IABT);
+
+	/* Check that the second stage fault is a translation fault */
+	fault_status = vcpu->arch.hsr & HSR_ABT_FS;
+	if ((fault_status & 0x3c) != 0x4) {
+		kvm_err(-EFAULT, "Unsupported fault status: %x",
+				fault_status & 0x3c);
+		return -EFAULT;
+	}
+
+	fault_ipa = ((phys_addr_t)vcpu->arch.hpfar & HPFAR_MASK) << 8;
+
+	gfn = fault_ipa >> PAGE_SHIFT;
+	if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
+		if (is_iabt) {
+			kvm_err(-EFAULT, "Inst. abort on I/O address");
+			return -EFAULT;
+		}
+
+		kvm_msg("I/O address abort...");
+		KVMARM_NOT_IMPLEMENTED();
+		return -EINVAL;
+	}
+
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot->user_alloc) {
+		kvm_err(-EINVAL, "non user-alloc memslots not supported");
+		return -EINVAL;
+	}
+
+	return user_mem_abort(vcpu, fault_ipa, gfn, memslot);
 }
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 749475e..c025e65 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -55,9 +55,11 @@ static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
+pgprot_t pgprot_guest;
 
 EXPORT_SYMBOL(pgprot_user);
 EXPORT_SYMBOL(pgprot_kernel);
+EXPORT_SYMBOL(pgprot_guest);
 
 struct cachepolicy {
 	const char	policy[16];
@@ -497,6 +499,7 @@ static void __init build_mem_type_table(void)
 	pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
 	pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
 				 L_PTE_DIRTY | kern_pgprot);
+	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG);
 
 	mem_types[MT_LOW_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 09/10] ARM: KVM: Handle I/O aborts
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
                   ` (7 preceding siblings ...)
  2011-08-06 10:39 ` [PATCH v4 08/10] ARM: KVM: Handle guest faults in KVM Christoffer Dall
@ 2011-08-06 10:40 ` Christoffer Dall
  2011-08-09 11:34   ` Avi Kivity
  2011-08-06 10:40 ` [PATCH v4 10/10] ARM: KVM: Guest wait-for-interrupts (WFI) support Christoffer Dall
  2011-08-09 11:43 ` [PATCH v4 00/10] KVM/ARM Implementation Avi Kivity
  10 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:40 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

When the guest accesses I/O memory, data abort exceptions are raised
and handled by decoding the HSR information (physical address,
read/write, length, register) and forwarding reads and writes to QEMU,
which performs the device emulation.

For certain classes of load/store operations the HSR does not provide
syndrome information, so we must be able to fetch the offending
instruction from guest memory and decode it manually.

This requires changing the general flow somewhat, since new calls to
run the VCPU must check whether there is a pending MMIO load and write
the data back to the guest register once userspace has made it
available.
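
The user-space half of that flow looks roughly like this (a sketch;
handle_mmio_read()/handle_mmio_write() stand in for whatever device
emulation QEMU performs, and error handling is elided):

	/* run points to the kvm_run area mmap'ed from the vcpu fd */
	for (;;) {
		ioctl(vcpu_fd, KVM_RUN, 0);
		if (run->exit_reason != KVM_EXIT_MMIO)
			continue;	/* other exit reasons elided */
		if (run->mmio.is_write)
			handle_mmio_write(run->mmio.phys_addr,
					  run->mmio.data, run->mmio.len);
		else
			/* fill in data; the next KVM_RUN completes the load */
			handle_mmio_read(run->mmio.phys_addr,
					 run->mmio.data, run->mmio.len);
	}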

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_emulate.h |    2 
 arch/arm/include/asm/kvm_host.h    |    1 
 arch/arm/include/asm/kvm_mmu.h     |    1 
 arch/arm/kvm/arm.c                 |    8 +
 arch/arm/kvm/arm_emulate.c         |  279 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/arm_mmu.c             |  155 ++++++++++++++++++++
 arch/arm/kvm/trace.h               |   15 ++
 7 files changed, 457 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index af21fd5..9899474 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -46,6 +46,8 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr);
 
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 59fcd15..86f6cf1 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -88,6 +88,7 @@ struct kvm_vcpu_arch {
 	u64 pc_ipa;		/* IPA for the current PC (VA to PA result) */
 
 	/* IO related fields */
+	bool mmio_sign_extend;	/* for byte/halfword loads */
 	u32 mmio_rd;
 
 	/* Misc. fields */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index a64ab2d..f06f42d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -40,6 +40,7 @@ void free_hyp_pmds(pgd_t *hyp_pgd);
 int kvm_alloc_stage2_pgd(struct kvm *kvm);
 void kvm_free_stage2_pgd(struct kvm *kvm);
 
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a28de12..3e3f6d7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -385,6 +385,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	int ret;
 
 	for (;;) {
+		if (run->exit_reason == KVM_EXIT_MMIO) {
+			ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+			if (ret)
+				break;
+		}
+
+		run->exit_reason = KVM_EXIT_UNKNOWN;
+
 		trace_kvm_entry(vcpu->arch.regs.pc);
 		debug_ws_enter(vcpu->arch.regs.pc);
 		kvm_guest_enter();
diff --git a/arch/arm/kvm/arm_emulate.c b/arch/arm/kvm/arm_emulate.c
index 37fe029..0c99360 100644
--- a/arch/arm/kvm/arm_emulate.c
+++ b/arch/arm/kvm/arm_emulate.c
@@ -20,6 +20,7 @@
 #include <asm/kvm_emulate.h>
 #include <trace/events/kvm.h>
 
+#include "trace.h"
 #include "debug.h"
 #include "trace.h"
 
@@ -128,8 +129,30 @@ u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 }
 
 /******************************************************************************
- * Co-processor emulation
+ * Utility functions common to all emulation code
+ *****************************************************************************/
+
+/*
+ * This one accepts a table of {value, mask} pairs: an instruction
+ * matches entry i when (instr & mask) == (value & mask).
  */
+#define INSTR_NONE	-1
+static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
+{
+	int i;
+	u32 mask;
+
+	for (i = 0; i < table_entries; i++) {
+		mask = table[i][1];
+		if ((table[i][0] & mask) == (instr & mask))
+			return i;
+	}
+	return INSTR_NONE;
+}
+
+/******************************************************************************
+ * Co-processor emulation
+ *****************************************************************************/
 
 struct coproc_params {
 	unsigned long CRm;
@@ -314,3 +337,257 @@ int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return 0;
 }
+
+
+/******************************************************************************
+ * Load-Store instruction emulation
+ *****************************************************************************/
+
+/*
+ * Must be ordered with LOADS first and STORES afterwards
+ * for easy distinction when doing MMIO.
+ */
+#define NUM_LD_INSTR  9
+enum INSTR_LS_INDEXES {
+	INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
+	INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
+	INSTR_LS_LDRSH,
+	INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
+	INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
+	NUM_LS_INSTR
+};
+
+static u32 ls_instr[NUM_LS_INSTR][2] = {
+	{0x04700000, 0x0d700000}, /* LDRBT */
+	{0x04300000, 0x0d700000}, /* LDRT  */
+	{0x04100000, 0x0c500000}, /* LDR   */
+	{0x04500000, 0x0c500000}, /* LDRB  */
+	{0x000000d0, 0x0e1000f0}, /* LDRD  */
+	{0x01900090, 0x0ff000f0}, /* LDREX */
+	{0x001000b0, 0x0e1000f0}, /* LDRH  */
+	{0x001000d0, 0x0e1000f0}, /* LDRSB */
+	{0x001000f0, 0x0e1000f0}, /* LDRSH */
+	{0x04600000, 0x0d700000}, /* STRBT */
+	{0x04200000, 0x0d700000}, /* STRT  */
+	{0x04000000, 0x0c500000}, /* STR   */
+	{0x04400000, 0x0c500000}, /* STRB  */
+	{0x000000f0, 0x0e1000f0}, /* STRD  */
+	{0x01800090, 0x0ff000f0}, /* STREX */
+	{0x000000b0, 0x0e1000f0}  /* STRH  */
+};
+
+static inline int get_arm_ls_instr_index(u32 instr)
+{
+	return kvm_instr_index(instr, ls_instr, NUM_LS_INSTR);
+}
+
+/*
+ * Load-Store instruction decoding
+ */
+#define INSTR_LS_TYPE_BIT		26
+#define INSTR_LS_RD_MASK		0x0000f000
+#define INSTR_LS_RD_SHIFT		12
+#define INSTR_LS_RN_MASK		0x000f0000
+#define INSTR_LS_RN_SHIFT		16
+#define INSTR_LS_RM_MASK		0x0000000f
+#define INSTR_LS_OFFSET12_MASK		0x00000fff
+
+#define INSTR_LS_BIT_P			24
+#define INSTR_LS_BIT_U			23
+#define INSTR_LS_BIT_B			22
+#define INSTR_LS_BIT_W			21
+#define INSTR_LS_BIT_L			20
+#define INSTR_LS_BIT_S			 6
+#define INSTR_LS_BIT_H			 5
+
+/*
+ * ARM addressing mode defines
+ */
+#define OFFSET_IMM_MASK			0x0e000000
+#define OFFSET_IMM_VALUE		0x04000000
+#define OFFSET_REG_MASK			0x0e000ff0
+#define OFFSET_REG_VALUE		0x06000000
+#define OFFSET_SCALE_MASK		0x0e000010
+#define OFFSET_SCALE_VALUE		0x06000000
+
+#define SCALE_SHIFT_MASK		0x000000a0
+#define SCALE_SHIFT_SHIFT		5
+#define SCALE_SHIFT_LSL			0x0
+#define SCALE_SHIFT_LSR			0x1
+#define SCALE_SHIFT_ASR			0x2
+#define SCALE_SHIFT_ROR_RRX		0x3
+#define SCALE_SHIFT_IMM_MASK		0x00000f80
+#define SCALE_SHIFT_IMM_SHIFT		6
+
+#define PSR_BIT_C			29
+
+static unsigned long ls_word_calc_offset(struct kvm_vcpu *vcpu,
+					 unsigned long instr)
+{
+	int offset = 0;
+
+	if ((instr & OFFSET_IMM_MASK) == OFFSET_IMM_VALUE) {
+		/* Immediate offset/index */
+		offset = instr & INSTR_LS_OFFSET12_MASK;
+
+		if (!(instr & (1U << INSTR_LS_BIT_U)))
+			offset = -offset;
+	}
+
+	if ((instr & OFFSET_REG_MASK) == OFFSET_REG_VALUE) {
+		/* Register offset/index */
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		offset = *vcpu_reg(vcpu, rm);
+
+		if (!(instr & (1U << INSTR_LS_BIT_P)))
+			offset = 0;
+	}
+
+	if ((instr & OFFSET_SCALE_MASK) == OFFSET_SCALE_VALUE) {
+		/* Scaled register offset */
+		int asr_test;
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		u8 shift = (instr & SCALE_SHIFT_MASK) >> SCALE_SHIFT_SHIFT;
+		u32 shift_imm = (instr & SCALE_SHIFT_IMM_MASK)
+				>> SCALE_SHIFT_IMM_SHIFT;
+		offset = *vcpu_reg(vcpu, rm);
+
+		switch (shift) {
+		case SCALE_SHIFT_LSL:
+			offset = offset << shift_imm;
+			break;
+		case SCALE_SHIFT_LSR:
+			if (shift_imm == 0)
+				offset = 0;
+			else
+				offset = ((u32)offset) >> shift_imm;
+			break;
+		case SCALE_SHIFT_ASR:
+			/* Test that the compiler used arithmetic right shift
+			 * for signed values. */
+			asr_test = 0xffffffff;
+			BUG_ON((asr_test >> 2) >= 0);
+			if (shift_imm == 0) {
+				if (offset & (1U << 31))
+					offset = 0xffffffff;
+				else
+					offset = 0;
+			} else {
+				offset = offset >> shift_imm;
+			}
+			break;
+		case SCALE_SHIFT_ROR_RRX:
+			/* Test that the compiler used arithmetic right shift
+			 * for signed values. */
+			asr_test = 0xffffffff;
+			BUG_ON((asr_test >> 2) >= 0);
+			if (shift_imm == 0) {
+				u32 C = (vcpu->arch.regs.cpsr &
+						(1U << PSR_BIT_C));
+				offset = (C << 31) | offset >> 1;
+			} else {
+				offset = ror32(offset, shift_imm);
+			}
+			break;
+		}
+
+		if (instr & (1U << INSTR_LS_BIT_U))
+			return offset;
+		else
+			return -offset;
+	}
+
+	if (instr & (1U << INSTR_LS_BIT_U))
+		return offset;
+	else
+		return -offset;
+
+	BUG();
+}
+
+static int kvm_ls_length(struct kvm_vcpu *vcpu, u32 instr)
+{
+	int index;
+
+	index = get_arm_ls_instr_index(instr);
+	BUG_ON(index == INSTR_NONE);
+
+	if (instr & (1U << INSTR_LS_TYPE_BIT)) {
+		/* LS word or unsigned byte */
+		if (instr & (1U << INSTR_LS_BIT_B))
+			return sizeof(unsigned char);
+		else
+			return sizeof(u32);
+	} else {
+		/* LS halfword, doubleword or signed byte */
+		u32 H = (instr & (1U << INSTR_LS_BIT_H));
+		u32 S = (instr & (1U << INSTR_LS_BIT_S));
+		u32 L = (instr & (1U << INSTR_LS_BIT_L));
+
+		if (!L && S) {
+			kvm_msg("WARNING: d-word for MMIO");
+			return 2 * sizeof(u32);
+		} else if (L && S && !H)
+			return sizeof(char);
+		else
+			return sizeof(u16);
+	}
+
+	BUG();
+}
+
+/**
+ * kvm_emulate_mmio_ls - emulates load/store instructions made to I/O memory
+ * @vcpu:	The vcpu pointer
+ * @fault_ipa:	The IPA that caused the 2nd stage fault
+ * @instr:	The instruction that caused the fault
+ *
+ * Handles emulation of load/store instructions which cannot be emulated through
+ * information found in the HSR on faults. It is necessary in this case to
+ * simply decode the offending instruction in software and determine the
+ * required operands.
+ */
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr)
+{
+	unsigned long rd, rn, offset, len;
+	int index;
+	bool is_write;
+
+	index = get_arm_ls_instr_index(instr);
+	if (index == INSTR_NONE) {
+		kvm_err(-EINVAL, "Unknown load/store instruction");
+		return -EINVAL;
+	}
+
+	is_write = (index >= NUM_LD_INSTR);
+	rd = (instr & INSTR_LS_RD_MASK) >> INSTR_LS_RD_SHIFT;
+	len = kvm_ls_length(vcpu, instr);
+
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+	vcpu->run->mmio.is_write = is_write;
+	vcpu->run->mmio.phys_addr = fault_ipa;
+	vcpu->run->mmio.len = len;
+	vcpu->arch.mmio_sign_extend = false;
+	vcpu->arch.mmio_rd = rd;
+
+	trace_kvm_mmio_emulate(vcpu->arch.regs.pc);
+	trace_kvm_mmio((is_write) ? KVM_TRACE_MMIO_WRITE :
+				    KVM_TRACE_MMIO_READ_UNSATISFIED,
+			len, fault_ipa, (is_write) ? *vcpu_reg(vcpu, rd) : 0);
+
+	/* Handle base register writeback */
+	if (!(instr & (1U << INSTR_LS_BIT_P)) ||
+	     (instr & (1U << INSTR_LS_BIT_W))) {
+		rn = (instr & INSTR_LS_RN_MASK) >> INSTR_LS_RN_SHIFT;
+		offset = ls_word_calc_offset(vcpu, instr);
+		*vcpu_reg(vcpu, rn) += offset;
+	}
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest. (XXX We don't support Thumb instructions yet).
+	 */
+	*vcpu_reg(vcpu, 15) += 4;
+	return 0;
+}
diff --git a/arch/arm/kvm/arm_mmu.c b/arch/arm/kvm/arm_mmu.c
index 6040aff..032133a 100644
--- a/arch/arm/kvm/arm_mmu.c
+++ b/arch/arm/kvm/arm_mmu.c
@@ -16,10 +16,13 @@
 
 #include <linux/mman.h>
 #include <linux/kvm_host.h>
+#include <trace/events/kvm.h>
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_emulate.h>
 
+#include "trace.h"
 #include "debug.h"
 
 pgd_t *kvm_hyp_pgd;
@@ -341,6 +344,152 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	return 0;
 }
 
+/**
+ * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
+ * @vcpu: The VCPU pointer
+ * @run:  The VCPU run struct containing the mmio data
+ *
+ * This should only be called after returning to QEMU for MMIO load emulation.
+ */
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	int *dest;
+	unsigned int len;
+	int mask;
+
+	if (!run->mmio.is_write) {
+		dest = vcpu_reg(vcpu, vcpu->arch.mmio_rd);
+		memset(dest, 0, sizeof(int));
+
+		if (run->mmio.len > 4) {
+			kvm_err(-EINVAL, "Incorrect mmio length");
+			return -EINVAL;
+		}
+
+		len = run->mmio.len;
+		memcpy(dest, run->mmio.data, len);
+
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
+				*((u64 *)run->mmio.data));
+
+		if (vcpu->arch.mmio_sign_extend && len < 4) {
+			mask = 1U << ((len * 8) - 1);
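+			/* (x ^ m) - m sign-extends x when m is the sign bit */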
+			*dest = (*dest ^ mask) - mask;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * invalid_io_mem_abort -- Handle I/O aborts when the ISV bit is clear
+ *
+ * @vcpu:      The vcpu pointer
+ * @fault_ipa: The IPA that caused the 2nd stage fault
+ *
+ * Some load/store instructions cannot be emulated using the information
+ * presented in the HSR, for instance, register write-back instructions are not
+ * supported. We therefore need to fetch the instruction, decode it, and then
+ * emulate its behavior.
+ */
+static int invalid_io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
+{
+	unsigned long instr;
+	phys_addr_t pc_ipa;
+
+	if (vcpu->arch.pc_ipa & (1U << 11)) {
+		/* LPAE PAR format */
+		pc_ipa = vcpu->arch.pc_ipa & PAGE_MASK & ((1ULL << 32) - 1);
+	} else {
+		/* VMSAv7 PAR format */
+		pc_ipa = vcpu->arch.pc_ipa & PAGE_MASK & ((1ULL << 40) - 1);
+	}
+	pc_ipa += vcpu->arch.regs.pc & ~PAGE_MASK;
+
+	if (kvm_read_guest(vcpu->kvm, pc_ipa, &instr, sizeof(instr))) {
+		kvm_err(-EFAULT, "Could not copy guest instruction");
+		return -EFAULT;
+	}
+
+	if (vcpu->arch.regs.cpsr & PSR_T_BIT) {
+		/* Need to decode thumb instructions as well */
+		KVMARM_NOT_IMPLEMENTED();
+		return -EINVAL;
+	}
+
+	return kvm_emulate_mmio_ls(vcpu, fault_ipa, instr);
+}
+
+static int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
+{
+	unsigned long rd, len, instr_len;
+	bool is_write, sign_extend;
+
+	if (!(vcpu->arch.hsr & HSR_ISV))
+		return invalid_io_mem_abort(vcpu, fault_ipa);
+
+	if (((vcpu->arch.hsr >> 8) & 1)) {
+		kvm_err(-EFAULT, "Cache operations on I/O addresses not supported");
+		return -EFAULT;
+	}
+
+	if ((vcpu->arch.hsr >> 7) & 1) {
+		kvm_err(-EFAULT, "Translation table accesses I/O memory");
+		return -EFAULT;
+	}
+
+	switch ((vcpu->arch.hsr >> 22) & 0x3) {
+	case 0:
+		len = 1;
+		break;
+	case 1:
+		len = 2;
+		break;
+	case 2:
+		len = 4;
+		break;
+	default:
+		kvm_err(-EFAULT, "Invalid I/O abort");
+		return -EFAULT;
+	}
+
+	is_write = ((vcpu->arch.hsr >> 6) & 1);
+	sign_extend = ((vcpu->arch.hsr >> 21) & 1);
+	rd = (vcpu->arch.hsr >> 16) & 0xf;
+	BUG_ON(rd > 15);
+
+	if (rd == 15) {
+		kvm_err(-EFAULT, "I/O memory trying to read/write pc");
+		return -EFAULT;
+	}
+
+	/* Get instruction length in bytes */
+	instr_len = ((vcpu->arch.hsr >> 25) & 1) ? 4 : 2;
+
+	/* Export MMIO operations to user space */
+	run->exit_reason = KVM_EXIT_MMIO;
+	run->mmio.is_write = is_write;
+	run->mmio.phys_addr = fault_ipa;
+	run->mmio.len = len;
+	vcpu->arch.mmio_sign_extend = sign_extend;
+	vcpu->arch.mmio_rd = rd;
+
+	trace_kvm_mmio((is_write) ? KVM_TRACE_MMIO_WRITE :
+				    KVM_TRACE_MMIO_READ_UNSATISFIED,
+			len, fault_ipa, (is_write) ? *vcpu_reg(vcpu, rd) : 0);
+
+	if (is_write)
+		memcpy(run->mmio.data, vcpu_reg(vcpu, rd), len);
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	*vcpu_reg(vcpu, 15) += instr_len;
+	return 0;
+}
+
 #define HSR_ABT_FS	(0x3f)
 #define HPFAR_MASK	(~0xf)
 
@@ -385,9 +534,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			return -EFAULT;
 		}
 
-		kvm_msg("I/O address abort...");
-		KVMARM_NOT_IMPLEMENTED();
-		return -EINVAL;
+		/* Adjust page offset */
+		fault_ipa += vcpu->arch.hdfar % PAGE_SIZE;
+		return io_mem_abort(vcpu, run, fault_ipa, memslot);
 	}
 
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 381ea4a..4f20d75 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,6 +39,21 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_mmio_emulate,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("Emulate MMIO at: 0x%08lx", __entry->vcpu_pc)
+);
+
 TRACE_EVENT(kvm_emulate_cp15_imp,
 	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
 		 unsigned long CRm, unsigned long Op2, bool is_write),


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 10/10] ARM: KVM: Guest wait-for-interrupts (WFI) support
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
                   ` (8 preceding siblings ...)
  2011-08-06 10:40 ` [PATCH v4 09/10] ARM: KVM: Handle I/O aborts Christoffer Dall
@ 2011-08-06 10:40 ` Christoffer Dall
  2011-08-09 11:43 ` [PATCH v4 00/10] KVM/ARM Implementation Avi Kivity
  10 siblings, 0 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-06 10:40 UTC (permalink / raw)
  To: kvm; +Cc: catalin.marinas, tech, android-virt

When the guest executes a WFI instruction the operation is trapped to
KVM, which emulates the instruction in software. There is no correlation
between a guest executing a WFI instruction and actually putting the
hardware into a low-power mode, since a KVM guest is essentially a
process and the WFI instruction can be seen as a 'sleep' call from this
process. Therefore, we flag the VCPU to be in wait_for_interrupts mode
and call the main KVM function kvm_vcpu_block(), which puts the thread
on a wait-queue and calls schedule().

When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we
wake the VCPU thread and clear the wait_for_interrupts flag. All calls
to kvm_arch_vcpu_ioctl_run() result in a call to kvm_vcpu_block() as
long as the VCPU is in wfi-mode.
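
For illustration, waking a WFI-blocked VCPU from user space then boils
down to something like this (a sketch, reusing the irq-field encoding
from the KVM_IRQ_LINE patch; vcpu_index is whatever VCPU the caller
targets):

	struct kvm_irq_level irq = {
		.irq   = vcpu_index * 2,   /* the IRQ line; * 2 + 1 is FIQ */
		.level = 1,                /* raise the line */
	};
	ioctl(vm_fd, KVM_IRQ_LINE, &irq);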


Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/kvm/arm.c         |   20 +++++++++++++++++++-
 arch/arm/kvm/arm_emulate.c |   11 +++++++++++
 arch/arm/kvm/trace.h       |   15 +++++++++++++++
 3 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3e3f6d7..693ba69 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -307,9 +307,18 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+/**
+ * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled
+ * @v:		The VCPU pointer
+ *
+ * If the guest CPU is not waiting for interrupts (or is waiting for interrupts
+ * but there actually is an incoming interrupt), then it is by definition
+ * runnable.
+ */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return 0;
+	return !!v->arch.virt_irq ||
+		!v->arch.wait_for_interrupts;
 }
 
 static inline int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
@@ -385,6 +394,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	int ret;
 
 	for (;;) {
+		if (vcpu->arch.wait_for_interrupts)
+			goto wait_for_interrupts;
+
 		if (run->exit_reason == KVM_EXIT_MMIO) {
 			ret = kvm_handle_mmio_return(vcpu, vcpu->run);
 			if (ret)
@@ -420,6 +432,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			vcpu_load(vcpu);
 		}
 
+wait_for_interrupts:
+		if (vcpu->arch.wait_for_interrupts)
+			kvm_vcpu_block(vcpu);
+
 		if (signal_pending(current) && !(run->exit_reason)) {
 			run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
 			break;
@@ -460,6 +476,8 @@ static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
 	if (irq_level->level) {
 		vcpu->arch.virt_irq |= mask;
 		vcpu->arch.wait_for_interrupts = 0;
+		if (waitqueue_active(&vcpu->wq))
+			wake_up_interruptible(&vcpu->wq);
 	} else
 		vcpu->arch.virt_irq &= ~mask;
 
diff --git a/arch/arm/kvm/arm_emulate.c b/arch/arm/kvm/arm_emulate.c
index 0c99360..928c747 100644
--- a/arch/arm/kvm/arm_emulate.c
+++ b/arch/arm/kvm/arm_emulate.c
@@ -333,8 +333,19 @@ unsupp_err_out:
 	return -EINVAL;
 }
 
+/**
+ * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * @vcpu:	the vcpu pointer
+ * @run:	the kvm_run structure pointer
+ *
+ * Simply sets the wait_for_interrupts flag on the vcpu structure, which halts
+ * world-switches and lets other host processes be scheduled until an incoming
+ * IRQ or FIQ arrives for the VM.
+ */
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	trace_kvm_wfi(vcpu->arch.regs.pc);
+	vcpu->arch.wait_for_interrupts = 1;
 	return 0;
 }
 
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 4f20d75..2ea01f5 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -104,6 +104,21 @@ TRACE_EVENT(kvm_irq_line,
 		__entry->level, __entry->vcpu_idx)
 );
 
+TRACE_EVENT(kvm_wfi,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("guest executed wfi at: 0x%08lx", __entry->vcpu_pc)
+);
+
 
 #endif /* _TRACE_KVM_H */
 


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping
  2011-08-06 10:39 ` [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping Christoffer Dall
@ 2011-08-09  9:20   ` Avi Kivity
  2011-08-09  9:29     ` Catalin Marinas
  2011-08-09  9:29     ` Christoffer Dall
  0 siblings, 2 replies; 34+ messages in thread
From: Avi Kivity @ 2011-08-09  9:20 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, catalin.marinas, tech, android-virt

On 08/06/2011 01:39 PM, Christoffer Dall wrote:
> Adds support in the identity mapping feature that allows KVM to setup
> identity mapping for the Hyp mode with the AP[1] bit set as required by
> the specification and also supports freeing created sub pmd's after
> finished use.
>
> These two functions:
>   - hyp_identity_mapping_add(pgd, addr, end);
>   - hyp_identity_mapping_del(pgd, addr, end);
> essentially call the same function as the non-hyp versions but
> with a different argument value. KVM calls these functions to setup
> and teardown the identity mapping used to initialize the hypervisor.
>
> Note, the hyp-version of the _del function actually frees the pmd's
> pointed to by the pgd as opposed to the non-hyp version which just
> clears them.
>
>

These are for mapping host memory, not guest memory, right?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping
  2011-08-09  9:20   ` Avi Kivity
@ 2011-08-09  9:29     ` Catalin Marinas
  2011-08-09  9:29     ` Christoffer Dall
  1 sibling, 0 replies; 34+ messages in thread
From: Catalin Marinas @ 2011-08-09  9:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, kvm, tech, android-virt

On Tue, Aug 09, 2011 at 10:20:27AM +0100, Avi Kivity wrote:
> On 08/06/2011 01:39 PM, Christoffer Dall wrote:
> > Adds support in the identity mapping feature that allows KVM to setup
> > identity mapping for the Hyp mode with the AP[1] bit set as required by
> > the specification and also supports freeing created sub pmd's after
> > finished use.
> >
> > These two functions:
> >   - hyp_identity_mapping_add(pgd, addr, end);
> >   - hyp_identity_mapping_del(pgd, addr, end);
> > essentially call the same function as the non-hyp versions but
> > with a different argument value. KVM calls these functions to setup
> > and teardown the identity mapping used to initialize the hypervisor.
> >
> > Note, the hyp-version of the _del function actually frees the pmd's
> > pointed to by the pgd as opposed to the non-hyp version which just
> > clears them.
> >
> >
> 
> These are for mapping host memory, not guest memory, right?

Yes. There is some code that is built into the kernel image (and address
space) but it needs to run in Hypervisor mode which has its own MMU
translation tables.

-- 
Catalin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping
  2011-08-09  9:20   ` Avi Kivity
  2011-08-09  9:29     ` Catalin Marinas
@ 2011-08-09  9:29     ` Christoffer Dall
  2011-08-09 10:23       ` [Android-virt] " Alexey Smirnov
  1 sibling, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09  9:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, kvm, catalin.marinas, tech, android-virt


On Aug 9, 2011, at 11:20 AM, Avi Kivity wrote:

> On 08/06/2011 01:39 PM, Christoffer Dall wrote:
>> Adds support in the identity mapping feature that allows KVM to setup
>> identity mapping for the Hyp mode with the AP[1] bit set as required by
>> the specification and also supports freeing created sub pmd's after
>> finished use.
>> 
>> These two functions:
>>  - hyp_identity_mapping_add(pgd, addr, end);
>>  - hyp_identity_mapping_del(pgd, addr, end);
>> essentially call the same function as the non-hyp versions but
>> with a different argument value. KVM calls these functions to setup
>> and teardown the identity mapping used to initialize the hypervisor.
>> 
>> Note, the hyp-version of the _del function actually frees the pmd's
>> pointed to by the pgd as opposed to the non-hyp version which just
>> clears them.
>> 
>> 
> 
> These are for mapping host memory, not guest memory, right?

yes (or to be exact - hypervisor memory). The point is that there are special hardware requirements for translation tables used in Hyp-mode not otherwise satisfied by the normal page tables.

> 
> -- 
> error compiling committee.c: too many arguments to function
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 04/10] ARM: KVM: Memory virtualization setup
  2011-08-06 10:39 ` [PATCH v4 04/10] ARM: KVM: Memory virtualization setup Christoffer Dall
@ 2011-08-09  9:57   ` Avi Kivity
  2011-08-09 11:24     ` [Android-virt] " Christoffer Dall
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2011-08-09  9:57 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, catalin.marinas, tech, android-virt

On 08/06/2011 01:39 PM, Christoffer Dall wrote:
> This commit introduces the framework for guest memory management
> through the use of 2nd stage translation. Each VM has a pointer
> to a level-1 table (the pgd field in struct kvm_arch) which is
> used for the 2nd stage translations. Entries are added when handling
> guest faults (later patch) and the table itself can be allocated and
> freed through the following functions implemented in
> arch/arm/kvm/arm_mmu.c:
>   - kvm_alloc_stage2_pgd(struct kvm *kvm);
>   - kvm_free_stage2_pgd(struct kvm *kvm);
>
> Further, each entry in the TLBs and caches is tagged with a VMID
> identifier in addition to ASIDs. The VMIDs are managed using
> a bitmap and assigned when creating the VM in kvm_arch_init_vm()
> where the 2nd stage pgd is also allocated. The table is freed in
> kvm_arch_destroy_vm(). Both functions are called from the main
> KVM code.
>
>
> +/**
> + * kvm_arch_init_vm - initializes a VM data structure
> + * @kvm:	pointer to the KVM struct
> + */
>   int kvm_arch_init_vm(struct kvm *kvm)
>   {
> -	return 0;
> +	int ret = 0;
> +	phys_addr_t pgd_phys;
> +	unsigned long vmid;
> +	unsigned long start, end;
> +
> +
> +	mutex_lock(&kvm_vmids_mutex);
> +	vmid = find_first_zero_bit(kvm_vmids, VMID_SIZE);
> +	if (vmid>= VMID_SIZE) {
> +		mutex_unlock(&kvm_vmids_mutex);
> +		return -EBUSY;
> +	}
> +	__set_bit(vmid, kvm_vmids);

VMID_SIZE seems to be a bit low for comfort.  I guess it's fine for a 
start, but later on we'll have to recycle VMIDs, like we do for SVM ASIDs.

Is there not a risk of a user starting 255 tiny guests and denying other 
users the ability to use kvm?
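
Something along the lines of this generation-scheme sketch, perhaps
(all names hypothetical, locking elided):

	static u64 vmid_gen = 1;
	static unsigned long next_vmid = 1;

	static void update_vmid(struct kvm *kvm)
	{
		if (kvm->arch.vmid_gen == vmid_gen)
			return;			/* VMID still valid */
		if (next_vmid >= VMID_SIZE) {
			vmid_gen++;		/* roll over: new generation, */
			next_vmid = 1;		/* flush all stage-2 TLB      */
			flush_all_guest_tlbs();	/* entries (hypothetical)     */
		}
		kvm->arch.vmid = next_vmid++;
		kvm->arch.vmid_gen = vmid_gen;
	}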

> +	kvm->arch.vmid = vmid;
> +	mutex_unlock(&kvm_vmids_mutex);
> +
> +	ret = kvm_alloc_stage2_pgd(kvm);
> +	if (ret)
> +		goto out_fail_alloc;
> +
> +	pgd_phys = virt_to_phys(kvm->arch.pgd);
> +	kvm->arch.vttbr = pgd_phys&  ((1LLU<<  40) - 1)&  ~((2<<  VTTBR_X) - 1);
> +	kvm->arch.vttbr |= ((u64)vmid<<  48);
> +
> +	start = (unsigned long)kvm,
> +	end = start + sizeof(struct kvm);
> +	ret = create_hyp_mappings(kvm_hyp_pgd, start, end);
> +	if (ret)
> +		goto out_fail_hyp_mappings;
> +
> +	return ret;
> +out_fail_hyp_mappings:
> +	remove_hyp_mappings(kvm_hyp_pgd, start, end);
> +out_fail_alloc:
> +	clear_bit(vmid, kvm_vmids);
> +	return ret;
>   }
>

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v4 05/10] ARM: KVM: Inject IRQs and FIQs from userspace
  2011-08-06 10:39 ` [PATCH v4 05/10] ARM: KVM: Inject IRQs and FIQs from userspace Christoffer Dall
@ 2011-08-09 10:07   ` Avi Kivity
  2011-08-09 11:27     ` [Android-virt] " Christoffer Dall
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2011-08-09 10:07 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, catalin.marinas, tech, android-virt

On 08/06/2011 01:39 PM, Christoffer Dall wrote:
> Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
> This ioctl is used since the semantics are in fact two lines that can be
> either raised or lowered on the VCPU - the IRQ and FIQ lines.
>
> KVM needs to know which VCPU it must operate on and whether the FIQ or
> IRQ line is raised/lowered. Hence both pieces of information are packed
> in the kvm_irq_level->irq field. The irq field value will be:
>    IRQ: vcpu_index * 2
>    FIQ: (vcpu_index * 2) + 1
>
> This is documented in Documentation/kvm/api.txt.
>
> The effect of the ioctl is simply to raise/lower the
> corresponding virt_irq field on the VCPU struct, which will cause the
> world-switch code to raise/lower virtual interrupts when running the
> guest on next switch. The wait_for_interrupt flag is also cleared for
> raised IRQs causing an idle VCPU to become active again.

Note x86 starts out with a default configuration and allows updating it 
via KVM_SET_GSI_ROUTING.  You may need this in the future if you decide 
to implement an irq controller in the kernel.
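
For reference, with the encoding above userspace drives a line with
something like the following (a sketch - the helper name is ours, and
error handling is omitted):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* raise (level=1) or lower (level=0) the FIQ line of a given vcpu */
static int set_fiq_line(int vm_fd, unsigned int vcpu_idx, int level)
{
	struct kvm_irq_level irq = {
		.irq   = vcpu_idx * 2 + 1,	/* odd values select FIQ */
		.level = level,
	};

	return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}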

> +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
> +				      struct kvm_irq_level *irq_level)
> +{
> +	u32 mask;
> +	unsigned int vcpu_idx;
> +	struct kvm_vcpu *vcpu;
> +
> +	vcpu_idx = irq_level->irq / 2;
> +	if (vcpu_idx >= KVM_MAX_VCPUS)
> +		return -EINVAL;
> +
> +	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
> +	if (!vcpu)
> +		return -EINVAL;
> +
> +	switch (irq_level->irq % 2) {
> +	case KVM_ARM_IRQ_LINE:
> +		mask = HCR_VI;
> +		break;
> +	case KVM_ARM_FIQ_LINE:
> +		mask = HCR_VF;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	trace_kvm_irq_line(irq_level->irq % 2, irq_level->level, vcpu_idx);

Please reuse trace_kvm_set_irq().  You can decode vcpu/type in a 
trace-cmd plugin.

> +
> +	if (irq_level->level) {
> +		vcpu->arch.virt_irq |= mask;
> +		vcpu->arch.wait_for_interrupts = 0;
> +	} else
> +		vcpu->arch.virt_irq &= ~mask;
> +

This seems to be non-smp-safe?  Do you need atomic ops and barriers 
here?  And a wakeup?

Unlike KVM_INTERRUPT, KVM_IRQ_LINE is designed to be used asynchronously 
wrt the vcpu.
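
Something along these lines, perhaps - just a sketch; the lock and the
kick helper are assumptions, not existing ARM code:

	/* serialize line updates against a concurrently running vcpu */
	spin_lock(&vcpu->arch.irq_lock);
	if (irq_level->level) {
		vcpu->arch.virt_irq |= mask;
		vcpu->arch.wait_for_interrupts = 0;
	} else {
		vcpu->arch.virt_irq &= ~mask;
	}
	spin_unlock(&vcpu->arch.irq_lock);

	smp_mb();		/* pairs with the checks in the run loop */
	kvm_vcpu_kick(vcpu);	/* assumes an ARM kvm_vcpu_kick() helper */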

> +	return 0;
> +}
> +
>   long kvm_arch_vcpu_ioctl(struct file *filp,
>   			 unsigned int ioctl, unsigned long arg)
>   {
> @@ -312,8 +349,21 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>   long kvm_arch_vm_ioctl(struct file *filp,
>   		       unsigned int ioctl, unsigned long arg)
>   {
> -	printk(KERN_ERR "kvm_arch_vm_ioctl: Unsupported ioctl (%d)\n", ioctl);
> -	return -EINVAL;
> +	struct kvm *kvm = filp->private_data;
> +	void __user *argp = (void __user *)arg;
> +
> +	switch (ioctl) {
> +	case KVM_IRQ_LINE: {
> +		struct kvm_irq_level irq_event;
> +
> +		if (copy_from_user(&irq_event, argp, sizeof irq_event))
> +			return -EFAULT;
> +		return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
> +	}
> +	default:
> +		kvm_err(-EINVAL, "Unsupported ioctl (%d)", ioctl);

Please remove for the final code, we don't want a user spamming the 
kernel log.

> +		return -EINVAL;
> +	}
>   }
>
>

-- 
error compiling committee.c: too many arguments to function



* Re: [Android-virt] [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping
  2011-08-09  9:29     ` Christoffer Dall
@ 2011-08-09 10:23       ` Alexey Smirnov
  2011-08-09 11:23         ` Christoffer Dall
  0 siblings, 1 reply; 34+ messages in thread
From: Alexey Smirnov @ 2011-08-09 10:23 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Avi Kivity, tech, android-virt, kvm

Hi Christoffer,

>>
>> These are for mapping host memory, not guest memory, right?
>
>> yes (or to be exact - hypervisor memory). The point is that there are special hardware requirements for the translation tables used in Hyp-mode which are not otherwise satisfied by the normal page tables.

In the function init_hyp_memory() you map some memory regions for vectors,
vcpu, stack, etc. using the function create_hyp_mappings. Just wondering,
how do you make sure that the guest will never map its own data at these
addresses? Since the guest is not para-virtualized, it can use any VA it
wants, including these addresses.

In your earlier KVM-arm paper you mentioned that such mappings were
write-protected, so whenever the guest tried to access them you needed to
relocate such shared pages. Is the mechanism the same, or do you somehow
take advantage of the virtualization extensions to avoid this problem?

Thanks,
Alexey


* Re: [PATCH v4 06/10] ARM: KVM: World-switch implementation
  2011-08-06 10:39 ` [PATCH v4 06/10] ARM: KVM: World-switch implementation Christoffer Dall
@ 2011-08-09 11:09   ` Avi Kivity
  2011-08-09 11:29     ` Christoffer Dall
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2011-08-09 11:09 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, catalin.marinas, tech, android-virt

On 08/06/2011 01:39 PM, Christoffer Dall wrote:
> Provides complete world-switch implementation to switch to other guests
> running in non-secure modes. Includes Hyp exception handlers that
> capture the necessary exception information and store it on
> the VCPU and KVM structures.
>
> Switching to Hyp mode is done through a simple HVC instruction. The
> exception vector code will check that the HVC comes from VMID==0 and if
> so will store the necessary state on the Hyp stack, which will look like
> this (see hyp_hvc):
>    ...
>    Hyp_Sp + 4: lr_usr
>    Hyp_Sp    : spsr (Host-SVC cpsr)
>
> When returning from Hyp mode to SVC mode, another HVC instruction is
> executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
> stack pointer should be where it was left from the above initial call,
> since the values on the stack will be used to restore state (see
> hyp_svc).
>
> Otherwise, the world-switch is pretty straight-forward. All state that
> can be modified by the guest is first backed up on the Hyp stack and the
> VCPU values are loaded onto the hardware. State which is not loaded, but
> is theoretically modifiable by the guest, is protected through the
> virtualization features to generate a trap and cause software emulation.
> Upon guest returns, all state is restored from hardware onto the VCPU
> struct and the original state is restored from the Hyp-stack onto the
> hardware.
>
> One controversy may be the back-door call to __irq_svc (the host
> kernel's own physical IRQ handler) which is called when a physical IRQ
> exception is taken in Hyp mode while running in the guest.
>
>
>   void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
>   {
> +	unsigned long start, end;
> +
>   	latest_vcpu = NULL;
> -	KVMARM_NOT_IMPLEMENTED();
> +
> +	start = (unsigned long)vcpu,
> +	end = start + sizeof(struct kvm_vcpu);
> +	remove_hyp_mappings(kvm_hyp_pgd, start, end);

What if vcpu shares a page with another mapped structure?

> +
> +	kmem_cache_free(kvm_vcpu_cache, vcpu);
>   }

>   	return 0;
>   }
>
> +/**
> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> + * @vcpu:	The VCPU pointer
> + * @run:	The kvm_run structure pointer used for userspace state exchange
> + *
> + * This function is called through the VCPU_RUN ioctl called from user space. It
> + * will execute VM code in a loop until the time slice for the process is used
> + * or some emulation is needed from user space in which case the function will
> + * return with return value 0 and with the kvm_run structure filled in with the
> + * required data for the requested emulation.
> + */
>   int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   {
> -	KVMARM_NOT_IMPLEMENTED();
> -	return -EINVAL;
> +	unsigned long flags;
> +	int ret;
> +
> +	for (;;) {
> +		trace_kvm_entry(vcpu->arch.regs.pc);
> +		debug_ws_enter(vcpu->arch.regs.pc);

why both trace_kvm and debug_ws?

> +		kvm_guest_enter();
> +
> +		local_irq_save(flags);

local_irq_disable() is likely sufficient - the call path never changes.

> +		ret = __kvm_vcpu_run(vcpu);
> +		local_irq_restore(flags);
> +
> +		kvm_guest_exit();
> +		debug_ws_exit(vcpu->arch.regs.pc);
> +		trace_kvm_exit(vcpu->arch.regs.pc);
> +	}
> +
> +	return ret;
>   }
>
>

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v4 07/10] ARM: KVM: Emulation framework and CP15 emulation
  2011-08-06 10:39 ` [PATCH v4 07/10] ARM: KVM: Emulation framework and CP15 emulation Christoffer Dall
@ 2011-08-09 11:17   ` Avi Kivity
  2011-08-09 11:34     ` Christoffer Dall
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2011-08-09 11:17 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, catalin.marinas, tech, android-virt

On 08/06/2011 01:39 PM, Christoffer Dall wrote:
> Adds a new important function in the main KVM/ARM code called
> handle_exit() which is called from kvm_arch_vcpu_ioctl_run() on return
> from guest execution. This function examines the Hyp-Syndrome-Register
> (HSR), which contains information telling KVM what caused the exit from
> the guest.
>
> Some of the reasons for an exit are CP15 accesses, which are
> not allowed from the guest, and this commit handles these exits by
> emulating the intended operation in software and skipping the guest
> instruction.
>
>
>   /**
>    * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>    * @vcpu:	The VCPU pointer
> @@ -339,6 +396,26 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   		kvm_guest_exit();
>   		debug_ws_exit(vcpu->arch.regs.pc);
>   		trace_kvm_exit(vcpu->arch.regs.pc);
> +
> +		ret = handle_exit(vcpu, run, ret);
> +		if (ret) {
> +			kvm_err(ret, "Error in handle_exit");
> +			break;
> +		}
> +
> +		if (run->exit_reason == KVM_EXIT_MMIO)
> +			break;
> +
> +		if (need_resched()) {
> +			vcpu_put(vcpu);
> +			schedule();
> +			vcpu_load(vcpu);
> +		}

Preempt notifiers mean you don't need vcpu_put()/vcpu_load() - the 
scheduler will call kvm_arch_vcpu_put/load() automatically during 
context switch.
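
That is, with the preempt notifiers registered, the resched point in the
loop reduces to:

	if (need_resched())
		cond_resched();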

> +
> +		if (signal_pending(current) && !(run->exit_reason)) {
> +			run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
> +			break;
> +		}
>   	}

You're supposed to return -EINTR on a signal.  run->exit_reason isn't 
defined in this case, but traditionally we return KVM_EXIT_INTR (which 
means host signal, not guest signal - yes it's confusing).
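
i.e. something like:

	if (signal_pending(current)) {
		run->exit_reason = KVM_EXIT_INTR;
		ret = -EINTR;
		break;
	}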

> +
> +/**
> + * emulate_cp15_c15_access -- emulates cp15 accesses for CRn == 15
> + * @vcpu: The VCPU pointer
> + * @p:    The coprocessor parameters struct pointer holding trap inst. details
> + *
> + * The CP15 c15 register is implementation defined, but some guest kernels
> + * attempt to read/write a diagnostics register here. We always return 0 and
> + * ignore writes and hope for the best. This may need to be refined.
> + */
> +static int emulate_cp15_c15_access(struct kvm_vcpu *vcpu,
> +				   struct coproc_params *p)
> +{
> +	trace_kvm_emulate_cp15_imp(p->Op1, p->Rt1, p->CRn, p->CRm,
> +				   p->Op2, p->is_write);

_imp?

> +
> +	if (!p->is_write)
> +		*vcpu_reg(vcpu, p->Rt1) = 0;
> +
> +	return 0;
> +}
> +
>

-- 
error compiling committee.c: too many arguments to function



* Re: [Android-virt] [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping
  2011-08-09 10:23       ` [Android-virt] " Alexey Smirnov
@ 2011-08-09 11:23         ` Christoffer Dall
  0 siblings, 0 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09 11:23 UTC (permalink / raw)
  To: Alexey Smirnov; +Cc: tech, Avi Kivity, android-virt, kvm


On Aug 9, 2011, at 12:23 PM, Alexey Smirnov wrote:

> Hi Christoffer,
> 
>>> 
>>> These are for mapping host memory, not guest memory, right?
>> 
>> yes (or to be exact - hypervisor memory). The point is that there are special hardware requirements for the translation tables used in Hyp-mode which are not otherwise satisfied by the normal page tables.
> 
> In the function init_hyp_memory() you map some memory regions for vectors,
> vcpu, stack, etc. using the function create_hyp_mappings. Just wondering,
> how do you make sure that the guest will never map its own data at these
> addresses? Since the guest is not para-virtualized, it can use any VA it
> wants, including these addresses.
> 
> In your earlier KVM-arm paper you mentioned that such mappings were
> write-protected, so whenever the guest tried to access them you needed to
> relocate such shared pages. Is the mechanism the same, or do you somehow
> take advantage of the virtualization extensions to avoid this problem?

I take advantage of the virtualization extensions. These mappings are used only in Hyp-mode, which is only used to handle exceptions from the guest and to perform world-switches. Thus, these mappings are completely orthogonal to the 2nd stage translations used when running the VM.



* Re: [Android-virt] [PATCH v4 04/10] ARM: KVM: Memory virtualization setup
  2011-08-09  9:57   ` Avi Kivity
@ 2011-08-09 11:24     ` Christoffer Dall
  0 siblings, 0 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09 11:24 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, tech, android-virt, kvm


On Aug 9, 2011, at 11:57 AM, Avi Kivity wrote:

> On 08/06/2011 01:39 PM, Christoffer Dall wrote:
>> This commit introduces the framework for guest memory management
>> through the use of 2nd stage translation. Each VM has a pointer
>> to a level-1 table (the pgd field in struct kvm_arch) which is
>> used for the 2nd stage translations. Entries are added when handling
>> guest faults (later patch) and the table itself can be allocated and
>> freed through the following functions implemented in
>> arch/arm/kvm/arm_mmu.c:
>>  - kvm_alloc_stage2_pgd(struct kvm *kvm);
>>  - kvm_free_stage2_pgd(struct kvm *kvm);
>> 
>> Further, each entry in the TLBs and caches is tagged with a VMID
>> identifier in addition to the ASID. The VMIDs are managed using
>> a bitmap and assigned when creating the VM in kvm_arch_init_vm()
>> where the 2nd stage pgd is also allocated. The table is freed in
>> kvm_arch_destroy_vm(). Both functions are called from the main
>> KVM code.
>> 
>> 
>> +/**
>> + * kvm_arch_init_vm - initializes a VM data structure
>> + * @kvm:	pointer to the KVM struct
>> + */
>>  int kvm_arch_init_vm(struct kvm *kvm)
>>  {
>> -	return 0;
>> +	int ret = 0;
>> +	phys_addr_t pgd_phys;
>> +	unsigned long vmid;
>> +	unsigned long start, end;
>> +
>> +
>> +	mutex_lock(&kvm_vmids_mutex);
>> +	vmid = find_first_zero_bit(kvm_vmids, VMID_SIZE);
>> +	if (vmid >= VMID_SIZE) {
>> +		mutex_unlock(&kvm_vmids_mutex);
>> +		return -EBUSY;
>> +	}
>> +	__set_bit(vmid, kvm_vmids);
> 
> VMID_SIZE seems to be a bit low for comfort.  I guess it's fine for a 
> start, but later on we'll have to recycle VMIDs, like we do for SVM ASIDs.
> 
> Is there not a risk of a user starting 255 tiny guests and denying other 
> users the ability to use kvm?

Yes, there absolutely is, if that's a valid use case. I wanted something simple for now, but I completely agree that with ARM machines shipping with TBs of memory, we will need more VMIDs. I will incorporate it into the next patch series.
> 
>> +	kvm->arch.vmid = vmid;
>> +	mutex_unlock(&kvm_vmids_mutex);
>> +
>> +	ret = kvm_alloc_stage2_pgd(kvm);
>> +	if (ret)
>> +		goto out_fail_alloc;
>> +
>> +	pgd_phys = virt_to_phys(kvm->arch.pgd);
>> +	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1) & ~((2 << VTTBR_X) - 1);
>> +	kvm->arch.vttbr |= ((u64)vmid << 48);
>> +
>> +	start = (unsigned long)kvm,
>> +	end = start + sizeof(struct kvm);
>> +	ret = create_hyp_mappings(kvm_hyp_pgd, start, end);
>> +	if (ret)
>> +		goto out_fail_hyp_mappings;
>> +
>> +	return ret;
>> +out_fail_hyp_mappings:
>> +	remove_hyp_mappings(kvm_hyp_pgd, start, end);
>> +out_fail_alloc:
>> +	clear_bit(vmid, kvm_vmids);
>> +	return ret;
>>  }
>> 
> 
> -- 
> error compiling committee.c: too many arguments to function
> 



* Re: [PATCH v4 08/10] ARM: KVM: Handle guest faults in KVM
  2011-08-06 10:39 ` [PATCH v4 08/10] ARM: KVM: Handle guest faults in KVM Christoffer Dall
@ 2011-08-09 11:24   ` Avi Kivity
  2011-08-09 11:35     ` Christoffer Dall
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2011-08-09 11:24 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, catalin.marinas, tech, android-virt

On 08/06/2011 01:39 PM, Christoffer Dall wrote:
> Handles the guest faults in KVM by mapping in corresponding user pages
> in the 2nd stage page tables.
>
> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
> pgprot_guest variables used to map 2nd stage memory for KVM guests.
>
>
>
> +static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +			  gfn_t gfn, struct kvm_memory_slot *memslot)
> +{
> +	pfn_t pfn;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte, new_pte;
> +
> +	pfn = gfn_to_pfn(vcpu->kvm, gfn);
> +
> +	if (is_error_pfn(pfn)) {
> +		kvm_err(-EFAULT, "Guest gfn %u (0x%08lx) does not have "
> +				"corresponding host mapping",
> +				gfn, gfn << PAGE_SHIFT);
> +		return -EFAULT;
> +	}
> +
> +	/* Create 2nd stage page table mapping - Level 1 */
> +	pgd = vcpu->kvm->arch.pgd + pgd_index(fault_ipa);
> +	pud = pud_offset(pgd, fault_ipa);
> +	if (pud_none(*pud)) {
> +		pmd = pmd_alloc_one(NULL, fault_ipa);
> +		if (!pmd) {
> +			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pmd");
> +			return -ENOMEM;
> +		}
> +		pud_populate(NULL, pud, pmd);
> +		pmd += pmd_index(fault_ipa);

Don't we need locking here?  Another vcpu may have executed pud_populate 
concurrently.
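
One way to structure it - pre-allocate outside the lock, then populate
under the generic per-VM kvm->mmu_lock (a sketch, untested):

	pmd_t *new_pmd = pmd_alloc_one(NULL, fault_ipa);

	if (!new_pmd)
		return -ENOMEM;

	spin_lock(&vcpu->kvm->mmu_lock);
	pud = pud_offset(pgd, fault_ipa);
	if (pud_none(*pud)) {
		pud_populate(NULL, pud, new_pmd);
		new_pmd = NULL;			/* now owned by the table */
	}
	pmd = pmd_offset(pud, fault_ipa);
	spin_unlock(&vcpu->kvm->mmu_lock);

	if (new_pmd)
		pmd_free(NULL, new_pmd);	/* another vcpu won the race */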

> +	} else
> +		pmd = pmd_offset(pud, fault_ipa);
> +
> +	/* Create 2nd stage page table mapping - Level 2 */
> +	if (pmd_none(*pmd)) {
> +		pte = pte_alloc_one_kernel(NULL, fault_ipa);
> +		if (!pte) {
> +			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pte");
> +			return -ENOMEM;
> +		}
> +		pmd_populate_kernel(NULL, pmd, pte);
> +		pte += pte_index(fault_ipa);
> +	} else
> +		pte = pte_offset_kernel(pmd, fault_ipa);
> +
> +	/* Create 2nd stage page table mapping - Level 3 */
> +	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
> +	set_pte_ext(pte, new_pte, 0);
> +
> +	return 0;
> +}
> +
> +#define HSR_ABT_FS	(0x3f)
>

-- 
error compiling committee.c: too many arguments to function



* Re: [Android-virt] [PATCH v4 05/10] ARM: KVM: Inject IRQs and FIQs from userspace
  2011-08-09 10:07   ` Avi Kivity
@ 2011-08-09 11:27     ` Christoffer Dall
  2011-08-09 11:37       ` Avi Kivity
  0 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09 11:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, tech, android-virt, kvm


On Aug 9, 2011, at 12:07 PM, Avi Kivity wrote:

> On 08/06/2011 01:39 PM, Christoffer Dall wrote:
>> Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
>> This ioctl is used since the semantics are in fact two lines that can be
>> either raised or lowered on the VCPU - the IRQ and FIQ lines.
>> 
>> KVM needs to know which VCPU it must operate on and whether the FIQ or
>> IRQ line is raised/lowered. Hence both pieces of information are packed
>> in the kvm_irq_level->irq field. The irq field value will be:
>>   IRQ: vcpu_index * 2
>>   FIQ: (vcpu_index * 2) + 1
>> 
>> This is documented in Documentation/kvm/api.txt.
>> 
>> The effect of the ioctl is simply to raise/lower the
>> corresponding virt_irq field on the VCPU struct, which will cause the
>> world-switch code to raise/lower virtual interrupts when running the
>> guest on next switch. The wait_for_interrupt flag is also cleared for
>> raised IRQs causing an idle VCPU to become active again.
> 
> Note x86 starts out with a default configuration and allows updating it 
> via KVM_SET_GSI_ROUTING.  You may need this in the future if you decide 
> to implement an irq controller in the kernel.

Will probably happen some time. Noted.

> 
>> +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
>> +				      struct kvm_irq_level *irq_level)
>> +{
>> +	u32 mask;
>> +	unsigned int vcpu_idx;
>> +	struct kvm_vcpu *vcpu;
>> +
>> +	vcpu_idx = irq_level->irq / 2;
>> +	if (vcpu_idx >= KVM_MAX_VCPUS)
>> +		return -EINVAL;
>> +
>> +	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
>> +	if (!vcpu)
>> +		return -EINVAL;
>> +
>> +	switch (irq_level->irq % 2) {
>> +	case KVM_ARM_IRQ_LINE:
>> +		mask = HCR_VI;
>> +		break;
>> +	case KVM_ARM_FIQ_LINE:
>> +		mask = HCR_VF;
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	trace_kvm_irq_line(irq_level->irq % 2, irq_level->level, vcpu_idx);
> 
> Please reuse trace_kvm_set_irq().  You can decode vcpu/type in a 
> trace-cmd plugin.

OK

> 
>> +
>> +	if (irq_level->level) {
>> +		vcpu->arch.virt_irq |= mask;
>> +		vcpu->arch.wait_for_interrupts = 0;
>> +	} else
>> +		vcpu->arch.virt_irq &= ~mask;
>> +
> 
> This seems to be non-smp-safe?  Do you need atomic ops and barriers 
> here?  And a wakeup?

The whole thing is not SMP-tested yet, so I took some shortcuts. I only recently got hold of an SMP model, and SMP support will be a focus area for the next series. Thanks for pinpointing this, though.

> 
> Unlike KVM_INTERRUPT, KVM_IRQ_LINE is designed to be used asynchronously 
> wrt the vcpu.
> 
>> +	return 0;
>> +}
>> +
>>  long kvm_arch_vcpu_ioctl(struct file *filp,
>>  			 unsigned int ioctl, unsigned long arg)
>>  {
>> @@ -312,8 +349,21 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>  long kvm_arch_vm_ioctl(struct file *filp,
>>  		       unsigned int ioctl, unsigned long arg)
>>  {
>> -	printk(KERN_ERR "kvm_arch_vm_ioctl: Unsupported ioctl (%d)\n", ioctl);
>> -	return -EINVAL;
>> +	struct kvm *kvm = filp->private_data;
>> +	void __user *argp = (void __user *)arg;
>> +
>> +	switch (ioctl) {
>> +	case KVM_IRQ_LINE: {
>> +		struct kvm_irq_level irq_event;
>> +
>> +		if (copy_from_user(&irq_event, argp, sizeof irq_event))
>> +			return -EFAULT;
>> +		return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
>> +	}
>> +	default:
>> +		kvm_err(-EINVAL, "Unsupported ioctl (%d)", ioctl);
> 
> Please remove for the final code, we don't want a user spamming the 
> kernel log.

OK. Good point.

> 
>> +		return -EINVAL;
>> +	}
>>  }
>> 
>> 
> 
> -- 
> error compiling committee.c: too many arguments to function
> 



* Re: [PATCH v4 06/10] ARM: KVM: World-switch implementation
  2011-08-09 11:09   ` Avi Kivity
@ 2011-08-09 11:29     ` Christoffer Dall
  0 siblings, 0 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09 11:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, kvm, catalin.marinas, tech, android-virt


On Aug 9, 2011, at 1:09 PM, Avi Kivity wrote:

> On 08/06/2011 01:39 PM, Christoffer Dall wrote:
>> Provides complete world-switch implementation to switch to other guests
>> running in non-secure modes. Includes Hyp exception handlers that
>> capture the necessary exception information and store it on
>> the VCPU and KVM structures.
>> 
>> Switching to Hyp mode is done through a simple HVC instruction. The
>> exception vector code will check that the HVC comes from VMID==0 and if
>> so will store the necessary state on the Hyp stack, which will look like
>> this (see hyp_hvc):
>>   ...
>>   Hyp_Sp + 4: lr_usr
>>   Hyp_Sp    : spsr (Host-SVC cpsr)
>> 
>> When returning from Hyp mode to SVC mode, another HVC instruction is
>> executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
>> stack pointer should be where it was left from the above initial call,
>> since the values on the stack will be used to restore state (see
>> hyp_svc).
>> 
>> Otherwise, the world-switch is pretty straight-forward. All state that
>> can be modified by the guest is first backed up on the Hyp stack and the
>> VCPU values are loaded onto the hardware. State which is not loaded, but
>> is theoretically modifiable by the guest, is protected through the
>> virtualization features to generate a trap and cause software emulation.
>> Upon guest returns, all state is restored from hardware onto the VCPU
>> struct and the original state is restored from the Hyp-stack onto the
>> hardware.
>> 
>> One controversy may be the back-door call to __irq_svc (the host
>> kernel's own physical IRQ handler) which is called when a physical IRQ
>> exception is taken in Hyp mode while running in the guest.
>> 
>> 
>>  void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
>>  {
>> +	unsigned long start, end;
>> +
>>  	latest_vcpu = NULL;
>> -	KVMARM_NOT_IMPLEMENTED();
>> +
>> +	start = (unsigned long)vcpu,
>> +	end = start + sizeof(struct kvm_vcpu);
>> +	remove_hyp_mappings(kvm_hyp_pgd, start, end);
> 
> What if vcpu shares a page with another mapped structure?

Then we have a problem. I will think about this. Thanks.

> 
>> +
>> +	kmem_cache_free(kvm_vcpu_cache, vcpu);
>>  }
> 
>>  	return 0;
>>  }
>> 
>> +/**
>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>> + * @vcpu:	The VCPU pointer
>> + * @run:	The kvm_run structure pointer used for userspace state exchange
>> + *
>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>> + * will execute VM code in a loop until the time slice for the process is used
>> + * or some emulation is needed from user space in which case the function will
>> + * return with return value 0 and with the kvm_run structure filled in with the
>> + * required data for the requested emulation.
>> + */
>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -	KVMARM_NOT_IMPLEMENTED();
>> -	return -EINVAL;
>> +	unsigned long flags;
>> +	int ret;
>> +
>> +	for (;;) {
>> +		trace_kvm_entry(vcpu->arch.regs.pc);
>> +		debug_ws_enter(vcpu->arch.regs.pc);
> 
> why both trace_kvm and debug_ws?

Because tracepoints were a bit impractical to use for debugging some problems; this is actually not relevant anymore and I will clean it up for the next series. Sorry.

> 
>> +		kvm_guest_enter();
>> +
>> +		local_irq_save(flags);
> 
> local_irq_disable() is likely sufficient - the call path never changes.

good point.

> 
>> +		ret = __kvm_vcpu_run(vcpu);
>> +		local_irq_restore(flags);
>> +
>> +		kvm_guest_exit();
>> +		debug_ws_exit(vcpu->arch.regs.pc);
>> +		trace_kvm_exit(vcpu->arch.regs.pc);
>> +	}
>> +
>> +	return ret;
>>  }
>> 
>> 
> 
> -- 
> error compiling committee.c: too many arguments to function
> 



* Re: [PATCH v4 07/10] ARM: KVM: Emulation framework and CP15 emulation
  2011-08-09 11:17   ` Avi Kivity
@ 2011-08-09 11:34     ` Christoffer Dall
  2011-08-09 11:39       ` Avi Kivity
  0 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09 11:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, kvm, catalin.marinas, tech, android-virt


On Aug 9, 2011, at 1:17 PM, Avi Kivity wrote:

> On 08/06/2011 01:39 PM, Christoffer Dall wrote:
>> Adds a new important function in the main KVM/ARM code called
>> handle_exit() which is called from kvm_arch_vcpu_ioctl_run() on return
>> from guest execution. This function examines the Hyp-Syndrome-Register
>> (HSR), which contains information telling KVM what caused the exit from
>> the guest.
>> 
>> Some of the reasons for an exit are CP15 accesses, which are
>> not allowed from the guest, and this commit handles these exits by
>> emulating the intended operation in software and skipping the guest
>> instruction.
>> 
>> 
>>  /**
>>   * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>>   * @vcpu:	The VCPU pointer
>> @@ -339,6 +396,26 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  		kvm_guest_exit();
>>  		debug_ws_exit(vcpu->arch.regs.pc);
>>  		trace_kvm_exit(vcpu->arch.regs.pc);
>> +
>> +		ret = handle_exit(vcpu, run, ret);
>> +		if (ret) {
>> +			kvm_err(ret, "Error in handle_exit");
>> +			break;
>> +		}
>> +
>> +		if (run->exit_reason == KVM_EXIT_MMIO)
>> +			break;
>> +
>> +		if (need_resched()) {
>> +			vcpu_put(vcpu);
>> +			schedule();
>> +			vcpu_load(vcpu);
>> +		}
> 
> Preempt notifiers mean you don't need vcpu_put()/vcpu_load() - the scheduler will call kvm_arch_vcpu_put/load() automatically during context switch.

cool. thanks.

> 
>> +
>> +		if (signal_pending(current) && !(run->exit_reason)) {
>> +			run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
>> +			break;
>> +		}
>>  	}
> 
> You're supposed to return -EINTR on a signal.  run->exit_reason isn't defined in this case, but traditionally we return KVM_EXIT_INTR (which means host signal, not guest signal - yes it's confusing).

thanks for clearing that up.

> 
>> +
>> +/**
>> + * emulate_cp15_c15_access -- emulates cp15 accesses for CRn == 15
>> + * @vcpu: The VCPU pointer
>> + * @p:    The coprocessor parameters struct pointer holding trap inst. details
>> + *
>> + * The CP15 c15 register is implementation defined, but some guest kernels
>> + * attempt to read/write a diagnostics register here. We always return 0 and
>> + * ignore writes and hope for the best. This may need to be refined.
>> + */
>> +static int emulate_cp15_c15_access(struct kvm_vcpu *vcpu,
>> +				   struct coproc_params *p)
>> +{
>> +	trace_kvm_emulate_cp15_imp(p->Op1, p->Rt1, p->CRn, p->CRm,
>> +				   p->Op2, p->is_write);
> 
> _imp?

implementation defined co-processor 15 operations. Took me 10 minutes to dig out from memory, so, ok, this is not super informative or clear:) Will try to come up with something better or the right comment somewhere or something.

> 
>> +
>> +	if (!p->is_write)
>> +		*vcpu_reg(vcpu, p->Rt1) = 0;
>> +
>> +	return 0;
>> +}
>> +
>> 
> 
> -- 
> error compiling committee.c: too many arguments to function
> 



* Re: [PATCH v4 09/10] ARM: KVM: Handle I/O aborts
  2011-08-06 10:40 ` [PATCH v4 09/10] ARM: KVM: Handle I/O aborts Christoffer Dall
@ 2011-08-09 11:34   ` Avi Kivity
  2011-08-09 11:39     ` Christoffer Dall
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2011-08-09 11:34 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, catalin.marinas, tech, android-virt

On 08/06/2011 01:40 PM, Christoffer Dall wrote:
> When the guest accesses I/O memory this will create data abort
> exceptions and they are handled by decoding the HSR information
> (physical address, read/write, length, register) and forwarding reads
> and writes to QEMU which performs the device emulation.
>
> Certain classes of load/store operations do not support the syndrome
> information provided in the HSR and we therefore must be able to fetch
> the offending instruction from guest memory and decode it manually.
>
> This requires changing the general flow somewhat since new calls to run
> the VCPU must check if there's a pending MMIO load and perform the write
> after userspace has made the data available.

We need to move this to arch independent code.  Outside the scope of 
these patches, of course.

>   /******************************************************************************
> - * Co-processor emulation
> + * Utility functions common for all emulation code
> + *****************************************************************************/
> +
> +/*
> + * This one accepts a matrix where the first element is the
> + * bits as they must be, and the second element is the bitmask.
>    */
> +#define INSTR_NONE	-1
> +static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
> +{
> +	int i;
> +	u32 mask;
> +
> +	for (i = 0; i < table_entries; i++) {
> +		mask = table[i][1];
> +		if ((table[i][0] & mask) == (instr & mask))
> +			return i;
> +	}
> +	return INSTR_NONE;
> +}

Seems somewhat inefficient to do this for insn emulation.  Is there not 
a common prefix that can be used to determine the mask?

> +
> +/*
> + * Must be ordered with LOADS first and WRITES afterwards
> + * for easy distinction when doing MMIO.
> + */
> +#define NUM_LD_INSTR  9
> +enum INSTR_LS_INDEXES {
> +	INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
> +	INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
> +	INSTR_LS_LDRSH,
> +	INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
> +	INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
> +	NUM_LS_INSTR
> +};
> +
> +static u32 ls_instr[NUM_LS_INSTR][2] = {
> +	{0x04700000, 0x0d700000}, /* LDRBT */
> +	{0x04300000, 0x0d700000}, /* LDRT  */
> +	{0x04100000, 0x0c500000}, /* LDR   */
> +	{0x04500000, 0x0c500000}, /* LDRB  */
> +	{0x000000d0, 0x0e1000f0}, /* LDRD  */
> +	{0x01900090, 0x0ff000f0}, /* LDREX */
> +	{0x001000b0, 0x0e1000f0}, /* LDRH  */
> +	{0x001000d0, 0x0e1000f0}, /* LDRSB */
> +	{0x001000f0, 0x0e1000f0}, /* LDRSH */
> +	{0x04600000, 0x0d700000}, /* STRBT */
> +	{0x04200000, 0x0d700000}, /* STRT  */
> +	{0x04000000, 0x0c500000}, /* STR   */
> +	{0x04400000, 0x0c500000}, /* STRB  */
> +	{0x000000f0, 0x0e1000f0}, /* STRD  */
> +	{0x01800090, 0x0ff000f0}, /* STREX */
> +	{0x000000b0, 0x0e1000f0}  /* STRH  */
> +};
> +

Okay, maybe not.  But surely there's some clever arithmetic the cpu uses 
to decode this.
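
At least the loads-first ordering gives the MMIO path a cheap direction
test; presumably something like (sketch):

	int idx;
	bool is_load;

	idx = kvm_instr_index(instr, ls_instr, NUM_LS_INSTR);
	if (idx == INSTR_NONE)
		return -EINVAL;			/* could not decode */
	is_load = (idx < NUM_LD_INSTR);		/* loads precede stores */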

> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index 381ea4a..4f20d75 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -39,6 +39,21 @@ TRACE_EVENT(kvm_exit,
>   	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
>   );
>
> +TRACE_EVENT(kvm_mmio_emulate,
> +	TP_PROTO(unsigned long vcpu_pc),
> +	TP_ARGS(vcpu_pc),

Please add the instruction bytes and any other information needed to 
decode the opcode (e.g. thumb mode).  For x86 we have a trace-cmd plugin 
that disassembles guest instructions into the trace; it's very useful.
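
Something along these lines, say - the exact field set is only a
suggestion:

TRACE_EVENT(kvm_mmio_emulate,
	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
		 unsigned long cpsr),
	TP_ARGS(vcpu_pc, instr, cpsr),

	TP_STRUCT__entry(
		__field(	unsigned long,	vcpu_pc	)
		__field(	unsigned long,	instr	)
		__field(	unsigned long,	cpsr	)
	),

	TP_fast_assign(
		__entry->vcpu_pc	= vcpu_pc;
		__entry->instr		= instr;
		__entry->cpsr		= cpsr;	/* lets a plugin spot Thumb mode */
	),

	TP_printk("Emulate MMIO at: 0x%08lx (instr: 0x%08lx, cpsr: 0x%08lx)",
		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
);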

> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	vcpu_pc		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu_pc		= vcpu_pc;
> +	),
> +
> +	TP_printk("Emulate MMIO at: 0x%08lx", __entry->vcpu_pc)
> +);
> +
>   TRACE_EVENT(kvm_emulate_cp15_imp,
>   	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
>   		 unsigned long CRm, unsigned long Op2, bool is_write),
>
>

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v4 08/10] ARM: KVM: Handle guest faults in KVM
  2011-08-09 11:24   ` Avi Kivity
@ 2011-08-09 11:35     ` Christoffer Dall
  0 siblings, 0 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09 11:35 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, kvm, catalin.marinas, tech, android-virt


On Aug 9, 2011, at 1:24 PM, Avi Kivity wrote:

> On 08/06/2011 01:39 PM, Christoffer Dall wrote:
>> Handles the guest faults in KVM by mapping in corresponding user pages
>> in the 2nd stage page tables.
>> 
>> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
>> pgprot_guest variables used to map 2nd stage memory for KVM guests.
>> 
>> 
>> 
>> +static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> +			  gfn_t gfn, struct kvm_memory_slot *memslot)
>> +{
>> +	pfn_t pfn;
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte, new_pte;
>> +
>> +	pfn = gfn_to_pfn(vcpu->kvm, gfn);
>> +
>> +	if (is_error_pfn(pfn)) {
>> +		kvm_err(-EFAULT, "Guest gfn %u (0x%08lx) does not have "
>> +				"corresponding host mapping",
>> +				gfn, gfn << PAGE_SHIFT);
>> +		return -EFAULT;
>> +	}
>> +
>> +	/* Create 2nd stage page table mapping - Level 1 */
>> +	pgd = vcpu->kvm->arch.pgd + pgd_index(fault_ipa);
>> +	pud = pud_offset(pgd, fault_ipa);
>> +	if (pud_none(*pud)) {
>> +		pmd = pmd_alloc_one(NULL, fault_ipa);
>> +		if (!pmd) {
>> +			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pmd");
>> +			return -ENOMEM;
>> +		}
>> +		pud_populate(NULL, pud, pmd);
>> +		pmd += pmd_index(fault_ipa);
> 
> Don't we need locking here?  Another vcpu may have executed pud_populate concurrently.

Absolutely, but there is no SMP support yet and only a single VCPU is supported. Again, this is a focus area for the next patch series; thanks for pinpointing it.

> 
>> +	} else
>> +		pmd = pmd_offset(pud, fault_ipa);
>> +
>> +	/* Create 2nd stage page table mapping - Level 2 */
>> +	if (pmd_none(*pmd)) {
>> +		pte = pte_alloc_one_kernel(NULL, fault_ipa);
>> +		if (!pte) {
>> +			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pte");
>> +			return -ENOMEM;
>> +		}
>> +		pmd_populate_kernel(NULL, pmd, pte);
>> +		pte += pte_index(fault_ipa);
>> +	} else
>> +		pte = pte_offset_kernel(pmd, fault_ipa);
>> +
>> +	/* Create 2nd stage page table mapping - Level 3 */
>> +	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
>> +	set_pte_ext(pte, new_pte, 0);
>> +
>> +	return 0;
>> +}
>> +
>> +#define HSR_ABT_FS	(0x3f)
>> 
> 
> -- 
> error compiling committee.c: too many arguments to function
> 



* Re: [Android-virt] [PATCH v4 05/10] ARM: KVM: Inject IRQs and FIQs from userspace
  2011-08-09 11:27     ` [Android-virt] " Christoffer Dall
@ 2011-08-09 11:37       ` Avi Kivity
  2011-08-09 11:40         ` Christoffer Dall
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2011-08-09 11:37 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Christoffer Dall, tech, android-virt, kvm

On 08/09/2011 02:27 PM, Christoffer Dall wrote:
> >
> >>  +
> >>  +	if (irq_level->level) {
> >>  +		vcpu->arch.virt_irq |= mask;
> >>  +		vcpu->arch.wait_for_interrupts = 0;
> >>  +	} else
> >>  +		vcpu->arch.virt_irq &= ~mask;
> >>  +
> >
> >  This seems to be non-smp-safe?  Do you need atomic ops and barriers
> >  here?  And a wakeup?
>
> The whole thing is not SMP-tested yet, so I took some shortcuts. I only recently got hold of an SMP model, and SMP support will be a focus area for the next series. Thanks for pinpointing this, though.

Note even a single vcpu guest on an smp host needs this.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v4 07/10] ARM: KVM: Emulation framework and CP15 emulation
  2011-08-09 11:34     ` Christoffer Dall
@ 2011-08-09 11:39       ` Avi Kivity
  2011-08-09 11:40         ` Christoffer Dall
  0 siblings, 1 reply; 34+ messages in thread
From: Avi Kivity @ 2011-08-09 11:39 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, kvm, catalin.marinas, tech, android-virt

On 08/09/2011 02:34 PM, Christoffer Dall wrote:
> >
> >>  +
> >>  +/**
> >>  + * emulate_cp15_c15_access -- emulates cp15 accesses for CRn == 15
> >>  + * @vcpu: The VCPU pointer
> >>  + * @p:    The coprocessor parameters struct pointer holding trap inst. details
> >>  + *
> >>  + * The CP15 c15 register is implementation defined, but some guest kernels
> >>  + * attempt to read/write a diagnostics register here. We always return 0 and
> >>  + * ignore writes and hope for the best. This may need to be refined.
> >>  + */
> >>  +static int emulate_cp15_c15_access(struct kvm_vcpu *vcpu,
> >>  +				   struct coproc_params *p)
> >>  +{
> >>  +	trace_kvm_emulate_cp15_imp(p->Op1, p->Rt1, p->CRn, p->CRm,
> >>  +				   p->Op2, p->is_write);
> >
> >  _imp?
>
> implementation defined co-processor 15 operations. Took me 10 minutes to dig out from memory, so, ok, this is not super informative or clear:) Will try to come up with something better or the right comment somewhere or something.
>

Ah, okay.  It's not related to the kvm implementation, it's 
architecturally implementation defined.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v4 09/10] ARM: KVM: Handle I/O aborts
  2011-08-09 11:34   ` Avi Kivity
@ 2011-08-09 11:39     ` Christoffer Dall
  2011-08-09 11:46       ` Avi Kivity
  0 siblings, 1 reply; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09 11:39 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, kvm, catalin.marinas, tech, android-virt


On Aug 9, 2011, at 1:34 PM, Avi Kivity wrote:

> On 08/06/2011 01:40 PM, Christoffer Dall wrote:
>> When the guest accesses I/O memory this will create data abort
>> exceptions and they are handled by decoding the HSR information
>> (physical address, read/write, length, register) and forwarding reads
>> and writes to QEMU which performs the device emulation.
>> 
>> Certain classes of load/store operations do not support the syndrome
>> information provided in the HSR and we therefore must be able to fetch
>> the offending instruction from guest memory and decode it manually.
>> 
>> This requires changing the general flow somewhat since new calls to run
>> the VCPU must check if there's a pending MMIO load and perform the write
>> after userspace has made the data available.
> 
> We need to move this to arch independent code.  Outside the scope of these patches, of course.

OK, let me know what I can do to make this fit with the ARM implementation nicely.

> 
>>  /******************************************************************************
>> - * Co-processor emulation
>> + * Utility functions common for all emulation code
>> + *****************************************************************************/
>> +
>> +/*
>> + * This one accepts a matrix where the first element is the
>> + * bits as they must be, and the second element is the bitmask.
>>   */
>> +#define INSTR_NONE	-1
>> +static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
>> +{
>> +	int i;
>> +	u32 mask;
>> +
>> +	for (i = 0; i < table_entries; i++) {
>> +		mask = table[i][1];
>> +		if ((table[i][0] & mask) == (instr & mask))
>> +			return i;
>> +	}
>> +	return INSTR_NONE;
>> +}
> 
> Seems somewhat inefficient to do this for insn emulation.  Is there not a common prefix that can be used to determine the mask?

hehe, not so much.

> 
>> +
>> +/*
>> + * Must be ordered with LOADS first and WRITES afterwards
>> + * for easy distinction when doing MMIO.
>> + */
>> +#define NUM_LD_INSTR  9
>> +enum INSTR_LS_INDEXES {
>> +	INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
>> +	INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
>> +	INSTR_LS_LDRSH,
>> +	INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
>> +	INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
>> +	NUM_LS_INSTR
>> +};
>> +
>> +static u32 ls_instr[NUM_LS_INSTR][2] = {
>> +	{0x04700000, 0x0d700000}, /* LDRBT */
>> +	{0x04300000, 0x0d700000}, /* LDRT  */
>> +	{0x04100000, 0x0c500000}, /* LDR   */
>> +	{0x04500000, 0x0c500000}, /* LDRB  */
>> +	{0x000000d0, 0x0e1000f0}, /* LDRD  */
>> +	{0x01900090, 0x0ff000f0}, /* LDREX */
>> +	{0x001000b0, 0x0e1000f0}, /* LDRH  */
>> +	{0x001000d0, 0x0e1000f0}, /* LDRSB */
>> +	{0x001000f0, 0x0e1000f0}, /* LDRSH */
>> +	{0x04600000, 0x0d700000}, /* STRBT */
>> +	{0x04200000, 0x0d700000}, /* STRT  */
>> +	{0x04000000, 0x0c500000}, /* STR   */
>> +	{0x04400000, 0x0c500000}, /* STRB  */
>> +	{0x000000f0, 0x0e1000f0}, /* STRD  */
>> +	{0x01800090, 0x0ff000f0}, /* STREX */
>> +	{0x000000b0, 0x0e1000f0}  /* STRH  */
>> +};
>> +
> 
> Okay, maybe not.  But surely there's some clever arithmetic the cpu uses to decode this.

Probably, but this is only used in the rare case where the virt. extensions don't provide the fault information. I highly doubt that this is in any critical path for any sane guest OS, though one could surely write a VM that would run very slowly. I would prefer not to spend time on this right now and to get back to it once we have all sorts of other features in place. Or, what do you think?

> 
>> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
>> index 381ea4a..4f20d75 100644
>> --- a/arch/arm/kvm/trace.h
>> +++ b/arch/arm/kvm/trace.h
>> @@ -39,6 +39,21 @@ TRACE_EVENT(kvm_exit,
>>  	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
>>  );
>> 
>> +TRACE_EVENT(kvm_mmio_emulate,
>> +	TP_PROTO(unsigned long vcpu_pc),
>> +	TP_ARGS(vcpu_pc),
> 
> Please add the instruction bytes and any other information needed to decode the opcode (e.g. thumb mode).  Forx86 we have a trace-cmd plugin that disassembles guest instructions into the trace; it's very useful.

that's a good idea. I will look into it.

> 
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	unsigned long,	vcpu_pc		)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->vcpu_pc		= vcpu_pc;
>> +	),
>> +
>> +	TP_printk("Emulate MMIO at: 0x%08lx", __entry->vcpu_pc)
>> +);
>> +
>>  TRACE_EVENT(kvm_emulate_cp15_imp,
>>  	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
>>  		 unsigned long CRm, unsigned long Op2, bool is_write),
>> 
>> 
> 
> -- 
> error compiling committee.c: too many arguments to function
> 



* Re: [Android-virt] [PATCH v4 05/10] ARM: KVM: Inject IRQs and FIQs from userspace
  2011-08-09 11:37       ` Avi Kivity
@ 2011-08-09 11:40         ` Christoffer Dall
  0 siblings, 0 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09 11:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, tech, android-virt, kvm


On Aug 9, 2011, at 1:37 PM, Avi Kivity wrote:

> On 08/09/2011 02:27 PM, Christoffer Dall wrote:
>> >
>> >>  +
>> >>  +	if (irq_level->level) {
>> >>  +		vcpu->arch.virt_irq |= mask;
>> >>  +		vcpu->arch.wait_for_interrupts = 0;
>> >>  +	} else
>> >>  +		vcpu->arch.virt_irq &= ~mask;
>> >>  +
>> >
>> >  This seems to be non-smp-safe?  Do you need atomic ops and barriers
>> >  here?  And a wakeup?
>> 
>> The whole thing is not SMP-tested yet, so I took some shortcuts. I only recently got hold of an SMP model, and SMP support will be a focus area for the next series. Thanks for pinpointing this, though.
> 
> Note even a single vcpu guest on an smp host needs this.

yep, I am aware. It's on my to-do list. Thanks.

> 
> -- 
> error compiling committee.c: too many arguments to function
> 



* Re: [PATCH v4 07/10] ARM: KVM: Emulation framework and CP15 emulation
  2011-08-09 11:39       ` Avi Kivity
@ 2011-08-09 11:40         ` Christoffer Dall
  0 siblings, 0 replies; 34+ messages in thread
From: Christoffer Dall @ 2011-08-09 11:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, kvm, catalin.marinas, tech, android-virt


On Aug 9, 2011, at 1:39 PM, Avi Kivity wrote:

> On 08/09/2011 02:34 PM, Christoffer Dall wrote:
>> >
>> >>  +
>> >>  +/**
>> >>  + * emulate_cp15_c15_access -- emulates cp15 accesses for CRn == 15
>> >>  + * @vcpu: The VCPU pointer
>> >>  + * @p:    The coprocessor parameters struct pointer holding trap inst. details
>> >>  + *
>> >>  + * The CP15 c15 register is implementation defined, but some guest kernels
>> >>  + * attempt to read/write a diagnostics register here. We always return 0 and
>> >>  + * ignore writes and hope for the best. This may need to be refined.
>> >>  + */
>> >>  +static int emulate_cp15_c15_access(struct kvm_vcpu *vcpu,
>> >>  +				   struct coproc_params *p)
>> >>  +{
>> >>  +	trace_kvm_emulate_cp15_imp(p->Op1, p->Rt1, p->CRn, p->CRm,
>> >>  +				   p->Op2, p->is_write);
>> >
>> >  _imp?
>> 
>> implementation defined co-processor 15 operations. Took me 10 minutes to dig out from memory, so, ok, this is not super informative or clear:) Will try to come up with something better or the right comment somewhere or something.
>> 
> 
> Ah, okay.  It's not related to the kvm implementation, it's architecturally implementation defined.

exactly.

> 
> -- 
> error compiling committee.c: too many arguments to function
> 



* Re: [PATCH v4 00/10] KVM/ARM Implementation
  2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
                   ` (9 preceding siblings ...)
  2011-08-06 10:40 ` [PATCH v4 10/10] ARM: KVM: Guest wait-for-interrupts (WFI) support Christoffer Dall
@ 2011-08-09 11:43 ` Avi Kivity
  10 siblings, 0 replies; 34+ messages in thread
From: Avi Kivity @ 2011-08-09 11:43 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, catalin.marinas, tech, android-virt

On 08/06/2011 01:38 PM, Christoffer Dall wrote:
> The following series implements KVM support for ARM processors,
> specifically on the Cortex A-15 platform.
>
> The patch series applies to the arm-lpae branch of ARM Ltd's kernel
> tree. This is Version 4 of the patch series, but the first two versions
> were reviewed outside of the KVM mailing list. Changes can also be
> pulled from:
>    git://git.ncl.cs.columbia.edu/pub/git/linux-kvm-arm kvm-a15-v4
>
> The implementation is broken up into a logical set of patches, the first
> one containing a skeleton of files, makefile changes, the basic user
> space interface and KVM architecture specific stubs.  Subsequent patches
> implement parts of the system as listed:
>   1.  Skeleton
>   2.  Identity Mapping for Hyp mode
>   3.  Hypervisor initialization
>   4.  Hyp mode memory mappings and 2nd stage preparation
>   5.  World-switch implementation and Hyp exception vectors
>   6.  Emulation framework and CP15 emulation
>   7.  Handle guest user memory aborts
>   8.  Handle guest MMIO aborts
>   9.  Handle userspace IRQ/FIQ injection
>   10. Support guest wait-for-interrupt instructions.
>
> Testing:
> Limited testing, but we have run GCC inside a guest, which compiled a small
> hello-world program that was then run successfully. Hardware is still
> unavailable, so all testing has been done on ARM Fast Models.
>
> For a guide on how to set up a testing environment and try out these
> patches, see:
>    http://wiki.ncl.cs.columbia.edu/wiki/KVMARM:Guides:Development_Environment

Pretty nice, and once again I congratulate you for not having to 
virtualize x86.

I don't know how close you feel you are to merging, but from my point of 
view things are looking good.  We'll need to coordinate trees and acks 
since this is touching more than just arch/arm/kvm.


>   arch/arm/kvm/arm.c                          |  701 +++++++++++++++++++++++++++
>   arch/arm/kvm/arm_emulate.c                  |  604 +++++++++++++++++++++++
>   arch/arm/kvm/arm_exports.c                  |   26 +
>   arch/arm/kvm/arm_guest.c                    |  150 ++++++
>   arch/arm/kvm/arm_init.S                     |  115 ++++
>   arch/arm/kvm/arm_interrupts.S               |  488 +++++++++++++++++++
>   arch/arm/kvm/arm_mmu.c                      |  549 +++++++++++++++++++++

Suggest eliminating the arm_ prefixes.


-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH v4 09/10] ARM: KVM: Handle I/O aborts
  2011-08-09 11:39     ` Christoffer Dall
@ 2011-08-09 11:46       ` Avi Kivity
  0 siblings, 0 replies; 34+ messages in thread
From: Avi Kivity @ 2011-08-09 11:46 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, kvm, catalin.marinas, tech, android-virt

On 08/09/2011 02:39 PM, Christoffer Dall wrote:
> >>  +
> >>  +static u32 ls_instr[NUM_LS_INSTR][2] = {
> >>  +	{0x04700000, 0x0d700000}, /* LDRBT */
> >>  +	{0x04300000, 0x0d700000}, /* LDRT  */
> >>  +	{0x04100000, 0x0c500000}, /* LDR   */
> >>  +	{0x04500000, 0x0c500000}, /* LDRB  */
> >>  +	{0x000000d0, 0x0e1000f0}, /* LDRD  */
> >>  +	{0x01900090, 0x0ff000f0}, /* LDREX */
> >>  +	{0x001000b0, 0x0e1000f0}, /* LDRH  */
> >>  +	{0x001000d0, 0x0e1000f0}, /* LDRSB */
> >>  +	{0x001000f0, 0x0e1000f0}, /* LDRSH */
> >>  +	{0x04600000, 0x0d700000}, /* STRBT */
> >>  +	{0x04200000, 0x0d700000}, /* STRT  */
> >>  +	{0x04000000, 0x0c500000}, /* STR   */
> >>  +	{0x04400000, 0x0c500000}, /* STRB  */
> >>  +	{0x000000f0, 0x0e1000f0}, /* STRD  */
> >>  +	{0x01800090, 0x0ff000f0}, /* STREX */
> >>  +	{0x000000b0, 0x0e1000f0}  /* STRH  */
> >>  +};
> >>  +
> >
> >  Okay, maybe not.  But surely there's some clever arithmetic the cpu uses to decode this.
>
> Probably, but this is only used in the rare case where the virt. extensions don't provide the fault information. I highly doubt that this is in any critical path for any sane guest OS, though one could surely write a VM that would run very slowly. I would prefer not to spend time on this right now and to get back to it once we have all sorts of other features in place. Or, what do you think?

It's the ordinary case of premature optimization that afflicts even the 
best of us.  Best to keep it simple.

-- 
error compiling committee.c: too many arguments to function



end of thread

Thread overview: 34+ messages
2011-08-06 10:38 [PATCH v4 00/10] KVM/ARM Implementation Christoffer Dall
2011-08-06 10:39 ` [PATCH v4 01/10] ARM: KVM: Initial skeleton to compile KVM support Christoffer Dall
2011-08-06 10:39 ` [PATCH v4 02/10] ARM: KVM: Hypervisor identity mapping Christoffer Dall
2011-08-09  9:20   ` Avi Kivity
2011-08-09  9:29     ` Catalin Marinas
2011-08-09  9:29     ` Christoffer Dall
2011-08-09 10:23       ` [Android-virt] " Alexey Smirnov
2011-08-09 11:23         ` Christoffer Dall
2011-08-06 10:39 ` [PATCH v4 03/10] ARM: KVM: Add hypervisor inititalization Christoffer Dall
2011-08-06 10:39 ` [PATCH v4 04/10] ARM: KVM: Memory virtualization setup Christoffer Dall
2011-08-09  9:57   ` Avi Kivity
2011-08-09 11:24     ` [Android-virt] " Christoffer Dall
2011-08-06 10:39 ` [PATCH v4 05/10] ARM: KVM: Inject IRQs and FIQs from userspace Christoffer Dall
2011-08-09 10:07   ` Avi Kivity
2011-08-09 11:27     ` [Android-virt] " Christoffer Dall
2011-08-09 11:37       ` Avi Kivity
2011-08-09 11:40         ` Christoffer Dall
2011-08-06 10:39 ` [PATCH v4 06/10] ARM: KVM: World-switch implementation Christoffer Dall
2011-08-09 11:09   ` Avi Kivity
2011-08-09 11:29     ` Christoffer Dall
2011-08-06 10:39 ` [PATCH v4 07/10] ARM: KVM: Emulation framework and CP15 emulation Christoffer Dall
2011-08-09 11:17   ` Avi Kivity
2011-08-09 11:34     ` Christoffer Dall
2011-08-09 11:39       ` Avi Kivity
2011-08-09 11:40         ` Christoffer Dall
2011-08-06 10:39 ` [PATCH v4 08/10] ARM: KVM: Handle guest faults in KVM Christoffer Dall
2011-08-09 11:24   ` Avi Kivity
2011-08-09 11:35     ` Christoffer Dall
2011-08-06 10:40 ` [PATCH v4 09/10] ARM: KVM: Handle I/O aborts Christoffer Dall
2011-08-09 11:34   ` Avi Kivity
2011-08-09 11:39     ` Christoffer Dall
2011-08-09 11:46       ` Avi Kivity
2011-08-06 10:40 ` [PATCH v4 10/10] ARM: KVM: Guest wait-for-interrupts (WFI) support Christoffer Dall
2011-08-09 11:43 ` [PATCH v4 00/10] KVM/ARM Implementation Avi Kivity
