* [PATCH v10 00/14] KVM/ARM Implementation
@ 2012-08-16 15:27 Christoffer Dall
  2012-08-16 15:28 ` [PATCH v10 01/14] ARM: add mem_type prot_pte accessor Christoffer Dall
                   ` (13 more replies)
  0 siblings, 14 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:27 UTC (permalink / raw)
  To: kvmarm, kvm

The following series implements KVM support for ARM processors,
specifically on the Cortex-A15 platform.  The work is done in
collaboration between Columbia University, Virtual Open Systems and
ARM/Linaro.

The patch series applies to kvm/next, specifically commit:
 dbcb4e798072d114fe68813f39a9efd239ab99c0

This is Version 10 of the patch series, but the first two versions
were reviewed outside of the KVM mailing list. Changes can also be
pulled from:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v10

A non-flattened edition of the patch series can be found at:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v10-stage

WARNING: This patch series breaks compatibility with the QEMU that
worked with kvm-a15-v9, due to the new reset and set-target API.  Please
use the latest Linaro master branch (or the mirror from here):
 git://github.com/virtualopensystems/qemu.git kvm-a15-v10
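
For reference, the set-target API introduced in this version requires
user space to select the emulated core (and thereby reset the VCPU)
before it can run.  Below is a minimal sketch of what a user space
caller might look like; vcpu_fd is assumed to come from the usual
KVM_CREATE_VCPU call, and everything else is standard KVM plumbing:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <err.h>
  #include <linux/kvm.h>

  static void set_target_a15(int vcpu_fd)
  {
          struct kvm_vcpu_init init;

          memset(&init, 0, sizeof(init));
          init.target = KVM_ARM_TARGET_CORTEX_A15; /* only A15 for now */

          /* Selects the target core and resets the VCPU in-kernel. */
          if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) < 0)
                  err(1, "KVM_ARM_VCPU_INIT");
  }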

The implementation is broken up into a logical set of patches; the
first four are preparatory patches:
  1. ARM: Add mem_type prot_pte accessor
  2. ARM: ARM_VIRT_EXT config option
  3. ARM: Section based HYP idmaps
  4. ARM: Expose PMNC bitfields for KVM use

KVM guys, please consider pulling the KVM generic patches as early as
possible. Thanks.

The main implementation is broken up into separate patches, the first
containing a skeleton of files, Makefile changes, the basic user space
interface and KVM architecture-specific stubs.  Subsequent patches
implement parts of the system as listed:
  5. Skeleton and reset hooks
  6. Hypervisor initialization
  7. Memory virtualization setup (hyp mode mappings and 2nd stage)
  8. Inject IRQs and FIQs from userspace
  9. World-switch implementation and Hyp exception vectors
 10. Emulation framework and coproc emulation
 11. Coproc user space API
 12. Handle guest user memory aborts
 13. Handle guest MMIO aborts
 14. Support guest wait-for-interrupt instructions

Testing:
Testing has been limited, but GCC has been run inside a guest, where it
compiled a small hello-world program that then ran successfully.  For
v10, both ARM and Thumb-2 kernels were tested as both host and guest,
and both a compiled-in version and a kernel-module version of KVM were
tested.  Hardware is still unavailable to me, so all testing has been
done on ARM Fast Models.

For a guide on how to set up a testing environment and try out these
patches, see:
 http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf


Additionally a few major milestones are coming up shortly:
 - Support Thumb MMIO emulation and test MMIO emulation code (under way)
 - Merge Marc Zyngier's patch series for VGIC and timers (review in
   progress)
 - Change from SMC-based install to relying on booting the kernel in Hyp
   mode (review of patches from Marc Zyngier underway)

Changes since v9:
 - Addressed reviewer comments (see mailing list archive)
 - Limit the use of .arch_extension sec/virt to compilers that need them
 - VFP/Neon Support (Antonios Motakis)
 - Run exit handling under preemption and still handle guest cache ops
 - Add support for IO mapping at Hyp level (VGIC prep)
 - Add support for IO mapping at Guest level (VGIC prep)
 - Remove backdoor call to irq_svc
 - Complete rework of CP15 handling and register reset (Rusty Russell)
 - Don't use HSTR for anything other than CR 15
 - New ioctl to set emulation target core (only A15 supported for now)
 - Support KVM_GET_MSRS / KVM_SET_MSRS
 - Add page accounting and page table eviction
 - Change pgd lock to spinlock and fix sleeping in atomic bugs
 - Check kvm_condition_valid for HVC traps of undefs
 - Added a naive implementation of kvm_unmap_hva_range

Changes since v8:
 - Support cache maintenance on SMP through set/way
 - Hyp mode idmaps are now section based and happen at kernel init
 - Handle aborts in Hyp mode
 - Inject undefined exceptions into the guest on error
 - Kernel-side reset of all crucial registers
 - Specifically state which target CPU is being virtualized
 - Exit statistics in debugfs
 - Some L2CTLR cp15 emulation cleanups
 - Support spte_hva for MMU notifiers and take write faults
 - FIX: Race condition in VMID generation
 - BUG: Run exit handling code with disabled preemption
 - Save/Restore abort fault register during world switch

Changes since v7:
 - Trap accesses to ACTLR
 - Do not trap WFE execution
 - Upgrade barriers and TLB operations to inner-shareable domain
 - Restructure hyp_pgd related code to be more opaque
 - Random SMP fixes
 - Random BUG fixes
 - Improve commenting
 - Support module loading/unloading of KVM/ARM
 - Thumb-2 support for host kernel and KVM
 - Unaligned cross-page wide guest Thumb instruction fetching
 - Support ITSTATE fields in CPSR for Thumb guests
 - Document HCR settings

Changes since v6:
 - Support for MMU notifiers to not pin user pages in memory
 - Support building with log debugging
 - Bugfix: v6 clobbered r7 in init code
 - Simplify hyp code mapping
 - Cleanup of register access code
 - Table-based CP15 emulation from Rusty Russell
 - Various other bug fixes and cleanups

Changes since v5:
 - General bugfixes and nit fixes from reviews
 - Implemented re-use of VMIDs
 - Cleaned up the Hyp-mapping code to be readable by non-mm hackers
   (including myself)
 - Integrated preliminary SMP support in base patches
 - Lock-less interrupt injection and WFI support
 - Fixed signal handling while in guest (increases overall stability)

Changes since v4:
 - Addressed reviewer comments from v4
    * cleanup debug and trace code
    * remove printks
    * fixup kvm_arch_vcpu_ioctl_run
    * add trace details to mmio emulation
 - Fix from Marc Zyngier: Move kvm_guest_enter/exit into non-preemptible
   section (squashed into world-switch patch)
 - Cleanup create_hyp_mappings/remove_hyp_mappings from Marc Zyngier
   (squashed into hypervisor initialization patch)
 - Removed the remove_hyp_mappings feature. Removing hypervisor mappings
   could potentially unmap other important data shared in the same page.
 - Removed the arm_ prefix from the arch-specific files.
 - Initial SMP host/guest support

Changes since v3:
 - v4 actually works, fully boots a guest
 - Support compiling as a module
 - Use static inlines instead of macros for vcpu_reg and friends
 - Optimize kvm_vcpu_reg function
 - Use Ftrace for trace capabilities
 - Updated documentation and commenting
 - Use KVM_IRQ_LINE instead of KVM_INTERRUPT
 - Emulates load/store instructions not supported through HSR
   syndrome information.
 - Frees 2nd stage translation tables on VM teardown
 - Handles IRQ/FIQ instructions
 - Handles more CP15 accesses
 - Support guest WFI calls
 - Uses debugfs instead of /proc
 - Support compiling in Thumb mode

Changes since v2:
 - Performs world-switch code
 - Maps guest memory using 2nd stage translation
 - Emulates co-processor 15 instructions
 - Forwards I/O faults to QEMU.

---

Christoffer Dall (12):
      ARM: Add config option ARM_VIRT_EXT
      ARM: Section based HYP idmap
      KVM: ARM: Initial skeleton to compile KVM support
      KVM: ARM: Hypervisor initialization
      KVM: ARM: Memory virtualization setup
      KVM: ARM: Inject IRQs and FIQs from userspace
      KVM: ARM: World-switch implementation
      KVM: ARM: Emulation framework and CP15 emulation
      KVM: ARM: User space API for getting/setting co-proc registers
      KVM: ARM: Handle guest faults in KVM
      KVM: ARM: Handle I/O aborts
      KVM: ARM: Guest wait-for-interrupts (WFI) support

Marc Zyngier (1):
      ARM: add mem_type prot_pte accessor

Rusty Russell (1):
      ARM: Expose PMNC bitfields for KVM use


 Documentation/virtual/kvm/api.txt           |   85 ++
 arch/arm/Kconfig                            |    2 
 arch/arm/Makefile                           |    1 
 arch/arm/include/asm/idmap.h                |    7 
 arch/arm/include/asm/kvm.h                  |  119 +++
 arch/arm/include/asm/kvm_arm.h              |  197 +++++
 arch/arm/include/asm/kvm_asm.h              |   59 ++
 arch/arm/include/asm/kvm_coproc.h           |   38 +
 arch/arm/include/asm/kvm_emulate.h          |  115 +++
 arch/arm/include/asm/kvm_host.h             |  193 +++++
 arch/arm/include/asm/kvm_mmu.h              |   46 +
 arch/arm/include/asm/mach/map.h             |    1 
 arch/arm/include/asm/perf_bits.h            |   56 ++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    5 
 arch/arm/include/asm/pgtable-3level.h       |   13 
 arch/arm/include/asm/pgtable.h              |    5 
 arch/arm/kernel/asm-offsets.c               |   45 +
 arch/arm/kernel/perf_event_v7.c             |   51 -
 arch/arm/kernel/vmlinux.lds.S               |    6 
 arch/arm/kvm/Kconfig                        |   45 +
 arch/arm/kvm/Makefile                       |   23 +
 arch/arm/kvm/arm.c                          |  991 +++++++++++++++++++++++++++
 arch/arm/kvm/coproc.c                       |  962 ++++++++++++++++++++++++++
 arch/arm/kvm/emulate.c                      |  531 ++++++++++++++
 arch/arm/kvm/exports.c                      |   38 +
 arch/arm/kvm/guest.c                        |  163 ++++
 arch/arm/kvm/init.S                         |  149 ++++
 arch/arm/kvm/interrupts.S                   |  782 +++++++++++++++++++++
 arch/arm/kvm/mmu.c                          |  837 +++++++++++++++++++++++
 arch/arm/kvm/reset.c                        |   74 ++
 arch/arm/kvm/trace.h                        |  117 +++
 arch/arm/mm/Kconfig                         |   10 
 arch/arm/mm/idmap.c                         |   88 ++
 arch/arm/mm/mmu.c                           |    9 
 include/linux/kvm.h                         |    3 
 mm/memory.c                                 |    2 
 36 files changed, 5798 insertions(+), 70 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/asm/kvm_mmu.h
 create mode 100644 arch/arm/include/asm/perf_bits.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

-- 


* [PATCH v10 01/14] ARM: add mem_type prot_pte accessor
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
@ 2012-08-16 15:28 ` Christoffer Dall
  2012-08-16 15:28 ` [PATCH v10 02/14] ARM: Add config option ARM_VIRT_EXT Christoffer Dall
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:28 UTC (permalink / raw)
  To: kvmarm, kvm

From: Marc Zyngier <marc.zyngier@arm.com>

The KVM hypervisor mmu code requires access to the mem_type prot_pte
field when setting up page tables pointing to a device. Unfortunately,
the mem_type structure is opaque.

Add an accessor (get_mem_type_prot_pte()) to retrieve the prot_pte
value.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
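As an illustration only (the real consumer is the KVM mmu code added
later in this series), a caller that wants a device mapping could
combine the accessor with pfn_pte() along these lines; the helper name
is made up for the sketch and MT_DEVICE is just one possible mem_type:

  #include <asm/mach/map.h>
  #include <asm/pgtable.h>

  /* Sketch: build a pte carrying the attributes of a device mapping. */
  static pte_t example_device_pte(unsigned long pfn)
  {
          pgprot_t prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE));

          return pfn_pte(pfn, prot);
  }
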
 arch/arm/include/asm/mach/map.h |    1 +
 arch/arm/mm/mmu.c               |    6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index a6efcdd..3787c9f 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -37,6 +37,7 @@ extern void iotable_init(struct map_desc *, int);
 
 struct mem_type;
 extern const struct mem_type *get_mem_type(unsigned int type);
+extern pteval_t get_mem_type_prot_pte(unsigned int type);
 /*
  * external interface to remap single page with appropriate type
  */
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 4c2d045..76bf4f5 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -301,6 +301,12 @@ const struct mem_type *get_mem_type(unsigned int type)
 }
 EXPORT_SYMBOL(get_mem_type);
 
+pteval_t get_mem_type_prot_pte(unsigned int type)
+{
+	return get_mem_type(type)->prot_pte;
+}
+EXPORT_SYMBOL(get_mem_type_prot_pte);
+
 /*
  * Adjust the PMD section entries according to the CPU in use.
  */



* [PATCH v10 02/14] ARM: Add config option ARM_VIRT_EXT
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
  2012-08-16 15:28 ` [PATCH v10 01/14] ARM: add mem_type prot_pte accessor Christoffer Dall
@ 2012-08-16 15:28 ` Christoffer Dall
  2012-08-16 15:28 ` [PATCH v10 03/14] ARM: Section based HYP idmap Christoffer Dall
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:28 UTC (permalink / raw)
  To: kvmarm, kvm

Select this option for ARM processors equipped with hardware
Virtualization Extensions.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/mm/Kconfig |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 101b968..037dc53 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -597,6 +597,16 @@ config ARM_LPAE
 
 	  If unsure, say N.
 
+config ARM_VIRT_EXT
+	bool "Support for ARM Virtualization Extensions"
+	depends on ARM_LPAE
+	help
+	  Say Y if you have an ARMv7 processor supporting the ARM hardware
+	  Virtualization extensions. KVM depends on this feature and will
+	  not run without it being selected. If you say Y here, the kernel
+	  will not boot on a machine without virtualization extensions and
+	  will not boot as a KVM guest.
+
 config ARCH_PHYS_ADDR_T_64BIT
 	def_bool ARM_LPAE
 



* [PATCH v10 03/14] ARM: Section based HYP idmap
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
  2012-08-16 15:28 ` [PATCH v10 01/14] ARM: add mem_type prot_pte accessor Christoffer Dall
  2012-08-16 15:28 ` [PATCH v10 02/14] ARM: Add config option ARM_VIRT_EXT Christoffer Dall
@ 2012-08-16 15:28 ` Christoffer Dall
  2012-08-16 15:28 ` [PATCH v10 04/14] ARM: Expose PMNC bitfields for KVM use Christoffer Dall
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:28 UTC (permalink / raw)
  To: kvmarm, kvm

Add a HYP pgd to the core code (so it can benefit all Linux
hypervisors).

Populate this pgd with an identity mapping of the code contained
in the .hyp.idmap.text section.

Offer a method to drop this identity mapping through
hyp_idmap_teardown and re-create it through hyp_idmap_setup.

Make all the above depend on CONFIG_ARM_VIRT_EXT.

Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
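As a sketch of the intended use (the real consumer is the KVM init code
later in this series; the section placement and function names below
are illustrative only), a hypervisor places the code that must run
identity-mapped in the new section and brackets its privileged bring-up
with the setup/teardown calls:

  #include <linux/init.h>
  #include <asm/idmap.h>

  /* Code that must run via the 1:1 mapping in hyp_pgd goes in the section. */
  static void __attribute__((section(".hyp.idmap.text"))) hyp_bootstrap(void)
  {
          /* ... enable Hyp mode and install the runtime page tables ... */
  }

  static int __init example_hyp_init(void)
  {
          hyp_idmap_setup();      /* make sure the 1:1 map exists */
          /* ... point HTTBR at virt_to_phys(hyp_pgd), trap to hyp_bootstrap() ... */
          hyp_idmap_teardown();   /* drop the mapping once Hyp is up */
          return 0;
  }
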
 arch/arm/include/asm/idmap.h                |    7 ++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    1 
 arch/arm/kernel/vmlinux.lds.S               |    6 ++
 arch/arm/mm/idmap.c                         |   88 +++++++++++++++++++++++----
 4 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
index bf863ed..a1ab8d6 100644
--- a/arch/arm/include/asm/idmap.h
+++ b/arch/arm/include/asm/idmap.h
@@ -11,4 +11,11 @@ extern pgd_t *idmap_pgd;
 
 void setup_mm_for_reboot(void);
 
+#ifdef CONFIG_ARM_VIRT_EXT
+extern pgd_t *hyp_pgd;
+
+void hyp_idmap_teardown(void);
+void hyp_idmap_setup(void);
+#endif
+
 #endif	/* __ASM_IDMAP_H */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..a2d404e 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -44,6 +44,7 @@
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
+#define PMD_SECT_AP1		(_AT(pmdval_t, 1) << 6)
 #define PMD_SECT_TEX(x)		(_AT(pmdval_t, 0))
 
 /*
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 36ff15b..12fd2eb 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -19,7 +19,11 @@
 	ALIGN_FUNCTION();						\
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
 	*(.idmap.text)							\
-	VMLINUX_SYMBOL(__idmap_text_end) = .;
+	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
+	ALIGN_FUNCTION();						\
+	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
+	*(.hyp.idmap.text)						\
+	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index ab88ed4..7a944af 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,4 +1,6 @@
+#include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/slab.h>
 
 #include <asm/cputype.h>
 #include <asm/idmap.h>
@@ -59,11 +61,20 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 	} while (pud++, addr = next, addr != end);
 }
 
-static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+static void identity_mapping_add(pgd_t *pgd, const char *text_start,
+				 const char *text_end, unsigned long prot)
 {
-	unsigned long prot, next;
+	unsigned long addr, end;
+	unsigned long next;
+
+	addr = virt_to_phys(text_start);
+	end = virt_to_phys(text_end);
+
+	pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
+		prot ? "HYP " : "",
+		(long long)addr, (long long)end);
+	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 
-	prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 	if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
 		prot |= PMD_BIT4;
 
@@ -78,24 +89,77 @@ extern char  __idmap_text_start[], __idmap_text_end[];
 
 static int __init init_static_idmap(void)
 {
-	phys_addr_t idmap_start, idmap_end;
-
 	idmap_pgd = pgd_alloc(&init_mm);
 	if (!idmap_pgd)
 		return -ENOMEM;
 
-	/* Add an identity mapping for the physical address of the section. */
-	idmap_start = virt_to_phys((void *)__idmap_text_start);
-	idmap_end = virt_to_phys((void *)__idmap_text_end);
-
-	pr_info("Setting up static identity map for 0x%llx - 0x%llx\n",
-		(long long)idmap_start, (long long)idmap_end);
-	identity_mapping_add(idmap_pgd, idmap_start, idmap_end);
+	identity_mapping_add(idmap_pgd, __idmap_text_start,
+			     __idmap_text_end, 0);
 
 	return 0;
 }
 early_initcall(init_static_idmap);
 
+#ifdef CONFIG_ARM_VIRT_EXT
+pgd_t *hyp_pgd;
+EXPORT_SYMBOL_GPL(hyp_pgd);
+
+static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pud = pud_offset(pgd, addr);
+	pmd = pmd_offset(pud, addr);
+	pud_clear(pud);
+	clean_pmd_entry(pmd);
+	pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
+}
+
+extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
+
+/*
+ * This version actually frees the underlying pmds for all pgds in range and
+ * clear the pgds themselves afterwards.
+ */
+void hyp_idmap_teardown(void)
+{
+	unsigned long addr, end;
+	unsigned long next;
+	pgd_t *pgd = hyp_pgd;
+
+	addr = virt_to_phys(__hyp_idmap_text_start);
+	end = virt_to_phys(__hyp_idmap_text_end);
+
+	pgd += pgd_index(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (!pgd_none_or_clear_bad(pgd))
+			hyp_idmap_del_pmd(pgd, addr);
+	} while (pgd++, addr = next, addr < end);
+}
+EXPORT_SYMBOL_GPL(hyp_idmap_teardown);
+
+void hyp_idmap_setup(void)
+{
+	identity_mapping_add(hyp_pgd, __hyp_idmap_text_start,
+			     __hyp_idmap_text_end, PMD_SECT_AP1);
+}
+EXPORT_SYMBOL_GPL(hyp_idmap_setup);
+
+static int __init hyp_init_static_idmap(void)
+{
+	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	if (!hyp_pgd)
+		return -ENOMEM;
+
+	hyp_idmap_setup();
+
+	return 0;
+}
+early_initcall(hyp_init_static_idmap);
+#endif
+
 /*
  * In order to soft-boot, we need to switch to a 1:1 mapping for the
  * cpu_reset functions. This will then ensure that we have predictable



* [PATCH v10 04/14] ARM: Expose PMNC bitfields for KVM use
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (2 preceding siblings ...)
  2012-08-16 15:28 ` [PATCH v10 03/14] ARM: Section based HYP idmap Christoffer Dall
@ 2012-08-16 15:28 ` Christoffer Dall
  2012-08-16 15:28 ` [PATCH v10 05/14] KVM: ARM: Initial skeleton to compile KVM support Christoffer Dall
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:28 UTC (permalink / raw)
  To: kvmarm, kvm

From: Rusty Russell <rusty.russell@linaro.org>

We want some of these for use in KVM, so pull them out of
arch/arm/kernel/perf_event_v7.c into their own asm/perf_bits.h.

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
---
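As an example of the kind of use KVM has for these definitions (this
helper is not part of the patch, just a sketch built from the bits
moved above):

  #include <asm/perf_bits.h>

  /* Number of event counters this PMU implements, read from PMNC.N. */
  static inline u32 example_pmnc_num_counters(void)
  {
          return (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK;
  }
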
 arch/arm/include/asm/perf_bits.h |   56 ++++++++++++++++++++++++++++++++++++++
 arch/arm/kernel/perf_event_v7.c  |   51 +----------------------------------
 2 files changed, 57 insertions(+), 50 deletions(-)
 create mode 100644 arch/arm/include/asm/perf_bits.h

diff --git a/arch/arm/include/asm/perf_bits.h b/arch/arm/include/asm/perf_bits.h
new file mode 100644
index 0000000..eeb266a
--- /dev/null
+++ b/arch/arm/include/asm/perf_bits.h
@@ -0,0 +1,56 @@
+#ifndef __ARM_PERF_BITS_H__
+#define __ARM_PERF_BITS_H__
+
+/*
+ * ARMv7 low level PMNC access
+ */
+
+/*
+ * Per-CPU PMNC: config reg
+ */
+#define ARMV7_PMNC_E		(1 << 0) /* Enable all counters */
+#define ARMV7_PMNC_P		(1 << 1) /* Reset all counters */
+#define ARMV7_PMNC_C		(1 << 2) /* Cycle counter reset */
+#define ARMV7_PMNC_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
+#define ARMV7_PMNC_X		(1 << 4) /* Export to ETM */
+#define ARMV7_PMNC_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
+#define	ARMV7_PMNC_N_SHIFT	11	 /* Number of counters supported */
+#define	ARMV7_PMNC_N_MASK	0x1f
+#define	ARMV7_PMNC_MASK		0x3f	 /* Mask for writable bits */
+
+/*
+ * FLAG: counters overflow flag status reg
+ */
+#define	ARMV7_FLAG_MASK		0xffffffff	/* Mask for writable bits */
+#define	ARMV7_OVERFLOWED_MASK	ARMV7_FLAG_MASK
+
+/*
+ * PMXEVTYPER: Event selection reg
+ */
+#define	ARMV7_EVTYPE_MASK	0xc00000ff	/* Mask for writable bits */
+#define	ARMV7_EVTYPE_EVENT	0xff		/* Mask for EVENT bits */
+
+/*
+ * Event filters for PMUv2
+ */
+#define	ARMV7_EXCLUDE_PL1	(1 << 31)
+#define	ARMV7_EXCLUDE_USER	(1 << 30)
+#define	ARMV7_INCLUDE_HYP	(1 << 27)
+
+#ifndef __ASSEMBLY__
+static inline u32 armv7_pmnc_read(void)
+{
+	u32 val;
+	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(val));
+	return val;
+}
+
+static inline void armv7_pmnc_write(u32 val)
+{
+	val &= ARMV7_PMNC_MASK;
+	isb();
+	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(val));
+}
+#endif
+
+#endif /* __ARM_PERF_BITS_H__ */
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index f04070b..09851b3 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -17,6 +17,7 @@
  */
 
 #ifdef CONFIG_CPU_V7
+#include <asm/perf_bits.h>
 
 static struct arm_pmu armv7pmu;
 
@@ -744,61 +745,11 @@ static const unsigned armv7_a7_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 #define	ARMV7_COUNTER_MASK	(ARMV7_MAX_COUNTERS - 1)
 
 /*
- * ARMv7 low level PMNC access
- */
-
-/*
  * Perf Event to low level counters mapping
  */
 #define	ARMV7_IDX_TO_COUNTER(x)	\
 	(((x) - ARMV7_IDX_COUNTER0) & ARMV7_COUNTER_MASK)
 
-/*
- * Per-CPU PMNC: config reg
- */
-#define ARMV7_PMNC_E		(1 << 0) /* Enable all counters */
-#define ARMV7_PMNC_P		(1 << 1) /* Reset all counters */
-#define ARMV7_PMNC_C		(1 << 2) /* Cycle counter reset */
-#define ARMV7_PMNC_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
-#define ARMV7_PMNC_X		(1 << 4) /* Export to ETM */
-#define ARMV7_PMNC_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
-#define	ARMV7_PMNC_N_SHIFT	11	 /* Number of counters supported */
-#define	ARMV7_PMNC_N_MASK	0x1f
-#define	ARMV7_PMNC_MASK		0x3f	 /* Mask for writable bits */
-
-/*
- * FLAG: counters overflow flag status reg
- */
-#define	ARMV7_FLAG_MASK		0xffffffff	/* Mask for writable bits */
-#define	ARMV7_OVERFLOWED_MASK	ARMV7_FLAG_MASK
-
-/*
- * PMXEVTYPER: Event selection reg
- */
-#define	ARMV7_EVTYPE_MASK	0xc00000ff	/* Mask for writable bits */
-#define	ARMV7_EVTYPE_EVENT	0xff		/* Mask for EVENT bits */
-
-/*
- * Event filters for PMUv2
- */
-#define	ARMV7_EXCLUDE_PL1	(1 << 31)
-#define	ARMV7_EXCLUDE_USER	(1 << 30)
-#define	ARMV7_INCLUDE_HYP	(1 << 27)
-
-static inline u32 armv7_pmnc_read(void)
-{
-	u32 val;
-	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(val));
-	return val;
-}
-
-static inline void armv7_pmnc_write(u32 val)
-{
-	val &= ARMV7_PMNC_MASK;
-	isb();
-	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(val));
-}
-
 static inline int armv7_pmnc_has_overflowed(u32 pmnc)
 {
 	return pmnc & ARMV7_OVERFLOWED_MASK;



* [PATCH v10 05/14] KVM: ARM: Initial skeleton to compile KVM support
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (3 preceding siblings ...)
  2012-08-16 15:28 ` [PATCH v10 04/14] ARM: Expose PMNC bitfields for KVM use Christoffer Dall
@ 2012-08-16 15:28 ` Christoffer Dall
  2012-08-16 15:29 ` [PATCH v10 06/14] KVM: ARM: Hypervisor initialization Christoffer Dall
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:28 UTC (permalink / raw)
  To: kvmarm, kvm

Targets KVM support for Cortex-A15 processors.

Contains all the framework components, Makefiles, header files and some
tracing functionality.

The only supported core is Cortex-A15 for now.

Contains minor reset hook driven from kvm_vcpu_set_target, which will
eventually be a custom ARM ioctl to set the core we are emulating.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

"Nothing to see here. Move along, move along..."

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
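To illustrate the banked-register layout of the new struct kvm_regs
(registers are indexed by the MODE_* values from the new asm/kvm.h, not
by the hardware CPSR mode bits), here is a user space sketch that dumps
a few guest registers; vcpu_fd is assumed to come from KVM_CREATE_VCPU,
and KVM_GET_REGS is the generic KVM ioctl backed by
kvm_arch_vcpu_ioctl_get_regs() in guest.c:

  #include <stdio.h>
  #include <err.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static void dump_guest_regs(int vcpu_fd)
  {
          struct kvm_regs regs;

          if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0)
                  err(1, "KVM_GET_REGS");

          printf("pc=%08x cpsr=%08x sp_svc=%08x\n",
                 regs.reg15, regs.cpsr, regs.reg13[MODE_SVC]);
  }
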
 arch/arm/Kconfig                   |    2 
 arch/arm/Makefile                  |    1 
 arch/arm/include/asm/kvm.h         |   79 +++++++++
 arch/arm/include/asm/kvm_arm.h     |   28 +++
 arch/arm/include/asm/kvm_asm.h     |   30 +++
 arch/arm/include/asm/kvm_coproc.h  |   24 +++
 arch/arm/include/asm/kvm_emulate.h |  108 ++++++++++++
 arch/arm/include/asm/kvm_host.h    |  160 ++++++++++++++++++
 arch/arm/kvm/Kconfig               |   44 +++++
 arch/arm/kvm/Makefile              |   23 +++
 arch/arm/kvm/arm.c                 |  317 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/coproc.c              |   22 ++
 arch/arm/kvm/emulate.c             |  127 ++++++++++++++
 arch/arm/kvm/exports.c             |   21 ++
 arch/arm/kvm/guest.c               |  163 +++++++++++++++++++
 arch/arm/kvm/init.S                |   19 ++
 arch/arm/kvm/interrupts.S          |   19 ++
 arch/arm/kvm/mmu.c                 |   17 ++
 arch/arm/kvm/reset.c               |   74 ++++++++
 arch/arm/kvm/trace.h               |   52 ++++++
 include/linux/kvm.h                |    1 
 21 files changed, 1331 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e91c7cd..8cc2e41 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2341,3 +2341,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/arm/kvm/Kconfig"
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 30eae87..3bcc414 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -255,6 +255,7 @@ core-$(CONFIG_VFP)		+= arch/arm/vfp/
 # If we have a machine-specific directory, then include it in the build.
 core-y				+= arch/arm/kernel/ arch/arm/mm/ arch/arm/common/
 core-y				+= arch/arm/net/
+core-y 				+= arch/arm/kvm/
 core-y				+= $(machdirs) $(platdirs)
 
 drivers-$(CONFIG_OPROFILE)      += arch/arm/oprofile/
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
new file mode 100644
index 0000000..bc5d72b
--- /dev/null
+++ b/arch/arm/include/asm/kvm.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include <asm/types.h>
+
+#define __KVM_HAVE_GUEST_DEBUG
+
+/*
+ * Modes used for short-hand mode determination in the world-switch code and
+ * in emulation code.
+ *
+ * Note: These indices do NOT correspond to the value of the CPSR mode bits!
+ */
+enum vcpu_mode {
+	MODE_FIQ = 0,
+	MODE_IRQ,
+	MODE_SVC,
+	MODE_ABT,
+	MODE_UND,
+	MODE_USR,
+	MODE_SYS
+};
+
+struct kvm_regs {
+	__u32 regs0_7[8];	/* Unbanked regs. (r0 - r7)	   */
+	__u32 fiq_regs8_12[5];	/* Banked fiq regs. (r8 - r12)	   */
+	__u32 usr_regs8_12[5];	/* Banked usr registers (r8 - r12) */
+	__u32 reg13[6];		/* Banked r13, indexed by MODE_	   */
+	__u32 reg14[6];		/* Banked r14, indexed by MODE_	   */
+	__u32 reg15;
+	__u32 cpsr;
+	__u32 spsr[5];		/* Banked SPSR,  indexed by MODE_  */
+};
+
+/* Supported Processor Types */
+#define KVM_ARM_TARGET_CORTEX_A15	(0xC0F)
+
+struct kvm_vcpu_init {
+	__u32 target;
+	__u32 features[7];
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+};
+
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+struct kvm_sync_regs {
+};
+
+struct kvm_arch_memory_slot {
+};
+
+#endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
new file mode 100644
index 0000000..2f9d28e
--- /dev/null
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ARM_H__
+#define __ARM_KVM_ARM_H__
+
+/* Supported Processor Types */
+#define CORTEX_A15	(0xC0F)
+
+/* Multiprocessor Affinity Register */
+#define MPIDR_CPUID	(0x3 << 0)
+
+#endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
new file mode 100644
index 0000000..44591f9
--- /dev/null
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ASM_H__
+#define __ARM_KVM_ASM_H__
+
+#define ARM_EXCEPTION_RESET	  0
+#define ARM_EXCEPTION_UNDEFINED   1
+#define ARM_EXCEPTION_SOFTWARE    2
+#define ARM_EXCEPTION_PREF_ABORT  3
+#define ARM_EXCEPTION_DATA_ABORT  4
+#define ARM_EXCEPTION_IRQ	  5
+#define ARM_EXCEPTION_FIQ	  6
+
+#endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
new file mode 100644
index 0000000..b6d023d
--- /dev/null
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2012 Rusty Russell IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_COPROC_H__
+#define __ARM_KVM_COPROC_H__
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+#endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
new file mode 100644
index 0000000..9e29335
--- /dev/null
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_EMULATE_H__
+#define __ARM_KVM_EMULATE_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+
+u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
+
+static inline u8 __vcpu_mode(u32 cpsr)
+{
+	u8 modes_table[32] = {
+		0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+		0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+		MODE_USR,	/* 0x0 */
+		MODE_FIQ,	/* 0x1 */
+		MODE_IRQ,	/* 0x2 */
+		MODE_SVC,	/* 0x3 */
+		0xf, 0xf, 0xf,
+		MODE_ABT,	/* 0x7 */
+		0xf, 0xf, 0xf,
+		MODE_UND,	/* 0xb */
+		0xf, 0xf, 0xf,
+		MODE_SYS	/* 0xf */
+	};
+
+	return modes_table[cpsr & 0x1f];
+}
+
+static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
+{
+	u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
+	BUG_ON(mode == 0xf);
+	return mode;
+}
+
+/*
+ * Return the SPSR for the specified mode of the virtual CPU.
+ */
+static inline u32 *vcpu_spsr_mode(struct kvm_vcpu *vcpu, enum vcpu_mode mode)
+{
+	switch (mode) {
+	case MODE_SVC:
+		return &vcpu->arch.regs.svc_regs[2];
+	case MODE_ABT:
+		return &vcpu->arch.regs.abt_regs[2];
+	case MODE_UND:
+		return &vcpu->arch.regs.und_regs[2];
+	case MODE_IRQ:
+		return &vcpu->arch.regs.irq_regs[2];
+	case MODE_FIQ:
+		return &vcpu->arch.regs.fiq_regs[7];
+	default:
+		BUG();
+	}
+}
+
+/* Get vcpu register for current mode */
+static inline u32 *vcpu_reg(struct kvm_vcpu *vcpu, unsigned long reg_num)
+{
+	return vcpu_reg_mode(vcpu, reg_num, vcpu_mode(vcpu));
+}
+
+static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
+{
+	return vcpu_reg(vcpu, 15);
+}
+
+static inline u32 *vcpu_cpsr(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.regs.cpsr;
+}
+
+/* Get vcpu SPSR for current mode */
+static inline u32 *vcpu_spsr(struct kvm_vcpu *vcpu)
+{
+	return vcpu_spsr_mode(vcpu, vcpu_mode(vcpu));
+}
+
+static inline bool mode_has_spsr(struct kvm_vcpu *vcpu)
+{
+	return (vcpu_mode(vcpu) < MODE_USR);
+}
+
+static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
+{
+	BUG_ON(vcpu_mode(vcpu) > MODE_SYS);
+	return vcpu_mode(vcpu) != MODE_USR;
+}
+
+#endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
new file mode 100644
index 0000000..d7e3398
--- /dev/null
+++ b/arch/arm/include/asm/kvm_host.h
@@ -0,0 +1,160 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_HOST_H__
+#define __ARM_KVM_HOST_H__
+
+#define KVM_MAX_VCPUS 4
+#define KVM_MEMORY_SLOTS 32
+#define KVM_PRIVATE_MEM_SLOTS 4
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+#define NUM_FEATURES 0
+
+/* We don't currently support large pages. */
+#define KVM_HPAGE_GFN_SHIFT(x)	0
+#define KVM_NR_PAGE_SIZES	1
+#define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
+
+struct kvm_vcpu;
+u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
+int kvm_target_cpu(void);
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+struct kvm_arch {
+	/* The VMID generation used for the virt. memory system */
+	u64    vmid_gen;
+	u32    vmid;
+
+	/* 1-level 2nd stage table and lock */
+	spinlock_t pgd_lock;
+	pgd_t *pgd;
+
+	/* VTTBR value associated with above pgd and vmid */
+	u64    vttbr;
+};
+
+#define EXCEPTION_NONE      0
+#define EXCEPTION_RESET     0x80
+#define EXCEPTION_UNDEFINED 0x40
+#define EXCEPTION_SOFTWARE  0x20
+#define EXCEPTION_PREFETCH  0x10
+#define EXCEPTION_DATA      0x08
+#define EXCEPTION_IMPRECISE 0x04
+#define EXCEPTION_IRQ       0x02
+#define EXCEPTION_FIQ       0x01
+
+#define KVM_NR_MEM_OBJS     40
+
+/*
+ * We don't want allocation failures within the mmu code, so we preallocate
+ * enough memory for a single page fault in a cache.
+ */
+struct kvm_mmu_memory_cache {
+	int nobjs;
+	void *objects[KVM_NR_MEM_OBJS];
+};
+
+struct kvm_vcpu_regs {
+	u32 usr_regs[15];	/* R0_usr - R14_usr */
+	u32 svc_regs[3];	/* SP_svc, LR_svc, SPSR_svc */
+	u32 abt_regs[3];	/* SP_abt, LR_abt, SPSR_abt */
+	u32 und_regs[3];	/* SP_und, LR_und, SPSR_und */
+	u32 irq_regs[3];	/* SP_irq, LR_irq, SPSR_irq */
+	u32 fiq_regs[8];	/* R8_fiq - R14_fiq, SPSR_fiq */
+	u32 pc;			/* The program counter (r15) */
+	u32 cpsr;		/* The guest CPSR */
+} __packed;
+
+/* 0 is reserved as an invalid value. */
+enum cp15_regs {
+	c0_MPIDR=1,		/* MultiProcessor ID Register */
+	c1_SCTLR,		/* System Control Register */
+	c1_ACTLR,		/* Auxiliary Control Register */
+	c1_CPACR,		/* Coprocessor Access Control */
+	c2_TTBR0,		/* Translation Table Base Register 0 */
+	c2_TTBR0_high,		/* TTBR0 top 32 bits */
+	c2_TTBR1,		/* Translation Table Base Register 1 */
+	c2_TTBR1_high,		/* TTBR1 top 32 bits */
+	c2_TTBCR,		/* Translation Table Base Control R. */
+	c3_DACR,		/* Domain Access Control Register */
+	c5_DFSR,		/* Data Fault Status Register */
+	c5_IFSR,		/* Instruction Fault Status Register */
+	c5_ADFSR,		/* Auxiliary Data Fault Status Register */
+	c5_AIFSR,		/* Auxiliary Instruction Fault Status Register */
+	c6_DFAR,		/* Data Fault Address Register */
+	c6_IFAR,		/* Instruction Fault Address Register */
+	c10_PRRR,		/* Primary Region Remap Register */
+	c10_NMRR,		/* Normal Memory Remap Register */
+	c12_VBAR,		/* Vector Base Address Register */
+	c13_CID,		/* Context ID Register */
+	c13_TID_URW,		/* Thread ID, User R/W */
+	c13_TID_URO,		/* Thread ID, User R/O */
+	c13_TID_PRIV,		/* Thread ID, Privileged */
+
+	nr_cp15_regs
+};
+
+struct kvm_vcpu_arch {
+	struct kvm_vcpu_regs regs;
+
+	u32 target; /* Currently KVM_ARM_TARGET_CORTEX_A15 */
+	DECLARE_BITMAP(features, NUM_FEATURES);
+
+	/* System control coprocessor (cp15) */
+	u32 cp15[nr_cp15_regs];
+
+	/* The CPU type we expose to the VM */
+	u32 midr;
+
+	/* Exception Information */
+	u32 hsr;		/* Hyp Syndrome Register */
+	u32 hdfar;		/* Hyp Data Fault Address Register */
+	u32 hifar;		/* Hyp Inst. Fault Address Register */
+	u32 hpfar;		/* Hyp IPA Fault Address Register */
+	u64 pc_ipa;		/* IPA for the current PC (VA to PA result) */
+	u64 pc_ipa2;		/* same as above, but for non-aligned wide thumb
+				   instructions */
+
+	/* IO related fields */
+	bool mmio_sign_extend;	/* for byte/halfword loads */
+	u32 mmio_rd;
+
+	/* Interrupt related fields */
+	u32 irq_lines;		/* IRQ and FIQ levels */
+
+	/* Hyp exception information */
+	u32 hyp_pc;		/* PC when exception was taken from Hyp mode */
+
+	/* Cache some mmu pages needed inside spinlock regions */
+	struct kvm_mmu_memory_cache mmu_page_cache;
+};
+
+struct kvm_vm_stat {
+	u32 remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+	u32 halt_wakeup;
+};
+
+struct kvm_vcpu_init;
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init);
+#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
new file mode 100644
index 0000000..83abbe0
--- /dev/null
+++ b/arch/arm/kvm/Kconfig
@@ -0,0 +1,44 @@
+#
+# KVM configuration
+#
+
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+	bool "Virtualization"
+	---help---
+	  Say Y here to get to see options for using your Linux host to run
+	  other operating systems inside virtual machines (guests).
+	  This option alone does not add any kernel code.
+
+	  If you say N, all options in this submenu will be skipped and
+	  disabled.
+
+if VIRTUALIZATION
+
+config KVM
+	tristate "Kernel-based Virtual Machine (KVM) support"
+	select PREEMPT_NOTIFIERS
+	select ANON_INODES
+	select KVM_MMIO
+	depends on CPU_V7 && ARM_VIRT_EXT
+	---help---
+	  Support hosting virtualized guest machines. You will also
+	  need to select one or more of the processor modules below.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  If unsure, say N.
+
+config KVM_ARM_HOST
+	bool "KVM host support for ARM cpus."
+	depends on KVM
+	depends on MMU
+	depends on CPU_V7 && ARM_VIRT_EXT
+	---help---
+	  Provides host support for ARM processors.
+
+source drivers/virtio/Kconfig
+
+endif # VIRTUALIZATION
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
new file mode 100644
index 0000000..92b04b2
--- /dev/null
+++ b/arch/arm/kvm/Makefile
@@ -0,0 +1,23 @@
+#
+# Makefile for Kernel-based Virtual Machine module
+#
+
+plus_virt := $(call as-instr,.arch_extension virt,+virt)
+ifeq ($(plus_virt),+virt)
+	plus_virt_def := -DREQUIRES_VIRT=1
+endif
+
+ccflags-y += -Ivirt/kvm -Iarch/arm/kvm
+CFLAGS_arm.o     := -I. $(call as-instr,.arch_extension sec,-DREQUIRES_SEC=1) $(plus_virt_def)
+CFLAGS_mmu.o := -I.
+
+AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
+AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
+
+obj-$(CONFIG_KVM_ARM_HOST) += init.o interrupts.o exports.o
+
+kvm-arm-y += $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+kvm-arm-y += arm.o guest.o mmu.o emulate.o reset.o coproc.o
+
+obj-$(CONFIG_KVM) += kvm-arm.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
new file mode 100644
index 0000000..1f30b0a
--- /dev/null
+++ b/arch/arm/kvm/arm.c
@@ -0,0 +1,317 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <trace/events/kvm.h>
+
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
+#include <asm/unified.h>
+#include <asm/uaccess.h>
+#include <asm/ptrace.h>
+#include <asm/mman.h>
+#include <asm/cputype.h>
+
+#ifdef REQUIRES_SEC
+__asm__(".arch_extension	sec");
+#endif
+#ifdef REQUIRES_VIRT
+__asm__(".arch_extension	virt");
+#endif
+
+int kvm_arch_hardware_enable(void *garbage)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+
+void kvm_arch_hardware_disable(void *garbage)
+{
+}
+
+int kvm_arch_hardware_setup(void)
+{
+	return 0;
+}
+
+void kvm_arch_hardware_unsetup(void)
+{
+}
+
+void kvm_arch_check_processor_compat(void *rtn)
+{
+	*(int *)rtn = 0;
+}
+
+void kvm_arch_sync_events(struct kvm *kvm)
+{
+}
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+	if (type)
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
+void kvm_arch_free_memslot(struct kvm_memory_slot *free,
+			   struct kvm_memory_slot *dont)
+{
+}
+
+int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
+{
+	return 0;
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm_arch_vcpu_free(kvm->vcpus[i]);
+			kvm->vcpus[i] = NULL;
+		}
+	}
+}
+
+int kvm_dev_ioctl_check_extension(long ext)
+{
+	int r;
+	switch (ext) {
+	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+		r = 1;
+		break;
+	case KVM_CAP_COALESCED_MMIO:
+		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+	return r;
+}
+
+long kvm_arch_dev_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_set_memory_region(struct kvm *kvm,
+			       struct kvm_userspace_memory_region *mem,
+			       struct kvm_memory_slot old,
+			       int user_alloc)
+{
+	return 0;
+}
+
+int kvm_arch_prepare_memory_region(struct kvm *kvm,
+				   struct kvm_memory_slot *memslot,
+				   struct kvm_memory_slot old,
+				   struct kvm_userspace_memory_region *mem,
+				   int user_alloc)
+{
+	return 0;
+}
+
+void kvm_arch_commit_memory_region(struct kvm *kvm,
+				   struct kvm_userspace_memory_region *mem,
+				   struct kvm_memory_slot old,
+				   int user_alloc)
+{
+}
+
+void kvm_arch_flush_shadow(struct kvm *kvm)
+{
+}
+
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	int err;
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	if (!vcpu) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	return vcpu;
+free_vcpu:
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+	return ERR_PTR(err);
+}
+
+void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvm_arch_vcpu_free(vcpu);
+}
+
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int __attribute_const__ kvm_target_cpu(void)
+{
+	unsigned int midr;
+
+	midr = read_cpuid_id();
+	switch ((midr >> 4) & 0xfff) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		return KVM_ARM_TARGET_CORTEX_A15;
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
+void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	return -EINVAL;
+}
+
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vcpu_ioctl(struct file *filp,
+			 unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+	case KVM_ARM_VCPU_INIT: {
+		struct kvm_vcpu_init init;
+
+		if (copy_from_user(&init, argp, sizeof init))
+			return -EFAULT;
+
+		return kvm_vcpu_set_target(vcpu, &init);
+
+	}
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+		       unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_init(void *opaque)
+{
+	return 0;
+}
+
+void kvm_arch_exit(void)
+{
+}
+
+static int arm_init(void)
+{
+	int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+	return rc;
+}
+
+static void __exit arm_exit(void)
+{
+	kvm_exit();
+}
+
+module_init(arm_init);
+module_exit(arm_exit);
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
new file mode 100644
index 0000000..4b9dad8
--- /dev/null
+++ b/arch/arm/kvm/coproc.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
new file mode 100644
index 0000000..c0bacc6
--- /dev/null
+++ b/arch/arm/kvm/emulate.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <asm/kvm_emulate.h>
+
+#define REG_OFFSET(_reg) \
+	(offsetof(struct kvm_vcpu_regs, _reg) / sizeof(u32))
+
+#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs[_num])
+
+static const unsigned long vcpu_reg_offsets[MODE_SYS + 1][16] = {
+	/* FIQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7),
+		REG_OFFSET(fiq_regs[0]), /* r8 */
+		REG_OFFSET(fiq_regs[1]), /* r9 */
+		REG_OFFSET(fiq_regs[2]), /* r10 */
+		REG_OFFSET(fiq_regs[3]), /* r11 */
+		REG_OFFSET(fiq_regs[4]), /* r12 */
+		REG_OFFSET(fiq_regs[5]), /* r13 */
+		REG_OFFSET(fiq_regs[6]), /* r14 */
+		REG_OFFSET(pc)		 /* r15 */
+	},
+
+	/* IRQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(irq_regs[0]), /* r13 */
+		REG_OFFSET(irq_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* SVC Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(svc_regs[0]), /* r13 */
+		REG_OFFSET(svc_regs[1]), /* r14 */
+		REG_OFFSET(pc)		 /* r15 */
+	},
+
+	/* ABT Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(abt_regs[0]), /* r13 */
+		REG_OFFSET(abt_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* UND Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(und_regs[0]), /* r13 */
+		REG_OFFSET(und_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* USR Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(usr_regs[13]), /* r13 */
+		REG_OFFSET(usr_regs[14]), /* r14 */
+		REG_OFFSET(pc)	          /* r15 */
+	},
+
+	/* SYS Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(usr_regs[13]), /* r13 */
+		REG_OFFSET(usr_regs[14]), /* r14 */
+		REG_OFFSET(pc)	          /* r15 */
+	},
+};
+
+/*
+ * Return a pointer to the vcpu register identified by reg_num, as banked
+ * in the specified mode of the virtual CPU.
+ */
+u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
+{
+	u32 *reg_array = (u32 *)&vcpu->arch.regs;
+
+	BUG_ON(reg_num > 15);
+	BUG_ON(mode > MODE_SYS);
+
+	return reg_array + vcpu_reg_offsets[mode][reg_num];
+}
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
new file mode 100644
index 0000000..3e38c95
--- /dev/null
+++ b/arch/arm/kvm/exports.c
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/module.h>
+
+EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
new file mode 100644
index 0000000..7215305
--- /dev/null
+++ b/arch/arm/kvm/guest.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <asm/uaccess.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+
+#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
+#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ NULL }
+};
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	struct kvm_vcpu_regs *vcpu_regs = &vcpu->arch.regs;
+
+	/*
+	 * GPRs and PSRs
+	 */
+	memcpy(regs->regs0_7, &(vcpu_regs->usr_regs[0]), sizeof(u32) * 8);
+	memcpy(regs->usr_regs8_12, &(vcpu_regs->usr_regs[8]), sizeof(u32) * 5);
+	memcpy(regs->fiq_regs8_12, &(vcpu_regs->fiq_regs[0]), sizeof(u32) * 5);
+	regs->reg13[MODE_FIQ] = vcpu_regs->fiq_regs[5];
+	regs->reg14[MODE_FIQ] = vcpu_regs->fiq_regs[6];
+	regs->reg13[MODE_IRQ] = vcpu_regs->irq_regs[0];
+	regs->reg14[MODE_IRQ] = vcpu_regs->irq_regs[1];
+	regs->reg13[MODE_SVC] = vcpu_regs->svc_regs[0];
+	regs->reg14[MODE_SVC] = vcpu_regs->svc_regs[1];
+	regs->reg13[MODE_ABT] = vcpu_regs->abt_regs[0];
+	regs->reg14[MODE_ABT] = vcpu_regs->abt_regs[1];
+	regs->reg13[MODE_UND] = vcpu_regs->und_regs[0];
+	regs->reg14[MODE_UND] = vcpu_regs->und_regs[1];
+	regs->reg13[MODE_USR] = vcpu_regs->usr_regs[13];
+	regs->reg14[MODE_USR] = vcpu_regs->usr_regs[14];
+
+	regs->spsr[MODE_FIQ]  = vcpu_regs->fiq_regs[7];
+	regs->spsr[MODE_IRQ]  = vcpu_regs->irq_regs[2];
+	regs->spsr[MODE_SVC]  = vcpu_regs->svc_regs[2];
+	regs->spsr[MODE_ABT]  = vcpu_regs->abt_regs[2];
+	regs->spsr[MODE_UND]  = vcpu_regs->und_regs[2];
+
+	regs->reg15 = vcpu_regs->pc;
+	regs->cpsr = vcpu_regs->cpsr;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	struct kvm_vcpu_regs *vcpu_regs = &vcpu->arch.regs;
+
+	if (__vcpu_mode(regs->cpsr) == 0xf)
+		return -EINVAL;
+
+	memcpy(&(vcpu_regs->usr_regs[0]), regs->regs0_7, sizeof(u32) * 8);
+	memcpy(&(vcpu_regs->usr_regs[8]), regs->usr_regs8_12, sizeof(u32) * 5);
+	memcpy(&(vcpu_regs->fiq_regs[0]), regs->fiq_regs8_12, sizeof(u32) * 5);
+
+	vcpu_regs->fiq_regs[5] = regs->reg13[MODE_FIQ];
+	vcpu_regs->fiq_regs[6] = regs->reg14[MODE_FIQ];
+	vcpu_regs->irq_regs[0] = regs->reg13[MODE_IRQ];
+	vcpu_regs->irq_regs[1] = regs->reg14[MODE_IRQ];
+	vcpu_regs->svc_regs[0] = regs->reg13[MODE_SVC];
+	vcpu_regs->svc_regs[1] = regs->reg14[MODE_SVC];
+	vcpu_regs->abt_regs[0] = regs->reg13[MODE_ABT];
+	vcpu_regs->abt_regs[1] = regs->reg14[MODE_ABT];
+	vcpu_regs->und_regs[0] = regs->reg13[MODE_UND];
+	vcpu_regs->und_regs[1] = regs->reg14[MODE_UND];
+	vcpu_regs->usr_regs[13] = regs->reg13[MODE_USR];
+	vcpu_regs->usr_regs[14] = regs->reg14[MODE_USR];
+
+	vcpu_regs->fiq_regs[7] = regs->spsr[MODE_FIQ];
+	vcpu_regs->irq_regs[2] = regs->spsr[MODE_IRQ];
+	vcpu_regs->svc_regs[2] = regs->spsr[MODE_SVC];
+	vcpu_regs->abt_regs[2] = regs->spsr[MODE_ABT];
+	vcpu_regs->und_regs[2] = regs->spsr[MODE_UND];
+
+	vcpu_regs->pc = regs->reg15;
+	vcpu_regs->cpsr = regs->cpsr;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init)
+{
+	unsigned int i;
+
+	/* We can only do a cortex A15 for now. */
+	if (init->target != kvm_target_cpu())
+		return -EINVAL;
+
+	vcpu->arch.target = init->target;
+	bitmap_zero(vcpu->arch.features, NUM_FEATURES);
+
+	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
+	for (i = 0; i < sizeof(init->features)*8; i++) {
+		if (init->features[i / 32] & (1 << (i % 32))) {
+			if (i >= NUM_FEATURES)
+				return -ENOENT;
+			set_bit(i, vcpu->arch.features);
+		}
+	}
+
+	/* Now we know what it is, we can reset it. */
+	return kvm_reset_vcpu(vcpu);
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+				  struct kvm_translation *tr)
+{
+	return -EINVAL;
+}
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/init.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/interrupts.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
new file mode 100644
index 0000000..10ed464
--- /dev/null
+++ b/arch/arm/kvm/mmu.c
@@ -0,0 +1,17 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
new file mode 100644
index 0000000..888799f
--- /dev/null
+++ b/arch/arm/kvm/reset.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+
+#include <asm/unified.h>
+#include <asm/ptrace.h>
+#include <asm/cputype.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_coproc.h>
+
+/******************************************************************************
+ * Cortex-A15 Reset Values
+ */
+
+static const int a15_max_cpu_idx = 3;
+
+static struct kvm_vcpu_regs a15_regs_reset = {
+	.cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
+};
+
+
+/*******************************************************************************
+ * Exported reset function
+ */
+
+/**
+ * kvm_reset_vcpu - sets core registers and cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architecturally defined reset values.
+ */
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_regs *cpu_reset;
+
+	switch (vcpu->arch.target) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		if (vcpu->vcpu_id > a15_max_cpu_idx)
+			return -EINVAL;
+		cpu_reset = &a15_regs_reset;
+		vcpu->arch.midr = read_cpuid_id();
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	/* Reset core registers */
+	memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
+
+	/* Reset CP15 registers */
+	kvm_reset_coprocs(vcpu);
+
+	return 0;
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
new file mode 100644
index 0000000..f8869c1
--- /dev/null
+++ b/arch/arm/kvm/trace.h
@@ -0,0 +1,52 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+/*
+ * Tracepoints for entry/exit to guest
+ */
+TRACE_EVENT(kvm_entry,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+TRACE_EVENT(kvm_exit,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+
+
+#endif /* _TRACE_KVM_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH arch/arm/kvm
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..5fb08b5 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -904,6 +904,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_ONE_REG		  _IOW(KVMIO,  0xac, struct kvm_one_reg)
 /* VM is being stopped by host */
 #define KVM_KVMCLOCK_CTRL	  _IO(KVMIO,   0xad)
+#define KVM_ARM_VCPU_INIT	  _IOW(KVMIO,  0xae, struct kvm_vcpu_init)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)



* [PATCH v10 06/14] KVM: ARM: Hypervisor initialization
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (4 preceding siblings ...)
  2012-08-16 15:28 ` [PATCH v10 05/14] KVM: ARM: Initial skeleton to compile KVM support Christoffer Dall
@ 2012-08-16 15:29 ` Christoffer Dall
  2012-08-23 15:08   ` [kvmarm] " Lei Wen
  2012-08-16 15:29 ` [PATCH v10 06/14] KVM: ARM: Hypervisor initialization Christoffer Dall
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:29 UTC (permalink / raw)
  To: kvmarm, kvm

Sets up the required registers to run code in HYP-mode from the kernel.

By setting the HVBAR the kernel can execute code in Hyp-mode with
the MMU disabled. The HVBAR initially points to initialization code,
which initializes other Hyp-mode registers and enables the MMU
for Hyp-mode. Afterwards, the HVBAR is changed to point to KVM
Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.
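
Purely as an illustration (not part of the patch), the per-CPU bring-up
described above amounts to roughly the following, mirroring the
cpu_init_hyp_mode()/cpu_set_vector() helpers in the diff; the wrapper
name is made up, and in the real init_hyp_mode() the switch to
__kvm_hyp_vector only happens after the Hyp code and stacks have been
mapped:

	/* Illustrative sketch only; see init_hyp_mode() in the diff below. */
	static void example_bringup_one_cpu(void *unused)
	{
		phys_addr_t init_pa = virt_to_phys(__kvm_hyp_init);

		/* SMC sets HVBAR to the init code (identity mapped, MMU off);
		 * the HVC inside enables the Hyp MMU and sets the Hyp stack. */
		cpu_init_hyp_mode((void *)(long)init_pa);

		/* With Hyp translation up, point HVBAR at the runtime vectors. */
		cpu_set_vector(__kvm_hyp_vector);
	}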

Also provides memory mapping code to map required code pages, data structures,
and I/O regions accessed in Hyp mode at the same virtual addresses as used by
the host kernel, but conforming to the architectural requirements for
translations in Hyp mode. This interface is added in arch/arm/kvm/mmu.c and
is comprised of (a short usage sketch follows the list):
 - create_hyp_mappings(from, to);
 - create_hyp_io_mappings(from, to, phys_addr);
 - free_hyp_pmds();
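
A minimal usage sketch, mirroring what init_hyp_mode() in this patch does
for the per-CPU VFP state (error handling trimmed, kzalloc used for
brevity):

	struct vfp_hard_struct *vfp;
	int err;

	vfp = kzalloc(sizeof(*vfp), GFP_KERNEL);	/* host-side object */
	err = create_hyp_mappings(vfp, vfp + 1);	/* now visible in Hyp mode */
	if (err)
		kvm_err("Cannot map host VFP state: %d\n", err);

	/* At module exit, all Hyp mappings go away in one call: */
	free_hyp_pmds();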

Note: The initialization mechanism currently relies on an SMC #0 call
to the secure monitor, which was merely a fast way of getting to the
hypervisor. We are working on supporting Hyp mode boot of the kernel
and control of Hyp mode through a local kernel mechanism.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h              |  109 +++++++++++++
 arch/arm/include/asm/kvm_asm.h              |   25 +++
 arch/arm/include/asm/kvm_mmu.h              |   36 ++++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    4 
 arch/arm/include/asm/pgtable-3level.h       |    4 
 arch/arm/include/asm/pgtable.h              |    1 
 arch/arm/kvm/arm.c                          |  224 +++++++++++++++++++++++++++
 arch/arm/kvm/exports.c                      |   16 ++
 arch/arm/kvm/init.S                         |  130 ++++++++++++++++
 arch/arm/kvm/interrupts.S                   |   48 ++++++
 arch/arm/kvm/mmu.c                          |  189 +++++++++++++++++++++++
 mm/memory.c                                 |    2 
 12 files changed, 788 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 2f9d28e..6e46541 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -19,10 +19,119 @@
 #ifndef __ARM_KVM_ARM_H__
 #define __ARM_KVM_ARM_H__
 
+#include <asm/types.h>
+
 /* Supported Processor Types */
 #define CORTEX_A15	(0xC0F)
 
 /* Multiprocessor Affinity Register */
 #define MPIDR_CPUID	(0x3 << 0)
 
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE		(1 << 27)
+#define HCR_TVM		(1 << 26)
+#define HCR_TTLB	(1 << 25)
+#define HCR_TPU		(1 << 24)
+#define HCR_TPC		(1 << 23)
+#define HCR_TSW		(1 << 22)
+#define HCR_TAC		(1 << 21)
+#define HCR_TIDCP	(1 << 20)
+#define HCR_TSC		(1 << 19)
+#define HCR_TID3	(1 << 18)
+#define HCR_TID2	(1 << 17)
+#define HCR_TID1	(1 << 16)
+#define HCR_TID0	(1 << 15)
+#define HCR_TWE		(1 << 14)
+#define HCR_TWI		(1 << 13)
+#define HCR_DC		(1 << 12)
+#define HCR_BSU		(3 << 10)
+#define HCR_BSU_IS	(1 << 10)
+#define HCR_FB		(1 << 9)
+#define HCR_VA		(1 << 8)
+#define HCR_VI		(1 << 7)
+#define HCR_VF		(1 << 6)
+#define HCR_AMO		(1 << 5)
+#define HCR_IMO		(1 << 4)
+#define HCR_FMO		(1 << 3)
+#define HCR_PTW		(1 << 2)
+#define HCR_SWIO	(1 << 1)
+#define HCR_VM		1
+
+/*
+ * The bits we set in HCR:
+ * TAC:		Trap ACTLR
+ * TSC:		Trap SMC
+ * TSW:		Trap cache operations by set/way
+ * TWI:		Trap WFI
+ * TIDCP:	Trap L2CTLR/L2ECTLR
+ * BSU_IS:	Upgrade barriers to the inner shareable domain
+ * FB:		Force broadcast of all maintenance operations
+ * AMO:		Override CPSR.A and enable signaling with VA
+ * IMO:		Override CPSR.I and enable signaling with VI
+ * FMO:		Override CPSR.F and enable signaling with VF
+ * SWIO:	Turn set/way invalidates into set/way clean+invalidate
+ */
+#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
+			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+			HCR_SWIO | HCR_TIDCP)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE	(1 << 30)
+#define HSCTLR_EE	(1 << 25)
+#define HSCTLR_FI	(1 << 21)
+#define HSCTLR_WXN	(1 << 19)
+#define HSCTLR_I	(1 << 12)
+#define HSCTLR_C	(1 << 2)
+#define HSCTLR_A	(1 << 1)
+#define HSCTLR_M	1
+#define HSCTLR_MASK	(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+			 HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE	(1 << 31)
+#define TTBCR_IMP	(1 << 30)
+#define TTBCR_SH1	(3 << 28)
+#define TTBCR_ORGN1	(3 << 26)
+#define TTBCR_IRGN1	(3 << 24)
+#define TTBCR_EPD1	(1 << 23)
+#define TTBCR_A1	(1 << 22)
+#define TTBCR_T1SZ	(3 << 16)
+#define TTBCR_SH0	(3 << 12)
+#define TTBCR_ORGN0	(3 << 10)
+#define TTBCR_IRGN0	(3 << 8)
+#define TTBCR_EPD0	(1 << 7)
+#define TTBCR_T0SZ	3
+#define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
+
+/* Hyp Debug Configuration Register bits */
+#define HDCR_TDRA	(1 << 11)
+#define HDCR_TDOSA	(1 << 10)
+#define HDCR_TDA	(1 << 9)
+#define HDCR_TDE	(1 << 8)
+#define HDCR_HPME	(1 << 7)
+#define HDCR_TPM	(1 << 6)
+#define HDCR_TPMCR	(1 << 5)
+#define HDCR_HPMN_MASK	(0x1F)
+
+/* Virtualization Translation Control Register (VTCR) bits */
+#define VTCR_SH0	(3 << 12)
+#define VTCR_ORGN0	(3 << 10)
+#define VTCR_IRGN0	(3 << 8)
+#define VTCR_SL0	(3 << 6)
+#define VTCR_S		(1 << 4)
+#define VTCR_T0SZ	3
+#define VTCR_MASK	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0 | VTCR_SL0 | \
+			 VTCR_S | VTCR_T0SZ)
+#define VTCR_HTCR_SH	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0)
+#define VTCR_SL_L2	0		/* Starting-level: 2 */
+#define VTCR_SL_L1	(1 << 6)	/* Starting-level: 1 */
+#define VTCR_GUEST_SL	VTCR_SL_L1
+#define VTCR_GUEST_T0SZ	0
+#if VTCR_GUEST_SL == 0
+#define VTTBR_X		(14 - VTCR_GUEST_T0SZ)
+#else
+#define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
+#endif
+
+
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 44591f9..58d51e3 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -26,5 +26,30 @@
 #define ARM_EXCEPTION_DATA_ABORT  4
 #define ARM_EXCEPTION_IRQ	  5
 #define ARM_EXCEPTION_FIQ	  6
+#define ARM_EXCEPTION_HVC	  7
+
+/*
+ * SMC Hypervisor API call number
+ */
+#define SMCHYP_HVBAR_W 0xfffffff0
+
+#ifndef __ASSEMBLY__
+struct kvm_vcpu;
+
+extern char __kvm_hyp_init[];
+extern char __kvm_hyp_init_end[];
+
+extern char __kvm_hyp_exit[];
+extern char __kvm_hyp_exit_end[];
+
+extern char __kvm_hyp_vector[];
+
+extern char __kvm_hyp_code_start[];
+extern char __kvm_hyp_code_end[];
+
+extern void __kvm_flush_vm_context(void);
+
+extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+#endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
new file mode 100644
index 0000000..8252921
--- /dev/null
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_MMU_H__
+#define __ARM_KVM_MMU_H__
+
+/*
+ * The architecture supports 40-bit IPA as input to the 2nd stage translations
+ * and PTRS_PER_PGD2 could therefore be 1024.
+ *
+ * To save a bit of memory and to avoid alignment issues we assume 39-bit IPA
+ * for now, but remember that the level-1 table must be aligned to its size.
+ */
+#define PTRS_PER_PGD2	512
+#define PGD2_ORDER	get_order(PTRS_PER_PGD2 * sizeof(pgd_t))
+
+int create_hyp_mappings(void *from, void *to);
+int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
+void free_hyp_pmds(void);
+
+#endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index a2d404e..18f5cef 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -32,6 +32,9 @@
 #define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)
 #define PMD_BIT4		(_AT(pmdval_t, 0))
 #define PMD_DOMAIN(x)		(_AT(pmdval_t, 0))
+#define PMD_APTABLE_SHIFT	(61)
+#define PMD_APTABLE		(_AT(pgdval_t, 3) << PMD_APTABLE_SHIFT)
+#define PMD_PXNTABLE		(_AT(pgdval_t, 1) << 59)
 
 /*
  *   - section
@@ -41,6 +44,7 @@
 #define PMD_SECT_S		(_AT(pmdval_t, 3) << 8)
 #define PMD_SECT_AF		(_AT(pmdval_t, 1) << 10)
 #define PMD_SECT_nG		(_AT(pmdval_t, 1) << 11)
+#define PMD_SECT_PXN		(_AT(pmdval_t, 1) << 53)
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index b249035..1169a8a 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -107,6 +107,10 @@
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & 2))
 #define pud_present(pud)	(pud_val(pud))
+#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_TABLE)
+#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_SECT)
 
 #define pud_clear(pudp)			\
 	do {				\
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index f66626d..bc83540 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -82,6 +82,7 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
+#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 1f30b0a..0b1c466 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -34,6 +34,11 @@
 #include <asm/ptrace.h>
 #include <asm/mman.h>
 #include <asm/cputype.h>
+#include <asm/idmap.h>
+#include <asm/tlbflush.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
 
 #ifdef REQUIRES_SEC
 __asm__(".arch_extension	sec");
@@ -42,6 +47,9 @@ __asm__(".arch_extension	sec");
 __asm__(".arch_extension	virt");
 #endif
 
+static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+static DEFINE_PER_CPU(struct vfp_hard_struct *, kvm_host_vfp_state);
+
 int kvm_arch_hardware_enable(void *garbage)
 {
 	return 0;
@@ -293,13 +301,229 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	return -EINVAL;
 }
 
+static void cpu_set_vector(void *vector)
+{
+	unsigned long vector_ptr;
+	unsigned long smc_hyp_nr;
+
+	vector_ptr = (unsigned long)vector;
+	smc_hyp_nr = SMCHYP_HVBAR_W;
+
+	/*
+	 * Set the HVBAR
+	 */
+	asm volatile (
+		"mov	r0, %[vector_ptr]\n\t"
+		"mov	r7, %[smc_hyp_nr]\n\t"
+		"smc	#0\n\t" : :
+		[vector_ptr] "r" (vector_ptr),
+		[smc_hyp_nr] "r" (smc_hyp_nr) :
+		"r0", "r7");
+}
+
+static void cpu_init_hyp_mode(void *vector)
+{
+	unsigned long pgd_ptr;
+	unsigned long hyp_stack_ptr;
+	unsigned long stack_page;
+
+	cpu_set_vector(vector);
+
+	pgd_ptr = virt_to_phys(hyp_pgd);
+	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
+	hyp_stack_ptr = stack_page + PAGE_SIZE;
+
+	/*
+	 * Call initialization code
+	 */
+	asm volatile (
+		"mov	r0, %[pgd_ptr]\n\t"
+		"mov	r1, %[hyp_stack_ptr]\n\t"
+		"hvc	#0\n\t" : :
+		[pgd_ptr] "r" (pgd_ptr),
+		[hyp_stack_ptr] "r" (hyp_stack_ptr) :
+		"r0", "r1");
+}
+
+/**
+ * Inits Hyp-mode on all online CPUs
+ */
+static int init_hyp_mode(void)
+{
+	phys_addr_t init_phys_addr;
+	int cpu;
+	int err = 0;
+
+	/*
+	 * Allocate stack pages for Hypervisor-mode
+	 */
+	for_each_possible_cpu(cpu) {
+		unsigned long stack_page;
+
+		stack_page = __get_free_page(GFP_KERNEL);
+		if (!stack_page) {
+			err = -ENOMEM;
+			goto out_free_stack_pages;
+		}
+
+		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
+	}
+
+	/*
+	 * Execute the init code on each CPU.
+	 *
+	 * Note: The stack is not mapped yet, so don't do anything else than
+	 * initializing the hypervisor mode on each CPU using a local stack
+	 * space for temporary storage.
+	 */
+	init_phys_addr = virt_to_phys(__kvm_hyp_init);
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, cpu_init_hyp_mode,
+					 (void *)(long)init_phys_addr, 1);
+	}
+
+	/*
+	 * Unmap the identity mapping
+	 */
+	hyp_idmap_teardown();
+
+	/*
+	 * Map the Hyp-code called directly from the host
+	 */
+	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
+	if (err) {
+		kvm_err("Cannot map world-switch code\n");
+		goto out_free_mappings;
+	}
+
+	/*
+	 * Map the Hyp stack pages
+	 */
+	for_each_possible_cpu(cpu) {
+		char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
+		err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE);
+
+		if (err) {
+			kvm_err("Cannot map hyp stack\n");
+			goto out_free_mappings;
+		}
+	}
+
+	/*
+	 * Set the HVBAR to the virtual kernel address
+	 */
+	for_each_online_cpu(cpu)
+		smp_call_function_single(cpu, cpu_set_vector,
+					 __kvm_hyp_vector, 1);
+
+	/*
+	 * Map the host VFP structures
+	 */
+	for_each_possible_cpu(cpu) {
+		struct vfp_hard_struct *vfp;
+
+		vfp = kmalloc(sizeof(*vfp), GFP_KERNEL);
+		if (!vfp) {
+			kvm_err("Not enough memory for vfp struct\n");
+			goto out_free_vfp;
+		}
+
+		memset(vfp, 0, sizeof(*vfp));
+		per_cpu(kvm_host_vfp_state, cpu) = vfp;
+		err = create_hyp_mappings(vfp, vfp + 1);
+
+		if (err) {
+			kvm_err("Cannot map host VFP state: %d\n", err);
+			goto out_free_vfp;
+		}
+	}
+
+	return 0;
+out_free_vfp:
+	for_each_possible_cpu(cpu)
+		kfree(per_cpu(kvm_host_vfp_state, cpu));
+out_free_mappings:
+	free_hyp_pmds();
+out_free_stack_pages:
+	for_each_possible_cpu(cpu)
+		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+	return err;
+}
+
+/**
+ * Initialize Hyp-mode and memory mappings on all CPUs.
+ */
 int kvm_arch_init(void *opaque)
 {
+	int err;
+
+	if (kvm_target_cpu() < 0) {
+		kvm_err("Target CPU not supported!\n");
+		return -ENODEV;
+	}
+
+	err = init_hyp_mode();
+	if (err)
+		goto out_err;
+
+	return 0;
+out_err:
+	return err;
+}
+
+static void cpu_exit_hyp_mode(void *vector)
+{
+	cpu_set_vector(vector);
+
+	/*
+	 * Disable Hyp-MMU for each cpu
+	 */
+	asm volatile ("hvc	#0");
+}
+
+static int exit_hyp_mode(void)
+{
+	phys_addr_t exit_phys_addr;
+	int cpu;
+
+	/*
+	 * TODO: flush Hyp TLB in case idmap code overlaps.
+	 * Note that we should do this in the monitor code when switching the
+	 * HVBAR, but this is going  away and should be rather done in the Hyp
+	 * mode change of HVBAR.
+	 */
+	hyp_idmap_setup();
+	exit_phys_addr = virt_to_phys(__kvm_hyp_exit);
+	BUG_ON(exit_phys_addr & 0x1f);
+
+	/*
+	 * Execute the exit code on each CPU.
+	 *
+	 * Note: The stack is not mapped yet, so don't do anything else than
+	 * disabling the hypervisor mode on each CPU using a local stack
+	 * space for temporary storage.
+	 */
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, cpu_exit_hyp_mode,
+					 (void *)(long)exit_phys_addr, 1);
+	}
+
 	return 0;
 }
 
 void kvm_arch_exit(void)
 {
+	int cpu;
+
+	exit_hyp_mode();
+
+	free_hyp_pmds();
+	for_each_possible_cpu(cpu) {
+		kfree(per_cpu(kvm_host_vfp_state, cpu));
+		per_cpu(kvm_host_vfp_state, cpu) = NULL;
+		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+		per_cpu(kvm_arm_hyp_stack_page, cpu) = 0;
+	}
 }
 
 static int arm_init(void)
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index 3e38c95..8ebdf07 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -17,5 +17,21 @@
  */
 
 #include <linux/module.h>
+#include <asm/kvm_asm.h>
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_init);
+EXPORT_SYMBOL_GPL(__kvm_hyp_init_end);
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_exit);
+EXPORT_SYMBOL_GPL(__kvm_hyp_exit_end);
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_vector);
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_code_start);
+EXPORT_SYMBOL_GPL(__kvm_hyp_code_end);
+
+EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
+
+EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
 
 EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 1dc8926..4db26cb 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -15,5 +15,135 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/linkage.h>
+#include <asm/unified.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor initialization
+@    - should be called with:
+@        r0 = Hypervisor pgd pointer
+@        r1 = top of Hyp stack (kernel VA)
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	.text
+	.arm
+        .pushsection    .hyp.idmap.text,"ax"
+	.align 12
+__kvm_hyp_init:
+	.globl __kvm_hyp_init
+
+	@ Hyp-mode exception vector
+	nop
+	nop
+	nop
+	nop
+	nop
+	b	__do_hyp_init
+	nop
+	nop
+
+__do_hyp_init:
+	@ Set the sp to end of this page and push data for later use
+	mov	sp, pc
+	bic	sp, sp, #0x0ff
+	bic	sp, sp, #0xf00
+	add	sp, sp, #0x1000
+	push	{r0, r1, r2, r12}
+
+	@ Set the HTTBR to point to the hypervisor PGD pointer passed to
+	@ function and set the upper bits equal to the kernel PGD.
+	mrrc	p15, 1, r1, r2, c2
+	mcrr	p15, 4, r0, r2, c2
+
+	@ Set the HTCR and VTCR to the same shareability and cacheability
+	@ settings as the non-secure TTBCR and with T0SZ == 0.
+	mrc	p15, 4, r0, c2, c0, 2	@ HTCR
+	ldr	r12, =HTCR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c2, c0, 2	@ TTBCR
+	and	r1, r1, #(HTCR_MASK & ~TTBCR_T0SZ)
+	orr	r0, r0, r1
+	mcr	p15, 4, r0, c2, c0, 2	@ HTCR
+
+	mrc	p15, 4, r1, c2, c1, 2	@ VTCR
+	bic	r1, r1, #(VTCR_HTCR_SH | VTCR_SL0)
+	bic	r0, r0, #(~VTCR_HTCR_SH)
+	orr	r1, r0, r1
+	orr	r1, r1, #(VTCR_SL_L1 | VTCR_GUEST_T0SZ)
+	mcr	p15, 4, r1, c2, c1, 2	@ VTCR
+
+	@ Use the same memory attributes for hyp. accesses as the kernel
+	@ (copy MAIRx to HMAIRx).
+	mrc	p15, 0, r0, c10, c2, 0
+	mcr	p15, 4, r0, c10, c2, 0
+	mrc	p15, 0, r0, c10, c2, 1
+	mcr	p15, 4, r0, c10, c2, 1
+
+	@ Set the HSCTLR to:
+	@  - ARM/THUMB exceptions: Kernel config (Thumb-2 kernel)
+	@  - Endianness: Kernel config
+	@  - Fast Interrupt Features: Kernel config
+	@  - Write permission implies XN: disabled
+	@  - Instruction cache: enabled
+	@  - Data/Unified cache: enabled
+	@  - Memory alignment checks: enabled
+	@  - MMU: enabled (this code must be run from an identity mapping)
+	mrc	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	ldr	r12, =HSCTLR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c1, c0, 0	@ SCTLR
+	ldr	r12, =(HSCTLR_EE | HSCTLR_FI)
+	and	r1, r1, r12
+ ARM(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I)			)
+ THUMB(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I | HSCTLR_TE)	)
+	orr	r1, r1, r12
+	orr	r0, r0, r1
+	isb
+	mcr	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	isb
+
+	@ Set stack pointer and return to the kernel
+	pop	{r0, r1, r2, r12}
+	mov	sp, r1
+	eret
+
+	.ltorg
+
+	.align 12
+
+	__kvm_init_sp:
+	.globl __kvm_hyp_init_end
+__kvm_hyp_init_end:
+
+	.align 12
+__kvm_hyp_exit:
+	.globl __kvm_hyp_exit
+
+	@ Hyp-mode exception vector
+	nop
+	nop
+	nop
+	nop
+	nop
+	b	__do_hyp_exit
+	nop
+	nop
+
+__do_hyp_exit:
+	@ Clear the MMU and TE bits in the HSCTLR
+	mrc	p15, 4, sp, c1, c0, 0	@ HSCTLR
+	bic	sp, sp, #((1 << 30) | (1 << 0))
+
+	isb
+	mcr	p15, 4, sp, c1, c0, 0	@ HSCTLR
+	mcr	p15, 4, r0, c8, c7, 0   @ Flush Hyp TLB, r0 ignored
+	isb
+	eret
+
+	.globl __kvm_hyp_exit_end
+__kvm_hyp_exit_end:
+
+	.popsection
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 1dc8926..bf09801 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -15,5 +15,53 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/unified.h>
+#include <asm/page.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+	.text
+	.align	PAGE_SHIFT
+
+__kvm_hyp_code_start:
+	.globl __kvm_hyp_code_start
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Flush TLBs and instruction caches of current CPU for all VMIDs
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_flush_vm_context)
+	bx	lr
+ENDPROC(__kvm_flush_vm_context)
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor world-switch code
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_vcpu_run)
+	bx	lr
+
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor exception vector and handlers
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+	.align 5
+__kvm_hyp_vector:
+	.globl __kvm_hyp_vector
+	nop
+
+/*
+ * The lines below make sure the HYP mode code fits in a single page (the
+ * assembler will bark at you if it doesn't). Please keep them together. If
+ * you plan to restructure the code or increase its size over a page, you'll
+ * have to fix the code in init_hyp_mode().
+ */
+__kvm_hyp_code_end:
+	.globl	__kvm_hyp_code_end
+
+	.org	__kvm_hyp_code_start + PAGE_SIZE
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 10ed464..6a7dfd4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -15,3 +15,192 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/mman.h>
+#include <linux/kvm_host.h>
+#include <linux/io.h>
+#include <asm/idmap.h>
+#include <asm/pgalloc.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+#include <asm/mach/map.h>
+
+static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
+
+static void free_ptes(pmd_t *pmd, unsigned long addr)
+{
+	pte_t *pte;
+	unsigned int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			pte_free_kernel(NULL, pte);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * free_hyp_pmds - free the Hyp-mode level-2 tables and child level-3 tables
+ *
+ * Assumes this is a page table used strictly in Hyp-mode and therefore contains
+ * only mappings in the kernel memory area, which is above PAGE_OFFSET.
+ */
+void free_hyp_pmds(void)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = PAGE_OFFSET; addr != 0; addr += PGDIR_SIZE) {
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none(*pud))
+			continue;
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+		pud_clear(pud);
+	}
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+}
+
+/*
+ * Create a HYP pte mapping.
+ *
+ * If pfn_base is NULL, we map kernel pages into HYP with the virtual
+ * address. Otherwise, this is considered an I/O mapping and we map
+ * the physical region starting at *pfn_base to [start, end[.
+ */
+static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
+				    unsigned long end, unsigned long *pfn_base)
+{
+	pte_t *pte;
+	unsigned long addr;
+	pgprot_t prot;
+
+	if (pfn_base)
+		prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER);
+	else
+		prot = PAGE_HYP;
+
+	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
+		pte = pte_offset_kernel(pmd, addr);
+		if (pfn_base) {
+			BUG_ON(pfn_valid(*pfn_base));
+			set_pte_ext(pte, pfn_pte(*pfn_base, prot), 0);
+			(*pfn_base)++;
+		} else {
+			struct page *page;
+			BUG_ON(!virt_addr_valid(addr));
+			page = virt_to_page(addr);
+			set_pte_ext(pte, mk_pte(page, prot), 0);
+		}
+
+	}
+}
+
+static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
+				   unsigned long end, unsigned long *pfn_base)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long addr, next;
+
+	for (addr = start; addr < end; addr = next) {
+		pmd = pmd_offset(pud, addr);
+
+		BUG_ON(pmd_sect(*pmd));
+
+		if (pmd_none(*pmd)) {
+			pte = pte_alloc_one_kernel(NULL, addr);
+			if (!pte) {
+				kvm_err("Cannot allocate Hyp pte\n");
+				return -ENOMEM;
+			}
+			pmd_populate_kernel(NULL, pmd, pte);
+		}
+
+		next = pmd_addr_end(addr, end);
+		create_hyp_pte_mappings(pmd, addr, next, pfn_base);
+	}
+
+	return 0;
+}
+
+static int __create_hyp_mappings(void *from, void *to, unsigned long *pfn_base)
+{
+	unsigned long start = (unsigned long)from;
+	unsigned long end = (unsigned long)to;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr, next;
+	int err = 0;
+
+	BUG_ON(start > end);
+	if (start < PAGE_OFFSET)
+		return -EINVAL;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = start; addr < end; addr = next) {
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none_or_clear_bad(pud)) {
+			pmd = pmd_alloc_one(NULL, addr);
+			if (!pmd) {
+				kvm_err("Cannot allocate Hyp pmd\n");
+				err = -ENOMEM;
+				goto out;
+			}
+			pud_populate(NULL, pud, pmd);
+		}
+
+		next = pgd_addr_end(addr, end);
+		err = create_hyp_pmd_mappings(pud, addr, next, pfn_base);
+		if (err)
+			goto out;
+	}
+out:
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+	return err;
+}
+
+/**
+ * create_hyp_mappings - map a kernel virtual address range in Hyp mode
+ * @from:	The virtual kernel start address of the range
+ * @to:		The virtual kernel end address of the range (exclusive)
+ *
+ * The same virtual address as the kernel virtual address is also used in
+ * Hyp-mode mapping to the same underlying physical pages.
+ *
+ * Note: Wrapping around zero in the "to" address is not supported.
+ */
+int create_hyp_mappings(void *from, void *to)
+{
+	return __create_hyp_mappings(from, to, NULL);
+}
+
+/**
+ * create_hyp_io_mappings - map a physical IO range in Hyp mode
+ * @from:	The virtual HYP start address of the range
+ * @to:		The virtual HYP end address of the range (exclusive)
+ * @addr:	The physical start address which gets mapped
+ */
+int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
+{
+	unsigned long pfn = __phys_to_pfn(addr);
+	return __create_hyp_mappings(from, to, &pfn);
+}
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
diff --git a/mm/memory.c b/mm/memory.c
index 5736170..0e58fdd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -383,12 +383,14 @@ void pgd_clear_bad(pgd_t *pgd)
 	pgd_ERROR(*pgd);
 	pgd_clear(pgd);
 }
+EXPORT_SYMBOL_GPL(pgd_clear_bad);
 
 void pud_clear_bad(pud_t *pud)
 {
 	pud_ERROR(*pud);
 	pud_clear(pud);
 }
+EXPORT_SYMBOL_GPL(pud_clear_bad);
 
 void pmd_clear_bad(pmd_t *pmd)
 {



* [PATCH v10 07/14] KVM: ARM: Memory virtualization setup
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (5 preceding siblings ...)
  2012-08-16 15:29 ` [PATCH v10 06/14] KVM: ARM: Hypervisor inititalization Christoffer Dall
@ 2012-08-16 15:29 ` Christoffer Dall
  2012-08-16 18:25   ` [kvmarm] " Alexander Graf
  2012-08-23  8:12   ` Min-gyu Kim
  2012-08-16 15:29 ` [PATCH v10 08/14] KVM: ARM: Inject IRQs and FIQs from userspace Christoffer Dall
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:29 UTC (permalink / raw)
  To: kvmarm, kvm

This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Introduces the new ARM-specific kernel memory type PAGE_KVM_GUEST and the
pgprot_guest variable, both used to map 2nd stage memory for KVM guests.

Each entry in the TLBs and caches is tagged with a VMID identifier in
addition to ASIDs. The VMIDs are assigned consecutively to VMs in the
order that VMs are executed, and caches and TLBs are invalidated when
the VMID space has been exhausted, allowing more than 255 simultaneously
running guests (a rough sketch of the recycling scheme follows below).
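
The VMID allocation itself only lands with the world-switch code later in
the series; the following is merely a sketch of the generation scheme
described above (the function and variable names, the kvm->arch.vmid field
and the lock-free form are all made up for the example):

	static u64 example_vmid_gen = 1;	/* global generation counter */
	static u8  example_next_vmid = 1;	/* VMID 0 is kept for the host */

	static void example_update_vmid(struct kvm *kvm)
	{
		if (kvm->arch.vmid_gen == example_vmid_gen)
			return;				/* VMID still valid */

		if (example_next_vmid == 0) {		/* 8-bit space wrapped */
			example_vmid_gen++;
			example_next_vmid = 1;
			__kvm_flush_vm_context();	/* flush TLBs/I-caches */
		}

		kvm->arch.vmid = example_next_vmid++;
		kvm->arch.vmid_gen = example_vmid_gen;
	}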

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.

We pre-allocate page table memory to be able to synchronize using a
spinlock and to be callable under rcu_read_lock from the MMU notifiers.
We steal the mmu_memory_cache implementation from x86 and adapt it for
our specific usage; the intended calling pattern is sketched below.
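
The pattern is the one kvm_phys_addr_ioremap() follows further down in
this patch (and which the fault handling code will follow later): top up
the cache while sleeping is still allowed, then consume it under the lock
(addr and pte are assumed to be the guest IPA and the new mapping):

	struct kvm_mmu_memory_cache cache = { 0, };
	int ret;

	/* Fill the cache in a context that may sleep... */
	ret = mmu_topup_memory_cache(&cache, 2, 2);
	if (ret)
		return ret;

	/* ...so stage2_set_pte() never needs to allocate under the lock. */
	spin_lock(&kvm->arch.pgd_lock);
	stage2_set_pte(kvm, &cache, addr, &pte);
	spin_unlock(&kvm->arch.pgd_lock);

	mmu_free_memory_cache(&cache);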

We support MMU notifiers (thanks to Marc Zyngier) through
kvm_unmap_hva and kvm_set_spte_hva.

Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
which is used by VGIC support to map the virtual CPU interface registers
to the guest. This support is added by Marc Zyngier.
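
For example, the VGIC code could map the virtual CPU interface into the
guest along these lines (the addresses below are invented for the
illustration):

	int err;

	/* Map 8K of GIC virtual CPU interface registers at the guest IPA
	 * where the guest expects its GIC CPU interface (example values). */
	err = kvm_phys_addr_ioremap(kvm, 0x2c002000 /* guest IPA */,
				    0x2c006000 /* host PA */, SZ_8K);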

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_asm.h        |    2 
 arch/arm/include/asm/kvm_host.h       |   18 ++
 arch/arm/include/asm/kvm_mmu.h        |    9 +
 arch/arm/include/asm/pgtable-3level.h |    9 +
 arch/arm/include/asm/pgtable.h        |    4 
 arch/arm/kvm/Kconfig                  |    1 
 arch/arm/kvm/arm.c                    |   38 +++
 arch/arm/kvm/exports.c                |    1 
 arch/arm/kvm/interrupts.S             |    8 +
 arch/arm/kvm/mmu.c                    |  373 +++++++++++++++++++++++++++++++++
 arch/arm/mm/mmu.c                     |    3 
 11 files changed, 465 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 58d51e3..55b6446 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -34,6 +34,7 @@
 #define SMCHYP_HVBAR_W 0xfffffff0
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -48,6 +49,7 @@ extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d7e3398..d86ce39 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -157,4 +157,22 @@ struct kvm_vcpu_stat {
 struct kvm_vcpu_init;
 int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 			const struct kvm_vcpu_init *init);
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8252921..11f4c3a 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -33,4 +33,13 @@ int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 1169a8a..7351eee 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -102,6 +102,15 @@
  */
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
 
+/*
+ * 2-nd stage PTE definitions for LPAE.
+ */
+#define L_PTE2_SHARED		L_PTE_SHARED
+#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
+#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */
+#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
+#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */
+
 #ifndef __ASSEMBLY__
 
 #define pud_none(pud)		(!pud_val(pud))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index bc83540..a31d0e9 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -70,6 +70,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 
 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_guest;
 
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
@@ -83,6 +84,9 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
 #define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
+#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
+					  L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
+					  L_PTE2_SHARED)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 83abbe0..7fa50d3 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM_ARM_HOST
 	depends on KVM
 	depends on MMU
 	depends on CPU_V7 && ARM_VIRT_EXT
+	select	MMU_NOTIFIER
 	---help---
 	  Provides host support for ARM processors.
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0b1c466..3f97e7c 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -82,12 +82,34 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	int ret = 0;
+
 	if (type)
 		return -EINVAL;
 
-	return 0;
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+	spin_lock_init(&kvm->arch.pgd_lock);
+
+	ret = create_hyp_mappings(kvm, kvm + 1);
+	if (ret)
+		goto out_free_stage2_pgd;
+
+	/* Mark the initial VMID generation invalid */
+	kvm->arch.vmid_gen = 0;
+
+	return ret;
+out_free_stage2_pgd:
+	kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+	return ret;
 }
 
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -105,10 +127,16 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 	return 0;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -184,7 +212,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	err = create_hyp_mappings(vcpu, vcpu + 1);
+	if (err)
+		goto vcpu_uninit;
+
 	return vcpu;
+vcpu_uninit:
+	kvm_vcpu_uninit(vcpu);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 out:
@@ -193,6 +227,8 @@ out:
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	kvm_mmu_free_memory_caches(vcpu);
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index 8ebdf07..f39f823 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -33,5 +33,6 @@ EXPORT_SYMBOL_GPL(__kvm_hyp_code_end);
 EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
 
 EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
+EXPORT_SYMBOL_GPL(__kvm_tlb_flush_vmid);
 
 EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index bf09801..edf9ed5 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -31,6 +31,14 @@ __kvm_hyp_code_start:
 	.globl __kvm_hyp_code_start
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Flush per-VMID TLBs
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_tlb_flush_vmid)
+	bx	lr
+ENDPROC(__kvm_tlb_flush_vmid)
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Flush TLBs and instruction caches of current CPU for all VMIDs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6a7dfd4..6cb0e38 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -23,10 +23,43 @@
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_asm.h>
 #include <asm/mach/map.h>
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  int min, int max)
+{
+	void *page;
+
+	BUG_ON(max > KVM_NR_MEM_OBJS);
+	if (cache->nobjs >= min)
+		return 0;
+	while (cache->nobjs < max) {
+		page = (void *)__get_free_page(PGALLOC_GFP);
+		if (!page)
+			return -ENOMEM;
+		cache->objects[cache->nobjs++] = page;
+	}
+	return 0;
+}
+
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+	while (mc->nobjs)
+		free_page((unsigned long)mc->objects[--mc->nobjs]);
+}
+
+static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+{
+	void *p;
+
+	BUG_ON(!mc || !mc->nobjs);
+	p = mc->objects[--mc->nobjs];
+	return p;
+}
+
 static void free_ptes(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
@@ -200,7 +233,347 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
 	return __create_hyp_mappings(from, to, &pfn);
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates the 1st level table only of size defined by PGD2_ORDER (can
+ * support either full 40-bit input addresses or limited to 32-bit input
+ * addresses). Clears the allocated pages.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * created, which can only be done once.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+
+	if (kvm->arch.pgd != NULL) {
+		kvm_err("kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
+	if (!pgd)
+		return -ENOMEM;
+
+	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
+	kvm->arch.pgd = pgd;
+
+	return 0;
+}
+
+static void free_guest_pages(pte_t *pte, unsigned long addr)
+{
+	unsigned int i;
+	struct page *page, *pte_page;
+
+	pte_page = virt_to_page(pte);
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (pte_present(*pte)) {
+			unsigned long pfn = pte_pfn(*pte);
+
+			if (pfn_valid(pfn)) { /* Skip over device memory */
+				page = pfn_to_page(pfn);
+				put_page(page);
+			}
+			put_page(pte_page);
+		}
+		pte++;
+	}
+}
+
+static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
+{
+	unsigned int i;
+	pte_t *pte;
+	struct page *page, *pmd_page;
+
+	pmd_page = virt_to_page(pmd);
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		BUG_ON(pmd_sect(*pmd));
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			free_guest_pages(pte, addr);
+			page = virt_to_page((void *)pte);
+			WARN_ON(page_count(page) != 1);
+			pte_free_kernel(NULL, pte);
+
+			put_page(pmd_page);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * kvm_free_stage2_pgd - free all stage-2 tables
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
+ * underlying level-2 and level-3 tables before freeing the actual level-1 table
+ * and setting the struct pointer to NULL.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * destroyed, which can only be done once.
+ */
+void kvm_free_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long long i, addr;
+	struct page *page, *pud_page;
+
+	if (kvm->arch.pgd == NULL)
+		return;
+
+	/*
+	 * We do this slightly different than other places, since we need more
+	 * than 32 bits and for instance pgd_addr_end converts to unsigned long.
+	 */
+	addr = 0;
+	for (i = 0; i < PTRS_PER_PGD2; i++) {
+		addr = i * (unsigned long long)PGDIR_SIZE;
+		pgd = kvm->arch.pgd + i;
+		pud = pud_offset(pgd, addr);
+		pud_page = virt_to_page(pud);
+
+		if (pud_none(*pud))
+			continue;
+
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_stage2_ptes(pmd, addr);
+		page = virt_to_page((void *)pmd);
+		WARN_ON(page_count(page) != 1);
+		pmd_free(NULL, pmd);
+		put_page(pud_page);
+	}
+
+	WARN_ON(page_count(pud_page) != 1);
+	free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
+	kvm->arch.pgd = NULL;
+}
+
+/*
+ * Clear a stage-2 PTE, lowering the various ref-counts. Also takes
+ * care of invalidating the TLBs.  Must be called while holding
+ * pgd_lock, otherwise another faulting VCPU may come in and mess
+ * things behind our back.
+ */
+static void stage2_clear_pte(struct kvm *kvm, phys_addr_t addr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	struct page *page;
+
+	kvm_debug("Clearing PTE at %08llx\n", addr);
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	BUG_ON(pud_none(*pud));
+
+	pmd = pmd_offset(pud, addr);
+	BUG_ON(pmd_none(*pmd));
+
+	pte = pte_offset_kernel(pmd, addr);
+	set_pte_ext(pte, __pte(0), 0);
+
+	page = virt_to_page(pte);
+	put_page(page);
+	if (page_count(page) != 1) {
+		__kvm_tlb_flush_vmid(kvm);
+		return;
+	}
+
+	/* Need to remove pte page */
+	pmd_clear(pmd);
+	__kvm_tlb_flush_vmid(kvm);
+	pte_free_kernel(NULL, (pte_t *)((unsigned long)pte & PAGE_MASK));
+
+	page = virt_to_page(pmd);
+	put_page(page);
+	if (page_count(page) != 1)
+		return;
+
+	/*
+	 * Need to remove pmd page. This is the worst case, and we end
+	 * up invalidating the TLB twice. No big deal.
+	 */
+	pud_clear(pud);
+	__kvm_tlb_flush_vmid(kvm);
+	pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
+
+	page = virt_to_page(pud);
+	put_page(page);
+}
+
+static void stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+			   phys_addr_t addr, const pte_t *new_pte)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	/* Create 2nd stage page table mapping - Level 1 */
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		if (!cache)
+			return; /* ignore calls from kvm_set_spte_hva */
+		pmd = mmu_memory_cache_alloc(cache);
+		pud_populate(NULL, pud, pmd);
+		pmd += pmd_index(addr);
+		get_page(virt_to_page(pud));
+	} else
+		pmd = pmd_offset(pud, addr);
+
+	/* Create 2nd stage page table mapping - Level 2 */
+	if (pmd_none(*pmd)) {
+		if (!cache)
+			return; /* ignore calls from kvm_set_spte_hva */
+		pte = mmu_memory_cache_alloc(cache);
+		clean_pte_table(pte);
+		pmd_populate_kernel(NULL, pmd, pte);
+		pte += pte_index(addr);
+		get_page(virt_to_page(pmd));
+	} else
+		pte = pte_offset_kernel(pmd, addr);
+
+	/* Create 2nd stage page table mapping - Level 3 */
+	BUG_ON(pte_none(pte));
+	set_pte_ext(pte, *new_pte, 0);
+	get_page(virt_to_page(pte));
+}
+
+/**
+ * kvm_phys_addr_ioremap - map a device range to guest IPA
+ *
+ * @kvm:	The KVM pointer
+ * @guest_ipa:	The IPA at which to insert the mapping
+ * @pa:		The physical address of the device
+ * @size:	The size of the mapping
+ */
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size)
+{
+	phys_addr_t addr, end;
+	pgprot_t prot;
+	int ret = 0;
+	unsigned long pfn;
+	struct kvm_mmu_memory_cache cache = { 0, };
+
+	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
+	prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER |
+			L_PTE2_READ | L_PTE2_WRITE);
+	pfn = __phys_to_pfn(pa);
+
+	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
+		pte_t pte = pfn_pte(pfn, prot);
+
+		ret = mmu_topup_memory_cache(&cache, 2, 2);
+		if (ret)
+			goto out;
+		spin_lock(&kvm->arch.pgd_lock);
+		stage2_set_pte(kvm, &cache, addr, &pte);
+		spin_unlock(&kvm->arch.pgd_lock);
+
+		pfn++;
+	}
+
+out:
+	mmu_free_memory_cache(&cache);
+	return ret;
+}
+
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return -EINVAL;
 }
+
+static bool hva_to_gpa(struct kvm *kvm, unsigned long hva, gpa_t *gpa)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	bool found = false;
+
+	slots = kvm_memslots(kvm);
+
+	/* we only care about the pages that the guest sees */
+	kvm_for_each_memslot(memslot, slots) {
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gpa_t gpa_offset = hva - start;
+			*gpa = (memslot->base_gfn << PAGE_SHIFT) + gpa_offset;
+			found = true;
+			/* no overlapping memslots allowed: break */
+			break;
+		}
+	}
+
+	return found;
+}
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	bool found;
+	gpa_t gpa;
+
+	if (!kvm->arch.pgd)
+		return 0;
+
+	found = hva_to_gpa(kvm, hva, &gpa);
+	if (found) {
+		spin_lock(&kvm->arch.pgd_lock);
+		stage2_clear_pte(kvm, gpa);
+		spin_unlock(&kvm->arch.pgd_lock);
+	}
+	return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+	int ret;
+
+	BUG_ON((start | end) & (~PAGE_MASK));
+
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		ret = kvm_unmap_hva(kvm, addr);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+	gpa_t gpa;
+	bool found;
+
+	if (!kvm->arch.pgd)
+		return;
+
+	found = hva_to_gpa(kvm, hva, &gpa);
+	if (found) {
+		spin_lock(&kvm->arch.pgd_lock);
+		stage2_set_pte(kvm, NULL, gpa, &pte);
+		spin_unlock(&kvm->arch.pgd_lock);
+		__kvm_tlb_flush_vmid(kvm);
+	}
+}
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
+{
+	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+}
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 76bf4f5..a153fd4 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -56,9 +56,11 @@ static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
+pgprot_t pgprot_guest;
 
 EXPORT_SYMBOL(pgprot_user);
 EXPORT_SYMBOL(pgprot_kernel);
+EXPORT_SYMBOL(pgprot_guest);
 
 struct cachepolicy {
 	const char	policy[16];
@@ -514,6 +516,7 @@ static void __init build_mem_type_table(void)
 	pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
 	pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
 				 L_PTE_DIRTY | kern_pgprot);
+	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG);
 
 	mem_types[MT_LOW_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;



* [PATCH v10 08/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (6 preceding siblings ...)
  2012-08-16 15:29 ` [PATCH v10 07/14] KVM: ARM: Memory virtualization setup Christoffer Dall
@ 2012-08-16 15:29 ` Christoffer Dall
  2012-08-21  8:20   ` Jan Kiszka
  2012-08-16 15:29 ` [PATCH v10 09/14] KVM: ARM: World-switch implementation Christoffer Dall
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:29 UTC (permalink / raw)
  To: kvmarm, kvm

From: Christoffer Dall <cdall@cs.columbia.edu>

Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
This ioctl is used since the semantics are in fact two lines that can be
either raised or lowered on the VCPU - the IRQ and FIQ lines.

KVM needs to know which VCPU it must operate on and whether the FIQ or
IRQ line is raised/lowered. Hence both pieces of information are packed
in the kvm_irq_level->irq field. The irq field value will be:
  IRQ: vcpu_index << 1
  FIQ: (vcpu_index << 1) | 1

This is documented in Documentation/kvm/api.txt.

The effect of the ioctl is simply to raise/lower the
corresponding irq_line field on the VCPU struct, which will cause the
world-switch code to raise/lower virtual interrupts when running the
guest on next switch. The wait_for_interrupt flag is also cleared for
raised IRQs or FIQs causing an idle VCPU to become active again. CPUs
in guest mode are kicked to make sure they refresh their interrupt status.
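
As a purely illustrative sketch (not part of this patch), raising and then
lowering the IRQ line of VCPU 1 from user space could look roughly like the
following, assuming <linux/kvm.h> and <sys/ioctl.h> are included and vm_fd is
a VM file descriptor obtained through KVM_CREATE_VM:

  struct kvm_irq_level irq;

  irq.irq = (1 << 1) | KVM_ARM_IRQ_LINE;  /* vcpu_index 1, the IRQ line */
  irq.level = 1;                          /* raise the line */
  if (ioctl(vm_fd, KVM_IRQ_LINE, &irq) < 0)
          perror("KVM_IRQ_LINE");

  irq.level = 0;                          /* ...and lower it again */
  ioctl(vm_fd, KVM_IRQ_LINE, &irq);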

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |   12 ++++++---
 arch/arm/include/asm/kvm.h        |    9 +++++++
 arch/arm/include/asm/kvm_arm.h    |    1 +
 arch/arm/kvm/arm.c                |   47 +++++++++++++++++++++++++++++++++++++
 include/linux/kvm.h               |    1 +
 5 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index bf33aaa..8345b78 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -614,15 +614,19 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
 
 Sets the level of a GSI input to the interrupt controller model in the kernel.
-Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
+On some architectures it is required that an interrupt controller model has
+been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
+interrupts require the level to be set to 1 and then back to 0.
+
+ARM uses two types of interrupt lines per CPU: IRQ and FIQ.  The value of the
+irq field should be (vcpu_index << 1) for IRQs and ((vcpu_index << 1) | 1) for
+FIQs. Level is used to raise/lower the line.
 
 struct kvm_irq_level {
 	union {
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index bc5d72b..4a3e25d 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -22,6 +22,15 @@
 #include <asm/types.h>
 
 #define __KVM_HAVE_GUEST_DEBUG
+#define __KVM_HAVE_IRQ_LINE
+
+/*
+ * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
+ */
+enum KVM_ARM_IRQ_LINE_TYPE {
+	KVM_ARM_IRQ_LINE = 0,
+	KVM_ARM_FIQ_LINE = 1,
+};
 
 /*
  * Modes used for short-hand mode determinition in the world-switch code and
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 6e46541..0f641c1 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -74,6 +74,7 @@
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
 			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
 			HCR_SWIO | HCR_TIDCP)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3f97e7c..8306587 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -24,6 +24,7 @@
 #include <linux/fs.h>
 #include <linux/mman.h>
 #include <linux/sched.h>
+#include <linux/kvm.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -265,6 +266,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	vcpu->cpu = cpu;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -305,6 +307,51 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return -EINVAL;
 }
 
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
+{
+	unsigned int vcpu_idx;
+	struct kvm_vcpu *vcpu;
+	unsigned long *ptr;
+	bool set;
+	int bit_index;
+
+	vcpu_idx = irq_level->irq >> 1;
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+	if (!vcpu)
+		return -EINVAL;
+
+	trace_kvm_set_irq(irq_level->irq, irq_level->level, 0);
+
+	if ((irq_level->irq & 1) == KVM_ARM_IRQ_LINE)
+		bit_index = ffs(HCR_VI) - 1;
+	else /* KVM_ARM_FIQ_LINE */
+		bit_index = ffs(HCR_VF) - 1;
+
+	ptr = (unsigned long *)&vcpu->arch.irq_lines;
+	if (irq_level->level)
+		set = test_and_set_bit(bit_index, ptr);
+	else
+		set = test_and_clear_bit(bit_index, ptr);
+
+	/*
+	 * If we didn't change anything, no need to wake up or kick other CPUs
+	 */
+	if (!!set == !!irq_level->level)
+		return 0;
+
+	/*
+	 * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
+	 * trigger a world-switch round on the running physical CPU to set the
+	 * virtual IRQ/FIQ fields in the HCR appropriately.
+	 */
+	kvm_vcpu_kick(vcpu);
+
+	return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 5fb08b5..c9b2556 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -111,6 +111,7 @@ struct kvm_irq_level {
 	 * ACPI gsi notion of irq.
 	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
 	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
+	 * For ARM: IRQ: irq = (2*vcpu_index). FIQ: irq = (2*vcpu_indx + 1).
 	 */
 	union {
 		__u32 irq;



* [PATCH v10 09/14] KVM: ARM: World-switch implementation
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (7 preceding siblings ...)
  2012-08-16 15:29 ` [PATCH v10 08/14] KVM: ARM: Inject IRQs and FIQs from userspace Christoffer Dall
@ 2012-08-16 15:29 ` Christoffer Dall
  2012-08-16 15:29 ` [PATCH v10 10/14] KVM: ARM: Emulation framework and CP15 emulation Christoffer Dall
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:29 UTC (permalink / raw)
  To: kvmarm, kvm

Provides a complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture the necessary exception information and store it on
the VCPU and KVM structures.

The following Hyp-ABI is also documented in the code:

Hyp-ABI: Switching from host kernel to Hyp-mode:
   Switching to Hyp mode is done through a simple HVC instruction. The
   exception vector code will check that the HVC comes from VMID==0 and if
   so will store the necessary state on the Hyp stack, which will look like
   this (growing downwards, see the hyp_hvc handler):
     ...
     stack_page + 4: spsr (Host-SVC cpsr)
     stack_page    : lr_usr
     --------------: stack bottom

Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
   When returning from Hyp mode to SVC mode, another HVC instruction is
   executed from Hyp mode, which is taken in the hyp_svc handler. The
   bottom of the Hyp is derived from the Hyp stack pointer (only a single
   page aligned stack is used per CPU) and the initial SVC registers are
   used to restore the host state.

Otherwise, the world-switch is pretty straightforward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values are loaded onto the hardware. State which is not loaded, but
theoretically modifiable by the guest, is protected through the
virtualization features to generate a trap and cause software emulation.
Upon guest return, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp stack onto the
hardware.

SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
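
For illustration only (the exact number of overridden bits is an assumption
of this sketch, not taken from the patch), the guest-visible MPIDR is
conceptually something like:

  /* guest MPIDR = host MPIDR with the low affinity bits replaced by vcpu_id */
  vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~0xff) | vcpu->vcpu_id;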

Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
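
Condensed from the update_vttbr() code introduced below (and slightly
simplified here), the generation check and VMID recycling boil down to:

  if (unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen))) {
          spin_lock(&kvm_vmid_lock);
          if (kvm_next_vmid == 0) {               /* 8-bit VMID space used up */
                  atomic64_inc(&kvm_vmid_gen);    /* start a new generation */
                  kvm_next_vmid = 1;
                  on_each_cpu(reset_vm_context, NULL, 1);
          }
          kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
          kvm->arch.vmid = kvm_next_vmid++;
          spin_unlock(&kvm_vmid_lock);
  }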

To support VFP/NEON we trap those instructions using the HCPTR. When
we trap, we switch the FPU.  After a guest exit, the VFP state is
returned to the host.  When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state.  We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel; if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON support was developed by
Antonios Motakis and Rusty Russell.

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h  |   38 ++
 arch/arm/include/asm/kvm_host.h |   10 +
 arch/arm/kernel/asm-offsets.c   |   45 ++
 arch/arm/kvm/arm.c              |  166 +++++++++
 arch/arm/kvm/interrupts.S       |  711 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 967 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 0f641c1..ee345a6 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -104,6 +104,18 @@
 #define TTBCR_T0SZ	3
 #define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
 
+/* Hyp System Trap Register */
+#define HSTR_T(x)	(1 << x)
+#define HSTR_TTEE	(1 << 16)
+#define HSTR_TJDBX	(1 << 17)
+
+/* Hyp Coprocessor Trap Register */
+#define HCPTR_TCP(x)	(1 << x)
+#define HCPTR_TCP_MASK	(0x3fff)
+#define HCPTR_TASE	(1 << 15)
+#define HCPTR_TTA	(1 << 20)
+#define HCPTR_TCPAC	(1 << 31)
+
 /* Hyp Debug Configuration Register bits */
 #define HDCR_TDRA	(1 << 11)
 #define HDCR_TDOSA	(1 << 10)
@@ -134,5 +146,31 @@
 #define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
 #endif
 
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT	(26)
+#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
+#define HSR_IL		(1U << 25)
+#define HSR_ISS		(HSR_IL - 1)
+#define HSR_ISV_SHIFT	(24)
+#define HSR_ISV		(1U << HSR_ISV_SHIFT)
+
+#define HSR_EC_UNKNOWN	(0x00)
+#define HSR_EC_WFI	(0x01)
+#define HSR_EC_CP15_32	(0x03)
+#define HSR_EC_CP15_64	(0x04)
+#define HSR_EC_CP14_MR	(0x05)
+#define HSR_EC_CP14_LS	(0x06)
+#define HSR_EC_CP_0_13	(0x07)
+#define HSR_EC_CP10_ID	(0x08)
+#define HSR_EC_JAZELLE	(0x09)
+#define HSR_EC_BXJ	(0x0A)
+#define HSR_EC_CP14_64	(0x0C)
+#define HSR_EC_SVC_HYP	(0x11)
+#define HSR_EC_HVC	(0x12)
+#define HSR_EC_SMC	(0x13)
+#define HSR_EC_IABT	(0x20)
+#define HSR_EC_IABT_HYP	(0x21)
+#define HSR_EC_DABT	(0x24)
+#define HSR_EC_DABT_HYP	(0x25)
 
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d86ce39..5414eeb 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -19,6 +19,8 @@
 #ifndef __ARM_KVM_HOST_H__
 #define __ARM_KVM_HOST_H__
 
+#include <asm/fpstate.h>
+
 #define KVM_MAX_VCPUS 4
 #define KVM_MEMORY_SLOTS 32
 #define KVM_PRIVATE_MEM_SLOTS 4
@@ -132,6 +134,14 @@ struct kvm_vcpu_arch {
 	u64 pc_ipa2;		/* same as above, but for non-aligned wide thumb
 				   instructions */
 
+	/* Floating point registers (VFP and Advanced SIMD/NEON) */
+	struct vfp_hard_struct vfp_guest;
+	struct vfp_hard_struct *vfp_host;
+
+	/*
+	 * Anything that is not used directly from assembly code goes
+	 * here.
+	 */
 	/* IO related fields */
 	bool mmio_sign_extend;	/* for byte/halfword loads */
 	u32 mmio_rd;
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 1429d89..aca6c2c 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -13,6 +13,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/dma-mapping.h>
+#include <linux/kvm_host.h>
 #include <asm/cacheflush.h>
 #include <asm/glue-df.h>
 #include <asm/glue-pf.h>
@@ -144,5 +145,49 @@ int main(void)
   DEFINE(DMA_BIDIRECTIONAL,	DMA_BIDIRECTIONAL);
   DEFINE(DMA_TO_DEVICE,		DMA_TO_DEVICE);
   DEFINE(DMA_FROM_DEVICE,	DMA_FROM_DEVICE);
+#ifdef CONFIG_KVM_ARM_HOST
+  DEFINE(VCPU_KVM,		offsetof(struct kvm_vcpu, kvm));
+  DEFINE(VCPU_MIDR,		offsetof(struct kvm_vcpu, arch.midr));
+  DEFINE(VCPU_MPIDR,		offsetof(struct kvm_vcpu, arch.cp15[c0_MPIDR]));
+  DEFINE(VCPU_SCTLR,		offsetof(struct kvm_vcpu, arch.cp15[c1_SCTLR]));
+  DEFINE(VCPU_CPACR,		offsetof(struct kvm_vcpu, arch.cp15[c1_CPACR]));
+  DEFINE(VCPU_TTBR0,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR0]));
+  DEFINE(VCPU_TTBR1,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR1]));
+  DEFINE(VCPU_TTBCR,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBCR]));
+  DEFINE(VCPU_DACR,		offsetof(struct kvm_vcpu, arch.cp15[c3_DACR]));
+  DEFINE(VCPU_DFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_DFSR]));
+  DEFINE(VCPU_IFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_IFSR]));
+  DEFINE(VCPU_ADFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_ADFSR]));
+  DEFINE(VCPU_AIFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_AIFSR]));
+  DEFINE(VCPU_DFAR,		offsetof(struct kvm_vcpu, arch.cp15[c6_DFAR]));
+  DEFINE(VCPU_IFAR,		offsetof(struct kvm_vcpu, arch.cp15[c6_IFAR]));
+  DEFINE(VCPU_PRRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_PRRR]));
+  DEFINE(VCPU_NMRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_NMRR]));
+  DEFINE(VCPU_VBAR,		offsetof(struct kvm_vcpu, arch.cp15[c12_VBAR]));
+  DEFINE(VCPU_CID,		offsetof(struct kvm_vcpu, arch.cp15[c13_CID]));
+  DEFINE(VCPU_TID_URW,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URW]));
+  DEFINE(VCPU_TID_URO,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URO]));
+  DEFINE(VCPU_TID_PRIV,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_PRIV]));
+  DEFINE(VCPU_VFP_GUEST,	offsetof(struct kvm_vcpu, arch.vfp_guest));
+  DEFINE(VCPU_VFP_HOST,		offsetof(struct kvm_vcpu, arch.vfp_host));
+  DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
+  DEFINE(VCPU_USR_REGS,		offsetof(struct kvm_vcpu, arch.regs.usr_regs));
+  DEFINE(VCPU_SVC_REGS,		offsetof(struct kvm_vcpu, arch.regs.svc_regs));
+  DEFINE(VCPU_ABT_REGS,		offsetof(struct kvm_vcpu, arch.regs.abt_regs));
+  DEFINE(VCPU_UND_REGS,		offsetof(struct kvm_vcpu, arch.regs.und_regs));
+  DEFINE(VCPU_IRQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.irq_regs));
+  DEFINE(VCPU_FIQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
+  DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.pc));
+  DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.cpsr));
+  DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.hsr));
+  DEFINE(VCPU_HDFAR,		offsetof(struct kvm_vcpu, arch.hdfar));
+  DEFINE(VCPU_HIFAR,		offsetof(struct kvm_vcpu, arch.hifar));
+  DEFINE(VCPU_HPFAR,		offsetof(struct kvm_vcpu, arch.hpfar));
+  DEFINE(VCPU_PC_IPA,		offsetof(struct kvm_vcpu, arch.pc_ipa));
+  DEFINE(VCPU_PC_IPA2,		offsetof(struct kvm_vcpu, arch.pc_ipa2));
+  DEFINE(VCPU_HYP_PC,		offsetof(struct kvm_vcpu, arch.hyp_pc));
+  DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
+#endif
   return 0; 
 }
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 8306587..28bf2c2 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -40,6 +40,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_emulate.h>
 
 #ifdef REQUIRES_SEC
 __asm__(".arch_extension	sec");
@@ -51,6 +52,11 @@ __asm__(".arch_extension	virt");
 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 static DEFINE_PER_CPU(struct vfp_hard_struct *, kvm_host_vfp_state);
 
+/* The VMID used in the VTTBR */
+static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
+static u8 kvm_next_vmid;
+static DEFINE_SPINLOCK(kvm_vmid_lock);
+
 int kvm_arch_hardware_enable(void *garbage)
 {
 	return 0;
@@ -267,6 +273,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
+	vcpu->arch.vfp_host = __get_cpu_var(kvm_host_vfp_state);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -299,12 +306,169 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
 {
+	return v->mode == IN_GUEST_MODE;
+}
+
+static void reset_vm_context(void *info)
+{
+	__kvm_flush_vm_context();
+}
+
+/**
+ * need_new_vmid_gen - check that the VMID is still valid
+ * @kvm: The VM whose VMID to check
+ *
+ * return true if there is a new generation of VMIDs being used
+ *
+ * The hardware supports only 256 values with the value zero reserved for the
+ * host, so we check if an assigned value belongs to a previous generation,
+ * which requires us to assign a new value. If we're the first to use a
+ * VMID for the new generation, we must flush necessary caches and TLBs on all
+ * CPUs.
+ */
+static bool need_new_vmid_gen(struct kvm *kvm)
+{
+	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+}
+
+/**
+ * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
+ * @kvm:	The guest that we are about to run
+ *
+ * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
+ * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
+ * caches and TLBs.
+ */
+static void update_vttbr(struct kvm *kvm)
+{
+	phys_addr_t pgd_phys;
+
+	if (!need_new_vmid_gen(kvm))
+		return;
+
+	spin_lock(&kvm_vmid_lock);
+
+	/* First user of a new VMID generation? */
+	if (unlikely(kvm_next_vmid == 0)) {
+		atomic64_inc(&kvm_vmid_gen);
+		kvm_next_vmid = 1;
+
+		/*
+		 * On SMP we know no other CPUs can use this CPU's or
+		 * each other's VMID since the kvm_vmid_lock blocks
+		 * them from reentry to the guest.
+		 */
+		on_each_cpu(reset_vm_context, NULL, 1);
+	}
+
+	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
+	kvm->arch.vmid = kvm_next_vmid;
+	kvm_next_vmid++;
+
+	/* update vttbr to be used with the new vmid */
+	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
+			  & ~((2 << VTTBR_X) - 1);
+	kvm->arch.vttbr |= (u64)(kvm->arch.vmid) << 48;
+
+	spin_unlock(&kvm_vmid_lock);
+}
+
+/*
+ * Return 0 to return to guest, < 0 on error, exit_reason ( > 0) on proper
+ * exit to QEMU.
+ */
+static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		       int exception_index)
+{
+	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 	return 0;
 }
 
+/**
+ * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
+ * @vcpu:	The VCPU pointer
+ * @run:	The kvm_run structure pointer used for userspace state exchange
+ *
+ * This function is called through the VCPU_RUN ioctl called from user space. It
+ * will execute VM code in a loop until the time slice for the process is used
+ * or some emulation is needed from user space in which case the function will
+ * return with return value 0 and with the kvm_run structure filled in with the
+ * required data for the requested emulation.
+ */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	int ret;
+	sigset_t sigsaved;
+
+	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
+	if (unlikely(!vcpu->arch.target))
+		return -ENOEXEC;
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
+
+	ret = 1;
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	while (ret > 0) {
+		/*
+		 * Check conditions before entering the guest
+		 */
+		cond_resched();
+
+		update_vttbr(vcpu->kvm);
+
+		local_irq_disable();
+
+		/*
+		 * Re-check atomic conditions
+		 */
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			run->exit_reason = KVM_EXIT_INTR;
+		}
+
+		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+			local_irq_enable();
+			continue;
+		}
+
+		BUG_ON(__vcpu_mode(*vcpu_cpsr(vcpu)) == 0xf);
+
+		/**************************************************************
+		 * Enter the guest
+		 */
+		trace_kvm_entry(vcpu->arch.regs.pc);
+		kvm_guest_enter();
+		vcpu->mode = IN_GUEST_MODE;
+
+		ret = __kvm_vcpu_run(vcpu);
+
+		vcpu->mode = OUTSIDE_GUEST_MODE;
+		kvm_guest_exit();
+		trace_kvm_exit(vcpu->arch.regs.pc);
+		/*
+		 * We may have taken a host interrupt in HYP mode (ie
+		 * while executing the guest). This interrupt is still
+		 * pending, as we haven't serviced it yet!
+		 *
+		 * We're now back in SVC mode, with interrupts
+		 * disabled.  Enabling the interrupts now will have
+		 * the effect of taking the interrupt again, in SVC
+		 * mode this time.
+		 */
+		local_irq_enable();
+
+		/*
+		 * Back from guest
+		 *************************************************************/
+
+		ret = handle_exit(vcpu, run, ret);
+	}
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
+	return ret;
 }
 
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index edf9ed5..a29870e 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -23,6 +23,12 @@
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
+#include <asm/vfpmacros.h>
+
+#define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
+#define VCPU_USR_SP		(VCPU_USR_REG(13))
+#define VCPU_FIQ_REG(_reg_nr)	(VCPU_FIQ_REGS + (_reg_nr * 4))
+#define VCPU_FIQ_SPSR		(VCPU_FIQ_REG(7))
 
 	.text
 	.align	PAGE_SHIFT
@@ -34,7 +40,33 @@ __kvm_hyp_code_start:
 @  Flush per-VMID TLBs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm);
+ *
+ * We rely on the hardware to broadcast the TLB invalidation to all CPUs
+ * inside the inner-shareable domain (which is the case for all v7
+ * implementations).  If we come across a non-IS SMP implementation, we'll
+ * have to use an IPI based mechanism. Until then, we stick to the simple
+ * hardware assisted version.
+ */
 ENTRY(__kvm_tlb_flush_vmid)
+	hvc	#0			@ Switch to Hyp mode
+	push	{r2, r3}
+
+	add	r0, r0, #KVM_VTTBR
+	ldrd	r2, r3, [r0]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+	isb
+	mcr     p15, 0, r0, c8, c3, 0	@ TLBIALLIS (rt ignored)
+	dsb
+	isb
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Back to VMID #0
+	isb
+
+	pop	{r2, r3}
+	hvc	#0			@ Back to SVC
 	bx	lr
 ENDPROC(__kvm_tlb_flush_vmid)
 
@@ -42,26 +74,701 @@ ENDPROC(__kvm_tlb_flush_vmid)
 @  Flush TLBs and instruction caches of current CPU for all VMIDs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * void __kvm_flush_vm_context(void);
+ */
 ENTRY(__kvm_flush_vm_context)
+	hvc	#0			@ switch to hyp-mode
+
+	mov	r0, #0			@ rn parameter for c15 flushes is SBZ
+	mcr     p15, 4, r0, c8, c7, 4   @ Invalidate Non-secure Non-Hyp TLB
+	mcr     p15, 0, r0, c7, c5, 0   @ Invalidate instruction caches
+	dsb
+	isb
+
+	hvc	#0			@ switch back to svc-mode, see hyp_svc
 	bx	lr
 ENDPROC(__kvm_flush_vm_context)
 
+/* Clobbers {r2-r6} */
+.macro store_vfp_state vfp_base
+	@ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
+	VFPFMRX	r2, FPEXC
+	@ Make sure VFP is enabled so we can touch the registers.
+	orr	r6, r2, #FPEXC_EN
+	VFPFMXR	FPEXC, r6
+
+	VFPFMRX	r3, FPSCR
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	@ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
+	@ we only need to save them if FPEXC_EX is set.
+	VFPFMRX r4, FPINST
+	tst	r2, #FPEXC_FP2V
+	VFPFMRX r5, FPINST2, ne		@ vmrsne
+	bic	r6, r2, #FPEXC_EX	@ FPEXC_EX disable
+	VFPFMXR	FPEXC, r6
+1:
+	VFPFSTMIA \vfp_base, r6		@ Save VFP registers
+	stm	\vfp_base, {r2-r5}	@ Save FPEXC, FPSCR, FPINST, FPINST2
+.endm
+
+/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
+.macro restore_vfp_state vfp_base
+	VFPFLDMIA \vfp_base, r6		@ Load VFP registers
+	ldm	\vfp_base, {r2-r5}	@ Load FPEXC, FPSCR, FPINST, FPINST2
+
+	VFPFMXR FPSCR, r3
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	VFPFMXR FPINST, r4
+	tst	r2, #FPEXC_FP2V
+	VFPFMXR FPINST2, r5, ne
+1:
+	VFPFMXR FPEXC, r2	@ FPEXC	(last, in case !EN)
+.endm
+
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor world-switch code
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/* These are simply for the macros to work - the values don't have meaning */
+.equ usr, 0
+.equ svc, 1
+.equ abt, 2
+.equ und, 3
+.equ irq, 4
+.equ fiq, 5
+
+.macro store_mode_state base_reg, mode
+	.if \mode == usr
+	mrs	r2, SP_usr
+	mov	r3, lr
+	stmdb	\base_reg!, {r2, r3}
+	.elseif \mode != fiq
+	mrs	r2, SP_\mode
+	mrs	r3, LR_\mode
+	mrs	r4, SPSR_\mode
+	stmdb	\base_reg!, {r2, r3, r4}
+	.else
+	mrs	r2, r8_fiq
+	mrs	r3, r9_fiq
+	mrs	r4, r10_fiq
+	mrs	r5, r11_fiq
+	mrs	r6, r12_fiq
+	mrs	r7, SP_fiq
+	mrs	r8, LR_fiq
+	mrs	r9, SPSR_fiq
+	stmdb	\base_reg!, {r2-r9}
+	.endif
+.endm
+
+.macro load_mode_state base_reg, mode
+	.if \mode == usr
+	ldmia	\base_reg!, {r2, r3}
+	msr	SP_usr, r2
+	mov	lr, r3
+	.elseif \mode != fiq
+	ldmia	\base_reg!, {r2, r3, r4}
+	msr	SP_\mode, r2
+	msr	LR_\mode, r3
+	msr	SPSR_\mode, r4
+	.else
+	ldmia	\base_reg!, {r2-r9}
+	msr	r8_fiq, r2
+	msr	r9_fiq, r3
+	msr	r10_fiq, r4
+	msr	r11_fiq, r5
+	msr	r12_fiq, r6
+	msr	SP_fiq, r7
+	msr	LR_fiq, r8
+	msr	SPSR_fiq, r9
+	.endif
+.endm
+
+/* Reads cp15 registers from hardware and stores them in memory
+ * @vcpu:   If 0, registers are written in-order to the stack,
+ * 	    otherwise to the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro read_cp15_state vcpu=0, vcpup
+	mrc	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mrc	p15, 0, r3, c1, c0, 2	@ CPACR
+	mrc	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mrc	p15, 0, r5, c3, c0, 0	@ DACR
+	mrrc	p15, 0, r6, r7, c2	@ TTBR 0
+	mrrc	p15, 1, r8, r9, c2	@ TTBR 1
+	mrc	p15, 0, r10, c10, c2, 0	@ PRRR
+	mrc	p15, 0, r11, c10, c2, 1	@ NMRR
+
+	.if \vcpu == 0
+	push	{r2-r11}		@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_SCTLR]
+	str	r3, [\vcpup, #VCPU_CPACR]
+	str	r4, [\vcpup, #VCPU_TTBCR]
+	str	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	strd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	strd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	str	r10, [\vcpup, #VCPU_PRRR]
+	str	r11, [\vcpup, #VCPU_NMRR]
+	.endif
+
+	mrc	p15, 0, r2, c13, c0, 1	@ CID
+	mrc	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mrc	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mrc	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	mrc	p15, 0, r6, c5, c0, 0	@ DFSR
+	mrc	p15, 0, r7, c5, c0, 1	@ IFSR
+	mrc	p15, 0, r8, c5, c1, 0	@ ADFSR
+	mrc	p15, 0, r9, c5, c1, 1	@ AIFSR
+	mrc	p15, 0, r10, c6, c0, 0	@ DFAR
+	mrc	p15, 0, r11, c6, c0, 2	@ IFAR
+	mrc	p15, 0, r12, c12, c0, 0	@ VBAR
+
+	.if \vcpu == 0
+	push	{r2-r12}		@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_CID]
+	str	r3, [\vcpup, #VCPU_TID_URW]
+	str	r4, [\vcpup, #VCPU_TID_URO]
+	str	r5, [\vcpup, #VCPU_TID_PRIV]
+	str	r6, [\vcpup, #VCPU_DFSR]
+	str	r7, [\vcpup, #VCPU_IFSR]
+	str	r8, [\vcpup, #VCPU_ADFSR]
+	str	r9, [\vcpup, #VCPU_AIFSR]
+	str	r10, [\vcpup, #VCPU_DFAR]
+	str	r11, [\vcpup, #VCPU_IFAR]
+	str	r12, [\vcpup, #VCPU_VBAR]
+	.endif
+.endm
+
+/* Reads cp15 registers from memory and writes them to hardware
+ * @vcpu:   If 0, registers are read in-order from the stack,
+ * 	    otherwise from the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro write_cp15_state vcpu=0, vcpup
+	.if \vcpu == 0
+	pop	{r2-r12}
+	.else
+	ldr	r2, [\vcpup, #VCPU_CID]
+	ldr	r3, [\vcpup, #VCPU_TID_URW]
+	ldr	r4, [\vcpup, #VCPU_TID_URO]
+	ldr	r5, [\vcpup, #VCPU_TID_PRIV]
+	ldr	r6, [\vcpup, #VCPU_DFSR]
+	ldr	r7, [\vcpup, #VCPU_IFSR]
+	ldr	r8, [\vcpup, #VCPU_ADFSR]
+	ldr	r9, [\vcpup, #VCPU_AIFSR]
+	ldr	r10, [\vcpup, #VCPU_DFAR]
+	ldr	r11, [\vcpup, #VCPU_IFAR]
+	ldr	r12, [\vcpup, #VCPU_VBAR]
+	.endif
+
+	mcr	p15, 0, r2, c13, c0, 1	@ CID
+	mcr	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mcr	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mcr	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	mcr	p15, 0, r6, c5, c0, 0	@ DFSR
+	mcr	p15, 0, r7, c5, c0, 1	@ IFSR
+	mcr	p15, 0, r8, c5, c1, 0	@ ADFSR
+	mcr	p15, 0, r9, c5, c1, 1	@ AIFSR
+	mcr	p15, 0, r10, c6, c0, 0	@ DFAR
+	mcr	p15, 0, r11, c6, c0, 2	@ IFAR
+	mcr	p15, 0, r12, c12, c0, 0	@ VBAR
+
+	.if \vcpu == 0
+	pop	{r2-r11}
+	.else
+	ldr	r2, [\vcpup, #VCPU_SCTLR]
+	ldr	r3, [\vcpup, #VCPU_CPACR]
+	ldr	r4, [\vcpup, #VCPU_TTBCR]
+	ldr	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	ldrd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	ldrd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	ldr	r10, [\vcpup, #VCPU_PRRR]
+	ldr	r11, [\vcpup, #VCPU_NMRR]
+	.endif
+
+	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mcr	p15, 0, r3, c1, c0, 2	@ CPACR
+	mcr	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mcr	p15, 0, r5, c3, c0, 0	@ DACR
+	mcrr	p15, 0, r6, r7, c2	@ TTBR 0
+	mcrr	p15, 1, r8, r9, c2	@ TTBR 1
+	mcr	p15, 0, r10, c10, c2, 0	@ PRRR
+	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
+.endm
+
+/* Configures the HSTR (Hyp System Trap Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hstr entry
+	mrc	p15, 4, r2, c1, c1, 3
+	ldr	r3, =HSTR_T(15)
+	.if \entry == 1
+	orr	r2, r2, r3		@ Trap CR{15}
+	.else
+	bic	r2, r2, r3		@ Don't trap any CRx accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 3
+.endm
+
+/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
+ * (hardware reset value is 0). Keep previous value in r2. */
+.macro set_hcptr entry, mask
+	mrc	p15, 4, r2, c1, c1, 2
+	ldr	r3, =\mask
+	.if \entry == 1
+	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
+	.else
+	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
+	.endif
+	mcr	p15, 4, r3, c1, c1, 2
+.endm
+
+/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hdcr entry
+	mrc	p15, 4, r2, c1, c1, 1
+	ldr	r3, =(HDCR_TPM|HDCR_TPMCR)
+	.if \entry == 1
+	orr	r2, r2, r3		@ Trap some perfmon accesses
+	.else
+	bic	r2, r2, r3		@ Don't trap any perfmon accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 1
+.endm
+
+/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
+.macro configure_hyp_role entry, vcpu_ptr
+	mrc	p15, 4, r2, c1, c1, 0	@ HCR
+	bic	r2, r2, #HCR_VIRT_EXCP_MASK
+	ldr	r3, =HCR_GUEST_MASK
+	.if \entry == 1
+	orr	r2, r2, r3
+	ldr	r3, [\vcpu_ptr, #VCPU_IRQ_LINES]
+	orr	r2, r2, r3
+	.else
+	bic	r2, r2, r3
+	.endif
+	mcr	p15, 4, r2, c1, c1, 0
+.endm
+
+@ Arguments:
+@  r0: pointer to vcpu struct
 ENTRY(__kvm_vcpu_run)
-	bx	lr
+	hvc	#0			@ switch to hyp-mode
+
+	@ Now we're in Hyp-mode and lr_usr, spsr_hyp are on the stack
+	mrs	r2, sp_usr
+	push	{r2}			@ Push r13_usr
+	push	{r4-r12}		@ Push r4-r12
+
+	store_mode_state sp, svc
+	store_mode_state sp, abt
+	store_mode_state sp, und
+	store_mode_state sp, irq
+	store_mode_state sp, fiq
+
+	@ Store hardware CP15 state and load guest state
+	read_cp15_state
+	write_cp15_state 1, r0
+
+	@ If the host kernel has not been configured with VFPv3 support,
+	@ then it is safer if we prevent guests from using it as well.
+#ifdef CONFIG_VFPv3
+	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
+	VFPFMRX r2, FPEXC		@ VMRS
+	push	{r2}
+	orr	r2, r2, #FPEXC_EN
+	VFPFMXR FPEXC, r2		@ VMSR
+#endif
+
+	push	{r0}			@ Push the VCPU pointer
+
+	@ Configure Hyp-role
+	configure_hyp_role 1, r0
+
+	@ Trap coprocessor CRx accesses
+	set_hstr 1
+	set_hcptr 1, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+	set_hdcr 1
+
+	@ Write configured ID register into MIDR alias
+	ldr	r1, [r0, #VCPU_MIDR]
+	mcr	p15, 4, r1, c0, c0, 0
+
+	@ Write guest view of MPIDR into VMPIDR
+	ldr	r1, [r0, #VCPU_MPIDR]
+	mcr	p15, 4, r1, c0, c0, 5
+
+	@ Load guest registers
+	add	r0, r0, #(VCPU_USR_SP)
+	load_mode_state r0, usr
+	load_mode_state r0, svc
+	load_mode_state r0, abt
+	load_mode_state r0, und
+	load_mode_state r0, irq
+	load_mode_state r0, fiq
+
+	@ Load return state (r0 now points to vcpu->arch.regs.pc)
+	ldmia	r0, {r2, r3}
+	msr	ELR_hyp, r2
+	msr	SPSR_cxsf, r3
+
+	@ Set up guest memory translation
+	sub	r1, r0, #(VCPU_PC - VCPU_KVM)	@ r1 points to kvm struct
+	ldr	r1, [r1]
+	add	r1, r1, #KVM_VTTBR
+	ldrd	r2, r3, [r1]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Load remaining registers and do the switch
+	sub	r0, r0, #(VCPU_PC - VCPU_USR_REGS)
+	ldmia	r0, {r0-r12}
+	eret
+
+__kvm_vcpu_return:
+	@ Set VMID == 0
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Store return state
+	mrs	r2, ELR_hyp
+	mrs	r3, spsr
+	str	r2, [r1, #VCPU_PC]
+	str	r3, [r1, #VCPU_CPSR]
+
+	@ Store guest registers
+	add	r1, r1, #(VCPU_FIQ_SPSR + 4)
+	store_mode_state r1, fiq
+	store_mode_state r1, irq
+	store_mode_state r1, und
+	store_mode_state r1, abt
+	store_mode_state r1, svc
+	store_mode_state r1, usr
+	sub	r1, r1, #(VCPU_USR_REG(13))
+
+	@ Don't trap coprocessor accesses for host kernel
+	set_hstr 0
+	set_hcptr 0, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+	set_hdcr 0
+
+#ifdef CONFIG_VFPv3
+	@ Save the floating point registers if we let the guest use them.
+	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
+	bne	after_vfp_restore
+
+	@ Switch VFP/NEON hardware state to the host's
+	add	r7, r1, #VCPU_VFP_GUEST
+	store_vfp_state r7
+	add	r7, r1, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	restore_vfp_state r7
 
+after_vfp_restore:
+	@ Restore FPEXC_EN which we clobbered on entry
+	pop	{r2}
+	VFPFMXR FPEXC, r2
+#endif
+
+	@ Reset Hyp-role
+	configure_hyp_role 0, r1
+
+	@ Let host read hardware MIDR
+	mrc	p15, 0, r2, c0, c0, 0
+	mcr	p15, 4, r2, c0, c0, 0
+
+	@ Back to hardware MPIDR
+	mrc	p15, 0, r2, c0, c0, 5
+	mcr	p15, 4, r2, c0, c0, 5
+
+	@ Store guest CP15 state and restore host state
+	read_cp15_state 1, r1
+	write_cp15_state
+
+	load_mode_state sp, fiq
+	load_mode_state sp, irq
+	load_mode_state sp, und
+	load_mode_state sp, abt
+	load_mode_state sp, svc
+
+	pop	{r4-r12}		@ Pop r4-r12
+	pop	{r2}			@ Pop r13_usr
+	msr	sp_usr, r2
+
+	hvc	#0			@ switch back to svc-mode, see hyp_svc
+
+	bx	lr			@ return to IOCTL
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor exception vector and handlers
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * The KVM/ARM Hypervisor ABI is defined as follows:
+ *
+ * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
+ * instruction is issued since all traps are disabled when running the host
+ * kernel as per the Hyp-mode initialization at boot time.
+ *
+ * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
+ * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
+ * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
+ * instructions are called from within Hyp-mode.
+ *
+ * Hyp-ABI: Switching from host kernel to Hyp-mode:
+ *    Switching to Hyp mode is done through a simple HVC instruction. The
+ *    exception vector code will check that the HVC comes from VMID==0 and if
+ *    so will store the necessary state on the Hyp stack, which will look like
+ *    this (growing downwards, see the hyp_hvc handler):
+ *      ...
+ *      stack_page + 4: spsr (Host-SVC cpsr)
+ *      stack_page    : lr_usr
+ *      --------------: stack bottom
+ *
+ * Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
+ *    When returning from Hyp mode to SVC mode, another HVC instruction is
+ *    executed from Hyp mode, which is taken in the hyp_svc handler. The
+ *    bottom of the Hyp stack is derived from the Hyp stack pointer (only a single
+ *    page aligned stack is used per CPU) and the initial SVC registers are
+ *    used to restore the host state.
+ *
+ *
+ * Note that the above is used to execute code in Hyp-mode from a host-kernel
+ * point of view, and is a different concept from performing a world-switch and
+ * executing guest code SVC mode (with a VMID != 0).
+ */
+
+@ Handle undef, svc, pabt, or dabt by crashing with a user notice
+.macro bad_exception exception_code, panic_str
+	mrrc	p15, 6, r2, r3, c2	@ Read VTTBR
+	lsr	r3, r3, #16
+	ands	r3, r3, #0xff
+
+	@ COND:neq means we're probably in the guest and we can try fetching
+	@ the vcpu pointer and stuff off the stack and keep our fingers crossed
+	beq	99f
+	mov	r0, #\exception_code
+	pop	{r1}			@ Load VCPU pointer
+	.if \exception_code == ARM_EXCEPTION_DATA_ABORT
+	mrc	p15, 4, r2, c5, c2, 0	@ HSR
+	mrc	p15, 4, r3, c6, c0, 0	@ HDFAR
+	str	r2, [r1, #VCPU_HSR]
+	str	r3, [r1, #VCPU_HDFAR]
+	.endif
+	.if \exception_code == ARM_EXCEPTION_PREF_ABORT
+	mrc	p15, 4, r2, c5, c2, 0	@ HSR
+	mrc	p15, 4, r3, c6, c0, 2	@ HIFAR
+	str	r2, [r1, #VCPU_HSR]
+	str	r3, [r1, #VCPU_HIFAR]
+	.endif
+	mrs	r2, ELR_hyp
+	str	r2, [r1, #VCPU_HYP_PC]
+	b	__kvm_vcpu_return
+
+	@ We were in the host already
+99:	hvc	#0	@ switch to SVC mode
+	ldr	r0, \panic_str
+	mrs	r1, ELR_hyp
+	b	panic
+
+.endm
+
+	.text
+
 	.align 5
 __kvm_hyp_vector:
 	.globl __kvm_hyp_vector
-	nop
+
+	@ Hyp-mode exception vector
+	W(b)	hyp_reset
+	W(b)	hyp_undef
+	W(b)	hyp_svc
+	W(b)	hyp_pabt
+	W(b)	hyp_dabt
+	W(b)	hyp_hvc
+	W(b)	hyp_irq
+	W(b)	hyp_fiq
+
+	.align
+hyp_reset:
+	b	hyp_reset
+
+	.align
+hyp_undef:
+	bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
+
+	.align
+hyp_svc:
+	@ Can only get here if HVC or SVC is called from Hyp mode, which means
+	@ we want to change mode back to SVC mode.
+	push	{r12}
+	mov	r12, sp
+	bic	r12, r12, #0x0ff
+	bic	r12, r12, #0xf00
+	ldr	lr, [r12, #4]
+	msr	SPSR_csxf, lr
+	ldr	lr, [r12]
+	pop	{r12}
+	eret
+
+	.align
+hyp_pabt:
+	bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
+
+	.align
+hyp_dabt:
+	bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
+
+	.align
+hyp_hvc:
+	@ Getting here is either because of a trap from a guest or from calling
+	@ HVC from the host kernel, which means "switch to Hyp mode".
+	push	{r0, r1, r2}
+
+	@ Check syndrome register
+	mrc	p15, 4, r0, c5, c2, 0	@ HSR
+	lsr	r1, r0, #HSR_EC_SHIFT
+#ifdef CONFIG_VFPv3
+	cmp	r1, #HSR_EC_CP_0_13
+	beq	switch_to_guest_vfp
+#endif
+	cmp	r1, #HSR_EC_HVC
+	bne	guest_trap		@ Not HVC instr.
+
+	@ Let's check if the HVC came from VMID 0 and allow simple
+	@ switch to Hyp mode
+	mrrc    p15, 6, r1, r2, c2
+	lsr     r2, r2, #16
+	and     r2, r2, #0xff
+	cmp     r2, #0
+	bne	guest_trap		@ Guest called HVC
+
+	@ Store lr_usr,spsr (svc cpsr) on bottom of stack
+	mov	r1, sp
+	bic	r1, r1, #0x0ff
+	bic	r1, r1, #0xf00
+	str	lr, [r1]
+	mrs	lr, spsr
+	str	lr, [r1, #4]
+
+	pop	{r0, r1, r2}
+
+	@ Return to caller in Hyp mode
+	mrs	lr, ELR_hyp
+	mov	pc, lr
+
+guest_trap:
+	ldr	r1, [sp, #12]		@ Load VCPU pointer
+	str	r0, [r1, #VCPU_HSR]
+	add	r1, r1, #VCPU_USR_REG(3)
+	stmia	r1, {r3-r12}
+	sub	r1, r1, #(VCPU_USR_REG(3) - VCPU_USR_REG(0))
+	pop	{r3, r4, r5}
+	add	sp, sp, #4		@ We loaded the VCPU pointer above
+	stmia	r1, {r3, r4, r5}
+	sub	r1, r1, #VCPU_USR_REG(0)
+
+	@ Check if we need the fault information
+	lsr	r2, r0, #HSR_EC_SHIFT
+	cmp	r2, #HSR_EC_IABT
+	beq	2f
+	cmpne	r2, #HSR_EC_DABT
+	bne	1f
+
+	@ For non-valid data aborts, get the offending instr. PA
+	lsr	r2, r0, #HSR_ISV_SHIFT
+	ands	r2, r2, #1
+	bne	2f
+	mrs	r3, ELR_hyp
+	mrs	r7, spsr
+	and	r7, r7, #0xf
+	cmp	r7, #0			@ fault happened in user mode?
+	mcreq	p15, 0, r3, c7, c8, 2	@ VA to PA, ATS1CUR
+	mcrne	p15, 0, r3, c7, c8, 0	@ VA to PA, ATS1CPR
+	mrrc	p15, 0, r4, r5, c7	@ PAR
+	add	r6, r1, #VCPU_PC_IPA
+	strd	r4, r5, [r6]
+
+	@ Check if we might have a wide thumb instruction spill-over
+	ldr	r5, =0xfff
+	bic	r4, r3, r5		@ clear page mask
+	sub	r5, r5, #1		@ last 2-byte page boundary, 0xffe
+	cmp	r4, r5
+	bne	2f
+	add	r4, r3, #2		@ _really_ unlikely!
+	cmp	r7, #0			@ fault happened in user mode?
+	mcreq	p15, 0, r4, c7, c8, 2	@ VA to PA, ATS1CUR
+	mcrne	p15, 0, r4, c7, c8, 0	@ VA to PA, ATS1CPR
+	mrrc	p15, 0, r4, r5, c7	@ PAR
+	add	r6, r1, #VCPU_PC_IPA2
+	strd	r4, r5, [r6]
+
+2:	mrc	p15, 4, r2, c6, c0, 0	@ HDFAR
+	mrc	p15, 4, r3, c6, c0, 2	@ HIFAR
+	mrc	p15, 4, r4, c6, c0, 4	@ HPFAR
+	add	r5, r1, #VCPU_HDFAR
+	stmia	r5, {r2, r3, r4}
+
+1:	mov	r0, #ARM_EXCEPTION_HVC
+	b	__kvm_vcpu_return
+
+@ If VFPv3 support is not available, then we will not switch the VFP
+@ registers; however cp10 and cp11 accesses will still trap and fall back
+@ to the regular coprocessor emulation code, which currently will
+@ inject an undefined exception to the guest.
+#ifdef CONFIG_VFPv3
+switch_to_guest_vfp:
+	ldr	r0, [sp, #12]		@ Load VCPU pointer
+	push	{r3-r7}
+
+	@ NEON/VFP used.  Turn on VFP access.
+	set_hcptr 0, (HCPTR_TCP(10) | HCPTR_TCP(11))
+
+	@ Switch VFP/NEON hardware state to the guest's
+	add	r7, r0, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	store_vfp_state r7
+	add	r7, r0, #VCPU_VFP_GUEST
+	restore_vfp_state r7
+
+	pop	{r3-r7}
+	pop	{r0-r2}
+	eret
+#endif
+
+	.align
+hyp_irq:
+	push	{r0}
+	ldr	r0, [sp, #4]		@ Load VCPU pointer
+	add	r0, r0, #(VCPU_USR_REG(1))
+	stmia	r0, {r1-r12}
+	pop	{r0, r1}		@ r1 == vcpu pointer
+	str	r0, [r1, #VCPU_USR_REG(0)]
+
+	mov	r0, #ARM_EXCEPTION_IRQ
+	b	__kvm_vcpu_return
+
+	.align
+hyp_fiq:
+	b	hyp_fiq
+
+	.ltorg
+
+und_die_str:
+	.ascii	"unexpected undefined exception in Hyp mode at: %#08x"
+pabt_die_str:
+	.ascii	"unexpected prefetch abort in Hyp mode at: %#08x"
+dabt_die_str:
+	.ascii	"unexpected data abort in Hyp mode at: %#08x"
 
 /*
  * The below lines makes sure the HYP mode code fits in a single page (the



* [PATCH v10 10/14] KVM: ARM: Emulation framework and CP15 emulation
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (8 preceding siblings ...)
  2012-08-16 15:29 ` [PATCH v10 09/14] KVM: ARM: World-switch implementation Christoffer Dall
@ 2012-08-16 15:29 ` Christoffer Dall
  2012-08-16 15:30 ` [PATCH v10 11/14] KVM: ARM: User space API for getting/setting co-proc registers Christoffer Dall
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:29 UTC (permalink / raw)
  To: kvmarm, kvm

Adds an important new function to the main KVM/ARM code called
handle_exit() which is called from kvm_arch_vcpu_ioctl_run() on returns
from guest execution. This function examines the Hyp-Syndrome-Register
(HSR), which contains information telling KVM what caused the exit from
the guest.

Some of the reasons for an exit are CP15 accesses, which are
not allowed from the guest, and this commit handles these exits by
emulating the intended operation in software and skipping the guest
instruction.
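
In rough C terms (condensed from the handle_exit() changes below), the
dispatch is a simple table lookup on the HSR exception class:

  hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;
  if (hsr_ec < ARRAY_SIZE(arm_exit_handlers) && arm_exit_handlers[hsr_ec])
          return arm_exit_handlers[hsr_ec](vcpu, run);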

Minor notes about the coproc register reset:
1) We reserve a value of 0 as an invalid cp15 offset, to catch bugs in our
   table, at the cost of 4 bytes per vcpu.

2) Added comments on the table indicating how we handle each register, for
   simplicity of understanding.


Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    9 +
 arch/arm/include/asm/kvm_coproc.h  |    7 
 arch/arm/include/asm/kvm_emulate.h |    5 
 arch/arm/include/asm/kvm_host.h    |    5 
 arch/arm/kvm/arm.c                 |  166 ++++++++++
 arch/arm/kvm/coproc.c              |  572 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/emulate.c             |  120 ++++++++
 arch/arm/kvm/trace.h               |   28 ++
 8 files changed, 910 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index ee345a6..ae586c1 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -76,6 +76,11 @@
 			HCR_SWIO | HCR_TIDCP)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
+/* System Control Register (SCTLR) bits */
+#define SCTLR_TE	(1 << 30)
+#define SCTLR_EE	(1 << 25)
+#define SCTLR_V		(1 << 13)
+
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
 #define HSCTLR_EE	(1 << 25)
@@ -153,6 +158,10 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_CV_SHIFT	(24)
+#define HSR_CV		(1U << HSR_CV_SHIFT)
+#define HSR_COND_SHIFT	(20)
+#define HSR_COND	(0xfU << HSR_COND_SHIFT)
 
 #define HSR_EC_UNKNOWN	(0x00)
 #define HSR_EC_WFI	(0x01)
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index b6d023d..c451fb4 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -21,4 +21,11 @@
 
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_coproc_table_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 9e29335..d914029 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -51,6 +51,11 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
 	return mode;
 }
 
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_adjust_itstate(struct kvm_vcpu *vcpu);
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
+void kvm_inject_undefined(struct kvm_vcpu *vcpu);
+
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
  */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 5414eeb..778d2af 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -102,6 +102,7 @@ enum cp15_regs {
 	c5_AIFSR,		/* Auxilary Instruction Fault Status Register */
 	c6_DFAR,		/* Data Fault Address Register */
 	c6_IFAR,		/* Instruction Fault Address Register */
+	c9_L2CTLR,		/* Cortex A15 L2 Control Register */
 	c10_PRRR,		/* Primary Region Remap Register */
 	c10_NMRR,		/* Normal Memory Remap Register */
 	c12_VBAR,		/* Vector Base Address Register */
@@ -142,6 +143,10 @@ struct kvm_vcpu_arch {
 	 * Anything that is not used directly from assembly code goes
 	 * here.
 	 */
+	/* dcache set/way operation pending */
+	int last_pcpu;
+	cpumask_t require_dcache_flush;
+
 	/* IO related fields */
 	bool mmio_sign_extend;	/* for byte/halfword loads */
 	u32 mmio_rd;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 28bf2c2..8eec273 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -37,10 +37,13 @@
 #include <asm/cputype.h>
 #include <asm/idmap.h>
 #include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/opcodes.h>
 
 #ifdef REQUIRES_SEC
 __asm__(".arch_extension	sec");
@@ -274,6 +277,17 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
 	vcpu->arch.vfp_host = __get_cpu_var(kvm_host_vfp_state);
+
+	/*
+	 * Check whether this vcpu requires the cache to be flushed on
+	 * this physical CPU. This is a consequence of doing dcache
+	 * operations by set/way on this vcpu. We do it here to be in
+	 * a non-preemptible section.
+	 */
+	if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
+		cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
+	}
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -374,6 +388,114 @@ static void update_vttbr(struct kvm *kvm)
 	spin_unlock(&kvm_vmid_lock);
 }
 
+static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* SVC called from Hyp mode should never get here */
+	kvm_debug("SVC called from Hyp mode shouldn't go here\n");
+	BUG();
+	return -EINVAL; /* Squash warning */
+}
+
+static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * Guest called HVC instruction:
+	 * Let it know we don't want that by injecting an undefined exception.
+	 */
+	kvm_debug("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
+				     vcpu->arch.regs.pc);
+	kvm_debug("         HSR: %8x", vcpu->arch.hsr);
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* We don't support SMC; don't do that. */
+	kvm_debug("smc: at %08x", vcpu->arch.regs.pc);
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_pabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* The hypervisor should never cause aborts */
+	kvm_err("Prefetch Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hifar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* This is either an error in the world-switch code or an external abort */
+	kvm_err("Data Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hdfar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);
+static exit_handle_fn arm_exit_handlers[] = {
+	[HSR_EC_WFI]		= kvm_handle_wfi,
+	[HSR_EC_CP15_32]	= kvm_handle_cp15_32,
+	[HSR_EC_CP15_64]	= kvm_handle_cp15_64,
+	[HSR_EC_CP14_MR]	= kvm_handle_cp14_access,
+	[HSR_EC_CP14_LS]	= kvm_handle_cp14_load_store,
+	[HSR_EC_CP14_64]	= kvm_handle_cp14_access,
+	[HSR_EC_CP_0_13]	= kvm_handle_cp_0_13_access,
+	[HSR_EC_CP10_ID]	= kvm_handle_cp10_id,
+	[HSR_EC_SVC_HYP]	= handle_svc_hyp,
+	[HSR_EC_HVC]		= handle_hvc,
+	[HSR_EC_SMC]		= handle_smc,
+	[HSR_EC_IABT]		= kvm_handle_guest_abort,
+	[HSR_EC_IABT_HYP]	= handle_pabt_hyp,
+	[HSR_EC_DABT]		= kvm_handle_guest_abort,
+	[HSR_EC_DABT_HYP]	= handle_dabt_hyp,
+};
+
+/*
+ * A conditional instruction is allowed to trap, even though it
+ * wouldn't be executed.  So let's re-implement the hardware, in
+ * software!
+ */
+static bool kvm_condition_valid(struct kvm_vcpu *vcpu)
+{
+	unsigned long cpsr, cond, insn;
+
+	/*
+	 * Exception Code 0 can only happen if we set HCR.TGE to 1, to
+	 * catch undefined instructions, and then we won't get past
+	 * the arm_exit_handlers test anyway.
+	 */
+	BUG_ON(((vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT) == 0);
+
+	/* Top two bits non-zero?  Unconditional. */
+	if (vcpu->arch.hsr >> 30)
+		return true;
+
+	cpsr = *vcpu_cpsr(vcpu);
+
+	/* Is condition field valid? */
+	if ((vcpu->arch.hsr & HSR_CV) >> HSR_CV_SHIFT)
+		cond = (vcpu->arch.hsr & HSR_COND) >> HSR_COND_SHIFT;
+	else {
+		/* This can happen in Thumb mode: examine IT state. */
+		unsigned long it;
+
+		it = ((cpsr >> 8) & 0xFC) | ((cpsr >> 25) & 0x3);
+
+		/* it == 0 => unconditional. */
+		if (it == 0)
+			return true;
+
+		/* The cond for this insn works out as the top 4 bits. */
+		cond = (it >> 4);
+	}
+
+	/* Shift makes it look like an ARM-mode instruction */
+	insn = cond << 28;
+	return arm_check_condition(insn, cpsr) != ARM_OPCODE_CONDTEST_FAIL;
+}
+
 /*
  * Return 0 to return to guest, < 0 on error, exit_reason ( > 0) on proper
  * exit to QEMU.
@@ -381,8 +503,46 @@ static void update_vttbr(struct kvm *kvm)
 static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		       int exception_index)
 {
-	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	return 0;
+	unsigned long hsr_ec;
+
+	switch (exception_index) {
+	case ARM_EXCEPTION_IRQ:
+		return 1;
+	case ARM_EXCEPTION_UNDEFINED:
+		kvm_err("Undefined exception in Hyp mode at: %#08x\n",
+			vcpu->arch.hyp_pc);
+		BUG();
+		panic("KVM: Hypervisor undefined exception!\n");
+	case ARM_EXCEPTION_DATA_ABORT:
+	case ARM_EXCEPTION_PREF_ABORT:
+	case ARM_EXCEPTION_HVC:
+		hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;
+
+		if (hsr_ec >= ARRAY_SIZE(arm_exit_handlers)
+		    || !arm_exit_handlers[hsr_ec]) {
+			kvm_err("Unknown exception class: %#08lx, "
+				"hsr: %#08x\n", hsr_ec,
+				(unsigned int)vcpu->arch.hsr);
+			BUG();
+		}
+
+		/*
+		 * See ARM ARM B1.14.1: "Hyp traps on instructions
+		 * that fail their condition code check"
+		 */
+		if (!kvm_condition_valid(vcpu)) {
+			bool is_wide = vcpu->arch.hsr & HSR_IL;
+			kvm_skip_instr(vcpu, is_wide);
+			return 1;
+		}
+
+		return arm_exit_handlers[hsr_ec](vcpu, run);
+	default:
+		kvm_pr_unimpl("Unsupported exception type: %d",
+			      exception_index);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		return 0;
+	}
 }
 
 /**
@@ -445,6 +605,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = __kvm_vcpu_run(vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
+		vcpu->arch.last_pcpu = smp_processor_id();
 		kvm_guest_exit();
 		trace_kvm_exit(vcpu->arch.regs.pc);
 		/*
@@ -713,6 +874,7 @@ int kvm_arch_init(void *opaque)
 	if (err)
 		goto out_err;
 
+	kvm_coproc_table_init();
 	return 0;
 out_err:
 	return err;
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 4b9dad8..eeb8376 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -15,8 +15,580 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+#include <linux/mm.h>
 #include <linux/kvm_host.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <trace/events/kvm.h>
 
+#include "trace.h"
+
+/******************************************************************************
+ * Co-processor emulation
+ *****************************************************************************/
+
+struct coproc_params {
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+	unsigned long Rt1;
+	unsigned long Rt2;
+	bool is_64bit;
+	bool is_write;
+};
+
+struct coproc_reg {
+	/* MRC/MCR/MRRC/MCRR instruction which accesses it. */
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+
+	bool is_64;
+
+	/* Trapped access from guest, if non-NULL. */
+	bool (*access)(struct kvm_vcpu *,
+		       const struct coproc_params *,
+		       const struct coproc_reg *);
+
+	/* Initialization for vcpu. */
+	void (*reset)(struct kvm_vcpu *, const struct coproc_reg *);
+
+	/* Index into vcpu->arch.cp15[], or 0 if we don't need to save it. */
+	enum cp15_regs reg;
+
+	/* Value (usually reset value) */
+	u64 val;
+};
+
+static void print_cp_instr(const struct coproc_params *p)
+{
+	/* Look, we even formatted it for you to paste into the table! */
+	if (p->is_64bit) {
+		kvm_err(" { CRm(%2lu), Op1(%2lu), is64, func_%s },\n",
+			p->CRm, p->Op1, p->is_write ? "write" : "read");
+	} else {
+		kvm_err(" { CRn(%2lu), CRm(%2lu), Op1(%2lu), Op2(%2lu), is32,"
+			" func_%s },\n",
+			p->CRn, p->CRm, p->Op1, p->Op2,
+			p->is_write ? "write" : "read");
+	}
+}
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * We can get here, if the host has been built without VFPv3 support,
+	 * but the guest attempted a floating point operation.
+	 */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static bool ignore_write(struct kvm_vcpu *vcpu, const struct coproc_params *p)
+{
+	return true;
+}
+
+static bool read_zero(struct kvm_vcpu *vcpu, const struct coproc_params *p)
+{
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+/* A15 TRM 4.3.48: R/O WI. */
+static bool access_l2ctlr(struct kvm_vcpu *vcpu,
+			  const struct coproc_params *p,
+			  const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c9_L2CTLR];
+	return true;
+}
+
+static void reset_l2ctlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 l2ctlr, ncores;
+
+	asm volatile("mrc p15, 1, %0, c9, c0, 2\n" : "=r" (l2ctlr));
+	l2ctlr &= ~(3 << 24);
+	ncores = atomic_read(&vcpu->kvm->online_vcpus) - 1;
+	l2ctlr |= (ncores & 3) << 24;
+
+	vcpu->arch.cp15[c9_L2CTLR] = l2ctlr;
+}
+
+/* A15 TRM 4.3.49: R/O WI (even if NSACR.NS_L2ERR, a write of 1 is ignored). */
+static bool access_l2ectlr(struct kvm_vcpu *vcpu,
+			   const struct coproc_params *p,
+			   const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+/* A15 TRM 4.3.60: R/O. */
+static bool access_cbar(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return false;
+	return read_zero(vcpu, p);
+}
+
+/* A15 TRM 4.3.28: RO WI */
+static bool access_actlr(struct kvm_vcpu *vcpu,
+			 const struct coproc_params *p,
+			 const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c1_ACTLR];
+	return true;
+}
+
+static void reset_actlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 actlr;
+
+	/* ACTLR contains SMP bit: make sure you create all cpus first! */
+	asm volatile("mrc p15, 0, %0, c1, c0, 1\n" : "=r" (actlr));
+	/* Make the SMP bit consistent with the guest configuration */
+	if (atomic_read(&vcpu->kvm->online_vcpus) > 1)
+		actlr |= 1U << 6;
+	else
+		actlr &= ~(1U << 6);
+
+	vcpu->arch.cp15[c1_ACTLR] = actlr;
+}
+
+/* See note at ARM ARM B1.14.4 */
+static bool access_dcsw(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	u32 val;
+	int cpu;
+
+	if (!p->is_write)
+		return false;
+
+	cpu = get_cpu();
+
+	cpumask_setall(&vcpu->arch.require_dcache_flush);
+	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+
+	/* If we were already preempted, take the long way around */
+	if (cpu != vcpu->arch.last_pcpu) {
+		flush_cache_all();
+		goto done;
+	}
+
+	val = *vcpu_reg(vcpu, p->Rt1);
+
+	switch (p->CRm) {
+	case 6:			/* Upgrade DCISW to DCCISW, as per HCR.SWIO */
+	case 14:		/* DCCISW */
+		asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
+		break;
+
+	case 10:		/* DCCSW */
+		asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
+		break;
+	}
+
+done:
+	put_cpu();
+
+	return true;
+}
+
+/*
+ * We could trap ID_DFR0 and tell the guest we don't support performance
+ * monitoring.  Unfortunately the patch to make the kernel check ID_DFR0 was
+ * NAKed, so it will read the PMCR anyway.
+ *
+ * Therefore we tell the guest we have 0 counters.  Unfortunately, we
+ * must always support PMCCNTR (the cycle counter): we just RAZ/WI for
+ * all PM registers, which doesn't crash the guest kernel at least.
+ */
+static bool pm_fake(struct kvm_vcpu *vcpu,
+		    const struct coproc_params *p,
+		    const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+	else
+		return read_zero(vcpu, p);
+}
+
+#define access_pmcr pm_fake
+#define access_pmcntenset pm_fake
+#define access_pmcntenclr pm_fake
+#define access_pmovsr pm_fake
+#define access_pmselr pm_fake
+#define access_pmceid0 pm_fake
+#define access_pmceid1 pm_fake
+#define access_pmccntr pm_fake
+#define access_pmxevtyper pm_fake
+#define access_pmxevcntr pm_fake
+#define access_pmuserenr pm_fake
+#define access_pmintenset pm_fake
+#define access_pmintenclr pm_fake
+
+/* Reset functions */
+static void reset_unknown(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+}
+
+static void reset_val(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = r->val;
+}
+
+static void reset_unknown64(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg + 1 >= ARRAY_SIZE(vcpu->arch.cp15));
+
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+	vcpu->arch.cp15[r->reg+1] = 0xd0c0ffee;
+}
+
+static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	/*
+	 * Compute guest MPIDR:
+	 * (Even if we present only one VCPU to the guest on an SMP
+	 * host we don't set the U bit in the MPIDR, or vice versa, as
+	 * revealing the underlying hardware properties is likely to
+	 * be the best choice).
+	 */
+	vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_CPUID)
+		| (vcpu->vcpu_id & MPIDR_CPUID);
+}
+
+#define CRn(_x)		.CRn = _x
+#define CRm(_x) 	.CRm = _x
+#define Op1(_x) 	.Op1 = _x
+#define Op2(_x) 	.Op2 = _x
+#define is64		.is_64 = true
+#define is32		.is_64 = false
+
+/* Architected CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg cp15_regs[] = {
+	/* TTBR0/TTBR1: swapped by interrupt.S. */
+	{ CRm( 2), Op1( 0), is64, NULL, reset_unknown64, c2_TTBR0 },
+	{ CRm( 2), Op1( 1), is64, NULL, reset_unknown64, c2_TTBR1 },
+
+	/* TTBCR: swapped by interrupt.S. */
+	{ CRn( 2), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c2_TTBCR, 0x00000000 },
+
+	/* DACR: swapped by interrupt.S. */
+	{ CRn( 3), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c3_DACR },
+
+	/* DFSR/IFSR/ADFSR/AIFSR: swapped by interrupt.S. */
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_DFSR },
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_IFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_ADFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_AIFSR },
+
+	/* DFAR/IFAR: swapped by interrupt.S. */
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c6_DFAR },
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c6_IFAR },
+	/*
+	 * DC{C,I,CI}SW operations:
+	 */
+	{ CRn( 7), CRm( 6), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(10), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(14), Op1( 0), Op2( 2), is32, access_dcsw},
+	/*
+	 * Dummy performance monitor implementation.
+	 */
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 0), is32, access_pmcr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 1), is32, access_pmcntenset},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 2), is32, access_pmcntenclr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 3), is32, access_pmovsr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 5), is32, access_pmselr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 6), is32, access_pmceid0},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 7), is32, access_pmceid1},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 0), is32, access_pmccntr},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 1), is32, access_pmxevtyper},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 2), is32, access_pmxevcntr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 0), is32, access_pmuserenr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 1), is32, access_pmintenset},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 2), is32, access_pmintenclr},
+
+	/* PRRR/NMRR (aka MAIR0/MAIR1): swapped by interrupt.S. */
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c10_PRRR},
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c10_NMRR},
+
+	/* VBAR: swapped by interrupt.S. */
+	{ CRn(12), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c12_VBAR, 0x00000000 },
+
+	/* CONTEXTIDR/TPIDRURW/TPIDRURO/TPIDRPRW: swapped by interrupt.S. */
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_val, c13_CID, 0x00000000 },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c13_TID_URW },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 3), is32,
+			NULL, reset_unknown, c13_TID_URO },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 4), is32,
+			NULL, reset_unknown, c13_TID_PRIV },
+};
+
+/*
+ * A15-specific CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg cp15_cortex_a15_regs[] = {
+	/* MPIDR: we use VMPIDR for guest access. */
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 5), is32,
+			NULL, reset_mpidr, c0_MPIDR },
+
+	/* SCTLR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c1_SCTLR, 0x00C50078 },
+	/* ACTLR: trapped by HCR.TAC bit. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 1), is32,
+			access_actlr, reset_actlr, c1_ACTLR },
+	/* CPACR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_val, c1_CPACR, 0x00000000 },
+
+	/*
+	 * L2CTLR access (guest wants to know #CPUs).
+	 */
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 2), is32,
+			access_l2ctlr, reset_l2ctlr, c9_L2CTLR },
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 3), is32, access_l2ectlr},
+
+	/* The Configuration Base Address Register. */
+	{ CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
+};
+
+/* Get specific register table for this target. */
+static const struct coproc_reg *get_target_table(unsigned target, size_t *num)
+{
+	switch (target) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		*num = ARRAY_SIZE(cp15_cortex_a15_regs);
+		return cp15_cortex_a15_regs;
+	default:
+		*num = 0;
+		return NULL;
+	}
+}
+
+static const struct coproc_reg *find_reg(const struct coproc_params *params,
+					 const struct coproc_reg table[],
+					 unsigned int num)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++) {
+		const struct coproc_reg *r = &table[i];
+
+		if (params->is_64bit != r->is_64)
+			continue;
+		if (params->CRn != r->CRn)
+			continue;
+		if (params->CRm != r->CRm)
+			continue;
+		if (params->Op1 != r->Op1)
+			continue;
+		if (params->Op2 != r->Op2)
+			continue;
+
+		return r;
+	}
+	return NULL;
+}
+
+static int emulate_cp15(struct kvm_vcpu *vcpu,
+			const struct coproc_params *params)
+{
+	size_t num;
+	const struct coproc_reg *table, *r;
+
+	trace_kvm_emulate_cp15_imp(params->Op1, params->Rt1, params->CRn,
+				   params->CRm, params->Op2, params->is_write);
+
+	table = get_target_table(vcpu->arch.target, &num);
+
+	/* Search target-specific then generic table. */
+	r = find_reg(params, table, num);
+	if (!r)
+		r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	if (likely(r)) {
+		/* If we don't have an accessor, we should never get here! */
+		BUG_ON(!r->access);
+
+		if (likely(r->access(vcpu, params, r))) {
+			/* Skip instruction, since it was emulated */
+			int instr_len = ((vcpu->arch.hsr >> 25) & 1) ? 4 : 2;
+			*vcpu_pc(vcpu) += instr_len;
+			kvm_adjust_itstate(vcpu);
+			return 1;
+		}
+		/* If access function fails, it should complain. */
+	} else {
+		kvm_err("Unsupported guest CP15 access at: %08x\n",
+			vcpu->arch.regs.pc);
+		print_cp_instr(params);
+	}
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+/**
+ * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = true;
+
+	params.Op1 = (vcpu->arch.hsr >> 16) & 0xf;
+	params.Op2 = 0;
+	params.Rt2 = (vcpu->arch.hsr >> 10) & 0xf;
+	params.CRn = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+static void reset_coproc_regs(struct kvm_vcpu *vcpu,
+			      const struct coproc_reg *table, size_t num)
+{
+	unsigned long i;
+
+	for (i = 0; i < num; i++)
+		if (table[i].reset)
+			table[i].reset(vcpu, &table[i]);
+}
+
+/**
+ * kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = false;
+
+	params.CRn = (vcpu->arch.hsr >> 10) & 0xf;
+	params.Op1 = (vcpu->arch.hsr >> 14) & 0x7;
+	params.Op2 = (vcpu->arch.hsr >> 17) & 0x7;
+	params.Rt2 = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+static int cmp_reg(const struct coproc_reg *i1, const struct coproc_reg *i2)
+{
+	if (i1->CRn != i2->CRn)
+		return i1->CRn - i2->CRn;
+	if (i1->CRm != i2->CRm)
+		return i1->CRm - i2->CRm;
+	if (i1->Op1 != i2->Op1)
+		return i1->Op1 - i2->Op1;
+	return i1->Op2 - i2->Op2;
+}
+
+void kvm_coproc_table_init(void)
+{
+	unsigned int i;
+
+	/* Make sure tables are unique and in order. */
+	for (i = 1; i < ARRAY_SIZE(cp15_regs); i++)
+		BUG_ON(cmp_reg(&cp15_regs[i-1], &cp15_regs[i]) >= 0);
+	for (i = 1; i < ARRAY_SIZE(cp15_cortex_a15_regs); i++)
+		BUG_ON(cmp_reg(&cp15_cortex_a15_regs[i-1],
+			       &cp15_cortex_a15_regs[i]) >= 0);
+}
+
+/**
+ * kvm_reset_coprocs - sets cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architecturally defined reset values.
+ */
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
 {
+	size_t num;
+	const struct coproc_reg *table;
+
+	/* Catch someone adding a register without putting in reset entry. */
+	memset(vcpu->arch.cp15, 0x42, sizeof(vcpu->arch.cp15));
+
+	/* Generic chip reset first (so target could override). */
+	reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	table = get_target_table(vcpu->arch.target, &num);
+	reset_coproc_regs(vcpu, table, num);
+
+	for (num = 1; num < nr_cp15_regs; num++)
+		if (vcpu->arch.cp15[num] == 0x42424242)
+			panic("Didn't reset vcpu->arch.cp15[%zi]", num);
 }
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index c0bacc6..93bd3e2 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -16,7 +16,13 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
+#include <linux/mm.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
 
 #define REG_OFFSET(_reg) \
 	(offsetof(struct kvm_vcpu_regs, _reg) / sizeof(u32))
@@ -125,3 +131,117 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 
 	return reg_array + vcpu_reg_offsets[mode][reg_num];
 }
+
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return 0;
+}
+
+/**
+ * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
+ * @vcpu:	The VCPU pointer
+ *
+ * When exceptions occur while instructions are executed in Thumb IF-THEN
+ * blocks, the ITSTATE field of the CPSR is not advanced (updated), so we have
+ * to do this little bit of work manually. The fields map like this:
+ *
+ * IT[7:0] -> CPSR[26:25],CPSR[15:10]
+ */
+void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
+{
+	unsigned long itbits, cond;
+	unsigned long cpsr = *vcpu_cpsr(vcpu);
+	bool is_arm = !(cpsr & PSR_T_BIT);
+
+	BUG_ON(is_arm && (cpsr & PSR_IT_MASK));
+
+	if (!(cpsr & PSR_IT_MASK))
+		return;
+
+	cond = (cpsr & 0xe000) >> 13;
+	itbits = (cpsr & 0x1c00) >> (10 - 2);
+	itbits |= (cpsr & (0x3 << 25)) >> 25;
+
+	/* Perform ITAdvance (see page A-52 in ARM DDI 0406C) */
+	if ((itbits & 0x7) == 0)
+		itbits = cond = 0;
+	else
+		itbits = (itbits << 1) & 0x1f;
+
+	cpsr &= ~PSR_IT_MASK;
+	cpsr |= cond << 13;
+	cpsr |= (itbits & 0x1c) << (10 - 2);
+	cpsr |= (itbits & 0x3) << 25;
+	*vcpu_cpsr(vcpu) = cpsr;
+}
+
+/**
+ * kvm_skip_instr - skip a trapped instruction and proceed to the next
+ * @vcpu: The vcpu pointer
+ */
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
+{
+	bool is_thumb;
+
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	if (is_thumb && !is_wide_instr)
+		*vcpu_pc(vcpu) += 2;
+	else
+		*vcpu_pc(vcpu) += 4;
+	kvm_adjust_itstate(vcpu);
+}
+
+
+/******************************************************************************
+ * Inject exceptions into the guest
+ */
+
+static u32 exc_vector_base(struct kvm_vcpu *vcpu)
+{
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	u32 vbar = vcpu->arch.cp15[c12_VBAR];
+
+	if (sctlr & SCTLR_V)
+		return 0xffff0000;
+	else /* always have security exceptions */
+		return vbar;
+}
+
+/**
+ * kvm_inject_undefined - inject an undefined exception into the guest
+ * @vcpu: The VCPU to receive the undefined exception
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ *
+ * Modelled after TakeUndefInstrException() pseudocode.
+ */
+void kvm_inject_undefined(struct kvm_vcpu *vcpu)
+{
+	u32 new_lr_value;
+	u32 new_spsr_value;
+	u32 cpsr = *vcpu_cpsr(vcpu);
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	bool is_thumb = (cpsr & PSR_T_BIT);
+	u32 vect_offset = 4;
+	u32 return_offset = (is_thumb) ? 2 : 4;
+
+	new_spsr_value = cpsr;
+	new_lr_value = *vcpu_pc(vcpu) - return_offset;
+
+	*vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | UND_MODE;
+	*vcpu_cpsr(vcpu) |= PSR_I_BIT;
+	*vcpu_cpsr(vcpu) &= ~(PSR_IT_MASK | PSR_J_BIT | PSR_E_BIT | PSR_T_BIT);
+
+	if (sctlr & SCTLR_TE)
+		*vcpu_cpsr(vcpu) |= PSR_T_BIT;
+	if (sctlr & SCTLR_EE)
+		*vcpu_cpsr(vcpu) |= PSR_E_BIT;
+
+	/* Note: These now point to UND banked copies */
+	*vcpu_spsr(vcpu) = cpsr;
+	*vcpu_reg(vcpu, 14) = new_lr_value;
+
+	/* Branch to exception vector */
+	*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index f8869c1..e474a0a 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,7 +39,35 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+/* Architecturally implementation defined CP15 register access */
+TRACE_EVENT(kvm_emulate_cp15_imp,
+	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
+		 unsigned long CRm, unsigned long Op2, bool is_write),
+	TP_ARGS(Op1, Rt1, CRn, CRm, Op2, is_write),
 
+	TP_STRUCT__entry(
+		__field(	unsigned int,	Op1		)
+		__field(	unsigned int,	Rt1		)
+		__field(	unsigned int,	CRn		)
+		__field(	unsigned int,	CRm		)
+		__field(	unsigned int,	Op2		)
+		__field(	bool,		is_write	)
+	),
+
+	TP_fast_assign(
+		__entry->is_write		= is_write;
+		__entry->Op1			= Op1;
+		__entry->Rt1			= Rt1;
+		__entry->CRn			= CRn;
+		__entry->CRm			= CRm;
+		__entry->Op2			= Op2;
+	),
+
+	TP_printk("Implementation defined CP15: %s\tp15, %u, r%u, c%u, c%u, %u",
+			(__entry->is_write) ? "mcr" : "mrc",
+			__entry->Op1, __entry->Rt1, __entry->CRn,
+			__entry->CRm, __entry->Op2)
+);
 
 #endif /* _TRACE_KVM_H */
 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v10 11/14] KVM: ARM: User space API for getting/setting co-proc registers
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (9 preceding siblings ...)
  2012-08-16 15:29 ` [PATCH v10 10/14] KVM: ARM: Emulation framework and CP15 emulation Christoffer Dall
@ 2012-08-16 15:30 ` Christoffer Dall
  2012-08-16 15:30 ` [PATCH v10 12/14] KVM: ARM: Handle guest faults in KVM Christoffer Dall
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:30 UTC (permalink / raw)
  To: kvmarm, kvm

The following three ioctls are implemented:
 -  KVM_VCPU_GET_MSR_INDEX_LIST
 -  KVM_GET_MSRS
 -  KVM_SET_MSRS

Now that we have a table for all the cp15 registers, we can drive a generic
API.  x86 already has one for sparse-numbered registers, so we simply
reproduce that.  The only difference is that our KVM_GET_MSR_INDEX_LIST
is a per-vcpu ioctl; we can't know the MSRs until we know the cpu type.
Thus we add KVM_VCPU_GET_MSR_INDEX_LIST.
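
To illustrate how these three ioctls fit together, here is a rough
userspace sketch (illustrative only, not part of this series; it
assumes a vcpu fd from KVM_CREATE_VCPU, headers with this series
applied, and KVM_ARM_VCPU_INIT already issued; error handling is
abbreviated):

	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int dump_coproc_regs(int vcpu_fd)
	{
		struct kvm_msr_list probe = { .nmsrs = 0 };
		struct kvm_msr_list *list;
		struct kvm_msrs *msrs;
		unsigned int i;

		/* First call fails with E2BIG but still writes back nmsrs. */
		ioctl(vcpu_fd, KVM_VCPU_GET_MSR_INDEX_LIST, &probe);

		list = malloc(sizeof(*list) + probe.nmsrs * sizeof(__u32));
		list->nmsrs = probe.nmsrs;
		if (ioctl(vcpu_fd, KVM_VCPU_GET_MSR_INDEX_LIST, list) < 0)
			return -1;

		msrs = calloc(1, sizeof(*msrs) +
			      list->nmsrs * sizeof(struct kvm_msr_entry));
		msrs->nmsrs = list->nmsrs;
		for (i = 0; i < list->nmsrs; i++)
			msrs->entries[i].index = list->indices[i];

		/* KVM fills in the 'data' member of each entry. */
		return ioctl(vcpu_fd, KVM_GET_MSRS, msrs);
	}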

The numbering for the indices for coprocessors is simple, if userspace
cares (it might not for simple save and restore): the upper 16 bits
are the coprocessor number.  If it's > 15, it's something else, for
future expansion.

Bit 15 indicates a 64-bit register.  For 64 bit registers the bottom 4
bits are CRm, the next 4 are opc1 (just like the MCRR/MRRC instruction
encoding).  For 32 bit registers, the bottom 4 bits are CRm, the next
3 are opc2, the next 4 CRn, and the next 3 opc1 (the same order as the
MRC/MCR instruction encoding, but not the same bit positions).

64-bit coprocessor register:
       ...|19 18 17 16|15|14 13 12 11 10  9  8| 7  6  5  4 |3  2  1  0|
  ...0  0 |  cp num   | 1| 0  0  0  0  0  0  0|   opc1     |   CRm    |

32-bit coprocessor register:
       ...|19 18 17 16|15|14|13 12 11|10  9  8  7 |6  5  4 |3  2  1  0|
  ...0  0 |  cp num   | 0| 0|  opc1  |    CRn     | opc2   |   CRm    |

Non-coprocessor register:

   | 32 31 30 29 28 27 26 25 24 23 22 21 20|19 18 17 16 15 ...
   |     < some non-zero value >           | ...
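
As a worked example of the 32-bit encoding (illustrative only, using
the KVM_ARM_MSR_* masks this series adds to asm/kvm.h): TTBCR lives at
p15, 0, c2, c0, 0, so its index works out to

	__u32 index = (15 << 16)   /* cp num (KVM_ARM_MSR_COPROC_MASK)  */
		    | (0  << 11)   /* opc1   (KVM_ARM_MSR_32_OPC1_MASK) */
		    | (2  <<  7)   /* CRn    (KVM_ARM_MSR_32_CRN_MASK)  */
		    | (0  <<  4)   /* opc2   (KVM_ARM_MSR_32_OPC2_MASK) */
		    | (0  <<  0);  /* CRm    (KVM_ARM_MSR_32_CRM_MASK)  */
	/* => 0x000f0100, with bit 15 clear because it is a 32-bit register */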


For futureproofing, we need to tell QEMU about the CP15 registers the
host lets the guest access.

It will need this information to restore a current guest on a future
CPU or perhaps a future KVM which allow some of these to be changed.

We use a separate table for these, as they're only for the userspace API.

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
---
 Documentation/virtual/kvm/api.txt |   73 +++++++
 arch/arm/include/asm/kvm.h        |   31 +++
 arch/arm/include/asm/kvm_coproc.h |    7 +
 arch/arm/kvm/arm.c                |   29 +++
 arch/arm/kvm/coproc.c             |  368 +++++++++++++++++++++++++++++++++++++
 include/linux/kvm.h               |    1 
 6 files changed, 506 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 8345b78..19d8915 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -451,13 +451,14 @@ Support for this has been removed.  Use KVM_SET_GUEST_DEBUG instead.
 4.18 KVM_GET_MSRS
 
 Capability: basic
-Architectures: x86
+Architectures: x86, arm
 Type: vcpu ioctl
 Parameters: struct kvm_msrs (in/out)
 Returns: 0 on success, -1 on error
 
 Reads model-specific registers from the vcpu.  Supported msr indices can
-be obtained using KVM_GET_MSR_INDEX_LIST.
+be obtained using KVM_GET_MSR_INDEX_LIST (x86) or KVM_VCPU_GET_MSR_INDEX_LIST
+(arm).
 
 struct kvm_msrs {
 	__u32 nmsrs; /* number of msrs in entries */
@@ -480,7 +481,7 @@ kvm will fill in the 'data' member.
 4.19 KVM_SET_MSRS
 
 Capability: basic
-Architectures: x86
+Architectures: x86, arm
 Type: vcpu ioctl
 Parameters: struct kvm_msrs (in)
 Returns: 0 on success, -1 on error
@@ -1985,6 +1986,72 @@ the virtualized real-mode area (VRMA) facility, the kernel will
 re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
 
 
+4.76 KVM_VCPU_GET_MSR_INDEX_LIST
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_msr_list (in/out)
+Returns: 0 on success; -1 on error
+Errors:
+  E2BIG:     the msr index list is too big to fit in the array specified by
+             the user.
+
+struct kvm_msr_list {
+	__u32 nmsrs; /* number of msrs in entries */
+	__u32 indices[0];
+};
+
+This ioctl returns the guest special registers that are supported, and
+is only valid after KVM_ARM_VCPU_INIT has been performed to initialize
+the vcpu type and features.  It is otherwise the equivalent of the
+x86-specific KVM_GET_MSR_INDEX_LIST, for arm's coprocessor registers
+and other non-register state.
+
+The numbering for the indices for coprocessors is simple: the upper 16
+bits are the coprocessor number.  If it's > 15, it's something else,
+for future expansion.
+
+Bit 15 indicates a 64-bit register.  For 64 bit registers the bottom 4
+bits are CRm, the next 4 are opc1 (just like the MCRR/MRRC instruction
+encoding).  For 32 bit registers, the bottom 4 bits are CRm, the next
+3 are opc2, the next 4 CRn, and the next 3 opc1 (the same order as the
+MRC/MCR instruction encoding, but not the same bit positions).
+
+64-bit coprocessor register:
+       ...|19 18 17 16|15|14 13 12 11 10  9  8| 7  6  5  4 |3  2  1  0|
+  ...0  0 |  cp num   | 1| 0  0  0  0  0  0  0|   opc1     |   CRm    |
+
+32-bit coprocessor register:
+       ...|19 18 17 16|15|14|13 12 11|10  9  8  7 |6  5  4 |3  2  1  0|
+  ...0  0 |  cp num   | 0| 0|  opc1  |    CRn     | opc2   |   CRm    |
+
+Non-coprocessor register:
+
+   | 32 31 30 29 28 27 26 25 24 23 22 21 20|19 18 17 16 15 ...
+   |     < some non-zero value >           | ...
+
+
+4.77 KVM_ARM_VCPU_INIT
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_vcpu_init (in)
+Returns: 0 on success; -1 on error
+Errors:
+  EINVAL:    the target is unknown, or the combination of features is invalid.
+  ENOENT:    a features bit specified is unknown.
+
+This tells KVM what type of CPU to present to the guest, and what
+optional features it should have.  This will cause a reset of the cpu
+registers to their initial values.  If this is not called, KVM_RUN will
+return ENOEXEC for that vcpu.
+
+Note that because some registers reflect machine topology, all vcpus
+should be created before this ioctl is invoked.
+
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index 4a3e25d..d040a2a 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -85,4 +85,35 @@ struct kvm_sync_regs {
 struct kvm_arch_memory_slot {
 };
 
+/* Based on x86, but we use KVM_VCPU_GET_MSR_INDEX_LIST. */
+struct kvm_msr_entry {
+	__u32 index;
+	__u32 reserved;
+	__u64 data;
+};
+
+/* for KVM_GET_MSRS and KVM_SET_MSRS */
+struct kvm_msrs {
+	__u32 nmsrs; /* number of msrs in entries */
+	__u32 pad;
+
+	struct kvm_msr_entry entries[0];
+};
+
+/* for KVM_VCPU_GET_MSR_INDEX_LIST */
+struct kvm_msr_list {
+	__u32 nmsrs; /* number of msrs in entries */
+	__u32 indices[0];
+};
+
+/* If you need to interpret the index values, here's the key. */
+#define KVM_ARM_MSR_COPROC_MASK		0xFFFF0000
+#define KVM_ARM_MSR_64_BIT_MASK		0x00008000
+#define KVM_ARM_MSR_64_OPC1_MASK	0x000000F0
+#define KVM_ARM_MSR_64_CRM_MASK		0x0000000F
+#define KVM_ARM_MSR_32_CRM_MASK		0x0000000F
+#define KVM_ARM_MSR_32_OPC2_MASK	0x00000070
+#define KVM_ARM_MSR_32_CRN_MASK		0x00000780
+#define KVM_ARM_MSR_32_OPC1_MASK	0x00003800
+
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index c451fb4..894574c 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -27,5 +27,12 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+int kvm_arm_get_msrs(struct kvm_vcpu *vcpu,
+		     struct kvm_msr_entry __user *entries, u32 num);
+int kvm_arm_set_msrs(struct kvm_vcpu *vcpu,
+		     struct kvm_msr_entry __user *entries, u32 num);
+unsigned long kvm_arm_num_guest_msrs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_msrindices(struct kvm_vcpu *vcpu, u32 __user *uindices);
 void kvm_coproc_table_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 8eec273..4eafdcd 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -693,6 +693,35 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		return kvm_vcpu_set_target(vcpu, &init);
 
 	}
+	case KVM_VCPU_GET_MSR_INDEX_LIST: {
+		struct kvm_msr_list __user *user_msr_list = argp;
+		struct kvm_msr_list msr_list;
+		unsigned n;
+
+		if (copy_from_user(&msr_list, user_msr_list, sizeof msr_list))
+			return -EFAULT;
+		n = msr_list.nmsrs;
+		msr_list.nmsrs = kvm_arm_num_guest_msrs(vcpu);
+		if (copy_to_user(user_msr_list, &msr_list, sizeof msr_list))
+			return -EFAULT;
+		if (n < msr_list.nmsrs)
+			return -E2BIG;
+		return kvm_arm_copy_msrindices(vcpu, user_msr_list->indices);
+	}
+	case KVM_GET_MSRS: {
+		struct kvm_msrs msrs;
+		struct kvm_msrs __user *umsrs = argp;
+		if (copy_from_user(&msrs, umsrs, sizeof(msrs)) != 0)
+			return -EFAULT;
+		return kvm_arm_get_msrs(vcpu, umsrs->entries, msrs.nmsrs);
+	}
+	case KVM_SET_MSRS: {
+		struct kvm_msrs msrs;
+		struct kvm_msrs __user *umsrs = argp;
+		if (copy_from_user(&msrs, umsrs, sizeof(msrs)) != 0)
+			return -EFAULT;
+		return kvm_arm_set_msrs(vcpu, umsrs->entries, msrs.nmsrs);
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index eeb8376..e36826c 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -17,6 +17,7 @@
  */
 #include <linux/mm.h>
 #include <linux/kvm_host.h>
+#include <linux/uaccess.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
@@ -544,8 +545,261 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return emulate_cp15(vcpu, &params);
 }
 
+/******************************************************************************
+ * Userspace API
+ *****************************************************************************/
+
+/* Given a simple mask, get those bits. */
+static inline u32 get_bits(u32 index, u32 mask)
+{
+	return (index & mask) >> (ffs(mask) - 1);
+}
+
+static void index_to_params(u32 index, struct coproc_params *params)
+{
+	if (get_bits(index, KVM_ARM_MSR_64_BIT_MASK)) {
+		params->is_64bit = true;
+		params->CRm = get_bits(index, KVM_ARM_MSR_64_CRM_MASK);
+		params->Op1 = get_bits(index, KVM_ARM_MSR_64_OPC1_MASK);
+		params->Op2 = 0;
+		params->CRn = 0;
+	} else {
+		params->is_64bit = false;
+		params->CRn = get_bits(index, KVM_ARM_MSR_32_CRN_MASK);
+		params->CRm = get_bits(index, KVM_ARM_MSR_32_CRM_MASK);
+		params->Op1 = get_bits(index, KVM_ARM_MSR_32_OPC1_MASK);
+		params->Op2 = get_bits(index, KVM_ARM_MSR_32_OPC2_MASK);
+	}
+}
+
+/* Decode an index value, and find the cp15 coproc_reg entry. */
+static const struct coproc_reg *index_to_coproc_reg(struct kvm_vcpu *vcpu,
+						    u32 index)
+{
+	size_t num;
+	const struct coproc_reg *table, *r;
+	struct coproc_params params;
+
+	/* We only do cp15 for now. */
+	if (get_bits(index, KVM_ARM_MSR_COPROC_MASK) != 15)
+		return NULL;
+
+	index_to_params(index, &params);
+
+	table = get_target_table(vcpu->arch.target, &num);
+	r = find_reg(&params, table, num);
+	if (!r)
+		r = find_reg(&params, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	/* Not saved in the cp15 array? */
+	if (r && !r->reg)
+		r = NULL;
+
+	return r;
+}
+
+/*
+ * These are the invariant cp15 registers: we let the guest see the host
+ * versions of these, so they're part of the guest state.
+ *
+ * A future CPU may provide a mechanism to present different values to
+ * the guest, or a future kvm may trap them.
+ */
+/* Unfortunately, there's no register-argument for mrc, so generate. */
+#define FUNCTION_FOR32(crn, crm, op1, op2, name)			\
+	static void get_##name(struct kvm_vcpu *v,			\
+			       const struct coproc_reg *r)		\
+	{								\
+		u32 val;						\
+									\
+		asm volatile("mrc p15, " __stringify(op1)		\
+			     ", %0, c" __stringify(crn)			\
+			     ", c" __stringify(crm)			\
+			     ", " __stringify(op2) "\n" : "=r" (val));	\
+		((struct coproc_reg *)r)->val = val;			\
+	}
+
+FUNCTION_FOR32(0, 0, 0, 0, MIDR)
+FUNCTION_FOR32(0, 0, 0, 1, CTR)
+FUNCTION_FOR32(0, 0, 0, 2, TCMTR)
+FUNCTION_FOR32(0, 0, 0, 3, TLBTR)
+FUNCTION_FOR32(0, 0, 0, 6, REVIDR)
+FUNCTION_FOR32(0, 1, 0, 0, ID_PFR0)
+FUNCTION_FOR32(0, 1, 0, 1, ID_PFR1)
+FUNCTION_FOR32(0, 1, 0, 2, ID_DFR0)
+FUNCTION_FOR32(0, 1, 0, 3, ID_AFR0)
+FUNCTION_FOR32(0, 1, 0, 4, ID_MMFR0)
+FUNCTION_FOR32(0, 1, 0, 5, ID_MMFR1)
+FUNCTION_FOR32(0, 1, 0, 6, ID_MMFR2)
+FUNCTION_FOR32(0, 1, 0, 7, ID_MMFR3)
+FUNCTION_FOR32(0, 2, 0, 0, ID_ISAR0)
+FUNCTION_FOR32(0, 2, 0, 1, ID_ISAR1)
+FUNCTION_FOR32(0, 2, 0, 2, ID_ISAR2)
+FUNCTION_FOR32(0, 2, 0, 3, ID_ISAR3)
+FUNCTION_FOR32(0, 2, 0, 4, ID_ISAR4)
+FUNCTION_FOR32(0, 2, 0, 5, ID_ISAR5)
+FUNCTION_FOR32(0, 0, 1, 0, CSSIDR)
+FUNCTION_FOR32(0, 0, 1, 1, CLIDR)
+FUNCTION_FOR32(0, 0, 1, 7, AIDR)
+
+/* ->val is filled in by kvm_invariant_coproc_table_init() */
+static struct coproc_reg invariant_cp15[] = {
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 0), is32, NULL, get_MIDR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 1), is32, NULL, get_CTR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 2), is32, NULL, get_TCMTR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 3), is32, NULL, get_TLBTR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 6), is32, NULL, get_REVIDR },
+
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 0), is32, NULL, get_ID_PFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 1), is32, NULL, get_ID_PFR1 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 2), is32, NULL, get_ID_DFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 3), is32, NULL, get_ID_AFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 4), is32, NULL, get_ID_MMFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 5), is32, NULL, get_ID_MMFR1 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 6), is32, NULL, get_ID_MMFR2 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 7), is32, NULL, get_ID_MMFR3 },
+
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 0), is32, NULL, get_ID_ISAR0 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 1), is32, NULL, get_ID_ISAR1 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 2), is32, NULL, get_ID_ISAR2 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 3), is32, NULL, get_ID_ISAR3 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 4), is32, NULL, get_ID_ISAR4 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 5), is32, NULL, get_ID_ISAR5 },
+
+	{ CRn( 0), CRm( 0), Op1( 1), Op2( 0), is32, NULL, get_CSSIDR },
+	{ CRn( 0), CRm( 0), Op1( 1), Op2( 1), is32, NULL, get_CLIDR },
+	{ CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
+};
+
+static int get_invariant_cp15(u32 index, u64 *val)
+{
+	struct coproc_params params;
+	const struct coproc_reg *r;
+
+	index_to_params(index, &params);
+	r = find_reg(&params, invariant_cp15, ARRAY_SIZE(invariant_cp15));
+	if (!r)
+		return -ENOENT;
+
+	*val = r->val;
+	return 0;
+}
+
+static int set_invariant_cp15(u32 index, u64 val)
+{
+	struct coproc_params params;
+	const struct coproc_reg *r;
+
+	index_to_params(index, &params);
+	r = find_reg(&params, invariant_cp15, ARRAY_SIZE(invariant_cp15));
+	if (!r)
+		return -ENOENT;
+
+	/* This is what we mean by invariant: you can't change it. */
+	if (r->val != val)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *val)
+{
+	const struct coproc_reg *r;
+
+	r = index_to_coproc_reg(vcpu, index);
+	if (!r)
+		return get_invariant_cp15(index, val);
+
+	*val = vcpu->arch.cp15[r->reg];
+	if (r->is_64)
+		*val |= ((u64)vcpu->arch.cp15[r->reg+1]) << 32;
+	return 0;
+}
+
+static int set_msr(struct kvm_vcpu *vcpu, u32 index, u64 val)
+{
+	const struct coproc_reg *r;
+
+	r = index_to_coproc_reg(vcpu, index);
+	if (!r)
+		return set_invariant_cp15(index, val);
+
+	vcpu->arch.cp15[r->reg] = val;
+	if (r->is_64)
+		vcpu->arch.cp15[r->reg+1] = (val >> 32);
+	return 0;
+}
+
+/* Return user address to get/set value from. */
+static u64 __user *get_umsr(struct kvm_msr_entry __user *uentry, u32 *idx)
+{
+	struct kvm_msr_entry entry;
+
+	if (copy_from_user(&entry, uentry, sizeof(entry)))
+		return NULL;
+	*idx = entry.index;
+	return &uentry->data;
+}
+
+/**
+ * kvm_arm_get_msrs - copy one or more special registers to userspace.
+ * @vcpu: the vcpu
+ * @entries: the array of entries
+ * @num: the number of entries
+ */
+int kvm_arm_get_msrs(struct kvm_vcpu *vcpu,
+		     struct kvm_msr_entry __user *entries, u32 num)
+{
+	u32 i, index;
+	u64 val;
+	u64 __user *uval;
+	int ret;
+
+	for (i = 0; i < num; i++) {
+		uval = get_umsr(&entries[i], &index);
+		if (!uval)
+			return -EFAULT;
+		if ((ret = get_msr(vcpu, index, &val)) != 0)
+			return ret;
+		if (put_user(val, uval))
+			return -EFAULT;
+	}
+	return 0;
+}
+
+/**
+ * kvm_arm_set_msrs - copy one or more special registers from userspace.
+ * @vcpu: the vcpu
+ * @entries: the array of entries
+ * @num: the number of entries
+ */
+int kvm_arm_set_msrs(struct kvm_vcpu *vcpu,
+		     struct kvm_msr_entry __user *entries, u32 num)
+{
+	u32 i, index;
+	u64 val;
+	u64 __user *uval;
+	int ret;
+
+	for (i = 0; i < num; i++) {
+		uval = get_umsr(&entries[i], &index);
+		if (!uval)
+			return -EFAULT;
+		if (copy_from_user(&val, uval, sizeof(val)) != 0)
+			return -EFAULT;
+		if ((ret = set_msr(vcpu, index, val)) != 0)
+			return ret;
+	}
+	return 0;
+}
+
 static int cmp_reg(const struct coproc_reg *i1, const struct coproc_reg *i2)
 {
+	BUG_ON(i1 == i2);
+	if (!i1)
+		return 1;
+	else if (!i2)
+		return -1;
 	if (i1->CRn != i2->CRn)
 		return i1->CRn - i2->CRn;
 	if (i1->CRm != i2->CRm)
@@ -555,6 +809,116 @@ static int cmp_reg(const struct coproc_reg *i1, const struct coproc_reg *i2)
 	return i1->Op2 - i2->Op2;
 }
 
+/* Puts in the position indicated by mask (assumes val fits in mask) */
+static inline u32 set_bits(u32 val, u32 mask)
+{
+	return val << (ffs(mask)-1);
+}
+
+static u32 cp15_to_index(const struct coproc_reg *reg)
+{
+	u32 val = set_bits(15, KVM_ARM_MSR_COPROC_MASK);
+	if (reg->is_64) {
+		val |= set_bits(1, KVM_ARM_MSR_64_BIT_MASK);
+		val |= set_bits(reg->Op1, KVM_ARM_MSR_64_OPC1_MASK);
+		val |= set_bits(reg->CRm, KVM_ARM_MSR_64_CRM_MASK);
+	} else {
+		val |= set_bits(reg->Op1, KVM_ARM_MSR_32_OPC1_MASK);
+		val |= set_bits(reg->Op2, KVM_ARM_MSR_32_OPC2_MASK);
+		val |= set_bits(reg->CRm, KVM_ARM_MSR_32_CRM_MASK);
+		val |= set_bits(reg->CRn, KVM_ARM_MSR_32_CRN_MASK);
+	}
+	return val;
+}
+
+static bool copy_reg_to_user(const struct coproc_reg *reg, u32 __user **uind)
+{
+	if (!*uind)
+		return true;
+
+	if (put_user(cp15_to_index(reg), *uind))
+		return false;
+
+	(*uind)++;
+	return true;
+}
+
+/* Assumed ordered tables, see kvm_coproc_table_init. */
+static int walk_msrs(struct kvm_vcpu *vcpu, u32 __user *uind)
+{
+	const struct coproc_reg *i1, *i2, *end1, *end2;
+	unsigned int total = 0;
+	size_t num;
+
+	/* We check for duplicates here, to allow arch-specific overrides. */
+	i1 = get_target_table(vcpu->arch.target, &num);
+	end1 = i1 + num;
+	i2 = cp15_regs;
+	end2 = cp15_regs + ARRAY_SIZE(cp15_regs);
+
+	BUG_ON(i1 == end1 || i2 == end2);
+
+	/* Walk carefully, as both tables may refer to the same register. */
+	while (i1 || i2) {
+		int cmp = cmp_reg(i1, i2);
+		/* target-specific overrides generic entry. */
+		if (cmp <= 0) {
+			/* Ignore registers we trap but don't save. */
+			if (i1->reg) {
+				if (!copy_reg_to_user(i1, &uind))
+					return -EFAULT;
+				total++;
+			}
+		} else {
+			/* Ignore registers we trap but don't save. */
+			if (i2->reg) {
+				if (!copy_reg_to_user(i2, &uind))
+					return -EFAULT;
+				total++;
+			}
+		}
+
+		if (cmp <= 0 && ++i1 == end1)
+			i1 = NULL;
+		if (cmp >= 0 && ++i2 == end2)
+			i2 = NULL;
+	}
+	return total;
+}
+
+/**
+ * kvm_arm_num_guest_msrs - how many registers do we present via KVM_GET_MSR
+ *
+ * This is for special registers, particularly cp15.
+ */
+unsigned long kvm_arm_num_guest_msrs(struct kvm_vcpu *vcpu)
+{
+	return ARRAY_SIZE(invariant_cp15) + walk_msrs(vcpu, (u32 __user *)NULL);
+}
+
+/**
+ * kvm_arm_copy_msrindices - copy a series of coprocessor registers.
+ *
+ * This is for special registers, particularly cp15.
+ */
+int kvm_arm_copy_msrindices(struct kvm_vcpu *vcpu, u32 __user *uindices)
+{
+	unsigned int i;
+	int err;
+
+	/* First give them all the invariant registers' indices. */
+	for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++) {
+		if (put_user(cp15_to_index(&invariant_cp15[i]), uindices))
+			return -EFAULT;
+		uindices++;
+	}
+
+	err = walk_msrs(vcpu, uindices);
+	if (err > 0)
+		err = 0;
+	return err;
+}
+
 void kvm_coproc_table_init(void)
 {
 	unsigned int i;
@@ -565,6 +929,10 @@ void kvm_coproc_table_init(void)
 	for (i = 1; i < ARRAY_SIZE(cp15_cortex_a15_regs); i++)
 		BUG_ON(cmp_reg(&cp15_cortex_a15_regs[i-1],
 			       &cp15_cortex_a15_regs[i]) >= 0);
+
+	/* We abuse the reset function to overwrite the table itself. */
+	for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++)
+		invariant_cp15[i].reset(NULL, &invariant_cp15[i]);
 }
 
 /**
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c9b2556..209b432 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -906,6 +906,7 @@ struct kvm_s390_ucas_mapping {
 /* VM is being stopped by host */
 #define KVM_KVMCLOCK_CTRL	  _IO(KVMIO,   0xad)
 #define KVM_ARM_VCPU_INIT	  _IOW(KVMIO,  0xae, struct kvm_vcpu_init)
+#define KVM_VCPU_GET_MSR_INDEX_LIST    _IOWR(KVMIO, 0xaf, struct kvm_msr_list)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v10 12/14] KVM: ARM: Handle guest faults in KVM
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (10 preceding siblings ...)
  2012-08-16 15:30 ` [PATCH v10 11/14] KVM: ARM: User space API for getting/setting co-proc registers Christoffer Dall
@ 2012-08-16 15:30 ` Christoffer Dall
  2012-08-16 15:30 ` [PATCH v10 13/14] KVM: ARM: Handle I/O aborts Christoffer Dall
  2012-08-16 15:30 ` [PATCH v10 14/14] KVM: ARM: Guest wait-for-interrupts (WFI) support Christoffer Dall
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:30 UTC (permalink / raw)
  To: kvmarm, kvm

Handles the guest faults in KVM by mapping in corresponding user pages
in the 2nd stage page tables.
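
As a rough illustration of the fault-address math used below (the
HPFAR_MASK define and the shift come from this patch; the example
value is made up): HPFAR reports bits [39:12] of the faulting IPA in
its bits [31:4], so

	fault_ipa = ((phys_addr_t)vcpu->arch.hpfar & HPFAR_MASK) << 8;
	gfn = fault_ipa >> PAGE_SHIFT;

	/* e.g. hpfar = 0x00801230 -> fault_ipa = 0x80123000, gfn = 0x80123 */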

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h |    9 ++++
 arch/arm/include/asm/kvm_asm.h |    2 +
 arch/arm/kvm/mmu.c             |  102 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index ae586c1..4cff3b7 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -158,11 +158,20 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_FSC		(0x3f)
+#define HSR_FSC_TYPE	(0x3c)
+#define HSR_WNR		(1 << 6)
 #define HSR_CV_SHIFT	(24)
 #define HSR_CV		(1U << HSR_CV_SHIFT)
 #define HSR_COND_SHIFT	(20)
 #define HSR_COND	(0xfU << HSR_COND_SHIFT)
 
+#define FSC_FAULT	(0x04)
+#define FSC_PERM	(0x0c)
+
+/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
+#define HPFAR_MASK	(~0xf)
+
 #define HSR_EC_UNKNOWN	(0x00)
 #define HSR_EC_WFI	(0x01)
 #define HSR_EC_CP15_32	(0x03)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 55b6446..85bd676 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -48,6 +48,8 @@ extern char __kvm_hyp_vector[];
 extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6cb0e38..448fbd6 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -25,6 +25,7 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_asm.h>
 #include <asm/mach/map.h>
+#include <asm/kvm_asm.h>
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
@@ -491,9 +492,108 @@ out:
 	return ret;
 }
 
+static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			  gfn_t gfn, struct kvm_memory_slot *memslot,
+			  bool is_iabt)
+{
+	pte_t new_pte;
+	pfn_t pfn;
+	int ret;
+	bool write_fault, writable;
+	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+
+	/* TODO: Use instr. decoding for non-ISV to determine r/w fault */
+	if (is_iabt)
+		write_fault = false;
+	else if ((vcpu->arch.hsr & HSR_ISV) && !(vcpu->arch.hsr & HSR_WNR))
+		write_fault = false;
+	else
+		write_fault = true;
+
+	if ((vcpu->arch.hsr & HSR_FSC_TYPE) == FSC_PERM && !write_fault) {
+		kvm_err("Unexpected L2 read permission error\n");
+		return -EFAULT;
+	}
+
+	pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
+
+	if (is_error_pfn(pfn)) {
+		put_page(pfn_to_page(pfn));
+		kvm_err("No host mapping: gfn %u (0x%08x)\n",
+			(unsigned int)gfn,
+			(unsigned int)gfn << PAGE_SHIFT);
+		return -EFAULT;
+	}
+
+	/* We need minimum second+third level pages */
+	ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
+	if (ret)
+		return ret;
+	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
+	if (writable)
+		new_pte |= L_PTE2_WRITE;
+	spin_lock(&vcpu->kvm->arch.pgd_lock);
+	stage2_set_pte(vcpu->kvm, memcache, fault_ipa, &new_pte);
+	spin_unlock(&vcpu->kvm->arch.pgd_lock);
+
+	return ret;
+}
+
+/**
+ * kvm_handle_guest_abort - handles all 2nd stage aborts
+ * @vcpu:	the VCPU pointer
+ * @run:	the kvm_run structure
+ *
+ * Any abort that gets to the host is almost guaranteed to be caused by a
+ * missing second stage translation table entry, which can mean that either the
+ * guest simply needs more memory and we must allocate an appropriate page or it
+ * can mean that the guest tried to access I/O memory, which is emulated by user
+ * space. The distinction is based on the IPA causing the fault and whether this
+ * memory region has been registered as standard RAM by user space.
+ */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	unsigned long hsr_ec;
+	unsigned long fault_status;
+	phys_addr_t fault_ipa;
+	struct kvm_memory_slot *memslot = NULL;
+	bool is_iabt;
+	gfn_t gfn;
+	int ret;
+
+	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
+	is_iabt = (hsr_ec == HSR_EC_IABT);
+
+	/* Check that the second stage fault is a translation fault */
+	fault_status = (vcpu->arch.hsr & HSR_FSC_TYPE);
+	if (fault_status != FSC_FAULT && fault_status != FSC_PERM) {
+		kvm_err("Unsupported fault status: EC=%#lx DFCS=%#lx\n",
+			hsr_ec, fault_status);
+		return -EFAULT;
+	}
+
+	fault_ipa = ((phys_addr_t)vcpu->arch.hpfar & HPFAR_MASK) << 8;
+
+	gfn = fault_ipa >> PAGE_SHIFT;
+	if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
+		if (is_iabt) {
+			kvm_err("Inst. abort on I/O address %08lx\n",
+				(unsigned long)fault_ipa);
+			return -EFAULT;
+		}
+
+		kvm_pr_unimpl("I/O address abort...");
+		return 0;
+	}
+
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot->user_alloc) {
+		kvm_err("non user-alloc memslots not supported\n");
+		return -EINVAL;
+	}
+
+	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, is_iabt);
+	return ret ? ret : 1;
 }
 
 static bool hva_to_gpa(struct kvm *kvm, unsigned long hva, gpa_t *gpa)


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v10 13/14] KVM: ARM: Handle I/O aborts
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (11 preceding siblings ...)
  2012-08-16 15:30 ` [PATCH v10 12/14] KVM: ARM: Handle guest faults in KVM Christoffer Dall
@ 2012-08-16 15:30 ` Christoffer Dall
  2012-08-16 15:30 ` [PATCH v10 14/14] KVM: ARM: Guest wait-for-interrupts (WFI) support Christoffer Dall
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:30 UTC (permalink / raw)
  To: kvmarm, kvm

When the guest accesses I/O memory this will create data abort
exceptions and they are handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding reads
and writes to QEMU which performs the device emulation.

Certain classes of load/store operations do not support the syndrome
information provided in the HSR and we therefore must be able to fetch
the offending instruction from guest memory and decode it manually.

This requires changing the general flow somewhat since new calls to run
the VCPU must check if there's a pending MMIO load and perform the write
after userspace has made the data available.
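
When the syndrome is valid (HSR.ISV set), the decoding boils down to
pulling a few bit fields out of the HSR.  A rough sketch using the
HSR_* defines this patch adds (the access-size field below is
extracted with raw bit numbers from the ARM ARM, not a define from
this series):

	bool is_write = vcpu->arch.hsr & HSR_WNR;
	bool sign_extend = vcpu->arch.hsr & HSR_SSE;
	unsigned long rt = (vcpu->arch.hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
	unsigned int len = 1 << ((vcpu->arch.hsr >> 22) & 0x3); /* 1, 2 or 4 bytes */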

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    3 
 arch/arm/include/asm/kvm_emulate.h |    2 
 arch/arm/include/asm/kvm_mmu.h     |    1 
 arch/arm/kvm/arm.c                 |    6 +
 arch/arm/kvm/emulate.c             |  273 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/mmu.c                 |  162 +++++++++++++++++++++
 arch/arm/kvm/trace.h               |   21 +++
 7 files changed, 466 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 4cff3b7..21cb240 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -158,8 +158,11 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_SRT_SHIFT	(16)
+#define HSR_SRT_MASK	(0xf << HSR_SRT_SHIFT)
 #define HSR_FSC		(0x3f)
 #define HSR_FSC_TYPE	(0x3c)
+#define HSR_SSE		(1 << 21)
 #define HSR_WNR		(1 << 6)
 #define HSR_CV_SHIFT	(24)
 #define HSR_CV		(1U << HSR_CV_SHIFT)
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index d914029..d899fbb 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -52,6 +52,8 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
 }
 
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr);
 void kvm_adjust_itstate(struct kvm_vcpu *vcpu);
 void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 11f4c3a..c3f90b0 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -38,6 +38,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size);
 
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 4eafdcd..31ddf56 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -565,6 +565,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (unlikely(!vcpu->arch.target))
 		return -ENOEXEC;
 
+	if (run->exit_reason == KVM_EXIT_MMIO) {
+		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+		if (ret)
+			return ret;
+	}
+
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 93bd3e2..cc5fa89 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -132,11 +132,284 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 	return reg_array + vcpu_reg_offsets[mode][reg_num];
 }
 
+/******************************************************************************
+ * Utility functions common for all emulation code
+ *****************************************************************************/
+
+/*
+ * This one accepts a matrix where the first element is the
+ * bits as they must be, and the second element is the bitmask.
+ */
+#define INSTR_NONE	-1
+static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
+{
+	int i;
+	u32 mask;
+
+	for (i = 0; i < table_entries; i++) {
+		mask = table[i][1];
+		if ((table[i][0] & mask) == (instr & mask))
+			return i;
+	}
+	return INSTR_NONE;
+}
+
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return 0;
 }
 
+
+/******************************************************************************
+ * Load-Store instruction emulation
+ *****************************************************************************/
+
+/*
+ * Must be ordered with LOADS first and WRITES afterwards
+ * for easy distinction when doing MMIO.
+ */
+#define NUM_LD_INSTR  9
+enum INSTR_LS_INDEXES {
+	INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
+	INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
+	INSTR_LS_LDRSH,
+	INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
+	INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
+	NUM_LS_INSTR
+};
+
+static u32 ls_instr[NUM_LS_INSTR][2] = {
+	{0x04700000, 0x0d700000}, /* LDRBT */
+	{0x04300000, 0x0d700000}, /* LDRT  */
+	{0x04100000, 0x0c500000}, /* LDR   */
+	{0x04500000, 0x0c500000}, /* LDRB  */
+	{0x000000d0, 0x0e1000f0}, /* LDRD  */
+	{0x01900090, 0x0ff000f0}, /* LDREX */
+	{0x001000b0, 0x0e1000f0}, /* LDRH  */
+	{0x001000d0, 0x0e1000f0}, /* LDRSB */
+	{0x001000f0, 0x0e1000f0}, /* LDRSH */
+	{0x04600000, 0x0d700000}, /* STRBT */
+	{0x04200000, 0x0d700000}, /* STRT  */
+	{0x04000000, 0x0c500000}, /* STR   */
+	{0x04400000, 0x0c500000}, /* STRB  */
+	{0x000000f0, 0x0e1000f0}, /* STRD  */
+	{0x01800090, 0x0ff000f0}, /* STREX */
+	{0x000000b0, 0x0e1000f0}  /* STRH  */
+};
+
+static inline int get_arm_ls_instr_index(u32 instr)
+{
+	return kvm_instr_index(instr, ls_instr, NUM_LS_INSTR);
+}
+
+/*
+ * Load-Store instruction decoding
+ */
+#define INSTR_LS_TYPE_BIT		26
+#define INSTR_LS_RD_MASK		0x0000f000
+#define INSTR_LS_RD_SHIFT		12
+#define INSTR_LS_RN_MASK		0x000f0000
+#define INSTR_LS_RN_SHIFT		16
+#define INSTR_LS_RM_MASK		0x0000000f
+#define INSTR_LS_OFFSET12_MASK		0x00000fff
+
+#define INSTR_LS_BIT_P			24
+#define INSTR_LS_BIT_U			23
+#define INSTR_LS_BIT_B			22
+#define INSTR_LS_BIT_W			21
+#define INSTR_LS_BIT_L			20
+#define INSTR_LS_BIT_S			 6
+#define INSTR_LS_BIT_H			 5
+
+/*
+ * ARM addressing mode defines
+ */
+#define OFFSET_IMM_MASK			0x0e000000
+#define OFFSET_IMM_VALUE		0x04000000
+#define OFFSET_REG_MASK			0x0e000ff0
+#define OFFSET_REG_VALUE		0x06000000
+#define OFFSET_SCALE_MASK		0x0e000010
+#define OFFSET_SCALE_VALUE		0x06000000
+
+#define SCALE_SHIFT_MASK		0x000000a0
+#define SCALE_SHIFT_SHIFT		5
+#define SCALE_SHIFT_LSL			0x0
+#define SCALE_SHIFT_LSR			0x1
+#define SCALE_SHIFT_ASR			0x2
+#define SCALE_SHIFT_ROR_RRX		0x3
+#define SCALE_SHIFT_IMM_MASK		0x00000f80
+#define SCALE_SHIFT_IMM_SHIFT		6
+
+#define PSR_BIT_C			29
+
+static unsigned long ls_word_calc_offset(struct kvm_vcpu *vcpu,
+					 unsigned long instr)
+{
+	int offset = 0;
+
+	if ((instr & OFFSET_IMM_MASK) == OFFSET_IMM_VALUE) {
+		/* Immediate offset/index */
+		offset = instr & INSTR_LS_OFFSET12_MASK;
+
+		if (!(instr & (1U << INSTR_LS_BIT_U)))
+			offset = -offset;
+	}
+
+	if ((instr & OFFSET_REG_MASK) == OFFSET_REG_VALUE) {
+		/* Register offset/index */
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		offset = *vcpu_reg(vcpu, rm);
+
+		if (!(instr & (1U << INSTR_LS_BIT_P)))
+			offset = 0;
+	}
+
+	if ((instr & OFFSET_SCALE_MASK) == OFFSET_SCALE_VALUE) {
+		/* Scaled register offset */
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		u8 shift = (instr & SCALE_SHIFT_MASK) >> SCALE_SHIFT_SHIFT;
+		u32 shift_imm = (instr & SCALE_SHIFT_IMM_MASK)
+				>> SCALE_SHIFT_IMM_SHIFT;
+		offset = *vcpu_reg(vcpu, rm);
+
+		switch (shift) {
+		case SCALE_SHIFT_LSL:
+			offset = offset << shift_imm;
+			break;
+		case SCALE_SHIFT_LSR:
+			if (shift_imm == 0)
+				offset = 0;
+			else
+				offset = ((u32)offset) >> shift_imm;
+			break;
+		case SCALE_SHIFT_ASR:
+			if (shift_imm == 0) {
+				if (offset & (1U << 31))
+					offset = 0xffffffff;
+				else
+					offset = 0;
+			} else {
+				/* Ensure arithmetic shift */
+				asm("mov %[r], %[op], ASR %[s]" :
+				    [r] "=r" (offset) :
+				    [op] "r" (offset), [s] "r" (shift_imm));
+			}
+			break;
+		case SCALE_SHIFT_ROR_RRX:
+			if (shift_imm == 0) {
+				u32 C = (vcpu->arch.regs.cpsr &
+						(1U << PSR_BIT_C));
+				offset = (C << 31) | offset >> 1;
+			} else {
+				/* Ensure arithmetic shift */
+				asm("mov %[r], %[op], ASR %[s]" :
+				    [r] "=r" (offset) :
+				    [op] "r" (offset), [s] "r" (shift_imm));
+			}
+			break;
+		}
+
+		if (instr & (1U << INSTR_LS_BIT_U))
+			return offset;
+		else
+			return -offset;
+	}
+
+	if (instr & (1U << INSTR_LS_BIT_U))
+		return offset;
+	else
+		return -offset;
+
+	BUG();
+}
+
+static int kvm_ls_length(struct kvm_vcpu *vcpu, u32 instr)
+{
+	int index;
+
+	index = get_arm_ls_instr_index(instr);
+
+	if (instr & (1U << INSTR_LS_TYPE_BIT)) {
+		/* LS word or unsigned byte */
+		if (instr & (1U << INSTR_LS_BIT_B))
+			return sizeof(unsigned char);
+		else
+			return sizeof(u32);
+	} else {
+		/* LS halfword, doubleword or signed byte */
+		u32 H = (instr & (1U << INSTR_LS_BIT_H));
+		u32 S = (instr & (1U << INSTR_LS_BIT_S));
+		u32 L = (instr & (1U << INSTR_LS_BIT_L));
+
+		if (!L && S) {
+			kvm_err("WARNING: d-word for MMIO\n");
+			return 2 * sizeof(u32);
+		} else if (L && S && !H)
+			return sizeof(char);
+		else
+			return sizeof(u16);
+	}
+
+	BUG();
+}
+
+/**
+ * kvm_emulate_mmio_ls - emulates load/store instructions made to I/O memory
+ * @vcpu:	The vcpu pointer
+ * @fault_ipa:	The IPA that caused the 2nd stage fault
+ * @instr:	The instruction that caused the fault
+ *
+ * Handles emulation of load/store instructions which cannot be emulated through
+ * information found in the HSR on faults. It is necessary in this case to
+ * simply decode the offending instruction in software and determine the
+ * required operands.
+ */
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr)
+{
+	unsigned long rd, rn, offset, len;
+	int index;
+	bool is_write;
+
+	trace_kvm_mmio_emulate(vcpu->arch.regs.pc, instr, vcpu->arch.regs.cpsr);
+
+	index = get_arm_ls_instr_index(instr);
+	if (index == INSTR_NONE) {
+		kvm_err("Unknown load/store instruction\n");
+		return -EINVAL;
+	}
+
+	is_write = (index < NUM_LD_INSTR) ? false : true;
+	rd = (instr & INSTR_LS_RD_MASK) >> INSTR_LS_RD_SHIFT;
+	len = kvm_ls_length(vcpu, instr);
+
+	vcpu->run->mmio.is_write = is_write;
+	vcpu->run->mmio.phys_addr = fault_ipa;
+	vcpu->run->mmio.len = len;
+	vcpu->arch.mmio_sign_extend = false;
+	vcpu->arch.mmio_rd = rd;
+
+	trace_kvm_mmio((is_write) ? KVM_TRACE_MMIO_WRITE :
+				    KVM_TRACE_MMIO_READ_UNSATISFIED,
+			len, fault_ipa, (is_write) ? *vcpu_reg(vcpu, rd) : 0);
+
+	/* Handle base register writeback */
+	if (!(instr & (1U << INSTR_LS_BIT_P)) ||
+	     (instr & (1U << INSTR_LS_BIT_W))) {
+		rn = (instr & INSTR_LS_RN_MASK) >> INSTR_LS_RN_SHIFT;
+		offset = ls_word_calc_offset(vcpu, instr);
+		*vcpu_reg(vcpu, rn) += offset;
+	}
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	kvm_skip_instr(vcpu, is_wide_instruction(instr));
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+	return 0;
+}
+
 /**
  * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
  * @vcpu:	The VCPU pointer
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 448fbd6..3df4fa8 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -19,6 +19,7 @@
 #include <linux/mman.h>
 #include <linux/kvm_host.h>
 #include <linux/io.h>
+#include <trace/events/kvm.h>
 #include <asm/idmap.h>
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
@@ -26,6 +27,9 @@
 #include <asm/kvm_asm.h>
 #include <asm/mach/map.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+
+#include "trace.h"
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
@@ -540,6 +544,159 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 }
 
 /**
+ * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
+ * @vcpu: The VCPU pointer
+ * @run:  The VCPU run struct containing the mmio data
+ *
+ * This should only be called after returning from userspace for MMIO load
+ * emulation.
+ */
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	int *dest;
+	unsigned int len;
+	int mask;
+
+	if (!run->mmio.is_write) {
+		dest = vcpu_reg(vcpu, vcpu->arch.mmio_rd);
+		memset(dest, 0, sizeof(int));
+
+		len = run->mmio.len;
+		if (len > 4)
+			return -EINVAL;
+
+		memcpy(dest, run->mmio.data, len);
+
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
+				*((u64 *)run->mmio.data));
+
+		if (vcpu->arch.mmio_sign_extend && len < 4) {
+			mask = 1U << ((len * 8) - 1);
+			*dest = (*dest ^ mask) - mask;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * invalid_io_mem_abort -- Handle I/O aborts ISV bit is clear
+ *
+ * @vcpu:      The vcpu pointer
+ * @fault_ipa: The IPA that caused the 2nd stage fault
+ *
+ * Some load/store instructions cannot be emulated using the information
+ * presented in the HSR, for instance, register write-back instructions are not
+ * supported. We therefore need to fetch the instruction, decode it, and then
+ * emulate its behavior.
+ */
+static int invalid_io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
+{
+	unsigned long instr;
+	phys_addr_t pc_ipa;
+
+	if (vcpu->arch.pc_ipa & 1) {
+		kvm_err("I/O Abort from invalid instruction address? Wrong!\n");
+		return -EINVAL;
+	}
+
+	if (vcpu->arch.pc_ipa & (1U << 11)) {
+		/* LPAE PAR format */
+		/* TODO: Check if this ever happens - called from Hyp mode */
+		pc_ipa = vcpu->arch.pc_ipa & PAGE_MASK & ((1ULL << 32) - 1);
+	} else {
+		/* VMSAv7 PAR format */
+		pc_ipa = vcpu->arch.pc_ipa & PAGE_MASK & ((1ULL << 40) - 1);
+	}
+	pc_ipa += vcpu->arch.regs.pc & ~PAGE_MASK;
+
+	if (vcpu->arch.regs.cpsr & PSR_T_BIT) {
+		/* TODO: Check validity of PC IPA and IPA2!!! */
+		/* Need to decode thumb instructions as well */
+		kvm_err("Thumb guest support not there yet :(\n");
+		return -EINVAL;
+	}
+
+	if (kvm_read_guest(vcpu->kvm, pc_ipa, &instr, sizeof(instr))) {
+		kvm_err("Could not copy guest instruction\n");
+		return -EFAULT;
+	}
+
+	return kvm_emulate_mmio_ls(vcpu, fault_ipa, instr);
+}
+
+static int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
+{
+	unsigned long rd, len, instr_len;
+	bool is_write, sign_extend;
+
+	if (!(vcpu->arch.hsr & HSR_ISV))
+		return invalid_io_mem_abort(vcpu, fault_ipa);
+
+	if (((vcpu->arch.hsr >> 8) & 1)) {
+		kvm_err("Not supported, Cache operation on I/O addr.\n");
+		return -EFAULT;
+	}
+
+	if ((vcpu->arch.hsr >> 7) & 1) {
+		kvm_err("Translation table accesses I/O memory\n");
+		return -EFAULT;
+	}
+
+	switch ((vcpu->arch.hsr >> 22) & 0x3) {
+	case 0:
+		len = 1;
+		break;
+	case 1:
+		len = 2;
+		break;
+	case 2:
+		len = 4;
+		break;
+	default:
+		kvm_err("Invalid I/O abort\n");
+		return -EFAULT;
+	}
+
+	is_write = vcpu->arch.hsr & HSR_WNR;
+	sign_extend = vcpu->arch.hsr & HSR_SSE;
+	rd = (vcpu->arch.hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
+	BUG_ON(rd > 15);
+
+	if (rd == 15) {
+		kvm_err("I/O memory trying to read/write pc\n");
+		return -EFAULT;
+	}
+
+	/* Get instruction length in bytes */
+	instr_len = (vcpu->arch.hsr & HSR_IL) ? 4 : 2;
+
+	/* Export MMIO operations to user space */
+	run->mmio.is_write = is_write;
+	run->mmio.phys_addr = fault_ipa;
+	run->mmio.len = len;
+	vcpu->arch.mmio_sign_extend = sign_extend;
+	vcpu->arch.mmio_rd = rd;
+
+	trace_kvm_mmio((is_write) ? KVM_TRACE_MMIO_WRITE :
+				    KVM_TRACE_MMIO_READ_UNSATISFIED,
+			len, fault_ipa, (is_write) ? *vcpu_reg(vcpu, rd) : 0);
+
+	if (is_write)
+		memcpy(run->mmio.data, vcpu_reg(vcpu, rd), len);
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	*vcpu_pc(vcpu) += instr_len;
+	kvm_adjust_itstate(vcpu);
+	run->exit_reason = KVM_EXIT_MMIO;
+	return 0;
+}
+
+/**
  * kvm_handle_guest_abort - handles all 2nd stage aborts
  * @vcpu:	the VCPU pointer
  * @run:	the kvm_run structure
@@ -582,8 +739,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			return -EFAULT;
 		}
 
-		kvm_pr_unimpl("I/O address abort...");
-		return 0;
+		/* Adjust page offset */
+		fault_ipa |= vcpu->arch.hdfar & ~PAGE_MASK;
+		return io_mem_abort(vcpu, run, fault_ipa, memslot);
 	}
 
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index e474a0a..325106c 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,6 +39,27 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_mmio_emulate,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
+		 unsigned long cpsr),
+	TP_ARGS(vcpu_pc, instr, cpsr),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+		__field(	unsigned long,	instr		)
+		__field(	unsigned long,	cpsr		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+		__entry->instr			= instr;
+		__entry->cpsr			= cpsr;
+	),
+
+	TP_printk("Emulate MMIO at: 0x%08lx (instr: %08lx, cpsr: %08lx)",
+		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
+);
+
 /* Architecturally implementation defined CP15 register access */
 TRACE_EVENT(kvm_emulate_cp15_imp,
 	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v10 14/14] KVM: ARM: Guest wait-for-interrupts (WFI) support
  2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
                   ` (12 preceding siblings ...)
  2012-08-16 15:30 ` [PATCH v10 13/14] KVM: ARM: Handle I/O aborts Christoffer Dall
@ 2012-08-16 15:30 ` Christoffer Dall
  13 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-16 15:30 UTC (permalink / raw)
  To: kvmarm, kvm

From: Christoffer Dall <cdall@cs.columbia.edu>

When the guest executes a WFI instruction the operation is trapped to
KVM, which emulates the instruction in software. There is no correlation
between a guest executing a WFI instruction and actually putting the
hardware into a low-power mode, since a KVM guest is essentially a
process and the WFI instruction can be seen as a 'sleep' call from this
process. Therefore, we block the VCPU when the guest executes a WFI
instruction and the IRQ or FIQ lines are not raised.

When an interrupt comes in through KVM_IRQ_LINE (see the previous patch)
we signal the VCPU thread and clear the VCPU's wait-for-interrupts state
so it no longer blocks.
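
As a rough sketch (and not part of this patch), userspace wakes such a
blocked VCPU simply by asserting an interrupt line with the KVM_IRQ_LINE
ioctl; the vm_fd handle and the exact encoding of the irq field below are
assumptions made purely for illustration:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Illustrative only: assert (level = 1) or clear (level = 0) a guest
   * interrupt line.  vm_fd is an open VM file descriptor; the meaning of
   * the irq number is architecture specific (see the IRQ injection patch).
   */
  static int guest_set_irq_line(int vm_fd, unsigned int irq, int level)
  {
          struct kvm_irq_level irq_level = {
                  .irq   = irq,
                  .level = level,
          };

          /* Asserting the line makes kvm_arch_vcpu_runnable() return true,
           * which wakes a VCPU sleeping in kvm_vcpu_block() after a WFI
           * trap. */
          return ioctl(vm_fd, KVM_IRQ_LINE, &irq_level);
  }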

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/kvm/arm.c     |   10 ++++++++--
 arch/arm/kvm/emulate.c |   13 ++++++++++++-
 arch/arm/kvm/trace.h   |   16 ++++++++++++++++
 3 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 31ddf56..09a6800 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -313,9 +313,16 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+/**
+ * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled
+ * @v:		The VCPU pointer
+ *
+ * If the guest CPU is not waiting for interrupts or an interrupt line is
+ * asserted, the CPU is by definition runnable.
+ */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return 0;
+	return !!v->arch.irq_lines;
 }
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
@@ -581,7 +588,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * Check conditions before entering the guest
 		 */
 		cond_resched();
-
 		update_vttbr(vcpu->kvm);
 
 		local_irq_disable();
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index cc5fa89..6cbdb08 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -154,9 +154,20 @@ static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
 	return INSTR_NONE;
 }
 
+/**
+ * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * @vcpu:	the vcpu pointer
+ * @run:	the kvm_run structure pointer
+ *
+ * Simply sets the wait_for_interrupts flag on the vcpu structure, which will
+ * halt execution of world-switches and schedule other host processes until
+ * there is an incoming IRQ or FIQ to the VM.
+ */
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return 0;
+	trace_kvm_wfi(vcpu->arch.regs.pc);
+	kvm_vcpu_block(vcpu);
+	return 1;
 }
 
 
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 325106c..28ed1a1 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -90,6 +90,22 @@ TRACE_EVENT(kvm_emulate_cp15_imp,
 			__entry->CRm, __entry->Op2)
 );
 
+TRACE_EVENT(kvm_wfi,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("guest executed wfi at: 0x%08lx", __entry->vcpu_pc)
+);
+
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 07/14] KVM: ARM: Memory virtualization setup
  2012-08-16 15:29 ` [PATCH v10 07/14] KVM: ARM: Memory virtualization setup Christoffer Dall
@ 2012-08-16 18:25   ` Alexander Graf
  2012-08-19  4:34     ` Christoffer Dall
  2012-08-23  8:12   ` Min-gyu Kim
  1 sibling, 1 reply; 29+ messages in thread
From: Alexander Graf @ 2012-08-16 18:25 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm


On 16.08.2012, at 17:29, Christoffer Dall wrote:

> This commit introduces the framework for guest memory management
> through the use of 2nd stage translation. Each VM has a pointer
> to a level-1 table (the pgd field in struct kvm_arch) which is
> used for the 2nd stage translations. Entries are added when handling
> guest faults (later patch) and the table itself can be allocated and
> freed through the following functions implemented in
> arch/arm/kvm/arm_mmu.c:
> - kvm_alloc_stage2_pgd(struct kvm *kvm);
> - kvm_free_stage2_pgd(struct kvm *kvm);
> 
> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
> pgprot_guest variables used to map 2nd stage memory for KVM guests.
> 
> Each entry in the TLBs and caches is tagged with a VMID identifier in
> addition to ASIDs. The VMIDs are assigned consecutively to VMs in the
> order that VMs are executed, and caches and TLBs are invalidated when
> the VMID space has been exhausted, allowing for more than 255
> simultaneously running guests.
> 
> The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
> freed in kvm_arch_destroy_vm(). Both functions are called from the main
> KVM code.
> 
> We pre-allocate page table memory to be able to synchronize using a
> spinlock and be called under rcu_read_lock from the MMU notifiers.  We
> steal the mmu_memory_cache implementation from x86 and adapt for our
> specific usage.
> 
> We support MMU notifiers (thanks to Marc Zyngier) through
> kvm_unmap_hva and kvm_set_spte_hva.
> 
> Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
> which is used by VGIC support to map the virtual CPU interface registers
> to the guest. This support is added by Marc Zyngier.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
> arch/arm/include/asm/kvm_asm.h        |    2 
> arch/arm/include/asm/kvm_host.h       |   18 ++
> arch/arm/include/asm/kvm_mmu.h        |    9 +
> arch/arm/include/asm/pgtable-3level.h |    9 +
> arch/arm/include/asm/pgtable.h        |    4 
> arch/arm/kvm/Kconfig                  |    1 
> arch/arm/kvm/arm.c                    |   38 +++
> arch/arm/kvm/exports.c                |    1 
> arch/arm/kvm/interrupts.S             |    8 +
> arch/arm/kvm/mmu.c                    |  373 +++++++++++++++++++++++++++++++++
> arch/arm/mm/mmu.c                     |    3 
> 11 files changed, 465 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 58d51e3..55b6446 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -34,6 +34,7 @@
> #define SMCHYP_HVBAR_W 0xfffffff0
> 
> #ifndef __ASSEMBLY__
> +struct kvm;
> struct kvm_vcpu;
> 
> extern char __kvm_hyp_init[];
> @@ -48,6 +49,7 @@ extern char __kvm_hyp_code_start[];
> extern char __kvm_hyp_code_end[];
> 
> extern void __kvm_flush_vm_context(void);
> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> 
> extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> #endif
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index d7e3398..d86ce39 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -157,4 +157,22 @@ struct kvm_vcpu_stat {
> struct kvm_vcpu_init;
> int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
> 			const struct kvm_vcpu_init *init);
> +
> +#define KVM_ARCH_WANT_MMU_NOTIFIER
> +struct kvm;
> +int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
> +int kvm_unmap_hva_range(struct kvm *kvm,
> +			unsigned long start, unsigned long end);
> +void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
> +
> +/* We do not have shadow page tables, hence the empty hooks */
> +static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
> +{
> +	return 0;
> +}
> +
> +static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> +{
> +	return 0;
> +}
> #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 8252921..11f4c3a 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -33,4 +33,13 @@ int create_hyp_mappings(void *from, void *to);
> int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
> void free_hyp_pmds(void);
> 
> +int kvm_alloc_stage2_pgd(struct kvm *kvm);
> +void kvm_free_stage2_pgd(struct kvm *kvm);
> +int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> +			  phys_addr_t pa, unsigned long size);
> +
> +int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +
> +void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
> +
> #endif /* __ARM_KVM_MMU_H__ */
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 1169a8a..7351eee 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -102,6 +102,15 @@
>  */
> #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
> 
> +/*
> + * 2-nd stage PTE definitions for LPAE.
> + */
> +#define L_PTE2_SHARED		L_PTE_SHARED
> +#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
> +#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */
> +#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
> +#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */
> +
> #ifndef __ASSEMBLY__
> 
> #define pud_none(pud)		(!pud_val(pud))
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index bc83540..a31d0e9 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -70,6 +70,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
> 
> extern pgprot_t		pgprot_user;
> extern pgprot_t		pgprot_kernel;
> +extern pgprot_t		pgprot_guest;
> 
> #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
> 
> @@ -83,6 +84,9 @@ extern pgprot_t		pgprot_kernel;
> #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
> #define PAGE_KERNEL_EXEC	pgprot_kernel
> #define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
> +#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
> +					  L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
> +					  L_PTE2_SHARED)
> 
> #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
> #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 83abbe0..7fa50d3 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -36,6 +36,7 @@ config KVM_ARM_HOST
> 	depends on KVM
> 	depends on MMU
> 	depends on CPU_V7 && ARM_VIRT_EXT
> +	select	MMU_NOTIFIER
> 	---help---
> 	  Provides host support for ARM processors.
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 0b1c466..3f97e7c 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -82,12 +82,34 @@ void kvm_arch_sync_events(struct kvm *kvm)
> {
> }
> 
> +/**
> + * kvm_arch_init_vm - initializes a VM data structure
> + * @kvm:	pointer to the KVM struct
> + */
> int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> {
> +	int ret = 0;
> +
> 	if (type)
> 		return -EINVAL;
> 
> -	return 0;
> +	ret = kvm_alloc_stage2_pgd(kvm);
> +	if (ret)
> +		goto out_fail_alloc;
> +	spin_lock_init(&kvm->arch.pgd_lock);
> +
> +	ret = create_hyp_mappings(kvm, kvm + 1);
> +	if (ret)
> +		goto out_free_stage2_pgd;
> +
> +	/* Mark the initial VMID generation invalid */
> +	kvm->arch.vmid_gen = 0;
> +
> +	return ret;
> +out_free_stage2_pgd:
> +	kvm_free_stage2_pgd(kvm);
> +out_fail_alloc:
> +	return ret;
> }
> 
> int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> @@ -105,10 +127,16 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
> 	return 0;
> }
> 
> +/**
> + * kvm_arch_destroy_vm - destroy the VM data structure
> + * @kvm:	pointer to the KVM struct
> + */
> void kvm_arch_destroy_vm(struct kvm *kvm)
> {
> 	int i;
> 
> +	kvm_free_stage2_pgd(kvm);
> +
> 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
> 		if (kvm->vcpus[i]) {
> 			kvm_arch_vcpu_free(kvm->vcpus[i]);
> @@ -184,7 +212,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> 	if (err)
> 		goto free_vcpu;
> 
> +	err = create_hyp_mappings(vcpu, vcpu + 1);
> +	if (err)
> +		goto vcpu_uninit;
> +
> 	return vcpu;
> +vcpu_uninit:
> +	kvm_vcpu_uninit(vcpu);
> free_vcpu:
> 	kmem_cache_free(kvm_vcpu_cache, vcpu);
> out:
> @@ -193,6 +227,8 @@ out:
> 
> void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
> {
> +	kvm_mmu_free_memory_caches(vcpu);
> +	kmem_cache_free(kvm_vcpu_cache, vcpu);
> }
> 
> void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
> index 8ebdf07..f39f823 100644
> --- a/arch/arm/kvm/exports.c
> +++ b/arch/arm/kvm/exports.c
> @@ -33,5 +33,6 @@ EXPORT_SYMBOL_GPL(__kvm_hyp_code_end);
> EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
> 
> EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
> +EXPORT_SYMBOL_GPL(__kvm_tlb_flush_vmid);
> 
> EXPORT_SYMBOL_GPL(smp_send_reschedule);
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index bf09801..edf9ed5 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -31,6 +31,14 @@ __kvm_hyp_code_start:
> 	.globl __kvm_hyp_code_start
> 
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +@  Flush per-VMID TLBs
> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +
> +ENTRY(__kvm_tlb_flush_vmid)
> +	bx	lr
> +ENDPROC(__kvm_tlb_flush_vmid)
> +
> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> @  Flush TLBs and instruction caches of current CPU for all VMIDs
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 6a7dfd4..6cb0e38 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -23,10 +23,43 @@
> #include <asm/pgalloc.h>
> #include <asm/kvm_arm.h>
> #include <asm/kvm_mmu.h>
> +#include <asm/kvm_asm.h>
> #include <asm/mach/map.h>
> 
> static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
> 
> +static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
> +				  int min, int max)
> +{
> +	void *page;
> +
> +	BUG_ON(max > KVM_NR_MEM_OBJS);
> +	if (cache->nobjs >= min)
> +		return 0;
> +	while (cache->nobjs < max) {
> +		page = (void *)__get_free_page(PGALLOC_GFP);
> +		if (!page)
> +			return -ENOMEM;
> +		cache->objects[cache->nobjs++] = page;
> +	}
> +	return 0;
> +}
> +
> +static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> +{
> +	while (mc->nobjs)
> +		free_page((unsigned long)mc->objects[--mc->nobjs]);
> +}
> +
> +static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
> +{
> +	void *p;
> +
> +	BUG_ON(!mc || !mc->nobjs);
> +	p = mc->objects[--mc->nobjs];
> +	return p;
> +}
> +
> static void free_ptes(pmd_t *pmd, unsigned long addr)
> {
> 	pte_t *pte;
> @@ -200,7 +233,347 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
> 	return __create_hyp_mappings(from, to, &pfn);
> }
> 
> +/**
> + * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
> + * @kvm:	The KVM struct pointer for the VM.
> + *
> + * Allocates the 1st level table only of size defined by PGD2_ORDER (can
> + * support either full 40-bit input addresses or limited to 32-bit input
> + * addresses). Clears the allocated pages.
> + *
> + * Note we don't need locking here as this is only called when the VM is
> + * created, which can only be done once.
> + */
> +int kvm_alloc_stage2_pgd(struct kvm *kvm)
> +{
> +	pgd_t *pgd;
> +
> +	if (kvm->arch.pgd != NULL) {
> +		kvm_err("kvm_arch already initialized?\n");
> +		return -EINVAL;
> +	}
> +
> +	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
> +	if (!pgd)
> +		return -ENOMEM;
> +
> +	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
> +	kvm->arch.pgd = pgd;
> +
> +	return 0;
> +}
> +
> +static void free_guest_pages(pte_t *pte, unsigned long addr)
> +{
> +	unsigned int i;
> +	struct page *page, *pte_page;
> +
> +	pte_page = virt_to_page(pte);
> +
> +	for (i = 0; i < PTRS_PER_PTE; i++) {
> +		if (pte_present(*pte)) {
> +			unsigned long pfn = pte_pfn(*pte);
> +
> +			if (pfn_valid(pfn)) { /* Skip over device memory */
> +				page = pfn_to_page(pfn);
> +				put_page(page);
> +			}
> +			put_page(pte_page);
> +		}
> +		pte++;
> +	}
> +}
> +
> +static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
> +{
> +	unsigned int i;
> +	pte_t *pte;
> +	struct page *page, *pmd_page;
> +
> +	pmd_page = virt_to_page(pmd);
> +
> +	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
> +		BUG_ON(pmd_sect(*pmd));
> +		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
> +			pte = pte_offset_kernel(pmd, addr);
> +			free_guest_pages(pte, addr);
> +			page = virt_to_page((void *)pte);
> +			WARN_ON(page_count(page) != 1);
> +			pte_free_kernel(NULL, pte);
> +
> +			put_page(pmd_page);
> +		}
> +		pmd++;
> +	}
> +}
> +
> +/**
> + * kvm_free_stage2_pgd - free all stage-2 tables
> + * @kvm:	The KVM struct pointer for the VM.
> + *
> + * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
> + * underlying level-2 and level-3 tables before freeing the actual level-1 table
> + * and setting the struct pointer to NULL.
> + *
> + * Note we don't need locking here as this is only called when the VM is
> + * destroyed, which can only be done once.
> + */
> +void kvm_free_stage2_pgd(struct kvm *kvm)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	unsigned long long i, addr;
> +	struct page *page, *pud_page;
> +
> +	if (kvm->arch.pgd == NULL)
> +		return;
> +
> +	/*
> +	 * We do this slightly different than other places, since we need more
> +	 * than 32 bits and for instance pgd_addr_end converts to unsigned long.
> +	 */
> +	addr = 0;
> +	for (i = 0; i < PTRS_PER_PGD2; i++) {
> +		addr = i * (unsigned long long)PGDIR_SIZE;
> +		pgd = kvm->arch.pgd + i;
> +		pud = pud_offset(pgd, addr);
> +		pud_page = virt_to_page(pud);
> +
> +		if (pud_none(*pud))
> +			continue;
> +
> +		BUG_ON(pud_bad(*pud));
> +
> +		pmd = pmd_offset(pud, addr);
> +		free_stage2_ptes(pmd, addr);
> +		page = virt_to_page((void *)pmd);
> +		WARN_ON(page_count(page) != 1);
> +		pmd_free(NULL, pmd);
> +		put_page(pud_page);
> +	}
> +
> +	WARN_ON(page_count(pud_page) != 1);
> +	free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
> +	kvm->arch.pgd = NULL;
> +}
> +
> +/*
> + * Clear a stage-2 PTE, lowering the various ref-counts. Also takes
> + * care of invalidating the TLBs.  Must be called while holding
> + * pgd_lock, otherwise another faulting VCPU may come in and mess
> + * things behind our back.
> + */
> +static void stage2_clear_pte(struct kvm *kvm, phys_addr_t addr)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	struct page *page;
> +
> +	kvm_debug("Clearing PTE&%08llx\n", addr);
> +	pgd = kvm->arch.pgd + pgd_index(addr);
> +	pud = pud_offset(pgd, addr);
> +	BUG_ON(pud_none(*pud));
> +
> +	pmd = pmd_offset(pud, addr);
> +	BUG_ON(pmd_none(*pmd));
> +
> +	pte = pte_offset_kernel(pmd, addr);
> +	set_pte_ext(pte, __pte(0), 0);
> +
> +	page = virt_to_page(pte);
> +	put_page(page);
> +	if (page_count(page) != 1) {
> +		__kvm_tlb_flush_vmid(kvm);
> +		return;
> +	}
> +
> +	/* Need to remove pte page */
> +	pmd_clear(pmd);
> +	__kvm_tlb_flush_vmid(kvm);
> +	pte_free_kernel(NULL, (pte_t *)((unsigned long)pte & PAGE_MASK));
> +
> +	page = virt_to_page(pmd);
> +	put_page(page);
> +	if (page_count(page) != 1)
> +		return;
> +
> +	/*
> +	 * Need to remove pmd page. This is the worst case, and we end
> +	 * up invalidating the TLB twice. No big deal.
> +	 */
> +	pud_clear(pud);
> +	__kvm_tlb_flush_vmid(kvm);
> +	pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
> +
> +	page = virt_to_page(pud);
> +	put_page(page);
> +}
> +
> +static void stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> +			   phys_addr_t addr, const pte_t *new_pte)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +
> +	/* Create 2nd stage page table mapping - Level 1 */
> +	pgd = kvm->arch.pgd + pgd_index(addr);
> +	pud = pud_offset(pgd, addr);
> +	if (pud_none(*pud)) {
> +		if (!cache)
> +			return; /* ignore calls from kvm_set_spte_hva */
> +		pmd = mmu_memory_cache_alloc(cache);
> +		pud_populate(NULL, pud, pmd);
> +		pmd += pmd_index(addr);
> +		get_page(virt_to_page(pud));
> +	} else
> +		pmd = pmd_offset(pud, addr);
> +
> +	/* Create 2nd stage page table mapping - Level 2 */
> +	if (pmd_none(*pmd)) {
> +		if (!cache)
> +			return; /* ignore calls from kvm_set_spte_hva */
> +		pte = mmu_memory_cache_alloc(cache);
> +		clean_pte_table(pte);
> +		pmd_populate_kernel(NULL, pmd, pte);
> +		pte += pte_index(addr);
> +		get_page(virt_to_page(pmd));
> +	} else
> +		pte = pte_offset_kernel(pmd, addr);
> +
> +	/* Create 2nd stage page table mapping - Level 3 */
> +	BUG_ON(pte_none(pte));
> +	set_pte_ext(pte, *new_pte, 0);
> +	get_page(virt_to_page(pte));
> +}
> +
> +/**
> + * kvm_phys_addr_ioremap - map a device range to guest IPA
> + *
> + * @kvm:	The KVM pointer
> + * @guest_ipa:	The IPA at which to insert the mapping
> + * @pa:		The physical address of the device
> + * @size:	The size of the mapping
> + */
> +int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> +			  phys_addr_t pa, unsigned long size)
> +{
> +	phys_addr_t addr, end;
> +	pgprot_t prot;
> +	int ret = 0;
> +	unsigned long pfn;
> +	struct kvm_mmu_memory_cache cache = { 0, };
> +
> +	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
> +	prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER |
> +			L_PTE2_READ | L_PTE2_WRITE);
> +	pfn = __phys_to_pfn(pa);
> +
> +	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
> +		pte_t pte = pfn_pte(pfn, prot);
> +
> +		ret = mmu_topup_memory_cache(&cache, 2, 2);
> +		if (ret)
> +			goto out;
> +		spin_lock(&kvm->arch.pgd_lock);
> +		stage2_set_pte(kvm, &cache, addr, &pte);
> +		spin_unlock(&kvm->arch.pgd_lock);
> +
> +		pfn++;
> +	}
> +
> +out:
> +	mmu_free_memory_cache(&cache);
> +	return ret;
> +}
> +
> int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> 	return -EINVAL;
> }
> +
> +static bool hva_to_gpa(struct kvm *kvm, unsigned long hva, gpa_t *gpa)

A single hva can have multiple gpas mapped, no? At least that's what I gathered from the discussion about my attempt at a function similar to this :).

I'm also having a hard time following your mmu code in general. When do pages get mapped? Where do they get mapped from?


Alex


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 07/14] KVM: ARM: Memory virtualization setup
  2012-08-16 18:25   ` [kvmarm] " Alexander Graf
@ 2012-08-19  4:34     ` Christoffer Dall
  2012-08-19  9:38       ` Peter Maydell
  0 siblings, 1 reply; 29+ messages in thread
From: Christoffer Dall @ 2012-08-19  4:34 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvmarm, kvm

On Thu, Aug 16, 2012 at 2:25 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 16.08.2012, at 17:29, Christoffer Dall wrote:
>
>> [...]
>> +static bool hva_to_gpa(struct kvm *kvm, unsigned long hva, gpa_t *gpa)
>
> A single hva can have multiple gpas mapped, no? At least that's what I gathered from the discussion about my attempt at a function similar to this :).
>

I don't think this is the case for ARM, can you provide an example? We
use gfn_to_pfn_prot and only allow user memory regions. What you
suggest would be multiple physical addresses pointing to the same
memory bank, I don't think that makes any sense on ARM hardware, for
x86 and PPC I don't know.

Unless I'm missing something altogether here...?

> I'm also having a hard time following your mmu code in general. When do pages get mapped? Where do they get mapped from?
>

as the text describes, this happens in a later patch. See
user_mem_abort in patch 12.
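
For readers following along, a rough, illustrative sketch of that fault path
(patch 12's user_mem_abort), stitched together from the helpers introduced in
this patch; the real patch also handles error pfns, write permissions and the
memory-cache top-up, so treat this as a sketch only:

static int user_mem_abort_sketch(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
{
	struct kvm *kvm = vcpu->kvm;
	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
	pfn_t pfn;
	pte_t new_pte;

	/* resolve the faulting IPA to a host page via the memslots */
	pfn = gfn_to_pfn_prot(kvm, gfn, true, NULL);
	if (is_error_pfn(pfn))
		return -EFAULT;

	/* build a stage-2 pte with the attributes defined in this patch */
	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);

	/* install it under the same lock used by the MMU notifier paths */
	spin_lock(&kvm->arch.pgd_lock);
	stage2_set_pte(kvm, &vcpu->arch.mmu_page_cache, fault_ipa, &new_pte);
	spin_unlock(&kvm->arch.pgd_lock);

	return 0;
}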

>
> Alex
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 07/14] KVM: ARM: Memory virtualization setup
  2012-08-19  4:34     ` Christoffer Dall
@ 2012-08-19  9:38       ` Peter Maydell
  2012-08-19 13:00         ` Avi Kivity
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Maydell @ 2012-08-19  9:38 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Alexander Graf, kvmarm, kvm

On 19 August 2012 05:34, Christoffer Dall <c.dall@virtualopensystems.com> wrote:
> On Thu, Aug 16, 2012 at 2:25 PM, Alexander Graf <agraf@suse.de> wrote:
>> A single hva can have multiple gpas mapped, no? At least that's what I gathered
>> from the discussion about my attempt at a function similar to this :).

> I don't think this is the case for ARM, can you provide an example? We
> use gfn_to_pfn_prot and only allow user memory regions. What you
> suggest would be multiple physical addresses pointing to the same
> memory bank, I don't think that makes any sense on ARM hardware, for
> x86 and PPC I don't know.

I don't know what an hva is, but yes, ARM boards can have the same
block of RAM aliased into multiple places in the physical address space.
(we don't currently bother to implement the aliases in qemu's vexpress-a15
though because it's a bunch of mappings of the low 2GB into high
addresses mostly intended to let you test LPAE code without having to
put lots of RAM on the hardware).

-- PMM
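
To make the aliasing case concrete, registering the same host buffer twice as
two memslots gives one hva two guest-physical addresses; the names, size and
addresses below are made up for illustration only (assumes <linux/kvm.h>):

	/* the same host mapping (ram_hva) appears at two guest-physical bases */
	struct kvm_userspace_memory_region low_alias = {
		.slot            = 0,
		.guest_phys_addr = 0x80000000,		/* low 2GB alias */
		.memory_size     = RAM_SIZE,
		.userspace_addr  = (__u64)(unsigned long)ram_hva,
	};
	struct kvm_userspace_memory_region high_alias = {
		.slot            = 1,
		.guest_phys_addr = 0x880000000ULL,	/* high (LPAE) alias */
		.memory_size     = RAM_SIZE,
		.userspace_addr  = (__u64)(unsigned long)ram_hva,	/* same hva */
	};

Any hva inside ram_hva .. ram_hva + RAM_SIZE then corresponds to two gpas,
which is exactly the case the MMU notifier hooks have to handle.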

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 07/14] KVM: ARM: Memory virtualization setup
  2012-08-19  9:38       ` Peter Maydell
@ 2012-08-19 13:00         ` Avi Kivity
  2012-08-19 20:00           ` Christoffer Dall
  0 siblings, 1 reply; 29+ messages in thread
From: Avi Kivity @ 2012-08-19 13:00 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Christoffer Dall, Alexander Graf, kvmarm, kvm

On 08/19/2012 12:38 PM, Peter Maydell wrote:
> On 19 August 2012 05:34, Christoffer Dall <c.dall@virtualopensystems.com> wrote:
>> On Thu, Aug 16, 2012 at 2:25 PM, Alexander Graf <agraf@suse.de> wrote:
>>> A single hva can have multiple gpas mapped, no? At least that's what I gathered
>>> from the discussion about my attempt at a function similar to this :).
> 
>> I don't think this is the case for ARM, can you provide an example? We
>> use gfn_to_pfn_prot and only allow user memory regions. What you
>> suggest would be multiple physical addresses pointing to the same
>> memory bank, I don't think that makes any sense on ARM hardware, for
>> x86 and PPC I don't know.
> 
> I don't know what an hva is,

host virtual address

(see Documentation/virtual/kvm/mmu.txt for more TLAs in this area).

 but yes, ARM boards can have the same
> block of RAM aliased into multiple places in the physical address space.
> (we don't currently bother to implement the aliases in qemu's vexpress-a15
> though because it's a bunch of mappings of the low 2GB into high
> addresses mostly intended to let you test LPAE code without having to
> put lots of RAM on the hardware).

Even if it weren't common, the API allows it, so we must behave sensibly.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 07/14] KVM: ARM: Memory virtualization setup
  2012-08-19 13:00         ` Avi Kivity
@ 2012-08-19 20:00           ` Christoffer Dall
  0 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-19 20:00 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Peter Maydell, Alexander Graf, kvmarm, kvm

On Sun, Aug 19, 2012 at 9:00 AM, Avi Kivity <avi@redhat.com> wrote:
> On 08/19/2012 12:38 PM, Peter Maydell wrote:
>> On 19 August 2012 05:34, Christoffer Dall <c.dall@virtualopensystems.com> wrote:
>>> On Thu, Aug 16, 2012 at 2:25 PM, Alexander Graf <agraf@suse.de> wrote:
>>>> A single hva can have multiple gpas mapped, no? At least that's what I gathered
>>>> from the discussion about my attempt at a function similar to this :).
>>
>>> I don't think this is the case for ARM, can you provide an example? We
>>> use gfn_to_pfn_prot and only allow user memory regions. What you
>>> suggest would be multiple physical addresses pointing to the same
>>> memory bank, I don't think that makes any sense on ARM hardware, for
>>> x86 and PPC I don't know.
>>
>> I don't know what an hva is,
>
> host virtual address
>
> (see Documentation/virtual/kvm/mmu.txt for more TLAs in this area).
>
>  but yes, ARM boards can have the same
>> block of RAM aliased into multiple places in the physical address space.
>> (we don't currently bother to implement the aliases in qemu's vexpress-a15
>> though because it's a bunch of mappings of the low 2GB into high
>> addresses mostly intended to let you test LPAE code without having to
>> put lots of RAM on the hardware).

I stand corrected.

>
> Even if it weren't common, the API allows it, so we must behave sensibly.
>

true, this should be a solution:

commit 2a8661fd7e6c15889a20a4547bd7861e84b778a8
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Sun Aug 19 15:52:10 2012 -0400

    KVM: ARM: A single hva can map multiple gpas

    Handle mmu notifier ops for every such mapping.

    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 3df4fa8..9b23230 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -754,11 +754,14 @@ int kvm_handle_guest_abort(struct kvm_vcpu
*vcpu, struct kvm_run *run)
 	return ret ? ret : 1;
 }

-static bool hva_to_gpa(struct kvm *kvm, unsigned long hva, gpa_t *gpa)
+static int handle_hva_to_gpa(struct kvm *kvm, unsigned long hva,
+			     void (*handler)(struct kvm *kvm, unsigned long hva,
+					     gpa_t gpa, void *data),
+			     void *data)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
-	bool found = false;
+	int cnt = 0;

 	slots = kvm_memslots(kvm);

@@ -769,31 +772,36 @@ static bool hva_to_gpa(struct kvm *kvm, unsigned
long hva, gpa_t *gpa)

 		end = start + (memslot->npages << PAGE_SHIFT);
 		if (hva >= start && hva < end) {
+			gpa_t gpa;
 			gpa_t gpa_offset = hva - start;
-			*gpa = (memslot->base_gfn << PAGE_SHIFT) + gpa_offset;
-			found = true;
-			/* no overlapping memslots allowed: break */
-			break;
+			gpa = (memslot->base_gfn << PAGE_SHIFT) + gpa_offset;
+			handler(kvm, hva, gpa, data);
+			cnt++;
 		}
 	}

-	return found;
+	return cnt;
+}
+
+static void kvm_unmap_hva_handler(struct kvm *kvm, unsigned long hva,
+				  gpa_t gpa, void *data)
+{
+	spin_lock(&kvm->arch.pgd_lock);
+	stage2_clear_pte(kvm, gpa);
+	spin_unlock(&kvm->arch.pgd_lock);
 }

 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 {
-	bool found;
-	gpa_t gpa;
+	int found;

 	if (!kvm->arch.pgd)
 		return 0;

-	found = hva_to_gpa(kvm, hva, &gpa);
-	if (found) {
-		spin_lock(&kvm->arch.pgd_lock);
-		stage2_clear_pte(kvm, gpa);
-		spin_unlock(&kvm->arch.pgd_lock);
-	}
+	found = handle_hva_to_gpa(kvm, hva, &kvm_unmap_hva_handler, NULL);
+	if (found > 0)
+		__kvm_tlb_flush_vmid(kvm);
+
 	return 0;
 }

@@ -814,21 +822,27 @@ int kvm_unmap_hva_range(struct kvm *kvm,
 	return 0;
 }

+static void kvm_set_spte_handler(struct kvm *kvm, unsigned long hva,
+				 gpa_t gpa, void *data)
+{
+	pte_t *pte = (pte_t *)data;
+
+	spin_lock(&kvm->arch.pgd_lock);
+	stage2_set_pte(kvm, NULL, gpa, pte);
+	spin_unlock(&kvm->arch.pgd_lock);
+}
+
+
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 {
-	gpa_t gpa;
-	bool found;
+	int found;

 	if (!kvm->arch.pgd)
 		return;

-	found = hva_to_gpa(kvm, hva, &gpa);
-	if (found) {
-		spin_lock(&kvm->arch.pgd_lock);
-		stage2_set_pte(kvm, NULL, gpa, &pte);
-		spin_unlock(&kvm->arch.pgd_lock);
+	found = handle_hva_to_gpa(kvm, hva, &kvm_set_spte_handler, &pte);
+	if (found > 0)
 		__kvm_tlb_flush_vmid(kvm);
-	}
 }

 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)

--
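
As an aside, the range case could be expressed with the same handler; a
minimal sketch (not part of the posted patch) that walks the range and defers
the TLB flush to the end:

int kvm_unmap_hva_range(struct kvm *kvm,
			unsigned long start, unsigned long end)
{
	unsigned long hva;
	int found = 0;

	if (!kvm->arch.pgd)
		return 0;

	/* clear every stage-2 pte aliased by any page in [start, end) */
	for (hva = start; hva < end; hva += PAGE_SIZE)
		found += handle_hva_to_gpa(kvm, hva, &kvm_unmap_hva_handler, NULL);

	if (found > 0)
		__kvm_tlb_flush_vmid(kvm);

	return 0;
}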

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 08/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2012-08-16 15:29 ` [PATCH v10 08/14] KVM: ARM: Inject IRQs and FIQs from userspace Christoffer Dall
@ 2012-08-21  8:20   ` Jan Kiszka
  2012-08-21 14:13     ` Christoffer Dall
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2012-08-21  8:20 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm

On 2012-08-16 17:29, Christoffer Dall wrote:
> From: Christoffer Dall <cdall@cs.columbia.edu>
> 
> Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
> This ioctl is used since the semantics are in fact two lines that can be
> either raised or lowered on the VCPU - the IRQ and FIQ lines.
> 
> KVM needs to know which VCPU it must operate on and whether the FIQ or
> IRQ line is raised/lowered. Hence both pieces of information are packed
> in the kvm_irq_level->irq field. The irq field value will be:
>   IRQ: vcpu_index << 1
>   FIQ: (vcpu_index << 1) | 1
> 
> This is documented in Documentation/kvm/api.txt.
> 
> The effect of the ioctl is simply to raise/lower the
> corresponding irq_line field on the VCPU struct, which will cause the
> world-switch code to raise/lower virtual interrupts when running the
> guest on next switch. The wait_for_interrupt flag is also cleared for
> raised IRQs or FIQs causing an idle VCPU to become active again. CPUs
> in guest mode are kicked to make sure they refresh their interrupt status.
> 
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  Documentation/virtual/kvm/api.txt |   12 ++++++---
>  arch/arm/include/asm/kvm.h        |    9 +++++++
>  arch/arm/include/asm/kvm_arm.h    |    1 +
>  arch/arm/kvm/arm.c                |   47 +++++++++++++++++++++++++++++++++++++
>  include/linux/kvm.h               |    1 +
>  5 files changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index bf33aaa..8345b78 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -614,15 +614,19 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
>  4.25 KVM_IRQ_LINE

So this IOCTL is now used for injecting from a userspace GIC model to
the VCPUs, right? Then this version is OK as userspace should be able to
detect IRQ coalescing per IRQ generating device.

However, injecting into an in-kernel GIC should use KVM_IRQ_LINE_STATUS
later on, keeping the door open for userspace implementing the IRQ
de-coalescing (for custom periodic timer devices). Have you thought
about what that IRQ injection interface will look like and how the
interrupts will be specified? Do you actually plan to have both versions
as ABI (with and without in-kernel irqchip)?

>  
>  Capability: KVM_CAP_IRQCHIP
> -Architectures: x86, ia64
> +Architectures: x86, ia64, arm
>  Type: vm ioctl
>  Parameters: struct kvm_irq_level
>  Returns: 0 on success, -1 on error
>  
>  Sets the level of a GSI input to the interrupt controller model in the kernel.
> -Requires that an interrupt controller model has been previously created with
> -KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
> -to be set to 1 and then back to 0.
> +On some architectures it is required that an interrupt controller model has
      ^^^^
Please be more specific to help the poor users of this interface
guess whether they should or should not call CREATE_IRQCHIP first.

> +been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
> +interrupts require the level to be set to 1 and then back to 0.
> +
> +ARM uses two types of interrupt lines per CPU: IRQ and FIQ.  The value of the
> +irq field should be (vcpu_index << 1) for IRQs and ((vcpu_index << 1) | 1) for
> +FIQs. Level is used to raise/lower the line.
>  
>  struct kvm_irq_level {
>  	union {
> diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
> index bc5d72b..4a3e25d 100644
> --- a/arch/arm/include/asm/kvm.h
> +++ b/arch/arm/include/asm/kvm.h
> @@ -22,6 +22,15 @@
>  #include <asm/types.h>
>  
>  #define __KVM_HAVE_GUEST_DEBUG
> +#define __KVM_HAVE_IRQ_LINE
> +
> +/*
> + * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
> + */
> +enum KVM_ARM_IRQ_LINE_TYPE {
> +	KVM_ARM_IRQ_LINE = 0,
> +	KVM_ARM_FIQ_LINE = 1,
> +};
>  
>  /*
>   * Modes used for short-hand mode determinition in the world-switch code and
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index 6e46541..0f641c1 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -74,6 +74,7 @@
>  #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
>  			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
>  			HCR_SWIO | HCR_TIDCP)
> +#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>  
>  /* Hyp System Control Register (HSCTLR) bits */
>  #define HSCTLR_TE	(1 << 30)
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 3f97e7c..8306587 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -24,6 +24,7 @@
>  #include <linux/fs.h>
>  #include <linux/mman.h>
>  #include <linux/sched.h>
> +#include <linux/kvm.h>
>  #include <trace/events/kvm.h>
>  
>  #define CREATE_TRACE_POINTS
> @@ -265,6 +266,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
> +	vcpu->cpu = cpu;
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -305,6 +307,51 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	return -EINVAL;
>  }
>  
> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
> +{
> +	unsigned int vcpu_idx;
> +	struct kvm_vcpu *vcpu;
> +	unsigned long *ptr;
> +	bool set;
> +	int bit_index;
> +
> +	vcpu_idx = irq_level->irq >> 1;
> +	if (vcpu_idx >= KVM_MAX_VCPUS)
> +		return -EINVAL;
> +
> +	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
> +	if (!vcpu)
> +		return -EINVAL;
> +
> +	trace_kvm_set_irq(irq_level->irq, irq_level->level, 0);
> +
> +	if ((irq_level->irq & 1) == KVM_ARM_IRQ_LINE)
> +		bit_index = ffs(HCR_VI) - 1;
> +	else /* KVM_ARM_FIQ_LINE */
> +		bit_index = ffs(HCR_VF) - 1;
> +
> +	ptr = (unsigned long *)&vcpu->arch.irq_lines;
> +	if (irq_level->level)
> +		set = test_and_set_bit(bit_index, ptr);
> +	else
> +		set = test_and_clear_bit(bit_index, ptr);
> +
> +	/*
> +	 * If we didn't change anything, no need to wake up or kick other CPUs
> +	 */
> +	if (!!set == !!irq_level->level)
> +		return 0;
> +
> +	/*
> +	 * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
> +	 * trigger a world-switch round on the running physical CPU to set the
> +	 * virtual IRQ/FIQ fields in the HCR appropriately.
> +	 */
> +	kvm_vcpu_kick(vcpu);
> +
> +	return 0;
> +}
> +
>  long kvm_arch_vcpu_ioctl(struct file *filp,
>  			 unsigned int ioctl, unsigned long arg)
>  {
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 5fb08b5..c9b2556 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -111,6 +111,7 @@ struct kvm_irq_level {
>  	 * ACPI gsi notion of irq.
>  	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
>  	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
> +	 * For ARM: IRQ: irq = (2*vcpu_index). FIQ: irq = (2*vcpu_indx + 1).
>  	 */
>  	union {
>  		__u32 irq;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 08/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2012-08-21  8:20   ` Jan Kiszka
@ 2012-08-21 14:13     ` Christoffer Dall
  0 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-21 14:13 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvmarm, kvm

On Tue, Aug 21, 2012 at 4:20 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-08-16 17:29, Christoffer Dall wrote:
>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>
>> Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
>> This ioctl is used since the semantics are in fact two lines that can be
>> either raised or lowered on the VCPU - the IRQ and FIQ lines.
>>
>> KVM needs to know which VCPU it must operate on and whether the FIQ or
>> IRQ line is raised/lowered. Hence both pieces of information are packed
>> in the kvm_irq_level->irq field. The irq field value will be:
>>   IRQ: vcpu_index << 1
>>   FIQ: (vcpu_index << 1) | 1
>>
>> This is documented in Documentation/kvm/api.txt.
>>
>> The effect of the ioctl is simply to raise/lower the
>> corresponding irq_line field on the VCPU struct, which will cause the
>> world-switch code to raise/lower virtual interrupts when running the
>> guest on next switch. The wait_for_interrupt flag is also cleared for
>> raised IRQs or FIQs causing an idle VCPU to become active again. CPUs
>> in guest mode are kicked to make sure they refresh their interrupt status.
>>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  Documentation/virtual/kvm/api.txt |   12 ++++++---
>>  arch/arm/include/asm/kvm.h        |    9 +++++++
>>  arch/arm/include/asm/kvm_arm.h    |    1 +
>>  arch/arm/kvm/arm.c                |   47 +++++++++++++++++++++++++++++++++++++
>>  include/linux/kvm.h               |    1 +
>>  5 files changed, 66 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index bf33aaa..8345b78 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -614,15 +614,19 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
>>  4.25 KVM_IRQ_LINE
>
> So this IOCTL is now used for injecting from a userspace GIC model to
> the VCPUs, right? Then this version is OK as userspace should be able to
> detect IRQ coalescing per IRQ generating device.
>
> However, injecting into an in-kernel GIC should use KVM_IRQ_LINE_STATUS
> later on, keeping the door open for userspace implementing the IRQ
> de-coalescing (for custom periodic timer devices). Have you thought
> about what that IRQ injection interface will look like and how the
> interrupts will be specified? Do you actually plan to have both versions
> as ABI (with and without in-kernel irqchip)?
>

Hi Jan,

this is being reworked when the in-kernel vgic support gets merged,
see the discussion starting with this mail:
https://lists.cs.columbia.edu/pipermail/kvmarm/2012-August/002922.html



>>
>>  Capability: KVM_CAP_IRQCHIP
>> -Architectures: x86, ia64
>> +Architectures: x86, ia64, arm
>>  Type: vm ioctl
>>  Parameters: struct kvm_irq_level
>>  Returns: 0 on success, -1 on error
>>
>>  Sets the level of a GSI input to the interrupt controller model in the kernel.
>> -Requires that an interrupt controller model has been previously created with
>> -KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
>> -to be set to 1 and then back to 0.
>> +On some architectures it is required that an interrupt controller model has
>       ^^^^
> Please be more specific to help the poor users of this interface
> guess whether they should or should not call CREATE_IRQCHIP first.
>

ok, I will revise that text.
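
For reference, a minimal userspace sketch of the encoding documented in this
patch; KVM_ARM_IRQ_LINE comes from the patch's asm/kvm.h, and vm_fd is assumed
to be an already-created KVM VM file descriptor:

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int pulse_vcpu_irq(int vm_fd, unsigned int vcpu_index)
{
	struct kvm_irq_level irq = {
		.irq   = (vcpu_index << 1) | KVM_ARM_IRQ_LINE,	/* IRQ line of this VCPU */
		.level = 1,					/* raise */
	};

	if (ioctl(vm_fd, KVM_IRQ_LINE, &irq) < 0)
		return -1;

	irq.level = 0;						/* lower again */
	return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}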

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 07/14] KVM: ARM: Memory virtualization setup
  2012-08-16 15:29 ` [PATCH v10 07/14] KVM: ARM: Memory virtualization setup Christoffer Dall
  2012-08-16 18:25   ` [kvmarm] " Alexander Graf
@ 2012-08-23  8:12   ` Min-gyu Kim
  2012-08-23 14:46     ` Christoffer Dall
  1 sibling, 1 reply; 29+ messages in thread
From: Min-gyu Kim @ 2012-08-23  8:12 UTC (permalink / raw)
  To: 'Christoffer Dall', kvmarm, kvm; +Cc: 김창환

Dear Christoffer Dall,

In your code, the kvm_alloc_stage2_pgd function allocates the page table and memsets it.
Isn't it necessary to clean the data cache after the memset?

For reference, pgd_alloc cleans the area with clean_dcache_area.

There are some situations where cleaning is not necessary, and the clean_dcache_area function becomes empty in that case.
But I think explicit cleaning would be better for portability.


Best Regards
Min-gyu Kim


-----Original Message-----
From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of Christoffer Dall
Sent: Friday, August 17, 2012 12:29 AM
To: kvmarm@lists.cs.columbia.edu; kvm@vger.kernel.org
Subject: [PATCH v10 07/14] KVM: ARM: Memory virtualization setup

This commit introduces the framework for guest memory management through the use of 2nd stage translation. Each VM has a pointer to a level-1 table (the pgd field in struct kvm_arch) which is used for the 2nd stage translations. Entries are added when handling guest faults (later patch) and the table itself can be allocated and freed through the following functions implemented in
arch/arm/kvm/arm_mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and pgprot_guest variables used to map 2nd stage memory for KVM guests.

Each entry in the TLBs and caches is tagged with a VMID identifier in addition to ASIDs. The VMIDs are assigned consecutively to VMs in the order that VMs are executed, and caches and TLBs are invalidated when the VMID space has been used up, to allow for more than 255 simultaneously running guests.

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is freed in kvm_arch_destroy_vm(). Both functions are called from the main KVM code.

We pre-allocate page table memory to be able to synchronize using a spinlock and be called under rcu_read_lock from the MMU notifiers.  We steal the mmu_memory_cache implementation from x86 and adapt for our specific usage.

We support MMU notifiers (thanks to Marc Zyngier) through kvm_unmap_hva and kvm_set_spte_hva.

Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA, which is used by VGIC support to map the virtual CPU interface registers to the guest. This support is added by Marc Zyngier.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_asm.h        |    2 
 arch/arm/include/asm/kvm_host.h       |   18 ++
 arch/arm/include/asm/kvm_mmu.h        |    9 +
 arch/arm/include/asm/pgtable-3level.h |    9 +
 arch/arm/include/asm/pgtable.h        |    4 
 arch/arm/kvm/Kconfig                  |    1 
 arch/arm/kvm/arm.c                    |   38 +++
 arch/arm/kvm/exports.c                |    1 
 arch/arm/kvm/interrupts.S             |    8 +
 arch/arm/kvm/mmu.c                    |  373 +++++++++++++++++++++++++++++++++
 arch/arm/mm/mmu.c                     |    3 
 11 files changed, 465 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 58d51e3..55b6446 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -34,6 +34,7 @@
 #define SMCHYP_HVBAR_W 0xfffffff0
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -48,6 +49,7 @@ extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d7e3398..d86ce39 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -157,4 +157,22 @@ struct kvm_vcpu_stat {
 struct kvm_vcpu_init;
 int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 			const struct kvm_vcpu_init *init);
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8252921..11f4c3a 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -33,4 +33,13 @@ int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 1169a8a..7351eee 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -102,6 +102,15 @@
  */
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
 
+/*
+ * 2-nd stage PTE definitions for LPAE.
+ */
+#define L_PTE2_SHARED		L_PTE_SHARED
+#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
+#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */
+#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
+#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */
+
 #ifndef __ASSEMBLY__
 
 #define pud_none(pud)		(!pud_val(pud))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index bc83540..a31d0e9 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -70,6 +70,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 
 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_guest;
 
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
@@ -83,6 +84,9 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
 #define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
+#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
+					  L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
+					  L_PTE2_SHARED)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 83abbe0..7fa50d3 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM_ARM_HOST
 	depends on KVM
 	depends on MMU
 	depends on CPU_V7 && ARM_VIRT_EXT
+	select	MMU_NOTIFIER
 	---help---
 	  Provides host support for ARM processors.
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0b1c466..3f97e7c 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -82,12 +82,34 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	int ret = 0;
+
 	if (type)
 		return -EINVAL;
 
-	return 0;
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+	spin_lock_init(&kvm->arch.pgd_lock);
+
+	ret = create_hyp_mappings(kvm, kvm + 1);
+	if (ret)
+		goto out_free_stage2_pgd;
+
+	/* Mark the initial VMID generation invalid */
+	kvm->arch.vmid_gen = 0;
+
+	return ret;
+out_free_stage2_pgd:
+	kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+	return ret;
 }
 
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -105,10 +127,16 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 	return 0;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -184,7 +212,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	err = create_hyp_mappings(vcpu, vcpu + 1);
+	if (err)
+		goto vcpu_uninit;
+
 	return vcpu;
+vcpu_uninit:
+	kvm_vcpu_uninit(vcpu);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 out:
@@ -193,6 +227,8 @@ out:
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	kvm_mmu_free_memory_caches(vcpu);
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index 8ebdf07..f39f823 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -33,5 +33,6 @@ EXPORT_SYMBOL_GPL(__kvm_hyp_code_end);
 EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
 
 EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
+EXPORT_SYMBOL_GPL(__kvm_tlb_flush_vmid);
 
 EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index bf09801..edf9ed5 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -31,6 +31,14 @@ __kvm_hyp_code_start:
 	.globl __kvm_hyp_code_start
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Flush per-VMID TLBs
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_tlb_flush_vmid)
+	bx	lr
+ENDPROC(__kvm_tlb_flush_vmid)
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Flush TLBs and instruction caches of current CPU for all VMIDs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6a7dfd4..6cb0e38 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -23,10 +23,43 @@
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_asm.h>
 #include <asm/mach/map.h>
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  int min, int max)
+{
+	void *page;
+
+	BUG_ON(max > KVM_NR_MEM_OBJS);
+	if (cache->nobjs >= min)
+		return 0;
+	while (cache->nobjs < max) {
+		page = (void *)__get_free_page(PGALLOC_GFP);
+		if (!page)
+			return -ENOMEM;
+		cache->objects[cache->nobjs++] = page;
+	}
+	return 0;
+}
+
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+	while (mc->nobjs)
+		free_page((unsigned long)mc->objects[--mc->nobjs]);
+}
+
+static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+{
+	void *p;
+
+	BUG_ON(!mc || !mc->nobjs);
+	p = mc->objects[--mc->nobjs];
+	return p;
+}
+
 static void free_ptes(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
@@ -200,7 +233,347 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
 	return __create_hyp_mappings(from, to, &pfn);
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates the 1st level table only of size defined by PGD2_ORDER (can
+ * support either full 40-bit input addresses or limited to 32-bit input
+ * addresses). Clears the allocated pages.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * created, which can only be done once.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+
+	if (kvm->arch.pgd != NULL) {
+		kvm_err("kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
+	if (!pgd)
+		return -ENOMEM;
+
+	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
+	kvm->arch.pgd = pgd;
+
+	return 0;
+}
+
+static void free_guest_pages(pte_t *pte, unsigned long addr)
+{
+	unsigned int i;
+	struct page *page, *pte_page;
+
+	pte_page = virt_to_page(pte);
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (pte_present(*pte)) {
+			unsigned long pfn = pte_pfn(*pte);
+
+			if (pfn_valid(pfn)) { /* Skip over device memory */
+				page = pfn_to_page(pfn);
+				put_page(page);
+			}
+			put_page(pte_page);
+		}
+		pte++;
+	}
+}
+
+static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
+{
+	unsigned int i;
+	pte_t *pte;
+	struct page *page, *pmd_page;
+
+	pmd_page = virt_to_page(pmd);
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		BUG_ON(pmd_sect(*pmd));
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			free_guest_pages(pte, addr);
+			page = virt_to_page((void *)pte);
+			WARN_ON(page_count(page) != 1);
+			pte_free_kernel(NULL, pte);
+
+			put_page(pmd_page);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * kvm_free_stage2_pgd - free all stage-2 tables
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
+ * underlying level-2 and level-3 tables before freeing the actual level-1
+ * table and setting the struct pointer to NULL.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * destroyed, which can only be done once.
+ */
+void kvm_free_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long long i, addr;
+	struct page *page, *pud_page;
+
+	if (kvm->arch.pgd == NULL)
+		return;
+
+	/*
+	 * We do this slightly different than other places, since we need more
+	 * than 32 bits and for instance pgd_addr_end converts to unsigned long.
+	 */
+	addr = 0;
+	for (i = 0; i < PTRS_PER_PGD2; i++) {
+		addr = i * (unsigned long long)PGDIR_SIZE;
+		pgd = kvm->arch.pgd + i;
+		pud = pud_offset(pgd, addr);
+		pud_page = virt_to_page(pud);
+
+		if (pud_none(*pud))
+			continue;
+
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_stage2_ptes(pmd, addr);
+		page = virt_to_page((void *)pmd);
+		WARN_ON(page_count(page) != 1);
+		pmd_free(NULL, pmd);
+		put_page(pud_page);
+	}
+
+	WARN_ON(page_count(pud_page) != 1);
+	free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
+	kvm->arch.pgd = NULL;
+}
+
+/*
+ * Clear a stage-2 PTE, lowering the various ref-counts. Also takes
+ * care of invalidating the TLBs.  Must be called while holding
+ * pgd_lock, otherwise another faulting VCPU may come in and mess
+ * things behind our back.
+ */
+static void stage2_clear_pte(struct kvm *kvm, phys_addr_t addr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	struct page *page;
+
+	kvm_debug("Clearing PTE&%08llx\n", addr);
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	BUG_ON(pud_none(*pud));
+
+	pmd = pmd_offset(pud, addr);
+	BUG_ON(pmd_none(*pmd));
+
+	pte = pte_offset_kernel(pmd, addr);
+	set_pte_ext(pte, __pte(0), 0);
+
+	page = virt_to_page(pte);
+	put_page(page);
+	if (page_count(page) != 1) {
+		__kvm_tlb_flush_vmid(kvm);
+		return;
+	}
+
+	/* Need to remove pte page */
+	pmd_clear(pmd);
+	__kvm_tlb_flush_vmid(kvm);
+	pte_free_kernel(NULL, (pte_t *)((unsigned long)pte & PAGE_MASK));
+
+	page = virt_to_page(pmd);
+	put_page(page);
+	if (page_count(page) != 1)
+		return;
+
+	/*
+	 * Need to remove pmd page. This is the worst case, and we end
+	 * up invalidating the TLB twice. No big deal.
+	 */
+	pud_clear(pud);
+	__kvm_tlb_flush_vmid(kvm);
+	pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
+
+	page = virt_to_page(pud);
+	put_page(page);
+}
+
+static void stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+			   phys_addr_t addr, const pte_t *new_pte)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	/* Create 2nd stage page table mapping - Level 1 */
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		if (!cache)
+			return; /* ignore calls from kvm_set_spte_hva */
+		pmd = mmu_memory_cache_alloc(cache);
+		pud_populate(NULL, pud, pmd);
+		pmd += pmd_index(addr);
+		get_page(virt_to_page(pud));
+	} else
+		pmd = pmd_offset(pud, addr);
+
+	/* Create 2nd stage page table mapping - Level 2 */
+	if (pmd_none(*pmd)) {
+		if (!cache)
+			return; /* ignore calls from kvm_set_spte_hva */
+		pte = mmu_memory_cache_alloc(cache);
+		clean_pte_table(pte);
+		pmd_populate_kernel(NULL, pmd, pte);
+		pte += pte_index(addr);
+		get_page(virt_to_page(pmd));
+	} else
+		pte = pte_offset_kernel(pmd, addr);
+
+	/* Create 2nd stage page table mapping - Level 3 */
+	BUG_ON(pte_none(pte));
+	set_pte_ext(pte, *new_pte, 0);
+	get_page(virt_to_page(pte));
+}
+
+/**
+ * kvm_phys_addr_ioremap - map a device range to guest IPA
+ *
+ * @kvm:	The KVM pointer
+ * @guest_ipa:	The IPA at which to insert the mapping
+ * @pa:		The physical address of the device
+ * @size:	The size of the mapping
+ */
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size)
+{
+	phys_addr_t addr, end;
+	pgprot_t prot;
+	int ret = 0;
+	unsigned long pfn;
+	struct kvm_mmu_memory_cache cache = { 0, };
+
+	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
+	prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER |
+			L_PTE2_READ | L_PTE2_WRITE);
+	pfn = __phys_to_pfn(pa);
+
+	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
+		pte_t pte = pfn_pte(pfn, prot);
+
+		ret = mmu_topup_memory_cache(&cache, 2, 2);
+		if (ret)
+			goto out;
+		spin_lock(&kvm->arch.pgd_lock);
+		stage2_set_pte(kvm, &cache, addr, &pte);
+		spin_unlock(&kvm->arch.pgd_lock);
+
+		pfn++;
+	}
+
+out:
+	mmu_free_memory_cache(&cache);
+	return ret;
+}
+
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return -EINVAL;
 }
+
+static bool hva_to_gpa(struct kvm *kvm, unsigned long hva, gpa_t *gpa) 
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	bool found = false;
+
+	slots = kvm_memslots(kvm);
+
+	/* we only care about the pages that the guest sees */
+	kvm_for_each_memslot(memslot, slots) {
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gpa_t gpa_offset = hva - start;
+			*gpa = (memslot->base_gfn << PAGE_SHIFT) + gpa_offset;
+			found = true;
+			/* no overlapping memslots allowed: break */
+			break;
+		}
+	}
+
+	return found;
+}
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	bool found;
+	gpa_t gpa;
+
+	if (!kvm->arch.pgd)
+		return 0;
+
+	found = hva_to_gpa(kvm, hva, &gpa);
+	if (found) {
+		spin_lock(&kvm->arch.pgd_lock);
+		stage2_clear_pte(kvm, gpa);
+		spin_unlock(&kvm->arch.pgd_lock);
+	}
+	return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+	int ret;
+
+	BUG_ON((start | end) & (~PAGE_MASK));
+
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		ret = kvm_unmap_hva(kvm, addr);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+	gpa_t gpa;
+	bool found;
+
+	if (!kvm->arch.pgd)
+		return;
+
+	found = hva_to_gpa(kvm, hva, &gpa);
+	if (found) {
+		spin_lock(&kvm->arch.pgd_lock);
+		stage2_set_pte(kvm, NULL, gpa, &pte);
+		spin_unlock(&kvm->arch.pgd_lock);
+		__kvm_tlb_flush_vmid(kvm);
+	}
+}
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
+{
+	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+}
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 76bf4f5..a153fd4 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -56,9 +56,11 @@ static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
+pgprot_t pgprot_guest;
 
 EXPORT_SYMBOL(pgprot_user);
 EXPORT_SYMBOL(pgprot_kernel);
+EXPORT_SYMBOL(pgprot_guest);
 
 struct cachepolicy {
 	const char	policy[16];
@@ -514,6 +516,7 @@ static void __init build_mem_type_table(void)
 	pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
 	pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
 				 L_PTE_DIRTY | kern_pgprot);
+	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG);
 
 	mem_types[MT_LOW_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;

--


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 07/14] KVM: ARM: Memory virtualization setup
  2012-08-23  8:12   ` Min-gyu Kim
@ 2012-08-23 14:46     ` Christoffer Dall
  0 siblings, 0 replies; 29+ messages in thread
From: Christoffer Dall @ 2012-08-23 14:46 UTC (permalink / raw)
  To: Min-gyu Kim; +Cc: kvmarm, kvm, 김창환

On Thu, Aug 23, 2012 at 4:12 AM, Min-gyu Kim <mingyu84.kim@samsung.com> wrote:
> Dear Christoffer Dall,
>

Hi Min-gyu

(it's not considered best practice to top-post on these types of
mailing lists; please post inline or bottom-post in the future).

> In your code, the kvm_alloc_stage2_pgd function allocates the page table and memsets it.
> Isn't it necessary to clean the data cache after the memset?
>
> For reference, pgd_alloc cleans the area with clean_dcache_area.
>

You are right, it does. I don't know if the architecture requires it,
but we certainly don't want the MMU to see anything other than zeroes
at that point. Good catch!
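
For concreteness, a minimal sketch of the fix under discussion (the allocation
path of kvm_alloc_stage2_pgd with an explicit clean added; illustrative only,
not the final patch):

	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
	if (!pgd)
		return -ENOMEM;

	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
	/* make sure the zeroed table is visible to the stage-2 table walker */
	clean_dcache_area(pgd, PTRS_PER_PGD2 * sizeof(pgd_t));
	kvm->arch.pgd = pgd;

	return 0;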

> There are some situations where cleaning is not necessary, and the clean_dcache_area function becomes empty in that case.
> But I think explicit cleaning would be better for portability.
>

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 06/14] KVM: ARM: Hypervisor inititalization
  2012-08-16 15:29 ` [PATCH v10 06/14] KVM: ARM: Hypervisor inititalization Christoffer Dall
@ 2012-08-23 15:08   ` Lei Wen
  2012-08-23 15:27     ` Christoffer Dall
  0 siblings, 1 reply; 29+ messages in thread
From: Lei Wen @ 2012-08-23 15:08 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm

Hi Christoffer,

[snip]

> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> index 1dc8926..4db26cb 100644
> --- a/arch/arm/kvm/init.S
> +++ b/arch/arm/kvm/init.S
> @@ -15,5 +15,135 @@
>   * along with this program; if not, write to the Free Software
>   * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>   */
> +
> +#include <linux/linkage.h>
> +#include <asm/unified.h>
>  #include <asm/asm-offsets.h>
>  #include <asm/kvm_asm.h>
> +#include <asm/kvm_arm.h>
> +
> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +@  Hypervisor initialization
> +@    - should be called with:
> +@        r0 = Hypervisor pgd pointer
> +@        r1 = top of Hyp stack (kernel VA)
> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       .text
> +       .arm
> +        .pushsection    .hyp.idmap.text,"ax"
> +       .align 12
> +__kvm_hyp_init:
> +       .globl __kvm_hyp_init
> +
> +       @ Hyp-mode exception vector
> +       nop
> +       nop
> +       nop
> +       nop
> +       nop
> +       b       __do_hyp_init
> +       nop
> +       nop
> +
> +__do_hyp_init:
> +       @ Set the sp to end of this page and push data for later use
> +       mov     sp, pc
> +       bic     sp, sp, #0x0ff
> +       bic     sp, sp, #0xf00
> +       add     sp, sp, #0x1000
> +       push    {r0, r1, r2, r12}

Is safe to use the stack here? Since our HYP memory mapping is
gotten valid after the following HTTBR being set as I think.

> +
> +       @ Set the HTTBR to point to the hypervisor PGD pointer passed to
> +       @ function and set the upper bits equal to the kernel PGD.
> +       mrrc    p15, 1, r1, r2, c2
> +       mcrr    p15, 4, r0, r2, c2
> +

[snip]

Thanks,
Lei

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 06/14] KVM: ARM: Hypervisor inititalization
  2012-08-23 15:08   ` [kvmarm] " Lei Wen
@ 2012-08-23 15:27     ` Christoffer Dall
  2012-08-24  8:04       ` Lei Wen
  0 siblings, 1 reply; 29+ messages in thread
From: Christoffer Dall @ 2012-08-23 15:27 UTC (permalink / raw)
  To: Lei Wen; +Cc: kvmarm, kvm

On Thu, Aug 23, 2012 at 11:08 AM, Lei Wen <adrian.wenl@gmail.com> wrote:
> Hi Christoffer,
>
> [snip]
>
>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>> index 1dc8926..4db26cb 100644
>> --- a/arch/arm/kvm/init.S
>> +++ b/arch/arm/kvm/init.S
>> @@ -15,5 +15,135 @@
>>   * along with this program; if not, write to the Free Software
>>   * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>>   */
>> +
>> +#include <linux/linkage.h>
>> +#include <asm/unified.h>
>>  #include <asm/asm-offsets.h>
>>  #include <asm/kvm_asm.h>
>> +#include <asm/kvm_arm.h>
>> +
>> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>> +@  Hypervisor initialization
>> +@    - should be called with:
>> +@        r0 = Hypervisor pgd pointer
>> +@        r1 = top of Hyp stack (kernel VA)
>> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>> +       .text
>> +       .arm
>> +        .pushsection    .hyp.idmap.text,"ax"
>> +       .align 12
>> +__kvm_hyp_init:
>> +       .globl __kvm_hyp_init
>> +
>> +       @ Hyp-mode exception vector
>> +       nop
>> +       nop
>> +       nop
>> +       nop
>> +       nop
>> +       b       __do_hyp_init
>> +       nop
>> +       nop
>> +
>> +__do_hyp_init:
>> +       @ Set the sp to end of this page and push data for later use
>> +       mov     sp, pc
>> +       bic     sp, sp, #0x0ff
>> +       bic     sp, sp, #0xf00
>> +       add     sp, sp, #0x1000
>> +       push    {r0, r1, r2, r12}
>
> Is safe to use the stack here? Since our HYP memory mapping is
> gotten valid after the following HTTBR being set as I think.
>

yes, as you can see in the end of this block we have a .align 12
before __kvm_hyp_init_end giving us the stack space we need in this
page (as long as this init code doesn't grow beyond ~3K



>> +
>> +       @ Set the HTTBR to point to the hypervisor PGD pointer passed to
>> +       @ function and set the upper bits equal to the kernel PGD.
>> +       mrrc    p15, 1, r1, r2, c2
>> +       mcrr    p15, 4, r0, r2, c2
>> +
>
> [snip]
>
> Thanks,
> Lei

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 06/14] KVM: ARM: Hypervisor inititalization
  2012-08-23 15:27     ` Christoffer Dall
@ 2012-08-24  8:04       ` Lei Wen
  2012-08-24 13:38         ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: Lei Wen @ 2012-08-24  8:04 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm

On Thu, Aug 23, 2012 at 11:27 PM, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> On Thu, Aug 23, 2012 at 11:08 AM, Lei Wen <adrian.wenl@gmail.com> wrote:
>> Hi Christoffer,
>>
>> [snip]
>>
>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>> index 1dc8926..4db26cb 100644
>>> --- a/arch/arm/kvm/init.S
>>> +++ b/arch/arm/kvm/init.S
>>> @@ -15,5 +15,135 @@
>>>   * along with this program; if not, write to the Free Software
>>>   * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>>>   */
>>> +
>>> +#include <linux/linkage.h>
>>> +#include <asm/unified.h>
>>>  #include <asm/asm-offsets.h>
>>>  #include <asm/kvm_asm.h>
>>> +#include <asm/kvm_arm.h>
>>> +
>>> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>>> +@  Hypervisor initialization
>>> +@    - should be called with:
>>> +@        r0 = Hypervisor pgd pointer
>>> +@        r1 = top of Hyp stack (kernel VA)
>>> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>>> +       .text
>>> +       .arm
>>> +        .pushsection    .hyp.idmap.text,"ax"
>>> +       .align 12
>>> +__kvm_hyp_init:
>>> +       .globl __kvm_hyp_init
>>> +
>>> +       @ Hyp-mode exception vector
>>> +       nop
>>> +       nop
>>> +       nop
>>> +       nop
>>> +       nop
>>> +       b       __do_hyp_init
>>> +       nop
>>> +       nop
>>> +
>>> +__do_hyp_init:
>>> +       @ Set the sp to end of this page and push data for later use
>>> +       mov     sp, pc
>>> +       bic     sp, sp, #0x0ff
>>> +       bic     sp, sp, #0xf00
>>> +       add     sp, sp, #0x1000
>>> +       push    {r0, r1, r2, r12}
>>
>> Is safe to use the stack here? Since our HYP memory mapping is
>> gotten valid after the following HTTBR being set as I think.
>>
>
> yes, as you can see in the end of this block we have a .align 12
> before __kvm_hyp_init_end giving us the stack space we need in this
> page (as long as this init code doesn't grow beyond ~3K

I know this area is already mapped by hyp_init_static_idmap;
however, the HTTBR has not been set up yet in the code below.
So how does the memory get mapped in Hyp mode? As far as I
understand, Hyp virtual addresses can only be used after the HTTBR
has been set up.


>
>
>
>>> +
>>> +       @ Set the HTTBR to point to the hypervisor PGD pointer passed to
>>> +       @ function and set the upper bits equal to the kernel PGD.
>>> +       mrrc    p15, 1, r1, r2, c2
>>> +       mcrr    p15, 4, r0, r2, c2
>>> +
>>
>> [snip]

Thanks,
Lei

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 06/14] KVM: ARM: Hypervisor inititalization
  2012-08-24  8:04       ` Lei Wen
@ 2012-08-24 13:38         ` Marc Zyngier
  2012-08-24 14:34           ` Lei Wen
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2012-08-24 13:38 UTC (permalink / raw)
  To: Lei Wen; +Cc: Christoffer Dall, kvmarm, kvm

On Fri, 24 Aug 2012 16:04:02 +0800, Lei Wen <adrian.wenl@gmail.com> wrote:
> On Thu, Aug 23, 2012 at 11:27 PM, Christoffer Dall
> <c.dall@virtualopensystems.com> wrote:
>> On Thu, Aug 23, 2012 at 11:08 AM, Lei Wen <adrian.wenl@gmail.com>
wrote:
>>> Hi Christoffer,
>>>
>>> [snip]
>>>
>>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>>> index 1dc8926..4db26cb 100644
>>>> --- a/arch/arm/kvm/init.S
>>>> +++ b/arch/arm/kvm/init.S
>>>> @@ -15,5 +15,135 @@
>>>>   * along with this program; if not, write to the Free Software
>>>>   * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 
>>>>   02110-1301, USA.
>>>>   */
>>>> +
>>>> +#include <linux/linkage.h>
>>>> +#include <asm/unified.h>
>>>>  #include <asm/asm-offsets.h>
>>>>  #include <asm/kvm_asm.h>
>>>> +#include <asm/kvm_arm.h>
>>>> +
>>>>
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>>>> +@  Hypervisor initialization
>>>> +@    - should be called with:
>>>> +@        r0 = Hypervisor pgd pointer
>>>> +@        r1 = top of Hyp stack (kernel VA)
>>>>
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>>>> +       .text
>>>> +       .arm
>>>> +        .pushsection    .hyp.idmap.text,"ax"
>>>> +       .align 12
>>>> +__kvm_hyp_init:
>>>> +       .globl __kvm_hyp_init
>>>> +
>>>> +       @ Hyp-mode exception vector
>>>> +       nop
>>>> +       nop
>>>> +       nop
>>>> +       nop
>>>> +       nop
>>>> +       b       __do_hyp_init
>>>> +       nop
>>>> +       nop
>>>> +
>>>> +__do_hyp_init:
>>>> +       @ Set the sp to end of this page and push data for later use
>>>> +       mov     sp, pc
>>>> +       bic     sp, sp, #0x0ff
>>>> +       bic     sp, sp, #0xf00
>>>> +       add     sp, sp, #0x1000
>>>> +       push    {r0, r1, r2, r12}
>>>
>>> Is safe to use the stack here? Since our HYP memory mapping is
>>> gotten valid after the following HTTBR being set as I think.
>>>
>>
>> yes, as you can see in the end of this block we have a .align 12
>> before __kvm_hyp_init_end giving us the stack space we need in this
>> page (as long as this init code doesn't grow beyond ~3K
> 
> I know this area is already mapped by hyp_init_static_idmap;
> however, the HTTBR has not been set up yet in the code below.
> So how does the memory get mapped in Hyp mode? As far as I
> understand, Hyp virtual addresses can only be used after the HTTBR
> has been set up.

Hint: look at HSCTLR, and when the M bit gets set. Until then, the words
"virtual address" have no meaning.

        M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvmarm] [PATCH v10 06/14] KVM: ARM: Hypervisor inititalization
  2012-08-24 13:38         ` Marc Zyngier
@ 2012-08-24 14:34           ` Lei Wen
  0 siblings, 0 replies; 29+ messages in thread
From: Lei Wen @ 2012-08-24 14:34 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Christoffer Dall, kvmarm, kvm

Hi Marc,

On Fri, Aug 24, 2012 at 9:38 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On Fri, 24 Aug 2012 16:04:02 +0800, Lei Wen <adrian.wenl@gmail.com> wrote:
>> On Thu, Aug 23, 2012 at 11:27 PM, Christoffer Dall
>> <c.dall@virtualopensystems.com> wrote:
>>> On Thu, Aug 23, 2012 at 11:08 AM, Lei Wen <adrian.wenl@gmail.com>
> wrote:
>>>> Hi Christoffer,
>>>>
>>>> [snip]
>>>>
>>>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>>>> index 1dc8926..4db26cb 100644
>>>>> --- a/arch/arm/kvm/init.S
>>>>> +++ b/arch/arm/kvm/init.S
>>>>> @@ -15,5 +15,135 @@
>>>>>   * along with this program; if not, write to the Free Software
>>>>>   * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
>>>>>   02110-1301, USA.
>>>>>   */
>>>>> +
>>>>> +#include <linux/linkage.h>
>>>>> +#include <asm/unified.h>
>>>>>  #include <asm/asm-offsets.h>
>>>>>  #include <asm/kvm_asm.h>
>>>>> +#include <asm/kvm_arm.h>
>>>>> +
>>>>>
> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>>>>> +@  Hypervisor initialization
>>>>> +@    - should be called with:
>>>>> +@        r0 = Hypervisor pgd pointer
>>>>> +@        r1 = top of Hyp stack (kernel VA)
>>>>>
> +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>>>>> +       .text
>>>>> +       .arm
>>>>> +        .pushsection    .hyp.idmap.text,"ax"
>>>>> +       .align 12
>>>>> +__kvm_hyp_init:
>>>>> +       .globl __kvm_hyp_init
>>>>> +
>>>>> +       @ Hyp-mode exception vector
>>>>> +       nop
>>>>> +       nop
>>>>> +       nop
>>>>> +       nop
>>>>> +       nop
>>>>> +       b       __do_hyp_init
>>>>> +       nop
>>>>> +       nop
>>>>> +
>>>>> +__do_hyp_init:
>>>>> +       @ Set the sp to end of this page and push data for later use
>>>>> +       mov     sp, pc
>>>>> +       bic     sp, sp, #0x0ff
>>>>> +       bic     sp, sp, #0xf00
>>>>> +       add     sp, sp, #0x1000
>>>>> +       push    {r0, r1, r2, r12}
>>>>
>>>> Is safe to use the stack here? Since our HYP memory mapping is
>>>> gotten valid after the following HTTBR being set as I think.
>>>>
>>>
>>> yes, as you can see in the end of this block we have a .align 12
>>> before __kvm_hyp_init_end giving us the stack space we need in this
>>> page (as long as this init code doesn't grow beyond ~3K
>>
>> I know this area is already mapped by hyp_init_static_idmap;
>> however, the HTTBR has not been set up yet in the code below.
>> So how does the memory get mapped in Hyp mode? As far as I
>> understand, Hyp virtual addresses can only be used after the HTTBR
>> has been set up.
>
> Hint: look at HSCTLR, and when the M bit gets set. Until then, the words
> "virtual address" have no meaning.

I see. You mean that the first push operation deals directly with
physical memory, so the mapping is meaningless at that moment. After the
MMU is on, that piece of code is identity-mapped, so the sp operations
are also fine.

Yep, that solves my confusion.

Thanks,
Lei

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2012-08-24 14:34 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-16 15:27 [PATCH v10 00/14] KVM/ARM Implementation Christoffer Dall
2012-08-16 15:28 ` [PATCH v10 01/14] ARM: add mem_type prot_pte accessor Christoffer Dall
2012-08-16 15:28 ` [PATCH v10 02/14] ARM: Add config option ARM_VIRT_EXT Christoffer Dall
2012-08-16 15:28 ` [PATCH v10 03/14] ARM: Section based HYP idmap Christoffer Dall
2012-08-16 15:28 ` [PATCH v10 04/14] ARM: Expose PMNC bitfields for KVM use Christoffer Dall
2012-08-16 15:28 ` [PATCH v10 05/14] KVM: ARM: Initial skeleton to compile KVM support Christoffer Dall
2012-08-16 15:29 ` [PATCH v10 06/14] KVM: ARM: Hypervisor inititalization Christoffer Dall
2012-08-23 15:08   ` [kvmarm] " Lei Wen
2012-08-23 15:27     ` Christoffer Dall
2012-08-24  8:04       ` Lei Wen
2012-08-24 13:38         ` Marc Zyngier
2012-08-24 14:34           ` Lei Wen
2012-08-16 15:29 ` [PATCH v10 07/14] KVM: ARM: Memory virtualization setup Christoffer Dall
2012-08-16 18:25   ` [kvmarm] " Alexander Graf
2012-08-19  4:34     ` Christoffer Dall
2012-08-19  9:38       ` Peter Maydell
2012-08-19 13:00         ` Avi Kivity
2012-08-19 20:00           ` Christoffer Dall
2012-08-23  8:12   ` Min-gyu Kim
2012-08-23 14:46     ` Christoffer Dall
2012-08-16 15:29 ` [PATCH v10 08/14] KVM: ARM: Inject IRQs and FIQs from userspace Christoffer Dall
2012-08-21  8:20   ` Jan Kiszka
2012-08-21 14:13     ` Christoffer Dall
2012-08-16 15:29 ` [PATCH v10 09/14] KVM: ARM: World-switch implementation Christoffer Dall
2012-08-16 15:29 ` [PATCH v10 10/14] KVM: ARM: Emulation framework and CP15 emulation Christoffer Dall
2012-08-16 15:30 ` [PATCH v10 11/14] KVM: ARM: User space API for getting/setting co-proc registers Christoffer Dall
2012-08-16 15:30 ` [PATCH v10 12/14] KVM: ARM: Handle guest faults in KVM Christoffer Dall
2012-08-16 15:30 ` [PATCH v10 13/14] KVM: ARM: Handle I/O aborts Christoffer Dall
2012-08-16 15:30 ` [PATCH v10 14/14] KVM: ARM: Guest wait-for-interrupts (WFI) support Christoffer Dall
