All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 00/15] KVM/ARM Implementation
@ 2012-06-15 19:06 Christoffer Dall
  2012-06-15 19:06 ` [PATCH v8 01/15] ARM: add mem_type prot_pte accessor Christoffer Dall
                   ` (15 more replies)
  0 siblings, 16 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:06 UTC (permalink / raw)
  To: android-virt, kvm

The following series implements KVM support for ARM processors,
specifically on the Cortex A-15 platform.  Work is done in
collaboration between Columbia University, Virtual Open Systems and
ARM/Linaro.

The patch series applies to kvm/next, specifically commit:
25e531a988ea5a64bd97a72dc9d3c65ad5850120

This is Version 8 of the patch series, but the first two versions
were reviewed outside of the KVM mailing list. Changes can also be
pulled from:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v8

A non-flattened edition of the patch series can be found at:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v8-stage

The implementation is broken up into a logical set of patches, the first
four are preparatory patches:
 1. Add mem_type prot_pte accessor (ARM community)
 2. Use KVM_CAP_IRQ_ROUTING to protect routing code  (KVM community)
 3. Introduce __KVM_HAVE_IRQ_LINE (KVM community)
 4. Guard code with CONFIG_MMU_NOTIFIER (KVM community)

KVM guys, please consider pulling the KVM generic patches as early as
possible. Thanks.

The main implementation is broken up into separate patches, the first
containing a skeleton of files, makefile changes, the basic user space
interface and KVM architecture specific stubs.  Subsequent patches
implement parts of the system as listed:
 1.  Preparatory patch introducing __KVM_HAVE_IRQ_LINE
 2.  Preparatory patch guarding mmu_notifier code with CONFIG_MMU_NOTIFIER
 3.  Skeleton
 4.  Identity Mapping for Hyp mode
 5.  Hypervisor initialization
 6.  Hypervisor module unloading
 7.  Memory virtualization setup (hyp mode mappings and 2nd stage)
 8.  Inject IRQs and FIQs from userspace
 9.  World-switch implementation and Hyp exception vectors
 10. Emulation framework and CP15 emulation
 11. Handle guest user memory aborts
 12. Handle guest MMIO aborts
 13. Support guest wait-for-interrupt instructions

Testing:
Limited testing, but have run GCC inside guest, which compiled a small
hello-world program, which was successfully run. For v8 both ARM/Thumb-2
kernels were tested as both host/guest and both a compiled-in version
and a kernel module version of KVM was tested. Hardware still
unavailable to me, so all testing has been done on ARM Fast Models.

For a guide on how to set up a testing environment and try out these
patches, see:
 http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf

There is an issue list available using the issue tracker on:
https://github.com/virtualopensystems/linux-kvm-arm

Additionally a few major milestones are coming up shortly:
 - Support Thumb MMIO emulation and test MMIO emulation code
 - Use section-based permanent identity mappings for init code.
 - Merge Marc Zyngier's patch series for VGIC and timers (review in
   progress)
 - Change from SMC based install to relying on booting the kernel in Hyp
   mode. This requires some larger changes, but will allow a guest
   kernel to boot with KVM configured.
 - Guest NEON/VFP support (work-in-progress from Virtual Open Systems)

Changes since v7:
 - Traps accesses to ACTLR
 - Do not trap WFE execution
 - Upgrade barriers and TLB operations to inner-shareable domain
 - Restrucure hyp_pgd related code to be more opaque
 - Random SMP fixes
 - Random BUG fixes
 - Improve commenting
 - Support module loading/unloading of KVM/ARM
 - Thumb-2 support for host kernel and KVM
 - Unaligned cross-page wide guest Thumb instruction fetching
 - Support ITSTATE fields in CPSR for Thumb guests
 - Document HCR settings

Changes since v6:
 - Support for MMU notifiers to not pin user pages in memory
 - Suport build with log debugging
 - Bugfix: v6 clobbered r7 in init code
 - Simplify hyp code mapping
 - Cleanup of register access code
 - Table-based CP15 emulation from Rusty Russell
 - Various other bug fixes and cleanups

Changes since v5:
 - General bugfixes and nit fixes from reviews
 - Implemented re-use of VMIDs
 - Cleaned up the Hyp-mapping code to be readable by non-mm hackers
   (including myself)
 - Integrated preliminary SMP support in base patches
 - Lock-less interrupt injection and WFI support
 - Fixed signal-handling in while in guest (increases overall stability)

Changes since v4:
 - Addressed reviewer comments from v4
    * cleanup debug and trace code
    * remove printks
    * fixup kvm_arch_vcpu_ioctl_run
    * add trace details to mmio emulation
 - Fix from Marc Zyngier: Move kvm_guest_enter/exit into non-preemptible
   section (squashed into world-switch patch)
 - Cleanup create_hyp_mappings/remove_hyp_mappings from Marc Zyngier
   (squashed into hypervisor initialization patch)
 - Removed the remove_hyp_mappings feature. Removing hypervisor mappings
   could potentially unmap other important data shared in the same page.
 - Removed the arm_ prefix from the arch-specific files.
 - Initial SMP host/guest support

Changes since v3:
 - v4 actually works, fully boots a guest
 - Support compiling as a module
 - Use static inlines instead of macros for vcpu_reg and friends
 - Optimize kvm_vcpu_reg function
 - Use Ftrace for trace capabilities
 - Updated documentation and commenting
 - Use KVM_IRQ_LINE instead of KVM_INTERRUPT
 - Emulates load/store instructions not supported through HSR
  syndrome information.
 - Frees 2nd stage translation tables on VM teardown
 - Handles IRQ/FIQ instructions
 - Handles more CP15 accesses
 - Support guest WFI calls
 - Uses debugfs instead of /proc
 - Support compiling in Thumb mode

Changes since v2:
 - Performs world-switch code
 - Maps guest memory using 2nd stage translation
 - Emulates co-processor 15 instructions
 - Forwards I/O faults to QEMU.

---

Christoffer Dall (12):
      KVM: Introduce __KVM_HAVE_IRQ_LINE
      ARM: KVM: Initial skeleton to compile KVM support
      ARM: KVM: Hypervisor identity mapping
      ARM: KVM: Hypervisor inititalization
      ARM: KVM: Module unloading support
      ARM: KVM: Memory virtualization setup
      ARM: KVM: Inject IRQs and FIQs from userspace
      ARM: KVM: World-switch implementation
      ARM: KVM: Emulation framework and CP15 emulation
      ARM: KVM: Handle guest faults in KVM
      ARM: KVM: Handle I/O aborts
      ARM: KVM: Guest wait-for-interrupts (WFI) support

Marc Zyngier (3):
      ARM: add mem_type prot_pte accessor
      KVM: use KVM_CAP_IRQ_ROUTING to protect the routing related code
      KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER


 Documentation/virtual/kvm/api.txt           |   12 
 arch/arm/Kconfig                            |    2 
 arch/arm/Makefile                           |    1 
 arch/arm/include/asm/kvm.h                  |   87 +++
 arch/arm/include/asm/kvm_arm.h              |  144 +++++
 arch/arm/include/asm/kvm_asm.h              |   56 ++
 arch/arm/include/asm/kvm_emulate.h          |  108 +++
 arch/arm/include/asm/kvm_host.h             |  140 ++++
 arch/arm/include/asm/kvm_mmu.h              |   43 +
 arch/arm/include/asm/mach/map.h             |    1 
 arch/arm/include/asm/pgtable-3level-hwdef.h |    5 
 arch/arm/include/asm/pgtable-3level.h       |   12 
 arch/arm/include/asm/pgtable.h              |   10 
 arch/arm/include/asm/unified.h              |   12 
 arch/arm/kernel/armksyms.c                  |    7 
 arch/arm/kernel/asm-offsets.c               |   35 +
 arch/arm/kernel/entry-armv.S                |    1 
 arch/arm/kvm/Kconfig                        |   45 +
 arch/arm/kvm/Makefile                       |   17 +
 arch/arm/kvm/arm.c                          |  843 +++++++++++++++++++++++++++
 arch/arm/kvm/emulate.c                      |  747 ++++++++++++++++++++++++
 arch/arm/kvm/exports.c                      |   35 +
 arch/arm/kvm/guest.c                        |  147 +++++
 arch/arm/kvm/init.S                         |  144 +++++
 arch/arm/kvm/interrupts.S                   |  596 +++++++++++++++++++
 arch/arm/kvm/mmu.c                          |  608 +++++++++++++++++++
 arch/arm/kvm/trace.h                        |  117 ++++
 arch/arm/mm/Kconfig                         |   10 
 arch/arm/mm/idmap.c                         |   47 +-
 arch/arm/mm/mmu.c                           |    9 
 arch/ia64/include/asm/kvm.h                 |    1 
 arch/x86/include/asm/kvm.h                  |    1 
 include/linux/kvm.h                         |    1 
 include/linux/kvm_host.h                    |    6 
 include/trace/events/kvm.h                  |    4 
 mm/memory.c                                 |    2 
 virt/kvm/kvm_main.c                         |    2 
 37 files changed, 4045 insertions(+), 13 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/asm/kvm_mmu.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/trace.h

-- 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v8 01/15] ARM: add mem_type prot_pte accessor
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
@ 2012-06-15 19:06 ` Christoffer Dall
  2012-06-15 19:07 ` [PATCH v8 02/15] KVM: use KVM_CAP_IRQ_ROUTING to protect the routing related code Christoffer Dall
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:06 UTC (permalink / raw)
  To: android-virt, kvm

From: Marc Zyngier <marc.zyngier@arm.com>

The KVM hypervisor mmu code requires requires access to the
mem_type prot_pte field when setting up page tables pointing
to a device. Unfortunately, the mem_type structure is opaque.

Add an accessor (get_mem_type_prot_pte()) to retrieve the
prot_pte value.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/mach/map.h |    1 +
 arch/arm/mm/mmu.c               |    6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index a6efcdd..3787c9f 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -37,6 +37,7 @@ extern void iotable_init(struct map_desc *, int);
 
 struct mem_type;
 extern const struct mem_type *get_mem_type(unsigned int type);
+extern pteval_t get_mem_type_prot_pte(unsigned int type);
 /*
  * external interface to remap single page with appropriate type
  */
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index e5dad60..f7439e7 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -301,6 +301,12 @@ const struct mem_type *get_mem_type(unsigned int type)
 }
 EXPORT_SYMBOL(get_mem_type);
 
+pteval_t get_mem_type_prot_pte(unsigned int type)
+{
+	return get_mem_type(type)->prot_pte;
+}
+EXPORT_SYMBOL(get_mem_type_prot_pte);
+
 /*
  * Adjust the PMD section entries according to the CPU in use.
  */


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 02/15] KVM: use KVM_CAP_IRQ_ROUTING to protect the routing related code
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
  2012-06-15 19:06 ` [PATCH v8 01/15] ARM: add mem_type prot_pte accessor Christoffer Dall
@ 2012-06-15 19:07 ` Christoffer Dall
  2012-06-18 13:06   ` Avi Kivity
  2012-06-15 19:07 ` [PATCH v8 03/15] KVM: Introduce __KVM_HAVE_IRQ_LINE Christoffer Dall
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:07 UTC (permalink / raw)
  To: android-virt, kvm

From: Marc Zyngier <marc.zyngier@arm.com>

The KVM code sometimes uses CONFIG_HAVE_KVM_IRQCHIP to protect
code that is related to IRQ routing, which not all in-kernel
irqchips may support.

Use KVM_CAP_IRQ_ROUTING instead.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 include/linux/kvm_host.h |    2 +-
 virt/kvm/kvm_main.c      |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 27ac8a4..c7f7787 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -802,7 +802,7 @@ static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_se
 }
 #endif
 
-#ifdef CONFIG_HAVE_KVM_IRQCHIP
+#ifdef KVM_CAP_IRQ_ROUTING
 
 #define KVM_MAX_IRQ_ROUTES 1024
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 02cb440..636bd08 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2225,7 +2225,7 @@ static long kvm_dev_ioctl_check_extension_generic(long arg)
 	case KVM_CAP_SIGNAL_MSI:
 #endif
 		return 1;
-#ifdef CONFIG_HAVE_KVM_IRQCHIP
+#ifdef KVM_CAP_IRQ_ROUTING
 	case KVM_CAP_IRQ_ROUTING:
 		return KVM_MAX_IRQ_ROUTES;
 #endif


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 03/15] KVM: Introduce __KVM_HAVE_IRQ_LINE
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
  2012-06-15 19:06 ` [PATCH v8 01/15] ARM: add mem_type prot_pte accessor Christoffer Dall
  2012-06-15 19:07 ` [PATCH v8 02/15] KVM: use KVM_CAP_IRQ_ROUTING to protect the routing related code Christoffer Dall
@ 2012-06-15 19:07 ` Christoffer Dall
  2012-06-18 13:07   ` Avi Kivity
  2012-06-15 19:07 ` [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER Christoffer Dall
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:07 UTC (permalink / raw)
  To: android-virt, kvm

This is a preparatory patch for the KVM/ARM implementation. KVM/ARM will use
the KVM_IRQ_LINE ioctl, which is currently conditional on
__KVM_HAVE_IOAPIC, but ARM obviously doesn't have any IOAPIC support and we
need a separate define.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/ia64/include/asm/kvm.h |    1 +
 arch/x86/include/asm/kvm.h  |    1 +
 include/trace/events/kvm.h  |    4 +++-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/kvm.h b/arch/ia64/include/asm/kvm.h
index b9f82c8..ec6c6b3 100644
--- a/arch/ia64/include/asm/kvm.h
+++ b/arch/ia64/include/asm/kvm.h
@@ -26,6 +26,7 @@
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_IOAPIC
+#define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_DEVICE_ASSIGNMENT
 
 /* Architectural interrupt line count. */
diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index e7d1c19..246617e 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -12,6 +12,7 @@
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
 #define __KVM_HAVE_IOAPIC
+#define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_DEVICE_ASSIGNMENT
 #define __KVM_HAVE_MSI
 #define __KVM_HAVE_USER_NMI
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 46e3cd8..ae9779d 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -36,7 +36,7 @@ TRACE_EVENT(kvm_userspace_exit,
 		  __entry->errno < 0 ? -__entry->errno : __entry->reason)
 );
 
-#if defined(__KVM_HAVE_IOAPIC)
+#if defined(__KVM_HAVE_IRQ_LINE)
 TRACE_EVENT(kvm_set_irq,
 	TP_PROTO(unsigned int gsi, int level, int irq_source_id),
 	TP_ARGS(gsi, level, irq_source_id),
@@ -56,7 +56,9 @@ TRACE_EVENT(kvm_set_irq,
 	TP_printk("gsi %u level %d source %d",
 		  __entry->gsi, __entry->level, __entry->irq_source_id)
 );
+#endif
 
+#if defined(__KVM_HAVE_IOAPIC)
 #define kvm_deliver_mode		\
 	{0x0, "Fixed"},			\
 	{0x1, "LowPrio"},		\


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (2 preceding siblings ...)
  2012-06-15 19:07 ` [PATCH v8 03/15] KVM: Introduce __KVM_HAVE_IRQ_LINE Christoffer Dall
@ 2012-06-15 19:07 ` Christoffer Dall
  2012-06-18 13:08   ` Avi Kivity
  2012-06-28 21:28   ` Marcelo Tosatti
  2012-06-15 19:07 ` [PATCH v8 05/15] ARM: KVM: Initial skeleton to compile KVM support Christoffer Dall
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:07 UTC (permalink / raw)
  To: android-virt, kvm

From: Marc Zyngier <marc.zyngier@arm.com>

In order to avoid compilation failure when KVM is not compiled in,
guard the mmu_notifier specific sections with both CONFIG_MMU_NOTIFIER
and KVM_ARCH_WANT_MMU_NOTIFIER, like it is being done in the rest of
the KVM code.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 include/linux/kvm_host.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c7f7787..e3c86f8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -306,7 +306,7 @@ struct kvm {
 	struct hlist_head irq_ack_notifier_list;
 #endif
 
-#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 	struct mmu_notifier mmu_notifier;
 	unsigned long mmu_notifier_seq;
 	long mmu_notifier_count;
@@ -780,7 +780,7 @@ struct kvm_stats_debugfs_item {
 extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
 
-#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
 {
 	if (unlikely(vcpu->kvm->mmu_notifier_count))


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 05/15] ARM: KVM: Initial skeleton to compile KVM support
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (3 preceding siblings ...)
  2012-06-15 19:07 ` [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER Christoffer Dall
@ 2012-06-15 19:07 ` Christoffer Dall
  2012-06-15 19:07 ` [PATCH v8 06/15] ARM: KVM: Hypervisor identity mapping Christoffer Dall
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:07 UTC (permalink / raw)
  To: android-virt, kvm

From: Christoffer Dall <cdall@cs.columbia.edu>

Targets KVM support for Cortex A-15 processors.

Contains no real functionality but all the framework components,
make files, header files and some tracing functionality.

“Nothing to see here. Move along, move along..."

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/Kconfig                   |    2 
 arch/arm/Makefile                  |    1 
 arch/arm/include/asm/kvm.h         |   78 ++++++++++
 arch/arm/include/asm/kvm_asm.h     |   28 ++++
 arch/arm/include/asm/kvm_emulate.h |   97 +++++++++++++
 arch/arm/include/asm/kvm_host.h    |  116 +++++++++++++++
 arch/arm/include/asm/unified.h     |   12 ++
 arch/arm/kvm/Kconfig               |   44 ++++++
 arch/arm/kvm/Makefile              |   17 ++
 arch/arm/kvm/arm.c                 |  279 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/emulate.c             |  125 ++++++++++++++++
 arch/arm/kvm/exports.c             |   16 ++
 arch/arm/kvm/guest.c               |  148 +++++++++++++++++++
 arch/arm/kvm/init.S                |   17 ++
 arch/arm/kvm/interrupts.S          |   17 ++
 arch/arm/kvm/mmu.c                 |   15 ++
 arch/arm/kvm/trace.h               |   52 +++++++
 arch/arm/mm/Kconfig                |   10 +
 18 files changed, 1074 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b649c59..736244c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2273,3 +2273,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/arm/kvm/Kconfig"
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 0298b00..64f1e16 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -250,6 +250,7 @@ core-$(CONFIG_VFP)		+= arch/arm/vfp/
 # If we have a machine-specific directory, then include it in the build.
 core-y				+= arch/arm/kernel/ arch/arm/mm/ arch/arm/common/
 core-y				+= arch/arm/net/
+core-y 				+= arch/arm/kvm/
 core-y				+= $(machdirs) $(platdirs)
 
 drivers-$(CONFIG_OPROFILE)      += arch/arm/oprofile/
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
new file mode 100644
index 0000000..c8466b7
--- /dev/null
+++ b/arch/arm/include/asm/kvm.h
@@ -0,0 +1,78 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include <asm/types.h>
+
+#define __KVM_HAVE_GUEST_DEBUG
+
+/*
+ * Modes used for short-hand mode determinition in the world-switch code and
+ * in emulation code.
+ *
+ * Note: These indices do NOT correspond to the value of the CPSR mode bits!
+ */
+enum vcpu_mode {
+	MODE_FIQ = 0,
+	MODE_IRQ,
+	MODE_SVC,
+	MODE_ABT,
+	MODE_UND,
+	MODE_USR,
+	MODE_SYS
+};
+
+struct kvm_regs {
+	__u32 regs0_7[8];	/* Unbanked regs. (r0 - r7)	   */
+	__u32 fiq_regs8_12[5];	/* Banked fiq regs. (r8 - r12)	   */
+	__u32 usr_regs8_12[5];	/* Banked usr registers (r8 - r12) */
+	__u32 reg13[6];		/* Banked r13, indexed by MODE_	   */
+	__u32 reg14[6];		/* Banked r13, indexed by MODE_	   */
+	__u32 reg15;
+	__u32 cpsr;
+	__u32 spsr[5];		/* Banked SPSR,  indexed by MODE_  */
+	struct {
+		__u32 c0_midr;
+		__u32 c1_sys;
+		__u32 c2_base0;
+		__u32 c2_base1;
+		__u32 c2_control;
+		__u32 c3_dacr;
+	} cp15;
+
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+};
+
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+struct kvm_sync_regs {
+};
+
+struct kvm_arch_memory_slot {
+};
+
+#endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
new file mode 100644
index 0000000..c3d4458
--- /dev/null
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -0,0 +1,28 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_ASM_H__
+#define __ARM_KVM_ASM_H__
+
+#define ARM_EXCEPTION_RESET	  0
+#define ARM_EXCEPTION_UNDEFINED   1
+#define ARM_EXCEPTION_SOFTWARE    2
+#define ARM_EXCEPTION_PREF_ABORT  3
+#define ARM_EXCEPTION_DATA_ABORT  4
+#define ARM_EXCEPTION_IRQ	  5
+#define ARM_EXCEPTION_FIQ	  6
+
+#endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
new file mode 100644
index 0000000..7ab696d
--- /dev/null
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -0,0 +1,97 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_EMULATE_H__
+#define __ARM_KVM_EMULATE_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+
+u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
+
+static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
+{
+	u8 modes_table[16] = {
+		MODE_USR,	/* 0x0 */
+		MODE_FIQ,	/* 0x1 */
+		MODE_IRQ,	/* 0x2 */
+		MODE_SVC,	/* 0x3 */
+		0xf, 0xf, 0xf,
+		MODE_ABT,	/* 0x7 */
+		0xf, 0xf, 0xf,
+		MODE_UND,	/* 0xb */
+		0xf, 0xf, 0xf,
+		MODE_SYS};	/* 0xf */
+
+	BUG_ON(modes_table[vcpu->arch.regs.cpsr & 0xf] == 0xf);
+	return modes_table[vcpu->arch.regs.cpsr & 0xf];
+}
+
+/*
+ * Return the SPSR for the specified mode of the virtual CPU.
+ */
+static inline u32 *vcpu_spsr_mode(struct kvm_vcpu *vcpu, enum vcpu_mode mode)
+{
+	switch (mode) {
+	case MODE_SVC:
+		return &vcpu->arch.regs.svc_regs[2];
+	case MODE_ABT:
+		return &vcpu->arch.regs.abt_regs[2];
+	case MODE_UND:
+		return &vcpu->arch.regs.und_regs[2];
+	case MODE_IRQ:
+		return &vcpu->arch.regs.irq_regs[2];
+	case MODE_FIQ:
+		return &vcpu->arch.regs.fiq_regs[7];
+	default:
+		BUG();
+	}
+}
+
+/* Get vcpu register for current mode */
+static inline u32 *vcpu_reg(struct kvm_vcpu *vcpu, unsigned long reg_num)
+{
+	return vcpu_reg_mode(vcpu, reg_num, vcpu_mode(vcpu));
+}
+
+static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
+{
+	return vcpu_reg(vcpu, 15);
+}
+
+static inline u32 *vcpu_cpsr(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.regs.cpsr;
+}
+
+/* Get vcpu SPSR for current mode */
+static inline u32 *vcpu_spsr(struct kvm_vcpu *vcpu)
+{
+	return vcpu_spsr_mode(vcpu, vcpu_mode(vcpu));
+}
+
+static inline bool mode_has_spsr(struct kvm_vcpu *vcpu)
+{
+	return (vcpu_mode(vcpu) < MODE_USR);
+}
+
+static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
+{
+	BUG_ON(vcpu_mode(vcpu) > MODE_SYS);
+	return vcpu_mode(vcpu) != MODE_USR;
+}
+
+#endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
new file mode 100644
index 0000000..2d611df
--- /dev/null
+++ b/arch/arm/include/asm/kvm_host.h
@@ -0,0 +1,116 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_HOST_H__
+#define __ARM_KVM_HOST_H__
+
+#define KVM_MAX_VCPUS 1
+#define KVM_MEMORY_SLOTS 32
+#define KVM_PRIVATE_MEM_SLOTS 4
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+/* We don't currently support large pages. */
+#define KVM_HPAGE_GFN_SHIFT(x)	0
+#define KVM_NR_PAGE_SIZES	1
+#define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
+
+struct kvm_vcpu;
+u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
+
+struct kvm_arch {
+	/* The VMID used for the virt. memory system */
+	u64    vmid;
+
+	/* 1-level 2nd stage table and lock */
+	struct mutex pgd_mutex;
+	pgd_t *pgd;
+
+	/* VTTBR value associated with above pgd and vmid */
+	u64    vttbr;
+};
+
+#define EXCEPTION_NONE      0
+#define EXCEPTION_RESET     0x80
+#define EXCEPTION_UNDEFINED 0x40
+#define EXCEPTION_SOFTWARE  0x20
+#define EXCEPTION_PREFETCH  0x10
+#define EXCEPTION_DATA      0x08
+#define EXCEPTION_IMPRECISE 0x04
+#define EXCEPTION_IRQ       0x02
+#define EXCEPTION_FIQ       0x01
+
+struct kvm_vcpu_regs {
+	u32 usr_regs[15];	/* R0_usr - R14_usr */
+	u32 svc_regs[3];	/* SP_svc, LR_svc, SPSR_svc */
+	u32 abt_regs[3];	/* SP_abt, LR_abt, SPSR_abt */
+	u32 und_regs[3];	/* SP_und, LR_und, SPSR_und */
+	u32 irq_regs[3];	/* SP_irq, LR_irq, SPSR_irq */
+	u32 fiq_regs[8];	/* R8_fiq - R14_fiq, SPSR_fiq */
+	u32 pc;			/* The program counter (r15) */
+	u32 cpsr;		/* The guest CPSR */
+} __packed;
+
+enum cp15_regs {
+	c0_MIDR,		/* Main ID Register */
+	c0_MPIDR,		/* MultiProcessor ID Register */
+	c1_SCTLR,		/* System Control Register */
+	c1_ACTLR,		/* Auxilliary Control Register */
+	c1_CPACR,		/* Coprocessor Access Control */
+	c2_TTBR0,		/* Translation Table Base Register 0 */
+	c2_TTBR0_high,		/* TTBR0 top 32 bits */
+	c2_TTBR1,		/* Translation Table Base Register 1 */
+	c2_TTBR1_high,		/* TTBR1 top 32 bits */
+	c2_TTBCR,		/* Translation Table Base Control R. */
+	c3_DACR,		/* Domain Access Control Register */
+	c10_PRRR,		/* Primary Region Remap Register */
+	c10_NMRR,		/* Normal Memory Remap Register */
+	c13_CID,		/* Context ID Register */
+	c13_TID_URW,		/* Thread ID, User R/W */
+	c13_TID_URO,		/* Thread ID, User R/O */
+	c13_TID_PRIV,		/* Thread ID, Priveleged */
+
+	nr_cp15_regs
+};
+
+struct kvm_vcpu_arch {
+	struct kvm_vcpu_regs regs;
+
+	/* System control coprocessor (cp15) */
+	u32 cp15[nr_cp15_regs];
+
+	/* Exception Information */
+	u32 hsr;		/* Hyp Syndrom Register */
+	u32 hdfar;		/* Hyp Data Fault Address Register */
+	u32 hifar;		/* Hyp Inst. Fault Address Register */
+	u32 hpfar;		/* Hyp IPA Fault Address Register */
+
+	/* IO related fields */
+	u32 mmio_rd;
+
+	/* Interrupt related fields */
+	u32 irq_lines;		/* IRQ and FIQ levels */
+	u32 wait_for_interrupts;
+};
+
+struct kvm_vm_stat {
+	u32 remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+	u32 halt_wakeup;
+};
+
+#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/unified.h b/arch/arm/include/asm/unified.h
index f5989f4..e5dc5f6 100644
--- a/arch/arm/include/asm/unified.h
+++ b/arch/arm/include/asm/unified.h
@@ -54,6 +54,18 @@
 
 #endif	/* CONFIG_THUMB2_KERNEL */
 
+#ifdef CONFIG_KVM_ARM_HOST
+#ifdef __ASSEMBLY__
+.arch_extension sec
+.arch_extension virt
+#else
+__asm__(
+"	.arch_extension sec\n"
+"	.arch_extension virt\n"
+);
+#endif
+#endif
+
 #ifndef CONFIG_ARM_ASM_UNIFIED
 
 /*
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
new file mode 100644
index 0000000..83abbe0
--- /dev/null
+++ b/arch/arm/kvm/Kconfig
@@ -0,0 +1,44 @@
+#
+# KVM configuration
+#
+
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+	bool "Virtualization"
+	---help---
+	  Say Y here to get to see options for using your Linux host to run
+	  other operating systems inside virtual machines (guests).
+	  This option alone does not add any kernel code.
+
+	  If you say N, all options in this submenu will be skipped and
+	  disabled.
+
+if VIRTUALIZATION
+
+config KVM
+	tristate "Kernel-based Virtual Machine (KVM) support"
+	select PREEMPT_NOTIFIERS
+	select ANON_INODES
+	select KVM_MMIO
+	depends on CPU_V7 && ARM_VIRT_EXT
+	---help---
+	  Support hosting virtualized guest machines. You will also
+	  need to select one or more of the processor modules below.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  If unsure, say N.
+
+config KVM_ARM_HOST
+	bool "KVM host support for ARM cpus."
+	depends on KVM
+	depends on MMU
+	depends on CPU_V7 && ARM_VIRT_EXT
+	---help---
+	  Provides host support for ARM processors.
+
+source drivers/virtio/Kconfig
+
+endif # VIRTUALIZATION
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
new file mode 100644
index 0000000..e69a8e1
--- /dev/null
+++ b/arch/arm/kvm/Makefile
@@ -0,0 +1,17 @@
+#
+# Makefile for Kernel-based Virtual Machine module
+#
+
+ccflags-y += -Ivirt/kvm -Iarch/arm/kvm
+CFLAGS_arm.o     := -I.
+CFLAGS_mmu.o := -I.
+
+AFLAGS_interrupts.o := -I$(obj)
+
+obj-$(CONFIG_KVM_ARM_HOST) += init.o interrupts.o exports.o
+
+kvm-arm-y += $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+kvm-arm-y += arm.o guest.o mmu.o emulate.o
+
+obj-$(CONFIG_KVM) += kvm-arm.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
new file mode 100644
index 0000000..5992d90
--- /dev/null
+++ b/arch/arm/kvm/arm.c
@@ -0,0 +1,279 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <trace/events/kvm.h>
+
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
+#include <asm/unified.h>
+#include <asm/uaccess.h>
+#include <asm/ptrace.h>
+#include <asm/mman.h>
+
+int kvm_arch_hardware_enable(void *garbage)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+
+void kvm_arch_hardware_disable(void *garbage)
+{
+}
+
+int kvm_arch_hardware_setup(void)
+{
+	return 0;
+}
+
+void kvm_arch_hardware_unsetup(void)
+{
+}
+
+void kvm_arch_check_processor_compat(void *rtn)
+{
+	*(int *)rtn = 0;
+}
+
+void kvm_arch_sync_events(struct kvm *kvm)
+{
+}
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+	if (type)
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
+void kvm_arch_free_memslot(struct kvm_memory_slot *free,
+			   struct kvm_memory_slot *dont)
+{
+}
+
+int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
+{
+	return 0;
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm_arch_vcpu_free(kvm->vcpus[i]);
+			kvm->vcpus[i] = NULL;
+		}
+	}
+}
+
+int kvm_dev_ioctl_check_extension(long ext)
+{
+	int r;
+	switch (ext) {
+	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+		r = 1;
+		break;
+	case KVM_CAP_COALESCED_MMIO:
+		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+	return r;
+}
+
+long kvm_arch_dev_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_set_memory_region(struct kvm *kvm,
+			       struct kvm_userspace_memory_region *mem,
+			       struct kvm_memory_slot old,
+			       int user_alloc)
+{
+	return 0;
+}
+
+int kvm_arch_prepare_memory_region(struct kvm *kvm,
+				   struct kvm_memory_slot *memslot,
+				   struct kvm_memory_slot old,
+				   struct kvm_userspace_memory_region *mem,
+				   int user_alloc)
+{
+	return 0;
+}
+
+void kvm_arch_commit_memory_region(struct kvm *kvm,
+				   struct kvm_userspace_memory_region *mem,
+				   struct kvm_memory_slot old,
+				   int user_alloc)
+{
+}
+
+void kvm_arch_flush_shadow(struct kvm *kvm)
+{
+}
+
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	int err;
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	if (!vcpu) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	return vcpu;
+free_vcpu:
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+	return ERR_PTR(err);
+}
+
+void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvm_arch_vcpu_free(vcpu);
+}
+
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
+void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	return -EINVAL;
+}
+
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vcpu_ioctl(struct file *filp,
+			 unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+		       unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_init(void *opaque)
+{
+	return 0;
+}
+
+void kvm_arch_exit(void)
+{
+}
+
+static int arm_init(void)
+{
+	int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+	return rc;
+}
+
+static void __exit arm_exit(void)
+{
+	kvm_exit();
+}
+
+module_init(arm_init);
+module_exit(arm_exit)
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
new file mode 100644
index 0000000..0e5bd90
--- /dev/null
+++ b/arch/arm/kvm/emulate.c
@@ -0,0 +1,125 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#include <asm/kvm_emulate.h>
+
+#define REG_OFFSET(_reg) \
+	(offsetof(struct kvm_vcpu_regs, _reg) / sizeof(u32))
+
+#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs[_num])
+
+static const unsigned long vcpu_reg_offsets[MODE_SYS + 1][16] = {
+	/* FIQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7),
+		REG_OFFSET(fiq_regs[1]), /* r8 */
+		REG_OFFSET(fiq_regs[1]), /* r9 */
+		REG_OFFSET(fiq_regs[2]), /* r10 */
+		REG_OFFSET(fiq_regs[3]), /* r11 */
+		REG_OFFSET(fiq_regs[4]), /* r12 */
+		REG_OFFSET(fiq_regs[5]), /* r13 */
+		REG_OFFSET(fiq_regs[6]), /* r14 */
+		REG_OFFSET(pc)		 /* r15 */
+	},
+
+	/* IRQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(irq_regs[0]), /* r13 */
+		REG_OFFSET(irq_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* SVC Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(svc_regs[0]), /* r13 */
+		REG_OFFSET(svc_regs[1]), /* r14 */
+		REG_OFFSET(pc)		 /* r15 */
+	},
+
+	/* ABT Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(abt_regs[0]), /* r13 */
+		REG_OFFSET(abt_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* UND Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(und_regs[0]), /* r13 */
+		REG_OFFSET(und_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* USR Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(usr_regs[13]), /* r13 */
+		REG_OFFSET(usr_regs[14]), /* r14 */
+		REG_OFFSET(pc)	          /* r15 */
+	},
+
+	/* SYS Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(usr_regs[13]), /* r13 */
+		REG_OFFSET(usr_regs[14]), /* r14 */
+		REG_OFFSET(pc)	          /* r15 */
+	},
+};
+
+/*
+ * Return a pointer to the register number valid in the specified mode of
+ * the virtual CPU.
+ */
+u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
+{
+	u32 *reg_array = (u32 *)&vcpu->arch.regs;
+
+	BUG_ON(reg_num > 15);
+	BUG_ON(mode > MODE_SYS);
+
+	return reg_array + vcpu_reg_offsets[mode][reg_num];
+}
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
new file mode 100644
index 0000000..d8a7fd5
--- /dev/null
+++ b/arch/arm/kvm/exports.c
@@ -0,0 +1,16 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/module.h>
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
new file mode 100644
index 0000000..9c75ec4
--- /dev/null
+++ b/arch/arm/kvm/guest.c
@@ -0,0 +1,148 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <asm/uaccess.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ NULL }
+};
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	struct kvm_vcpu_regs *vcpu_regs = &vcpu->arch.regs;
+
+	/*
+	 * GPRs and PSRs
+	 */
+	memcpy(regs->regs0_7, &(vcpu_regs->usr_regs[0]), sizeof(u32) * 8);
+	memcpy(regs->usr_regs8_12, &(vcpu_regs->usr_regs[8]), sizeof(u32) * 5);
+	memcpy(regs->fiq_regs8_12, &(vcpu_regs->fiq_regs[0]), sizeof(u32) * 5);
+	regs->reg13[MODE_FIQ] = vcpu_regs->fiq_regs[5];
+	regs->reg14[MODE_FIQ] = vcpu_regs->fiq_regs[6];
+	regs->reg13[MODE_IRQ] = vcpu_regs->irq_regs[0];
+	regs->reg14[MODE_IRQ] = vcpu_regs->irq_regs[1];
+	regs->reg13[MODE_SVC] = vcpu_regs->svc_regs[0];
+	regs->reg14[MODE_SVC] = vcpu_regs->svc_regs[1];
+	regs->reg13[MODE_ABT] = vcpu_regs->abt_regs[0];
+	regs->reg14[MODE_ABT] = vcpu_regs->abt_regs[1];
+	regs->reg13[MODE_UND] = vcpu_regs->und_regs[0];
+	regs->reg14[MODE_UND] = vcpu_regs->und_regs[1];
+	regs->reg13[MODE_USR] = vcpu_regs->usr_regs[0];
+	regs->reg14[MODE_USR] = vcpu_regs->usr_regs[1];
+
+	regs->spsr[MODE_FIQ]  = vcpu_regs->fiq_regs[7];
+	regs->spsr[MODE_IRQ]  = vcpu_regs->irq_regs[2];
+	regs->spsr[MODE_SVC]  = vcpu_regs->svc_regs[2];
+	regs->spsr[MODE_ABT]  = vcpu_regs->abt_regs[2];
+	regs->spsr[MODE_UND]  = vcpu_regs->und_regs[2];
+
+	regs->reg15 = vcpu_regs->pc;
+	regs->cpsr = vcpu_regs->cpsr;
+
+
+	/*
+	 * Co-processor registers.
+	 */
+	regs->cp15.c0_midr = vcpu->arch.cp15[c0_MIDR];
+	regs->cp15.c1_sys = vcpu->arch.cp15[c1_SCTLR];
+	regs->cp15.c2_base0 = vcpu->arch.cp15[c2_TTBR0];
+	regs->cp15.c2_base1 = vcpu->arch.cp15[c2_TTBR1];
+	regs->cp15.c2_control = vcpu->arch.cp15[c2_TTBCR];
+	regs->cp15.c3_dacr = vcpu->arch.cp15[c3_DACR];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	struct kvm_vcpu_regs *vcpu_regs = &vcpu->arch.regs;
+
+	memcpy(&(vcpu_regs->usr_regs[0]), regs->regs0_7, sizeof(u32) * 8);
+	memcpy(&(vcpu_regs->usr_regs[8]), regs->usr_regs8_12, sizeof(u32) * 5);
+	memcpy(&(vcpu_regs->fiq_regs[0]), regs->fiq_regs8_12, sizeof(u32) * 5);
+
+	vcpu_regs->fiq_regs[5] = regs->reg13[MODE_FIQ];
+	vcpu_regs->fiq_regs[6] = regs->reg14[MODE_FIQ];
+	vcpu_regs->irq_regs[0] = regs->reg13[MODE_IRQ];
+	vcpu_regs->irq_regs[1] = regs->reg14[MODE_IRQ];
+	vcpu_regs->svc_regs[0] = regs->reg13[MODE_SVC];
+	vcpu_regs->svc_regs[1] = regs->reg14[MODE_SVC];
+	vcpu_regs->abt_regs[0] = regs->reg13[MODE_ABT];
+	vcpu_regs->abt_regs[1] = regs->reg14[MODE_ABT];
+	vcpu_regs->und_regs[0] = regs->reg13[MODE_UND];
+	vcpu_regs->und_regs[1] = regs->reg14[MODE_UND];
+	vcpu_regs->usr_regs[0] = regs->reg13[MODE_USR];
+	vcpu_regs->usr_regs[1] = regs->reg14[MODE_USR];
+
+	vcpu_regs->fiq_regs[7] = regs->spsr[MODE_FIQ];
+	vcpu_regs->irq_regs[2] = regs->spsr[MODE_IRQ];
+	vcpu_regs->svc_regs[2] = regs->spsr[MODE_SVC];
+	vcpu_regs->abt_regs[2] = regs->spsr[MODE_ABT];
+	vcpu_regs->und_regs[2] = regs->spsr[MODE_UND];
+
+	/*
+	 * Co-processor registers.
+	 */
+	vcpu->arch.cp15[c0_MIDR] = regs->cp15.c0_midr;
+	vcpu->arch.cp15[c1_SCTLR] = regs->cp15.c1_sys;
+
+	vcpu_regs->pc = regs->reg15;
+	vcpu_regs->cpsr = regs->cpsr;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+				  struct kvm_translation *tr)
+{
+	return -EINVAL;
+}
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
new file mode 100644
index 0000000..073a494
--- /dev/null
+++ b/arch/arm/kvm/init.S
@@ -0,0 +1,17 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
new file mode 100644
index 0000000..073a494
--- /dev/null
+++ b/arch/arm/kvm/interrupts.S
@@ -0,0 +1,17 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
new file mode 100644
index 0000000..2cccd48
--- /dev/null
+++ b/arch/arm/kvm/mmu.c
@@ -0,0 +1,15 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
new file mode 100644
index 0000000..f8869c1
--- /dev/null
+++ b/arch/arm/kvm/trace.h
@@ -0,0 +1,52 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+/*
+ * Tracepoints for entry/exit to guest
+ */
+TRACE_EVENT(kvm_entry,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+TRACE_EVENT(kvm_exit,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+
+
+#endif /* _TRACE_KVM_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH arch/arm/kvm
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 101b968..037dc53 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -597,6 +597,16 @@ config ARM_LPAE
 
 	  If unsure, say N.
 
+config ARM_VIRT_EXT
+	bool "Support for ARM Virtualization Extensions"
+	depends on ARM_LPAE
+	help
+	  Say Y if you have an ARMv7 processor supporting the ARM hardware
+	  Virtualization extensions. KVM depends on this feature and will
+	  not run without it being selected. If you say Y here, the kernel
+	  will not boot on a machine without virtualization extensions and
+	  will not boot as a KVM guest.
+
 config ARCH_PHYS_ADDR_T_64BIT
 	def_bool ARM_LPAE
 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 06/15] ARM: KVM: Hypervisor identity mapping
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (4 preceding siblings ...)
  2012-06-15 19:07 ` [PATCH v8 05/15] ARM: KVM: Initial skeleton to compile KVM support Christoffer Dall
@ 2012-06-15 19:07 ` Christoffer Dall
  2012-06-18 13:12   ` Avi Kivity
  2012-06-15 19:07 ` [PATCH v8 07/15] ARM: KVM: Hypervisor inititalization Christoffer Dall
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:07 UTC (permalink / raw)
  To: android-virt, kvm

Adds support in the identity mapping feature that allows KVM to setup
identity mapping for the Hyp mode with the AP[1] bit set as required by
the specification and also supports freeing created sub pmd's after
finished use.

These two functions:
 - hyp_idmap_add(pgd, addr, end);
 - hyp_idmap_del(pgd, addr, end);
are essentially calls to the same function as the non-hyp versions but
with a different argument value. KVM calls these functions to setup
and teardown the identity mapping used to initialize the hypervisor.

Note, the hyp-version of the _del function actually frees the pmd's
pointed to by the pgd as opposed to the non-hyp version which just
clears them.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/pgtable-3level-hwdef.h |    1 +
 arch/arm/include/asm/pgtable.h              |    5 +++
 arch/arm/kvm/guest.c                        |    1 -
 arch/arm/mm/idmap.c                         |   47 +++++++++++++++++++++++++--
 4 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..a2d404e 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -44,6 +44,7 @@
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
+#define PMD_SECT_AP1		(_AT(pmdval_t, 1) << 6)
 #define PMD_SECT_TEX(x)		(_AT(pmdval_t, 0))
 
 /*
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index f66626d..c7bd809 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -310,6 +310,11 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 
 #define pgtable_cache_init() do { } while (0)
 
+#ifdef CONFIG_KVM_ARM_HOST
+void hyp_idmap_add(pgd_t *, unsigned long, unsigned long);
+void hyp_idmap_del(pgd_t *pgd, unsigned long addr, unsigned long end);
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* CONFIG_MMU */
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 9c75ec4..c0adab0 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -24,7 +24,6 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 
-
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index ab88ed4..87d00ae 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,3 +1,4 @@
+#include <linux/module.h>
 #include <linux/kernel.h>
 
 #include <asm/cputype.h>
@@ -59,11 +60,13 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 	} while (pud++, addr = next, addr != end);
 }
 
-static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+static void identity_mapping_add(pgd_t *pgd, unsigned long addr,
+				 unsigned long end, unsigned long prot)
 {
-	unsigned long prot, next;
+	unsigned long next;
+
+	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 
-	prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 	if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
 		prot |= PMD_BIT4;
 
@@ -90,12 +93,48 @@ static int __init init_static_idmap(void)
 
 	pr_info("Setting up static identity map for 0x%llx - 0x%llx\n",
 		(long long)idmap_start, (long long)idmap_end);
-	identity_mapping_add(idmap_pgd, idmap_start, idmap_end);
+	identity_mapping_add(idmap_pgd, idmap_start, idmap_end, 0);
 
 	return 0;
 }
 early_initcall(init_static_idmap);
 
+#ifdef CONFIG_KVM_ARM_HOST
+void hyp_idmap_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+	identity_mapping_add(pgd, addr, end, PMD_SECT_AP1);
+}
+EXPORT_SYMBOL_GPL(hyp_idmap_add);
+
+static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pud = pud_offset(pgd, addr);
+	pmd = pmd_offset(pud, addr);
+	pmd_free(NULL, pmd);
+	pud_clear(pud);
+}
+
+/*
+ * This version actually frees the underlying pmds for all pgds in range and
+ * clear the pgds themselves afterwards.
+ */
+void hyp_idmap_del(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+
+	pgd += pgd_index(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (!pgd_none_or_clear_bad(pgd))
+			hyp_idmap_del_pmd(pgd, addr);
+	} while (pgd++, addr = next, addr < end);
+}
+EXPORT_SYMBOL_GPL(hyp_idmap_del);
+#endif
+
 /*
  * In order to soft-boot, we need to switch to a 1:1 mapping for the
  * cpu_reset functions. This will then ensure that we have predictable


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 07/15] ARM: KVM: Hypervisor inititalization
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (5 preceding siblings ...)
  2012-06-15 19:07 ` [PATCH v8 06/15] ARM: KVM: Hypervisor identity mapping Christoffer Dall
@ 2012-06-15 19:07 ` Christoffer Dall
  2012-06-28 22:35   ` Marcelo Tosatti
  2012-06-15 19:08 ` [PATCH v8 08/15] ARM: KVM: Module unloading support Christoffer Dall
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:07 UTC (permalink / raw)
  To: android-virt, kvm

Sets up the required registers to run code in HYP-mode from the kernel.

By setting the HVBAR the kernel can execute code in Hyp-mode with
the MMU disabled. The HVBAR initially points to initialization code,
which initializes other Hyp-mode registers and enables the MMU
for Hyp-mode. Afterwards, the HVBAR is changed to point to KVM
Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.

Also provides memory mapping code to map required code pages and data
structures accessed in Hyp mode at the same virtual address as the
host kernel virtual addresses, but which conforms to the architectural
requirements for translations in Hyp mode. This interface is added in
arch/arm/kvm/arm_mmu.c and is comprised of:
 - create_hyp_mappings(hyp_pgd, start, end);
 - free_hyp_pmds(pgd_hyp);

Note: The initialization mechanism currently relies on an SMC #0 call
to the secure monitor, which was merely a fast way of getting to the
hypervisor. Dave Marting and Rusty Russel have patches out to make the
boot-wrapper and the kernel boot in Hyp-mode and setup a generic way for
hypervisors to get access to Hyp-mode if the boot-loader allows such
access.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h              |  117 +++++++++++++++++++
 arch/arm/include/asm/kvm_asm.h              |   22 +++
 arch/arm/include/asm/kvm_mmu.h              |   37 ++++++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    4 +
 arch/arm/include/asm/pgtable-3level.h       |    4 +
 arch/arm/include/asm/pgtable.h              |    1 
 arch/arm/kvm/arm.c                          |  167 +++++++++++++++++++++++++++
 arch/arm/kvm/exports.c                      |   15 ++
 arch/arm/kvm/init.S                         |   99 ++++++++++++++++
 arch/arm/kvm/interrupts.S                   |   47 +++++++
 arch/arm/kvm/mmu.c                          |  170 +++++++++++++++++++++++++++
 mm/memory.c                                 |    2 
 12 files changed, 685 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
new file mode 100644
index 0000000..7f30cbd
--- /dev/null
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -0,0 +1,117 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __KVM_ARM_H__
+#define __KVM_ARM_H__
+
+#include <asm/types.h>
+
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE		(1 << 27)
+#define HCR_TVM		(1 << 26)
+#define HCR_TTLB	(1 << 25)
+#define HCR_TPU		(1 << 24)
+#define HCR_TPC		(1 << 23)
+#define HCR_TSW		(1 << 22)
+#define HCR_TAC		(1 << 21)
+#define HCR_TIDCP	(1 << 20)
+#define HCR_TSC		(1 << 19)
+#define HCR_TID3	(1 << 18)
+#define HCR_TID2	(1 << 17)
+#define HCR_TID1	(1 << 16)
+#define HCR_TID0	(1 << 15)
+#define HCR_TWE		(1 << 14)
+#define HCR_TWI		(1 << 13)
+#define HCR_DC		(1 << 12)
+#define HCR_BSU		(3 << 10)
+#define HCR_BSU_IS	(1 << 10)
+#define HCR_FB		(1 << 9)
+#define HCR_VA		(1 << 8)
+#define HCR_VI		(1 << 7)
+#define HCR_VF		(1 << 6)
+#define HCR_AMO		(1 << 5)
+#define HCR_IMO		(1 << 4)
+#define HCR_FMO		(1 << 3)
+#define HCR_PTW		(1 << 2)
+#define HCR_SWIO	(1 << 1)
+#define HCR_VM		1
+
+/*
+ * The bits we set in HCR:
+ * TAC:		Trap ACTLR
+ * TSC:		Trap SMC
+ * TWI:		Trap WFI
+ * BSU_IS:	Upgrade barriers to the inner shareable domain
+ * FB:		Force broadcast of all maintainance operations
+ * AMO:		Override CPSR.A and enable signaling with VA
+ * IMO:		Override CPSR.I and enable signaling with VI
+ * FMO:		Override CPSR.F and enable signaling with VF
+ * SWIO:	Turn set/way invalidates into set/way clean+invalidate
+ */
+#define HCR_GUEST_MASK (HCR_TSC | HCR_TWI | HCR_VM | HCR_BSU_IS | HCR_FB | \
+			HCR_AMO | HCR_IMO | HCR_FMO | HCR_FMO | HCR_SWIO)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE	(1 << 30)
+#define HSCTLR_EE	(1 << 25)
+#define HSCTLR_FI	(1 << 21)
+#define HSCTLR_WXN	(1 << 19)
+#define HSCTLR_I	(1 << 12)
+#define HSCTLR_C	(1 << 2)
+#define HSCTLR_A	(1 << 1)
+#define HSCTLR_M	1
+#define HSCTLR_MASK	(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+			 HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE	(1 << 31)
+#define TTBCR_IMP	(1 << 30)
+#define TTBCR_SH1	(3 << 28)
+#define TTBCR_ORGN1	(3 << 26)
+#define TTBCR_IRGN1	(3 << 24)
+#define TTBCR_EPD1	(1 << 23)
+#define TTBCR_A1	(1 << 22)
+#define TTBCR_T1SZ	(3 << 16)
+#define TTBCR_SH0	(3 << 12)
+#define TTBCR_ORGN0	(3 << 10)
+#define TTBCR_IRGN0	(3 << 8)
+#define TTBCR_EPD0	(1 << 7)
+#define TTBCR_T0SZ	3
+#define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
+
+
+/* Virtualization Translation Control Register (VTCR) bits */
+#define VTCR_SH0	(3 << 12)
+#define VTCR_ORGN0	(3 << 10)
+#define VTCR_IRGN0	(3 << 8)
+#define VTCR_SL0	(3 << 6)
+#define VTCR_S		(1 << 4)
+#define VTCR_T0SZ	3
+#define VTCR_MASK	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0 | VTCR_SL0 | \
+			 VTCR_S | VTCR_T0SZ | VTCR_MASK)
+#define VTCR_HTCR_SH	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0)
+#define VTCR_SL_L2	0		/* Starting-level: 2 */
+#define VTCR_SL_L1	(1 << 6)	/* Starting-level: 1 */
+#define VTCR_GUEST_SL	VTCR_SL_L1
+#define VTCR_GUEST_T0SZ	0
+#if VTCR_GUEST_SL == 0
+#define VTTBR_X		(14 - VTCR_GUEST_T0SZ)
+#else
+#define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
+#endif
+
+
+#endif /* __KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index c3d4458..69afdf3 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -24,5 +24,27 @@
 #define ARM_EXCEPTION_DATA_ABORT  4
 #define ARM_EXCEPTION_IRQ	  5
 #define ARM_EXCEPTION_FIQ	  6
+#define ARM_EXCEPTION_HVC	  7
+
+/*
+ * SMC Hypervisor API call number
+ */
+#define SMCHYP_HVBAR_W 0xfffffff0
+
+#ifndef __ASSEMBLY__
+struct kvm_vcpu;
+
+extern char __kvm_hyp_init[];
+extern char __kvm_hyp_init_end[];
+
+extern char __kvm_hyp_vector[];
+
+extern char __kvm_hyp_code_start[];
+extern char __kvm_hyp_code_end[];
+
+extern void __kvm_flush_vm_context(void);
+
+extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+#endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
new file mode 100644
index 0000000..1aa1af4
--- /dev/null
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -0,0 +1,37 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_MMU_H__
+#define __ARM_KVM_MMU_H__
+
+/*
+ * The architecture supports 40-bit IPA as input to the 2nd stage translations
+ * and PTRS_PER_PGD2 could therefore be 1024.
+ *
+ * To save a bit of memory and to avoid alignment issues we assume 39-bit IPA
+ * for now, but remember that the level-1 table must be aligned to its size.
+ */
+#define PTRS_PER_PGD2	512
+#define PGD2_ORDER	get_order(PTRS_PER_PGD2 * sizeof(pgd_t))
+
+int create_hyp_mappings(void *from, void *to);
+void free_hyp_pmds(void);
+
+int kvm_hyp_pgd_alloc(void);
+pgd_t *kvm_hyp_pgd_get(void);
+void kvm_hyp_pgd_free(void);
+
+#endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index a2d404e..18f5cef 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -32,6 +32,9 @@
 #define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)
 #define PMD_BIT4		(_AT(pmdval_t, 0))
 #define PMD_DOMAIN(x)		(_AT(pmdval_t, 0))
+#define PMD_APTABLE_SHIFT	(61)
+#define PMD_APTABLE		(_AT(pgdval_t, 3) << PGD_APTABLE_SHIFT)
+#define PMD_PXNTABLE		(_AT(pgdval_t, 1) << 59)
 
 /*
  *   - section
@@ -41,6 +44,7 @@
 #define PMD_SECT_S		(_AT(pmdval_t, 3) << 8)
 #define PMD_SECT_AF		(_AT(pmdval_t, 1) << 10)
 #define PMD_SECT_nG		(_AT(pmdval_t, 1) << 11)
+#define PMD_SECT_PXN		(_AT(pmdval_t, 1) << 53)
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index b249035..1169a8a 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -107,6 +107,10 @@
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & 2))
 #define pud_present(pud)	(pud_val(pud))
+#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_TABLE)
+#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_SECT)
 
 #define pud_clear(pudp)			\
 	do {				\
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index c7bd809..4b72287 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -82,6 +82,7 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
+#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 5992d90..4c61d3c 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -31,6 +31,12 @@
 #include <asm/uaccess.h>
 #include <asm/ptrace.h>
 #include <asm/mman.h>
+#include <asm/tlbflush.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
+
+static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 
 int kvm_arch_hardware_enable(void *garbage)
 {
@@ -255,13 +261,174 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	return -EINVAL;
 }
 
+static void cpu_set_vector(void *vector)
+{
+	unsigned long vector_ptr;
+	unsigned long smc_hyp_nr;
+
+	vector_ptr = (unsigned long)vector;
+	smc_hyp_nr = SMCHYP_HVBAR_W;
+
+	/*
+	 * Set the HVBAR
+	 */
+	asm volatile (
+		"mov	r0, %[vector_ptr]\n\t"
+		"mov	r7, %[smc_hyp_nr]\n\t"
+		"smc	#0\n\t" : :
+		[vector_ptr] "r" (vector_ptr),
+		[smc_hyp_nr] "r" (smc_hyp_nr) :
+		"r0", "r7");
+}
+
+static void cpu_init_hyp_mode(void *vector)
+{
+	unsigned long pgd_ptr;
+	unsigned long hyp_stack_ptr;
+	unsigned long stack_page;
+
+	cpu_set_vector(vector);
+
+	pgd_ptr = virt_to_phys(kvm_hyp_pgd_get());
+	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
+	hyp_stack_ptr = stack_page + PAGE_SIZE;
+
+	/*
+	 * Call initialization code
+	 */
+	asm volatile (
+		"mov	r0, %[pgd_ptr]\n\t"
+		"mov	r1, %[hyp_stack_ptr]\n\t"
+		"hvc	#0\n\t" : :
+		[pgd_ptr] "r" (pgd_ptr),
+		[hyp_stack_ptr] "r" (hyp_stack_ptr) :
+		"r0", "r1");
+}
+
+/**
+ * Inits Hyp-mode on all online CPUs
+ */
+static int init_hyp_mode(void)
+{
+	phys_addr_t init_phys_addr, init_end_phys_addr;
+	int cpu;
+	int err = 0;
+
+	/*
+	 * Allocate stack pages for Hypervisor-mode
+	 */
+	for_each_possible_cpu(cpu) {
+		unsigned long stack_page;
+
+		stack_page = __get_free_page(GFP_KERNEL);
+		if (!stack_page) {
+			err = -ENOMEM;
+			goto out_free_stack_pages;
+		}
+
+		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
+	}
+
+	/*
+	 * Allocate Hyp level-1 page table
+	 */
+	err = kvm_hyp_pgd_alloc();
+	if (err)
+		goto out_free_stack_pages;
+
+	init_phys_addr = virt_to_phys(__kvm_hyp_init);
+	init_end_phys_addr = virt_to_phys(__kvm_hyp_init_end);
+	BUG_ON(init_phys_addr & 0x1f);
+
+	/*
+	 * Create identity mapping for the init code.
+	 */
+	hyp_idmap_add(kvm_hyp_pgd_get(),
+		      (unsigned long)init_phys_addr,
+		      (unsigned long)init_end_phys_addr);
+
+	/*
+	 * Execute the init code on each CPU.
+	 *
+	 * Note: The stack is not mapped yet, so don't do anything else than
+	 * initializing the hypervisor mode on each CPU using a local stack
+	 * space for temporary storage.
+	 */
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, cpu_init_hyp_mode,
+					 (void *)(long)init_phys_addr, 1);
+	}
+
+	/*
+	 * Unmap the identity mapping
+	 */
+	hyp_idmap_del(kvm_hyp_pgd_get(),
+		      (unsigned long)init_phys_addr,
+		      (unsigned long)init_end_phys_addr);
+
+	/*
+	 * Map the Hyp-code called directly from the host
+	 */
+	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
+	if (err) {
+		kvm_err("Cannot map world-switch code\n");
+		goto out_free_mappings;
+	}
+
+	/*
+	 * Map the Hyp stack pages
+	 */
+	for_each_possible_cpu(cpu) {
+		char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
+		err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE);
+
+		if (err) {
+			kvm_err("Cannot map hyp stack\n");
+			goto out_free_mappings;
+		}
+	}
+
+	/*
+	 * Set the HVBAR to the virtual kernel address
+	 */
+	for_each_online_cpu(cpu)
+		smp_call_function_single(cpu, cpu_set_vector,
+					 __kvm_hyp_vector, 1);
+
+	return 0;
+out_free_mappings:
+	free_hyp_pmds();
+	kvm_hyp_pgd_free();
+out_free_stack_pages:
+	for_each_possible_cpu(cpu)
+		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+	return err;
+}
+
+/**
+ * Initialize Hyp-mode and memory mappings on all CPUs.
+ */
 int kvm_arch_init(void *opaque)
 {
+	int err;
+
+	err = init_hyp_mode();
+	if (err)
+		goto out_err;
+
 	return 0;
+out_err:
+	return err;
 }
 
 void kvm_arch_exit(void)
 {
+	int cpu;
+
+	free_hyp_pmds();
+	for_each_possible_cpu(cpu)
+		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+	kvm_hyp_pgd_free();
 }
 
 static int arm_init(void)
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index d8a7fd5..2631609 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -14,3 +14,18 @@
  */
 
 #include <linux/module.h>
+#include <asm/kvm_asm.h>
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_init);
+EXPORT_SYMBOL_GPL(__kvm_hyp_init_end);
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_vector);
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_code_start);
+EXPORT_SYMBOL_GPL(__kvm_hyp_code_end);
+
+EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
+
+EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
+
+EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 073a494..7800023 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -13,5 +13,104 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  *
  */
+
+#include <linux/linkage.h>
+#include <asm/unified.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor initialization
+@    - should be called with:
+@        r0 = Hypervisor pgd pointer
+@        r1 = top of Hyp stack (kernel VA)
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	.text
+	.arm
+	.align 12
+__kvm_hyp_init:
+	.globl __kvm_hyp_init
+
+	@ Hyp-mode exception vector
+	nop
+	nop
+	nop
+	nop
+	nop
+	b	__do_hyp_init
+	nop
+	nop
+
+__do_hyp_init:
+	@ Set the sp to end of this page and push data for later use
+	mov	sp, pc
+	bic	sp, sp, #0x0ff
+	bic	sp, sp, #0xf00
+	add	sp, sp, #0x1000
+	push	{r0, r1, r2, r12}
+
+	@ Set the HTTBR to point to the hypervisor PGD pointer passed to
+	@ function and set the upper bits equal to the kernel PGD.
+	mrrc	p15, 1, r1, r2, c2
+	mcrr	p15, 4, r0, r2, c2
+
+	@ Set the HTCR and VTCR to the same shareability and cacheability
+	@ settings as the non-secure TTBCR and with T0SZ == 0.
+	mrc	p15, 4, r0, c2, c0, 2	@ HTCR
+	ldr	r12, =HTCR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c2, c0, 2	@ TTBCR
+	and	r1, r1, #(HTCR_MASK & ~TTBCR_T0SZ)
+	orr	r0, r0, r1
+	mcr	p15, 4, r0, c2, c0, 2	@ HTCR
+
+	mrc	p15, 4, r1, c2, c1, 2	@ VTCR
+	bic	r1, r1, #(VTCR_HTCR_SH | VTCR_SL0)
+	bic	r0, r0, #(~VTCR_HTCR_SH)
+	orr	r1, r0, r1
+	orr	r1, r1, #(VTCR_SL_L1 | VTCR_GUEST_T0SZ)
+	mcr	p15, 4, r1, c2, c1, 2	@ VTCR
+
+	@ Use the same memory attributes for hyp. accesses as the kernel
+	@ (copy MAIRx ro HMAIRx).
+	mrc	p15, 0, r0, c10, c2, 0
+	mcr	p15, 4, r0, c10, c2, 0
+	mrc	p15, 0, r0, c10, c2, 1
+	mcr	p15, 4, r0, c10, c2, 1
+
+	@ Set the HSCTLR to:
+	@  - ARM/THUMB exceptions: Kernel config (Thumb-2 kernel)
+	@  - Endianness: Kernel config
+	@  - Fast Interrupt Features: Kernel config
+	@  - Write permission implies XN: disabled
+	@  - Instruction cache: enabled
+	@  - Data/Unified cache: enabled
+	@  - Memory alignment checks: enabled
+	@  - MMU: enabled (this code must be run from an identity mapping)
+	mrc	p15, 4, r0, c1, c0, 0	@ HSCR
+	ldr	r12, =HSCTLR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c1, c0, 0	@ SCTLR
+	ldr	r12, =(HSCTLR_EE | HSCTLR_FI)
+	and	r1, r1, r12
+ ARM(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I)			)
+ THUMB(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I | HSCTLR_TE)	)
+	orr	r1, r1, r12
+	orr	r0, r0, r1
+	isb
+	mcr	p15, 4, r0, c1, c0, 0	@ HSCR
+	isb
+
+	@ Set stack pointer and return to the kernel
+	pop	{r0, r1, r2, r12}
+	mov	sp, r1
+	eret
+
+	.ltorg
+
+	.align 12
+
+	__kvm_init_sp:
+	.globl __kvm_hyp_init_end
+__kvm_hyp_init_end:
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 073a494..d99a9c7 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -13,5 +13,52 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  *
  */
+
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/unified.h>
+#include <asm/page.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+	.text
+	.align	PAGE_SHIFT
+
+__kvm_hyp_code_start:
+	.globl __kvm_hyp_code_start
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Flush TLBs and instruction caches of current CPU for all VMIDs
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_flush_vm_context)
+	bx	lr
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor world-switch code
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_vcpu_run)
+	bx	lr
+
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor exception vector and handlers
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+	.align 5
+__kvm_hyp_vector:
+	.globl __kvm_hyp_vector
+	nop
+
+/*
+ * The below lines makes sure the HYP mode code fits in a single page (the
+ * assembler will bark at you if it doesn't). Please keep them together. If
+ * you plan to restructure the code or increase its size over a page, you'll
+ * have to fix the code in init_hyp_mode().
+ */
+__kvm_hyp_code_end:
+	.globl	__kvm_hyp_code_end
+
+	.org	__kvm_hyp_code_start + PAGE_SIZE
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2cccd48..a320b56a 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -13,3 +13,173 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  *
  */
+
+#include <linux/mman.h>
+#include <linux/kvm_host.h>
+#include <asm/pgalloc.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+
+static pgd_t *kvm_hyp_pgd;
+static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
+
+static void free_ptes(pmd_t *pmd, unsigned long addr)
+{
+	pte_t *pte;
+	unsigned int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			pte_free_kernel(NULL, pte);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * free_hyp_pmds - free a Hyp-mode level-2 tables and child level-3 tables
+ *
+ * Assumes this is a page table used strictly in Hyp-mode and therefore contains
+ * only mappings in the kernel memory area, which is above PAGE_OFFSET.
+ */
+void free_hyp_pmds(void)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = PAGE_OFFSET; addr != 0; addr += PGDIR_SIZE) {
+		pgd = kvm_hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none(*pud))
+			continue;
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+	}
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+}
+
+static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
+						unsigned long end)
+{
+	pte_t *pte;
+	struct page *page;
+	unsigned long addr;
+
+	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
+		pte = pte_offset_kernel(pmd, addr);
+		BUG_ON(!virt_addr_valid(addr));
+		page = virt_to_page(addr);
+
+		set_pte_ext(pte, mk_pte(page, PAGE_HYP), 0);
+	}
+}
+
+static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
+					       unsigned long end)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long addr, next;
+
+	for (addr = start; addr < end; addr = next) {
+		pmd = pmd_offset(pud, addr);
+
+		BUG_ON(pmd_sect(*pmd));
+
+		if (pmd_none(*pmd)) {
+			pte = pte_alloc_one_kernel(NULL, addr);
+			if (!pte) {
+				kvm_err("Cannot allocate Hyp pte\n");
+				return -ENOMEM;
+			}
+			pmd_populate_kernel(NULL, pmd, pte);
+		}
+
+		next = pmd_addr_end(addr, end);
+		create_hyp_pte_mappings(pmd, addr, next);
+	}
+
+	return 0;
+}
+
+/**
+ * create_hyp_mappings - map a kernel virtual address range in Hyp mode
+ * @from:	The virtual kernel start address of the range
+ * @to:		The virtual kernel end address of the range (exclusive)
+ *
+ * The same virtual address as the kernel virtual address is also used in
+ * Hyp-mode mapping to the same underlying physical pages.
+ *
+ * Note: Wrapping around zero in the "to" address is not supported.
+ */
+int create_hyp_mappings(void *from, void *to)
+{
+	unsigned long start = (unsigned long)from;
+	unsigned long end = (unsigned long)to;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr, next;
+	int err = 0;
+
+	BUG_ON(start > end);
+	if (start < PAGE_OFFSET)
+		return -EINVAL;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = start; addr < end; addr = next) {
+		pgd = kvm_hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none_or_clear_bad(pud)) {
+			pmd = pmd_alloc_one(NULL, addr);
+			if (!pmd) {
+				kvm_err("Cannot allocate Hyp pmd\n");
+				err = -ENOMEM;
+				goto out;
+			}
+			pud_populate(NULL, pud, pmd);
+		}
+
+		next = pgd_addr_end(addr, end);
+		err = create_hyp_pmd_mappings(pud, addr, next);
+		if (err)
+			goto out;
+	}
+out:
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+	return err;
+}
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+int kvm_hyp_pgd_alloc(void)
+{
+	kvm_hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	if (!kvm_hyp_pgd)
+		return -ENOMEM;
+
+	return 0;
+}
+
+pgd_t *kvm_hyp_pgd_get(void)
+{
+	return kvm_hyp_pgd;
+}
+
+void kvm_hyp_pgd_free(void)
+{
+	kfree(kvm_hyp_pgd);
+	kvm_hyp_pgd = NULL;
+}
diff --git a/mm/memory.c b/mm/memory.c
index 1b7dc66..b09d781 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -379,12 +379,14 @@ void pgd_clear_bad(pgd_t *pgd)
 	pgd_ERROR(*pgd);
 	pgd_clear(pgd);
 }
+EXPORT_SYMBOL_GPL(pgd_clear_bad);
 
 void pud_clear_bad(pud_t *pud)
 {
 	pud_ERROR(*pud);
 	pud_clear(pud);
 }
+EXPORT_SYMBOL_GPL(pud_clear_bad);
 
 void pmd_clear_bad(pmd_t *pmd)
 {


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 08/15] ARM: KVM: Module unloading support
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (6 preceding siblings ...)
  2012-06-15 19:07 ` [PATCH v8 07/15] ARM: KVM: Hypervisor inititalization Christoffer Dall
@ 2012-06-15 19:08 ` Christoffer Dall
  2012-06-15 19:08 ` [PATCH v8 09/15] ARM: KVM: Memory virtualization setup Christoffer Dall
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:08 UTC (permalink / raw)
  To: android-virt, kvm

Current initialization code relies on the MMU-bit and TE-bit of the
HSCTLR register to be cleared, so to support re-inserting the KVM module
we must clear these bits when unloading the module.

This is going to change in two ways:

First, the init id-map code is going to go away in favor of
section-based id-mapping.

Second, we are not going to use the SMC call in the future, but rather
an HVC instruction to take control of Hyp mode.  We need, however, a
method to setup the original init code again to support module
unloading.  It is useful to add this support at this point since we will (a)
remember to support unloading and (b) benefit from shorter debug cycles.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_asm.h |    3 ++
 arch/arm/kvm/arm.c             |   50 ++++++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/exports.c         |    3 ++
 arch/arm/kvm/init.S            |   28 ++++++++++++++++++++++
 4 files changed, 84 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 69afdf3..c2ec131 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -37,6 +37,9 @@ struct kvm_vcpu;
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
 
+extern char __kvm_hyp_exit[];
+extern char __kvm_hyp_exit_end[];
+
 extern char __kvm_hyp_vector[];
 
 extern char __kvm_hyp_code_start[];
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 4c61d3c..efe130c 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -421,10 +421,60 @@ out_err:
 	return err;
 }
 
+static void cpu_exit_hyp_mode(void *vector)
+{
+	cpu_set_vector(vector);
+
+	/*
+	 * Disable Hyp-MMU for each cpu
+	 */
+	asm volatile ("hvc	#0");
+}
+
+static int exit_hyp_mode(void)
+{
+	phys_addr_t exit_phys_addr, exit_end_phys_addr;
+	int cpu;
+
+	exit_phys_addr = virt_to_phys(__kvm_hyp_exit);
+	exit_end_phys_addr = virt_to_phys(__kvm_hyp_exit_end);
+	BUG_ON(exit_phys_addr & 0x1f);
+
+	/*
+	 * Create identity mapping for the exit code.
+	 */
+	hyp_idmap_add(kvm_hyp_pgd_get(),
+		      (unsigned long)exit_phys_addr,
+		      (unsigned long)exit_end_phys_addr);
+
+	/*
+	 * Execute the exit code on each CPU.
+	 *
+	 * Note: The stack is not mapped yet, so don't do anything else than
+	 * initializing the hypervisor mode on each CPU using a local stack
+	 * space for temporary storage.
+	 */
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, cpu_exit_hyp_mode,
+					 (void *)(long)exit_phys_addr, 1);
+	}
+
+	/*
+	 * Unmap the identity mapping
+	 */
+	hyp_idmap_del(kvm_hyp_pgd_get(),
+		      (unsigned long)exit_phys_addr,
+		      (unsigned long)exit_end_phys_addr);
+
+	return 0;
+}
+
 void kvm_arch_exit(void)
 {
 	int cpu;
 
+	exit_hyp_mode();
+
 	free_hyp_pmds();
 	for_each_possible_cpu(cpu)
 		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index 2631609..9bdaf11 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -19,6 +19,9 @@
 EXPORT_SYMBOL_GPL(__kvm_hyp_init);
 EXPORT_SYMBOL_GPL(__kvm_hyp_init_end);
 
+EXPORT_SYMBOL_GPL(__kvm_hyp_exit);
+EXPORT_SYMBOL_GPL(__kvm_hyp_exit_end);
+
 EXPORT_SYMBOL_GPL(__kvm_hyp_vector);
 
 EXPORT_SYMBOL_GPL(__kvm_hyp_code_start);
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 7800023..c3beb20 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -114,3 +114,31 @@ __do_hyp_init:
 	__kvm_init_sp:
 	.globl __kvm_hyp_init_end
 __kvm_hyp_init_end:
+
+
+	.align 12
+__kvm_hyp_exit:
+	.globl __kvm_hyp_exit
+
+	@ Hyp-mode exception vector
+	nop
+	nop
+	nop
+	nop
+	nop
+	b	__do_hyp_exit
+	nop
+	nop
+
+__do_hyp_exit:
+	@ Clear the MMU and TE bits in the HSCR
+	mrc	p15, 4, sp, c1, c0, 0	@ HSCR
+	bic	sp, sp, #((1 << 30) | (1 << 0))
+
+	isb
+	mcr	p15, 4, sp, c1, c0, 0	@ HSCR
+	isb
+	eret
+
+	.globl __kvm_hyp_exit_end
+__kvm_hyp_exit_end:


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 09/15] ARM: KVM: Memory virtualization setup
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (7 preceding siblings ...)
  2012-06-15 19:08 ` [PATCH v8 08/15] ARM: KVM: Module unloading support Christoffer Dall
@ 2012-06-15 19:08 ` Christoffer Dall
  2012-06-21 12:29   ` Gleb Natapov
  2012-06-28 22:34   ` Marcelo Tosatti
  2012-06-15 19:08 ` [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace Christoffer Dall
                   ` (6 subsequent siblings)
  15 siblings, 2 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:08 UTC (permalink / raw)
  To: android-virt, kvm

From: Christoffer Dall <cdall@cs.columbia.edu>

This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/arm_mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Further, each entry in TLBs and caches are tagged with a VMID
identifier in addition to ASIDs. The VMIDs are assigned consecutively
to VMs in the order that VMs are executed, and caches and tlbs are
invalidated when the VMID space has been used to allow for more than
255 simultaenously running guests.

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h |    2 -
 arch/arm/include/asm/kvm_mmu.h |    5 ++
 arch/arm/kvm/arm.c             |   65 ++++++++++++++++++++++---
 arch/arm/kvm/mmu.c             |  103 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 166 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 7f30cbd..257242f 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -62,7 +62,7 @@
  * SWIO:	Turn set/way invalidates into set/way clean+invalidate
  */
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TWI | HCR_VM | HCR_BSU_IS | HCR_FB | \
-			HCR_AMO | HCR_IMO | HCR_FMO | HCR_FMO | HCR_SWIO)
+			HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 1aa1af4..d95662eb 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -34,4 +34,9 @@ int kvm_hyp_pgd_alloc(void);
 pgd_t *kvm_hyp_pgd_get(void);
 void kvm_hyp_pgd_free(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index efe130c..81babe9 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -38,6 +38,13 @@
 
 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 
+/* The VMID used in the VTTBR */
+#define VMID_BITS               8
+#define VMID_MASK               ((1 << VMID_BITS) - 1)
+#define VMID_FIRST_GENERATION	(1 << VMID_BITS)
+static u64 next_vmid;		/* The next available VMID in the sequence */
+DEFINE_SPINLOCK(kvm_vmid_lock);
+
 int kvm_arch_hardware_enable(void *garbage)
 {
 	return 0;
@@ -70,14 +77,6 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
-int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
-{
-	if (type)
-		return -EINVAL;
-
-	return 0;
-}
-
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 {
 	return VM_FAULT_SIGBUS;
@@ -93,10 +92,46 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 	return 0;
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+	int ret = 0;
+
+	if (type)
+		return -EINVAL;
+
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+	mutex_init(&kvm->arch.pgd_mutex);
+
+	ret = create_hyp_mappings(kvm, kvm + 1);
+	if (ret)
+		goto out_free_stage2_pgd;
+
+	/* Mark the initial VMID invalid */
+	kvm->arch.vmid = 0;
+
+	return ret;
+out_free_stage2_pgd:
+	kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+	return ret;
+}
+
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -172,6 +207,10 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	err = create_hyp_mappings(vcpu, vcpu + 1);
+	if (err)
+		goto free_vcpu;
+
 	return vcpu;
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
@@ -181,6 +220,7 @@ out:
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
@@ -416,6 +456,15 @@ int kvm_arch_init(void *opaque)
 	if (err)
 		goto out_err;
 
+	/*
+	 * The upper 56 bits of VMIDs are used to identify the generation
+	 * counter, so VMIDs initialized to 0, having generation == 0, will
+	 * never be considered valid and therefor a new VMID must always be
+	 * assigned. Whent he VMID generation rolls over, we start from
+	 * VMID_FIRST_GENERATION again.
+	 */
+	next_vmid = VMID_FIRST_GENERATION;
+
 	return 0;
 out_err:
 	return err;
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index a320b56a..b256540 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -159,6 +159,109 @@ out:
 	return err;
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates the 1st level table only of size defined by PGD2_ORDER (can
+ * support either full 40-bit input addresses or limited to 32-bit input
+ * addresses). Clears the allocated pages.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+
+	if (kvm->arch.pgd != NULL) {
+		kvm_err("kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
+	if (!pgd)
+		return -ENOMEM;
+
+	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
+	kvm->arch.pgd = pgd;
+
+	return 0;
+}
+
+static void free_guest_pages(pte_t *pte, unsigned long addr)
+{
+	unsigned int i;
+	struct page *page;
+
+	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
+		if (!pte_present(*pte))
+			goto next_page;
+		page = pfn_to_page(pte_pfn(*pte));
+		put_page(page);
+next_page:
+		pte++;
+	}
+}
+
+static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
+{
+	unsigned int i;
+	pte_t *pte;
+	struct page *page;
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		BUG_ON(pmd_sect(*pmd));
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			free_guest_pages(pte, addr);
+			page = virt_to_page((void *)pte);
+			WARN_ON(atomic_read(&page->_count) != 1);
+			pte_free_kernel(NULL, pte);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * kvm_free_stage2_pgd - free all stage-2 tables
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
+ * underlying level-2 and level-3 tables before freeing the actual level-1 table
+ * and setting the struct pointer to NULL.
+ */
+void kvm_free_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long long i, addr;
+
+	if (kvm->arch.pgd == NULL)
+		return;
+
+	/*
+	 * We do this slightly different than other places, since we need more
+	 * than 32 bits and for instance pgd_addr_end converts to unsigned long.
+	 */
+	addr = 0;
+	for (i = 0; i < PTRS_PER_PGD2; i++) {
+		addr = i * (unsigned long long)PGDIR_SIZE;
+		pgd = kvm->arch.pgd + i;
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none(*pud))
+			continue;
+
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_stage2_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+	}
+
+	free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
+	kvm->arch.pgd = NULL;
+}
+
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return -EINVAL;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (8 preceding siblings ...)
  2012-06-15 19:08 ` [PATCH v8 09/15] ARM: KVM: Memory virtualization setup Christoffer Dall
@ 2012-06-15 19:08 ` Christoffer Dall
  2012-06-18 13:32   ` Avi Kivity
  2012-06-15 19:08 ` [PATCH v8 11/15] ARM: KVM: World-switch implementation Christoffer Dall
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:08 UTC (permalink / raw)
  To: android-virt, kvm

From: Christoffer Dall <cdall@cs.columbia.edu>

Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
This ioctl is used since the sematics are in fact two lines that can be
either raised or lowered on the VCPU - the IRQ and FIQ lines.

KVM needs to know which VCPU it must operate on and whether the FIQ or
IRQ line is raised/lowered. Hence both pieces of information is packed
in the kvm_irq_level->irq field. The irq fild value will be:
  IRQ: vcpu_index << 1
  FIQ: (vcpu_index << 1) | 1

This is documented in Documentation/kvm/api.txt.

The effect of the ioctl is simply to simply raise/lower the
corresponding irq_line field on the VCPU struct, which will cause the
world-switch code to raise/lower virtual interrupts when running the
guest on next switch. The wait_for_interrupt flag is also cleared for
raised IRQs or FIQs causing an idle VCPU to become active again. CPUs
in guest mode are kicked to make sure they refresh their interrupt status.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |   12 +++++--
 arch/arm/include/asm/kvm.h        |    9 +++++
 arch/arm/include/asm/kvm_arm.h    |    1 +
 arch/arm/kvm/arm.c                |   62 ++++++++++++++++++++++++++++++++++++-
 include/linux/kvm.h               |    1 +
 5 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 310fe50..79c10fc 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -614,15 +614,19 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
 
 Sets the level of a GSI input to the interrupt controller model in the kernel.
-Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
+On some architectures it is required that an interrupt controller model has
+been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
+interrupts require the level to be set to 1 and then back to 0.
+
+ARM uses two types of interrupt lines per CPU: IRQ and FIQ.  The value of the
+irq field should be (vcpu_index << 1) for IRQs and ((vcpu_index << 1) | 1) for
+FIQs. Level is used to raise/lower the line.
 
 struct kvm_irq_level {
 	union {
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index c8466b7..38ff1d6 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -20,6 +20,15 @@
 #include <asm/types.h>
 
 #define __KVM_HAVE_GUEST_DEBUG
+#define __KVM_HAVE_IRQ_LINE
+
+/*
+ * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
+ */
+enum KVM_ARM_IRQ_LINE_TYPE {
+	KVM_ARM_IRQ_LINE = 0,
+	KVM_ARM_FIQ_LINE = 1,
+};
 
 /*
  * Modes used for short-hand mode determinition in the world-switch code and
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 257242f..130526c 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -63,6 +63,7 @@
  */
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TWI | HCR_VM | HCR_BSU_IS | HCR_FB | \
 			HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 81babe9..4306093 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -22,6 +22,7 @@
 #include <linux/fs.h>
 #include <linux/mman.h>
 #include <linux/sched.h>
+#include <linux/kvm.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -244,6 +245,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	vcpu->cpu = cpu;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -284,6 +286,51 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return -EINVAL;
 }
 
+static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
+				      struct kvm_irq_level *irq_level)
+{
+	int mask;
+	unsigned int vcpu_idx;
+	struct kvm_vcpu *vcpu;
+	unsigned long old, new, *ptr;
+
+	vcpu_idx = irq_level->irq >> 1;
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+	if (!vcpu)
+		return -EINVAL;
+
+	if ((irq_level->irq & 1) == KVM_ARM_IRQ_LINE)
+		mask = HCR_VI;
+	else /* KVM_ARM_FIQ_LINE */
+		mask = HCR_VF;
+
+	trace_kvm_set_irq(irq_level->irq, irq_level->level, 0);
+
+	ptr = (unsigned long *)&vcpu->arch.irq_lines;
+	do {
+		old = ACCESS_ONCE(*ptr);
+		if (irq_level->level)
+			new = old | mask;
+		else
+			new = old & ~mask;
+
+		if (new == old)
+			return 0; /* no change */
+	} while (cmpxchg(ptr, old, new) != old);
+
+	/*
+	 * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
+	 * trigger a world-switch round on the running physical CPU to set the
+	 * virtual IRQ/FIQ fields in the HCR appropriately.
+	 */
+	kvm_vcpu_kick(vcpu);
+
+	return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -298,7 +345,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
-	return -EINVAL;
+	struct kvm *kvm = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+	case KVM_IRQ_LINE: {
+		struct kvm_irq_level irq_event;
+
+		if (copy_from_user(&irq_event, argp, sizeof irq_event))
+			return -EFAULT;
+		return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
+	}
+	default:
+		return -EINVAL;
+	}
 }
 
 static void cpu_set_vector(void *vector)
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..f978447 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -111,6 +111,7 @@ struct kvm_irq_level {
 	 * ACPI gsi notion of irq.
 	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
 	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
+	 * For ARM: IRQ: irq = (2*vcpu_index). FIQ: irq = (2*vcpu_indx + 1).
 	 */
 	union {
 		__u32 irq;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 11/15] ARM: KVM: World-switch implementation
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (9 preceding siblings ...)
  2012-06-15 19:08 ` [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace Christoffer Dall
@ 2012-06-15 19:08 ` Christoffer Dall
  2012-06-18 13:41   ` Avi Kivity
  2012-06-15 19:08 ` [PATCH v8 12/15] ARM: KVM: Emulation framework and CP15 emulation Christoffer Dall
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:08 UTC (permalink / raw)
  To: android-virt, kvm

Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.

Switching to Hyp mode is done through a simple HVC instructions. The
exception vector code will check that the HVC comes from VMID==0 and if
so will store the necessary state on the Hyp stack, which will look like
this (growing downwards, see the hyp_hvc handler):
  ...
  Hyp_Sp + 4: spsr (Host-SVC cpsr)
  Hyp_Sp    : lr_usr

When returning from Hyp mode to SVC mode, another HVC instruction is
executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
stack pointer should be where it was left from the above initial call,
since the values on the stack will be used to restore state (see
hyp_svc).

Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values is loaded onto the hardware. State, which is not loaded, but
theoretically modifiable by the guest is protected through the
virtualiation features to generate a trap and cause software emulation.
Upon guest returns, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp-stack onto the
hardware.

One controversy may be the back-door call to __irq_svc (the host
kernel's own physical IRQ handler) which is called when a physical IRQ
exception is taken in Hyp mode while running in the guest.

SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.

Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h  |   26 ++
 arch/arm/include/asm/kvm_host.h |    3 
 arch/arm/kernel/armksyms.c      |    7 +
 arch/arm/kernel/asm-offsets.c   |   35 +++
 arch/arm/kernel/entry-armv.S    |    1 
 arch/arm/kvm/arm.c              |  136 ++++++++++-
 arch/arm/kvm/interrupts.S       |  499 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 703 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 130526c..d76e8eb37 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -114,5 +114,31 @@
 #define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
 #endif
 
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT	(26)
+#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
+#define HSR_IL		(1U << 25)
+#define HSR_ISS		(HSR_IL - 1)
+#define HSR_ISV_SHIFT	(24)
+#define HSR_ISV		(1U << HSR_ISV_SHIFT)
+
+#define HSR_EC_UNKNOWN	(0x00)
+#define HSR_EC_WFI	(0x01)
+#define HSR_EC_CP15_32	(0x03)
+#define HSR_EC_CP15_64	(0x04)
+#define HSR_EC_CP14_MR	(0x05)
+#define HSR_EC_CP14_LS	(0x06)
+#define HSR_EC_CP_0_13	(0x07)
+#define HSR_EC_CP10_ID	(0x08)
+#define HSR_EC_JAZELLE	(0x09)
+#define HSR_EC_BXJ	(0x0A)
+#define HSR_EC_CP14_64	(0x0C)
+#define HSR_EC_SVC_HYP	(0x11)
+#define HSR_EC_HVC	(0x12)
+#define HSR_EC_SMC	(0x13)
+#define HSR_EC_IABT	(0x20)
+#define HSR_EC_IABT_HYP	(0x21)
+#define HSR_EC_DABT	(0x24)
+#define HSR_EC_DABT_HYP	(0x25)
 
 #endif /* __KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 2d611df..ed4144b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -96,6 +96,9 @@ struct kvm_vcpu_arch {
 	u32 hdfar;		/* Hyp Data Fault Address Register */
 	u32 hifar;		/* Hyp Inst. Fault Address Register */
 	u32 hpfar;		/* Hyp IPA Fault Address Register */
+	u64 pc_ipa;		/* IPA for the current PC (VA to PA result) */
+	u64 pc_ipa2;		/* same as above, but for non-aligned wide thumb
+				   instructions */
 
 	/* IO related fields */
 	u32 mmio_rd;
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index b57c75e..38d3a12 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -48,6 +48,13 @@ extern void __aeabi_ulcmp(void);
 
 extern void fpundefinstr(void);
 
+#ifdef CONFIG_KVM_ARM_HOST
+/* This is needed for KVM */
+extern void __irq_svc(void);
+
+EXPORT_SYMBOL_GPL(__irq_svc);
+#endif
+
 	/* platform dependent support */
 EXPORT_SYMBOL(__udelay);
 EXPORT_SYMBOL(__const_udelay);
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 1429d89..6ef0cdc 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -13,6 +13,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/dma-mapping.h>
+#include <linux/kvm_host.h>
 #include <asm/cacheflush.h>
 #include <asm/glue-df.h>
 #include <asm/glue-pf.h>
@@ -144,5 +145,39 @@ int main(void)
   DEFINE(DMA_BIDIRECTIONAL,	DMA_BIDIRECTIONAL);
   DEFINE(DMA_TO_DEVICE,		DMA_TO_DEVICE);
   DEFINE(DMA_FROM_DEVICE,	DMA_FROM_DEVICE);
+#ifdef CONFIG_KVM_ARM_HOST
+  DEFINE(VCPU_KVM,		offsetof(struct kvm_vcpu, kvm));
+  DEFINE(VCPU_MIDR,		offsetof(struct kvm_vcpu, arch.cp15[c0_MIDR]));
+  DEFINE(VCPU_MPIDR,		offsetof(struct kvm_vcpu, arch.cp15[c0_MPIDR]));
+  DEFINE(VCPU_SCTLR,		offsetof(struct kvm_vcpu, arch.cp15[c1_SCTLR]));
+  DEFINE(VCPU_CPACR,		offsetof(struct kvm_vcpu, arch.cp15[c1_CPACR]));
+  DEFINE(VCPU_TTBR0,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR0]));
+  DEFINE(VCPU_TTBR1,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR1]));
+  DEFINE(VCPU_TTBCR,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBCR]));
+  DEFINE(VCPU_DACR,		offsetof(struct kvm_vcpu, arch.cp15[c3_DACR]));
+  DEFINE(VCPU_PRRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_PRRR]));
+  DEFINE(VCPU_NMRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_NMRR]));
+  DEFINE(VCPU_CID,		offsetof(struct kvm_vcpu, arch.cp15[c13_CID]));
+  DEFINE(VCPU_TID_URW,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URW]));
+  DEFINE(VCPU_TID_URO,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URO]));
+  DEFINE(VCPU_TID_PRIV,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_PRIV]));
+  DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
+  DEFINE(VCPU_USR_REGS,		offsetof(struct kvm_vcpu, arch.regs.usr_regs));
+  DEFINE(VCPU_SVC_REGS,		offsetof(struct kvm_vcpu, arch.regs.svc_regs));
+  DEFINE(VCPU_ABT_REGS,		offsetof(struct kvm_vcpu, arch.regs.abt_regs));
+  DEFINE(VCPU_UND_REGS,		offsetof(struct kvm_vcpu, arch.regs.und_regs));
+  DEFINE(VCPU_IRQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.irq_regs));
+  DEFINE(VCPU_FIQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
+  DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.pc));
+  DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.cpsr));
+  DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.hsr));
+  DEFINE(VCPU_HDFAR,		offsetof(struct kvm_vcpu, arch.hdfar));
+  DEFINE(VCPU_HIFAR,		offsetof(struct kvm_vcpu, arch.hifar));
+  DEFINE(VCPU_HPFAR,		offsetof(struct kvm_vcpu, arch.hpfar));
+  DEFINE(VCPU_PC_IPA,		offsetof(struct kvm_vcpu, arch.pc_ipa));
+  DEFINE(VCPU_PC_IPA2,		offsetof(struct kvm_vcpu, arch.pc_ipa2));
+  DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
+#endif
   return 0; 
 }
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 437f0c4..db029bb 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -209,6 +209,7 @@ __dabt_svc:
 ENDPROC(__dabt_svc)
 
 	.align	5
+	.globl __irq_svc
 __irq_svc:
 	svc_entry
 	irq_handler
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 4306093..efbdcf9 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -33,6 +33,7 @@
 #include <asm/ptrace.h>
 #include <asm/mman.h>
 #include <asm/tlbflush.h>
+#include <asm/cputype.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
@@ -236,6 +237,24 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
+	unsigned long cpsr;
+	unsigned long sctlr;
+
+
+	/* Init execution CPSR */
+	asm volatile ("mrs	%[cpsr], cpsr" :
+			[cpsr] "=r" (cpsr));
+	vcpu->arch.regs.cpsr = SVC_MODE | PSR_I_BIT | PSR_F_BIT | PSR_A_BIT |
+				(cpsr & PSR_E_BIT);
+
+	/* Init SCTLR with MMU disabled */
+	asm volatile ("mrc	p15, 0, %[sctlr], c1, c0, 0" :
+			[sctlr] "=r" (sctlr));
+	vcpu->arch.cp15[c1_SCTLR] = sctlr & ~1U;
+
+	/* Compute guest MPIDR */
+	vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~0xff)
+				    | vcpu->vcpu_id;
 	return 0;
 }
 
@@ -278,12 +297,125 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
 {
-	return 0;
+	return v->mode == IN_GUEST_MODE;
 }
 
+static void reset_vm_context(void *info)
+{
+	__kvm_flush_vm_context();
+}
+
+/**
+ * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
+ * @kvm	The guest that we are about to run
+ *
+ * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
+ * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
+ * caches and TLBs.
+ */
+static void update_vttbr(struct kvm *kvm)
+{
+	phys_addr_t pgd_phys;
+
+	spin_lock(&kvm_vmid_lock);
+
+	/*
+	 *  Check that the VMID is still valid.
+	 *  (The hardware supports only 256 values with the value zero
+	 *   reserved for the host, so we check if an assigned value has rolled
+	 *   over a sequence, which requires us to assign a new value and flush
+	 *   necessary caches and TLBs on all CPUs.)
+	 */
+	if (unlikely((kvm->arch.vmid ^ next_vmid) >> VMID_BITS)) {
+		/* Check for a new VMID generation */
+		if (unlikely((next_vmid & VMID_MASK) == 0)) {
+			/* Check for the (very unlikely) 64-bit wrap around */
+			if (unlikely(next_vmid == 0))
+				next_vmid = VMID_FIRST_GENERATION;
+
+			next_vmid++;
+
+			/* This does nothing on UP */
+			smp_call_function(reset_vm_context, NULL, 1);
+
+			/*
+			 * On SMP we know no other CPUs can use this CPU's or
+			 * each other's VMID since the kvm_vmid_lock blocks
+			 * them from reentry to the guest.
+			 */
+
+			reset_vm_context(NULL);
+		}
+
+		kvm->arch.vmid = next_vmid++;
+
+		/* update vttbr to be used with the new vmid */
+		pgd_phys = virt_to_phys(kvm->arch.pgd);
+		kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
+				  & ~((2 << VTTBR_X) - 1);
+		kvm->arch.vttbr |= (kvm->arch.vmid & VMID_MASK) << 48;
+	}
+
+	spin_unlock(&kvm_vmid_lock);
+}
+
+/**
+ * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
+ * @vcpu:	The VCPU pointer
+ * @run:	The kvm_run structure pointer used for userspace state exchange
+ *
+ * This function is called through the VCPU_RUN ioctl called from user space. It
+ * will execute VM code in a loop until the time slice for the process is used
+ * or some emulation is needed from user space in which case the function will
+ * return with return value 0 and with the kvm_run structure filled in with the
+ * required data for the requested emulation.
+ */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	int ret = 0;
+	sigset_t sigsaved;
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
+
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	while (run->exit_reason == KVM_EXIT_UNKNOWN) {
+		/*
+		 * Check conditions before entering the guest
+		 */
+		if (need_resched())
+			kvm_resched(vcpu);
+
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			run->exit_reason = KVM_EXIT_INTR;
+			break;
+		}
+
+		/*
+		 * Enter the guest
+		 */
+		trace_kvm_entry(vcpu->arch.regs.pc);
+
+		update_vttbr(vcpu->kvm);
+
+		local_irq_disable();
+		kvm_guest_enter();
+		vcpu->mode = IN_GUEST_MODE;
+
+		ret = __kvm_vcpu_run(vcpu);
+
+		vcpu->mode = OUTSIDE_GUEST_MODE;
+		kvm_guest_exit();
+		local_irq_enable();
+
+		trace_kvm_exit(vcpu->arch.regs.pc);
+	}
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
+
+	return ret;
 }
 
 static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index d99a9c7..13b7b85 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -22,6 +22,11 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
 
+#define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
+#define VCPU_USR_SP		(VCPU_USR_REG(13))
+#define VCPU_FIQ_REG(_reg_nr)	(VCPU_FIQ_REGS + (_reg_nr * 4))
+#define VCPU_FIQ_SPSR		(VCPU_FIQ_REG(7))
+
 	.text
 	.align	PAGE_SHIFT
 
@@ -33,24 +38,514 @@ __kvm_hyp_code_start:
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
 ENTRY(__kvm_flush_vm_context)
+	hvc	#0			@ switch to hyp-mode
+
+	mov	r0, #0			@ rn parameter for c15 flushes is SBZ
+	mcr     p15, 4, r0, c8, c7, 4   @ Invalidate Non-secure Non-Hyp TLB
+	mcr     p15, 0, r0, c7, c5, 0   @ Invalidate instruction caches
+	dsb
+	isb
+
+	hvc	#0			@ switch back to svc-mode, see hyp_svc
 	bx	lr
+ENDPROC(__kvm_flush_vm_context)
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor world-switch code
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/* These are simply for the macros to work - value don't have meaning */
+.equ usr, 0
+.equ svc, 1
+.equ abt, 2
+.equ und, 3
+.equ irq, 4
+.equ fiq, 5
+
+.macro store_mode_state base_reg, mode
+	.if \mode == usr
+	mrs	r2, SP_usr
+	mov	r3, lr
+	stmdb	\base_reg!, {r2, r3}
+	.elseif \mode != fiq
+	mrs	r2, SP_\mode
+	mrs	r3, LR_\mode
+	mrs	r4, SPSR_\mode
+	stmdb	\base_reg!, {r2, r3, r4}
+	.else
+	mrs	r2, r8_fiq
+	mrs	r3, r9_fiq
+	mrs	r4, r10_fiq
+	mrs	r5, r11_fiq
+	mrs	r6, r12_fiq
+	mrs	r7, SP_fiq
+	mrs	r8, LR_fiq
+	mrs	r9, SPSR_fiq
+	stmdb	\base_reg!, {r2-r9}
+	.endif
+.endm
+
+.macro load_mode_state base_reg, mode
+	.if \mode == usr
+	ldmia	\base_reg!, {r2, r3}
+	msr	SP_usr, r2
+	mov	lr, r3
+	.elseif \mode != fiq
+	ldmia	\base_reg!, {r2, r3, r4}
+	msr	SP_\mode, r2
+	msr	LR_\mode, r3
+	msr	SPSR_\mode, r4
+	.else
+	ldmia	\base_reg!, {r2-r9}
+	msr	r8_fiq, r2
+	msr	r9_fiq, r3
+	msr	r10_fiq, r4
+	msr	r11_fiq, r5
+	msr	r12_fiq, r6
+	msr	SP_fiq, r7
+	msr	LR_fiq, r8
+	msr	SPSR_fiq, r9
+	.endif
+.endm
+
+/* Reads cp15 registers from hardware and stores them in memory
+ * @vcpu:   If 0, registers are written in-order to the stack,
+ * 	    otherwise to the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro read_cp15_state vcpu=0, vcpup
+	mrc	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mrc	p15, 0, r3, c1, c0, 2	@ CPACR
+	mrc	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mrc	p15, 0, r5, c3, c0, 0	@ DACR
+	mrrc	p15, 0, r6, r7, c2	@ TTBR 0
+	mrrc	p15, 1, r8, r9, c2	@ TTBR 1
+	mrc	p15, 0, r10, c10, c2, 0	@ PRRR
+	mrc	p15, 0, r11, c10, c2, 1	@ NMRR
+
+	.if \vcpu == 0
+	push	{r2-r11}		@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_SCTLR]
+	str	r3, [\vcpup, #VCPU_CPACR]
+	str	r4, [\vcpup, #VCPU_TTBCR]
+	str	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	strd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	strd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	str	r10, [\vcpup, #VCPU_PRRR]
+	str	r11, [\vcpup, #VCPU_NMRR]
+	.endif
+
+	mrc	p15, 0, r2, c13, c0, 1	@ CID
+	mrc	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mrc	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mrc	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	.if \vcpu == 0
+	push	{r2-r5}			@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_CID]
+	str	r3, [\vcpup, #VCPU_TID_URW]
+	str	r4, [\vcpup, #VCPU_TID_URO]
+	str	r5, [\vcpup, #VCPU_TID_PRIV]
+	.endif
+.endm
+
+/* Reads cp15 registers from memory and writes them to hardware
+ * @vcpu:   If 0, registers are read in-order from the stack,
+ * 	    otherwise from the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro write_cp15_state vcpu=0, vcpup
+	.if \vcpu == 0
+	pop	{r2-r5}
+	.else
+	ldr	r2, [\vcpup, #VCPU_CID]
+	ldr	r3, [\vcpup, #VCPU_TID_URW]
+	ldr	r4, [\vcpup, #VCPU_TID_URO]
+	ldr	r5, [\vcpup, #VCPU_TID_PRIV]
+	.endif
+
+	mcr	p15, 0, r2, c13, c0, 1	@ CID
+	mcr	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mcr	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mcr	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+
+	.if \vcpu == 0
+	pop	{r2-r11}
+	.else
+	ldr	r2, [\vcpup, #VCPU_SCTLR]
+	ldr	r3, [\vcpup, #VCPU_CPACR]
+	ldr	r4, [\vcpup, #VCPU_TTBCR]
+	ldr	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	ldrd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	ldrd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	ldr	r10, [\vcpup, #VCPU_PRRR]
+	ldr	r11, [\vcpup, #VCPU_NMRR]
+	.endif
+
+	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mcr	p15, 0, r3, c1, c0, 2	@ CPACR
+	mcr	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mcr	p15, 0, r5, c3, c0, 0	@ DACR
+	mcrr	p15, 0, r6, r7, c2	@ TTBR 0
+	mcrr	p15, 1, r8, r9, c2	@ TTBR 1
+	mcr	p15, 0, r10, c10, c2, 0	@ PRRR
+	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
+.endm
+
+/* Configures the HSTR (Hyp System Trap Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hstr entry
+	mrc	p15, 4, r2, c1, c1, 3
+	ldr	r3, =0x9e00
+	.if \entry == 1
+	orr	r2, r2, r3		@ Trap CR{9,10,11,12,15}
+	.else
+	bic	r2, r2, r3		@ Don't trap any CRx accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 3
+.endm
+
+/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
+.macro configure_hyp_role entry, vcpu_ptr
+	mrc	p15, 4, r2, c1, c1, 0	@ HCR
+	bic	r2, r2, #HCR_VIRT_EXCP_MASK
+	ldr	r3, =HCR_GUEST_MASK
+	.if \entry == 1
+	orr	r2, r2, r3
+	ldr	r3, [\vcpu_ptr, #VCPU_IRQ_LINES]
+	orr	r2, r2, r3
+	.else
+	bic	r2, r2, r3
+	.endif
+	mcr	p15, 4, r2, c1, c1, 0
+.endm
+
+@ Arguments:
+@  r0: pointer to vcpu struct
 ENTRY(__kvm_vcpu_run)
-	bx	lr
+	hvc	#0			@ switch to hyp-mode
+
+	@ Now we're in Hyp-mode and lr_usr, spsr_hyp are on the stack
+	mrs	r2, sp_usr
+	push	{r2}			@ Push r13_usr
+	push	{r4-r12}		@ Push r4-r12
+
+	store_mode_state sp, svc
+	store_mode_state sp, abt
+	store_mode_state sp, und
+	store_mode_state sp, irq
+	store_mode_state sp, fiq
+
+	@ Store hardware CP15 state and load guest state
+	read_cp15_state
+	write_cp15_state 1, r0
+
+	push	{r0}			@ Push the VCPU pointer
+
+	@ Set up guest memory translation
+	ldr	r1, [r0, #VCPU_KVM]	@ r1 points to kvm struct
+	add	r1, r1, #KVM_VTTBR
+	ldrd	r2, r3, [r1]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Configure Hyp-role
+	configure_hyp_role 1, r0
+
+	@ Trap coprocessor CRx for all x except 2 and 14
+	set_hstr 1
+
+	@ Write standard A-9 CPU id in MIDR
+	ldr	r1, [r0, #VCPU_MIDR]
+	mcr	p15, 4, r1, c0, c0, 0
+
+	@ Write guest view of MPIDR into VMPIDR
+	ldr	r1, [r0, #VCPU_MPIDR]
+	mcr	p15, 4, r1, c0, c0, 5
+
+	@ Load guest registers
+	add	r0, r0, #(VCPU_USR_SP)
+	load_mode_state r0, usr
+	load_mode_state r0, svc
+	load_mode_state r0, abt
+	load_mode_state r0, und
+	load_mode_state r0, irq
+	load_mode_state r0, fiq
+
+	@ Load return state (r0 now points to vcpu->arch.regs.pc)
+	ldmia	r0, {r2, r3}
+	msr	ELR_hyp, r2
+	msr	SPSR_cxsf, r3
+
+	@ Load remaining registers and do the switch
+	sub	r0, r0, #(VCPU_PC - VCPU_USR_REGS)
+	ldmia	r0, {r0-r12}
+	eret
+
+__kvm_vcpu_return:
+	@ Store return state
+	mrs	r2, ELR_hyp
+	mrs	r3, spsr
+	str	r2, [r1, #VCPU_PC]
+	str	r3, [r1, #VCPU_CPSR]
+
+	@ Store guest registers
+	add	r1, r1, #(VCPU_FIQ_SPSR + 4)
+	store_mode_state r1, fiq
+	store_mode_state r1, irq
+	store_mode_state r1, und
+	store_mode_state r1, abt
+	store_mode_state r1, svc
+	store_mode_state r1, usr
+	sub	r1, r1, #(VCPU_USR_REG(13))
+
+	@ Don't trap coprocessor accesses for host kernel
+	set_hstr 0
+
+	@ Reset Hyp-role
+	configure_hyp_role 0, r1
+
+	@ Let guest read hardware MIDR
+	mrc	p15, 0, r2, c0, c0, 0
+	mcr	p15, 4, r2, c0, c0, 0
+
+	@ Back to hardware MPIDR
+	mrc	p15, 0, r2, c0, c0, 5
+	mcr	p15, 4, r2, c0, c0, 5
+
+	@ Set VMID == 0
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Store guest CP15 state and restore host state
+	read_cp15_state 1, r1
+	write_cp15_state
 
+	load_mode_state sp, fiq
+	load_mode_state sp, irq
+	load_mode_state sp, und
+	load_mode_state sp, abt
+	load_mode_state sp, svc
+
+	pop	{r4-r12}		@ Pop r4-r12
+	pop	{r2}			@ Pop r13_usr
+	msr	sp_usr, r2
+	ldr	r2, [sp, #4]		@ Get svc-cpsr in case we need if for
+					@ the __irq_svc call
+
+	hvc	#0			@ switch back to svc-mode, see hyp_svc
+
+	cmp	r0, #ARM_EXCEPTION_IRQ
+	bxne	lr			@ return to IOCTL
+
+	/*
+	 * It's time to launch the kernel IRQ handler for IRQ exceptions. This
+	 * requires some manipulation though.
+	 *
+	 *  - The easiest entry point to the host handler is __irq_svc.
+	 *  - The __irq_svc expects to be called from SVC mode, which has been
+	 *    switched to from vector_stub code in entry-armv.S. The __irq_svc
+	 *    calls svc_entry which uses values stored in memory and pointed to
+	 *    by r0 to return from handler. We allocate this memory on the
+	 *    stack, which will contain these values:
+	 *      0x8:   cpsr
+	 *      0x4:   return_address
+	 *      0x0:   r0
+	 */
+	adr	r1, irq_kernel_resume	@ Where to resume
+	push	{r0 - r2}
+	mov	r0, sp
+	b	__irq_svc
+
+irq_kernel_resume:
+	pop	{r0}
+	add	sp, sp, #8
+	bx	lr			@ return to IOCTL
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor exception vector and handlers
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * The KVM/ARM Hypervisor ABI is defined as follows:
+ *
+ * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
+ * instruction is issued since all traps are disabled when running the host
+ * kernel as per the Hyp-mode initialization at boot time.
+ *
+ * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
+ * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
+ * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
+ * instructions are called from within Hyp-mode.
+ *
+ * Hyp-ABI: Switching from host kernel to Hyp-mode:
+ *    Switching to Hyp mode is done through a simple HVC instructions. The
+ *    exception vector code will check that the HVC comes from VMID==0 and if
+ *    so will store the necessary state on the Hyp stack, which will look like
+ *    this (growing downwards, see the hyp_hvc handler):
+ *      ...
+ *      Hyp_Sp + 4: spsr (Host-SVC cpsr)
+ *      Hyp_Sp    : lr_usr
+ *
+ * Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
+ *    When returning from Hyp mode to SVC mode, another HVC instruction is
+ *    executed from Hyp mode, which is taken in the hyp_svc handler. The Hyp
+ *    stack pointer should be where it was left from the above initial call,
+ *    since the values on the stack will be used to restore state.
+ *
+ *
+ * Note that the above is used to execute code in Hyp-mode from a host-kernel
+ * point of view, and is a different concept from performing a world-switch and
+ * executing guest code SVC mode (with a VMID != 0).
+ */
+	.text
+
 	.align 5
 __kvm_hyp_vector:
 	.globl __kvm_hyp_vector
-	nop
+
+	@ Hyp-mode exception vector
+	W(b)	hyp_reset
+	W(b)	hyp_undef
+	W(b)	hyp_svc
+	W(b)	hyp_pabt
+	W(b)	hyp_dabt
+	W(b)	hyp_hvc
+	W(b)	hyp_irq
+	W(b)	hyp_fiq
+
+	.align
+hyp_reset:
+	b	hyp_reset
+
+	.align
+hyp_undef:
+	b	hyp_undef
+
+	.align
+hyp_svc:
+	@ Can only get here if HVC or SVC is called from Hyp, mode which means
+	@ we want to change mode back to SVC mode.
+	@ NB: Stack pointer should be where hyp_hvc handler left it!
+	ldr	lr, [sp, #4]
+	msr	spsr, lr
+	ldr	lr, [sp]
+	add	sp, sp, #8
+	eret
+
+	.align
+hyp_pabt:
+	b	hyp_pabt
+
+	.align
+hyp_dabt:
+	b	hyp_dabt
+
+	.align
+hyp_hvc:
+	@ Getting here is either becuase of a trap from a guest or from calling
+	@ HVC from the host kernel, which means "switch to Hyp mode".
+	push	{r0, r1, r2}
+
+	@ Check syndrome register
+	mrc	p15, 4, r0, c5, c2, 0	@ HSR
+	lsr	r1, r0, #HSR_EC_SHIFT
+	cmp	r1, #HSR_EC_HVC
+	bne	guest_trap		@ Not HVC instr.
+
+	@ Let's check if the HVC came from VMID 0 and allow simple
+	@ switch to Hyp mode
+	mrrc    p15, 6, r1, r2, c2
+	lsr     r2, r2, #16
+	and     r2, r2, #0xff
+	cmp     r2, #0
+	bne	guest_trap		@ Guest called HVC
+
+	pop	{r0, r1, r2}
+
+	@ Store lr_usr,spsr (svc cpsr) on stack
+	sub	sp, sp, #8
+	str	lr, [sp]
+	mrs	lr, spsr
+	str	lr, [sp, #4]
+
+	@ Return to caller in Hyp mode
+	mrs	lr, ELR_hyp
+	mov	pc, lr
+
+guest_trap:
+	ldr	r1, [sp, #12]		@ Load VCPU pointer
+	str	r0, [r1, #VCPU_HSR]
+	add	r1, r1, #VCPU_USR_REG(3)
+	stmia	r1, {r3-r12}
+	sub	r1, r1, #(VCPU_USR_REG(3) - VCPU_USR_REG(0))
+	pop	{r3, r4, r5}
+	add	sp, sp, #4		@ We loaded the VCPU pointer above
+	stmia	r1, {r3, r4, r5}
+	sub	r1, r1, #VCPU_USR_REG(0)
+
+	@ Check if we need the fault information
+	lsr	r2, r0, #HSR_EC_SHIFT
+	cmp	r2, #HSR_EC_IABT
+	beq	2f
+	cmpne	r2, #HSR_EC_DABT
+	bne	1f
+
+	@ For non-valid data aborts, get the offending instr. PA
+	lsr	r2, r0, #HSR_ISV_SHIFT
+	ands	r2, r2, #1
+	bne	2f
+	mrs	r3, ELR_hyp
+	mcr	p15, 0, r3, c7, c8, 0	@ VA to PA, ATS1CPR
+	mrrc	p15, 0, r4, r5, c7	@ PAR
+	add	r6, r1, #VCPU_PC_IPA
+	strd	r4, r5, [r6]
+
+	@ Check if we might have a wide thumb instruction spill-over
+	ldr	r5, =0xfff
+	bic	r4, r3, r5		@ clear page mask
+	sub	r5, r5, #1		@ last 2-byte page bounday, 0xffe
+	cmp	r4, r5
+	bne	2f
+	add	r4, r3, #2		@ _really_ unlikely!
+	mcr	p15, 0, r4, c7, c8, 0	@ VA to PA, ATS1CPR
+	mrrc	p15, 0, r4, r5, c7	@ PAR
+	add	r6, r1, #VCPU_PC_IPA2
+	strd	r4, r5, [r6]
+
+2:	mrc	p15, 4, r2, c6, c0, 0	@ HDFAR
+	mrc	p15, 4, r3, c6, c0, 2	@ HIFAR
+	mrc	p15, 4, r4, c6, c0, 4	@ HPFAR
+	add	r5, r1, #VCPU_HDFAR
+	stmia	r5, {r2, r3, r4}
+
+1:	mov	r0, #ARM_EXCEPTION_HVC
+	b	__kvm_vcpu_return
+
+	.align
+hyp_irq:
+	push	{r0}
+	ldr	r0, [sp, #4]		@ Load VCPU pointer
+	add	r0, r0, #(VCPU_USR_REG(1))
+	stmia	r0, {r1-r12}
+	pop	{r0, r1}		@ r1 == vcpu pointer
+	str	r0, [r1, #VCPU_USR_REG(0)]
+
+	mov	r0, #ARM_EXCEPTION_IRQ
+	b	__kvm_vcpu_return
+
+	.align
+hyp_fiq:
+	b	hyp_fiq
+
+	.ltorg
 
 /*
  * The below lines makes sure the HYP mode code fits in a single page (the


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 12/15] ARM: KVM: Emulation framework and CP15 emulation
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (10 preceding siblings ...)
  2012-06-15 19:08 ` [PATCH v8 11/15] ARM: KVM: World-switch implementation Christoffer Dall
@ 2012-06-15 19:08 ` Christoffer Dall
  2012-06-15 19:09 ` [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM Christoffer Dall
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:08 UTC (permalink / raw)
  To: android-virt, kvm

From: Christoffer Dall <cdall@cs.columbia.edu>

Adds a new important function in the main KVM/ARM code called
handle_exit() which is called from kvm_arch_vcpu_ioctl_run() on returns
from guest execution. This function examines the Hyp-Syndrome-Register
(HSR), which contains information telling KVM what caused the exit from
the guest.

Some of the reasons for an exit are CP15 accesses, which are
not allowed from the guest and this commit handles these exits by
emulating the intended operation in software and skip the guest
instruction.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_emulate.h |    9 +
 arch/arm/kvm/arm.c                 |   87 +++++++++
 arch/arm/kvm/emulate.c             |  330 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h               |   28 +++
 4 files changed, 454 insertions(+)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 7ab696d..4b49168 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -40,6 +40,15 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
 	return modes_table[vcpu->arch.regs.cpsr & 0xf];
 }
 
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_adjust_itstate(struct kvm_vcpu *vcpu);
+
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
  */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index efbdcf9..24de3b7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -37,6 +37,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_emulate.h>
 
 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 
@@ -359,6 +360,86 @@ static void update_vttbr(struct kvm *kvm)
 	spin_unlock(&kvm_vmid_lock);
 }
 
+static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* SVC called from Hyp mode should never get here */
+	kvm_debug("SVC called from Hyp mode shouldn't go here\n");
+	BUG();
+	return -EINVAL; /* Squash warning */
+}
+
+static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * Guest called HVC instruction:
+	 * So far we are not doing anything here, but in the longer run we are
+	 * probably going to have some hypercall interface entry point
+	 * starting from here.
+	 */
+	kvm_debug("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
+				     vcpu->arch.regs.pc);
+	kvm_debug("         HSR: %8x", vcpu->arch.hsr);
+	return 0;
+}
+
+static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* We don't support SMC; don't do that. */
+	kvm_debug("smc: at %08x", vcpu->arch.regs.pc);
+	return -EINVAL;
+}
+
+static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* The hypervisor should never cause aborts */
+	kvm_debug("The hypervisor itself shouldn't cause aborts\n");
+	BUG();
+	return -EINVAL; /* Squash warning */
+}
+
+static int (*arm_exit_handlers[])(struct kvm_vcpu *vcpu, struct kvm_run *r) = {
+	[HSR_EC_WFI]		= kvm_handle_wfi,
+	[HSR_EC_CP15_32]	= kvm_handle_cp15_32,
+	[HSR_EC_CP15_64]	= kvm_handle_cp15_64,
+	[HSR_EC_CP14_MR]	= kvm_handle_cp14_access,
+	[HSR_EC_CP14_LS]	= kvm_handle_cp14_load_store,
+	[HSR_EC_CP14_64]	= kvm_handle_cp14_access,
+	[HSR_EC_CP_0_13]	= kvm_handle_cp_0_13_access,
+	[HSR_EC_CP10_ID]	= kvm_handle_cp10_id,
+	[HSR_EC_SVC_HYP]	= handle_svc_hyp,
+	[HSR_EC_HVC]		= handle_hvc,
+	[HSR_EC_SMC]		= handle_smc,
+	[HSR_EC_IABT]		= kvm_handle_guest_abort,
+	[HSR_EC_DABT]		= kvm_handle_guest_abort,
+	[HSR_EC_DABT_HYP]	= handle_dabt_hyp,
+};
+
+static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			      int exception_index)
+{
+	unsigned long hsr_ec;
+
+	if (exception_index == ARM_EXCEPTION_IRQ)
+		return 0;
+
+	if (exception_index != ARM_EXCEPTION_HVC) {
+		kvm_pr_unimpl("Unsupported exception type: %d",
+			      exception_index);
+		return -EINVAL;
+	}
+
+	hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;
+
+	if (hsr_ec >= ARRAY_SIZE(arm_exit_handlers)
+	    || !arm_exit_handlers[hsr_ec]) {
+		kvm_err("Unkown exception class: %08lx, hsr: %08x\n", hsr_ec,
+			(unsigned int)vcpu->arch.hsr);
+		BUG();
+	}
+
+	return arm_exit_handlers[hsr_ec](vcpu, run);
+}
+
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu:	The VCPU pointer
@@ -410,6 +491,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		local_irq_enable();
 
 		trace_kvm_exit(vcpu->arch.regs.pc);
+
+		ret = handle_exit(vcpu, run, ret);
+		if (ret) {
+			kvm_err("Error in handle_exit\n");
+			break;
+		}
 	}
 
 	if (vcpu->sigset_active)
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 0e5bd90..c7a0663 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -14,7 +14,13 @@
  *
  */
 
+#include <linux/mm.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
 
 #define REG_OFFSET(_reg) \
 	(offsetof(struct kvm_vcpu_regs, _reg) / sizeof(u32))
@@ -123,3 +129,327 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 
 	return reg_array + vcpu_reg_offsets[mode][reg_num];
 }
+
+/******************************************************************************
+ * Co-processor emulation
+ */
+
+struct coproc_params {
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+	unsigned long Rt1;
+	unsigned long Rt2;
+	bool is_64bit;
+	bool is_write;
+};
+
+static void print_cp_instr(const struct coproc_params *p)
+{
+	/* Look, we even formatted it for you to paste into the table! */
+	if (p->is_64bit) {
+		kvm_err("{ CRn(DF), CRm(%2lu), Op1(%2lu), Op2(DF), is64, %-6s"
+			" func, arg},\n",
+			p->CRm, p->Op1, p->is_write ? "WRITE," : "READ,");
+	} else {
+		kvm_err("{ CRn(%2lu), CRm(%2lu), Op1(%2lu), Op2(%2lu), is32,"
+			" %-6s func, arg},\n",
+			p->CRn, p->CRm, p->Op1, p->Op2,
+			p->is_write ? "WRITE," : "READ,");
+	}
+}
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+static bool ignore_write(struct kvm_vcpu *vcpu,
+			 const struct coproc_params *p,
+			 unsigned long arg)
+{
+	if (arg)
+		trace_kvm_emulate_cp15_imp(p->Op1, p->Rt1, p->CRn, p->CRm,
+					   p->Op2, p->is_write);
+	return true;
+}
+
+static bool read_zero(struct kvm_vcpu *vcpu,
+		      const struct coproc_params *p,
+		      unsigned long arg)
+{
+	if (arg)
+		trace_kvm_emulate_cp15_imp(p->Op1, p->Rt1, p->CRn, p->CRm,
+					   p->Op2, p->is_write);
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+static bool read_l2ctlr(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			unsigned long arg)
+{
+	u32 l2ctlr, ncores;
+
+	asm volatile("mrc p15, 1, %0, c9, c0, 2\n" : "=r" (l2ctlr));
+	l2ctlr &= ~(3 << 24);
+	ncores = atomic_read(&vcpu->kvm->online_vcpus) - 1;
+	l2ctlr |= (ncores & 3) << 24;
+	*vcpu_reg(vcpu, p->Rt1) = l2ctlr;
+
+	return true;
+}
+
+static bool read_actlr(struct kvm_vcpu *vcpu,
+		       const struct coproc_params *p,
+		       unsigned long arg)
+{
+	u32 actlr;
+
+	asm volatile("mrc p15, 0, %0, c1, c0, 1\n" : "=r" (actlr));
+	/* Make the SMP bit consistent with the guest configuration */
+	/* TODO: Check emualted processor for bit accuracy */
+	if (atomic_read(&vcpu->kvm->online_vcpus) > 1)
+		actlr |= 1U << 6;
+	else
+		actlr &= ~(1U << 6);
+	*vcpu_reg(vcpu, p->Rt1) = actlr;
+
+	return true;
+}
+
+static bool access_cp15_reg(struct kvm_vcpu *vcpu,
+			    const struct coproc_params *p,
+			    unsigned long cp15_reg)
+{
+	if (p->is_write)
+		vcpu->arch.cp15[cp15_reg] = *vcpu_reg(vcpu, p->Rt1);
+	else
+		*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[cp15_reg];
+	return true;
+}
+
+/* Any field which is 0xFFFFFFFF == DF */
+struct coproc_emulate {
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+
+	unsigned long is_64;
+	unsigned long is_w;
+
+	bool (*f)(struct kvm_vcpu *,
+		  const struct coproc_params *,
+		  unsigned long);
+	unsigned long arg;
+};
+
+#define DF (-1UL) /* Default: If nothing else fits, use this one */
+#define CRn(_x)		.CRn = _x
+#define CRm(_x) 	.CRm = _x
+#define Op1(_x) 	.Op1 = _x
+#define Op2(_x) 	.Op2 = _x
+#define is64		.is_64 = true
+#define is32		.is_64 = false
+#define READ		.is_w  = false
+#define WRITE		.is_w  = true
+#define RW		.is_w  = DF
+
+static const struct coproc_emulate coproc_emulate[] = {
+	/*
+	 * ACTRL access:
+	 *
+	 * Ignore writes, and read returns the host settings.
+	 */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 1), is32, WRITE, ignore_write},
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 1), is32, READ,  read_actlr},
+	/*
+	 * L2CTLR access:
+	 *
+	 * Ignore writes completely.
+	 *
+	 * FIXME: Hack Alert: Read zero as default case.
+	 */
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 2), is32,  WRITE, ignore_write},
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 2), is32,  READ,  read_l2ctlr},
+	{ CRn( 9), CRm(DF), Op1(DF), Op2(DF), is32,  WRITE, ignore_write},
+	{ CRn( 9), CRm(DF), Op1(DF), Op2(DF), is32,  READ,  read_zero},
+
+	/*
+	 * These CRn == 10 entries may not need to exist - if we can
+	 * ignore guest attempts to tamper with TLB lockdowns then it
+	 * should be enough to store/restore the host/guest PRRR and
+	 * NMRR memory remap registers and allow guest direct access
+	 * to these registers.
+	 *
+	 * TLB Lockdown operations - ignored
+	 */
+	{ CRn(10), CRm( 0), Op1(DF), Op2(DF), is32,  WRITE, ignore_write},
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 0), is32,  RW,    access_cp15_reg,
+							    c10_PRRR},
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 1), is32,  RW,    access_cp15_reg,
+							    c10_NMRR},
+
+	/*
+	 * The CP15 c15 register is architecturally implementation
+	 * defined, but some guest kernels attempt to read/write a
+	 * diagnostics register here. We always return 0 and ignore
+	 * writes and hope for the best.
+	 */
+	{ CRn(15), CRm(DF), Op1(DF), Op2(DF), is32,  WRITE, ignore_write, 1},
+	{ CRn(15), CRm(DF), Op1(DF), Op2(DF), is32,  READ,  read_zero,    1},
+};
+
+#undef is64
+#undef is32
+#undef READ
+#undef WRITE
+#undef RW
+
+static inline bool match(unsigned long val, unsigned long param)
+{
+	return param == DF || val == param;
+}
+
+static int emulate_cp15(struct kvm_vcpu *vcpu,
+			const struct coproc_params *params)
+{
+	unsigned long instr_len, i;
+
+	for (i = 0; i < ARRAY_SIZE(coproc_emulate); i++) {
+		const struct coproc_emulate *e = &coproc_emulate[i];
+
+		if (!match(params->is_64bit, e->is_64))
+			continue;
+		if (!match(params->is_write, e->is_w))
+			continue;
+		if (!match(params->CRn, e->CRn))
+			continue;
+		if (!match(params->CRm, e->CRm))
+			continue;
+		if (!match(params->Op1, e->Op1))
+			continue;
+		if (!match(params->Op2, e->Op2))
+			continue;
+
+		/* If function fails, it should complain. */
+		if (!e->f(vcpu, params, e->arg))
+			goto fail;
+
+		/* Skip instruction, since it was emulated */
+		instr_len = ((vcpu->arch.hsr >> 25) & 1) ? 4 : 2;
+		*vcpu_pc(vcpu) += instr_len;
+		kvm_adjust_itstate(vcpu);
+		return 0;
+	}
+
+	kvm_err("Unsupported guest CP15 access at: %08x\n",
+		vcpu->arch.regs.pc);
+	print_cp_instr(params);
+fail:
+	return -EINVAL;
+}
+
+/**
+ * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = true;
+
+	params.Op1 = (vcpu->arch.hsr >> 16) & 0xf;
+	params.Op2 = 0;
+	params.Rt2 = (vcpu->arch.hsr >> 10) & 0xf;
+	params.CRn = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+/**
+ * kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = false;
+
+	params.CRn = (vcpu->arch.hsr >> 10) & 0xf;
+	params.Op1 = (vcpu->arch.hsr >> 14) & 0x7;
+	params.Op2 = (vcpu->arch.hsr >> 17) & 0x7;
+	params.Rt2 = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return 0;
+}
+
+/**
+ * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
+ * @vcpu:	The VCPU pointer
+ *
+ * When exceptions occur while instructions are executed in Thumb IF-THEN
+ * blocks, the ITSTATE field of the CPSR is not advanved (updated), so we have
+ * to do this little bit of work manually. The fields map like this:
+ *
+ * IT[7:0] -> CPSR[26:25],CPSR[15:10]
+ */
+void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
+{
+	unsigned long itbits, cond;
+	unsigned long cpsr = *vcpu_cpsr(vcpu);
+	bool is_arm = !(cpsr & PSR_T_BIT);
+
+	BUG_ON(is_arm && (cpsr & PSR_IT_MASK));
+
+	if (!(cpsr & PSR_IT_MASK))
+		return;
+
+	cond = (cpsr & 0xe000) >> 13;
+	itbits = (cpsr & 0x1c00) >> (10 - 2);
+	itbits |= (cpsr & (0x3 << 25)) >> 25;
+
+	/* Perform ITAdvance (see page A-52 in ARM DDI 0406C) */
+	if ((itbits & 0x7) == 0)
+		itbits = cond = 0;
+	else
+		itbits = (itbits << 1) & 0x1f;
+
+	cpsr &= ~PSR_IT_MASK;
+	cpsr |= cond << 13;
+	cpsr |= (itbits & 0x1c) << (10 - 2);
+	cpsr |= (itbits & 0x3) << 25;
+	*vcpu_cpsr(vcpu) = cpsr;
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index f8869c1..e474a0a 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,7 +39,35 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+/* Architecturally implementation defined CP15 register access */
+TRACE_EVENT(kvm_emulate_cp15_imp,
+	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
+		 unsigned long CRm, unsigned long Op2, bool is_write),
+	TP_ARGS(Op1, Rt1, CRn, CRm, Op2, is_write),
 
+	TP_STRUCT__entry(
+		__field(	unsigned int,	Op1		)
+		__field(	unsigned int,	Rt1		)
+		__field(	unsigned int,	CRn		)
+		__field(	unsigned int,	CRm		)
+		__field(	unsigned int,	Op2		)
+		__field(	bool,		is_write	)
+	),
+
+	TP_fast_assign(
+		__entry->is_write		= is_write;
+		__entry->Op1			= Op1;
+		__entry->Rt1			= Rt1;
+		__entry->CRn			= CRn;
+		__entry->CRm			= CRm;
+		__entry->Op2			= Op2;
+	),
+
+	TP_printk("Implementation defined CP15: %s\tp15, %u, r%u, c%u, c%u, %u",
+			(__entry->is_write) ? "mcr" : "mrc",
+			__entry->Op1, __entry->Rt1, __entry->CRn,
+			__entry->CRm, __entry->Op2)
+);
 
 #endif /* _TRACE_KVM_H */
 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (11 preceding siblings ...)
  2012-06-15 19:08 ` [PATCH v8 12/15] ARM: KVM: Emulation framework and CP15 emulation Christoffer Dall
@ 2012-06-15 19:09 ` Christoffer Dall
  2012-06-18 13:45   ` Avi Kivity
  2012-06-15 19:09 ` [PATCH v8 14/15] ARM: KVM: Handle I/O aborts Christoffer Dall
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:09 UTC (permalink / raw)
  To: android-virt, kvm

From: Christoffer Dall <cdall@cs.columbia.edu>

Handles the guest faults in KVM by mapping in corresponding user pages
in the 2nd stage page tables.

Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
pgprot_guest variables used to map 2nd stage memory for KVM guests.

Leverages MMU notifiers on KVM/ARM by supporting the kvm_unmap_hva() operation,
where we remove the HVA from the 2nd stage translation. All other KVM MMU
notifierhooks are NOPs.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_asm.h        |    3 +
 arch/arm/include/asm/kvm_host.h       |   20 ++++
 arch/arm/include/asm/pgtable-3level.h |    8 ++
 arch/arm/include/asm/pgtable.h        |    4 +
 arch/arm/kvm/Kconfig                  |    1 
 arch/arm/kvm/exports.c                |    1 
 arch/arm/kvm/interrupts.S             |   37 +++++++
 arch/arm/kvm/mmu.c                    |  164 +++++++++++++++++++++++++++++++++
 arch/arm/mm/mmu.c                     |    3 +
 9 files changed, 240 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index c2ec131..c4f40f3 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -32,6 +32,7 @@
 #define SMCHYP_HVBAR_W 0xfffffff0
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -45,6 +46,8 @@ extern char __kvm_hyp_vector[];
 extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+
 extern void __kvm_flush_vm_context(void);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ed4144b..21712c5 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -116,4 +116,24 @@ struct kvm_vcpu_stat {
 	u32 halt_wakeup;
 };
 
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
+
+static inline void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva,
+				    pte_t pte)
+{
+}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 1169a8a..e10bb5e 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -102,6 +102,14 @@
  */
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
 
+/*
+ * 2-nd stage PTE definitions for LPAE.
+ */
+#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
+#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */
+#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
+#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */
+
 #ifndef __ASSEMBLY__
 
 #define pud_none(pud)		(!pud_val(pud))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 4b72287..2561a8b 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -70,6 +70,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 
 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_guest;
 
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
@@ -83,6 +84,9 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
 #define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
+#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
+					  L_PTE2_WRITE | L_PTE2_NORM_WB | \
+					  L_PTE2_INNER_WB)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 83abbe0..7fa50d3 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM_ARM_HOST
 	depends on KVM
 	depends on MMU
 	depends on CPU_V7 && ARM_VIRT_EXT
+	select	MMU_NOTIFIER
 	---help---
 	  Provides host support for ARM processors.
 
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index 9bdaf11..ec6e7c4 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -30,5 +30,6 @@ EXPORT_SYMBOL_GPL(__kvm_hyp_code_end);
 EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
 
 EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
+EXPORT_SYMBOL_GPL(__kvm_tlb_flush_vmid);
 
 EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 13b7b85..4a70250 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -34,9 +34,46 @@ __kvm_hyp_code_start:
 	.globl __kvm_hyp_code_start
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Flush per-VMID TLBs
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+/*
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm);
+ *
+ * We rely on the hardware to broadcast the TLB invalidation to all CPUs
+ * inside the inner-shareable domain (which is the case for all v7
+ * implementations).  If we come across a non-IS SMP implementation, we'll
+ * have to use an IPI based mechanism. Until then, we stick to the simple
+ * hardware assisted version.
+ */
+ENTRY(__kvm_tlb_flush_vmid)
+	hvc	#0			@ Switch to Hyp mode
+	push	{r2, r3}
+
+	add	r0, r0, #KVM_VTTBR
+	ldrd	r2, r3, [r0]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+	isb
+	mcr     p15, 0, r0, c8, c3, 0	@ TLBIALLIS (rt ignored)
+	dsb
+	isb
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Back to VMID #0
+	isb
+
+	pop	{r2, r3}
+	hvc	#0			@ Back to SVC
+	bx	lr
+ENDPROC(__kvm_tlb_flush_vmid)
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Flush TLBs and instruction caches of current CPU for all VMIDs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * void __kvm_flush_vm_context(void);
+ */
 ENTRY(__kvm_flush_vm_context)
 	hvc	#0			@ switch to hyp-mode
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index b256540..8ea311a 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -19,6 +19,7 @@
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_asm.h>
 
 static pgd_t *kvm_hyp_pgd;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
@@ -166,6 +167,9 @@ out:
  * Allocates the 1st level table only of size defined by PGD2_ORDER (can
  * support either full 40-bit input addresses or limited to 32-bit input
  * addresses). Clears the allocated pages.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * created, which can only be done once.
  */
 int kvm_alloc_stage2_pgd(struct kvm *kvm)
 {
@@ -227,6 +231,9 @@ static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
  * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
  * underlying level-2 and level-3 tables before freeing the actual level-1 table
  * and setting the struct pointer to NULL.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * destroyed, which can only be done once.
  */
 void kvm_free_stage2_pgd(struct kvm *kvm)
 {
@@ -262,9 +269,164 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 	kvm->arch.pgd = NULL;
 }
 
+static const pte_t null_pte;
+
+static int stage2_set_pte(struct kvm *kvm, phys_addr_t addr,
+			  const pte_t *new_pte)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	/* Create 2nd stage page table mapping - Level 1 */
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		BUG_ON(new_pte == &null_pte);
+		pmd = pmd_alloc_one(NULL, addr);
+		if (!pmd) {
+			kvm_err("Cannot allocate 2nd stage pmd\n");
+			return -ENOMEM;
+		}
+		pud_populate(NULL, pud, pmd);
+		pmd += pmd_index(addr);
+	} else
+		pmd = pmd_offset(pud, addr);
+
+	/* Create 2nd stage page table mapping - Level 2 */
+	if (pmd_none(*pmd)) {
+		BUG_ON(new_pte == &null_pte);
+		pte = pte_alloc_one_kernel(NULL, addr);
+		if (!pte) {
+			kvm_err("Cannot allocate 2nd stage pte\n");
+			return -ENOMEM;
+		}
+		pmd_populate_kernel(NULL, pmd, pte);
+		pte += pte_index(addr);
+	} else
+		pte = pte_offset_kernel(pmd, addr);
+
+	/* Create 2nd stage page table mapping - Level 3 */
+	set_pte_ext(pte, *new_pte, 0);
+
+	return 0;
+}
+
+static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			  gfn_t gfn, struct kvm_memory_slot *memslot)
+{
+	pte_t new_pte;
+	pfn_t pfn;
+	int ret;
+
+	pfn = gfn_to_pfn(vcpu->kvm, gfn);
+
+	if (is_error_pfn(pfn)) {
+		put_page(pfn_to_page(pfn));
+		kvm_err("Guest gfn %u (0x%08x) does not have\n"
+				"corresponding host mapping",
+				(unsigned int)gfn,
+				(unsigned int)gfn << PAGE_SHIFT);
+		return -EFAULT;
+	}
+
+	mutex_lock(&vcpu->kvm->arch.pgd_mutex);
+	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
+	ret = stage2_set_pte(vcpu->kvm, fault_ipa, &new_pte);
+	if (ret)
+		put_page(pfn_to_page(pfn));
+	mutex_unlock(&vcpu->kvm->arch.pgd_mutex);
+
+	return ret;
+}
+
+#define HSR_ABT_FS	(0x3f)
+#define HPFAR_MASK	(~0xf)
+
+/**
+ * kvm_handle_guest_abort - handles all 2nd stage aborts
+ * @vcpu:	the VCPU pointer
+ * @run:	the kvm_run structure
+ *
+ * Any abort that gets to the host is almost guaranteed to be caused by a
+ * missing second stage translation table entry, which can mean that either the
+ * guest simply needs more memory and we must allocate an appropriate page or it
+ * can mean that the guest tried to access I/O memory, which is emulated by user
+ * space. The distinction is based on the IPA causing the fault and whether this
+ * memory region has been registered as standard RAM by user space.
+ */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	unsigned long hsr_ec;
+	unsigned long fault_status;
+	phys_addr_t fault_ipa;
+	struct kvm_memory_slot *memslot = NULL;
+	bool is_iabt;
+	gfn_t gfn;
+
+	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
+	is_iabt = (hsr_ec == HSR_EC_IABT);
+
+	/* Check that the second stage fault is a translation fault */
+	fault_status = vcpu->arch.hsr & HSR_ABT_FS;
+	if ((fault_status & 0x3c) != 0x4) {
+		kvm_err("Unsupported fault status: EC=%lx DFCS=%lx\n",
+			hsr_ec, fault_status);
+		return -EFAULT;
+	}
+
+	fault_ipa = ((phys_addr_t)vcpu->arch.hpfar & HPFAR_MASK) << 8;
+
+	gfn = fault_ipa >> PAGE_SHIFT;
+	if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
+		if (is_iabt) {
+			kvm_err("Inst. abort on I/O address %08lx\n",
+				(unsigned long)fault_ipa);
+			return -EFAULT;
+		}
+
+		kvm_pr_unimpl("I/O address abort...");
+		return 0;
+	}
+
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot->user_alloc) {
+		kvm_err("non user-alloc memslots not supported\n");
+		return -EINVAL;
+	}
+
+	return user_mem_abort(vcpu, fault_ipa, gfn, memslot);
+}
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	int needs_stage2_flush = 0;
+
+	slots = kvm_memslots(kvm);
+
+	/* we only care about the pages that the guest sees */
+	kvm_for_each_memslot(memslot, slots) {
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gpa_t gpa_offset = hva - start;
+			gpa_t gpa = (memslot->base_gfn << PAGE_SHIFT) +
+				     gpa_offset;
+
+			stage2_set_pte(kvm, gpa, &null_pte);
+			needs_stage2_flush = 1;
+		}
+	}
+
+	if (needs_stage2_flush)
+		__kvm_tlb_flush_vmid(kvm);
+
+	return 0;
 }
 
 int kvm_hyp_pgd_alloc(void)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index f7439e7..7dd4b54 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -56,9 +56,11 @@ static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
+pgprot_t pgprot_guest;
 
 EXPORT_SYMBOL(pgprot_user);
 EXPORT_SYMBOL(pgprot_kernel);
+EXPORT_SYMBOL(pgprot_guest);
 
 struct cachepolicy {
 	const char	policy[16];
@@ -520,6 +522,7 @@ static void __init build_mem_type_table(void)
 	pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
 	pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
 				 L_PTE_DIRTY | kern_pgprot);
+	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG);
 
 	mem_types[MT_LOW_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 14/15] ARM: KVM: Handle I/O aborts
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (12 preceding siblings ...)
  2012-06-15 19:09 ` [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM Christoffer Dall
@ 2012-06-15 19:09 ` Christoffer Dall
  2012-06-18 13:48   ` Avi Kivity
  2012-06-15 19:09 ` [PATCH v8 15/15] ARM: KVM: Guest wait-for-interrupts (WFI) support Christoffer Dall
  2012-06-28 21:49 ` [PATCH v8 00/15] KVM/ARM Implementation Marcelo Tosatti
  15 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:09 UTC (permalink / raw)
  To: android-virt, kvm

When the guest accesses I/O memory this will create data abort
exceptions and they are handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding reads
and writes to QEMU which performs the device emulation.

Certain classes of load/store operations do not support the syndrome
information provided in the HSR and we therefore must be able to fetch
the offending instruction from guest memory and decode it manually.

This requires changing the general flow somewhat since new calls to run
the VCPU must check if there's a pending MMIO load and perform the write
after userspace has made the data available.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_emulate.h |    2 
 arch/arm/include/asm/kvm_host.h    |    1 
 arch/arm/include/asm/kvm_mmu.h     |    1 
 arch/arm/kvm/arm.c                 |    6 +
 arch/arm/kvm/emulate.c             |  282 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/mmu.c                 |  162 ++++++++++++++++++++-
 arch/arm/kvm/trace.h               |   21 +++
 7 files changed, 472 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 4b49168..1f93e71 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -47,6 +47,8 @@ int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr);
 void kvm_adjust_itstate(struct kvm_vcpu *vcpu);
 
 /*
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 21712c5..df62754 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -101,6 +101,7 @@ struct kvm_vcpu_arch {
 				   instructions */
 
 	/* IO related fields */
+	bool mmio_sign_extend;	/* for byte/halfword loads */
 	u32 mmio_rd;
 
 	/* Interrupt related fields */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index d95662eb..0ce51b0 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -37,6 +37,7 @@ void kvm_hyp_pgd_free(void);
 int kvm_alloc_stage2_pgd(struct kvm *kvm);
 void kvm_free_stage2_pgd(struct kvm *kvm);
 
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 24de3b7..eedf171 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -456,6 +456,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	int ret = 0;
 	sigset_t sigsaved;
 
+	if (run->exit_reason == KVM_EXIT_MMIO) {
+		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+		if (ret)
+			return ret;
+	}
+
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index c7a0663..47aa04c 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -131,8 +131,30 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 }
 
 /******************************************************************************
- * Co-processor emulation
+ * Utility functions common for all emulation code
+ *****************************************************************************/
+
+/*
+ * This one accepts a matrix where the first element is the
+ * bits as they must be, and the second element is the bitmask.
  */
+#define INSTR_NONE	-1
+static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
+{
+	int i;
+	u32 mask;
+
+	for (i = 0; i < table_entries; i++) {
+		mask = table[i][1];
+		if ((table[i][0] & mask) == (instr & mask))
+			return i;
+	}
+	return INSTR_NONE;
+}
+
+/******************************************************************************
+ * Co-processor emulation
+ *****************************************************************************/
 
 struct coproc_params {
 	unsigned long CRn;
@@ -416,6 +438,264 @@ int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return 0;
 }
 
+
+/******************************************************************************
+ * Load-Store instruction emulation
+ *****************************************************************************/
+
+/*
+ * Must be ordered with LOADS first and WRITES afterwards
+ * for easy distinction when doing MMIO.
+ */
+#define NUM_LD_INSTR  9
+enum INSTR_LS_INDEXES {
+	INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
+	INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
+	INSTR_LS_LDRSH,
+	INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
+	INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
+	NUM_LS_INSTR
+};
+
+static u32 ls_instr[NUM_LS_INSTR][2] = {
+	{0x04700000, 0x0d700000}, /* LDRBT */
+	{0x04300000, 0x0d700000}, /* LDRT  */
+	{0x04100000, 0x0c500000}, /* LDR   */
+	{0x04500000, 0x0c500000}, /* LDRB  */
+	{0x000000d0, 0x0e1000f0}, /* LDRD  */
+	{0x01900090, 0x0ff000f0}, /* LDREX */
+	{0x001000b0, 0x0e1000f0}, /* LDRH  */
+	{0x001000d0, 0x0e1000f0}, /* LDRSB */
+	{0x001000f0, 0x0e1000f0}, /* LDRSH */
+	{0x04600000, 0x0d700000}, /* STRBT */
+	{0x04200000, 0x0d700000}, /* STRT  */
+	{0x04000000, 0x0c500000}, /* STR   */
+	{0x04400000, 0x0c500000}, /* STRB  */
+	{0x000000f0, 0x0e1000f0}, /* STRD  */
+	{0x01800090, 0x0ff000f0}, /* STREX */
+	{0x000000b0, 0x0e1000f0}  /* STRH  */
+};
+
+static inline int get_arm_ls_instr_index(u32 instr)
+{
+	return kvm_instr_index(instr, ls_instr, NUM_LS_INSTR);
+}
+
+/*
+ * Load-Store instruction decoding
+ */
+#define INSTR_LS_TYPE_BIT		26
+#define INSTR_LS_RD_MASK		0x0000f000
+#define INSTR_LS_RD_SHIFT		12
+#define INSTR_LS_RN_MASK		0x000f0000
+#define INSTR_LS_RN_SHIFT		16
+#define INSTR_LS_RM_MASK		0x0000000f
+#define INSTR_LS_OFFSET12_MASK		0x00000fff
+
+#define INSTR_LS_BIT_P			24
+#define INSTR_LS_BIT_U			23
+#define INSTR_LS_BIT_B			22
+#define INSTR_LS_BIT_W			21
+#define INSTR_LS_BIT_L			20
+#define INSTR_LS_BIT_S			 6
+#define INSTR_LS_BIT_H			 5
+
+/*
+ * ARM addressing mode defines
+ */
+#define OFFSET_IMM_MASK			0x0e000000
+#define OFFSET_IMM_VALUE		0x04000000
+#define OFFSET_REG_MASK			0x0e000ff0
+#define OFFSET_REG_VALUE		0x06000000
+#define OFFSET_SCALE_MASK		0x0e000010
+#define OFFSET_SCALE_VALUE		0x06000000
+
+#define SCALE_SHIFT_MASK		0x000000a0
+#define SCALE_SHIFT_SHIFT		5
+#define SCALE_SHIFT_LSL			0x0
+#define SCALE_SHIFT_LSR			0x1
+#define SCALE_SHIFT_ASR			0x2
+#define SCALE_SHIFT_ROR_RRX		0x3
+#define SCALE_SHIFT_IMM_MASK		0x00000f80
+#define SCALE_SHIFT_IMM_SHIFT		6
+
+#define PSR_BIT_C			29
+
+static unsigned long ls_word_calc_offset(struct kvm_vcpu *vcpu,
+					 unsigned long instr)
+{
+	int offset = 0;
+
+	if ((instr & OFFSET_IMM_MASK) == OFFSET_IMM_VALUE) {
+		/* Immediate offset/index */
+		offset = instr & INSTR_LS_OFFSET12_MASK;
+
+		if (!(instr & (1U << INSTR_LS_BIT_U)))
+			offset = -offset;
+	}
+
+	if ((instr & OFFSET_REG_MASK) == OFFSET_REG_VALUE) {
+		/* Register offset/index */
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		offset = *vcpu_reg(vcpu, rm);
+
+		if (!(instr & (1U << INSTR_LS_BIT_P)))
+			offset = 0;
+	}
+
+	if ((instr & OFFSET_SCALE_MASK) == OFFSET_SCALE_VALUE) {
+		/* Scaled register offset */
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		u8 shift = (instr & SCALE_SHIFT_MASK) >> SCALE_SHIFT_SHIFT;
+		u32 shift_imm = (instr & SCALE_SHIFT_IMM_MASK)
+				>> SCALE_SHIFT_IMM_SHIFT;
+		offset = *vcpu_reg(vcpu, rm);
+
+		switch (shift) {
+		case SCALE_SHIFT_LSL:
+			offset = offset << shift_imm;
+			break;
+		case SCALE_SHIFT_LSR:
+			if (shift_imm == 0)
+				offset = 0;
+			else
+				offset = ((u32)offset) >> shift_imm;
+			break;
+		case SCALE_SHIFT_ASR:
+			if (shift_imm == 0) {
+				if (offset & (1U << 31))
+					offset = 0xffffffff;
+				else
+					offset = 0;
+			} else {
+				/* Ensure arithmetic shift */
+				asm("mov %[r], %[op], ASR %[s]" :
+				    [r] "=r" (offset) :
+				    [op] "r" (offset), [s] "r" (shift_imm));
+			}
+			break;
+		case SCALE_SHIFT_ROR_RRX:
+			if (shift_imm == 0) {
+				u32 C = (vcpu->arch.regs.cpsr &
+						(1U << PSR_BIT_C));
+				offset = (C << 31) | offset >> 1;
+			} else {
+				/* Ensure arithmetic shift */
+				asm("mov %[r], %[op], ASR %[s]" :
+				    [r] "=r" (offset) :
+				    [op] "r" (offset), [s] "r" (shift_imm));
+			}
+			break;
+		}
+
+		if (instr & (1U << INSTR_LS_BIT_U))
+			return offset;
+		else
+			return -offset;
+	}
+
+	if (instr & (1U << INSTR_LS_BIT_U))
+		return offset;
+	else
+		return -offset;
+
+	BUG();
+}
+
+static int kvm_ls_length(struct kvm_vcpu *vcpu, u32 instr)
+{
+	int index;
+
+	index = get_arm_ls_instr_index(instr);
+
+	if (instr & (1U << INSTR_LS_TYPE_BIT)) {
+		/* LS word or unsigned byte */
+		if (instr & (1U << INSTR_LS_BIT_B))
+			return sizeof(unsigned char);
+		else
+			return sizeof(u32);
+	} else {
+		/* LS halfword, doubleword or signed byte */
+		u32 H = (instr & (1U << INSTR_LS_BIT_H));
+		u32 S = (instr & (1U << INSTR_LS_BIT_S));
+		u32 L = (instr & (1U << INSTR_LS_BIT_L));
+
+		if (!L && S) {
+			kvm_err("WARNING: d-word for MMIO\n");
+			return 2 * sizeof(u32);
+		} else if (L && S && !H)
+			return sizeof(char);
+		else
+			return sizeof(u16);
+	}
+
+	BUG();
+}
+
+/**
+ * kvm_emulate_mmio_ls - emulates load/store instructions made to I/O memory
+ * @vcpu:	The vcpu pointer
+ * @fault_ipa:	The IPA that caused the 2nd stage fault
+ * @instr:	The instruction that caused the fault
+ *
+ * Handles emulation of load/store instructions which cannot be emulated through
+ * information found in the HSR on faults. It is necessary in this case to
+ * simply decode the offending instruction in software and determine the
+ * required operands.
+ */
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr)
+{
+	unsigned long rd, rn, offset, len, instr_len;
+	int index;
+	bool is_write, is_thumb;
+
+	trace_kvm_mmio_emulate(vcpu->arch.regs.pc, instr, vcpu->arch.regs.cpsr);
+
+	index = get_arm_ls_instr_index(instr);
+	if (index == INSTR_NONE) {
+		kvm_err("Unknown load/store instruction\n");
+		return -EINVAL;
+	}
+
+	is_write = (index < NUM_LD_INSTR) ? false : true;
+	rd = (instr & INSTR_LS_RD_MASK) >> INSTR_LS_RD_SHIFT;
+	len = kvm_ls_length(vcpu, instr);
+
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+	vcpu->run->mmio.is_write = is_write;
+	vcpu->run->mmio.phys_addr = fault_ipa;
+	vcpu->run->mmio.len = len;
+	vcpu->arch.mmio_sign_extend = false;
+	vcpu->arch.mmio_rd = rd;
+
+	trace_kvm_mmio((is_write) ? KVM_TRACE_MMIO_WRITE :
+				    KVM_TRACE_MMIO_READ_UNSATISFIED,
+			len, fault_ipa, (is_write) ? *vcpu_reg(vcpu, rd) : 0);
+
+	/* Handle base register writeback */
+	if (!(instr & (1U << INSTR_LS_BIT_P)) ||
+	     (instr & (1U << INSTR_LS_BIT_W))) {
+		rn = (instr & INSTR_LS_RN_MASK) >> INSTR_LS_RN_SHIFT;
+		offset = ls_word_calc_offset(vcpu, instr);
+		*vcpu_reg(vcpu, rn) += offset;
+	}
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	if (is_thumb && !is_wide_instruction(instr))
+		instr_len = 2;
+	else
+		instr_len = 4;
+
+	*vcpu_pc(vcpu) += instr_len;
+	kvm_adjust_itstate(vcpu);
+	return 0;
+}
+
 /**
  * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
  * @vcpu:	The VCPU pointer
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 8ea311a..2f6e8ec 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -16,10 +16,14 @@
 
 #include <linux/mman.h>
 #include <linux/kvm_host.h>
+#include <trace/events/kvm.h>
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+
+#include "trace.h"
 
 static pgd_t *kvm_hyp_pgd;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
@@ -341,6 +345,159 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	return ret;
 }
 
+/**
+ * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
+ * @vcpu: The VCPU pointer
+ * @run:  The VCPU run struct containing the mmio data
+ *
+ * This should only be called after returning from userspace for MMIO load
+ * emulation.
+ */
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	int *dest;
+	unsigned int len;
+	int mask;
+
+	if (!run->mmio.is_write) {
+		dest = vcpu_reg(vcpu, vcpu->arch.mmio_rd);
+		memset(dest, 0, sizeof(int));
+
+		len = run->mmio.len;
+		if (len > 4)
+			return -EINVAL;
+
+		memcpy(dest, run->mmio.data, len);
+
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
+				*((u64 *)run->mmio.data));
+
+		if (vcpu->arch.mmio_sign_extend && len < 4) {
+			mask = 1U << ((len * 8) - 1);
+			*dest = (*dest ^ mask) - mask;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * invalid_io_mem_abort -- Handle I/O aborts ISV bit is clear
+ *
+ * @vcpu:      The vcpu pointer
+ * @fault_ipa: The IPA that caused the 2nd stage fault
+ *
+ * Some load/store instructions cannot be emulated using the information
+ * presented in the HSR, for instance, register write-back instructions are not
+ * supported. We therefore need to fetch the instruction, decode it, and then
+ * emulate its behavior.
+ */
+static int invalid_io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
+{
+	unsigned long instr;
+	phys_addr_t pc_ipa;
+
+	if (vcpu->arch.pc_ipa & 1) {
+		kvm_err("I/O Abort from invalid instruction address? Wrong!\n");
+		return -EINVAL;
+	}
+
+	if (vcpu->arch.pc_ipa & (1U << 11)) {
+		/* LPAE PAR format */
+		/* TODO: Check if this ever happens - called from Hyp mode */
+		pc_ipa = vcpu->arch.pc_ipa & PAGE_MASK & ((1ULL << 32) - 1);
+	} else {
+		/* VMSAv7 PAR format */
+		pc_ipa = vcpu->arch.pc_ipa & PAGE_MASK & ((1ULL << 40) - 1);
+	}
+	pc_ipa += vcpu->arch.regs.pc & ~PAGE_MASK;
+
+	if (vcpu->arch.regs.cpsr & PSR_T_BIT) {
+		/* TODO: Check validity of PC IPA and IPA2!!! */
+		/* Need to decode thumb instructions as well */
+		kvm_err("Thumb guest support not there yet :(\n");
+		return -EINVAL;
+	}
+
+	if (kvm_read_guest(vcpu->kvm, pc_ipa, &instr, sizeof(instr))) {
+		kvm_err("Could not copy guest instruction\n");
+		return -EFAULT;
+	}
+
+	return kvm_emulate_mmio_ls(vcpu, fault_ipa, instr);
+}
+
+static int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
+{
+	unsigned long rd, len, instr_len;
+	bool is_write, sign_extend;
+
+	if (!(vcpu->arch.hsr & HSR_ISV))
+		return invalid_io_mem_abort(vcpu, fault_ipa);
+
+	if (((vcpu->arch.hsr >> 8) & 1)) {
+		kvm_err("Not supported, Cache operation on I/O addr.\n");
+		return -EFAULT;
+	}
+
+	if ((vcpu->arch.hsr >> 7) & 1) {
+		kvm_err("Translation table accesses I/O memory\n");
+		return -EFAULT;
+	}
+
+	switch ((vcpu->arch.hsr >> 22) & 0x3) {
+	case 0:
+		len = 1;
+		break;
+	case 1:
+		len = 2;
+		break;
+	case 2:
+		len = 4;
+		break;
+	default:
+		kvm_err("Invalid I/O abort\n");
+		return -EFAULT;
+	}
+
+	is_write = ((vcpu->arch.hsr >> 6) & 1);
+	sign_extend = ((vcpu->arch.hsr >> 21) & 1);
+	rd = (vcpu->arch.hsr >> 16) & 0xf;
+	BUG_ON(rd > 15);
+
+	if (rd == 15) {
+		kvm_err("I/O memory trying to read/write pc\n");
+		return -EFAULT;
+	}
+
+	/* Get instruction length in bytes */
+	instr_len = ((vcpu->arch.hsr >> 25) & 1) ? 4 : 2;
+
+	/* Export MMIO operations to user space */
+	run->exit_reason = KVM_EXIT_MMIO;
+	run->mmio.is_write = is_write;
+	run->mmio.phys_addr = fault_ipa;
+	run->mmio.len = len;
+	vcpu->arch.mmio_sign_extend = sign_extend;
+	vcpu->arch.mmio_rd = rd;
+
+	trace_kvm_mmio((is_write) ? KVM_TRACE_MMIO_WRITE :
+				    KVM_TRACE_MMIO_READ_UNSATISFIED,
+			len, fault_ipa, (is_write) ? *vcpu_reg(vcpu, rd) : 0);
+
+	if (is_write)
+		memcpy(run->mmio.data, vcpu_reg(vcpu, rd), len);
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	*vcpu_pc(vcpu) += instr_len;
+	kvm_adjust_itstate(vcpu);
+	return 0;
+}
+
 #define HSR_ABT_FS	(0x3f)
 #define HPFAR_MASK	(~0xf)
 
@@ -386,8 +543,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			return -EFAULT;
 		}
 
-		kvm_pr_unimpl("I/O address abort...");
-		return 0;
+		/* Adjust page offset */
+		fault_ipa |= vcpu->arch.hdfar & ~PAGE_MASK;
+		return io_mem_abort(vcpu, run, fault_ipa, memslot);
 	}
 
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index e474a0a..bd3a6cc 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,6 +39,27 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_mmio_emulate,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
+		 unsigned long cpsr),
+	TP_ARGS(vcpu_pc, instr, cpsr),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+		__field(	unsigned long,	instr		)
+		__field(	unsigned long,	cpsr		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+		__entry->vcpu_pc		= instr;
+		__entry->vcpu_pc		= cpsr;
+	),
+
+	TP_printk("Emulate MMIO at: 0x%08lx (instr: %08lx, cpsr: %08lx)",
+		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
+);
+
 /* Architecturally implementation defined CP15 register access */
 TRACE_EVENT(kvm_emulate_cp15_imp,
 	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v8 15/15] ARM: KVM: Guest wait-for-interrupts (WFI) support
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (13 preceding siblings ...)
  2012-06-15 19:09 ` [PATCH v8 14/15] ARM: KVM: Handle I/O aborts Christoffer Dall
@ 2012-06-15 19:09 ` Christoffer Dall
  2012-06-28 21:49 ` [PATCH v8 00/15] KVM/ARM Implementation Marcelo Tosatti
  15 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-15 19:09 UTC (permalink / raw)
  To: android-virt, kvm

From: Christoffer Dall <cdall@cs.columbia.edu>

When the guest executes a WFI instruction the operation is trapped to
KVM, which emulates the instruction in software. There is no correlation
between a guest executing a WFI instruction and actually putting the
hardware into a low-power mode, since a KVM guest is essentially a
process and the WFI instruction can be seen as 'sleep' call from this
process. Therefore, we flag the VCPU to be in wait_for_interrupts mode
and call the main KVM function kvm_vcpu_block() function. This function
will put the thread on a wait-queue and call schedule.

When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we
signal the VCPU thread and unflag the VCPU to no longer wait for
interrupts. All calls to kvm_arch_vcpu_ioctl_run() result in a call to
kvm_vcpu_block() as long as the VCPU is in wfi-mode.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/kvm/arm.c     |   15 ++++++++++++++-
 arch/arm/kvm/emulate.c |   12 ++++++++++++
 arch/arm/kvm/trace.h   |   16 ++++++++++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index eedf171..e4b659b 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -291,9 +291,17 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+/**
+ * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled
+ * @v:		The VCPU pointer
+ *
+ * If the guest CPU is not waiting for interrupts (or waiting and
+ * an interrupt is pending) then it is by definition runnable.
+ */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return 0;
+	return !!v->arch.irq_lines ||
+		!v->arch.wait_for_interrupts;
 }
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
@@ -479,6 +487,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			break;
 		}
 
+		if (vcpu->arch.wait_for_interrupts)
+			kvm_vcpu_block(vcpu);
+
 		/*
 		 * Enter the guest
 		 */
@@ -551,6 +562,8 @@ static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
 	 * trigger a world-switch round on the running physical CPU to set the
 	 * virtual IRQ/FIQ fields in the HCR appropriately.
 	 */
+	if (irq_level->level)
+		vcpu->arch.wait_for_interrupts = 0;
 	kvm_vcpu_kick(vcpu);
 
 	return 0;
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 47aa04c..914b17f 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -433,8 +433,20 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return emulate_cp15(vcpu, &params);
 }
 
+/**
+ * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * @vcpu:	the vcpu pointer
+ * @run:	the kvm_run structure pointer
+ *
+ * Simply sets the wait_for_interrupts flag on the vcpu structure, which will
+ * halt execution of world-switches and schedule other host processes until
+ * there is an incoming IRQ or FIQ to the VM.
+ */
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	trace_kvm_wfi(vcpu->arch.regs.pc);
+	if (!vcpu->arch.irq_lines)
+		vcpu->arch.wait_for_interrupts = 1;
 	return 0;
 }
 
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index bd3a6cc..fc68394 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -90,6 +90,22 @@ TRACE_EVENT(kvm_emulate_cp15_imp,
 			__entry->CRm, __entry->Op2)
 );
 
+TRACE_EVENT(kvm_wfi,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("guest executed wfi at: 0x%08lx", __entry->vcpu_pc)
+);
+
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 02/15] KVM: use KVM_CAP_IRQ_ROUTING to protect the routing related code
  2012-06-15 19:07 ` [PATCH v8 02/15] KVM: use KVM_CAP_IRQ_ROUTING to protect the routing related code Christoffer Dall
@ 2012-06-18 13:06   ` Avi Kivity
  0 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2012-06-18 13:06 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/15/2012 10:07 PM, Christoffer Dall wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>
> 
> The KVM code sometimes uses CONFIG_HAVE_KVM_IRQCHIP to protect
> code that is related to IRQ routing, which not all in-kernel
> irqchips may support.
> 
> Use KVM_CAP_IRQ_ROUTING instead.

Thanks, applied.


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 03/15] KVM: Introduce __KVM_HAVE_IRQ_LINE
  2012-06-15 19:07 ` [PATCH v8 03/15] KVM: Introduce __KVM_HAVE_IRQ_LINE Christoffer Dall
@ 2012-06-18 13:07   ` Avi Kivity
  0 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2012-06-18 13:07 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/15/2012 10:07 PM, Christoffer Dall wrote:
> This is a preparatory patch for the KVM/ARM implementation. KVM/ARM will use
> the KVM_IRQ_LINE ioctl, which is currently conditional on
> __KVM_HAVE_IOAPIC, but ARM obviously doesn't have any IOAPIC support and we
> need a separate define.
> 

Thanks, applied.


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER
  2012-06-15 19:07 ` [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER Christoffer Dall
@ 2012-06-18 13:08   ` Avi Kivity
  2012-06-18 17:47     ` Christoffer Dall
  2012-06-28 21:28   ` Marcelo Tosatti
  1 sibling, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-18 13:08 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/15/2012 10:07 PM, Christoffer Dall wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>
> 
> In order to avoid compilation failure when KVM is not compiled in,
> guard the mmu_notifier specific sections with both CONFIG_MMU_NOTIFIER
> and KVM_ARCH_WANT_MMU_NOTIFIER, like it is being done in the rest of
> the KVM code.
> 
>  
> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>  	struct mmu_notifier mmu_notifier;
>  	unsigned long mmu_notifier_seq;
>  	long mmu_notifier_count;
> @@ -780,7 +780,7 @@ struct kvm_stats_debugfs_item {
>  extern struct kvm_stats_debugfs_item debugfs_entries[];
>  extern struct dentry *kvm_debugfs_dir;
>  
> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>  static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
>  {

Why not have Kconfig select CONFIG_MMU_NOTIFIER?


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 06/15] ARM: KVM: Hypervisor identity mapping
  2012-06-15 19:07 ` [PATCH v8 06/15] ARM: KVM: Hypervisor identity mapping Christoffer Dall
@ 2012-06-18 13:12   ` Avi Kivity
  2012-06-18 17:55     ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-18 13:12 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/15/2012 10:07 PM, Christoffer Dall wrote:
> Adds support in the identity mapping feature that allows KVM to setup
> identity mapping for the Hyp mode with the AP[1] bit set as required by
> the specification and also supports freeing created sub pmd's after
> finished use.
> 
> These two functions:
>  - hyp_idmap_add(pgd, addr, end);
>  - hyp_idmap_del(pgd, addr, end);
> are essentially calls to the same function as the non-hyp versions but
> with a different argument value. KVM calls these functions to setup
> and teardown the identity mapping used to initialize the hypervisor.
> 
> Note, the hyp-version of the _del function actually frees the pmd's
> pointed to by the pgd as opposed to the non-hyp version which just
> clears them.


I asked previously what happens if two data structures share a page, and
one of them is removed.  Is that handled now?  How?

Why not just identity map all memory?  You can use large pages so it's
fast and doesn't consume a lot of page table memory.--
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace
  2012-06-15 19:08 ` [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace Christoffer Dall
@ 2012-06-18 13:32   ` Avi Kivity
  2012-06-18 20:56     ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-18 13:32 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/15/2012 10:08 PM, Christoffer Dall wrote:
> From: Christoffer Dall <cdall@cs.columbia.edu>
> 
> Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
> This ioctl is used since the sematics are in fact two lines that can be
> either raised or lowered on the VCPU - the IRQ and FIQ lines.
> 
> KVM needs to know which VCPU it must operate on and whether the FIQ or
> IRQ line is raised/lowered. Hence both pieces of information is packed
> in the kvm_irq_level->irq field. The irq fild value will be:
>   IRQ: vcpu_index << 1
>   FIQ: (vcpu_index << 1) | 1
> 
> This is documented in Documentation/kvm/api.txt.
> 
> The effect of the ioctl is simply to simply raise/lower the
> corresponding irq_line field on the VCPU struct, which will cause the
> world-switch code to raise/lower virtual interrupts when running the
> guest on next switch. The wait_for_interrupt flag is also cleared for
> raised IRQs or FIQs causing an idle VCPU to become active again. CPUs
> in guest mode are kicked to make sure they refresh their interrupt status.

>  
> +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
> +				      struct kvm_irq_level *irq_level)
> +{
> +	int mask;
> +	unsigned int vcpu_idx;
> +	struct kvm_vcpu *vcpu;
> +	unsigned long old, new, *ptr;
> +
> +	vcpu_idx = irq_level->irq >> 1;
> +	if (vcpu_idx >= KVM_MAX_VCPUS)
> +		return -EINVAL;
> +
> +	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
> +	if (!vcpu)
> +		return -EINVAL;
> +
> +	if ((irq_level->irq & 1) == KVM_ARM_IRQ_LINE)
> +		mask = HCR_VI;
> +	else /* KVM_ARM_FIQ_LINE */
> +		mask = HCR_VF;
> +
> +	trace_kvm_set_irq(irq_level->irq, irq_level->level, 0);
> +
> +	ptr = (unsigned long *)&vcpu->arch.irq_lines;
> +	do {
> +		old = ACCESS_ONCE(*ptr);
> +		if (irq_level->level)
> +			new = old | mask;
> +		else
> +			new = old & ~mask;
> +
> +		if (new == old)
> +			return 0; /* no change */
> +	} while (cmpxchg(ptr, old, new) != old);

Isn't this a complicated


   if (level)
       set_bit()
   else
       clear_bit()

?

> +
> +	/*
> +	 * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
> +	 * trigger a world-switch round on the running physical CPU to set the
> +	 * virtual IRQ/FIQ fields in the HCR appropriately.
> +	 */
> +	kvm_vcpu_kick(vcpu);

No need to wake when the line is asserted so you can make this
conditional on level.

> +
> +	return 0;
> +}
> +
>  long kvm_arch_vcpu_ioctl(struct file *filp,
>  			 unsigned int ioctl, unsigned long arg)
>  {
> @@ -298,7 +345,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  long kvm_arch_vm_ioctl(struct file *filp,
>  		       unsigned int ioctl, unsigned long arg)
>  {
> -	return -EINVAL;
> +	struct kvm *kvm = filp->private_data;
> +	void __user *argp = (void __user *)arg;
> +
> +	switch (ioctl) {
> +	case KVM_IRQ_LINE: {
> +		struct kvm_irq_level irq_event;
> +
> +		if (copy_from_user(&irq_event, argp, sizeof irq_event))
> +			return -EFAULT;
> +		return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
> +	}
> +	default:
> +		return -EINVAL;
> +	}
>  }

Should be in common code guarded by the define introduced previously.


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
  2012-06-15 19:08 ` [PATCH v8 11/15] ARM: KVM: World-switch implementation Christoffer Dall
@ 2012-06-18 13:41   ` Avi Kivity
  2012-06-18 22:05     ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-18 13:41 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/15/2012 10:08 PM, Christoffer Dall wrote:
> Provides complete world-switch implementation to switch to other guests
> running in non-secure modes. Includes Hyp exception handlers that
> capture necessary exception information and stores the information on
> the VCPU and KVM structures.
> 
> Switching to Hyp mode is done through a simple HVC instructions. The
> exception vector code will check that the HVC comes from VMID==0 and if
> so will store the necessary state on the Hyp stack, which will look like
> this (growing downwards, see the hyp_hvc handler):
>   ...
>   Hyp_Sp + 4: spsr (Host-SVC cpsr)
>   Hyp_Sp    : lr_usr
> 
> When returning from Hyp mode to SVC mode, another HVC instruction is
> executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
> stack pointer should be where it was left from the above initial call,
> since the values on the stack will be used to restore state (see
> hyp_svc).
> 
> Otherwise, the world-switch is pretty straight-forward. All state that
> can be modified by the guest is first backed up on the Hyp stack and the
> VCPU values is loaded onto the hardware. State, which is not loaded, but
> theoretically modifiable by the guest is protected through the
> virtualiation features to generate a trap and cause software emulation.
> Upon guest returns, all state is restored from hardware onto the VCPU
> struct and the original state is restored from the Hyp-stack onto the
> hardware.
> 
> One controversy may be the back-door call to __irq_svc (the host
> kernel's own physical IRQ handler) which is called when a physical IRQ
> exception is taken in Hyp mode while running in the guest.
> 
> SMP support using the VMPIDR calculated on the basis of the host MPIDR
> and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
> 
> Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
> a separate patch into the appropriate patches introducing the
> functionality. Note that the VMIDs are stored per VM as required by the ARM
> architecture reference manual.
> 
> +
> +/**
> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> + * @kvm	The guest that we are about to run
> + *
> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> + * caches and TLBs.
> + */
> +static void update_vttbr(struct kvm *kvm)
> +{
> +	phys_addr_t pgd_phys;
> +
> +	spin_lock(&kvm_vmid_lock);

Premature, but this is sad.  I suggest you split vmid generation from
next available vmid.  This allows you to make the generation counter
atomic so it may be read outside the lock.

You can do

    if (likely(kvm->arch.vmd_gen) == atomic_read(&kvm_vmid_gen)))
           return;

    spin_lock(...

> +
> +	/*
> +	 *  Check that the VMID is still valid.
> +	 *  (The hardware supports only 256 values with the value zero
> +	 *   reserved for the host, so we check if an assigned value has rolled
> +	 *   over a sequence, which requires us to assign a new value and flush
> +	 *   necessary caches and TLBs on all CPUs.)
> +	 */
> +	if (unlikely((kvm->arch.vmid ^ next_vmid) >> VMID_BITS)) {
> +		/* Check for a new VMID generation */
> +		if (unlikely((next_vmid & VMID_MASK) == 0)) {
> +			/* Check for the (very unlikely) 64-bit wrap around */

Unlikely?  it's impossible.

> +
> +/**
> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> + * @vcpu:	The VCPU pointer
> + * @run:	The kvm_run structure pointer used for userspace state exchange
> + *
> + * This function is called through the VCPU_RUN ioctl called from user space. It
> + * will execute VM code in a loop until the time slice for the process is used
> + * or some emulation is needed from user space in which case the function will
> + * return with return value 0 and with the kvm_run structure filled in with the
> + * required data for the requested emulation.
> + */
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	return -EINVAL;
> +	int ret = 0;
> +	sigset_t sigsaved;
> +
> +	if (vcpu->sigset_active)
> +		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> +
> +	run->exit_reason = KVM_EXIT_UNKNOWN;
> +	while (run->exit_reason == KVM_EXIT_UNKNOWN) {

It's not a good idea to read stuff from run unless it's part of the ABI,
since userspace can play with it while you're reading it.  It's harmless
here but in other places it can lead to a vulnerability.

> +		/*
> +		 * Check conditions before entering the guest
> +		 */
> +		if (need_resched())
> +			kvm_resched(vcpu);

I think cond_resched() is all that's needed these days (kvm_resched()
predates preempt notifiers).

> +
> +		if (signal_pending(current)) {
> +			ret = -EINTR;
> +			run->exit_reason = KVM_EXIT_INTR;
> +			break;
> +		}
> +
> +		/*
> +		 * Enter the guest
> +		 */
> +		trace_kvm_entry(vcpu->arch.regs.pc);
> +
> +		update_vttbr(vcpu->kvm);
> +
> +		local_irq_disable();
> +		kvm_guest_enter();
> +		vcpu->mode = IN_GUEST_MODE;
> +
> +		ret = __kvm_vcpu_run(vcpu);
> +
> +		vcpu->mode = OUTSIDE_GUEST_MODE;
> +		kvm_guest_exit();
> +		local_irq_enable();
> +
> +		trace_kvm_exit(vcpu->arch.regs.pc);
> +	}
> +
> +	if (vcpu->sigset_active)
> +		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
> +
> +	return ret;
>  }
>  

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM
  2012-06-15 19:09 ` [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM Christoffer Dall
@ 2012-06-18 13:45   ` Avi Kivity
  2012-06-18 22:20     ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-18 13:45 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm, Andrea Arcangeli

On 06/15/2012 10:09 PM, Christoffer Dall wrote:
> From: Christoffer Dall <cdall@cs.columbia.edu>
> 
> Handles the guest faults in KVM by mapping in corresponding user pages
> in the 2nd stage page tables.
> 
> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
> pgprot_guest variables used to map 2nd stage memory for KVM guests.
> 
> Leverages MMU notifiers on KVM/ARM by supporting the kvm_unmap_hva() operation,
> where we remove the HVA from the 2nd stage translation. All other KVM MMU
> notifierhooks are NOPs.

I think you must at least support change_pte (possibly by unmapping).
Andrea?



-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 14/15] ARM: KVM: Handle I/O aborts
  2012-06-15 19:09 ` [PATCH v8 14/15] ARM: KVM: Handle I/O aborts Christoffer Dall
@ 2012-06-18 13:48   ` Avi Kivity
  2012-06-18 22:28     ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-18 13:48 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/15/2012 10:09 PM, Christoffer Dall wrote:
> When the guest accesses I/O memory this will create data abort
> exceptions and they are handled by decoding the HSR information
> (physical address, read/write, length, register) and forwarding reads
> and writes to QEMU which performs the device emulation.
> 
> Certain classes of load/store operations do not support the syndrome
> information provided in the HSR and we therefore must be able to fetch
> the offending instruction from guest memory and decode it manually.
> 
> This requires changing the general flow somewhat since new calls to run
> the VCPU must check if there's a pending MMIO load and perform the write
> after userspace has made the data available.
> 
>  
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index e474a0a..bd3a6cc 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -39,6 +39,27 @@ TRACE_EVENT(kvm_exit,
>  	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
>  );
>  
> +TRACE_EVENT(kvm_mmio_emulate,
> +	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
> +		 unsigned long cpsr),
> +	TP_ARGS(vcpu_pc, instr, cpsr),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	vcpu_pc		)
> +		__field(	unsigned long,	instr		)
> +		__field(	unsigned long,	cpsr		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu_pc		= vcpu_pc;
> +		__entry->vcpu_pc		= instr;
> +		__entry->vcpu_pc		= cpsr;

-ECUTANDPASTE

> +	),
> +
> +	TP_printk("Emulate MMIO at: 0x%08lx (instr: %08lx, cpsr: %08lx)",
> +		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
> +);
> +


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER
  2012-06-18 13:08   ` Avi Kivity
@ 2012-06-18 17:47     ` Christoffer Dall
  2012-06-19  8:37       ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-18 17:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm

On Mon, Jun 18, 2012 at 9:08 AM, Avi Kivity <avi@redhat.com> wrote:
> On 06/15/2012 10:07 PM, Christoffer Dall wrote:
>> From: Marc Zyngier <marc.zyngier@arm.com>
>>
>> In order to avoid compilation failure when KVM is not compiled in,
>> guard the mmu_notifier specific sections with both CONFIG_MMU_NOTIFIER
>> and KVM_ARCH_WANT_MMU_NOTIFIER, like it is being done in the rest of
>> the KVM code.
>>
>>
>> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
>> +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>>       struct mmu_notifier mmu_notifier;
>>       unsigned long mmu_notifier_seq;
>>       long mmu_notifier_count;
>> @@ -780,7 +780,7 @@ struct kvm_stats_debugfs_item {
>>  extern struct kvm_stats_debugfs_item debugfs_entries[];
>>  extern struct dentry *kvm_debugfs_dir;
>>
>> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
>> +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>>  static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
>>  {
>
> Why not have Kconfig select CONFIG_MMU_NOTIFIER?
>
>
Not sure I understand. Where would you select this option?

We do select this option when choosing to compile KVM on, but when we
do _not_, then other includes of kvm_host.h fails.

-Christoffer

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 06/15] ARM: KVM: Hypervisor identity mapping
  2012-06-18 13:12   ` Avi Kivity
@ 2012-06-18 17:55     ` Christoffer Dall
  2012-06-19  8:38       ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-18 17:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm

On Mon, Jun 18, 2012 at 9:12 AM, Avi Kivity <avi@redhat.com> wrote:
> On 06/15/2012 10:07 PM, Christoffer Dall wrote:
>> Adds support in the identity mapping feature that allows KVM to setup
>> identity mapping for the Hyp mode with the AP[1] bit set as required by
>> the specification and also supports freeing created sub pmd's after
>> finished use.
>>
>> These two functions:
>>  - hyp_idmap_add(pgd, addr, end);
>>  - hyp_idmap_del(pgd, addr, end);
>> are essentially calls to the same function as the non-hyp versions but
>> with a different argument value. KVM calls these functions to setup
>> and teardown the identity mapping used to initialize the hypervisor.
>>
>> Note, the hyp-version of the _del function actually frees the pmd's
>> pointed to by the pgd as opposed to the non-hyp version which just
>> clears them.
>
>
> I asked previously what happens if two data structures share a page, and
> one of them is removed.  Is that handled now?  How?
>

I think you asked previously for the general hyp-mode mappings, not
the identity mappings. For the general hyp-mode mappings we simply
don't unmap the data structures, potentially leaking a few pages for
the page tables themselves.

This is only for initialization, so there are not really any data
structures mapped, only one/two pages to initialize the hypervisor
mode.

> Why not just identity map all memory?  You can use large pages so it's
> fast and doesn't consume a lot of page table memory.

That's an option, but it still seems like an awful waste since it's
only used once (unless you unload and re-load the module) and there's
really no problem with data structures here.

The truth is that this is going to go away, and the code will be put
in a section that's idmapped from kernel start. There's a patch under
way from Marc taking care of this which I assmue we'll merge for v9.

-Christoffer

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace
  2012-06-18 13:32   ` Avi Kivity
@ 2012-06-18 20:56     ` Christoffer Dall
  2012-06-19  8:49       ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-18 20:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm

On Mon, Jun 18, 2012 at 9:32 AM, Avi Kivity <avi@redhat.com> wrote:
> On 06/15/2012 10:08 PM, Christoffer Dall wrote:
>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>
>> Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
>> This ioctl is used since the sematics are in fact two lines that can be
>> either raised or lowered on the VCPU - the IRQ and FIQ lines.
>>
>> KVM needs to know which VCPU it must operate on and whether the FIQ or
>> IRQ line is raised/lowered. Hence both pieces of information is packed
>> in the kvm_irq_level->irq field. The irq fild value will be:
>>   IRQ: vcpu_index << 1
>>   FIQ: (vcpu_index << 1) | 1
>>
>> This is documented in Documentation/kvm/api.txt.
>>
>> The effect of the ioctl is simply to simply raise/lower the
>> corresponding irq_line field on the VCPU struct, which will cause the
>> world-switch code to raise/lower virtual interrupts when running the
>> guest on next switch. The wait_for_interrupt flag is also cleared for
>> raised IRQs or FIQs causing an idle VCPU to become active again. CPUs
>> in guest mode are kicked to make sure they refresh their interrupt status.
>
>>
>> +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
>> +                                   struct kvm_irq_level *irq_level)
>> +{
>> +     int mask;
>> +     unsigned int vcpu_idx;
>> +     struct kvm_vcpu *vcpu;
>> +     unsigned long old, new, *ptr;
>> +
>> +     vcpu_idx = irq_level->irq >> 1;
>> +     if (vcpu_idx >= KVM_MAX_VCPUS)
>> +             return -EINVAL;
>> +
>> +     vcpu = kvm_get_vcpu(kvm, vcpu_idx);
>> +     if (!vcpu)
>> +             return -EINVAL;
>> +
>> +     if ((irq_level->irq & 1) == KVM_ARM_IRQ_LINE)
>> +             mask = HCR_VI;
>> +     else /* KVM_ARM_FIQ_LINE */
>> +             mask = HCR_VF;
>> +
>> +     trace_kvm_set_irq(irq_level->irq, irq_level->level, 0);
>> +
>> +     ptr = (unsigned long *)&vcpu->arch.irq_lines;
>> +     do {
>> +             old = ACCESS_ONCE(*ptr);
>> +             if (irq_level->level)
>> +                     new = old | mask;
>> +             else
>> +                     new = old & ~mask;
>> +
>> +             if (new == old)
>> +                     return 0; /* no change */
>> +     } while (cmpxchg(ptr, old, new) != old);
>
> Isn't this a complicated
>
>
>   if (level)
>       set_bit()
>   else
>       clear_bit()
>
> ?
>

we need to atomically know if we changed either the FIQ/IRQ fields and
atomically update without locking. (I think you suggested this
approach, in fact).

>> +
>> +     /*
>> +      * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
>> +      * trigger a world-switch round on the running physical CPU to set the
>> +      * virtual IRQ/FIQ fields in the HCR appropriately.
>> +      */
>> +     kvm_vcpu_kick(vcpu);
>
> No need to wake when the line is asserted so you can make this
> conditional on level.
>

we need to trigger a world switch to update the virtualized register
from the actual running physical CPU if we changed any of the IRQ/FIQ
fields. We return in the loop above if we didn't change anything.

>> +
>> +     return 0;
>> +}
>> +
>>  long kvm_arch_vcpu_ioctl(struct file *filp,
>>                        unsigned int ioctl, unsigned long arg)
>>  {
>> @@ -298,7 +345,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>  long kvm_arch_vm_ioctl(struct file *filp,
>>                      unsigned int ioctl, unsigned long arg)
>>  {
>> -     return -EINVAL;
>> +     struct kvm *kvm = filp->private_data;
>> +     void __user *argp = (void __user *)arg;
>> +
>> +     switch (ioctl) {
>> +     case KVM_IRQ_LINE: {
>> +             struct kvm_irq_level irq_event;
>> +
>> +             if (copy_from_user(&irq_event, argp, sizeof irq_event))
>> +                     return -EFAULT;
>> +             return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
>> +     }
>> +     default:
>> +             return -EINVAL;
>> +     }
>>  }
>
> Should be in common code guarded by the define introduced previously.
>
>

you mean like this: ?


diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a9f209b..5bf2193 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -535,8 +535,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
struct kvm_run *run)
 	return ret;
 }

-static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
-				      struct kvm_irq_level *irq_level)
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
 {
 	int mask;
 	unsigned int vcpu_idx;
@@ -596,20 +595,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log)
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
-	struct kvm *kvm = filp->private_data;
-	void __user *argp = (void __user *)arg;
-
-	switch (ioctl) {
-	case KVM_IRQ_LINE: {
-		struct kvm_irq_level irq_event;
-
-		if (copy_from_user(&irq_event, argp, sizeof irq_event))
-			return -EFAULT;
-		return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
-	}
-	default:
-		return -EINVAL;
-	}
+	return -EINVAL;
 }

 static void cpu_set_vector(void *vector)
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index bd77cb5..122a4b2 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -924,6 +924,16 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu
*vcpu, struct kvm_regs *regs)
 	return 0;
 }

+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
+{
+	if (!irqchip_in_kernel(kvm))
+		return -ENXIO;
+
+	irq_event->statusstatus = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
+					      irq_event->irq, irq_event->level);
+	return 0;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		unsigned int ioctl, unsigned long arg)
 {
@@ -963,29 +973,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 		}
 		break;
-	case KVM_IRQ_LINE_STATUS:
-	case KVM_IRQ_LINE: {
-		struct kvm_irq_level irq_event;
-
-		r = -EFAULT;
-		if (copy_from_user(&irq_event, argp, sizeof irq_event))
-			goto out;
-		r = -ENXIO;
-		if (irqchip_in_kernel(kvm)) {
-			__s32 status;
-			status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
-				    irq_event.irq, irq_event.level);
-			if (ioctl == KVM_IRQ_LINE_STATUS) {
-				r = -EFAULT;
-				irq_event.status = status;
-				if (copy_to_user(argp, &irq_event,
-							sizeof irq_event))
-					goto out;
-			}
-			r = 0;
-		}
-		break;
-		}
 	case KVM_GET_IRQCHIP: {
 		/* 0: PIC master, 1: PIC slave, 2: IOAPIC */
 		struct kvm_irqchip chip;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a01a424..c9c4186 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3144,6 +3144,16 @@ out:
 	return r;
 }

+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
+{
+	if (!irqchip_in_kernel(kvm))
+		return -ENXIO;
+
+	irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
+					irq_event->irq, irq_event->level);
+	return 0;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
@@ -3250,29 +3260,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	create_pit_unlock:
 		mutex_unlock(&kvm->slots_lock);
 		break;
-	case KVM_IRQ_LINE_STATUS:
-	case KVM_IRQ_LINE: {
-		struct kvm_irq_level irq_event;
-
-		r = -EFAULT;
-		if (copy_from_user(&irq_event, argp, sizeof irq_event))
-			goto out;
-		r = -ENXIO;
-		if (irqchip_in_kernel(kvm)) {
-			__s32 status;
-			status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
-					irq_event.irq, irq_event.level);
-			if (ioctl == KVM_IRQ_LINE_STATUS) {
-				r = -EFAULT;
-				irq_event.status = status;
-				if (copy_to_user(argp, &irq_event,
-							sizeof irq_event))
-					goto out;
-			}
-			r = 0;
-		}
-		break;
-	}
 	case KVM_GET_IRQCHIP: {
 		/* 0: PIC master, 1: PIC slave, 2: IOAPIC */
 		struct kvm_irqchip *chip;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e3c86f8..96aa7fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -494,6 +494,7 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 				   struct
 				   kvm_userspace_memory_region *mem,
 				   int user_alloc);
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level);
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 636bd08..1d33877 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2093,6 +2093,25 @@ static long kvm_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif
+#ifdef __KVM_HAVE_IRQ_LINE
+	case KVM_IRQ_LINE_STATUS:
+	case KVM_IRQ_LINE: {
+		struct kvm_irq_level irq_event;
+
+		r = -EFAULT;
+		if (copy_from_user(&irq_event, argp, sizeof irq_event))
+			goto out;
+
+		r = kvm_vm_ioctl_irq_line(kvm, &irq_event);
+
+		if (ioctl == KVM_IRQ_LINE_STATUS) {
+			if (copy_to_user(argp, &irq_event, sizeof irq_event))
+				r = -EFAULT;
+		}
+
+		break;
+	}
+#endif
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 		if (r == -ENOTTY)
---

Thanks,
Christoffer

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
  2012-06-18 13:41   ` Avi Kivity
@ 2012-06-18 22:05     ` Christoffer Dall
  2012-06-19  9:16       ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-18 22:05 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm

On Mon, Jun 18, 2012 at 9:41 AM, Avi Kivity <avi@redhat.com> wrote:
> On 06/15/2012 10:08 PM, Christoffer Dall wrote:
>> Provides complete world-switch implementation to switch to other guests
>> running in non-secure modes. Includes Hyp exception handlers that
>> capture necessary exception information and stores the information on
>> the VCPU and KVM structures.
>>
>> Switching to Hyp mode is done through a simple HVC instructions. The
>> exception vector code will check that the HVC comes from VMID==0 and if
>> so will store the necessary state on the Hyp stack, which will look like
>> this (growing downwards, see the hyp_hvc handler):
>>   ...
>>   Hyp_Sp + 4: spsr (Host-SVC cpsr)
>>   Hyp_Sp    : lr_usr
>>
>> When returning from Hyp mode to SVC mode, another HVC instruction is
>> executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
>> stack pointer should be where it was left from the above initial call,
>> since the values on the stack will be used to restore state (see
>> hyp_svc).
>>
>> Otherwise, the world-switch is pretty straight-forward. All state that
>> can be modified by the guest is first backed up on the Hyp stack and the
>> VCPU values is loaded onto the hardware. State, which is not loaded, but
>> theoretically modifiable by the guest is protected through the
>> virtualiation features to generate a trap and cause software emulation.
>> Upon guest returns, all state is restored from hardware onto the VCPU
>> struct and the original state is restored from the Hyp-stack onto the
>> hardware.
>>
>> One controversy may be the back-door call to __irq_svc (the host
>> kernel's own physical IRQ handler) which is called when a physical IRQ
>> exception is taken in Hyp mode while running in the guest.
>>
>> SMP support using the VMPIDR calculated on the basis of the host MPIDR
>> and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
>>
>> Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
>> a separate patch into the appropriate patches introducing the
>> functionality. Note that the VMIDs are stored per VM as required by the ARM
>> architecture reference manual.
>>
>> +
>> +/**
>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>> + * @kvm      The guest that we are about to run
>> + *
>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
>> + * caches and TLBs.
>> + */
>> +static void update_vttbr(struct kvm *kvm)
>> +{
>> +     phys_addr_t pgd_phys;
>> +
>> +     spin_lock(&kvm_vmid_lock);
>
> Premature, but this is sad.  I suggest you split vmid generation from
> next available vmid.  This allows you to make the generation counter
> atomic so it may be read outside the lock.
>
> You can do
>
>    if (likely(kvm->arch.vmd_gen) == atomic_read(&kvm_vmid_gen)))
>           return;
>
>    spin_lock(...
>

I knew you were going to say something here :), please take a look at
this and see if you agree:

+struct kvm_arch {
+	/* The VMID generation used for the virt. memory system */
+	u64    vmid_gen;
+	u32    vmid;
+
+	/* 1-level 2nd stage table and lock */
+	struct mutex pgd_mutex;
+	pgd_t *pgd;
+
+	/* VTTBR value associated with above pgd and vmid */
+	u64    vttbr;
+};

[snip]

+/* The VMID used in the VTTBR */
+static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
+static u8 kvm_next_vmid;
+DEFINE_SPINLOCK(kvm_vmid_lock);

[snip]

+/**
+ * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
+ * @kvm	The guest that we are about to run
+ *
+ * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
+ * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
+ * caches and TLBs.
+ */
+static void update_vttbr(struct kvm *kvm)
+{
+	phys_addr_t pgd_phys;
+
+	/*
+	 *  Check that the VMID is still valid.
+	 *  (The hardware supports only 256 values with the value zero
+	 *   reserved for the host, so we check if an assigned value belongs to
+	 *   a previous generation, which which requires us to assign a new
+	 *   value. If we're the first to use a VMID for the new generation,
+	 *   we must flush necessary caches and TLBs on all CPUs.)
+	 */
+	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+		return;
+
+	spin_lock(&kvm_vmid_lock);
+
+	/* First user of a new VMID generation? */
+	if (unlikely(kvm_next_vmid == 0)) {
+		atomic64_inc(&kvm_vmid_gen);
+		kvm_next_vmid = 1;
+
+		/* This does nothing on UP */
+		smp_call_function(reset_vm_context, NULL, 1);
+
+		/*
+		 * On SMP we know no other CPUs can use this CPU's or
+		 * each other's VMID since the kvm_vmid_lock blocks
+		 * them from reentry to the guest.
+		 */
+
+		reset_vm_context(NULL);
+	}
+
+	kvm->arch.vmid = kvm_next_vmid;
+	kvm_next_vmid++;
+
+	/* update vttbr to be used with the new vmid */
+	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
+			  & ~((2 << VTTBR_X) - 1);
+	kvm->arch.vttbr |= (u64)(kvm->arch.vmid) << 48;
+
+	spin_unlock(&kvm_vmid_lock);
+}


>> +
>> +     /*
>> +      *  Check that the VMID is still valid.
>> +      *  (The hardware supports only 256 values with the value zero
>> +      *   reserved for the host, so we check if an assigned value has rolled
>> +      *   over a sequence, which requires us to assign a new value and flush
>> +      *   necessary caches and TLBs on all CPUs.)
>> +      */
>> +     if (unlikely((kvm->arch.vmid ^ next_vmid) >> VMID_BITS)) {
>> +             /* Check for a new VMID generation */
>> +             if (unlikely((next_vmid & VMID_MASK) == 0)) {
>> +                     /* Check for the (very unlikely) 64-bit wrap around */
>
> Unlikely?  it's impossible.
>

well, if it happens every microsecond on a really fast processor, it
would occur a little less than every 500,000 years :)  Ok, I'll remove
the check.

>> +
>> +/**
>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>> + * @vcpu:    The VCPU pointer
>> + * @run:     The kvm_run structure pointer used for userspace state exchange
>> + *
>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>> + * will execute VM code in a loop until the time slice for the process is used
>> + * or some emulation is needed from user space in which case the function will
>> + * return with return value 0 and with the kvm_run structure filled in with the
>> + * required data for the requested emulation.
>> + */
>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -     return -EINVAL;
>> +     int ret = 0;
>> +     sigset_t sigsaved;
>> +
>> +     if (vcpu->sigset_active)
>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>> +
>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
>> +     while (run->exit_reason == KVM_EXIT_UNKNOWN) {
>
> It's not a good idea to read stuff from run unless it's part of the ABI,
> since userspace can play with it while you're reading it.  It's harmless
> here but in other places it can lead to a vulnerability.
>

ok, so in this case, userspace can 'suppress' an MMIO or interrupt
window operation.

I can change to keep some local variable to maintain the state and
have some API convention for emulation functions, if you feel strongly
about it, but otherwise it feels to me like the code will be more
complicated. Do you have other ideas?


>> +             /*
>> +              * Check conditions before entering the guest
>> +              */
>> +             if (need_resched())
>> +                     kvm_resched(vcpu);
>
> I think cond_resched() is all that's needed these days (kvm_resched()
> predates preempt notifiers).
>

ok

>> +
>> +             if (signal_pending(current)) {
>> +                     ret = -EINTR;
>> +                     run->exit_reason = KVM_EXIT_INTR;
>> +                     break;
>> +             }
>> +
>> +             /*
>> +              * Enter the guest
>> +              */
>> +             trace_kvm_entry(vcpu->arch.regs.pc);
>> +
>> +             update_vttbr(vcpu->kvm);
>> +
>> +             local_irq_disable();
>> +             kvm_guest_enter();
>> +             vcpu->mode = IN_GUEST_MODE;
>> +
>> +             ret = __kvm_vcpu_run(vcpu);
>> +
>> +             vcpu->mode = OUTSIDE_GUEST_MODE;
>> +             kvm_guest_exit();
>> +             local_irq_enable();
>> +
>> +             trace_kvm_exit(vcpu->arch.regs.pc);
>> +     }
>> +
>> +     if (vcpu->sigset_active)
>> +             sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>> +
>> +     return ret;
>>  }
>>


Thanks,
Christoffer

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM
  2012-06-18 13:45   ` Avi Kivity
@ 2012-06-18 22:20     ` Christoffer Dall
  2012-06-19  9:32       ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-18 22:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm, Andrea Arcangeli, Marc Zyngier

On Mon, Jun 18, 2012 at 9:45 AM, Avi Kivity <avi@redhat.com> wrote:
> On 06/15/2012 10:09 PM, Christoffer Dall wrote:
>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>
>> Handles the guest faults in KVM by mapping in corresponding user pages
>> in the 2nd stage page tables.
>>
>> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
>> pgprot_guest variables used to map 2nd stage memory for KVM guests.
>>
>> Leverages MMU notifiers on KVM/ARM by supporting the kvm_unmap_hva() operation,
>> where we remove the HVA from the 2nd stage translation. All other KVM MMU
>> notifierhooks are NOPs.
>
> I think you must at least support change_pte (possibly by unmapping).
> Andrea?
>
hmmm, at least for KSM support we need to support change_pte (are
there other callers for this type of memory?)

It's not trivial I guess, since we would need to support COW and
thereby stage-2 permission faults... Marc, right?

-Christoffer

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 14/15] ARM: KVM: Handle I/O aborts
  2012-06-18 13:48   ` Avi Kivity
@ 2012-06-18 22:28     ` Christoffer Dall
  0 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-18 22:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm

On Mon, Jun 18, 2012 at 9:48 AM, Avi Kivity <avi@redhat.com> wrote:
> On 06/15/2012 10:09 PM, Christoffer Dall wrote:
>> When the guest accesses I/O memory this will create data abort
>> exceptions and they are handled by decoding the HSR information
>> (physical address, read/write, length, register) and forwarding reads
>> and writes to QEMU which performs the device emulation.
>>
>> Certain classes of load/store operations do not support the syndrome
>> information provided in the HSR and we therefore must be able to fetch
>> the offending instruction from guest memory and decode it manually.
>>
>> This requires changing the general flow somewhat since new calls to run
>> the VCPU must check if there's a pending MMIO load and perform the write
>> after userspace has made the data available.
>>
>>
>>       memslot = gfn_to_memslot(vcpu->kvm, gfn);
>> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
>> index e474a0a..bd3a6cc 100644
>> --- a/arch/arm/kvm/trace.h
>> +++ b/arch/arm/kvm/trace.h
>> @@ -39,6 +39,27 @@ TRACE_EVENT(kvm_exit,
>>       TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
>>  );
>>
>> +TRACE_EVENT(kvm_mmio_emulate,
>> +     TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
>> +              unsigned long cpsr),
>> +     TP_ARGS(vcpu_pc, instr, cpsr),
>> +
>> +     TP_STRUCT__entry(
>> +             __field(        unsigned long,  vcpu_pc         )
>> +             __field(        unsigned long,  instr           )
>> +             __field(        unsigned long,  cpsr            )
>> +     ),
>> +
>> +     TP_fast_assign(
>> +             __entry->vcpu_pc                = vcpu_pc;
>> +             __entry->vcpu_pc                = instr;
>> +             __entry->vcpu_pc                = cpsr;
>
> -ECUTANDPASTE
>
hehe, nice, thanks.

-Christoffer

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER
  2012-06-18 17:47     ` Christoffer Dall
@ 2012-06-19  8:37       ` Avi Kivity
  0 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2012-06-19  8:37 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/18/2012 08:47 PM, Christoffer Dall wrote:
> On Mon, Jun 18, 2012 at 9:08 AM, Avi Kivity <avi@redhat.com> wrote:
>> On 06/15/2012 10:07 PM, Christoffer Dall wrote:
>>> From: Marc Zyngier <marc.zyngier@arm.com>
>>>
>>> In order to avoid compilation failure when KVM is not compiled in,
>>> guard the mmu_notifier specific sections with both CONFIG_MMU_NOTIFIER
>>> and KVM_ARCH_WANT_MMU_NOTIFIER, like it is being done in the rest of
>>> the KVM code.
>>>
>>>
>>> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
>>> +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>>>       struct mmu_notifier mmu_notifier;
>>>       unsigned long mmu_notifier_seq;
>>>       long mmu_notifier_count;
>>> @@ -780,7 +780,7 @@ struct kvm_stats_debugfs_item {
>>>  extern struct kvm_stats_debugfs_item debugfs_entries[];
>>>  extern struct dentry *kvm_debugfs_dir;
>>>
>>> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
>>> +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>>>  static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
>>>  {
>>
>> Why not have Kconfig select CONFIG_MMU_NOTIFIER?
>>
>>
> Not sure I understand. Where would you select this option?
> 
> We do select this option when choosing to compile KVM on, but when we
> do _not_, then other includes of kvm_host.h fails.

Right, my mistake.  Didn't notice it was a header.


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 06/15] ARM: KVM: Hypervisor identity mapping
  2012-06-18 17:55     ` Christoffer Dall
@ 2012-06-19  8:38       ` Avi Kivity
  0 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2012-06-19  8:38 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/18/2012 08:55 PM, Christoffer Dall wrote:
> On Mon, Jun 18, 2012 at 9:12 AM, Avi Kivity <avi@redhat.com> wrote:
>> On 06/15/2012 10:07 PM, Christoffer Dall wrote:
>>> Adds support in the identity mapping feature that allows KVM to setup
>>> identity mapping for the Hyp mode with the AP[1] bit set as required by
>>> the specification and also supports freeing created sub pmd's after
>>> finished use.
>>>
>>> These two functions:
>>>  - hyp_idmap_add(pgd, addr, end);
>>>  - hyp_idmap_del(pgd, addr, end);
>>> are essentially calls to the same function as the non-hyp versions but
>>> with a different argument value. KVM calls these functions to setup
>>> and teardown the identity mapping used to initialize the hypervisor.
>>>
>>> Note, the hyp-version of the _del function actually frees the pmd's
>>> pointed to by the pgd as opposed to the non-hyp version which just
>>> clears them.
>>
>>
>> I asked previously what happens if two data structures share a page, and
>> one of them is removed.  Is that handled now?  How?
>>
> 
> I think you asked previously for the general hyp-mode mappings, not
> the identity mappings. For the general hyp-mode mappings we simply
> don't unmap the data structures, potentially leaking a few pages for
> the page tables themselves.
> 
> This is only for initialization, so there are not really any data
> structures mapped, only one/two pages to initialize the hypervisor
> mode.
> 
>> Why not just identity map all memory?  You can use large pages so it's
>> fast and doesn't consume a lot of page table memory.
> 
> That's an option, but it still seems like an awful waste since it's
> only used once (unless you unload and re-load the module) and there's
> really no problem with data structures here.
> 
> The truth is that this is going to go away, and the code will be put
> in a section that's idmapped from kernel start. There's a patch under
> way from Marc taking care of this which I assmue we'll merge for v9.

Okay, thanks.


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace
  2012-06-18 20:56     ` Christoffer Dall
@ 2012-06-19  8:49       ` Avi Kivity
  2012-06-20  3:17         ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-19  8:49 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/18/2012 11:56 PM, Christoffer Dall wrote:
> On Mon, Jun 18, 2012 at 9:32 AM, Avi Kivity <avi@redhat.com> wrote:
>> On 06/15/2012 10:08 PM, Christoffer Dall wrote:
>>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>>
>>> Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
>>> This ioctl is used since the sematics are in fact two lines that can be
>>> either raised or lowered on the VCPU - the IRQ and FIQ lines.
>>>
>>> KVM needs to know which VCPU it must operate on and whether the FIQ or
>>> IRQ line is raised/lowered. Hence both pieces of information is packed
>>> in the kvm_irq_level->irq field. The irq fild value will be:
>>>   IRQ: vcpu_index << 1
>>>   FIQ: (vcpu_index << 1) | 1
>>>
>>> This is documented in Documentation/kvm/api.txt.
>>>
>>> The effect of the ioctl is simply to simply raise/lower the
>>> corresponding irq_line field on the VCPU struct, which will cause the
>>> world-switch code to raise/lower virtual interrupts when running the
>>> guest on next switch. The wait_for_interrupt flag is also cleared for
>>> raised IRQs or FIQs causing an idle VCPU to become active again. CPUs
>>> in guest mode are kicked to make sure they refresh their interrupt status.
>>
>>>
>>> +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
>>> +                                   struct kvm_irq_level *irq_level)
>>> +{
>>> +     int mask;
>>> +     unsigned int vcpu_idx;
>>> +     struct kvm_vcpu *vcpu;
>>> +     unsigned long old, new, *ptr;
>>> +
>>> +     vcpu_idx = irq_level->irq >> 1;
>>> +     if (vcpu_idx >= KVM_MAX_VCPUS)
>>> +             return -EINVAL;
>>> +
>>> +     vcpu = kvm_get_vcpu(kvm, vcpu_idx);
>>> +     if (!vcpu)
>>> +             return -EINVAL;
>>> +
>>> +     if ((irq_level->irq & 1) == KVM_ARM_IRQ_LINE)
>>> +             mask = HCR_VI;
>>> +     else /* KVM_ARM_FIQ_LINE */
>>> +             mask = HCR_VF;
>>> +
>>> +     trace_kvm_set_irq(irq_level->irq, irq_level->level, 0);
>>> +
>>> +     ptr = (unsigned long *)&vcpu->arch.irq_lines;
>>> +     do {
>>> +             old = ACCESS_ONCE(*ptr);
>>> +             if (irq_level->level)
>>> +                     new = old | mask;
>>> +             else
>>> +                     new = old & ~mask;
>>> +
>>> +             if (new == old)
>>> +                     return 0; /* no change */
>>> +     } while (cmpxchg(ptr, old, new) != old);
>>
>> Isn't this a complicated
>>
>>
>>   if (level)
>>       set_bit()
>>   else
>>       clear_bit()
>>
>> ?
>>
> 
> we need to atomically know if we changed either the FIQ/IRQ fields and
> atomically update without locking.

So use test_and_set_bit()/test_and_clear_bit().

> (I think you suggested this
> approach, in fact).

I think so too, but it only makes sense if you need to consider both
fields simultaneously (which you don't here).  For example, if IRQ was
asserted while FIQ was already raised, you don't need to kick.  But
that's not the case according to the below.

> 
>>> +
>>> +     /*
>>> +      * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
>>> +      * trigger a world-switch round on the running physical CPU to set the
>>> +      * virtual IRQ/FIQ fields in the HCR appropriately.
>>> +      */
>>> +     kvm_vcpu_kick(vcpu);
>>
>> No need to wake when the line is asserted so you can make this
>> conditional on level.
>>
> 
> we need to trigger a world switch to update the virtualized register
> from the actual running physical CPU if we changed any of the IRQ/FIQ
> fields. We return in the loop above if we didn't change anything.

Okay.


> 
>>> +
>>> +     return 0;
>>> +}
>>> +
>>>  long kvm_arch_vcpu_ioctl(struct file *filp,
>>>                        unsigned int ioctl, unsigned long arg)
>>>  {
>>> @@ -298,7 +345,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>>  long kvm_arch_vm_ioctl(struct file *filp,
>>>                      unsigned int ioctl, unsigned long arg)
>>>  {
>>> -     return -EINVAL;
>>> +     struct kvm *kvm = filp->private_data;
>>> +     void __user *argp = (void __user *)arg;
>>> +
>>> +     switch (ioctl) {
>>> +     case KVM_IRQ_LINE: {
>>> +             struct kvm_irq_level irq_event;
>>> +
>>> +             if (copy_from_user(&irq_event, argp, sizeof irq_event))
>>> +                     return -EFAULT;
>>> +             return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
>>> +     }
>>> +     default:
>>> +             return -EINVAL;
>>> +     }
>>>  }
>>
>> Should be in common code guarded by the define introduced previously.
>>
>>
> 
> you mean like this: ?
> 
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index a9f209b..5bf2193 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -535,8 +535,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
>  	return ret;
>  }
> 
> -static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
> -				      struct kvm_irq_level *irq_level)
> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
>  {
>  	int mask;
>  	unsigned int vcpu_idx;
> @@ -596,20 +595,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> struct kvm_dirty_log *log)
>  long kvm_arch_vm_ioctl(struct file *filp,
>  		       unsigned int ioctl, unsigned long arg)
>  {
> -	struct kvm *kvm = filp->private_data;
> -	void __user *argp = (void __user *)arg;
> -
> -	switch (ioctl) {
> -	case KVM_IRQ_LINE: {
> -		struct kvm_irq_level irq_event;
> -
> -		if (copy_from_user(&irq_event, argp, sizeof irq_event))
> -			return -EFAULT;
> -		return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
> -	}
> -	default:
> -		return -EINVAL;
> -	}
> +	return -EINVAL;
>  }
> 
>  static void cpu_set_vector(void *vector)
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index bd77cb5..122a4b2 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -924,6 +924,16 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu
> *vcpu, struct kvm_regs *regs)
>  	return 0;
>  }
> 
> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
> +{
> +	if (!irqchip_in_kernel(kvm))
> +		return -ENXIO;
> +
> +	irq_event->statusstatus = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
> +					      irq_event->irq, irq_event->level);
> +	return 0;
> +}
> +
>  long kvm_arch_vm_ioctl(struct file *filp,
>  		unsigned int ioctl, unsigned long arg)
>  {
> @@ -963,29 +973,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  			goto out;
>  		}
>  		break;
> -	case KVM_IRQ_LINE_STATUS:
> -	case KVM_IRQ_LINE: {
> -		struct kvm_irq_level irq_event;
> -
> -		r = -EFAULT;
> -		if (copy_from_user(&irq_event, argp, sizeof irq_event))
> -			goto out;
> -		r = -ENXIO;
> -		if (irqchip_in_kernel(kvm)) {
> -			__s32 status;
> -			status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
> -				    irq_event.irq, irq_event.level);
> -			if (ioctl == KVM_IRQ_LINE_STATUS) {
> -				r = -EFAULT;
> -				irq_event.status = status;
> -				if (copy_to_user(argp, &irq_event,
> -							sizeof irq_event))
> -					goto out;
> -			}
> -			r = 0;
> -		}
> -		break;
> -		}
>  	case KVM_GET_IRQCHIP: {
>  		/* 0: PIC master, 1: PIC slave, 2: IOAPIC */
>  		struct kvm_irqchip chip;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a01a424..c9c4186 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3144,6 +3144,16 @@ out:
>  	return r;
>  }
> 
> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
> +{
> +	if (!irqchip_in_kernel(kvm))
> +		return -ENXIO;
> +
> +	irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
> +					irq_event->irq, irq_event->level);
> +	return 0;
> +}
> +
>  long kvm_arch_vm_ioctl(struct file *filp,
>  		       unsigned int ioctl, unsigned long arg)
>  {
> @@ -3250,29 +3260,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  	create_pit_unlock:
>  		mutex_unlock(&kvm->slots_lock);
>  		break;
> -	case KVM_IRQ_LINE_STATUS:
> -	case KVM_IRQ_LINE: {
> -		struct kvm_irq_level irq_event;
> -
> -		r = -EFAULT;
> -		if (copy_from_user(&irq_event, argp, sizeof irq_event))
> -			goto out;
> -		r = -ENXIO;
> -		if (irqchip_in_kernel(kvm)) {
> -			__s32 status;
> -			status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
> -					irq_event.irq, irq_event.level);
> -			if (ioctl == KVM_IRQ_LINE_STATUS) {
> -				r = -EFAULT;
> -				irq_event.status = status;
> -				if (copy_to_user(argp, &irq_event,
> -							sizeof irq_event))
> -					goto out;
> -			}
> -			r = 0;
> -		}
> -		break;
> -	}
>  	case KVM_GET_IRQCHIP: {
>  		/* 0: PIC master, 1: PIC slave, 2: IOAPIC */
>  		struct kvm_irqchip *chip;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e3c86f8..96aa7fb 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -494,6 +494,7 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
>  				   struct
>  				   kvm_userspace_memory_region *mem,
>  				   int user_alloc);
> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level);
>  long kvm_arch_vm_ioctl(struct file *filp,
>  		       unsigned int ioctl, unsigned long arg);
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 636bd08..1d33877 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2093,6 +2093,25 @@ static long kvm_vm_ioctl(struct file *filp,
>  		break;
>  	}
>  #endif
> +#ifdef __KVM_HAVE_IRQ_LINE
> +	case KVM_IRQ_LINE_STATUS:
> +	case KVM_IRQ_LINE: {
> +		struct kvm_irq_level irq_event;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&irq_event, argp, sizeof irq_event))
> +			goto out;
> +
> +		r = kvm_vm_ioctl_irq_line(kvm, &irq_event);
> +
> +		if (ioctl == KVM_IRQ_LINE_STATUS) {
> +			if (copy_to_user(argp, &irq_event, sizeof irq_event))
> +				r = -EFAULT;
> +		}
> +
> +		break;
> +	}
> +#endif
>  	default:
>  		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>  		if (r == -ENOTTY)

Yup.  Note it brings in KVM_IRQ_LINE_STATUS for ARM.  I think it makes
sense if you have guests that do timekeeping by counting interrupts (as
opposed to reading a wall clock register).

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
  2012-06-18 22:05     ` Christoffer Dall
@ 2012-06-19  9:16       ` Avi Kivity
  2012-06-20  3:27         ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-19  9:16 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/19/2012 01:05 AM, Christoffer Dall wrote:
>> Premature, but this is sad.  I suggest you split vmid generation from
>> next available vmid.  This allows you to make the generation counter
>> atomic so it may be read outside the lock.
>>
>> You can do
>>
>>    if (likely(kvm->arch.vmd_gen) == atomic_read(&kvm_vmid_gen)))
>>           return;
>>
>>    spin_lock(...
>>
> 
> I knew you were going to say something here :), please take a look at
> this and see if you agree:

It looks reasonable wrt locking.

> +
> +	/* First user of a new VMID generation? */
> +	if (unlikely(kvm_next_vmid == 0)) {
> +		atomic64_inc(&kvm_vmid_gen);
> +		kvm_next_vmid = 1;
> +
> +		/* This does nothing on UP */
> +		smp_call_function(reset_vm_context, NULL, 1);

Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().

> +
> +		/*
> +		 * On SMP we know no other CPUs can use this CPU's or
> +		 * each other's VMID since the kvm_vmid_lock blocks
> +		 * them from reentry to the guest.
> +		 */
> +
> +		reset_vm_context(NULL);

These two lines can be folded as on_each_cpu().

Don't you have a race here where you can increment the generation just
before guest entry?

cpu0                       cpu1
(vmid=0, gen=1)            (gen=0)
-------------------------- ----------------------
gen == global_gen, return

                           gen != global_gen
                           increment global_gen
                           smp_call_function
reset_vm_context
                           vmid=0

enter with vmid=0          enter with vmid=0

You must recheck gen after disabling interrupts to ensure global_gen
didn't bump after update_vttbr but before entry.  x86 has a lot of this,
see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to
your case but may come in useful later).

> 
>>> +
>>> +/**
>>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>>> + * @vcpu:    The VCPU pointer
>>> + * @run:     The kvm_run structure pointer used for userspace state exchange
>>> + *
>>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>>> + * will execute VM code in a loop until the time slice for the process is used
>>> + * or some emulation is needed from user space in which case the function will
>>> + * return with return value 0 and with the kvm_run structure filled in with the
>>> + * required data for the requested emulation.
>>> + */
>>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  {
>>> -     return -EINVAL;
>>> +     int ret = 0;
>>> +     sigset_t sigsaved;
>>> +
>>> +     if (vcpu->sigset_active)
>>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>>> +
>>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
>>> +     while (run->exit_reason == KVM_EXIT_UNKNOWN) {
>>
>> It's not a good idea to read stuff from run unless it's part of the ABI,
>> since userspace can play with it while you're reading it.  It's harmless
>> here but in other places it can lead to a vulnerability.
>>
> 
> ok, so in this case, userspace can 'suppress' an MMIO or interrupt
> window operation.
> 
> I can change to keep some local variable to maintain the state and
> have some API convention for emulation functions, if you feel strongly
> about it, but otherwise it feels to me like the code will be more
> complicated. Do you have other ideas?

x86 uses:

 0 - return to userspace (run prepared)
 1 - return to guest (run untouched)
 -ESOMETHING - return to userspace

as return values from handlers and for locals (usually named 'r').


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM
  2012-06-18 22:20     ` Christoffer Dall
@ 2012-06-19  9:32       ` Avi Kivity
  2012-06-19 10:41         ` Andrea Arcangeli
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-19  9:32 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm, Andrea Arcangeli, Marc Zyngier

On 06/19/2012 01:20 AM, Christoffer Dall wrote:
> On Mon, Jun 18, 2012 at 9:45 AM, Avi Kivity <avi@redhat.com> wrote:
>> On 06/15/2012 10:09 PM, Christoffer Dall wrote:
>>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>>
>>> Handles the guest faults in KVM by mapping in corresponding user pages
>>> in the 2nd stage page tables.
>>>
>>> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
>>> pgprot_guest variables used to map 2nd stage memory for KVM guests.
>>>
>>> Leverages MMU notifiers on KVM/ARM by supporting the kvm_unmap_hva() operation,
>>> where we remove the HVA from the 2nd stage translation. All other KVM MMU
>>> notifierhooks are NOPs.
>>
>> I think you must at least support change_pte (possibly by unmapping).
>> Andrea?
>>
> hmmm, at least for KSM support we need to support change_pte (are
> there other callers for this type of memory?)
> 
> It's not trivial I guess, since we would need to support COW and
> thereby stage-2 permission faults... Marc, right?

As I mentioned, you can support change_pte by unmapping.  This will
cause ksm to be ineffective (pages will only be shared if the guest
doesn't touch them at all), but it's enough to get started.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM
  2012-06-19  9:32       ` Avi Kivity
@ 2012-06-19 10:41         ` Andrea Arcangeli
  2012-06-20 15:13           ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Andrea Arcangeli @ 2012-06-19 10:41 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoffer Dall, android-virt, kvm, Marc Zyngier

On Tue, Jun 19, 2012 at 12:32:06PM +0300, Avi Kivity wrote:
> On 06/19/2012 01:20 AM, Christoffer Dall wrote:
> > On Mon, Jun 18, 2012 at 9:45 AM, Avi Kivity <avi@redhat.com> wrote:
> >> On 06/15/2012 10:09 PM, Christoffer Dall wrote:
> >>> From: Christoffer Dall <cdall@cs.columbia.edu>
> >>>
> >>> Handles the guest faults in KVM by mapping in corresponding user pages
> >>> in the 2nd stage page tables.
> >>>
> >>> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
> >>> pgprot_guest variables used to map 2nd stage memory for KVM guests.
> >>>
> >>> Leverages MMU notifiers on KVM/ARM by supporting the kvm_unmap_hva() operation,
> >>> where we remove the HVA from the 2nd stage translation. All other KVM MMU
> >>> notifierhooks are NOPs.
> >>
> >> I think you must at least support change_pte (possibly by unmapping).
> >> Andrea?
> >>
> > hmmm, at least for KSM support we need to support change_pte (are
> > there other callers for this type of memory?)
> > 
> > It's not trivial I guess, since we would need to support COW and
> > thereby stage-2 permission faults... Marc, right?
> 
> As I mentioned, you can support change_pte by unmapping.  This will
> cause ksm to be ineffective (pages will only be shared if the guest
> doesn't touch them at all), but it's enough to get started.

The main reason change_pte initially was required for KSM to be
effective was because gup_fast was called with write=1
unconditionally. change_pte was also responsible to set the spte
readonly. But that should have been fixed now on x86, so KSM should be
effective even despite lack of change_pte on x86.

If the KVM page fault is calling gfn_to_pfn_async(write=0/1) depending
if the vmexit was caused by a write or read access (instead of
gfn_to_pfn which still has the unconditional write=1), and in turn
it's forced to sete the spte readonly after calling
gfn_to_pfn_async(write=0), change_pte is still useful but it's only a
worthwhile optimization to avoid a spte read fault after every KSM
page merged, it's not strictly required for KSM effectiveness anymore.

In short if ARM does the right thing with regard of KVM read faults
passed to gup_fast(write=0) and setting the spte readonly, all should
work good with KSM (even if not as optimal as with change_pte).

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace
  2012-06-19  8:49       ` Avi Kivity
@ 2012-06-20  3:17         ` Christoffer Dall
  0 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-20  3:17 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm

On Tue, Jun 19, 2012 at 4:49 AM, Avi Kivity <avi@redhat.com> wrote:
> On 06/18/2012 11:56 PM, Christoffer Dall wrote:
>> On Mon, Jun 18, 2012 at 9:32 AM, Avi Kivity <avi@redhat.com> wrote:
>>> On 06/15/2012 10:08 PM, Christoffer Dall wrote:
>>>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>>>
>>>> Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
>>>> This ioctl is used since the sematics are in fact two lines that can be
>>>> either raised or lowered on the VCPU - the IRQ and FIQ lines.
>>>>
>>>> KVM needs to know which VCPU it must operate on and whether the FIQ or
>>>> IRQ line is raised/lowered. Hence both pieces of information is packed
>>>> in the kvm_irq_level->irq field. The irq fild value will be:
>>>>   IRQ: vcpu_index << 1
>>>>   FIQ: (vcpu_index << 1) | 1
>>>>
>>>> This is documented in Documentation/kvm/api.txt.
>>>>
>>>> The effect of the ioctl is simply to simply raise/lower the
>>>> corresponding irq_line field on the VCPU struct, which will cause the
>>>> world-switch code to raise/lower virtual interrupts when running the
>>>> guest on next switch. The wait_for_interrupt flag is also cleared for
>>>> raised IRQs or FIQs causing an idle VCPU to become active again. CPUs
>>>> in guest mode are kicked to make sure they refresh their interrupt status.
>>>
>>>>
>>>> +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
>>>> +                                   struct kvm_irq_level *irq_level)
>>>> +{
>>>> +     int mask;
>>>> +     unsigned int vcpu_idx;
>>>> +     struct kvm_vcpu *vcpu;
>>>> +     unsigned long old, new, *ptr;
>>>> +
>>>> +     vcpu_idx = irq_level->irq >> 1;
>>>> +     if (vcpu_idx >= KVM_MAX_VCPUS)
>>>> +             return -EINVAL;
>>>> +
>>>> +     vcpu = kvm_get_vcpu(kvm, vcpu_idx);
>>>> +     if (!vcpu)
>>>> +             return -EINVAL;
>>>> +
>>>> +     if ((irq_level->irq & 1) == KVM_ARM_IRQ_LINE)
>>>> +             mask = HCR_VI;
>>>> +     else /* KVM_ARM_FIQ_LINE */
>>>> +             mask = HCR_VF;
>>>> +
>>>> +     trace_kvm_set_irq(irq_level->irq, irq_level->level, 0);
>>>> +
>>>> +     ptr = (unsigned long *)&vcpu->arch.irq_lines;
>>>> +     do {
>>>> +             old = ACCESS_ONCE(*ptr);
>>>> +             if (irq_level->level)
>>>> +                     new = old | mask;
>>>> +             else
>>>> +                     new = old & ~mask;
>>>> +
>>>> +             if (new == old)
>>>> +                     return 0; /* no change */
>>>> +     } while (cmpxchg(ptr, old, new) != old);
>>>
>>> Isn't this a complicated
>>>
>>>
>>>   if (level)
>>>       set_bit()
>>>   else
>>>       clear_bit()
>>>
>>> ?
>>>
>>
>> we need to atomically know if we changed either the FIQ/IRQ fields and
>> atomically update without locking.
>
> So use test_and_set_bit()/test_and_clear_bit().
>
>> (I think you suggested this
>> approach, in fact).
>
> I think so too, but it only makes sense if you need to consider both
> fields simultaneously (which you don't here).  For example, if IRQ was
> asserted while FIQ was already raised, you don't need to kick.  But
> that's not the case according to the below.
>

ok, fair enough. I simplified it a bit.

>>
>>>> +
>>>> +     /*
>>>> +      * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
>>>> +      * trigger a world-switch round on the running physical CPU to set the
>>>> +      * virtual IRQ/FIQ fields in the HCR appropriately.
>>>> +      */
>>>> +     kvm_vcpu_kick(vcpu);
>>>
>>> No need to wake when the line is asserted so you can make this
>>> conditional on level.
>>>
>>
>> we need to trigger a world switch to update the virtualized register
>> from the actual running physical CPU if we changed any of the IRQ/FIQ
>> fields. We return in the loop above if we didn't change anything.
>
> Okay.
>
>
>>
>>>> +
>>>> +     return 0;
>>>> +}
>>>> +
>>>>  long kvm_arch_vcpu_ioctl(struct file *filp,
>>>>                        unsigned int ioctl, unsigned long arg)
>>>>  {
>>>> @@ -298,7 +345,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>>>  long kvm_arch_vm_ioctl(struct file *filp,
>>>>                      unsigned int ioctl, unsigned long arg)
>>>>  {
>>>> -     return -EINVAL;
>>>> +     struct kvm *kvm = filp->private_data;
>>>> +     void __user *argp = (void __user *)arg;
>>>> +
>>>> +     switch (ioctl) {
>>>> +     case KVM_IRQ_LINE: {
>>>> +             struct kvm_irq_level irq_event;
>>>> +
>>>> +             if (copy_from_user(&irq_event, argp, sizeof irq_event))
>>>> +                     return -EFAULT;
>>>> +             return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
>>>> +     }
>>>> +     default:
>>>> +             return -EINVAL;
>>>> +     }
>>>>  }
>>>
>>> Should be in common code guarded by the define introduced previously.
>>>
>>>
>>
>> you mean like this: ?
>>
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index a9f209b..5bf2193 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -535,8 +535,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
>> struct kvm_run *run)
>>       return ret;
>>  }
>>
>> -static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
>> -                                   struct kvm_irq_level *irq_level)
>> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
>>  {
>>       int mask;
>>       unsigned int vcpu_idx;
>> @@ -596,20 +595,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>> struct kvm_dirty_log *log)
>>  long kvm_arch_vm_ioctl(struct file *filp,
>>                      unsigned int ioctl, unsigned long arg)
>>  {
>> -     struct kvm *kvm = filp->private_data;
>> -     void __user *argp = (void __user *)arg;
>> -
>> -     switch (ioctl) {
>> -     case KVM_IRQ_LINE: {
>> -             struct kvm_irq_level irq_event;
>> -
>> -             if (copy_from_user(&irq_event, argp, sizeof irq_event))
>> -                     return -EFAULT;
>> -             return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
>> -     }
>> -     default:
>> -             return -EINVAL;
>> -     }
>> +     return -EINVAL;
>>  }
>>
>>  static void cpu_set_vector(void *vector)
>> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
>> index bd77cb5..122a4b2 100644
>> --- a/arch/ia64/kvm/kvm-ia64.c
>> +++ b/arch/ia64/kvm/kvm-ia64.c
>> @@ -924,6 +924,16 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu
>> *vcpu, struct kvm_regs *regs)
>>       return 0;
>>  }
>>
>> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
>> +{
>> +     if (!irqchip_in_kernel(kvm))
>> +             return -ENXIO;
>> +
>> +     irq_event->statusstatus = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
>> +                                           irq_event->irq, irq_event->level);
>> +     return 0;
>> +}
>> +
>>  long kvm_arch_vm_ioctl(struct file *filp,
>>               unsigned int ioctl, unsigned long arg)
>>  {
>> @@ -963,29 +973,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>                       goto out;
>>               }
>>               break;
>> -     case KVM_IRQ_LINE_STATUS:
>> -     case KVM_IRQ_LINE: {
>> -             struct kvm_irq_level irq_event;
>> -
>> -             r = -EFAULT;
>> -             if (copy_from_user(&irq_event, argp, sizeof irq_event))
>> -                     goto out;
>> -             r = -ENXIO;
>> -             if (irqchip_in_kernel(kvm)) {
>> -                     __s32 status;
>> -                     status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
>> -                                 irq_event.irq, irq_event.level);
>> -                     if (ioctl == KVM_IRQ_LINE_STATUS) {
>> -                             r = -EFAULT;
>> -                             irq_event.status = status;
>> -                             if (copy_to_user(argp, &irq_event,
>> -                                                     sizeof irq_event))
>> -                                     goto out;
>> -                     }
>> -                     r = 0;
>> -             }
>> -             break;
>> -             }
>>       case KVM_GET_IRQCHIP: {
>>               /* 0: PIC master, 1: PIC slave, 2: IOAPIC */
>>               struct kvm_irqchip chip;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index a01a424..c9c4186 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3144,6 +3144,16 @@ out:
>>       return r;
>>  }
>>
>> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
>> +{
>> +     if (!irqchip_in_kernel(kvm))
>> +             return -ENXIO;
>> +
>> +     irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
>> +                                     irq_event->irq, irq_event->level);
>> +     return 0;
>> +}
>> +
>>  long kvm_arch_vm_ioctl(struct file *filp,
>>                      unsigned int ioctl, unsigned long arg)
>>  {
>> @@ -3250,29 +3260,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>       create_pit_unlock:
>>               mutex_unlock(&kvm->slots_lock);
>>               break;
>> -     case KVM_IRQ_LINE_STATUS:
>> -     case KVM_IRQ_LINE: {
>> -             struct kvm_irq_level irq_event;
>> -
>> -             r = -EFAULT;
>> -             if (copy_from_user(&irq_event, argp, sizeof irq_event))
>> -                     goto out;
>> -             r = -ENXIO;
>> -             if (irqchip_in_kernel(kvm)) {
>> -                     __s32 status;
>> -                     status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
>> -                                     irq_event.irq, irq_event.level);
>> -                     if (ioctl == KVM_IRQ_LINE_STATUS) {
>> -                             r = -EFAULT;
>> -                             irq_event.status = status;
>> -                             if (copy_to_user(argp, &irq_event,
>> -                                                     sizeof irq_event))
>> -                                     goto out;
>> -                     }
>> -                     r = 0;
>> -             }
>> -             break;
>> -     }
>>       case KVM_GET_IRQCHIP: {
>>               /* 0: PIC master, 1: PIC slave, 2: IOAPIC */
>>               struct kvm_irqchip *chip;
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index e3c86f8..96aa7fb 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -494,6 +494,7 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
>>                                  struct
>>                                  kvm_userspace_memory_region *mem,
>>                                  int user_alloc);
>> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level);
>>  long kvm_arch_vm_ioctl(struct file *filp,
>>                      unsigned int ioctl, unsigned long arg);
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 636bd08..1d33877 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -2093,6 +2093,25 @@ static long kvm_vm_ioctl(struct file *filp,
>>               break;
>>       }
>>  #endif
>> +#ifdef __KVM_HAVE_IRQ_LINE
>> +     case KVM_IRQ_LINE_STATUS:
>> +     case KVM_IRQ_LINE: {
>> +             struct kvm_irq_level irq_event;
>> +
>> +             r = -EFAULT;
>> +             if (copy_from_user(&irq_event, argp, sizeof irq_event))
>> +                     goto out;
>> +
>> +             r = kvm_vm_ioctl_irq_line(kvm, &irq_event);
>> +
>> +             if (ioctl == KVM_IRQ_LINE_STATUS) {
>> +                     if (copy_to_user(argp, &irq_event, sizeof irq_event))
>> +                             r = -EFAULT;
>> +             }
>> +
>> +             break;
>> +     }
>> +#endif
>>       default:
>>               r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>>               if (r == -ENOTTY)
>
> Yup.  Note it brings in KVM_IRQ_LINE_STATUS for ARM.  I think it makes
> sense if you have guests that do timekeeping by counting interrupts (as
> opposed to reading a wall clock register).
>
> --
> error compiling committee.c: too many arguments to function
>
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
  2012-06-19  9:16       ` Avi Kivity
@ 2012-06-20  3:27         ` Christoffer Dall
  2012-06-20  4:40           ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-20  3:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm

On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity <avi@redhat.com> wrote:
> On 06/19/2012 01:05 AM, Christoffer Dall wrote:
>>> Premature, but this is sad.  I suggest you split vmid generation from
>>> next available vmid.  This allows you to make the generation counter
>>> atomic so it may be read outside the lock.
>>>
>>> You can do
>>>
>>>    if (likely(kvm->arch.vmd_gen) == atomic_read(&kvm_vmid_gen)))
>>>           return;
>>>
>>>    spin_lock(...
>>>
>>
>> I knew you were going to say something here :), please take a look at
>> this and see if you agree:
>
> It looks reasonable wrt locking.
>
>> +
>> +     /* First user of a new VMID generation? */
>> +     if (unlikely(kvm_next_vmid == 0)) {
>> +             atomic64_inc(&kvm_vmid_gen);
>> +             kvm_next_vmid = 1;
>> +
>> +             /* This does nothing on UP */
>> +             smp_call_function(reset_vm_context, NULL, 1);
>
> Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().
>

yes, it implies a memory barrier.

>> +
>> +             /*
>> +              * On SMP we know no other CPUs can use this CPU's or
>> +              * each other's VMID since the kvm_vmid_lock blocks
>> +              * them from reentry to the guest.
>> +              */
>> +
>> +             reset_vm_context(NULL);
>
> These two lines can be folded as on_each_cpu().
>
> Don't you have a race here where you can increment the generation just
> before guest entry?

I don't think I do.

>
> cpu0                       cpu1
> (vmid=0, gen=1)            (gen=0)
> -------------------------- ----------------------
> gen == global_gen, return
>
>                           gen != global_gen
>                           increment global_gen
>                           smp_call_function
> reset_vm_context
>                           vmid=0
>
> enter with vmid=0          enter with vmid=0

I can't see how the scenario above can happen. First, no-one can run
with vmid 0 - it is reserved for the host. If we bump global_gen on
cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
that after this call, all cpus(N-1) potentially being inside a VM will
have exited, called reset_vm_context, but before they can re-enter
into the guest, they will call update_vttbr, and if their generation
counter differs from global_gen, they will try to grab that spinlock
and everything should happen in order.

>
> You must recheck gen after disabling interrupts to ensure global_gen
> didn't bump after update_vttbr but before entry.  x86 has a lot of this,
> see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to
> your case but may come in useful later).
>
>>
>>>> +
>>>> +/**
>>>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>>>> + * @vcpu:    The VCPU pointer
>>>> + * @run:     The kvm_run structure pointer used for userspace state exchange
>>>> + *
>>>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>>>> + * will execute VM code in a loop until the time slice for the process is used
>>>> + * or some emulation is needed from user space in which case the function will
>>>> + * return with return value 0 and with the kvm_run structure filled in with the
>>>> + * required data for the requested emulation.
>>>> + */
>>>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>  {
>>>> -     return -EINVAL;
>>>> +     int ret = 0;
>>>> +     sigset_t sigsaved;
>>>> +
>>>> +     if (vcpu->sigset_active)
>>>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>>>> +
>>>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
>>>> +     while (run->exit_reason == KVM_EXIT_UNKNOWN) {
>>>
>>> It's not a good idea to read stuff from run unless it's part of the ABI,
>>> since userspace can play with it while you're reading it.  It's harmless
>>> here but in other places it can lead to a vulnerability.
>>>
>>
>> ok, so in this case, userspace can 'suppress' an MMIO or interrupt
>> window operation.
>>
>> I can change to keep some local variable to maintain the state and
>> have some API convention for emulation functions, if you feel strongly
>> about it, but otherwise it feels to me like the code will be more
>> complicated. Do you have other ideas?
>
> x86 uses:
>
>  0 - return to userspace (run prepared)
>  1 - return to guest (run untouched)
>  -ESOMETHING - return to userspace
>

yeah, we can do that I guess, that's fair.

> as return values from handlers and for locals (usually named 'r').
>
>
> --
> error compiling committee.c: too many arguments to function
>
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
  2012-06-20  3:27         ` Christoffer Dall
@ 2012-06-20  4:40           ` Christoffer Dall
  2012-06-21  8:13             ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-20  4:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm

On Tue, Jun 19, 2012 at 11:27 PM, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity <avi@redhat.com> wrote:
>> On 06/19/2012 01:05 AM, Christoffer Dall wrote:
>>>> Premature, but this is sad.  I suggest you split vmid generation from
>>>> next available vmid.  This allows you to make the generation counter
>>>> atomic so it may be read outside the lock.
>>>>
>>>> You can do
>>>>
>>>>    if (likely(kvm->arch.vmd_gen) == atomic_read(&kvm_vmid_gen)))
>>>>           return;
>>>>
>>>>    spin_lock(...
>>>>
>>>
>>> I knew you were going to say something here :), please take a look at
>>> this and see if you agree:
>>
>> It looks reasonable wrt locking.
>>
>>> +
>>> +     /* First user of a new VMID generation? */
>>> +     if (unlikely(kvm_next_vmid == 0)) {
>>> +             atomic64_inc(&kvm_vmid_gen);
>>> +             kvm_next_vmid = 1;
>>> +
>>> +             /* This does nothing on UP */
>>> +             smp_call_function(reset_vm_context, NULL, 1);
>>
>> Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().
>>
>
> yes, it implies a memory barrier.
>
>>> +
>>> +             /*
>>> +              * On SMP we know no other CPUs can use this CPU's or
>>> +              * each other's VMID since the kvm_vmid_lock blocks
>>> +              * them from reentry to the guest.
>>> +              */
>>> +
>>> +             reset_vm_context(NULL);
>>
>> These two lines can be folded as on_each_cpu().
>>
>> Don't you have a race here where you can increment the generation just
>> before guest entry?
>
> I don't think I do.
>

uh, strike that, I do.

>>
>> cpu0                       cpu1
>> (vmid=0, gen=1)            (gen=0)
>> -------------------------- ----------------------
>> gen == global_gen, return
>>
>>                           gen != global_gen
>>                           increment global_gen
>>                           smp_call_function
>> reset_vm_context
>>                           vmid=0
>>
>> enter with vmid=0          enter with vmid=0
>
> I can't see how the scenario above can happen. First, no-one can run
> with vmid 0 - it is reserved for the host. If we bump global_gen on
> cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
> that after this call, all cpus(N-1) potentially being inside a VM will
> have exited, called reset_vm_context, but before they can re-enter
> into the guest, they will call update_vttbr, and if their generation
> counter differs from global_gen, they will try to grab that spinlock
> and everything should happen in order.
>

the whole vmid=0 confused me a bit. The point is since we moved the
generation check outside the spin_lock we have to re-check, I see:

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 19fe3b0..74760af 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -313,6 +313,24 @@ static void reset_vm_context(void *info)
 }

 /**
+ * check_new_vmid_gen - check that the VMID is still valid
+ * @kvm: The VM's VMID to checkt
+ *
+ * return true if there is a new generation of VMIDs being used
+ *
+ * The hardware supports only 256 values with the value zero reserved for the
+ * host, so we check if an assigned value belongs to a previous generation,
+ * which which requires us to assign a new value. If we're the first to use a
+ * VMID for the new generation, we must flush necessary caches and TLBs on all
+ * CPUs.
+ */
+static bool check_new_vmid_gen(struct kvm *kvm)
+{
+	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+		return;
+}
+
+/**
  * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
  * @kvm	The guest that we are about to run
  *
@@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
 {
 	phys_addr_t pgd_phys;

-	/*
-	 *  Check that the VMID is still valid.
-	 *  (The hardware supports only 256 values with the value zero
-	 *   reserved for the host, so we check if an assigned value belongs to
-	 *   a previous generation, which which requires us to assign a new
-	 *   value. If we're the first to use a VMID for the new generation,
-	 *   we must flush necessary caches and TLBs on all CPUs.)
-	 */
-	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+	if (!check_new_vmid_gen(kvm))
 		return;

 	spin_lock(&kvm_vmid_lock);
@@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
*vcpu, struct kvm_run *run)
 		 */
 		preempt_disable();
 		local_irq_disable();
+
+		if (check_new_vmid_gen(kvm)) {
+			local_irq_enable();
+			preempt_enable();
+			continue;
+		}
+
 		kvm_guest_enter();
 		vcpu->mode = IN_GUEST_MODE;

>>
>> You must recheck gen after disabling interrupts to ensure global_gen
>> didn't bump after update_vttbr but before entry.  x86 has a lot of this,
>> see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to
>> your case but may come in useful later).
>>
>>>
>>>>> +
>>>>> +/**
>>>>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>>>>> + * @vcpu:    The VCPU pointer
>>>>> + * @run:     The kvm_run structure pointer used for userspace state exchange
>>>>> + *
>>>>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>>>>> + * will execute VM code in a loop until the time slice for the process is used
>>>>> + * or some emulation is needed from user space in which case the function will
>>>>> + * return with return value 0 and with the kvm_run structure filled in with the
>>>>> + * required data for the requested emulation.
>>>>> + */
>>>>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>  {
>>>>> -     return -EINVAL;
>>>>> +     int ret = 0;
>>>>> +     sigset_t sigsaved;
>>>>> +
>>>>> +     if (vcpu->sigset_active)
>>>>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>>>>> +
>>>>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
>>>>> +     while (run->exit_reason == KVM_EXIT_UNKNOWN) {
>>>>
>>>> It's not a good idea to read stuff from run unless it's part of the ABI,
>>>> since userspace can play with it while you're reading it.  It's harmless
>>>> here but in other places it can lead to a vulnerability.
>>>>
>>>
>>> ok, so in this case, userspace can 'suppress' an MMIO or interrupt
>>> window operation.
>>>
>>> I can change to keep some local variable to maintain the state and
>>> have some API convention for emulation functions, if you feel strongly
>>> about it, but otherwise it feels to me like the code will be more
>>> complicated. Do you have other ideas?
>>
>> x86 uses:
>>
>>  0 - return to userspace (run prepared)
>>  1 - return to guest (run untouched)
>>  -ESOMETHING - return to userspace
>>
>
> yeah, we can do that I guess, that's fair.
>
>> as return values from handlers and for locals (usually named 'r').
>>
>>
>> --
>> error compiling committee.c: too many arguments to function
>>
>>

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM
  2012-06-19 10:41         ` Andrea Arcangeli
@ 2012-06-20 15:13           ` Christoffer Dall
  2012-06-20 17:49             ` Andrea Arcangeli
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-20 15:13 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Avi Kivity, android-virt, kvm, Marc Zyngier

On Tue, Jun 19, 2012 at 6:41 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Tue, Jun 19, 2012 at 12:32:06PM +0300, Avi Kivity wrote:
>> On 06/19/2012 01:20 AM, Christoffer Dall wrote:
>> > On Mon, Jun 18, 2012 at 9:45 AM, Avi Kivity <avi@redhat.com> wrote:
>> >> On 06/15/2012 10:09 PM, Christoffer Dall wrote:
>> >>> From: Christoffer Dall <cdall@cs.columbia.edu>
>> >>>
>> >>> Handles the guest faults in KVM by mapping in corresponding user pages
>> >>> in the 2nd stage page tables.
>> >>>
>> >>> Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
>> >>> pgprot_guest variables used to map 2nd stage memory for KVM guests.
>> >>>
>> >>> Leverages MMU notifiers on KVM/ARM by supporting the kvm_unmap_hva() operation,
>> >>> where we remove the HVA from the 2nd stage translation. All other KVM MMU
>> >>> notifierhooks are NOPs.
>> >>
>> >> I think you must at least support change_pte (possibly by unmapping).
>> >> Andrea?
>> >>
>> > hmmm, at least for KSM support we need to support change_pte (are
>> > there other callers for this type of memory?)
>> >
>> > It's not trivial I guess, since we would need to support COW and
>> > thereby stage-2 permission faults... Marc, right?
>>
>> As I mentioned, you can support change_pte by unmapping.  This will
>> cause ksm to be ineffective (pages will only be shared if the guest
>> doesn't touch them at all), but it's enough to get started.
>
> The main reason change_pte initially was required for KSM to be
> effective was because gup_fast was called with write=1
> unconditionally. change_pte was also responsible to set the spte
> readonly. But that should have been fixed now on x86, so KSM should be
> effective even despite lack of change_pte on x86.
>
> If the KVM page fault is calling gfn_to_pfn_async(write=0/1) depending
> if the vmexit was caused by a write or read access (instead of
> gfn_to_pfn which still has the unconditional write=1), and in turn
> it's forced to sete the spte readonly after calling
> gfn_to_pfn_async(write=0), change_pte is still useful but it's only a
> worthwhile optimization to avoid a spte read fault after every KSM
> page merged, it's not strictly required for KSM effectiveness anymore.
>
> In short if ARM does the right thing with regard of KVM read faults
> passed to gup_fast(write=0) and setting the spte readonly, all should
> work good with KSM (even if not as optimal as with change_pte).

ah, we don't do things right, we use gfn_to_pfn() flat out and will
always break the COW :)

I guess now, when change_pte is a nop, it's outright incorrect if
anyone runs KSM.

This has just been added to my todo-list.

-Christoffer

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM
  2012-06-20 15:13           ` Christoffer Dall
@ 2012-06-20 17:49             ` Andrea Arcangeli
  0 siblings, 0 replies; 54+ messages in thread
From: Andrea Arcangeli @ 2012-06-20 17:49 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Avi Kivity, android-virt, kvm, Marc Zyngier

On Wed, Jun 20, 2012 at 11:13:36AM -0400, Christoffer Dall wrote:
> ah, we don't do things right, we use gfn_to_pfn() flat out and will
> always break the COW :)
> 
> I guess now, when change_pte is a nop, it's outright incorrect if
> anyone runs KSM.
> 
> This has just been added to my todo-list.

Great.

Either implement change_pte, or use gfn_to_pfn_async(write_fault=0)
for read faults.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
  2012-06-20  4:40           ` Christoffer Dall
@ 2012-06-21  8:13             ` Avi Kivity
  2012-06-21 17:54               ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Avi Kivity @ 2012-06-21  8:13 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/20/2012 07:40 AM, Christoffer Dall wrote:
> 
>>>
>>> cpu0                       cpu1
>>> (vmid=0, gen=1)            (gen=0)
>>> -------------------------- ----------------------
>>> gen == global_gen, return
>>>
>>>                           gen != global_gen
>>>                           increment global_gen
>>>                           smp_call_function
>>> reset_vm_context
>>>                           vmid=0
>>>
>>> enter with vmid=0          enter with vmid=0
>>
>> I can't see how the scenario above can happen. First, no-one can run
>> with vmid 0 - it is reserved for the host. If we bump global_gen on
>> cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
>> that after this call, all cpus(N-1) potentially being inside a VM will
>> have exited, called reset_vm_context, but before they can re-enter
>> into the guest, they will call update_vttbr, and if their generation
>> counter differs from global_gen, they will try to grab that spinlock
>> and everything should happen in order.
>>
> 
> the whole vmid=0 confused me a bit. The point is since we moved the
> generation check outside the spin_lock we have to re-check, I see:

In fact I think the problem occured with the original code as well.  The
problem is that the check is not atomic wrt guest entry, so

  spin_lock
  check/update
  spin_unlock

  enter guest

has a hole between spin_unlock and guest entry.

> 
>  /**
> + * check_new_vmid_gen - check that the VMID is still valid
> + * @kvm: The VM's VMID to checkt
> + *
> + * return true if there is a new generation of VMIDs being used
> + *
> + * The hardware supports only 256 values with the value zero reserved for the
> + * host, so we check if an assigned value belongs to a previous generation,
> + * which which requires us to assign a new value. If we're the first to use a
> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> + * CPUs.
> + */
> +static bool check_new_vmid_gen(struct kvm *kvm)
> +{
> +	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
> +		return;
> +}
> +
> +/**
>   * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>   * @kvm	The guest that we are about to run
>   *
> @@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
>  {
>  	phys_addr_t pgd_phys;
> 
> -	/*
> -	 *  Check that the VMID is still valid.
> -	 *  (The hardware supports only 256 values with the value zero
> -	 *   reserved for the host, so we check if an assigned value belongs to
> -	 *   a previous generation, which which requires us to assign a new
> -	 *   value. If we're the first to use a VMID for the new generation,
> -	 *   we must flush necessary caches and TLBs on all CPUs.)
> -	 */
> -	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
> +	if (!check_new_vmid_gen(kvm))
>  		return;
> 
>  	spin_lock(&kvm_vmid_lock);
> @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
> *vcpu, struct kvm_run *run)
>  		 */
>  		preempt_disable();
>  		local_irq_disable();
> +
> +		if (check_new_vmid_gen(kvm)) {
> +			local_irq_enable();
> +			preempt_enable();
> +			continue;
> +		}
> +

I see the same race with signals.  Your signal_pending() check needs to
be after the local_irq_disable(), otherwise we can enter a guest with a
pending signal.

Better place the signal check before the vmid_gen check, to avoid the
possibility of a a signal being held up by other guests.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 09/15] ARM: KVM: Memory virtualization setup
  2012-06-15 19:08 ` [PATCH v8 09/15] ARM: KVM: Memory virtualization setup Christoffer Dall
@ 2012-06-21 12:29   ` Gleb Natapov
  2012-06-21 19:48     ` Christoffer Dall
  2012-06-28 22:34   ` Marcelo Tosatti
  1 sibling, 1 reply; 54+ messages in thread
From: Gleb Natapov @ 2012-06-21 12:29 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On Fri, Jun 15, 2012 at 03:08:22PM -0400, Christoffer Dall wrote:
> From: Christoffer Dall <cdall@cs.columbia.edu>
> 
> This commit introduces the framework for guest memory management
> through the use of 2nd stage translation. Each VM has a pointer
> to a level-1 table (the pgd field in struct kvm_arch) which is
> used for the 2nd stage translations. Entries are added when handling
> guest faults (later patch) and the table itself can be allocated and
> freed through the following functions implemented in
> arch/arm/kvm/arm_mmu.c:
>  - kvm_alloc_stage2_pgd(struct kvm *kvm);
>  - kvm_free_stage2_pgd(struct kvm *kvm);
> 
> Further, each entry in TLBs and caches are tagged with a VMID
> identifier in addition to ASIDs. The VMIDs are assigned consecutively
> to VMs in the order that VMs are executed, and caches and tlbs are
> invalidated when the VMID space has been used to allow for more than
> 255 simultaenously running guests.
> 
> The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
> freed in kvm_arch_destroy_vm(). Both functions are called from the main
> KVM code.
> 
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/kvm_arm.h |    2 -
>  arch/arm/include/asm/kvm_mmu.h |    5 ++
>  arch/arm/kvm/arm.c             |   65 ++++++++++++++++++++++---
>  arch/arm/kvm/mmu.c             |  103 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 166 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index 7f30cbd..257242f 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -62,7 +62,7 @@
>   * SWIO:	Turn set/way invalidates into set/way clean+invalidate
>   */
>  #define HCR_GUEST_MASK (HCR_TSC | HCR_TWI | HCR_VM | HCR_BSU_IS | HCR_FB | \
> -			HCR_AMO | HCR_IMO | HCR_FMO | HCR_FMO | HCR_SWIO)
> +			HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
>  
>  /* Hyp System Control Register (HSCTLR) bits */
>  #define HSCTLR_TE	(1 << 30)
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 1aa1af4..d95662eb 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -34,4 +34,9 @@ int kvm_hyp_pgd_alloc(void);
>  pgd_t *kvm_hyp_pgd_get(void);
>  void kvm_hyp_pgd_free(void);
>  
> +int kvm_alloc_stage2_pgd(struct kvm *kvm);
> +void kvm_free_stage2_pgd(struct kvm *kvm);
> +
> +int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +
>  #endif /* __ARM_KVM_MMU_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index efe130c..81babe9 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -38,6 +38,13 @@
>  
>  static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  
> +/* The VMID used in the VTTBR */
> +#define VMID_BITS               8
> +#define VMID_MASK               ((1 << VMID_BITS) - 1)
> +#define VMID_FIRST_GENERATION	(1 << VMID_BITS)
> +static u64 next_vmid;		/* The next available VMID in the sequence */
> +DEFINE_SPINLOCK(kvm_vmid_lock);
> +
>  int kvm_arch_hardware_enable(void *garbage)
>  {
>  	return 0;
> @@ -70,14 +77,6 @@ void kvm_arch_sync_events(struct kvm *kvm)
>  {
>  }
>  
> -int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> -{
> -	if (type)
> -		return -EINVAL;
> -
> -	return 0;
> -}
> -
>  int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>  {
>  	return VM_FAULT_SIGBUS;
> @@ -93,10 +92,46 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
>  	return 0;
>  }
>  
> +/**
> + * kvm_arch_init_vm - initializes a VM data structure
> + * @kvm:	pointer to the KVM struct
> + */
> +int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> +{
> +	int ret = 0;
> +
> +	if (type)
> +		return -EINVAL;
> +
> +	ret = kvm_alloc_stage2_pgd(kvm);
> +	if (ret)
> +		goto out_fail_alloc;
> +	mutex_init(&kvm->arch.pgd_mutex);
> +
> +	ret = create_hyp_mappings(kvm, kvm + 1);
> +	if (ret)
> +		goto out_free_stage2_pgd;
> +
> +	/* Mark the initial VMID invalid */
> +	kvm->arch.vmid = 0;
> +
> +	return ret;
> +out_free_stage2_pgd:
> +	kvm_free_stage2_pgd(kvm);
> +out_fail_alloc:
> +	return ret;
> +}
> +
> +/**
> + * kvm_arch_destroy_vm - destroy the VM data structure
> + * @kvm:	pointer to the KVM struct
> + */
>  void kvm_arch_destroy_vm(struct kvm *kvm)
>  {
>  	int i;
>  
> +	kvm_free_stage2_pgd(kvm);
> +
>  	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>  		if (kvm->vcpus[i]) {
>  			kvm_arch_vcpu_free(kvm->vcpus[i]);
> @@ -172,6 +207,10 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>  	if (err)
>  		goto free_vcpu;
>  
> +	err = create_hyp_mappings(vcpu, vcpu + 1);
> +	if (err)
> +		goto free_vcpu;
> +
>  	return vcpu;
>  free_vcpu:
>  	kmem_cache_free(kvm_vcpu_cache, vcpu);
> @@ -181,6 +220,7 @@ out:
>  
>  void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
>  {
> +	kmem_cache_free(kvm_vcpu_cache, vcpu);
>  }
>  
>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> @@ -416,6 +456,15 @@ int kvm_arch_init(void *opaque)
>  	if (err)
>  		goto out_err;
>  
> +	/*
> +	 * The upper 56 bits of VMIDs are used to identify the generation
> +	 * counter, so VMIDs initialized to 0, having generation == 0, will
> +	 * never be considered valid and therefor a new VMID must always be
> +	 * assigned. Whent he VMID generation rolls over, we start from
> +	 * VMID_FIRST_GENERATION again.
> +	 */
> +	next_vmid = VMID_FIRST_GENERATION;
> +
>  	return 0;
>  out_err:
>  	return err;
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index a320b56a..b256540 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -159,6 +159,109 @@ out:
>  	return err;
>  }
>  
> +/**
> + * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
> + * @kvm:	The KVM struct pointer for the VM.
> + *
> + * Allocates the 1st level table only of size defined by PGD2_ORDER (can
> + * support either full 40-bit input addresses or limited to 32-bit input
> + * addresses). Clears the allocated pages.
> + */
> +int kvm_alloc_stage2_pgd(struct kvm *kvm)
> +{
> +	pgd_t *pgd;
> +
> +	if (kvm->arch.pgd != NULL) {
> +		kvm_err("kvm_arch already initialized?\n");
> +		return -EINVAL;
> +	}
> +
> +	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
> +	if (!pgd)
> +		return -ENOMEM;
> +
> +	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
> +	kvm->arch.pgd = pgd;
> +
> +	return 0;
> +}
> +
> +static void free_guest_pages(pte_t *pte, unsigned long addr)
> +{
> +	unsigned int i;
> +	struct page *page;
> +
> +	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
Hmm, "addr" is not used.

> +		if (!pte_present(*pte))
> +			goto next_page;
Why goto instead of:

              if(pte_present(*pte)) {
> +		page = pfn_to_page(pte_pfn(*pte));
> +		put_page(page);
              }

> +next_page:
> +		pte++;
> +	}
> +}

> +
> +static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
> +{
> +	unsigned int i;
> +	pte_t *pte;
> +	struct page *page;
> +
> +	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
> +		BUG_ON(pmd_sect(*pmd));
> +		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
> +			pte = pte_offset_kernel(pmd, addr);
> +			free_guest_pages(pte, addr);
> +			page = virt_to_page((void *)pte);
> +			WARN_ON(atomic_read(&page->_count) != 1);
> +			pte_free_kernel(NULL, pte);
> +		}
> +		pmd++;
> +	}
> +}
> +
> +/**
> + * kvm_free_stage2_pgd - free all stage-2 tables
> + * @kvm:	The KVM struct pointer for the VM.
> + *
> + * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
> + * underlying level-2 and level-3 tables before freeing the actual level-1 table
> + * and setting the struct pointer to NULL.
> + */
> +void kvm_free_stage2_pgd(struct kvm *kvm)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	unsigned long long i, addr;
> +
> +	if (kvm->arch.pgd == NULL)
> +		return;
> +
> +	/*
> +	 * We do this slightly different than other places, since we need more
> +	 * than 32 bits and for instance pgd_addr_end converts to unsigned long.
> +	 */
> +	addr = 0;
> +	for (i = 0; i < PTRS_PER_PGD2; i++) {
> +		addr = i * (unsigned long long)PGDIR_SIZE;
> +		pgd = kvm->arch.pgd + i;
> +		pud = pud_offset(pgd, addr);
> +
> +		if (pud_none(*pud))
> +			continue;
> +
> +		BUG_ON(pud_bad(*pud));
> +
> +		pmd = pmd_offset(pud, addr);
> +		free_stage2_ptes(pmd, addr);
> +		pmd_free(NULL, pmd);
> +	}
> +
> +	free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
> +	kvm->arch.pgd = NULL;
> +}
> +
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
>  	return -EINVAL;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
			Gleb.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
  2012-06-21  8:13             ` Avi Kivity
@ 2012-06-21 17:54               ` Christoffer Dall
  2012-07-02 13:07                 ` Avi Kivity
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-21 17:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: android-virt, kvm

On Thu, Jun 21, 2012 at 4:13 AM, Avi Kivity <avi@redhat.com> wrote:
> On 06/20/2012 07:40 AM, Christoffer Dall wrote:
>>
>>>>
>>>> cpu0                       cpu1
>>>> (vmid=0, gen=1)            (gen=0)
>>>> -------------------------- ----------------------
>>>> gen == global_gen, return
>>>>
>>>>                           gen != global_gen
>>>>                           increment global_gen
>>>>                           smp_call_function
>>>> reset_vm_context
>>>>                           vmid=0
>>>>
>>>> enter with vmid=0          enter with vmid=0
>>>
>>> I can't see how the scenario above can happen. First, no-one can run
>>> with vmid 0 - it is reserved for the host. If we bump global_gen on
>>> cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
>>> that after this call, all cpus(N-1) potentially being inside a VM will
>>> have exited, called reset_vm_context, but before they can re-enter
>>> into the guest, they will call update_vttbr, and if their generation
>>> counter differs from global_gen, they will try to grab that spinlock
>>> and everything should happen in order.
>>>
>>
>> the whole vmid=0 confused me a bit. The point is since we moved the
>> generation check outside the spin_lock we have to re-check, I see:
>
> In fact I think the problem occured with the original code as well.  The
> problem is that the check is not atomic wrt guest entry, so
>
>  spin_lock
>  check/update
>  spin_unlock
>
>  enter guest
>
> has a hole between spin_unlock and guest entry.
>

you are absolutely right.

>>
>>  /**
>> + * check_new_vmid_gen - check that the VMID is still valid
>> + * @kvm: The VM's VMID to checkt
>> + *
>> + * return true if there is a new generation of VMIDs being used
>> + *
>> + * The hardware supports only 256 values with the value zero reserved for the
>> + * host, so we check if an assigned value belongs to a previous generation,
>> + * which which requires us to assign a new value. If we're the first to use a
>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
>> + * CPUs.
>> + */
>> +static bool check_new_vmid_gen(struct kvm *kvm)
>> +{
>> +     if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
>> +             return;
>> +}
>> +
>> +/**
>>   * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>>   * @kvm      The guest that we are about to run
>>   *
>> @@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
>>  {
>>       phys_addr_t pgd_phys;
>>
>> -     /*
>> -      *  Check that the VMID is still valid.
>> -      *  (The hardware supports only 256 values with the value zero
>> -      *   reserved for the host, so we check if an assigned value belongs to
>> -      *   a previous generation, which which requires us to assign a new
>> -      *   value. If we're the first to use a VMID for the new generation,
>> -      *   we must flush necessary caches and TLBs on all CPUs.)
>> -      */
>> -     if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
>> +     if (!check_new_vmid_gen(kvm))
>>               return;
>>
>>       spin_lock(&kvm_vmid_lock);
>> @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
>> *vcpu, struct kvm_run *run)
>>                */
>>               preempt_disable();
>>               local_irq_disable();
>> +
>> +             if (check_new_vmid_gen(kvm)) {
>> +                     local_irq_enable();
>> +                     preempt_enable();
>> +                     continue;
>> +             }
>> +
>
> I see the same race with signals.  Your signal_pending() check needs to
> be after the local_irq_disable(), otherwise we can enter a guest with a
> pending signal.
>

that's not functionally incorrect though is it? It may simply increase
the latency for the signal delivery as far as I can see, but I
definitely don't mind changing this path in any case.

>
> Better place the signal check before the vmid_gen check, to avoid the
> possibility of a a signal being held up by other guests.
>

nice ;)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 09/15] ARM: KVM: Memory virtualization setup
  2012-06-21 12:29   ` Gleb Natapov
@ 2012-06-21 19:48     ` Christoffer Dall
  0 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-21 19:48 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: android-virt, kvm

On Thu, Jun 21, 2012 at 8:29 AM, Gleb Natapov <gleb@redhat.com> wrote:
> On Fri, Jun 15, 2012 at 03:08:22PM -0400, Christoffer Dall wrote:
>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>
>> This commit introduces the framework for guest memory management
>> through the use of 2nd stage translation. Each VM has a pointer
>> to a level-1 table (the pgd field in struct kvm_arch) which is
>> used for the 2nd stage translations. Entries are added when handling
>> guest faults (later patch) and the table itself can be allocated and
>> freed through the following functions implemented in
>> arch/arm/kvm/arm_mmu.c:
>>  - kvm_alloc_stage2_pgd(struct kvm *kvm);
>>  - kvm_free_stage2_pgd(struct kvm *kvm);
>>
>> Further, each entry in TLBs and caches are tagged with a VMID
>> identifier in addition to ASIDs. The VMIDs are assigned consecutively
>> to VMs in the order that VMs are executed, and caches and tlbs are
>> invalidated when the VMID space has been used to allow for more than
>> 255 simultaenously running guests.
>>
>> The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
>> freed in kvm_arch_destroy_vm(). Both functions are called from the main
>> KVM code.
>>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  arch/arm/include/asm/kvm_arm.h |    2 -
>>  arch/arm/include/asm/kvm_mmu.h |    5 ++
>>  arch/arm/kvm/arm.c             |   65 ++++++++++++++++++++++---
>>  arch/arm/kvm/mmu.c             |  103 ++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 166 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index 7f30cbd..257242f 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -62,7 +62,7 @@
>>   * SWIO:     Turn set/way invalidates into set/way clean+invalidate
>>   */
>>  #define HCR_GUEST_MASK (HCR_TSC | HCR_TWI | HCR_VM | HCR_BSU_IS | HCR_FB | \
>> -                     HCR_AMO | HCR_IMO | HCR_FMO | HCR_FMO | HCR_SWIO)
>> +                     HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
>>
>>  /* Hyp System Control Register (HSCTLR) bits */
>>  #define HSCTLR_TE    (1 << 30)
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 1aa1af4..d95662eb 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -34,4 +34,9 @@ int kvm_hyp_pgd_alloc(void);
>>  pgd_t *kvm_hyp_pgd_get(void);
>>  void kvm_hyp_pgd_free(void);
>>
>> +int kvm_alloc_stage2_pgd(struct kvm *kvm);
>> +void kvm_free_stage2_pgd(struct kvm *kvm);
>> +
>> +int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
>> +
>>  #endif /* __ARM_KVM_MMU_H__ */
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index efe130c..81babe9 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -38,6 +38,13 @@
>>
>>  static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>>
>> +/* The VMID used in the VTTBR */
>> +#define VMID_BITS               8
>> +#define VMID_MASK               ((1 << VMID_BITS) - 1)
>> +#define VMID_FIRST_GENERATION        (1 << VMID_BITS)
>> +static u64 next_vmid;                /* The next available VMID in the sequence */
>> +DEFINE_SPINLOCK(kvm_vmid_lock);
>> +
>>  int kvm_arch_hardware_enable(void *garbage)
>>  {
>>       return 0;
>> @@ -70,14 +77,6 @@ void kvm_arch_sync_events(struct kvm *kvm)
>>  {
>>  }
>>
>> -int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>> -{
>> -     if (type)
>> -             return -EINVAL;
>> -
>> -     return 0;
>> -}
>> -
>>  int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>>  {
>>       return VM_FAULT_SIGBUS;
>> @@ -93,10 +92,46 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
>>       return 0;
>>  }
>>
>> +/**
>> + * kvm_arch_init_vm - initializes a VM data structure
>> + * @kvm:     pointer to the KVM struct
>> + */
>> +int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>> +{
>> +     int ret = 0;
>> +
>> +     if (type)
>> +             return -EINVAL;
>> +
>> +     ret = kvm_alloc_stage2_pgd(kvm);
>> +     if (ret)
>> +             goto out_fail_alloc;
>> +     mutex_init(&kvm->arch.pgd_mutex);
>> +
>> +     ret = create_hyp_mappings(kvm, kvm + 1);
>> +     if (ret)
>> +             goto out_free_stage2_pgd;
>> +
>> +     /* Mark the initial VMID invalid */
>> +     kvm->arch.vmid = 0;
>> +
>> +     return ret;
>> +out_free_stage2_pgd:
>> +     kvm_free_stage2_pgd(kvm);
>> +out_fail_alloc:
>> +     return ret;
>> +}
>> +
>> +/**
>> + * kvm_arch_destroy_vm - destroy the VM data structure
>> + * @kvm:     pointer to the KVM struct
>> + */
>>  void kvm_arch_destroy_vm(struct kvm *kvm)
>>  {
>>       int i;
>>
>> +     kvm_free_stage2_pgd(kvm);
>> +
>>       for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>>               if (kvm->vcpus[i]) {
>>                       kvm_arch_vcpu_free(kvm->vcpus[i]);
>> @@ -172,6 +207,10 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>>       if (err)
>>               goto free_vcpu;
>>
>> +     err = create_hyp_mappings(vcpu, vcpu + 1);
>> +     if (err)
>> +             goto free_vcpu;
>> +
>>       return vcpu;
>>  free_vcpu:
>>       kmem_cache_free(kvm_vcpu_cache, vcpu);
>> @@ -181,6 +220,7 @@ out:
>>
>>  void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
>>  {
>> +     kmem_cache_free(kvm_vcpu_cache, vcpu);
>>  }
>>
>>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>> @@ -416,6 +456,15 @@ int kvm_arch_init(void *opaque)
>>       if (err)
>>               goto out_err;
>>
>> +     /*
>> +      * The upper 56 bits of VMIDs are used to identify the generation
>> +      * counter, so VMIDs initialized to 0, having generation == 0, will
>> +      * never be considered valid and therefor a new VMID must always be
>> +      * assigned. Whent he VMID generation rolls over, we start from
>> +      * VMID_FIRST_GENERATION again.
>> +      */
>> +     next_vmid = VMID_FIRST_GENERATION;
>> +
>>       return 0;
>>  out_err:
>>       return err;
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index a320b56a..b256540 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -159,6 +159,109 @@ out:
>>       return err;
>>  }
>>
>> +/**
>> + * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
>> + * @kvm:     The KVM struct pointer for the VM.
>> + *
>> + * Allocates the 1st level table only of size defined by PGD2_ORDER (can
>> + * support either full 40-bit input addresses or limited to 32-bit input
>> + * addresses). Clears the allocated pages.
>> + */
>> +int kvm_alloc_stage2_pgd(struct kvm *kvm)
>> +{
>> +     pgd_t *pgd;
>> +
>> +     if (kvm->arch.pgd != NULL) {
>> +             kvm_err("kvm_arch already initialized?\n");
>> +             return -EINVAL;
>> +     }
>> +
>> +     pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
>> +     if (!pgd)
>> +             return -ENOMEM;
>> +
>> +     memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
>> +     kvm->arch.pgd = pgd;
>> +
>> +     return 0;
>> +}
>> +
>> +static void free_guest_pages(pte_t *pte, unsigned long addr)
>> +{
>> +     unsigned int i;
>> +     struct page *page;
>> +
>> +     for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
> Hmm, "addr" is not used.
>

indeed it's not

>> +             if (!pte_present(*pte))
>> +                     goto next_page;
> Why goto instead of:
>

historic reasons, thanks.

>              if(pte_present(*pte)) {
>> +             page = pfn_to_page(pte_pfn(*pte));
>> +             put_page(page);
>              }
>
>> +next_page:
>> +             pte++;
>> +     }
>> +}
>
>> +
>> +static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
>> +{
>> +     unsigned int i;
>> +     pte_t *pte;
>> +     struct page *page;
>> +
>> +     for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
>> +             BUG_ON(pmd_sect(*pmd));
>> +             if (!pmd_none(*pmd) && pmd_table(*pmd)) {
>> +                     pte = pte_offset_kernel(pmd, addr);
>> +                     free_guest_pages(pte, addr);
>> +                     page = virt_to_page((void *)pte);
>> +                     WARN_ON(atomic_read(&page->_count) != 1);
>> +                     pte_free_kernel(NULL, pte);
>> +             }
>> +             pmd++;
>> +     }
>> +}
>> +
>> +/**
>> + * kvm_free_stage2_pgd - free all stage-2 tables
>> + * @kvm:     The KVM struct pointer for the VM.
>> + *
>> + * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
>> + * underlying level-2 and level-3 tables before freeing the actual level-1 table
>> + * and setting the struct pointer to NULL.
>> + */
>> +void kvm_free_stage2_pgd(struct kvm *kvm)
>> +{
>> +     pgd_t *pgd;
>> +     pud_t *pud;
>> +     pmd_t *pmd;
>> +     unsigned long long i, addr;
>> +
>> +     if (kvm->arch.pgd == NULL)
>> +             return;
>> +
>> +     /*
>> +      * We do this slightly different than other places, since we need more
>> +      * than 32 bits and for instance pgd_addr_end converts to unsigned long.
>> +      */
>> +     addr = 0;
>> +     for (i = 0; i < PTRS_PER_PGD2; i++) {
>> +             addr = i * (unsigned long long)PGDIR_SIZE;
>> +             pgd = kvm->arch.pgd + i;
>> +             pud = pud_offset(pgd, addr);
>> +
>> +             if (pud_none(*pud))
>> +                     continue;
>> +
>> +             BUG_ON(pud_bad(*pud));
>> +
>> +             pmd = pmd_offset(pud, addr);
>> +             free_stage2_ptes(pmd, addr);
>> +             pmd_free(NULL, pmd);
>> +     }
>> +
>> +     free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
>> +     kvm->arch.pgd = NULL;
>> +}
>> +
>>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>>       return -EINVAL;
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
>                        Gleb.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER
  2012-06-15 19:07 ` [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER Christoffer Dall
  2012-06-18 13:08   ` Avi Kivity
@ 2012-06-28 21:28   ` Marcelo Tosatti
  1 sibling, 0 replies; 54+ messages in thread
From: Marcelo Tosatti @ 2012-06-28 21:28 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On Fri, Jun 15, 2012 at 03:07:24PM -0400, Christoffer Dall wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>
> 
> In order to avoid compilation failure when KVM is not compiled in,
> guard the mmu_notifier specific sections with both CONFIG_MMU_NOTIFIER
> and KVM_ARCH_WANT_MMU_NOTIFIER, like it is being done in the rest of
> the KVM code.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  include/linux/kvm_host.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied, thanks.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 00/15] KVM/ARM Implementation
  2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
                   ` (14 preceding siblings ...)
  2012-06-15 19:09 ` [PATCH v8 15/15] ARM: KVM: Guest wait-for-interrupts (WFI) support Christoffer Dall
@ 2012-06-28 21:49 ` Marcelo Tosatti
  2012-06-28 22:44   ` Christoffer Dall
  15 siblings, 1 reply; 54+ messages in thread
From: Marcelo Tosatti @ 2012-06-28 21:49 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On Fri, Jun 15, 2012 at 03:06:39PM -0400, Christoffer Dall wrote:
> The following series implements KVM support for ARM processors,
> specifically on the Cortex A-15 platform.  Work is done in
> collaboration between Columbia University, Virtual Open Systems and
> ARM/Linaro.
> 
> The patch series applies to kvm/next, specifically commit:
> 25e531a988ea5a64bd97a72dc9d3c65ad5850120
> 
> This is Version 8 of the patch series, but the first two versions
> were reviewed outside of the KVM mailing list. Changes can also be
> pulled from:
>  git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v8
> 
> A non-flattened edition of the patch series can be found at:
>  git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v8-stage
> 
> The implementation is broken up into a logical set of patches, the first
> four are preparatory patches:
>  1. Add mem_type prot_pte accessor (ARM community)
>  2. Use KVM_CAP_IRQ_ROUTING to protect routing code  (KVM community)
>  3. Introduce __KVM_HAVE_IRQ_LINE (KVM community)
>  4. Guard code with CONFIG_MMU_NOTIFIER (KVM community)
> 
> KVM guys, please consider pulling the KVM generic patches as early as
> possible. Thanks.
> 
> The main implementation is broken up into separate patches, the first
> containing a skeleton of files, makefile changes, the basic user space
> interface and KVM architecture specific stubs.  Subsequent patches
> implement parts of the system as listed:
>  1.  Preparatory patch introducing __KVM_HAVE_IRQ_LINE
>  2.  Preparatory patch guarding mmu_notifier code with CONFIG_MMU_NOTIFIER
>  3.  Skeleton
>  4.  Identity Mapping for Hyp mode
>  5.  Hypervisor initialization
>  6.  Hypervisor module unloading
>  7.  Memory virtualization setup (hyp mode mappings and 2nd stage)
>  8.  Inject IRQs and FIQs from userspace
>  9.  World-switch implementation and Hyp exception vectors
>  10. Emulation framework and CP15 emulation
>  11. Handle guest user memory aborts
>  12. Handle guest MMIO aborts
>  13. Support guest wait-for-interrupt instructions
> 
> Testing:
> Limited testing, but have run GCC inside guest, which compiled a small
> hello-world program, which was successfully run. For v8 both ARM/Thumb-2
> kernels were tested as both host/guest and both a compiled-in version
> and a kernel module version of KVM was tested. Hardware still
> unavailable to me, so all testing has been done on ARM Fast Models.
> 
> For a guide on how to set up a testing environment and try out these
> patches, see:
>  http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf
> 
> There is an issue list available using the issue tracker on:
> https://github.com/virtualopensystems/linux-kvm-arm

Is there public documentation for "hyp-mode" available?


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 09/15] ARM: KVM: Memory virtualization setup
  2012-06-15 19:08 ` [PATCH v8 09/15] ARM: KVM: Memory virtualization setup Christoffer Dall
  2012-06-21 12:29   ` Gleb Natapov
@ 2012-06-28 22:34   ` Marcelo Tosatti
  2012-06-28 22:51     ` Christoffer Dall
  1 sibling, 1 reply; 54+ messages in thread
From: Marcelo Tosatti @ 2012-06-28 22:34 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On Fri, Jun 15, 2012 at 03:08:22PM -0400, Christoffer Dall wrote:
> From: Christoffer Dall <cdall@cs.columbia.edu>
> 
> This commit introduces the framework for guest memory management
> through the use of 2nd stage translation. Each VM has a pointer
> to a level-1 table (the pgd field in struct kvm_arch) which is
> used for the 2nd stage translations. Entries are added when handling
> guest faults (later patch) and the table itself can be allocated and
> freed through the following functions implemented in
> arch/arm/kvm/arm_mmu.c:
>  - kvm_alloc_stage2_pgd(struct kvm *kvm);
>  - kvm_free_stage2_pgd(struct kvm *kvm);
> 
> Further, each entry in TLBs and caches are tagged with a VMID
> identifier in addition to ASIDs. The VMIDs are assigned consecutively
> to VMs in the order that VMs are executed, and caches and tlbs are
> invalidated when the VMID space has been used to allow for more than
> 255 simultaenously running guests.
> 
> The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
> freed in kvm_arch_destroy_vm(). Both functions are called from the main
> KVM code.
> 
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

Can you explain on a high level how the IPA -> PA mappings work? 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 07/15] ARM: KVM: Hypervisor inititalization
  2012-06-15 19:07 ` [PATCH v8 07/15] ARM: KVM: Hypervisor inititalization Christoffer Dall
@ 2012-06-28 22:35   ` Marcelo Tosatti
  2012-06-28 22:53     ` Christoffer Dall
  0 siblings, 1 reply; 54+ messages in thread
From: Marcelo Tosatti @ 2012-06-28 22:35 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On Fri, Jun 15, 2012 at 03:07:59PM -0400, Christoffer Dall wrote:
> Sets up the required registers to run code in HYP-mode from the kernel.
> 
> By setting the HVBAR the kernel can execute code in Hyp-mode with
> the MMU disabled. The HVBAR initially points to initialization code,
> which initializes other Hyp-mode registers and enables the MMU
> for Hyp-mode. Afterwards, the HVBAR is changed to point to KVM
> Hyp vectors used to catch guest faults and to switch to Hyp mode
> to perform a world-switch into a KVM guest.
> 
> Also provides memory mapping code to map required code pages and data
> structures accessed in Hyp mode at the same virtual address as the
> host kernel virtual addresses, but which conforms to the architectural
> requirements for translations in Hyp mode. This interface is added in
> arch/arm/kvm/arm_mmu.c and is comprised of:
>  - create_hyp_mappings(hyp_pgd, start, end);
>  - free_hyp_pmds(pgd_hyp);
> 
> Note: The initialization mechanism currently relies on an SMC #0 call
> to the secure monitor, which was merely a fast way of getting to the
> hypervisor. Dave Marting and Rusty Russel have patches out to make the
> boot-wrapper and the kernel boot in Hyp-mode and setup a generic way for
> hypervisors to get access to Hyp-mode if the boot-loader allows such
> access.
> 
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/kvm_arm.h              |  117 +++++++++++++++++++
>  arch/arm/include/asm/kvm_asm.h              |   22 +++
>  arch/arm/include/asm/kvm_mmu.h              |   37 ++++++
>  arch/arm/include/asm/pgtable-3level-hwdef.h |    4 +
>  arch/arm/include/asm/pgtable-3level.h       |    4 +
>  arch/arm/include/asm/pgtable.h              |    1 
>  arch/arm/kvm/arm.c                          |  167 +++++++++++++++++++++++++++
>  arch/arm/kvm/exports.c                      |   15 ++
>  arch/arm/kvm/init.S                         |   99 ++++++++++++++++
>  arch/arm/kvm/interrupts.S                   |   47 +++++++
>  arch/arm/kvm/mmu.c                          |  170 +++++++++++++++++++++++++++
>  mm/memory.c                                 |    2 
>  12 files changed, 685 insertions(+)
>  create mode 100644 arch/arm/include/asm/kvm_arm.h
>  create mode 100644 arch/arm/include/asm/kvm_mmu.h
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> new file mode 100644
> index 0000000..7f30cbd
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -0,0 +1,117 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + *
> + */
> +
> +#ifndef __KVM_ARM_H__
> +#define __KVM_ARM_H__
> +
> +#include <asm/types.h>
> +
> +/* Hyp Configuration Register (HCR) bits */
> +#define HCR_TGE		(1 << 27)
> +#define HCR_TVM		(1 << 26)
> +#define HCR_TTLB	(1 << 25)
> +#define HCR_TPU		(1 << 24)
> +#define HCR_TPC		(1 << 23)
> +#define HCR_TSW		(1 << 22)
> +#define HCR_TAC		(1 << 21)
> +#define HCR_TIDCP	(1 << 20)
> +#define HCR_TSC		(1 << 19)
> +#define HCR_TID3	(1 << 18)
> +#define HCR_TID2	(1 << 17)
> +#define HCR_TID1	(1 << 16)
> +#define HCR_TID0	(1 << 15)
> +#define HCR_TWE		(1 << 14)
> +#define HCR_TWI		(1 << 13)
> +#define HCR_DC		(1 << 12)
> +#define HCR_BSU		(3 << 10)
> +#define HCR_BSU_IS	(1 << 10)
> +#define HCR_FB		(1 << 9)
> +#define HCR_VA		(1 << 8)
> +#define HCR_VI		(1 << 7)
> +#define HCR_VF		(1 << 6)
> +#define HCR_AMO		(1 << 5)
> +#define HCR_IMO		(1 << 4)
> +#define HCR_FMO		(1 << 3)
> +#define HCR_PTW		(1 << 2)
> +#define HCR_SWIO	(1 << 1)
> +#define HCR_VM		1
> +
> +/*
> + * The bits we set in HCR:
> + * TAC:		Trap ACTLR
> + * TSC:		Trap SMC
> + * TWI:		Trap WFI
> + * BSU_IS:	Upgrade barriers to the inner shareable domain
> + * FB:		Force broadcast of all maintainance operations
> + * AMO:		Override CPSR.A and enable signaling with VA
> + * IMO:		Override CPSR.I and enable signaling with VI
> + * FMO:		Override CPSR.F and enable signaling with VF
> + * SWIO:	Turn set/way invalidates into set/way clean+invalidate
> + */
> +#define HCR_GUEST_MASK (HCR_TSC | HCR_TWI | HCR_VM | HCR_BSU_IS | HCR_FB | \
> +			HCR_AMO | HCR_IMO | HCR_FMO | HCR_FMO | HCR_SWIO)
> +
> +/* Hyp System Control Register (HSCTLR) bits */
> +#define HSCTLR_TE	(1 << 30)
> +#define HSCTLR_EE	(1 << 25)
> +#define HSCTLR_FI	(1 << 21)
> +#define HSCTLR_WXN	(1 << 19)
> +#define HSCTLR_I	(1 << 12)
> +#define HSCTLR_C	(1 << 2)
> +#define HSCTLR_A	(1 << 1)
> +#define HSCTLR_M	1
> +#define HSCTLR_MASK	(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
> +			 HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
> +
> +/* TTBCR and HTCR Registers bits */
> +#define TTBCR_EAE	(1 << 31)
> +#define TTBCR_IMP	(1 << 30)
> +#define TTBCR_SH1	(3 << 28)
> +#define TTBCR_ORGN1	(3 << 26)
> +#define TTBCR_IRGN1	(3 << 24)
> +#define TTBCR_EPD1	(1 << 23)
> +#define TTBCR_A1	(1 << 22)
> +#define TTBCR_T1SZ	(3 << 16)
> +#define TTBCR_SH0	(3 << 12)
> +#define TTBCR_ORGN0	(3 << 10)
> +#define TTBCR_IRGN0	(3 << 8)
> +#define TTBCR_EPD0	(1 << 7)
> +#define TTBCR_T0SZ	3
> +#define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
> +
> +
> +/* Virtualization Translation Control Register (VTCR) bits */
> +#define VTCR_SH0	(3 << 12)
> +#define VTCR_ORGN0	(3 << 10)
> +#define VTCR_IRGN0	(3 << 8)
> +#define VTCR_SL0	(3 << 6)
> +#define VTCR_S		(1 << 4)
> +#define VTCR_T0SZ	3
> +#define VTCR_MASK	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0 | VTCR_SL0 | \
> +			 VTCR_S | VTCR_T0SZ | VTCR_MASK)
> +#define VTCR_HTCR_SH	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0)
> +#define VTCR_SL_L2	0		/* Starting-level: 2 */
> +#define VTCR_SL_L1	(1 << 6)	/* Starting-level: 1 */
> +#define VTCR_GUEST_SL	VTCR_SL_L1
> +#define VTCR_GUEST_T0SZ	0
> +#if VTCR_GUEST_SL == 0
> +#define VTTBR_X		(14 - VTCR_GUEST_T0SZ)
> +#else
> +#define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
> +#endif
> +
> +
> +#endif /* __KVM_ARM_H__ */
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index c3d4458..69afdf3 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -24,5 +24,27 @@
>  #define ARM_EXCEPTION_DATA_ABORT  4
>  #define ARM_EXCEPTION_IRQ	  5
>  #define ARM_EXCEPTION_FIQ	  6
> +#define ARM_EXCEPTION_HVC	  7
> +
> +/*
> + * SMC Hypervisor API call number
> + */
> +#define SMCHYP_HVBAR_W 0xfffffff0
> +
> +#ifndef __ASSEMBLY__
> +struct kvm_vcpu;
> +
> +extern char __kvm_hyp_init[];
> +extern char __kvm_hyp_init_end[];
> +
> +extern char __kvm_hyp_vector[];
> +
> +extern char __kvm_hyp_code_start[];
> +extern char __kvm_hyp_code_end[];
> +
> +extern void __kvm_flush_vm_context(void);
> +
> +extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +#endif
>  
>  #endif /* __ARM_KVM_ASM_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> new file mode 100644
> index 0000000..1aa1af4
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -0,0 +1,37 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + *
> + */
> +
> +#ifndef __ARM_KVM_MMU_H__
> +#define __ARM_KVM_MMU_H__
> +
> +/*
> + * The architecture supports 40-bit IPA as input to the 2nd stage translations
> + * and PTRS_PER_PGD2 could therefore be 1024.
> + *
> + * To save a bit of memory and to avoid alignment issues we assume 39-bit IPA
> + * for now, but remember that the level-1 table must be aligned to its size.
> + */
> +#define PTRS_PER_PGD2	512
> +#define PGD2_ORDER	get_order(PTRS_PER_PGD2 * sizeof(pgd_t))
> +
> +int create_hyp_mappings(void *from, void *to);
> +void free_hyp_pmds(void);
> +
> +int kvm_hyp_pgd_alloc(void);
> +pgd_t *kvm_hyp_pgd_get(void);
> +void kvm_hyp_pgd_free(void);
> +
> +#endif /* __ARM_KVM_MMU_H__ */
> diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
> index a2d404e..18f5cef 100644
> --- a/arch/arm/include/asm/pgtable-3level-hwdef.h
> +++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
> @@ -32,6 +32,9 @@
>  #define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)
>  #define PMD_BIT4		(_AT(pmdval_t, 0))
>  #define PMD_DOMAIN(x)		(_AT(pmdval_t, 0))
> +#define PMD_APTABLE_SHIFT	(61)
> +#define PMD_APTABLE		(_AT(pgdval_t, 3) << PGD_APTABLE_SHIFT)
> +#define PMD_PXNTABLE		(_AT(pgdval_t, 1) << 59)
>  
>  /*
>   *   - section
> @@ -41,6 +44,7 @@
>  #define PMD_SECT_S		(_AT(pmdval_t, 3) << 8)
>  #define PMD_SECT_AF		(_AT(pmdval_t, 1) << 10)
>  #define PMD_SECT_nG		(_AT(pmdval_t, 1) << 11)
> +#define PMD_SECT_PXN		(_AT(pmdval_t, 1) << 53)
>  #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
>  #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
>  #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index b249035..1169a8a 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -107,6 +107,10 @@
>  #define pud_none(pud)		(!pud_val(pud))
>  #define pud_bad(pud)		(!(pud_val(pud) & 2))
>  #define pud_present(pud)	(pud_val(pud))
> +#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> +						 PMD_TYPE_TABLE)
> +#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> +						 PMD_TYPE_SECT)
>  
>  #define pud_clear(pudp)			\
>  	do {				\
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index c7bd809..4b72287 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -82,6 +82,7 @@ extern pgprot_t		pgprot_kernel;
>  #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
>  #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
>  #define PAGE_KERNEL_EXEC	pgprot_kernel
> +#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
>  
>  #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
>  #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 5992d90..4c61d3c 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -31,6 +31,12 @@
>  #include <asm/uaccess.h>
>  #include <asm/ptrace.h>
>  #include <asm/mman.h>
> +#include <asm/tlbflush.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_mmu.h>
> +
> +static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  
>  int kvm_arch_hardware_enable(void *garbage)
>  {
> @@ -255,13 +261,174 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  	return -EINVAL;
>  }
>  
> +static void cpu_set_vector(void *vector)
> +{
> +	unsigned long vector_ptr;
> +	unsigned long smc_hyp_nr;
> +
> +	vector_ptr = (unsigned long)vector;
> +	smc_hyp_nr = SMCHYP_HVBAR_W;
> +
> +	/*
> +	 * Set the HVBAR
> +	 */
> +	asm volatile (
> +		"mov	r0, %[vector_ptr]\n\t"
> +		"mov	r7, %[smc_hyp_nr]\n\t"
> +		"smc	#0\n\t" : :
> +		[vector_ptr] "r" (vector_ptr),
> +		[smc_hyp_nr] "r" (smc_hyp_nr) :
> +		"r0", "r7");
> +}
> +
> +static void cpu_init_hyp_mode(void *vector)
> +{
> +	unsigned long pgd_ptr;
> +	unsigned long hyp_stack_ptr;
> +	unsigned long stack_page;
> +
> +	cpu_set_vector(vector);
> +
> +	pgd_ptr = virt_to_phys(kvm_hyp_pgd_get());
> +	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
> +	hyp_stack_ptr = stack_page + PAGE_SIZE;
> +
> +	/*
> +	 * Call initialization code
> +	 */
> +	asm volatile (
> +		"mov	r0, %[pgd_ptr]\n\t"
> +		"mov	r1, %[hyp_stack_ptr]\n\t"
> +		"hvc	#0\n\t" : :
> +		[pgd_ptr] "r" (pgd_ptr),
> +		[hyp_stack_ptr] "r" (hyp_stack_ptr) :
> +		"r0", "r1");
> +}
> +
> +/**
> + * Inits Hyp-mode on all online CPUs
> + */
> +static int init_hyp_mode(void)
> +{
> +	phys_addr_t init_phys_addr, init_end_phys_addr;
> +	int cpu;
> +	int err = 0;
> +
> +	/*
> +	 * Allocate stack pages for Hypervisor-mode
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		unsigned long stack_page;
> +
> +		stack_page = __get_free_page(GFP_KERNEL);
> +		if (!stack_page) {
> +			err = -ENOMEM;
> +			goto out_free_stack_pages;
> +		}
> +
> +		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
> +	}
> +
> +	/*
> +	 * Allocate Hyp level-1 page table
> +	 */
> +	err = kvm_hyp_pgd_alloc();
> +	if (err)
> +		goto out_free_stack_pages;
> +
> +	init_phys_addr = virt_to_phys(__kvm_hyp_init);
> +	init_end_phys_addr = virt_to_phys(__kvm_hyp_init_end);
> +	BUG_ON(init_phys_addr & 0x1f);
> +
> +	/*
> +	 * Create identity mapping for the init code.
> +	 */
> +	hyp_idmap_add(kvm_hyp_pgd_get(),
> +		      (unsigned long)init_phys_addr,
> +		      (unsigned long)init_end_phys_addr);
> +
> +	/*
> +	 * Execute the init code on each CPU.
> +	 *
> +	 * Note: The stack is not mapped yet, so don't do anything else than
> +	 * initializing the hypervisor mode on each CPU using a local stack
> +	 * space for temporary storage.
> +	 */
> +	for_each_online_cpu(cpu) {
> +		smp_call_function_single(cpu, cpu_init_hyp_mode,
> +					 (void *)(long)init_phys_addr, 1);
> +	}
> +
> +	/*
> +	 * Unmap the identity mapping
> +	 */
> +	hyp_idmap_del(kvm_hyp_pgd_get(),
> +		      (unsigned long)init_phys_addr,
> +		      (unsigned long)init_end_phys_addr);
> +
> +	/*
> +	 * Map the Hyp-code called directly from the host
> +	 */
> +	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
> +	if (err) {
> +		kvm_err("Cannot map world-switch code\n");
> +		goto out_free_mappings;
> +	}
> +
> +	/*
> +	 * Map the Hyp stack pages
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
> +		err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE);
> +
> +		if (err) {
> +			kvm_err("Cannot map hyp stack\n");
> +			goto out_free_mappings;
> +		}
> +	}
> +
> +	/*
> +	 * Set the HVBAR to the virtual kernel address
> +	 */
> +	for_each_online_cpu(cpu)
> +		smp_call_function_single(cpu, cpu_set_vector,
> +					 __kvm_hyp_vector, 1);
> +
> +	return 0;
> +out_free_mappings:
> +	free_hyp_pmds();
> +	kvm_hyp_pgd_free();
> +out_free_stack_pages:
> +	for_each_possible_cpu(cpu)
> +		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));

should assign per_cpu(kvm_arm_hyp_stack_page, cpu) to NULL.

Is there CPU hotplug support on ARM?


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 00/15] KVM/ARM Implementation
  2012-06-28 21:49 ` [PATCH v8 00/15] KVM/ARM Implementation Marcelo Tosatti
@ 2012-06-28 22:44   ` Christoffer Dall
  0 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-28 22:44 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: android-virt, kvm

>
> Is there public documentation for "hyp-mode" available?
>
yes, you have to register on the ARM website
(http://infocenter.arm.com) but there you can download the ARM v7
architecture reference manual.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 09/15] ARM: KVM: Memory virtualization setup
  2012-06-28 22:34   ` Marcelo Tosatti
@ 2012-06-28 22:51     ` Christoffer Dall
  0 siblings, 0 replies; 54+ messages in thread
From: Christoffer Dall @ 2012-06-28 22:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: android-virt, kvm

On Thu, Jun 28, 2012 at 6:34 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Jun 15, 2012 at 03:08:22PM -0400, Christoffer Dall wrote:
>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>
>> This commit introduces the framework for guest memory management
>> through the use of 2nd stage translation. Each VM has a pointer
>> to a level-1 table (the pgd field in struct kvm_arch) which is
>> used for the 2nd stage translations. Entries are added when handling
>> guest faults (later patch) and the table itself can be allocated and
>> freed through the following functions implemented in
>> arch/arm/kvm/arm_mmu.c:
>>  - kvm_alloc_stage2_pgd(struct kvm *kvm);
>>  - kvm_free_stage2_pgd(struct kvm *kvm);
>>
>> Further, each entry in TLBs and caches are tagged with a VMID
>> identifier in addition to ASIDs. The VMIDs are assigned consecutively
>> to VMs in the order that VMs are executed, and caches and tlbs are
>> invalidated when the VMID space has been used to allow for more than
>> 255 simultaenously running guests.
>>
>> The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
>> freed in kvm_arch_destroy_vm(). Both functions are called from the main
>> KVM code.
>>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>
> Can you explain on a high level how the IPA -> PA mappings work?
>

the memory system on ARM with Virtualization Extensions is separated
into two stages: stage 1 and stage 2.

If stage 2 translation is disabled, which it is when we boot the host
kernel, only a three-level page table for stage 1 translations are
performed by the MMU and the result of the stage 1 translation is used
to physically access memory.

If stage 2 translation is enabled, the output of the stage 1
translation (which is a 40-bit intermediate physical address, IPA,
a.k.a. gpa_t/gfn in KVM language) is used for a stage 2 translation
that takes the 40-bit address as input and uses another set of page
tables, in 3 levels, to produce the resulting address.

If a fault happens during stage 1 translation, this fault is taken:
 a) directly by the host when the host is running and stage 2
translation is disabled
 b) directly by the guest VM

If a fault happens during stage 2 translation, the fault is always
taken by the hypervisor, which populates the missing entry in the
stage 2 page table (or changes the entry to be a writable entry in the
case of a permission fault).

During boot of a VM, the MMU is disabled for the guest Stage 1
translations and the address produced by the CPU is fed directly to
the stage 2 translation system.

A nice diagram is shown on page B3-1330 of the ARM arm I referred you
to in the other mail.

Let me know if this is the level you had in mind.

-Christoffer

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 07/15] ARM: KVM: Hypervisor inititalization
  2012-06-28 22:35   ` Marcelo Tosatti
@ 2012-06-28 22:53     ` Christoffer Dall
  2012-06-29  1:07       ` Marcelo Tosatti
  0 siblings, 1 reply; 54+ messages in thread
From: Christoffer Dall @ 2012-06-28 22:53 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: android-virt, kvm

On Thu, Jun 28, 2012 at 6:35 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Jun 15, 2012 at 03:07:59PM -0400, Christoffer Dall wrote:
>> Sets up the required registers to run code in HYP-mode from the kernel.
>>
>> By setting the HVBAR the kernel can execute code in Hyp-mode with
>> the MMU disabled. The HVBAR initially points to initialization code,
>> which initializes other Hyp-mode registers and enables the MMU
>> for Hyp-mode. Afterwards, the HVBAR is changed to point to KVM
>> Hyp vectors used to catch guest faults and to switch to Hyp mode
>> to perform a world-switch into a KVM guest.
>>
>> Also provides memory mapping code to map required code pages and data
>> structures accessed in Hyp mode at the same virtual address as the
>> host kernel virtual addresses, but which conforms to the architectural
>> requirements for translations in Hyp mode. This interface is added in
>> arch/arm/kvm/arm_mmu.c and is comprised of:
>>  - create_hyp_mappings(hyp_pgd, start, end);
>>  - free_hyp_pmds(pgd_hyp);
>>
>> Note: The initialization mechanism currently relies on an SMC #0 call
>> to the secure monitor, which was merely a fast way of getting to the
>> hypervisor. Dave Marting and Rusty Russel have patches out to make the
>> boot-wrapper and the kernel boot in Hyp-mode and setup a generic way for
>> hypervisors to get access to Hyp-mode if the boot-loader allows such
>> access.
>>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  arch/arm/include/asm/kvm_arm.h              |  117 +++++++++++++++++++
>>  arch/arm/include/asm/kvm_asm.h              |   22 +++
>>  arch/arm/include/asm/kvm_mmu.h              |   37 ++++++
>>  arch/arm/include/asm/pgtable-3level-hwdef.h |    4 +
>>  arch/arm/include/asm/pgtable-3level.h       |    4 +
>>  arch/arm/include/asm/pgtable.h              |    1
>>  arch/arm/kvm/arm.c                          |  167 +++++++++++++++++++++++++++
>>  arch/arm/kvm/exports.c                      |   15 ++
>>  arch/arm/kvm/init.S                         |   99 ++++++++++++++++
>>  arch/arm/kvm/interrupts.S                   |   47 +++++++
>>  arch/arm/kvm/mmu.c                          |  170 +++++++++++++++++++++++++++
>>  mm/memory.c                                 |    2
>>  12 files changed, 685 insertions(+)
>>  create mode 100644 arch/arm/include/asm/kvm_arm.h
>>  create mode 100644 arch/arm/include/asm/kvm_mmu.h
>>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> new file mode 100644
>> index 0000000..7f30cbd
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -0,0 +1,117 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + *
>> + */
>> +
>> +#ifndef __KVM_ARM_H__
>> +#define __KVM_ARM_H__
>> +
>> +#include <asm/types.h>
>> +
>> +/* Hyp Configuration Register (HCR) bits */
>> +#define HCR_TGE              (1 << 27)
>> +#define HCR_TVM              (1 << 26)
>> +#define HCR_TTLB     (1 << 25)
>> +#define HCR_TPU              (1 << 24)
>> +#define HCR_TPC              (1 << 23)
>> +#define HCR_TSW              (1 << 22)
>> +#define HCR_TAC              (1 << 21)
>> +#define HCR_TIDCP    (1 << 20)
>> +#define HCR_TSC              (1 << 19)
>> +#define HCR_TID3     (1 << 18)
>> +#define HCR_TID2     (1 << 17)
>> +#define HCR_TID1     (1 << 16)
>> +#define HCR_TID0     (1 << 15)
>> +#define HCR_TWE              (1 << 14)
>> +#define HCR_TWI              (1 << 13)
>> +#define HCR_DC               (1 << 12)
>> +#define HCR_BSU              (3 << 10)
>> +#define HCR_BSU_IS   (1 << 10)
>> +#define HCR_FB               (1 << 9)
>> +#define HCR_VA               (1 << 8)
>> +#define HCR_VI               (1 << 7)
>> +#define HCR_VF               (1 << 6)
>> +#define HCR_AMO              (1 << 5)
>> +#define HCR_IMO              (1 << 4)
>> +#define HCR_FMO              (1 << 3)
>> +#define HCR_PTW              (1 << 2)
>> +#define HCR_SWIO     (1 << 1)
>> +#define HCR_VM               1
>> +
>> +/*
>> + * The bits we set in HCR:
>> + * TAC:              Trap ACTLR
>> + * TSC:              Trap SMC
>> + * TWI:              Trap WFI
>> + * BSU_IS:   Upgrade barriers to the inner shareable domain
>> + * FB:               Force broadcast of all maintainance operations
>> + * AMO:              Override CPSR.A and enable signaling with VA
>> + * IMO:              Override CPSR.I and enable signaling with VI
>> + * FMO:              Override CPSR.F and enable signaling with VF
>> + * SWIO:     Turn set/way invalidates into set/way clean+invalidate
>> + */
>> +#define HCR_GUEST_MASK (HCR_TSC | HCR_TWI | HCR_VM | HCR_BSU_IS | HCR_FB | \
>> +                     HCR_AMO | HCR_IMO | HCR_FMO | HCR_FMO | HCR_SWIO)
>> +
>> +/* Hyp System Control Register (HSCTLR) bits */
>> +#define HSCTLR_TE    (1 << 30)
>> +#define HSCTLR_EE    (1 << 25)
>> +#define HSCTLR_FI    (1 << 21)
>> +#define HSCTLR_WXN   (1 << 19)
>> +#define HSCTLR_I     (1 << 12)
>> +#define HSCTLR_C     (1 << 2)
>> +#define HSCTLR_A     (1 << 1)
>> +#define HSCTLR_M     1
>> +#define HSCTLR_MASK  (HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
>> +                      HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
>> +
>> +/* TTBCR and HTCR Registers bits */
>> +#define TTBCR_EAE    (1 << 31)
>> +#define TTBCR_IMP    (1 << 30)
>> +#define TTBCR_SH1    (3 << 28)
>> +#define TTBCR_ORGN1  (3 << 26)
>> +#define TTBCR_IRGN1  (3 << 24)
>> +#define TTBCR_EPD1   (1 << 23)
>> +#define TTBCR_A1     (1 << 22)
>> +#define TTBCR_T1SZ   (3 << 16)
>> +#define TTBCR_SH0    (3 << 12)
>> +#define TTBCR_ORGN0  (3 << 10)
>> +#define TTBCR_IRGN0  (3 << 8)
>> +#define TTBCR_EPD0   (1 << 7)
>> +#define TTBCR_T0SZ   3
>> +#define HTCR_MASK    (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
>> +
>> +
>> +/* Virtualization Translation Control Register (VTCR) bits */
>> +#define VTCR_SH0     (3 << 12)
>> +#define VTCR_ORGN0   (3 << 10)
>> +#define VTCR_IRGN0   (3 << 8)
>> +#define VTCR_SL0     (3 << 6)
>> +#define VTCR_S               (1 << 4)
>> +#define VTCR_T0SZ    3
>> +#define VTCR_MASK    (VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0 | VTCR_SL0 | \
>> +                      VTCR_S | VTCR_T0SZ | VTCR_MASK)
>> +#define VTCR_HTCR_SH (VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0)
>> +#define VTCR_SL_L2   0               /* Starting-level: 2 */
>> +#define VTCR_SL_L1   (1 << 6)        /* Starting-level: 1 */
>> +#define VTCR_GUEST_SL        VTCR_SL_L1
>> +#define VTCR_GUEST_T0SZ      0
>> +#if VTCR_GUEST_SL == 0
>> +#define VTTBR_X              (14 - VTCR_GUEST_T0SZ)
>> +#else
>> +#define VTTBR_X              (5 - VTCR_GUEST_T0SZ)
>> +#endif
>> +
>> +
>> +#endif /* __KVM_ARM_H__ */
>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>> index c3d4458..69afdf3 100644
>> --- a/arch/arm/include/asm/kvm_asm.h
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -24,5 +24,27 @@
>>  #define ARM_EXCEPTION_DATA_ABORT  4
>>  #define ARM_EXCEPTION_IRQ      5
>>  #define ARM_EXCEPTION_FIQ      6
>> +#define ARM_EXCEPTION_HVC      7
>> +
>> +/*
>> + * SMC Hypervisor API call number
>> + */
>> +#define SMCHYP_HVBAR_W 0xfffffff0
>> +
>> +#ifndef __ASSEMBLY__
>> +struct kvm_vcpu;
>> +
>> +extern char __kvm_hyp_init[];
>> +extern char __kvm_hyp_init_end[];
>> +
>> +extern char __kvm_hyp_vector[];
>> +
>> +extern char __kvm_hyp_code_start[];
>> +extern char __kvm_hyp_code_end[];
>> +
>> +extern void __kvm_flush_vm_context(void);
>> +
>> +extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>> +#endif
>>
>>  #endif /* __ARM_KVM_ASM_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> new file mode 100644
>> index 0000000..1aa1af4
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -0,0 +1,37 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + *
>> + */
>> +
>> +#ifndef __ARM_KVM_MMU_H__
>> +#define __ARM_KVM_MMU_H__
>> +
>> +/*
>> + * The architecture supports 40-bit IPA as input to the 2nd stage translations
>> + * and PTRS_PER_PGD2 could therefore be 1024.
>> + *
>> + * To save a bit of memory and to avoid alignment issues we assume 39-bit IPA
>> + * for now, but remember that the level-1 table must be aligned to its size.
>> + */
>> +#define PTRS_PER_PGD2        512
>> +#define PGD2_ORDER   get_order(PTRS_PER_PGD2 * sizeof(pgd_t))
>> +
>> +int create_hyp_mappings(void *from, void *to);
>> +void free_hyp_pmds(void);
>> +
>> +int kvm_hyp_pgd_alloc(void);
>> +pgd_t *kvm_hyp_pgd_get(void);
>> +void kvm_hyp_pgd_free(void);
>> +
>> +#endif /* __ARM_KVM_MMU_H__ */
>> diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
>> index a2d404e..18f5cef 100644
>> --- a/arch/arm/include/asm/pgtable-3level-hwdef.h
>> +++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
>> @@ -32,6 +32,9 @@
>>  #define PMD_TYPE_SECT                (_AT(pmdval_t, 1) << 0)
>>  #define PMD_BIT4             (_AT(pmdval_t, 0))
>>  #define PMD_DOMAIN(x)                (_AT(pmdval_t, 0))
>> +#define PMD_APTABLE_SHIFT    (61)
>> +#define PMD_APTABLE          (_AT(pgdval_t, 3) << PGD_APTABLE_SHIFT)
>> +#define PMD_PXNTABLE         (_AT(pgdval_t, 1) << 59)
>>
>>  /*
>>   *   - section
>> @@ -41,6 +44,7 @@
>>  #define PMD_SECT_S           (_AT(pmdval_t, 3) << 8)
>>  #define PMD_SECT_AF          (_AT(pmdval_t, 1) << 10)
>>  #define PMD_SECT_nG          (_AT(pmdval_t, 1) << 11)
>> +#define PMD_SECT_PXN         (_AT(pmdval_t, 1) << 53)
>>  #define PMD_SECT_XN          (_AT(pmdval_t, 1) << 54)
>>  #define PMD_SECT_AP_WRITE    (_AT(pmdval_t, 0))
>>  #define PMD_SECT_AP_READ     (_AT(pmdval_t, 0))
>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> index b249035..1169a8a 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -107,6 +107,10 @@
>>  #define pud_none(pud)                (!pud_val(pud))
>>  #define pud_bad(pud)         (!(pud_val(pud) & 2))
>>  #define pud_present(pud)     (pud_val(pud))
>> +#define pmd_table(pmd)               ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>> +                                              PMD_TYPE_TABLE)
>> +#define pmd_sect(pmd)                ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>> +                                              PMD_TYPE_SECT)
>>
>>  #define pud_clear(pudp)                      \
>>       do {                            \
>> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
>> index c7bd809..4b72287 100644
>> --- a/arch/arm/include/asm/pgtable.h
>> +++ b/arch/arm/include/asm/pgtable.h
>> @@ -82,6 +82,7 @@ extern pgprot_t             pgprot_kernel;
>>  #define PAGE_READONLY_EXEC   _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
>>  #define PAGE_KERNEL          _MOD_PROT(pgprot_kernel, L_PTE_XN)
>>  #define PAGE_KERNEL_EXEC     pgprot_kernel
>> +#define PAGE_HYP             _MOD_PROT(pgprot_kernel, L_PTE_USER)
>>
>>  #define __PAGE_NONE          __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
>>  #define __PAGE_SHARED                __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 5992d90..4c61d3c 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -31,6 +31,12 @@
>>  #include <asm/uaccess.h>
>>  #include <asm/ptrace.h>
>>  #include <asm/mman.h>
>> +#include <asm/tlbflush.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/kvm_asm.h>
>> +#include <asm/kvm_mmu.h>
>> +
>> +static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>>
>>  int kvm_arch_hardware_enable(void *garbage)
>>  {
>> @@ -255,13 +261,174 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>       return -EINVAL;
>>  }
>>
>> +static void cpu_set_vector(void *vector)
>> +{
>> +     unsigned long vector_ptr;
>> +     unsigned long smc_hyp_nr;
>> +
>> +     vector_ptr = (unsigned long)vector;
>> +     smc_hyp_nr = SMCHYP_HVBAR_W;
>> +
>> +     /*
>> +      * Set the HVBAR
>> +      */
>> +     asm volatile (
>> +             "mov    r0, %[vector_ptr]\n\t"
>> +             "mov    r7, %[smc_hyp_nr]\n\t"
>> +             "smc    #0\n\t" : :
>> +             [vector_ptr] "r" (vector_ptr),
>> +             [smc_hyp_nr] "r" (smc_hyp_nr) :
>> +             "r0", "r7");
>> +}
>> +
>> +static void cpu_init_hyp_mode(void *vector)
>> +{
>> +     unsigned long pgd_ptr;
>> +     unsigned long hyp_stack_ptr;
>> +     unsigned long stack_page;
>> +
>> +     cpu_set_vector(vector);
>> +
>> +     pgd_ptr = virt_to_phys(kvm_hyp_pgd_get());
>> +     stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
>> +     hyp_stack_ptr = stack_page + PAGE_SIZE;
>> +
>> +     /*
>> +      * Call initialization code
>> +      */
>> +     asm volatile (
>> +             "mov    r0, %[pgd_ptr]\n\t"
>> +             "mov    r1, %[hyp_stack_ptr]\n\t"
>> +             "hvc    #0\n\t" : :
>> +             [pgd_ptr] "r" (pgd_ptr),
>> +             [hyp_stack_ptr] "r" (hyp_stack_ptr) :
>> +             "r0", "r1");
>> +}
>> +
>> +/**
>> + * Inits Hyp-mode on all online CPUs
>> + */
>> +static int init_hyp_mode(void)
>> +{
>> +     phys_addr_t init_phys_addr, init_end_phys_addr;
>> +     int cpu;
>> +     int err = 0;
>> +
>> +     /*
>> +      * Allocate stack pages for Hypervisor-mode
>> +      */
>> +     for_each_possible_cpu(cpu) {
>> +             unsigned long stack_page;
>> +
>> +             stack_page = __get_free_page(GFP_KERNEL);
>> +             if (!stack_page) {
>> +                     err = -ENOMEM;
>> +                     goto out_free_stack_pages;
>> +             }
>> +
>> +             per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
>> +     }
>> +
>> +     /*
>> +      * Allocate Hyp level-1 page table
>> +      */
>> +     err = kvm_hyp_pgd_alloc();
>> +     if (err)
>> +             goto out_free_stack_pages;
>> +
>> +     init_phys_addr = virt_to_phys(__kvm_hyp_init);
>> +     init_end_phys_addr = virt_to_phys(__kvm_hyp_init_end);
>> +     BUG_ON(init_phys_addr & 0x1f);
>> +
>> +     /*
>> +      * Create identity mapping for the init code.
>> +      */
>> +     hyp_idmap_add(kvm_hyp_pgd_get(),
>> +                   (unsigned long)init_phys_addr,
>> +                   (unsigned long)init_end_phys_addr);
>> +
>> +     /*
>> +      * Execute the init code on each CPU.
>> +      *
>> +      * Note: The stack is not mapped yet, so don't do anything else than
>> +      * initializing the hypervisor mode on each CPU using a local stack
>> +      * space for temporary storage.
>> +      */
>> +     for_each_online_cpu(cpu) {
>> +             smp_call_function_single(cpu, cpu_init_hyp_mode,
>> +                                      (void *)(long)init_phys_addr, 1);
>> +     }
>> +
>> +     /*
>> +      * Unmap the identity mapping
>> +      */
>> +     hyp_idmap_del(kvm_hyp_pgd_get(),
>> +                   (unsigned long)init_phys_addr,
>> +                   (unsigned long)init_end_phys_addr);
>> +
>> +     /*
>> +      * Map the Hyp-code called directly from the host
>> +      */
>> +     err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
>> +     if (err) {
>> +             kvm_err("Cannot map world-switch code\n");
>> +             goto out_free_mappings;
>> +     }
>> +
>> +     /*
>> +      * Map the Hyp stack pages
>> +      */
>> +     for_each_possible_cpu(cpu) {
>> +             char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
>> +             err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE);
>> +
>> +             if (err) {
>> +                     kvm_err("Cannot map hyp stack\n");
>> +                     goto out_free_mappings;
>> +             }
>> +     }
>> +
>> +     /*
>> +      * Set the HVBAR to the virtual kernel address
>> +      */
>> +     for_each_online_cpu(cpu)
>> +             smp_call_function_single(cpu, cpu_set_vector,
>> +                                      __kvm_hyp_vector, 1);
>> +
>> +     return 0;
>> +out_free_mappings:
>> +     free_hyp_pmds();
>> +     kvm_hyp_pgd_free();
>> +out_free_stack_pages:
>> +     for_each_possible_cpu(cpu)
>> +             free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
>
> should assign per_cpu(kvm_arm_hyp_stack_page, cpu) to NULL.
>

why? this is run as part of the init code and thus the only way it
could ever run again would be to have the module unloaded in which
case the variable would be re-initialized to zero as per the static
declaration, no?

> Is there CPU hotplug support on ARM?
>

I don't think (read: hope) so. ARM people?

-Christoffer

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 07/15] ARM: KVM: Hypervisor inititalization
  2012-06-28 22:53     ` Christoffer Dall
@ 2012-06-29  1:07       ` Marcelo Tosatti
  0 siblings, 0 replies; 54+ messages in thread
From: Marcelo Tosatti @ 2012-06-29  1:07 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On Thu, Jun 28, 2012 at 06:53:43PM -0400, Christoffer Dall wrote:
> > should assign per_cpu(kvm_arm_hyp_stack_page, cpu) to NULL.
> >
> 
> why? this is run as part of the init code and thus the only way it
> could ever run again would be to have the module unloaded in which
> case the variable would be re-initialized to zero as per the static
> declaration, no?

Right.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
  2012-06-21 17:54               ` Christoffer Dall
@ 2012-07-02 13:07                 ` Avi Kivity
  0 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2012-07-02 13:07 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: android-virt, kvm

On 06/21/2012 08:54 PM, Christoffer Dall wrote:
>>> @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
>>> *vcpu, struct kvm_run *run)
>>>                */
>>>               preempt_disable();
>>>               local_irq_disable();
>>> +
>>> +             if (check_new_vmid_gen(kvm)) {
>>> +                     local_irq_enable();
>>> +                     preempt_enable();
>>> +                     continue;
>>> +             }
>>> +
>>
>> I see the same race with signals.  Your signal_pending() check needs to
>> be after the local_irq_disable(), otherwise we can enter a guest with a
>> pending signal.
>>
> 
> that's not functionally incorrect though is it? It may simply increase
> the latency for the signal delivery as far as I can see, but I
> definitely don't mind changing this path in any case.

Nothing guarantees that there will be a next exit.  I think we still run
the timer tick on guest entry, so we'll exit after a few milliseconds,
but there are patches to disable the timer tick if only one task is queued.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2012-07-02 13:08 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-15 19:06 [PATCH v8 00/15] KVM/ARM Implementation Christoffer Dall
2012-06-15 19:06 ` [PATCH v8 01/15] ARM: add mem_type prot_pte accessor Christoffer Dall
2012-06-15 19:07 ` [PATCH v8 02/15] KVM: use KVM_CAP_IRQ_ROUTING to protect the routing related code Christoffer Dall
2012-06-18 13:06   ` Avi Kivity
2012-06-15 19:07 ` [PATCH v8 03/15] KVM: Introduce __KVM_HAVE_IRQ_LINE Christoffer Dall
2012-06-18 13:07   ` Avi Kivity
2012-06-15 19:07 ` [PATCH v8 04/15] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER Christoffer Dall
2012-06-18 13:08   ` Avi Kivity
2012-06-18 17:47     ` Christoffer Dall
2012-06-19  8:37       ` Avi Kivity
2012-06-28 21:28   ` Marcelo Tosatti
2012-06-15 19:07 ` [PATCH v8 05/15] ARM: KVM: Initial skeleton to compile KVM support Christoffer Dall
2012-06-15 19:07 ` [PATCH v8 06/15] ARM: KVM: Hypervisor identity mapping Christoffer Dall
2012-06-18 13:12   ` Avi Kivity
2012-06-18 17:55     ` Christoffer Dall
2012-06-19  8:38       ` Avi Kivity
2012-06-15 19:07 ` [PATCH v8 07/15] ARM: KVM: Hypervisor inititalization Christoffer Dall
2012-06-28 22:35   ` Marcelo Tosatti
2012-06-28 22:53     ` Christoffer Dall
2012-06-29  1:07       ` Marcelo Tosatti
2012-06-15 19:08 ` [PATCH v8 08/15] ARM: KVM: Module unloading support Christoffer Dall
2012-06-15 19:08 ` [PATCH v8 09/15] ARM: KVM: Memory virtualization setup Christoffer Dall
2012-06-21 12:29   ` Gleb Natapov
2012-06-21 19:48     ` Christoffer Dall
2012-06-28 22:34   ` Marcelo Tosatti
2012-06-28 22:51     ` Christoffer Dall
2012-06-15 19:08 ` [PATCH v8 10/15] ARM: KVM: Inject IRQs and FIQs from userspace Christoffer Dall
2012-06-18 13:32   ` Avi Kivity
2012-06-18 20:56     ` Christoffer Dall
2012-06-19  8:49       ` Avi Kivity
2012-06-20  3:17         ` Christoffer Dall
2012-06-15 19:08 ` [PATCH v8 11/15] ARM: KVM: World-switch implementation Christoffer Dall
2012-06-18 13:41   ` Avi Kivity
2012-06-18 22:05     ` Christoffer Dall
2012-06-19  9:16       ` Avi Kivity
2012-06-20  3:27         ` Christoffer Dall
2012-06-20  4:40           ` Christoffer Dall
2012-06-21  8:13             ` Avi Kivity
2012-06-21 17:54               ` Christoffer Dall
2012-07-02 13:07                 ` Avi Kivity
2012-06-15 19:08 ` [PATCH v8 12/15] ARM: KVM: Emulation framework and CP15 emulation Christoffer Dall
2012-06-15 19:09 ` [PATCH v8 13/15] ARM: KVM: Handle guest faults in KVM Christoffer Dall
2012-06-18 13:45   ` Avi Kivity
2012-06-18 22:20     ` Christoffer Dall
2012-06-19  9:32       ` Avi Kivity
2012-06-19 10:41         ` Andrea Arcangeli
2012-06-20 15:13           ` Christoffer Dall
2012-06-20 17:49             ` Andrea Arcangeli
2012-06-15 19:09 ` [PATCH v8 14/15] ARM: KVM: Handle I/O aborts Christoffer Dall
2012-06-18 13:48   ` Avi Kivity
2012-06-18 22:28     ` Christoffer Dall
2012-06-15 19:09 ` [PATCH v8 15/15] ARM: KVM: Guest wait-for-interrupts (WFI) support Christoffer Dall
2012-06-28 21:49 ` [PATCH v8 00/15] KVM/ARM Implementation Marcelo Tosatti
2012-06-28 22:44   ` Christoffer Dall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.