* [PATCH 00/15] KVM/ARM Implementation
@ 2012-09-15 15:34 ` Christoffer Dall
  0 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:34 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

The following series implements KVM support for ARM processors,
specifically on the Cortex-A15 platform.  We feel this is ready to be
merged.

Work is done in collaboration between Columbia University, Virtual Open
Systems and ARM/Linaro.

The patch series applies to Linux 3.6-rc5 with a number of merges:
 1. git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git
        branch: io (fc8a08c3a3a)
 2. git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
        branch: hyp-mode-boot-next (e5a04cb0b4a)
 3. git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
        branch: timers-next (437814c44c)
 4. git://git.kernel.org/pub/scm/virt/kvm/kvm.git
        branch: next (7de5bdc96c3)

This is Version 11 of the patch series; the first 10 versions were
reviewed on the KVM/ARM and KVM mailing lists.  Changes can also be
pulled from:
    git://github.com/virtualopensystems/linux-kvm-arm.git
        branch: kvm-arm-v11
        branch: kvm-arm-v11-vgic
        branch: kvm-arm-v11-vgic-timers

A non-flattened edition of the patch series, which can always be merged,
can be found at:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-master

This patch series requires a compatible QEMU.  Use the branch
 git://github.com/virtualopensystems/qemu.git kvm-arm

Following this patch series, which implements core KVM support, are two
other patch series implementing Virtual Generic Interrupt Controller
(VGIC) support and Architected Generic Timers.  All three patch series
should be applied for full QEMU compatibility.

The implementation is broken up into a logical set of patches; the first
five are preparatory patches:
  1. ARM: Add mem_type prot_pte accessor
  2. ARM: Add page table defines for KVM
  3. ARM: Section based HYP idmaps
  4. ARM: Only initialize HYP idmap when HYP mode is available
  5. ARM: Expose PMNC bitfields for KVM use

The main implementation is broken up into separate patches, the first
containing a skeleton of files, makefile changes, the basic user space
interface and KVM architecture specific stubs.  Subsequent patches
implement parts of the system as listed:
  6. Skeleton and reset hooks
  7. Hypervisor initialization
  8. Memory virtualization setup (hyp mode mappings and 2nd stage)
  9. Inject IRQs and FIQs from userspace
 10. World-switch implementation and Hyp exception vectors
 11. Emulation framework and coproc emulation
 12. Coproc user space API
 13. Handle guest user memory aborts
 14. Handle guest MMIO aborts
 15. Support guest wait-for-interrupt instructions

Testing:
 Tested on FAST Models and Versatile Express test-chip2.  Tested by
 running three simultaneous VMs, all running SMP, on an SMP host, each
 VM running hackbench and cyclictest, with extreme memory pressure
 applied to the host and swapping enabled to provoke page eviction.
 Also tested KSM merging and GCC inside VMs.  Fully boots both Ubuntu
 (user space Thumb-2) and Debian (user space ARM) guests.

For a guide on how to set up a testing environment and try out these
patches, see:
 http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf


Changes since v10:
 - Boot in Hyp mode and use HVC to initialize HVBAR
 - Support VGIC
 - Support Arch timers
 - Support Thumb-2 mmio instruction decoding
 - Transition to GET_ONE/SET_ONE register API (see the userspace sketch
   after this list)
 - Added KVM_VCPU_GET_REG_LIST
 - New interrupt injection API
 - Don't pin guest pages anymore
 - Fix race condition in page fault handler
 - Cleanup guest instruction copying.
 - Fix race when copying SMP guest instructions
 - Inject data/prefetch aborts when guest does something strange
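
As a rough userspace-side illustration of the GET_ONE/SET_ONE register API
mentioned above (not part of this series; the register ID encoding is left
as a placeholder), reading a single VCPU register could look like:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Read one VCPU register through KVM_GET_ONE_REG.  reg_id must be
	 * an ID reported by the register list ioctl. */
	static int get_one_reg(int vcpu_fd, uint64_t reg_id, uint64_t *val)
	{
		struct kvm_one_reg reg = {
			.id   = reg_id,
			.addr = (uint64_t)(unsigned long)val,
		};

		return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	}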

Changes since v9:
 - Addressed reviewer comments (see mailing list archive)
 - Limit the use of .arch_extension sec/virt to compilers that need them
 - VFP/Neon Support (Antonios Motakis)
 - Run exit handling under preemption and still handle guest cache ops
 - Add support for IO mapping at Hyp level (VGIC prep)
 - Add support for IO mapping at Guest level (VGIC prep)
 - Remove backdoor call to irq_svc
 - Complete rework of CP15 handling and register reset (Rusty Russell)
 - Don't use HSTR for anything other than CR 15
 - New ioctl to set emulation target core (only A15 supported for now)
 - Support KVM_GET_MSRS / KVM_SET_MSRS
 - Add page accounting and page table eviction
 - Change pgd lock to spinlock and fix sleeping in atomic bugs
 - Check kvm_condition_valid for HVC traps of undefs
 - Added a naive implementation of kvm_unmap_hva_range

Changes since v8:
 - Support cache maintenance on SMP through set/way
 - Hyp mode idmaps are now section based and happen at kernel init
 - Handle aborts in Hyp mode
 - Inject undefined exceptions into the guest on error
 - Kernel-side reset of all crucial registers
 - Specifically state which target CPU is being virtualized
 - Exit statistics in debugfs
 - Some L2CTLR cp15 emulation cleanups
 - Support spte_hva for MMU notifiers and take write faults
 - FIX: Race condition in VMID generation
 - BUG: Run exit handling code with disabled preemption
 - Save/Restore abort fault register during world switch

Changes since v7:
 - Trap accesses to ACTLR
 - Do not trap WFE execution
 - Upgrade barriers and TLB operations to inner-shareable domain
 - Restructure hyp_pgd related code to be more opaque
 - Random SMP fixes
 - Random BUG fixes
 - Improve commenting
 - Support module loading/unloading of KVM/ARM
 - Thumb-2 support for host kernel and KVM
 - Unaligned cross-page wide guest Thumb instruction fetching
 - Support ITSTATE fields in CPSR for Thumb guests
 - Document HCR settings

Changes since v6:
 - Support for MMU notifiers to not pin user pages in memory
 - Support build with log debugging
 - Bugfix: v6 clobbered r7 in init code
 - Simplify hyp code mapping
 - Cleanup of register access code
 - Table-based CP15 emulation from Rusty Russell
 - Various other bug fixes and cleanups

Changes since v5:
 - General bugfixes and nit fixes from reviews
 - Implemented re-use of VMIDs
 - Cleaned up the Hyp-mapping code to be readable by non-mm hackers
   (including myself)
 - Integrated preliminary SMP support in base patches
 - Lock-less interrupt injection and WFI support
 - Fixed signal handling while in guest (increases overall stability)

Changes since v4:
 - Addressed reviewer comments from v4
    * cleanup debug and trace code
    * remove printks
    * fixup kvm_arch_vcpu_ioctl_run
    * add trace details to mmio emulation
 - Fix from Marc Zyngier: Move kvm_guest_enter/exit into non-preemptible
   section (squashed into world-switch patch)
 - Cleanup create_hyp_mappings/remove_hyp_mappings from Marc Zyngier
   (squashed into hypervisor initialization patch)
 - Removed the remove_hyp_mappings feature. Removing hypervisor mappings
   could potentially unmap other important data shared in the same page.
 - Removed the arm_ prefix from the arch-specific files.
 - Initial SMP host/guest support

Changes since v3:
 - v4 actually works, fully boots a guest
 - Support compiling as a module
 - Use static inlines instead of macros for vcpu_reg and friends
 - Optimize kvm_vcpu_reg function
 - Use Ftrace for trace capabilities
 - Updated documentation and commenting
 - Use KVM_IRQ_LINE instead of KVM_INTERRUPT
 - Emulates load/store instructions not supported through HSR
   syndrome information.
 - Frees 2nd stage translation tables on VM teardown
 - Handles IRQ/FIQ instructions
 - Handles more CP15 accesses
 - Support guest WFI calls
 - Uses debugfs instead of /proc
 - Support compiling in Thumb mode

Changes since v2:
 - Implements world-switch code
 - Maps guest memory using 2nd stage translation
 - Emulates co-processor 15 instructions
 - Forwards I/O faults to QEMU.

---

Christoffer Dall (11):
      ARM: Add page table and page defines needed by KVM
      KVM: ARM: Initial skeleton to compile KVM support
      KVM: ARM: Hypervisor initialization
      KVM: ARM: Memory virtualization setup
      KVM: ARM: Inject IRQs and FIQs from userspace
      KVM: ARM: World-switch implementation
      KVM: ARM: Emulation framework and CP15 emulation
      KVM: ARM: User space API for getting/setting co-proc registers
      KVM: ARM: Handle guest faults in KVM
      KVM: ARM: Handle I/O aborts
      KVM: ARM: Guest wait-for-interrupts (WFI) support

Marc Zyngier (3):
      ARM: add mem_type prot_pte accessor
      ARM: Section based HYP idmap
      ARM: idmap: only initialize HYP idmap when HYP mode is available

Rusty Russell (1):
      ARM: Expose PMNC bitfields for KVM use


 Documentation/virtual/kvm/api.txt           |  125 +++
 arch/arm/Kconfig                            |    2 
 arch/arm/Makefile                           |    1 
 arch/arm/include/asm/idmap.h                |    7 
 arch/arm/include/asm/kvm.h                  |  109 +++
 arch/arm/include/asm/kvm_arm.h              |  197 +++++
 arch/arm/include/asm/kvm_asm.h              |   56 +
 arch/arm/include/asm/kvm_coproc.h           |   40 +
 arch/arm/include/asm/kvm_emulate.h          |  137 ++++
 arch/arm/include/asm/kvm_host.h             |  211 ++++++
 arch/arm/include/asm/kvm_mmu.h              |   46 +
 arch/arm/include/asm/mach/map.h             |    1 
 arch/arm/include/asm/perf_bits.h            |   56 +
 arch/arm/include/asm/pgtable-3level-hwdef.h |    5 
 arch/arm/include/asm/pgtable-3level.h       |   13 
 arch/arm/include/asm/pgtable.h              |    5 
 arch/arm/kernel/asm-offsets.c               |   44 +
 arch/arm/kernel/perf_event_v7.c             |   51 -
 arch/arm/kernel/vmlinux.lds.S               |    6 
 arch/arm/kvm/Kconfig                        |   45 +
 arch/arm/kvm/Makefile                       |   21 +
 arch/arm/kvm/arm.c                          | 1030 +++++++++++++++++++++++++++
 arch/arm/kvm/coproc.c                       |  946 +++++++++++++++++++++++++
 arch/arm/kvm/emulate.c                      |  793 +++++++++++++++++++++
 arch/arm/kvm/exports.c                      |   39 +
 arch/arm/kvm/guest.c                        |  212 ++++++
 arch/arm/kvm/init.S                         |  156 ++++
 arch/arm/kvm/interrupts.S                   |  823 ++++++++++++++++++++++
 arch/arm/kvm/mmu.c                          |  998 ++++++++++++++++++++++++++
 arch/arm/kvm/reset.c                        |   74 ++
 arch/arm/kvm/trace.h                        |  170 ++++
 arch/arm/mm/idmap.c                         |   92 ++
 arch/arm/mm/mmu.c                           |    9 
 include/linux/kvm.h                         |    3 
 mm/memory.c                                 |    2 
 35 files changed, 6454 insertions(+), 71 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/asm/kvm_mmu.h
 create mode 100644 arch/arm/include/asm/perf_bits.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

-- 

* [PATCH 01/15] ARM: add mem_type prot_pte accessor
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:34   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:34 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

From: Marc Zyngier <marc.zyngier@arm.com>

The KVM hypervisor mmu code requires access to the mem_type prot_pte
field when setting up page tables pointing to a device. Unfortunately,
the mem_type structure is opaque.

Add an accessor (get_mem_type_prot_pte()) to retrieve the prot_pte
value.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/mach/map.h |    1 +
 arch/arm/mm/mmu.c               |    6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index a6efcdd..3787c9f 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -37,6 +37,7 @@ extern void iotable_init(struct map_desc *, int);
 
 struct mem_type;
 extern const struct mem_type *get_mem_type(unsigned int type);
+extern pteval_t get_mem_type_prot_pte(unsigned int type);
 /*
  * external interface to remap single page with appropriate type
  */
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 4c2d045..76bf4f5 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -301,6 +301,12 @@ const struct mem_type *get_mem_type(unsigned int type)
 }
 EXPORT_SYMBOL(get_mem_type);
 
+pteval_t get_mem_type_prot_pte(unsigned int type)
+{
+	return get_mem_type(type)->prot_pte;
+}
+EXPORT_SYMBOL(get_mem_type_prot_pte);
+
 /*
  * Adjust the PMD section entries according to the CPU in use.
  */
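
As a rough illustration of how the accessor is meant to be consumed (the
helper name below is made up, not part of this patch), a mapping helper
can build a device PTE from the MT_DEVICE attributes like this:

	#include <asm/mach/map.h>
	#include <asm/pgtable.h>

	/* Illustrative only: construct a PTE for a device mapping using the
	 * attributes the kernel already defines for MT_DEVICE. */
	static pte_t example_device_pte(unsigned long pfn)
	{
		pteval_t prot = get_mem_type_prot_pte(MT_DEVICE);

		return pfn_pte(pfn, __pgprot(prot));
	}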


* [PATCH 02/15] ARM: Add page table and page defines needed by KVM
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:34   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:34 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

KVM uses the stage-2 page tables and the Hyp page table format,
so let's define the fields we need to access in KVM.

We use pgprot_guest to indicate stage-2 entries.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/pgtable-3level.h |   13 +++++++++++++
 arch/arm/include/asm/pgtable.h        |    5 +++++
 arch/arm/mm/mmu.c                     |    3 +++
 3 files changed, 21 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index b249035..7351eee 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -102,11 +102,24 @@
  */
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
 
+/*
+ * 2-nd stage PTE definitions for LPAE.
+ */
+#define L_PTE2_SHARED		L_PTE_SHARED
+#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
+#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */
+#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
+#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */
+
 #ifndef __ASSEMBLY__
 
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & 2))
 #define pud_present(pud)	(pud_val(pud))
+#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_TABLE)
+#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_SECT)
 
 #define pud_clear(pudp)			\
 	do {				\
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 41dc31f..c422f62 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -70,6 +70,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 
 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_guest;
 
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
@@ -82,6 +83,10 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
+#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
+#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
+					  L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
+					  L_PTE2_SHARED)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 76bf4f5..a153fd4 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -56,9 +56,11 @@ static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
+pgprot_t pgprot_guest;
 
 EXPORT_SYMBOL(pgprot_user);
 EXPORT_SYMBOL(pgprot_kernel);
+EXPORT_SYMBOL(pgprot_guest);
 
 struct cachepolicy {
 	const char	policy[16];
@@ -514,6 +516,7 @@ static void __init build_mem_type_table(void)
 	pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
 	pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
 				 L_PTE_DIRTY | kern_pgprot);
+	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG);
 
 	mem_types[MT_LOW_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
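
As a rough sketch of how these definitions combine (the helper below is
illustrative, not code from this series), a stage-2 entry for normal guest
RAM starts from PAGE_KVM_GUEST and gains L_PTE2_WRITE only once the guest
is allowed to write the page:

	#include <linux/types.h>
	#include <asm/pgtable.h>

	/* Illustrative only: build a stage-2 PTE for guest RAM. */
	static pte_t example_stage2_pte(unsigned long pfn, bool writable)
	{
		pte_t pte = pfn_pte(pfn, PAGE_KVM_GUEST);

		if (writable)
			pte = __pte(pte_val(pte) | L_PTE2_WRITE);

		return pte;
	}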


* [PATCH 03/15] ARM: Section based HYP idmap
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:34   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:34 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

From: Marc Zyngier <marc.zyngier@arm.com>

Add a HYP pgd to the core code (so it can benefit all Linux
hypervisors).

Populate this pgd with an identity mapping of the code contained
in the .hyp.idmap.text section.

Offer a method to drop this identity mapping through
hyp_idmap_teardown and re-create it through hyp_idmap_setup.

Make all the above depend on CONFIG_ARM_VIRT_EXT.

Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/idmap.h                |    7 ++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    1 
 arch/arm/kernel/vmlinux.lds.S               |    6 ++
 arch/arm/mm/idmap.c                         |   88 +++++++++++++++++++++++----
 4 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
index bf863ed..a1ab8d6 100644
--- a/arch/arm/include/asm/idmap.h
+++ b/arch/arm/include/asm/idmap.h
@@ -11,4 +11,11 @@ extern pgd_t *idmap_pgd;
 
 void setup_mm_for_reboot(void);
 
+#ifdef CONFIG_ARM_VIRT_EXT
+extern pgd_t *hyp_pgd;
+
+void hyp_idmap_teardown(void);
+void hyp_idmap_setup(void);
+#endif
+
 #endif	/* __ASM_IDMAP_H */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..a2d404e 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -44,6 +44,7 @@
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
+#define PMD_SECT_AP1		(_AT(pmdval_t, 1) << 6)
 #define PMD_SECT_TEX(x)		(_AT(pmdval_t, 0))
 
 /*
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 36ff15b..12fd2eb 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -19,7 +19,11 @@
 	ALIGN_FUNCTION();						\
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
 	*(.idmap.text)							\
-	VMLINUX_SYMBOL(__idmap_text_end) = .;
+	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
+	ALIGN_FUNCTION();						\
+	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
+	*(.hyp.idmap.text)						\
+	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index ab88ed4..7a944af 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,4 +1,6 @@
+#include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/slab.h>
 
 #include <asm/cputype.h>
 #include <asm/idmap.h>
@@ -59,11 +61,20 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 	} while (pud++, addr = next, addr != end);
 }
 
-static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+static void identity_mapping_add(pgd_t *pgd, const char *text_start,
+				 const char *text_end, unsigned long prot)
 {
-	unsigned long prot, next;
+	unsigned long addr, end;
+	unsigned long next;
+
+	addr = virt_to_phys(text_start);
+	end = virt_to_phys(text_end);
+
+	pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
+		prot ? "HYP " : "",
+		(long long)addr, (long long)end);
+	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 
-	prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 	if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
 		prot |= PMD_BIT4;
 
@@ -78,24 +89,77 @@ extern char  __idmap_text_start[], __idmap_text_end[];
 
 static int __init init_static_idmap(void)
 {
-	phys_addr_t idmap_start, idmap_end;
-
 	idmap_pgd = pgd_alloc(&init_mm);
 	if (!idmap_pgd)
 		return -ENOMEM;
 
-	/* Add an identity mapping for the physical address of the section. */
-	idmap_start = virt_to_phys((void *)__idmap_text_start);
-	idmap_end = virt_to_phys((void *)__idmap_text_end);
-
-	pr_info("Setting up static identity map for 0x%llx - 0x%llx\n",
-		(long long)idmap_start, (long long)idmap_end);
-	identity_mapping_add(idmap_pgd, idmap_start, idmap_end);
+	identity_mapping_add(idmap_pgd, __idmap_text_start,
+			     __idmap_text_end, 0);
 
 	return 0;
 }
 early_initcall(init_static_idmap);
 
+#ifdef CONFIG_ARM_VIRT_EXT
+pgd_t *hyp_pgd;
+EXPORT_SYMBOL_GPL(hyp_pgd);
+
+static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pud = pud_offset(pgd, addr);
+	pmd = pmd_offset(pud, addr);
+	pud_clear(pud);
+	clean_pmd_entry(pmd);
+	pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
+}
+
+extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
+
+/*
+ * This version actually frees the underlying pmds for all pgds in range and
+ * clear the pgds themselves afterwards.
+ */
+void hyp_idmap_teardown(void)
+{
+	unsigned long addr, end;
+	unsigned long next;
+	pgd_t *pgd = hyp_pgd;
+
+	addr = virt_to_phys(__hyp_idmap_text_start);
+	end = virt_to_phys(__hyp_idmap_text_end);
+
+	pgd += pgd_index(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (!pgd_none_or_clear_bad(pgd))
+			hyp_idmap_del_pmd(pgd, addr);
+	} while (pgd++, addr = next, addr < end);
+}
+EXPORT_SYMBOL_GPL(hyp_idmap_teardown);
+
+void hyp_idmap_setup(void)
+{
+	identity_mapping_add(hyp_pgd, __hyp_idmap_text_start,
+			     __hyp_idmap_text_end, PMD_SECT_AP1);
+}
+EXPORT_SYMBOL_GPL(hyp_idmap_setup);
+
+static int __init hyp_init_static_idmap(void)
+{
+	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	if (!hyp_pgd)
+		return -ENOMEM;
+
+	hyp_idmap_setup();
+
+	return 0;
+}
+early_initcall(hyp_init_static_idmap);
+#endif
+
 /*
  * In order to soft-boot, we need to switch to a 1:1 mapping for the
  * cpu_reset functions. This will then ensure that we have predictable
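
As a rough sketch of the intended calling pattern (illustrative only, not
code from this patch; the actual HVBAR/HVC setup is left as a comment), a
Linux hypervisor could use the new interface as:

	#include <asm/idmap.h>

	/* Illustrative only: run Hyp-mode init code through the identity
	 * map, then drop the trampoline mapping once it is no longer needed. */
	static int example_hyp_bringup(void)
	{
		/* Map the .hyp.idmap.text trampoline 1:1 so Hyp mode can run it. */
		hyp_idmap_setup();

		/* ... HVBAR/HVC initialization would run here ... */

		/* The trampoline is not needed once Hyp mode is set up. */
		hyp_idmap_teardown();
		return 0;
	}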


* [PATCH 04/15] ARM: idmap: only initialize HYP idmap when HYP mode is available
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:34   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:34 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

From: Marc Zyngier <marc.zyngier@arm.com>

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/mm/idmap.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 7a944af..95e8d67 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -8,6 +8,7 @@
 #include <asm/pgtable.h>
 #include <asm/sections.h>
 #include <asm/system_info.h>
+#include <asm/virt.h>
 
 pgd_t *idmap_pgd;
 
@@ -149,6 +150,9 @@ EXPORT_SYMBOL_GPL(hyp_idmap_setup);
 
 static int __init hyp_init_static_idmap(void)
 {
+	if (!is_hyp_mode_available())
+		return 0;
+
 	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
 	if (!hyp_pgd)
 		return -ENOMEM;


* [PATCH 05/15] ARM: Expose PMNC bitfields for KVM use
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

From: Rusty Russell <rusty.russell@linaro.org>

We want some of these for use in KVM, so pull them out of
arch/arm/kernel/perf_event_v7.c into their own asm/perf_bits.h.

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/perf_bits.h |   56 ++++++++++++++++++++++++++++++++++++++
 arch/arm/kernel/perf_event_v7.c  |   51 +----------------------------------
 2 files changed, 57 insertions(+), 50 deletions(-)
 create mode 100644 arch/arm/include/asm/perf_bits.h

diff --git a/arch/arm/include/asm/perf_bits.h b/arch/arm/include/asm/perf_bits.h
new file mode 100644
index 0000000..eeb266a
--- /dev/null
+++ b/arch/arm/include/asm/perf_bits.h
@@ -0,0 +1,56 @@
+#ifndef __ARM_PERF_BITS_H__
+#define __ARM_PERF_BITS_H__
+
+/*
+ * ARMv7 low level PMNC access
+ */
+
+/*
+ * Per-CPU PMNC: config reg
+ */
+#define ARMV7_PMNC_E		(1 << 0) /* Enable all counters */
+#define ARMV7_PMNC_P		(1 << 1) /* Reset all counters */
+#define ARMV7_PMNC_C		(1 << 2) /* Cycle counter reset */
+#define ARMV7_PMNC_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
+#define ARMV7_PMNC_X		(1 << 4) /* Export to ETM */
+#define ARMV7_PMNC_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
+#define	ARMV7_PMNC_N_SHIFT	11	 /* Number of counters supported */
+#define	ARMV7_PMNC_N_MASK	0x1f
+#define	ARMV7_PMNC_MASK		0x3f	 /* Mask for writable bits */
+
+/*
+ * FLAG: counters overflow flag status reg
+ */
+#define	ARMV7_FLAG_MASK		0xffffffff	/* Mask for writable bits */
+#define	ARMV7_OVERFLOWED_MASK	ARMV7_FLAG_MASK
+
+/*
+ * PMXEVTYPER: Event selection reg
+ */
+#define	ARMV7_EVTYPE_MASK	0xc00000ff	/* Mask for writable bits */
+#define	ARMV7_EVTYPE_EVENT	0xff		/* Mask for EVENT bits */
+
+/*
+ * Event filters for PMUv2
+ */
+#define	ARMV7_EXCLUDE_PL1	(1 << 31)
+#define	ARMV7_EXCLUDE_USER	(1 << 30)
+#define	ARMV7_INCLUDE_HYP	(1 << 27)
+
+#ifndef __ASSEMBLY__
+static inline u32 armv7_pmnc_read(void)
+{
+	u32 val;
+	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(val));
+	return val;
+}
+
+static inline void armv7_pmnc_write(u32 val)
+{
+	val &= ARMV7_PMNC_MASK;
+	isb();
+	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(val));
+}
+#endif
+
+#endif /* __ARM_PERF_BITS_H__ */
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index f04070b..09851b3 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -17,6 +17,7 @@
  */
 
 #ifdef CONFIG_CPU_V7
+#include <asm/perf_bits.h>
 
 static struct arm_pmu armv7pmu;
 
@@ -744,61 +745,11 @@ static const unsigned armv7_a7_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 #define	ARMV7_COUNTER_MASK	(ARMV7_MAX_COUNTERS - 1)
 
 /*
- * ARMv7 low level PMNC access
- */
-
-/*
  * Perf Event to low level counters mapping
  */
 #define	ARMV7_IDX_TO_COUNTER(x)	\
 	(((x) - ARMV7_IDX_COUNTER0) & ARMV7_COUNTER_MASK)
 
-/*
- * Per-CPU PMNC: config reg
- */
-#define ARMV7_PMNC_E		(1 << 0) /* Enable all counters */
-#define ARMV7_PMNC_P		(1 << 1) /* Reset all counters */
-#define ARMV7_PMNC_C		(1 << 2) /* Cycle counter reset */
-#define ARMV7_PMNC_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
-#define ARMV7_PMNC_X		(1 << 4) /* Export to ETM */
-#define ARMV7_PMNC_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
-#define	ARMV7_PMNC_N_SHIFT	11	 /* Number of counters supported */
-#define	ARMV7_PMNC_N_MASK	0x1f
-#define	ARMV7_PMNC_MASK		0x3f	 /* Mask for writable bits */
-
-/*
- * FLAG: counters overflow flag status reg
- */
-#define	ARMV7_FLAG_MASK		0xffffffff	/* Mask for writable bits */
-#define	ARMV7_OVERFLOWED_MASK	ARMV7_FLAG_MASK
-
-/*
- * PMXEVTYPER: Event selection reg
- */
-#define	ARMV7_EVTYPE_MASK	0xc00000ff	/* Mask for writable bits */
-#define	ARMV7_EVTYPE_EVENT	0xff		/* Mask for EVENT bits */
-
-/*
- * Event filters for PMUv2
- */
-#define	ARMV7_EXCLUDE_PL1	(1 << 31)
-#define	ARMV7_EXCLUDE_USER	(1 << 30)
-#define	ARMV7_INCLUDE_HYP	(1 << 27)
-
-static inline u32 armv7_pmnc_read(void)
-{
-	u32 val;
-	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(val));
-	return val;
-}
-
-static inline void armv7_pmnc_write(u32 val)
-{
-	val &= ARMV7_PMNC_MASK;
-	isb();
-	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(val));
-}
-
 static inline int armv7_pmnc_has_overflowed(u32 pmnc)
 {
 	return pmnc & ARMV7_OVERFLOWED_MASK;
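
As a rough sketch of the kind of KVM-side use this enables (the helpers
below are illustrative, not code from this series), a world switch could
save the host PMNC and stop the counters while the guest runs:

	#include <linux/types.h>
	#include <asm/perf_bits.h>

	/* Illustrative only: save the host PMNC and disable all counters. */
	static u32 example_pmnc_save_and_stop(void)
	{
		u32 pmnc = armv7_pmnc_read();

		armv7_pmnc_write(pmnc & ~ARMV7_PMNC_E);
		return pmnc;
	}

	/* Illustrative only: restore the host PMNC after the guest exits. */
	static void example_pmnc_restore(u32 pmnc)
	{
		armv7_pmnc_write(pmnc);
	}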


* [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

Targets KVM support for Cortex A-15 processors.

Contains all the framework components, makefiles, header files, some
tracing functionality, and the basic user space API.

The only supported core is the Cortex-A15 for now.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

"Nothing to see here. Move along, move along..."
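
As an illustration of the user space interface documented below, a minimal
sequence that creates a VM, creates a vcpu and initializes it might look
like the following sketch (error handling trimmed; this only exercises the
interface added here, since KVM_RUN is still stubbed out at this point in
the series):

	#include <fcntl.h>
	#include <string.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	int main(void)
	{
		struct kvm_vcpu_init init;
		int kvm, vm, vcpu;

		kvm  = open("/dev/kvm", O_RDWR);
		vm   = ioctl(kvm, KVM_CREATE_VM, 0);
		vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

		/* Pick the CPU model to present; this also resets the vcpu. */
		memset(&init, 0, sizeof(init));
		init.target = KVM_ARM_TARGET_CORTEX_A15;
		if (ioctl(vcpu, KVM_ARM_VCPU_INIT, &init) < 0)
			perror("KVM_ARM_VCPU_INIT");

		return 0;
	}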

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt  |   54 +++++-
 arch/arm/Kconfig                   |    2 
 arch/arm/Makefile                  |    1 
 arch/arm/include/asm/kvm.h         |   88 +++++++++
 arch/arm/include/asm/kvm_arm.h     |   28 +++
 arch/arm/include/asm/kvm_asm.h     |   30 +++
 arch/arm/include/asm/kvm_coproc.h  |   24 +++
 arch/arm/include/asm/kvm_emulate.h |  108 +++++++++++
 arch/arm/include/asm/kvm_host.h    |  172 ++++++++++++++++++
 arch/arm/kvm/Kconfig               |   44 +++++
 arch/arm/kvm/Makefile              |   21 ++
 arch/arm/kvm/arm.c                 |  345 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/coproc.c              |   22 ++
 arch/arm/kvm/emulate.c             |  127 +++++++++++++
 arch/arm/kvm/exports.c             |   21 ++
 arch/arm/kvm/guest.c               |  211 ++++++++++++++++++++++
 arch/arm/kvm/init.S                |   19 ++
 arch/arm/kvm/interrupts.S          |   19 ++
 arch/arm/kvm/mmu.c                 |   17 ++
 arch/arm/kvm/reset.c               |   74 ++++++++
 arch/arm/kvm/trace.h               |   52 +++++
 include/linux/kvm.h                |    2 
 22 files changed, 1477 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 36befa7..67640c6 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -293,7 +293,7 @@ kvm_run' (see below).
 4.11 KVM_GET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (out)
 Returns: 0 on success, -1 on error
@@ -314,7 +314,7 @@ struct kvm_regs {
 4.12 KVM_SET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (in)
 Returns: 0 on success, -1 on error
@@ -600,7 +600,7 @@ struct kvm_fpu {
 4.24 KVM_CREATE_IRQCHIP
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, ARM
 Type: vm ioctl
 Parameters: none
 Returns: 0 on success, -1 on error
@@ -608,7 +608,8 @@ Returns: 0 on success, -1 on error
 Creates an interrupt controller model in the kernel.  On x86, creates a virtual
 ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
 local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
-only go to the IOAPIC.  On ia64, a IOSAPIC is created.
+only go to the IOAPIC.  On ia64, a IOSAPIC is created. On ARM, a GIC is
+created.
 
 
 4.25 KVM_IRQ_LINE
@@ -1732,6 +1733,11 @@ registers, find a list below:
         |                       |
   PPC   | KVM_REG_PPC_HIOR      | 64
 
+ARM registers are mapped using the lower 32 bits.  The upper 16 of those
+bits are the register group type, or coprocessor number:
+
+ARM core registers have the following id bit patterns:
+  0x4020 0000 0010 <index into the kvm_regs struct:16>
 
 4.69 KVM_GET_ONE_REG
 
@@ -1985,6 +1991,46 @@ the virtualized real-mode area (VRMA) facility, the kernel will
 re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
 
 
+4.77 KVM_ARM_VCPU_INIT
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_vcpu_init (in)
+Returns: 0 on success; -1 on error
+Errors:
+  EINVAL:    the target is unknown, or the combination of features is invalid.
+  ENOENT:    a features bit specified is unknown.
+
+This tells KVM what type of CPU to present to the guest, and what
+optional features it should have.  This will cause a reset of the cpu
+registers to their initial values.  If this is not called, KVM_RUN will
+return ENOEXEC for that vcpu.
+
+Note that because some registers reflect machine topology, all vcpus
+should be created before this ioctl is invoked.
+
+
+4.78 KVM_GET_REG_LIST
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_reg_list (in/out)
+Returns: 0 on success; -1 on error
+Errors:
+  E2BIG:     the reg index list is too big to fit in the array specified by
+             the user (the number required will be written into n).
+
+struct kvm_reg_list {
+	__u64 n; /* number of registers in reg[] */
+	__u64 reg[0];
+};
+
+This ioctl returns the guest registers that are supported for the
+KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
+
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2f88d8d..c8fea8f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2336,3 +2336,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/arm/kvm/Kconfig"
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 30eae87..3bcc414 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -255,6 +255,7 @@ core-$(CONFIG_VFP)		+= arch/arm/vfp/
 # If we have a machine-specific directory, then include it in the build.
 core-y				+= arch/arm/kernel/ arch/arm/mm/ arch/arm/common/
 core-y				+= arch/arm/net/
+core-y 				+= arch/arm/kvm/
 core-y				+= $(machdirs) $(platdirs)
 
 drivers-$(CONFIG_OPROFILE)      += arch/arm/oprofile/
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
new file mode 100644
index 0000000..a13b582
--- /dev/null
+++ b/arch/arm/include/asm/kvm.h
@@ -0,0 +1,88 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include <asm/types.h>
+
+#define __KVM_HAVE_GUEST_DEBUG
+
+#define KVM_REG_SIZE(id)						\
+	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
+
+struct kvm_regs {
+	__u32 usr_regs[15];	/* R0_usr - R14_usr */
+	__u32 svc_regs[3];	/* SP_svc, LR_svc, SPSR_svc */
+	__u32 abt_regs[3];	/* SP_abt, LR_abt, SPSR_abt */
+	__u32 und_regs[3];	/* SP_und, LR_und, SPSR_und */
+	__u32 irq_regs[3];	/* SP_irq, LR_irq, SPSR_irq */
+	__u32 fiq_regs[8];	/* R8_fiq - R14_fiq, SPSR_fiq */
+	__u32 pc;		/* The program counter (r15) */
+	__u32 cpsr;		/* The guest CPSR */
+};
+
+/* Supported Processor Types */
+#define KVM_ARM_TARGET_CORTEX_A15	(0xC0F)
+
+struct kvm_vcpu_init {
+	__u32 target;
+	__u32 features[7];
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+};
+
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+struct kvm_sync_regs {
+};
+
+struct kvm_arch_memory_slot {
+};
+
+/* For KVM_GET_REG_LIST. */
+struct kvm_reg_list {
+	__u64 n; /* number of regs */
+	__u64 reg[0];
+};
+
+/* If you need to interpret the index values, here is the key: */
+#define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
+#define KVM_REG_ARM_COPROC_SHIFT	16
+#define KVM_REG_ARM_32_OPC2_MASK	0x0000000000000007
+#define KVM_REG_ARM_32_OPC2_SHIFT	0
+#define KVM_REG_ARM_OPC1_MASK		0x0000000000000078
+#define KVM_REG_ARM_OPC1_SHIFT		3
+#define KVM_REG_ARM_CRM_MASK		0x0000000000000780
+#define KVM_REG_ARM_CRM_SHIFT		7
+#define KVM_REG_ARM_32_CRN_MASK		0x0000000000007800
+#define KVM_REG_ARM_32_CRN_SHIFT	11
+
+/* Normal registers are mapped as coprocessor 16. */
+#define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
+
+#endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
new file mode 100644
index 0000000..2f9d28e
--- /dev/null
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ARM_H__
+#define __ARM_KVM_ARM_H__
+
+/* Supported Processor Types */
+#define CORTEX_A15	(0xC0F)
+
+/* Multiprocessor Affinity Register */
+#define MPIDR_CPUID	(0x3 << 0)
+
+#endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
new file mode 100644
index 0000000..44591f9
--- /dev/null
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ASM_H__
+#define __ARM_KVM_ASM_H__
+
+#define ARM_EXCEPTION_RESET	  0
+#define ARM_EXCEPTION_UNDEFINED   1
+#define ARM_EXCEPTION_SOFTWARE    2
+#define ARM_EXCEPTION_PREF_ABORT  3
+#define ARM_EXCEPTION_DATA_ABORT  4
+#define ARM_EXCEPTION_IRQ	  5
+#define ARM_EXCEPTION_FIQ	  6
+
+#endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
new file mode 100644
index 0000000..b6d023d
--- /dev/null
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2012 Rusty Russell IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_COPROC_H__
+#define __ARM_KVM_COPROC_H__
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+#endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
new file mode 100644
index 0000000..9e29335
--- /dev/null
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_EMULATE_H__
+#define __ARM_KVM_EMULATE_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+
+u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
+
+static inline u8 __vcpu_mode(u32 cpsr)
+{
+	u8 modes_table[32] = {
+		0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+		0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+		MODE_USR,	/* 0x0 */
+		MODE_FIQ,	/* 0x1 */
+		MODE_IRQ,	/* 0x2 */
+		MODE_SVC,	/* 0x3 */
+		0xf, 0xf, 0xf,
+		MODE_ABT,	/* 0x7 */
+		0xf, 0xf, 0xf,
+		MODE_UND,	/* 0xb */
+		0xf, 0xf, 0xf,
+		MODE_SYS	/* 0xf */
+	};
+
+	return modes_table[cpsr & 0x1f];
+}
+
+static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
+{
+	u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
+	BUG_ON(mode == 0xf);
+	return mode;
+}
+
+/*
+ * Return the SPSR for the specified mode of the virtual CPU.
+ */
+static inline u32 *vcpu_spsr_mode(struct kvm_vcpu *vcpu, enum vcpu_mode mode)
+{
+	switch (mode) {
+	case MODE_SVC:
+		return &vcpu->arch.regs.svc_regs[2];
+	case MODE_ABT:
+		return &vcpu->arch.regs.abt_regs[2];
+	case MODE_UND:
+		return &vcpu->arch.regs.und_regs[2];
+	case MODE_IRQ:
+		return &vcpu->arch.regs.irq_regs[2];
+	case MODE_FIQ:
+		return &vcpu->arch.regs.fiq_regs[7];
+	default:
+		BUG();
+	}
+}
+
+/* Get vcpu register for current mode */
+static inline u32 *vcpu_reg(struct kvm_vcpu *vcpu, unsigned long reg_num)
+{
+	return vcpu_reg_mode(vcpu, reg_num, vcpu_mode(vcpu));
+}
+
+static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
+{
+	return vcpu_reg(vcpu, 15);
+}
+
+static inline u32 *vcpu_cpsr(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.regs.cpsr;
+}
+
+/* Get vcpu SPSR for current mode */
+static inline u32 *vcpu_spsr(struct kvm_vcpu *vcpu)
+{
+	return vcpu_spsr_mode(vcpu, vcpu_mode(vcpu));
+}
+
+static inline bool mode_has_spsr(struct kvm_vcpu *vcpu)
+{
+	return (vcpu_mode(vcpu) < MODE_USR);
+}
+
+static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
+{
+	BUG_ON(vcpu_mode(vcpu) > MODE_SYS);
+	return vcpu_mode(vcpu) != MODE_USR;
+}
+
+#endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
new file mode 100644
index 0000000..24959f4
--- /dev/null
+++ b/arch/arm/include/asm/kvm_host.h
@@ -0,0 +1,172 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_HOST_H__
+#define __ARM_KVM_HOST_H__
+
+#include <asm/kvm.h>
+
+#define KVM_MAX_VCPUS 4
+#define KVM_MEMORY_SLOTS 32
+#define KVM_PRIVATE_MEM_SLOTS 4
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+#define NUM_FEATURES 0
+
+/* We don't currently support large pages. */
+#define KVM_HPAGE_GFN_SHIFT(x)	0
+#define KVM_NR_PAGE_SIZES	1
+#define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
+
+struct kvm_vcpu;
+u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
+int kvm_target_cpu(void);
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+struct kvm_arch {
+	/* The VMID generation used for the virt. memory system */
+	u64    vmid_gen;
+	u32    vmid;
+
+	/* 1-level 2nd stage table and lock */
+	spinlock_t pgd_lock;
+	pgd_t *pgd;
+
+	/* VTTBR value associated with above pgd and vmid */
+	u64    vttbr;
+};
+
+#define EXCEPTION_NONE      0
+#define EXCEPTION_RESET     0x80
+#define EXCEPTION_UNDEFINED 0x40
+#define EXCEPTION_SOFTWARE  0x20
+#define EXCEPTION_PREFETCH  0x10
+#define EXCEPTION_DATA      0x08
+#define EXCEPTION_IMPRECISE 0x04
+#define EXCEPTION_IRQ       0x02
+#define EXCEPTION_FIQ       0x01
+
+#define KVM_NR_MEM_OBJS     40
+
+/*
+ * We don't want allocation failures within the mmu code, so we preallocate
+ * enough memory for a single page fault in a cache.
+ */
+struct kvm_mmu_memory_cache {
+	int nobjs;
+	void *objects[KVM_NR_MEM_OBJS];
+};
+
+/*
+ * Modes used for short-hand mode determination in the world-switch code and
+ * in emulation code.
+ *
+ * Note: These indices do NOT correspond to the value of the CPSR mode bits!
+ */
+enum vcpu_mode {
+	MODE_FIQ = 0,
+	MODE_IRQ,
+	MODE_SVC,
+	MODE_ABT,
+	MODE_UND,
+	MODE_USR,
+	MODE_SYS
+};
+
+/* 0 is reserved as an invalid value. */
+enum cp15_regs {
+	c0_MPIDR=1,		/* MultiProcessor ID Register */
+	c0_CSSELR,		/* Cache Size Selection Register */
+	c1_SCTLR,		/* System Control Register */
+	c1_ACTLR,		/* Auxiliary Control Register */
+	c1_CPACR,		/* Coprocessor Access Control */
+	c2_TTBR0,		/* Translation Table Base Register 0 */
+	c2_TTBR0_high,		/* TTBR0 top 32 bits */
+	c2_TTBR1,		/* Translation Table Base Register 1 */
+	c2_TTBR1_high,		/* TTBR1 top 32 bits */
+	c2_TTBCR,		/* Translation Table Base Control R. */
+	c3_DACR,		/* Domain Access Control Register */
+	c5_DFSR,		/* Data Fault Status Register */
+	c5_IFSR,		/* Instruction Fault Status Register */
+	c5_ADFSR,		/* Auxiliary Data Fault Status Register */
+	c5_AIFSR,		/* Auxiliary Instruction Fault Status Register */
+	c6_DFAR,		/* Data Fault Address Register */
+	c6_IFAR,		/* Instruction Fault Address Register */
+	c10_PRRR,		/* Primary Region Remap Register */
+	c10_NMRR,		/* Normal Memory Remap Register */
+	c12_VBAR,		/* Vector Base Address Register */
+	c13_CID,		/* Context ID Register */
+	c13_TID_URW,		/* Thread ID, User R/W */
+	c13_TID_URO,		/* Thread ID, User R/O */
+	c13_TID_PRIV,		/* Thread ID, Privileged */
+
+	nr_cp15_regs
+};
+
+struct kvm_vcpu_arch {
+	struct kvm_regs regs;
+
+	u32 target; /* Currently KVM_ARM_TARGET_CORTEX_A15 */
+	DECLARE_BITMAP(features, NUM_FEATURES);
+
+	/* System control coprocessor (cp15) */
+	u32 cp15[nr_cp15_regs];
+
+	/* The CPU type we expose to the VM */
+	u32 midr;
+
+	/* Exception Information */
+	u32 hsr;		/* Hyp Syndrome Register */
+	u32 hdfar;		/* Hyp Data Fault Address Register */
+	u32 hifar;		/* Hyp Inst. Fault Address Register */
+	u32 hpfar;		/* Hyp IPA Fault Address Register */
+
+	/* IO related fields */
+	struct {
+		bool sign_extend;	/* for byte/halfword loads */
+		u32  rd;
+	} mmio;
+
+	/* Interrupt related fields */
+	u32 irq_lines;		/* IRQ and FIQ levels */
+
+	/* Hyp exception information */
+	u32 hyp_pc;		/* PC when exception was taken from Hyp mode */
+
+	/* Cache some mmu pages needed inside spinlock regions */
+	struct kvm_mmu_memory_cache mmu_page_cache;
+};
+
+struct kvm_vm_stat {
+	u32 remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+	u32 halt_wakeup;
+};
+
+struct kvm_vcpu_init;
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init);
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
+struct kvm_one_reg;
+int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
new file mode 100644
index 0000000..a07ddcc
--- /dev/null
+++ b/arch/arm/kvm/Kconfig
@@ -0,0 +1,44 @@
+#
+# KVM configuration
+#
+
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+	bool "Virtualization"
+	---help---
+	  Say Y here to get to see options for using your Linux host to run
+	  other operating systems inside virtual machines (guests).
+	  This option alone does not add any kernel code.
+
+	  If you say N, all options in this submenu will be skipped and
+	  disabled.
+
+if VIRTUALIZATION
+
+config KVM
+	bool "Kernel-based Virtual Machine (KVM) support"
+	select PREEMPT_NOTIFIERS
+	select ANON_INODES
+	select KVM_MMIO
+	depends on ARM_VIRT_EXT && ARM_LPAE
+	---help---
+	  Support hosting virtualized guest machines. You will also
+	  need to select one or more of the processor modules below.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  If unsure, say N.
+
+config KVM_ARM_HOST
+	bool "KVM host support for ARM cpus."
+	depends on KVM
+	depends on MMU
+	depends on CPU_V7 && ARM_VIRT_EXT
+	---help---
+	  Provides host support for ARM processors.
+
+source drivers/virtio/Kconfig
+
+endif # VIRTUALIZATION
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
new file mode 100644
index 0000000..db8c8f4
--- /dev/null
+++ b/arch/arm/kvm/Makefile
@@ -0,0 +1,21 @@
+#
+# Makefile for Kernel-based Virtual Machine module
+#
+
+plus_virt := $(call as-instr,.arch_extension virt,+virt)
+ifeq ($(plus_virt),+virt)
+	plus_virt_def := -DREQUIRES_VIRT=1
+endif
+
+ccflags-y += -Ivirt/kvm -Iarch/arm/kvm
+CFLAGS_arm.o := -I. $(plus_virt_def)
+CFLAGS_mmu.o := -I.
+
+AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
+AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
+
+obj-$(CONFIG_KVM_ARM_HOST) += init.o interrupts.o exports.o
+
+obj-$(CONFIG_KVM_ARM_HOST) += $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+obj-$(CONFIG_KVM_ARM_HOST) += arm.o guest.o mmu.o emulate.o reset.o coproc.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
new file mode 100644
index 0000000..fd6fa9b
--- /dev/null
+++ b/arch/arm/kvm/arm.c
@@ -0,0 +1,345 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <trace/events/kvm.h>
+
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
+#include <asm/unified.h>
+#include <asm/uaccess.h>
+#include <asm/ptrace.h>
+#include <asm/mman.h>
+#include <asm/cputype.h>
+
+#ifdef REQUIRES_VIRT
+__asm__(".arch_extension	virt");
+#endif
+
+int kvm_arch_hardware_enable(void *garbage)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+{
+	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
+}
+
+void kvm_arch_hardware_disable(void *garbage)
+{
+}
+
+int kvm_arch_hardware_setup(void)
+{
+	return 0;
+}
+
+void kvm_arch_hardware_unsetup(void)
+{
+}
+
+void kvm_arch_check_processor_compat(void *rtn)
+{
+	*(int *)rtn = 0;
+}
+
+void kvm_arch_sync_events(struct kvm *kvm)
+{
+}
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+	if (type)
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
+void kvm_arch_free_memslot(struct kvm_memory_slot *free,
+			   struct kvm_memory_slot *dont)
+{
+}
+
+int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
+{
+	return 0;
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm_arch_vcpu_free(kvm->vcpus[i]);
+			kvm->vcpus[i] = NULL;
+		}
+	}
+}
+
+int kvm_dev_ioctl_check_extension(long ext)
+{
+	int r;
+	switch (ext) {
+	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+	case KVM_CAP_ONE_REG:
+		r = 1;
+		break;
+	case KVM_CAP_COALESCED_MMIO:
+		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+	return r;
+}
+
+long kvm_arch_dev_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_set_memory_region(struct kvm *kvm,
+			       struct kvm_userspace_memory_region *mem,
+			       struct kvm_memory_slot old,
+			       int user_alloc)
+{
+	return 0;
+}
+
+int kvm_arch_prepare_memory_region(struct kvm *kvm,
+				   struct kvm_memory_slot *memslot,
+				   struct kvm_memory_slot old,
+				   struct kvm_userspace_memory_region *mem,
+				   int user_alloc)
+{
+	return 0;
+}
+
+void kvm_arch_commit_memory_region(struct kvm *kvm,
+				   struct kvm_userspace_memory_region *mem,
+				   struct kvm_memory_slot old,
+				   int user_alloc)
+{
+}
+
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+				   struct kvm_memory_slot *slot)
+{
+}
+
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	int err;
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	if (!vcpu) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	return vcpu;
+free_vcpu:
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+	return ERR_PTR(err);
+}
+
+void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvm_arch_vcpu_free(vcpu);
+}
+
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int __attribute_const__ kvm_target_cpu(void)
+{
+	unsigned int midr;
+
+	midr = read_cpuid_id();
+	switch ((midr >> 4) & 0xfff) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		return KVM_ARM_TARGET_CORTEX_A15;
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
+void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	return -EINVAL;
+}
+
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vcpu_ioctl(struct file *filp,
+			 unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+	case KVM_ARM_VCPU_INIT: {
+		struct kvm_vcpu_init init;
+
+		if (copy_from_user(&init, argp, sizeof init))
+			return -EFAULT;
+
+		return kvm_vcpu_set_target(vcpu, &init);
+
+	}
+	case KVM_SET_ONE_REG:
+	case KVM_GET_ONE_REG: {
+		struct kvm_one_reg reg;
+		if (copy_from_user(&reg, argp, sizeof(reg)))
+			return -EFAULT;
+		if (ioctl == KVM_SET_ONE_REG)
+			return kvm_arm_set_reg(vcpu, &reg);
+		else
+			return kvm_arm_get_reg(vcpu, &reg);
+	}
+	case KVM_GET_REG_LIST: {
+		struct kvm_reg_list __user *user_list = argp;
+		struct kvm_reg_list reg_list;
+		unsigned n;
+
+		if (copy_from_user(&reg_list, user_list, sizeof reg_list))
+			return -EFAULT;
+		n = reg_list.n;
+		reg_list.n = kvm_arm_num_regs(vcpu);
+		if (copy_to_user(user_list, &reg_list, sizeof reg_list))
+			return -EFAULT;
+		if (n < reg_list.n)
+			return -E2BIG;
+		return kvm_arm_copy_reg_indices(vcpu, user_list->reg);
+	}
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+		       unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_init(void *opaque)
+{
+	return 0;
+}
+
+void kvm_arch_exit(void)
+{
+}
+
+static int arm_init(void)
+{
+	int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+	return rc;
+}
+
+static void __exit arm_exit(void)
+{
+	kvm_exit();
+}
+
+module_init(arm_init);
+module_exit(arm_exit);
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
new file mode 100644
index 0000000..4b9dad8
--- /dev/null
+++ b/arch/arm/kvm/coproc.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
new file mode 100644
index 0000000..690bbb3
--- /dev/null
+++ b/arch/arm/kvm/emulate.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <asm/kvm_emulate.h>
+
+#define REG_OFFSET(_reg) \
+	(offsetof(struct kvm_regs, _reg) / sizeof(u32))
+
+#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs[_num])
+
+static const unsigned long vcpu_reg_offsets[MODE_SYS + 1][16] = {
+	/* FIQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7),
+		REG_OFFSET(fiq_regs[0]), /* r8 */
+		REG_OFFSET(fiq_regs[1]), /* r9 */
+		REG_OFFSET(fiq_regs[2]), /* r10 */
+		REG_OFFSET(fiq_regs[3]), /* r11 */
+		REG_OFFSET(fiq_regs[4]), /* r12 */
+		REG_OFFSET(fiq_regs[5]), /* r13 */
+		REG_OFFSET(fiq_regs[6]), /* r14 */
+		REG_OFFSET(pc)		 /* r15 */
+	},
+
+	/* IRQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(irq_regs[0]), /* r13 */
+		REG_OFFSET(irq_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* SVC Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(svc_regs[0]), /* r13 */
+		REG_OFFSET(svc_regs[1]), /* r14 */
+		REG_OFFSET(pc)		 /* r15 */
+	},
+
+	/* ABT Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(abt_regs[0]), /* r13 */
+		REG_OFFSET(abt_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* UND Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(und_regs[0]), /* r13 */
+		REG_OFFSET(und_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* USR Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(usr_regs[13]), /* r13 */
+		REG_OFFSET(usr_regs[14]), /* r14 */
+		REG_OFFSET(pc)	          /* r15 */
+	},
+
+	/* SYS Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(usr_regs[13]), /* r13 */
+		REG_OFFSET(usr_regs[14]), /* r14 */
+		REG_OFFSET(pc)	          /* r15 */
+	},
+};
+
+/*
+ * Return a pointer to the register with the given number, as banked in the
+ * specified mode of the virtual CPU.
+ */
+u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
+{
+	u32 *reg_array = (u32 *)&vcpu->arch.regs;
+
+	BUG_ON(reg_num > 15);
+	BUG_ON(mode > MODE_SYS);
+
+	return reg_array + vcpu_reg_offsets[mode][reg_num];
+}
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
new file mode 100644
index 0000000..3e38c95
--- /dev/null
+++ b/arch/arm/kvm/exports.c
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/module.h>
+
+EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
new file mode 100644
index 0000000..19a5389
--- /dev/null
+++ b/arch/arm/kvm/guest.c
@@ -0,0 +1,211 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <asm/uaccess.h>
+#include <asm/kvm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+
+#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
+#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ NULL }
+};
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+static u64 core_reg_offset_from_id(u64 id)
+{
+	return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
+}
+
+static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	u32 __user *uaddr = (u32 __user *)(long)reg->addr;
+	struct kvm_regs *regs = &vcpu->arch.regs;
+	u64 off;
+
+	if (KVM_REG_SIZE(reg->id) != 4)
+		return -ENOENT;
+
+	/* Our ID is an index into the kvm_regs struct. */
+	off = core_reg_offset_from_id(reg->id);
+	if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
+		return -ENOENT;
+
+	return put_user(((u32 *)regs)[off], uaddr);
+}
+
+static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	u32 __user *uaddr = (u32 __user *)(long)reg->addr;
+	struct kvm_regs *regs = &vcpu->arch.regs;
+	u64 off, val;
+
+	if (KVM_REG_SIZE(reg->id) != 4)
+		return -ENOENT;
+
+	/* Our ID is an index into the kvm_regs struct. */
+	off = core_reg_offset_from_id(reg->id);
+	if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
+		return -ENOENT;
+
+	if (get_user(val, uaddr) != 0)
+		return -EFAULT;
+
+	if (off == KVM_REG_ARM_CORE_REG(cpsr)) {
+		if (__vcpu_mode(val) == 0xf)
+			return -EINVAL;
+	}
+
+	((u32 *)regs)[off] = val;
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+static unsigned long num_core_regs(void)
+{
+	return sizeof(struct kvm_regs) / sizeof(u32);
+}
+
+/**
+ * kvm_arm_num_regs - how many registers do we present via KVM_GET_ONE_REG
+ *
+ * This is for all registers.
+ */
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
+{
+	return num_core_regs();
+}
+
+/**
+ * kvm_arm_copy_reg_indices - get indices of all registers.
+ *
+ * We do core registers right here, then we append coproc regs.
+ */
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
+{
+	unsigned int i;
+	const u64 core_reg = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE;
+
+	for (i = 0; i < sizeof(struct kvm_regs)/sizeof(u32); i++) {
+		if (put_user(core_reg | i, uindices))
+			return -EFAULT;
+		uindices++;
+	}
+
+	return 0;
+}
+
+int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	/* We currently use nothing arch-specific in upper 32 bits */
+	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM >> 32)
+		return -EINVAL;
+
+	/* Register group 16 means we want a core register. */
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
+		return get_core_reg(vcpu, reg);
+
+	return -EINVAL;
+}
+
+int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	/* We currently use nothing arch-specific in upper 32 bits */
+	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM >> 32)
+		return -EINVAL;
+
+	/* Register group 16 means we set a core register. */
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
+		return set_core_reg(vcpu, reg);
+
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init)
+{
+	unsigned int i;
+
+	/* We can only do a cortex A15 for now. */
+	if (init->target != kvm_target_cpu())
+		return -EINVAL;
+
+	vcpu->arch.target = init->target;
+	bitmap_zero(vcpu->arch.features, NUM_FEATURES);
+
+	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
+	for (i = 0; i < sizeof(init->features)*8; i++) {
+		if (init->features[i / 32] & (1 << (i % 32))) {
+			if (i >= NUM_FEATURES)
+				return -ENOENT;
+			set_bit(i, vcpu->arch.features);
+		}
+	}
+
+	/* Now we know what it is, we can reset it. */
+	return kvm_reset_vcpu(vcpu);
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+				  struct kvm_translation *tr)
+{
+	return -EINVAL;
+}
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/init.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/interrupts.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
new file mode 100644
index 0000000..10ed464
--- /dev/null
+++ b/arch/arm/kvm/mmu.c
@@ -0,0 +1,17 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
new file mode 100644
index 0000000..290a13d
--- /dev/null
+++ b/arch/arm/kvm/reset.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+
+#include <asm/unified.h>
+#include <asm/ptrace.h>
+#include <asm/cputype.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_coproc.h>
+
+/******************************************************************************
+ * Cortex-A15 Reset Values
+ */
+
+static const int a15_max_cpu_idx = 3;
+
+static struct kvm_regs a15_regs_reset = {
+	.cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
+};
+
+
+/*******************************************************************************
+ * Exported reset function
+ */
+
+/**
+ * kvm_reset_vcpu - sets core registers and cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architecturally defined reset values.
+ */
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_regs *cpu_reset;
+
+	switch (vcpu->arch.target) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		if (vcpu->vcpu_id > a15_max_cpu_idx)
+			return -EINVAL;
+		cpu_reset = &a15_regs_reset;
+		vcpu->arch.midr = read_cpuid_id();
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	/* Reset core registers */
+	memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
+
+	/* Reset CP15 registers */
+	kvm_reset_coprocs(vcpu);
+
+	return 0;
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
new file mode 100644
index 0000000..f8869c1
--- /dev/null
+++ b/arch/arm/kvm/trace.h
@@ -0,0 +1,52 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+/*
+ * Tracepoints for entry/exit to guest
+ */
+TRACE_EVENT(kvm_entry,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+TRACE_EVENT(kvm_exit,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+
+
+#endif /* _TRACE_KVM_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH arch/arm/kvm
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d808694..a960f66 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -911,6 +911,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_ONE_REG		  _IOW(KVMIO,  0xac, struct kvm_one_reg)
 /* VM is being stopped by host */
 #define KVM_KVMCLOCK_CTRL	  _IO(KVMIO,   0xad)
+#define KVM_ARM_VCPU_INIT	  _IOW(KVMIO,  0xae, struct kvm_vcpu_init)
+#define KVM_GET_REG_LIST	  _IOWR(KVMIO, 0xb0, struct kvm_reg_list)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)

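To make the register indexing documented above concrete, here is a small
user space sketch that builds the id for the guest PC with the macros from
this patch and reads it via KVM_GET_ONE_REG; the vcpu file descriptor is
assumed to come from a KVM_ARM_VCPU_INIT sequence like the one sketched in
the commit message:

	#include <stddef.h>
	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Read the guest PC from an already initialized vcpu fd. */
	static int read_guest_pc(int vcpu_fd, uint32_t *pc)
	{
		struct kvm_one_reg reg = {
			/* arch ARM | size u32 | core group | offset of pc / 4 */
			.id   = KVM_REG_ARM | KVM_REG_SIZE_U32 |
				KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(pc),
			.addr = (uintptr_t)pc,
		};

		return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	}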

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
@ 2012-09-15 15:35   ` Christoffer Dall
  0 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Targets KVM support for Cortex A-15 processors.

Contains all the framework components, make files, header files, some
tracing functionality, and basic user space API.

Only supported core is Cortex-A15 for now.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

?Nothing to see here. Move along, move along..."

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt  |   54 +++++-
 arch/arm/Kconfig                   |    2 
 arch/arm/Makefile                  |    1 
 arch/arm/include/asm/kvm.h         |   88 +++++++++
 arch/arm/include/asm/kvm_arm.h     |   28 +++
 arch/arm/include/asm/kvm_asm.h     |   30 +++
 arch/arm/include/asm/kvm_coproc.h  |   24 +++
 arch/arm/include/asm/kvm_emulate.h |  108 +++++++++++
 arch/arm/include/asm/kvm_host.h    |  172 ++++++++++++++++++
 arch/arm/kvm/Kconfig               |   44 +++++
 arch/arm/kvm/Makefile              |   21 ++
 arch/arm/kvm/arm.c                 |  345 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/coproc.c              |   22 ++
 arch/arm/kvm/emulate.c             |  127 +++++++++++++
 arch/arm/kvm/exports.c             |   21 ++
 arch/arm/kvm/guest.c               |  211 ++++++++++++++++++++++
 arch/arm/kvm/init.S                |   19 ++
 arch/arm/kvm/interrupts.S          |   19 ++
 arch/arm/kvm/mmu.c                 |   17 ++
 arch/arm/kvm/reset.c               |   74 ++++++++
 arch/arm/kvm/trace.h               |   52 +++++
 include/linux/kvm.h                |    2 
 22 files changed, 1477 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 36befa7..67640c6 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -293,7 +293,7 @@ kvm_run' (see below).
 4.11 KVM_GET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (out)
 Returns: 0 on success, -1 on error
@@ -314,7 +314,7 @@ struct kvm_regs {
 4.12 KVM_SET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (in)
 Returns: 0 on success, -1 on error
@@ -600,7 +600,7 @@ struct kvm_fpu {
 4.24 KVM_CREATE_IRQCHIP
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, ARM
 Type: vm ioctl
 Parameters: none
 Returns: 0 on success, -1 on error
@@ -608,7 +608,8 @@ Returns: 0 on success, -1 on error
 Creates an interrupt controller model in the kernel.  On x86, creates a virtual
 ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
 local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
-only go to the IOAPIC.  On ia64, a IOSAPIC is created.
+only go to the IOAPIC.  On ia64, a IOSAPIC is created. On ARM, a GIC is
+created.
 
 
 4.25 KVM_IRQ_LINE
@@ -1732,6 +1733,11 @@ registers, find a list below:
         |                       |
   PPC   | KVM_REG_PPC_HIOR      | 64
 
+ARM registers are mapped using the lower 32 bits.  The upper 16 of that
+is the register group type, or coprocessor number:
+
+ARM core registers have the following id bit patterns:
+  0x4002 0000 0010 <index into the kvm_regs struct:32>
 
 4.69 KVM_GET_ONE_REG
 
@@ -1985,6 +1991,46 @@ the virtualized real-mode area (VRMA) facility, the kernel will
 re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
 
 
+4.77 KVM_ARM_VCPU_INIT
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct struct kvm_vcpu_init (in)
+Returns: 0 on success; -1 on error
+Errors:
+ ?EINVAL: ???the target is unknown, or the combination of features is invalid.
+ ?ENOENT: ???a features bit specified is unknown.
+
+This tells KVM what type of CPU to present to the guest, and what
+optional features it should have. ?This will cause a reset of the cpu
+registers to their initial values. ?If this is not called, KVM_RUN will
+return ENOEXEC for that vcpu.
+
+Note that because some registers reflect machine topology, all vcpus
+should be created before this ioctl is invoked.
+
+
+4.78 KVM_GET_REG_LIST
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_reg_list (in/out)
+Returns: 0 on success; -1 on error
+Errors:
+ ?E2BIG: ????the reg index list is too big to fit in the array specified by
+ ????????????the user (the number required will be written into n).
+
+struct kvm_reg_list {
+	__u64 n; /* number of registers in reg[] */
+	__u64 reg[0];
+};
+
+This ioctl returns the guest registers that are supported for the
+KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
+
+
 5. The kvm_run structure
 ------------------------
 
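For illustration, a hypothetical userspace sequence exercising KVM_ARM_VCPU_INIT,
KVM_GET_REG_LIST and KVM_GET_ONE_REG might look like the sketch below.  This is
not part of the patch: the vcpu file descriptor handling and the function names
are made up; only the ioctls, structures and macros introduced by this series
(and the pre-existing KVM_GET_ONE_REG interface) are real names.

/* Hypothetical userspace sketch -- not part of this patch. */
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Initialize the vcpu as a Cortex-A15 and read back its cpsr. */
static int example_init_and_read_cpsr(int vcpu_fd, __u32 *cpsr)
{
	struct kvm_vcpu_init init;
	struct kvm_one_reg reg;

	memset(&init, 0, sizeof(init));
	init.target = KVM_ARM_TARGET_CORTEX_A15;	/* no optional features */
	if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) < 0)
		return -1;

	/* Core register id: 32-bit size, group 16, index into struct kvm_regs. */
	reg.id = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE |
		 KVM_REG_ARM_CORE_REG(cpsr);
	reg.addr = (__u64)(unsigned long)cpsr;
	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}

/* Query the register list: probe with n = 0, then retry with enough room. */
static struct kvm_reg_list *example_get_reg_list(int vcpu_fd)
{
	struct kvm_reg_list probe, *list;

	probe.n = 0;
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe) < 0 && errno != E2BIG)
		return NULL;		/* probe.n now holds the required count */

	list = malloc(sizeof(*list) + probe.n * sizeof(__u64));
	if (!list)
		return NULL;
	list->n = probe.n;
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
}
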
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2f88d8d..c8fea8f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2336,3 +2336,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/arm/kvm/Kconfig"
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 30eae87..3bcc414 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -255,6 +255,7 @@ core-$(CONFIG_VFP)		+= arch/arm/vfp/
 # If we have a machine-specific directory, then include it in the build.
 core-y				+= arch/arm/kernel/ arch/arm/mm/ arch/arm/common/
 core-y				+= arch/arm/net/
+core-y 				+= arch/arm/kvm/
 core-y				+= $(machdirs) $(platdirs)
 
 drivers-$(CONFIG_OPROFILE)      += arch/arm/oprofile/
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
new file mode 100644
index 0000000..a13b582
--- /dev/null
+++ b/arch/arm/include/asm/kvm.h
@@ -0,0 +1,88 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include <asm/types.h>
+
+#define __KVM_HAVE_GUEST_DEBUG
+
+#define KVM_REG_SIZE(id)						\
+	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
+
+struct kvm_regs {
+	__u32 usr_regs[15];	/* R0_usr - R14_usr */
+	__u32 svc_regs[3];	/* SP_svc, LR_svc, SPSR_svc */
+	__u32 abt_regs[3];	/* SP_abt, LR_abt, SPSR_abt */
+	__u32 und_regs[3];	/* SP_und, LR_und, SPSR_und */
+	__u32 irq_regs[3];	/* SP_irq, LR_irq, SPSR_irq */
+	__u32 fiq_regs[8];	/* R8_fiq - R14_fiq, SPSR_fiq */
+	__u32 pc;		/* The program counter (r15) */
+	__u32 cpsr;		/* The guest CPSR */
+};
+
+/* Supported Processor Types */
+#define KVM_ARM_TARGET_CORTEX_A15	(0xC0F)
+
+struct kvm_vcpu_init {
+	__u32 target;
+	__u32 features[7];
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+};
+
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+struct kvm_sync_regs {
+};
+
+struct kvm_arch_memory_slot {
+};
+
+/* For KVM_VCPU_GET_REG_LIST. */
+struct kvm_reg_list {
+	__u64 n; /* number of regs */
+	__u64 reg[0];
+};
+
+/* If you need to interpret the index values, here is the key: */
+#define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
+#define KVM_REG_ARM_COPROC_SHIFT	16
+#define KVM_REG_ARM_32_OPC2_MASK	0x0000000000000007
+#define KVM_REG_ARM_32_OPC2_SHIFT	0
+#define KVM_REG_ARM_OPC1_MASK		0x0000000000000078
+#define KVM_REG_ARM_OPC1_SHIFT		3
+#define KVM_REG_ARM_CRM_MASK		0x0000000000000780
+#define KVM_REG_ARM_CRM_SHIFT		7
+#define KVM_REG_ARM_32_CRN_MASK		0x0000000000007800
+#define KVM_REG_ARM_32_CRN_SHIFT	11
+
+/* Normal registers are mapped as coprocessor 16. */
+#define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
+
+#endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
new file mode 100644
index 0000000..2f9d28e
--- /dev/null
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ARM_H__
+#define __ARM_KVM_ARM_H__
+
+/* Supported Processor Types */
+#define CORTEX_A15	(0xC0F)
+
+/* Multiprocessor Affinity Register */
+#define MPIDR_CPUID	(0x3 << 0)
+
+#endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
new file mode 100644
index 0000000..44591f9
--- /dev/null
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ASM_H__
+#define __ARM_KVM_ASM_H__
+
+#define ARM_EXCEPTION_RESET	  0
+#define ARM_EXCEPTION_UNDEFINED   1
+#define ARM_EXCEPTION_SOFTWARE    2
+#define ARM_EXCEPTION_PREF_ABORT  3
+#define ARM_EXCEPTION_DATA_ABORT  4
+#define ARM_EXCEPTION_IRQ	  5
+#define ARM_EXCEPTION_FIQ	  6
+
+#endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
new file mode 100644
index 0000000..b6d023d
--- /dev/null
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2012 Rusty Russell IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_COPROC_H__
+#define __ARM_KVM_COPROC_H__
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+#endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
new file mode 100644
index 0000000..9e29335
--- /dev/null
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_EMULATE_H__
+#define __ARM_KVM_EMULATE_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+
+u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
+
+static inline u8 __vcpu_mode(u32 cpsr)
+{
+	u8 modes_table[32] = {
+		0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+		0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+		MODE_USR,	/* 0x0 */
+		MODE_FIQ,	/* 0x1 */
+		MODE_IRQ,	/* 0x2 */
+		MODE_SVC,	/* 0x3 */
+		0xf, 0xf, 0xf,
+		MODE_ABT,	/* 0x7 */
+		0xf, 0xf, 0xf,
+		MODE_UND,	/* 0xb */
+		0xf, 0xf, 0xf,
+		MODE_SYS	/* 0xf */
+	};
+
+	return modes_table[cpsr & 0x1f];
+}
+
+static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
+{
+	u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
+	BUG_ON(mode == 0xf);
+	return mode;
+}
+
+/*
+ * Return the SPSR for the specified mode of the virtual CPU.
+ */
+static inline u32 *vcpu_spsr_mode(struct kvm_vcpu *vcpu, enum vcpu_mode mode)
+{
+	switch (mode) {
+	case MODE_SVC:
+		return &vcpu->arch.regs.svc_regs[2];
+	case MODE_ABT:
+		return &vcpu->arch.regs.abt_regs[2];
+	case MODE_UND:
+		return &vcpu->arch.regs.und_regs[2];
+	case MODE_IRQ:
+		return &vcpu->arch.regs.irq_regs[2];
+	case MODE_FIQ:
+		return &vcpu->arch.regs.fiq_regs[7];
+	default:
+		BUG();
+	}
+}
+
+/* Get vcpu register for current mode */
+static inline u32 *vcpu_reg(struct kvm_vcpu *vcpu, unsigned long reg_num)
+{
+	return vcpu_reg_mode(vcpu, reg_num, vcpu_mode(vcpu));
+}
+
+static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
+{
+	return vcpu_reg(vcpu, 15);
+}
+
+static inline u32 *vcpu_cpsr(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.regs.cpsr;
+}
+
+/* Get vcpu SPSR for current mode */
+static inline u32 *vcpu_spsr(struct kvm_vcpu *vcpu)
+{
+	return vcpu_spsr_mode(vcpu, vcpu_mode(vcpu));
+}
+
+static inline bool mode_has_spsr(struct kvm_vcpu *vcpu)
+{
+	return (vcpu_mode(vcpu) < MODE_USR);
+}
+
+static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
+{
+	BUG_ON(vcpu_mode(vcpu) > MODE_SYS);
+	return vcpu_mode(vcpu) != MODE_USR;
+}
+
+#endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
new file mode 100644
index 0000000..24959f4
--- /dev/null
+++ b/arch/arm/include/asm/kvm_host.h
@@ -0,0 +1,172 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_HOST_H__
+#define __ARM_KVM_HOST_H__
+
+#include <asm/kvm.h>
+
+#define KVM_MAX_VCPUS 4
+#define KVM_MEMORY_SLOTS 32
+#define KVM_PRIVATE_MEM_SLOTS 4
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+#define NUM_FEATURES 0
+
+/* We don't currently support large pages. */
+#define KVM_HPAGE_GFN_SHIFT(x)	0
+#define KVM_NR_PAGE_SIZES	1
+#define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
+
+struct kvm_vcpu;
+u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
+int kvm_target_cpu(void);
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+struct kvm_arch {
+	/* The VMID generation used for the virt. memory system */
+	u64    vmid_gen;
+	u32    vmid;
+
+	/* 1-level 2nd stage table and lock */
+	spinlock_t pgd_lock;
+	pgd_t *pgd;
+
+	/* VTTBR value associated with above pgd and vmid */
+	u64    vttbr;
+};
+
+#define EXCEPTION_NONE      0
+#define EXCEPTION_RESET     0x80
+#define EXCEPTION_UNDEFINED 0x40
+#define EXCEPTION_SOFTWARE  0x20
+#define EXCEPTION_PREFETCH  0x10
+#define EXCEPTION_DATA      0x08
+#define EXCEPTION_IMPRECISE 0x04
+#define EXCEPTION_IRQ       0x02
+#define EXCEPTION_FIQ       0x01
+
+#define KVM_NR_MEM_OBJS     40
+
+/*
+ * We don't want allocation failures within the mmu code, so we preallocate
+ * enough memory for a single page fault in a cache.
+ */
+struct kvm_mmu_memory_cache {
+	int nobjs;
+	void *objects[KVM_NR_MEM_OBJS];
+};
+
+/*
+ * Modes used for shorthand mode determination in the world-switch code and
+ * in emulation code.
+ *
+ * Note: These indices do NOT correspond to the value of the CPSR mode bits!
+ */
+enum vcpu_mode {
+	MODE_FIQ = 0,
+	MODE_IRQ,
+	MODE_SVC,
+	MODE_ABT,
+	MODE_UND,
+	MODE_USR,
+	MODE_SYS
+};
+
+/* 0 is reserved as an invalid value. */
+enum cp15_regs {
+	c0_MPIDR=1,		/* MultiProcessor ID Register */
+	c0_CSSELR,		/* Cache Size Selection Register */
+	c1_SCTLR,		/* System Control Register */
+	c1_ACTLR,		/* Auxiliary Control Register */
+	c1_CPACR,		/* Coprocessor Access Control */
+	c2_TTBR0,		/* Translation Table Base Register 0 */
+	c2_TTBR0_high,		/* TTBR0 top 32 bits */
+	c2_TTBR1,		/* Translation Table Base Register 1 */
+	c2_TTBR1_high,		/* TTBR1 top 32 bits */
+	c2_TTBCR,		/* Translation Table Base Control R. */
+	c3_DACR,		/* Domain Access Control Register */
+	c5_DFSR,		/* Data Fault Status Register */
+	c5_IFSR,		/* Instruction Fault Status Register */
+	c5_ADFSR,		/* Auxiliary Data Fault Status Register */
+	c5_AIFSR,		/* Auxiliary Instruction Fault Status Register */
+	c6_DFAR,		/* Data Fault Address Register */
+	c6_IFAR,		/* Instruction Fault Address Register */
+	c10_PRRR,		/* Primary Region Remap Register */
+	c10_NMRR,		/* Normal Memory Remap Register */
+	c12_VBAR,		/* Vector Base Address Register */
+	c13_CID,		/* Context ID Register */
+	c13_TID_URW,		/* Thread ID, User R/W */
+	c13_TID_URO,		/* Thread ID, User R/O */
+	c13_TID_PRIV,		/* Thread ID, Privileged */
+
+	nr_cp15_regs
+};
+
+struct kvm_vcpu_arch {
+	struct kvm_regs regs;
+
+	u32 target; /* Currently KVM_ARM_TARGET_CORTEX_A15 */
+	DECLARE_BITMAP(features, NUM_FEATURES);
+
+	/* System control coprocessor (cp15) */
+	u32 cp15[nr_cp15_regs];
+
+	/* The CPU type we expose to the VM */
+	u32 midr;
+
+	/* Exception Information */
+	u32 hsr;		/* Hyp Syndrome Register */
+	u32 hdfar;		/* Hyp Data Fault Address Register */
+	u32 hifar;		/* Hyp Inst. Fault Address Register */
+	u32 hpfar;		/* Hyp IPA Fault Address Register */
+
+	/* IO related fields */
+	struct {
+		bool sign_extend;	/* for byte/halfword loads */
+		u32  rd;
+	} mmio;
+
+	/* Interrupt related fields */
+	u32 irq_lines;		/* IRQ and FIQ levels */
+
+	/* Hyp exception information */
+	u32 hyp_pc;		/* PC when exception was taken from Hyp mode */
+
+	/* Cache some mmu pages needed inside spinlock regions */
+	struct kvm_mmu_memory_cache mmu_page_cache;
+};
+
+struct kvm_vm_stat {
+	u32 remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+	u32 halt_wakeup;
+};
+
+struct kvm_vcpu_init;
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init);
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
+struct kvm_one_reg;
+int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
new file mode 100644
index 0000000..a07ddcc
--- /dev/null
+++ b/arch/arm/kvm/Kconfig
@@ -0,0 +1,44 @@
+#
+# KVM configuration
+#
+
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+	bool "Virtualization"
+	---help---
+	  Say Y here to get to see options for using your Linux host to run
+	  other operating systems inside virtual machines (guests).
+	  This option alone does not add any kernel code.
+
+	  If you say N, all options in this submenu will be skipped and
+	  disabled.
+
+if VIRTUALIZATION
+
+config KVM
+	bool "Kernel-based Virtual Machine (KVM) support"
+	select PREEMPT_NOTIFIERS
+	select ANON_INODES
+	select KVM_MMIO
+	depends on ARM_VIRT_EXT && ARM_LPAE
+	---help---
+	  Support hosting virtualized guest machines. You will also
+	  need to select one or more of the processor modules below.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  If unsure, say N.
+
+config KVM_ARM_HOST
+	bool "KVM host support for ARM cpus."
+	depends on KVM
+	depends on MMU
+	depends on CPU_V7 && ARM_VIRT_EXT
+	---help---
+	  Provides host support for ARM processors.
+
+source drivers/virtio/Kconfig
+
+endif # VIRTUALIZATION
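
For reference, a configuration that enables the new options would contain the
fragment below (illustrative only; the platform selection depends on the board,
and ARM_LPAE, CPU_V7 and ARM_VIRT_EXT must already be satisfied per the
dependencies above):

    CONFIG_VIRTUALIZATION=y
    CONFIG_KVM=y
    CONFIG_KVM_ARM_HOST=y
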
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
new file mode 100644
index 0000000..db8c8f4
--- /dev/null
+++ b/arch/arm/kvm/Makefile
@@ -0,0 +1,21 @@
+#
+# Makefile for Kernel-based Virtual Machine module
+#
+
+plus_virt := $(call as-instr,.arch_extension virt,+virt)
+ifeq ($(plus_virt),+virt)
+	plus_virt_def := -DREQUIRES_VIRT=1
+endif
+
+ccflags-y += -Ivirt/kvm -Iarch/arm/kvm
+CFLAGS_arm.o := -I. $(plus_virt_def)
+CFLAGS_mmu.o := -I.
+
+AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
+AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
+
+obj-$(CONFIG_KVM_ARM_HOST) += init.o interrupts.o exports.o
+
+obj-$(CONFIG_KVM_ARM_HOST) += $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+obj-$(CONFIG_KVM_ARM_HOST) += arm.o guest.o mmu.o emulate.o reset.o coproc.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
new file mode 100644
index 0000000..fd6fa9b
--- /dev/null
+++ b/arch/arm/kvm/arm.c
@@ -0,0 +1,345 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <trace/events/kvm.h>
+
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
+#include <asm/unified.h>
+#include <asm/uaccess.h>
+#include <asm/ptrace.h>
+#include <asm/mman.h>
+#include <asm/cputype.h>
+
+#ifdef REQUIRES_VIRT
+__asm__(".arch_extension	virt");
+#endif
+
+int kvm_arch_hardware_enable(void *garbage)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+{
+	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
+}
+
+void kvm_arch_hardware_disable(void *garbage)
+{
+}
+
+int kvm_arch_hardware_setup(void)
+{
+	return 0;
+}
+
+void kvm_arch_hardware_unsetup(void)
+{
+}
+
+void kvm_arch_check_processor_compat(void *rtn)
+{
+	*(int *)rtn = 0;
+}
+
+void kvm_arch_sync_events(struct kvm *kvm)
+{
+}
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+	if (type)
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
+void kvm_arch_free_memslot(struct kvm_memory_slot *free,
+			   struct kvm_memory_slot *dont)
+{
+}
+
+int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
+{
+	return 0;
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm_arch_vcpu_free(kvm->vcpus[i]);
+			kvm->vcpus[i] = NULL;
+		}
+	}
+}
+
+int kvm_dev_ioctl_check_extension(long ext)
+{
+	int r;
+	switch (ext) {
+	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+	case KVM_CAP_ONE_REG:
+		r = 1;
+		break;
+	case KVM_CAP_COALESCED_MMIO:
+		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+	return r;
+}
+
+long kvm_arch_dev_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_set_memory_region(struct kvm *kvm,
+			       struct kvm_userspace_memory_region *mem,
+			       struct kvm_memory_slot old,
+			       int user_alloc)
+{
+	return 0;
+}
+
+int kvm_arch_prepare_memory_region(struct kvm *kvm,
+				   struct kvm_memory_slot *memslot,
+				   struct kvm_memory_slot old,
+				   struct kvm_userspace_memory_region *mem,
+				   int user_alloc)
+{
+	return 0;
+}
+
+void kvm_arch_commit_memory_region(struct kvm *kvm,
+				   struct kvm_userspace_memory_region *mem,
+				   struct kvm_memory_slot old,
+				   int user_alloc)
+{
+}
+
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+				   struct kvm_memory_slot *slot)
+{
+}
+
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	int err;
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	if (!vcpu) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	return vcpu;
+free_vcpu:
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+	return ERR_PTR(err);
+}
+
+void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvm_arch_vcpu_free(vcpu);
+}
+
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int __attribute_const__ kvm_target_cpu(void)
+{
+	unsigned int midr;
+
+	midr = read_cpuid_id();
+	switch ((midr >> 4) & 0xfff) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		return KVM_ARM_TARGET_CORTEX_A15;
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
+void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	return -EINVAL;
+}
+
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vcpu_ioctl(struct file *filp,
+			 unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+	case KVM_ARM_VCPU_INIT: {
+		struct kvm_vcpu_init init;
+
+		if (copy_from_user(&init, argp, sizeof init))
+			return -EFAULT;
+
+		return kvm_vcpu_set_target(vcpu, &init);
+
+	}
+	case KVM_SET_ONE_REG:
+	case KVM_GET_ONE_REG: {
+		struct kvm_one_reg reg;
+		if (copy_from_user(&reg, argp, sizeof(reg)))
+			return -EFAULT;
+		if (ioctl == KVM_SET_ONE_REG)
+			return kvm_arm_set_reg(vcpu, &reg);
+		else
+			return kvm_arm_get_reg(vcpu, &reg);
+	}
+	case KVM_GET_REG_LIST: {
+		struct kvm_reg_list __user *user_list = argp;
+		struct kvm_reg_list reg_list;
+		unsigned n;
+
+		if (copy_from_user(&reg_list, user_list, sizeof reg_list))
+			return -EFAULT;
+		n = reg_list.n;
+		reg_list.n = kvm_arm_num_regs(vcpu);
+		if (copy_to_user(user_list, &reg_list, sizeof reg_list))
+			return -EFAULT;
+		if (n < reg_list.n)
+			return -E2BIG;
+		return kvm_arm_copy_reg_indices(vcpu, user_list->reg);
+	}
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+		       unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_init(void *opaque)
+{
+	return 0;
+}
+
+void kvm_arch_exit(void)
+{
+}
+
+static int arm_init(void)
+{
+	int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+	return rc;
+}
+
+static void __exit arm_exit(void)
+{
+	kvm_exit();
+}
+
+module_init(arm_init);
+module_exit(arm_exit);
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
new file mode 100644
index 0000000..4b9dad8
--- /dev/null
+++ b/arch/arm/kvm/coproc.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
new file mode 100644
index 0000000..690bbb3
--- /dev/null
+++ b/arch/arm/kvm/emulate.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <asm/kvm_emulate.h>
+
+#define REG_OFFSET(_reg) \
+	(offsetof(struct kvm_regs, _reg) / sizeof(u32))
+
+#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs[_num])
+
+static const unsigned long vcpu_reg_offsets[MODE_SYS + 1][16] = {
+	/* FIQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7),
+		REG_OFFSET(fiq_regs[0]), /* r8 */
+		REG_OFFSET(fiq_regs[1]), /* r9 */
+		REG_OFFSET(fiq_regs[2]), /* r10 */
+		REG_OFFSET(fiq_regs[3]), /* r11 */
+		REG_OFFSET(fiq_regs[4]), /* r12 */
+		REG_OFFSET(fiq_regs[5]), /* r13 */
+		REG_OFFSET(fiq_regs[6]), /* r14 */
+		REG_OFFSET(pc)		 /* r15 */
+	},
+
+	/* IRQ Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(irq_regs[0]), /* r13 */
+		REG_OFFSET(irq_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* SVC Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(svc_regs[0]), /* r13 */
+		REG_OFFSET(svc_regs[1]), /* r14 */
+		REG_OFFSET(pc)		 /* r15 */
+	},
+
+	/* ABT Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(abt_regs[0]), /* r13 */
+		REG_OFFSET(abt_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* UND Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(und_regs[0]), /* r13 */
+		REG_OFFSET(und_regs[1]), /* r14 */
+		REG_OFFSET(pc)	         /* r15 */
+	},
+
+	/* USR Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(usr_regs[13]), /* r13 */
+		REG_OFFSET(usr_regs[14]), /* r14 */
+		REG_OFFSET(pc)	          /* r15 */
+	},
+
+	/* SYS Registers */
+	{
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(usr_regs[13]), /* r13 */
+		REG_OFFSET(usr_regs[14]), /* r14 */
+		REG_OFFSET(pc)	          /* r15 */
+	},
+};
+
+/*
+ * Return a pointer to the register number valid in the specified mode of
+ * the virtual CPU.
+ */
+u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
+{
+	u32 *reg_array = (u32 *)&vcpu->arch.regs;
+
+	BUG_ON(reg_num > 15);
+	BUG_ON(mode > MODE_SYS);
+
+	return reg_array + vcpu_reg_offsets[mode][reg_num];
+}
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
new file mode 100644
index 0000000..3e38c95
--- /dev/null
+++ b/arch/arm/kvm/exports.c
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/module.h>
+
+EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
new file mode 100644
index 0000000..19a5389
--- /dev/null
+++ b/arch/arm/kvm/guest.c
@@ -0,0 +1,211 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <asm/uaccess.h>
+#include <asm/kvm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+
+#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
+#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ NULL }
+};
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+static u64 core_reg_offset_from_id(u64 id)
+{
+	return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
+}
+
+static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	u32 __user *uaddr = (u32 __user *)(long)reg->addr;
+	struct kvm_regs *regs = &vcpu->arch.regs;
+	u64 off;
+
+	if (KVM_REG_SIZE(reg->id) != 4)
+		return -ENOENT;
+
+	/* Our ID is an index into the kvm_regs struct. */
+	off = core_reg_offset_from_id(reg->id);
+	if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
+		return -ENOENT;
+
+	return put_user(((u32 *)regs)[off], uaddr);
+}
+
+static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	u32 __user *uaddr = (u32 __user *)(long)reg->addr;
+	struct kvm_regs *regs = &vcpu->arch.regs;
+	u64 off, val;
+
+	if (KVM_REG_SIZE(reg->id) != 4)
+		return -ENOENT;
+
+	/* Our ID is an index into the kvm_regs struct. */
+	off = core_reg_offset_from_id(reg->id);
+	if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
+		return -ENOENT;
+
+	if (get_user(val, uaddr) != 0)
+		return -EFAULT;
+
+	if (off == KVM_REG_ARM_CORE_REG(cpsr)) {
+		if (__vcpu_mode(val) == 0xf)
+			return -EINVAL;
+	}
+
+	((u32 *)regs)[off] = val;
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+static unsigned long num_core_regs(void)
+{
+	return sizeof(struct kvm_regs) / sizeof(u32);
+}
+
+/**
+ * kvm_arm_num_regs - how many registers do we present via KVM_GET_ONE_REG
+ *
+ * This is for all registers.
+ */
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
+{
+	return num_core_regs();
+}
+
+/**
+ * kvm_arm_copy_reg_indices - get indices of all registers.
+ *
+ * We do core registers right here, then we append coproc regs.
+ */
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
+{
+	unsigned int i;
+	const u64 core_reg = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE;
+
+	for (i = 0; i < sizeof(struct kvm_regs)/sizeof(u32); i++) {
+		if (put_user(core_reg | i, uindices))
+			return -EFAULT;
+		uindices++;
+	}
+
+	return 0;
+}
+
+int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	/* We currently use nothing arch-specific in upper 32 bits */
+	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM >> 32)
+		return -EINVAL;
+
+	/* Register group 16 means we want a core register. */
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
+		return get_core_reg(vcpu, reg);
+
+	return -EINVAL;
+}
+
+int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	/* We currently use nothing arch-specific in upper 32 bits */
+	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM >> 32)
+		return -EINVAL;
+
+	/* Register group 16 means we set a core register. */
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
+		return set_core_reg(vcpu, reg);
+
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init)
+{
+	unsigned int i;
+
+	/* We can only do a cortex A15 for now. */
+	if (init->target != kvm_target_cpu())
+		return -EINVAL;
+
+	vcpu->arch.target = init->target;
+	bitmap_zero(vcpu->arch.features, NUM_FEATURES);
+
+	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
+	for (i = 0; i < sizeof(init->features)*8; i++) {
+		if (init->features[i / 32] & (1 << (i % 32))) {
+			if (i >= NUM_FEATURES)
+				return -ENOENT;
+			set_bit(i, vcpu->arch.features);
+		}
+	}
+
+	/* Now we know what it is, we can reset it. */
+	return kvm_reset_vcpu(vcpu);
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+				  struct kvm_translation *tr)
+{
+	return -EINVAL;
+}
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/init.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/interrupts.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
new file mode 100644
index 0000000..10ed464
--- /dev/null
+++ b/arch/arm/kvm/mmu.c
@@ -0,0 +1,17 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
new file mode 100644
index 0000000..290a13d
--- /dev/null
+++ b/arch/arm/kvm/reset.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+
+#include <asm/unified.h>
+#include <asm/ptrace.h>
+#include <asm/cputype.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_coproc.h>
+
+/******************************************************************************
+ * Cortex-A15 Reset Values
+ */
+
+static const int a15_max_cpu_idx = 3;
+
+static struct kvm_regs a15_regs_reset = {
+	.cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
+};
+
+
+/*******************************************************************************
+ * Exported reset function
+ */
+
+/**
+ * kvm_reset_vcpu - sets core registers and cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architectually defined reset values.
+ */
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_regs *cpu_reset;
+
+	switch (vcpu->arch.target) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		if (vcpu->vcpu_id > a15_max_cpu_idx)
+			return -EINVAL;
+		cpu_reset = &a15_regs_reset;
+		vcpu->arch.midr = read_cpuid_id();
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	/* Reset core registers */
+	memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
+
+	/* Reset CP15 registers */
+	kvm_reset_coprocs(vcpu);
+
+	return 0;
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
new file mode 100644
index 0000000..f8869c1
--- /dev/null
+++ b/arch/arm/kvm/trace.h
@@ -0,0 +1,52 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+/*
+ * Tracepoints for entry/exit to guest
+ */
+TRACE_EVENT(kvm_entry,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+TRACE_EVENT(kvm_exit,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+
+
+#endif /* _TRACE_KVM_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH arch/arm/kvm
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d808694..a960f66 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -911,6 +911,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_ONE_REG		  _IOW(KVMIO,  0xac, struct kvm_one_reg)
 /* VM is being stopped by host */
 #define KVM_KVMCLOCK_CTRL	  _IO(KVMIO,   0xad)
+#define KVM_ARM_VCPU_INIT	  _IOW(KVMIO,  0xae, struct kvm_vcpu_init)
+#define KVM_GET_REG_LIST	  _IOWR(KVMIO, 0xb0, struct kvm_reg_list)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 07/15] KVM: ARM: Hypervisor initialization
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

Sets up KVM code to handle all exceptions taken to Hyp mode.

When the kernel is booted in Hyp mode, calling "hvc #0xff" with r0 pointing to
the new vectors changes the HVBAR to point to those vectors.  This allows
subsystems (like KVM here) to execute code in Hyp mode with the MMU disabled.

We initialize the other Hyp-mode registers and enable the MMU for Hyp mode from
the id-mapped hyp initialization code.  Afterwards, the HVBAR is changed to
point to the KVM Hyp vectors, which are used to catch guest faults and to enter
Hyp mode to perform a world-switch into a KVM guest.

If the KVM module is unloaded, we call "hvc #0xff" once more to disable the MMU
in Hyp mode again and to install a vector handler that allows the HVBAR to be
changed again on a subsequent reload of KVM or by another hypervisor.

Also provides memory mapping code to map required code pages, data structures,
and I/O regions accessed in Hyp mode at the same virtual addresses as the host
kernel uses, in a way that conforms to the architectural requirements for
translations in Hyp mode.  This interface is added in arch/arm/kvm/mmu.c and
comprises (a brief usage sketch follows the list):
 - create_hyp_mappings(from, to);
 - create_hyp_io_mappings(from, to, phys_addr);
 - free_hyp_pmds();
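
As an illustration of how this interface is meant to be used, a caller inside
KVM could look roughly like the following sketch.  The structure and function
names here are made up for the example; only create_hyp_mappings(),
create_hyp_io_mappings() and free_hyp_pmds() come from this patch.

/* Hypothetical caller -- for illustration only. */
static struct my_hyp_state hyp_state;		/* made-up per-subsystem data */

static int map_my_state_into_hyp(void *io_va, phys_addr_t io_pa)
{
	int err;

	/* Make hyp_state visible at the same VA when running in Hyp mode. */
	err = create_hyp_mappings(&hyp_state, &hyp_state + 1);
	if (err)
		return err;

	/* Map one page of a device at the same VA, backed by io_pa. */
	return create_hyp_io_mappings(io_va, io_va + PAGE_SIZE, io_pa);
}

All Hyp mappings are torn down collectively with free_hyp_pmds(), as the module
unload path in this patch does.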

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h              |  109 ++++++++++++
 arch/arm/include/asm/kvm_asm.h              |   20 ++
 arch/arm/include/asm/kvm_mmu.h              |   36 ++++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    4 
 arch/arm/kvm/arm.c                          |  236 +++++++++++++++++++++++++++
 arch/arm/kvm/exports.c                      |   16 ++
 arch/arm/kvm/init.S                         |  137 ++++++++++++++++
 arch/arm/kvm/interrupts.S                   |   48 +++++
 arch/arm/kvm/mmu.c                          |  189 ++++++++++++++++++++++
 mm/memory.c                                 |    2 
 10 files changed, 797 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 2f9d28e..6e46541 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -19,10 +19,119 @@
 #ifndef __ARM_KVM_ARM_H__
 #define __ARM_KVM_ARM_H__
 
+#include <asm/types.h>
+
 /* Supported Processor Types */
 #define CORTEX_A15	(0xC0F)
 
 /* Multiprocessor Affinity Register */
 #define MPIDR_CPUID	(0x3 << 0)
 
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE		(1 << 27)
+#define HCR_TVM		(1 << 26)
+#define HCR_TTLB	(1 << 25)
+#define HCR_TPU		(1 << 24)
+#define HCR_TPC		(1 << 23)
+#define HCR_TSW		(1 << 22)
+#define HCR_TAC		(1 << 21)
+#define HCR_TIDCP	(1 << 20)
+#define HCR_TSC		(1 << 19)
+#define HCR_TID3	(1 << 18)
+#define HCR_TID2	(1 << 17)
+#define HCR_TID1	(1 << 16)
+#define HCR_TID0	(1 << 15)
+#define HCR_TWE		(1 << 14)
+#define HCR_TWI		(1 << 13)
+#define HCR_DC		(1 << 12)
+#define HCR_BSU		(3 << 10)
+#define HCR_BSU_IS	(1 << 10)
+#define HCR_FB		(1 << 9)
+#define HCR_VA		(1 << 8)
+#define HCR_VI		(1 << 7)
+#define HCR_VF		(1 << 6)
+#define HCR_AMO		(1 << 5)
+#define HCR_IMO		(1 << 4)
+#define HCR_FMO		(1 << 3)
+#define HCR_PTW		(1 << 2)
+#define HCR_SWIO	(1 << 1)
+#define HCR_VM		1
+
+/*
+ * The bits we set in HCR:
+ * TAC:		Trap ACTLR
+ * TSC:		Trap SMC
+ * TSW:		Trap cache operations by set/way
+ * TWI:		Trap WFI
+ * TIDCP:	Trap L2CTLR/L2ECTLR
+ * BSU_IS:	Upgrade barriers to the inner shareable domain
+ * FB:		Force broadcast of all maintenance operations
+ * AMO:		Override CPSR.A and enable signaling with VA
+ * IMO:		Override CPSR.I and enable signaling with VI
+ * FMO:		Override CPSR.F and enable signaling with VF
+ * SWIO:	Turn set/way invalidates into set/way clean+invalidate
+ */
+#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
+			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+			HCR_SWIO | HCR_TIDCP)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE	(1 << 30)
+#define HSCTLR_EE	(1 << 25)
+#define HSCTLR_FI	(1 << 21)
+#define HSCTLR_WXN	(1 << 19)
+#define HSCTLR_I	(1 << 12)
+#define HSCTLR_C	(1 << 2)
+#define HSCTLR_A	(1 << 1)
+#define HSCTLR_M	1
+#define HSCTLR_MASK	(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+			 HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE	(1 << 31)
+#define TTBCR_IMP	(1 << 30)
+#define TTBCR_SH1	(3 << 28)
+#define TTBCR_ORGN1	(3 << 26)
+#define TTBCR_IRGN1	(3 << 24)
+#define TTBCR_EPD1	(1 << 23)
+#define TTBCR_A1	(1 << 22)
+#define TTBCR_T1SZ	(3 << 16)
+#define TTBCR_SH0	(3 << 12)
+#define TTBCR_ORGN0	(3 << 10)
+#define TTBCR_IRGN0	(3 << 8)
+#define TTBCR_EPD0	(1 << 7)
+#define TTBCR_T0SZ	3
+#define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
+
+/* Hyp Debug Configuration Register bits */
+#define HDCR_TDRA	(1 << 11)
+#define HDCR_TDOSA	(1 << 10)
+#define HDCR_TDA	(1 << 9)
+#define HDCR_TDE	(1 << 8)
+#define HDCR_HPME	(1 << 7)
+#define HDCR_TPM	(1 << 6)
+#define HDCR_TPMCR	(1 << 5)
+#define HDCR_HPMN_MASK	(0x1F)
+
+/* Virtualization Translation Control Register (VTCR) bits */
+#define VTCR_SH0	(3 << 12)
+#define VTCR_ORGN0	(3 << 10)
+#define VTCR_IRGN0	(3 << 8)
+#define VTCR_SL0	(3 << 6)
+#define VTCR_S		(1 << 4)
+#define VTCR_T0SZ	3
+#define VTCR_MASK	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0 | VTCR_SL0 | \
+			 VTCR_S | VTCR_T0SZ)
+#define VTCR_HTCR_SH	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0)
+#define VTCR_SL_L2	0		/* Starting-level: 2 */
+#define VTCR_SL_L1	(1 << 6)	/* Starting-level: 1 */
+#define VTCR_GUEST_SL	VTCR_SL_L1
+#define VTCR_GUEST_T0SZ	0
+#if VTCR_GUEST_SL == 0
+#define VTTBR_X		(14 - VTCR_GUEST_T0SZ)
+#else
+#define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
+#endif
+
+
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 44591f9..6c40e55 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -26,5 +26,25 @@
 #define ARM_EXCEPTION_DATA_ABORT  4
 #define ARM_EXCEPTION_IRQ	  5
 #define ARM_EXCEPTION_FIQ	  6
+#define ARM_EXCEPTION_HVC	  7
+
+#ifndef __ASSEMBLY__
+struct kvm_vcpu;
+
+extern char __kvm_hyp_init[];
+extern char __kvm_hyp_init_end[];
+
+extern char __kvm_hyp_exit[];
+extern char __kvm_hyp_exit_end[];
+
+extern char __kvm_hyp_vector[];
+
+extern char __kvm_hyp_code_start[];
+extern char __kvm_hyp_code_end[];
+
+extern void __kvm_flush_vm_context(void);
+
+extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+#endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
new file mode 100644
index 0000000..8252921
--- /dev/null
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_MMU_H__
+#define __ARM_KVM_MMU_H__
+
+/*
+ * The architecture supports 40-bit IPA as input to the 2nd stage translations
+ * and PTRS_PER_PGD2 could therefore be 1024 (each level-1 entry maps 1GB).
+ *
+ * To save a bit of memory and to avoid alignment issues we assume 39-bit IPA
+ * for now, which needs 512 level-1 entries (a 4KB table), but remember that
+ * the level-1 table must be aligned to its size.
+#define PTRS_PER_PGD2	512
+#define PGD2_ORDER	get_order(PTRS_PER_PGD2 * sizeof(pgd_t))
+
+int create_hyp_mappings(void *from, void *to);
+int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
+void free_hyp_pmds(void);
+
+#endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index a2d404e..18f5cef 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -32,6 +32,9 @@
 #define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)
 #define PMD_BIT4		(_AT(pmdval_t, 0))
 #define PMD_DOMAIN(x)		(_AT(pmdval_t, 0))
+#define PMD_APTABLE_SHIFT	(61)
+#define PMD_APTABLE		(_AT(pgdval_t, 3) << PMD_APTABLE_SHIFT)
+#define PMD_PXNTABLE		(_AT(pgdval_t, 1) << 59)
 
 /*
  *   - section
@@ -41,6 +44,7 @@
 #define PMD_SECT_S		(_AT(pmdval_t, 3) << 8)
 #define PMD_SECT_AF		(_AT(pmdval_t, 1) << 10)
 #define PMD_SECT_nG		(_AT(pmdval_t, 1) << 11)
+#define PMD_SECT_PXN		(_AT(pmdval_t, 1) << 53)
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index fd6fa9b..6f35aec 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -34,11 +34,22 @@
 #include <asm/ptrace.h>
 #include <asm/mman.h>
 #include <asm/cputype.h>
+#include <asm/idmap.h>
+#include <asm/tlbflush.h>
+#include <asm/virt.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
 #endif
 
+static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
+static unsigned long hyp_default_vectors;
+
+
 int kvm_arch_hardware_enable(void *garbage)
 {
 	return 0;
@@ -321,13 +332,238 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	return -EINVAL;
 }
 
+static void cpu_set_vector(void *vector)
+{
+	unsigned long vector_ptr;
+
+	vector_ptr = (unsigned long)vector;
+
+	/*
+	 * Set the HVBAR
+	 */
+	asm volatile (
+		"mov	r0, %[vector_ptr]\n\t"
+		"hvc	#0xff\n\t" : :
+		[vector_ptr] "r" (vector_ptr) :
+		"r0");
+}
+
+static void cpu_init_hyp_mode(void *vector)
+{
+	unsigned long pgd_ptr;
+	unsigned long hyp_stack_ptr;
+	unsigned long stack_page;
+	unsigned long vector_ptr;
+
+	/* Switch from the HYP stub to our own HYP init vector */
+	__hyp_set_vectors((unsigned long)vector);
+
+	pgd_ptr = virt_to_phys(hyp_pgd);
+	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
+	hyp_stack_ptr = stack_page + PAGE_SIZE;
+	vector_ptr = (unsigned long)__kvm_hyp_vector;
+
+	/*
+	 * Call initialization code, and switch to the full blown
+	 * HYP code. The init code corrupts r12, so set the clobber
+	 * list accordingly.
+	 */
+	asm volatile (
+		"mov	r0, %[pgd_ptr]\n\t"
+		"mov	r1, %[hyp_stack_ptr]\n\t"
+		"mov	r2, %[vector_ptr]\n\t"
+		"hvc	#0\n\t" : :
+		[pgd_ptr] "r" (pgd_ptr),
+		[hyp_stack_ptr] "r" (hyp_stack_ptr),
+		[vector_ptr] "r" (vector_ptr) :
+		"r0", "r1", "r2", "r12");
+}
+
+/**
+ * Inits Hyp-mode on all online CPUs
+ */
+static int init_hyp_mode(void)
+{
+	phys_addr_t init_phys_addr;
+	int cpu;
+	int err = 0;
+
+	/*
+	 * It is probably enough to obtain the default on one
+	 * CPU. It's unlikely to be different on the others.
+	 */
+	hyp_default_vectors = __hyp_get_vectors();
+
+	/*
+	 * Allocate stack pages for Hypervisor-mode
+	 */
+	for_each_possible_cpu(cpu) {
+		unsigned long stack_page;
+
+		stack_page = __get_free_page(GFP_KERNEL);
+		if (!stack_page) {
+			err = -ENOMEM;
+			goto out_free_stack_pages;
+		}
+
+		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
+	}
+
+	/*
+	 * Execute the init code on each CPU.
+	 *
+	 * Note: The stack is not mapped yet, so don't do anything else than
+	 * initializing the hypervisor mode on each CPU using a local stack
+	 * space for temporary storage.
+	 */
+	init_phys_addr = virt_to_phys(__kvm_hyp_init);
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, cpu_init_hyp_mode,
+					 (void *)(long)init_phys_addr, 1);
+	}
+
+	/*
+	 * Unmap the identity mapping
+	 */
+	hyp_idmap_teardown();
+
+	/*
+	 * Map the Hyp-code called directly from the host
+	 */
+	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
+	if (err) {
+		kvm_err("Cannot map world-switch code\n");
+		goto out_free_mappings;
+	}
+
+	/*
+	 * Map the Hyp stack pages
+	 */
+	for_each_possible_cpu(cpu) {
+		char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
+		err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE);
+
+		if (err) {
+			kvm_err("Cannot map hyp stack\n");
+			goto out_free_mappings;
+		}
+	}
+
+	/*
+	 * Map the host VFP structures
+	 */
+	kvm_host_vfp_state = alloc_percpu(struct vfp_hard_struct);
+	if (!kvm_host_vfp_state) {
+		err = -ENOMEM;
+		kvm_err("Cannot allocate host VFP state\n");
+		goto out_free_mappings;
+	}
+
+	for_each_possible_cpu(cpu) {
+		struct vfp_hard_struct *vfp;
+
+		vfp = per_cpu_ptr(kvm_host_vfp_state, cpu);
+		err = create_hyp_mappings(vfp, vfp + 1);
+
+		if (err) {
+			kvm_err("Cannot map host VFP state: %d\n", err);
+			goto out_free_vfp;
+		}
+	}
+
+	return 0;
+out_free_vfp:
+	free_percpu(kvm_host_vfp_state);
+out_free_mappings:
+	free_hyp_pmds();
+out_free_stack_pages:
+	for_each_possible_cpu(cpu)
+		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+	return err;
+}
+
+/**
+ * Initialize Hyp-mode and memory mappings on all CPUs.
+ */
 int kvm_arch_init(void *opaque)
 {
+	int err;
+
+	if (!is_hyp_mode_available()) {
+		kvm_err("HYP mode not available\n");
+		return -ENODEV;
+	}
+
+	if (kvm_target_cpu() < 0) {
+		kvm_err("Target CPU not supported!\n");
+		return -ENODEV;
+	}
+
+	err = init_hyp_mode();
+	if (err)
+		goto out_err;
+
+	return 0;
+out_err:
+	return err;
+}
+
+static void cpu_exit_hyp_mode(void *vector)
+{
+	cpu_set_vector(vector);
+
+	/*
+	 * Disable Hyp-MMU for each cpu, and switch back to the
+	 * default vectors.
+	 */
+	asm volatile ("mov	r0, %[vector_ptr]\n\t"
+		      "hvc	#0\n\t" : :
+		      [vector_ptr] "r" (hyp_default_vectors) :
+		      "r0");
+}
+
+static int exit_hyp_mode(void)
+{
+	phys_addr_t exit_phys_addr;
+	int cpu;
+
+	/*
+	 * TODO: flush Hyp TLB in case idmap code overlaps.
+	 * Note that we should do this in the monitor code when switching the
+	 * HVBAR, but that code is going away, so it should rather be done in the
+	 * Hyp-mode handler that changes HVBAR.
+	 */
+	hyp_idmap_setup();
+	exit_phys_addr = virt_to_phys(__kvm_hyp_exit);
+	BUG_ON(exit_phys_addr & 0x1f);
+
+	/*
+	 * Execute the exit code on each CPU.
+	 *
+	 * Note: The stack is not mapped yet, so don't do anything else than
+	 * Note: The Hyp stack mappings are about to go away, so don't do
+	 * anything else than tearing down hypervisor mode on each CPU, using
+	 * only local stack space for temporary storage.
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, cpu_exit_hyp_mode,
+					 (void *)(long)exit_phys_addr, 1);
+	}
+
 	return 0;
 }
 
 void kvm_arch_exit(void)
 {
+	int cpu;
+
+	exit_hyp_mode();
+
+	free_hyp_pmds();
+	free_percpu(kvm_host_vfp_state);
+	for_each_possible_cpu(cpu) {
+		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+		per_cpu(kvm_arm_hyp_stack_page, cpu) = 0;
+	}
 }
 
 static int arm_init(void)
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index 3e38c95..8ebdf07 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -17,5 +17,21 @@
  */
 
 #include <linux/module.h>
+#include <asm/kvm_asm.h>
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_init);
+EXPORT_SYMBOL_GPL(__kvm_hyp_init_end);
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_exit);
+EXPORT_SYMBOL_GPL(__kvm_hyp_exit_end);
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_vector);
+
+EXPORT_SYMBOL_GPL(__kvm_hyp_code_start);
+EXPORT_SYMBOL_GPL(__kvm_hyp_code_end);
+
+EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
+
+EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
 
 EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 1dc8926..3f209c8 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -15,5 +15,142 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/linkage.h>
+#include <asm/unified.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor initialization
+@    - should be called with:
+@        r0 = Hypervisor pgd pointer
+@        r1 = top of Hyp stack (kernel VA)
+@        r2 = pointer to hyp vectors
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	.text
+        .pushsection    .hyp.idmap.text,"ax"
+	.align 12
+__kvm_hyp_init:
+	.globl __kvm_hyp_init
+
+	@ Hyp-mode exception vector
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	__do_hyp_init
+	W(b)	.
+	W(b)	.
+
+__do_hyp_init:
+	@ Set the sp to end of this page and push data for later use
+ARM(	add	r12, pc, #(__kvm_init_sp - .)	)
+ARM(	sub	r12, r12, #8			)
+THUMB(	adr	r12, __kvm_init_sp		)
+	mov	sp, r12
+	push	{r0, r1, r2}
+
+	@ Set the HTTBR to point to the hypervisor PGD pointer passed to this
+	@ function and set the upper bits equal to the kernel PGD.
+	mrrc	p15, 1, r1, r2, c2
+	mcrr	p15, 4, r0, r2, c2
+
+	@ Set the HTCR and VTCR to the same shareability and cacheability
+	@ settings as the non-secure TTBCR and with T0SZ == 0.
+	mrc	p15, 4, r0, c2, c0, 2	@ HTCR
+	ldr	r12, =HTCR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c2, c0, 2	@ TTBCR
+	and	r1, r1, #(HTCR_MASK & ~TTBCR_T0SZ)
+	orr	r0, r0, r1
+	mcr	p15, 4, r0, c2, c0, 2	@ HTCR
+
+	mrc	p15, 4, r1, c2, c1, 2	@ VTCR
+	bic	r1, r1, #(VTCR_HTCR_SH | VTCR_SL0)
+	bic	r0, r0, #(~VTCR_HTCR_SH)
+	orr	r1, r0, r1
+	orr	r1, r1, #(VTCR_SL_L1 | VTCR_GUEST_T0SZ)
+	mcr	p15, 4, r1, c2, c1, 2	@ VTCR
+
+	@ Use the same memory attributes for hyp. accesses as the kernel
+	@ (copy MAIRx to HMAIRx).
+	mrc	p15, 0, r0, c10, c2, 0
+	mcr	p15, 4, r0, c10, c2, 0
+	mrc	p15, 0, r0, c10, c2, 1
+	mcr	p15, 4, r0, c10, c2, 1
+
+	@ Set the HSCTLR to:
+	@  - ARM/THUMB exceptions: Kernel config (Thumb-2 kernel)
+	@  - Endianness: Kernel config
+	@  - Fast Interrupt Features: Kernel config
+	@  - Write permission implies XN: disabled
+	@  - Instruction cache: enabled
+	@  - Data/Unified cache: enabled
+	@  - Memory alignment checks: enabled
+	@  - MMU: enabled (this code must be run from an identity mapping)
+	mrc	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	ldr	r12, =HSCTLR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c1, c0, 0	@ SCTLR
+	ldr	r12, =(HSCTLR_EE | HSCTLR_FI)
+	and	r1, r1, r12
+ ARM(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I)			)
+ THUMB(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I | HSCTLR_TE)	)
+	orr	r1, r1, r12
+	orr	r0, r0, r1
+	isb
+	mcr	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	isb
+
+	@ Set stack pointer and return to the kernel
+	pop	{r0, r1, r2}
+	mov	sp, r1
+
+	@ Set HVBAR to point to the HYP vectors
+	mcr	p15, 4, r2, c12, c0, 0	@ HVBAR
+
+	eret
+
+	.ltorg
+
+	.align 12
+
+	__kvm_init_sp:
+	.globl __kvm_hyp_init_end
+__kvm_hyp_init_end:
+
+	.align 12
+__kvm_hyp_exit:
+	.globl __kvm_hyp_exit
+
+	@ Hyp-mode exception vector
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	__do_hyp_exit
+	W(b)	.
+	W(b)	.
+
+__do_hyp_exit:
+	@ Set the next HVBAR (normally the default vectors)
+	mcr	p15, 4, r0, c12, c0, 0	@ HVBAR
+
+	@ Clear the MMU bit in the HSCTLR
+	mrc	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	bic	r0, r0, #HSCTLR_M
+
+	isb
+	mcr	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	mcr	p15, 4, r0, c8, c7, 0   @ Flush Hyp TLB, r0 ignored
+	isb
+	eret
+
+	.globl __kvm_hyp_exit_end
+__kvm_hyp_exit_end:
+
+	.popsection
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 1dc8926..bf09801 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -15,5 +15,53 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/unified.h>
+#include <asm/page.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+	.text
+	.align	PAGE_SHIFT
+
+__kvm_hyp_code_start:
+	.globl __kvm_hyp_code_start
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Flush TLBs and instruction caches of current CPU for all VMIDs
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_flush_vm_context)
+	bx	lr
+ENDPROC(__kvm_flush_vm_context)
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor world-switch code
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_vcpu_run)
+	bx	lr
+
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Hypervisor exception vector and handlers
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+	.align 5
+__kvm_hyp_vector:
+	.globl __kvm_hyp_vector
+	nop
+
+/*
+ * The lines below make sure the HYP mode code fits in a single page (the
+ * assembler will bark at you if it doesn't). Please keep them together. If
+ * you plan to restructure the code or increase its size over a page, you'll
+ * have to fix the code in init_hyp_mode().
+ */
+__kvm_hyp_code_end:
+	.globl	__kvm_hyp_code_end
+
+	.org	__kvm_hyp_code_start + PAGE_SIZE
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 10ed464..6a7dfd4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -15,3 +15,192 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/mman.h>
+#include <linux/kvm_host.h>
+#include <linux/io.h>
+#include <asm/idmap.h>
+#include <asm/pgalloc.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+#include <asm/mach/map.h>
+
+static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
+
+static void free_ptes(pmd_t *pmd, unsigned long addr)
+{
+	pte_t *pte;
+	unsigned int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			pte_free_kernel(NULL, pte);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * free_hyp_pmds - free the Hyp-mode level-2 tables and child level-3 tables
+ *
+ * Assumes this is a page table used strictly in Hyp-mode and therefore contains
+ * only mappings in the kernel memory area, which is above PAGE_OFFSET.
+ */
+void free_hyp_pmds(void)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = PAGE_OFFSET; addr != 0; addr += PGDIR_SIZE) {
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none(*pud))
+			continue;
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+		pud_clear(pud);
+	}
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+}
+
+/*
+ * Create a HYP pte mapping.
+ *
+ * If pfn_base is NULL, we map kernel pages into HYP with the virtual
+ * address. Otherwise, this is considered an I/O mapping and we map
+ * the physical region starting at *pfn_base to [start, end[.
+ */
+static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
+				    unsigned long end, unsigned long *pfn_base)
+{
+	pte_t *pte;
+	unsigned long addr;
+	pgprot_t prot;
+
+	if (pfn_base)
+		prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER);
+	else
+		prot = PAGE_HYP;
+
+	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
+		pte = pte_offset_kernel(pmd, addr);
+		if (pfn_base) {
+			BUG_ON(pfn_valid(*pfn_base));
+			set_pte_ext(pte, pfn_pte(*pfn_base, prot), 0);
+			(*pfn_base)++;
+		} else {
+			struct page *page;
+			BUG_ON(!virt_addr_valid(addr));
+			page = virt_to_page(addr);
+			set_pte_ext(pte, mk_pte(page, prot), 0);
+		}
+
+	}
+}
+
+static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
+				   unsigned long end, unsigned long *pfn_base)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long addr, next;
+
+	for (addr = start; addr < end; addr = next) {
+		pmd = pmd_offset(pud, addr);
+
+		BUG_ON(pmd_sect(*pmd));
+
+		if (pmd_none(*pmd)) {
+			pte = pte_alloc_one_kernel(NULL, addr);
+			if (!pte) {
+				kvm_err("Cannot allocate Hyp pte\n");
+				return -ENOMEM;
+			}
+			pmd_populate_kernel(NULL, pmd, pte);
+		}
+
+		next = pmd_addr_end(addr, end);
+		create_hyp_pte_mappings(pmd, addr, next, pfn_base);
+	}
+
+	return 0;
+}
+
+static int __create_hyp_mappings(void *from, void *to, unsigned long *pfn_base)
+{
+	unsigned long start = (unsigned long)from;
+	unsigned long end = (unsigned long)to;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr, next;
+	int err = 0;
+
+	BUG_ON(start > end);
+	if (start < PAGE_OFFSET)
+		return -EINVAL;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = start; addr < end; addr = next) {
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none_or_clear_bad(pud)) {
+			pmd = pmd_alloc_one(NULL, addr);
+			if (!pmd) {
+				kvm_err("Cannot allocate Hyp pmd\n");
+				err = -ENOMEM;
+				goto out;
+			}
+			pud_populate(NULL, pud, pmd);
+		}
+
+		next = pgd_addr_end(addr, end);
+		err = create_hyp_pmd_mappings(pud, addr, next, pfn_base);
+		if (err)
+			goto out;
+	}
+out:
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+	return err;
+}
+
+/**
+ * create_hyp_mappings - map a kernel virtual address range in Hyp mode
+ * @from:	The virtual kernel start address of the range
+ * @to:		The virtual kernel end address of the range (exclusive)
+ *
+ * The same virtual address as the kernel virtual address is also used in
+ * Hyp-mode mapping to the same underlying physical pages.
+ *
+ * Note: Wrapping around zero in the "to" address is not supported.
+ */
+int create_hyp_mappings(void *from, void *to)
+{
+	return __create_hyp_mappings(from, to, NULL);
+}
+
+/**
+ * create_hyp_io_mappings - map a physical IO range in Hyp mode
+ * @from:	The virtual HYP start address of the range
+ * @to:		The virtual HYP end address of the range (exclusive)
+ * @addr:	The physical start address which gets mapped
+ */
+int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
+{
+	unsigned long pfn = __phys_to_pfn(addr);
+	return __create_hyp_mappings(from, to, &pfn);
+}
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
diff --git a/mm/memory.c b/mm/memory.c
index 5736170..0e58fdd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -383,12 +383,14 @@ void pgd_clear_bad(pgd_t *pgd)
 	pgd_ERROR(*pgd);
 	pgd_clear(pgd);
 }
+EXPORT_SYMBOL_GPL(pgd_clear_bad);
 
 void pud_clear_bad(pud_t *pud)
 {
 	pud_ERROR(*pud);
 	pud_clear(pud);
 }
+EXPORT_SYMBOL_GPL(pud_clear_bad);
 
 void pmd_clear_bad(pmd_t *pmd)
 {
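
For illustration, a minimal usage sketch of the Hyp mapping helpers added
above; the buffer name and the addresses are made up for this sketch and are
not part of the patch:

/* Hypothetical usage of create_hyp_mappings()/create_hyp_io_mappings(). */
static char example_hyp_buf[PAGE_SIZE];

static int example_map_into_hyp(void)
{
	int err;

	/* The buffer's kernel VA becomes valid at the same VA in Hyp mode. */
	err = create_hyp_mappings(example_hyp_buf,
				  example_hyp_buf + sizeof(example_hyp_buf));
	if (err)
		return err;

	/* Map one page of device registers; both addresses are made up. */
	return create_hyp_io_mappings((void *)0xfe000000, (void *)0xfe001000,
				      0x10000000);
}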


^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 08/15] KVM: ARM: Memory virtualization setup
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

From: Christoffer Dall <cdall@cs.columbia.edu>

This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);
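
For illustration, a minimal sketch of how the VM lifecycle hooks are expected
to use these helpers; it is condensed from the kvm_arch_init_vm() and
kvm_arch_destroy_vm() hunks in the diff below, with error paths trimmed:

/* Condensed sketch of the arm.c hunks below, not a drop-in replacement. */
int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
	int ret;

	if (type)
		return -EINVAL;

	ret = kvm_alloc_stage2_pgd(kvm);	/* level-1 stage-2 table */
	if (ret)
		return ret;
	spin_lock_init(&kvm->arch.pgd_lock);

	/* Make the struct kvm itself visible to code running in Hyp mode. */
	ret = create_hyp_mappings(kvm, kvm + 1);
	if (ret) {
		kvm_free_stage2_pgd(kvm);
		return ret;
	}

	kvm->arch.vmid_gen = 0;		/* force a VMID allocation on first run */
	return 0;
}

void kvm_arch_destroy_vm(struct kvm *kvm)
{
	kvm_free_stage2_pgd(kvm);	/* walks and frees all stage-2 tables */
}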

Each entry in the TLBs and caches is tagged with a VMID identifier in
addition to ASIDs. The VMIDs are assigned consecutively to VMs in the
order the VMs are executed, and caches and TLBs are invalidated when the
VMID space has been exhausted, which allows more than 255 simultaneously
running guests.
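
To make the recycling scheme concrete, a rough sketch follows. The real logic
only lands with the world-switch code later in the series; the names
kvm_vmid_gen, kvm_next_vmid, VMID_MAX and the vmid field are illustrative and
are not symbols introduced by this patch:

/* Illustrative sketch only; assumes an 8-bit VMID space and per-VM
 * vmid/vmid_gen bookkeeping as described above. Locking is omitted. */
#define VMID_MAX	255				/* hypothetical */
static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);	/* hypothetical */
static u32 kvm_next_vmid = 1;				/* hypothetical */

static void update_vmid(struct kvm *kvm)
{
	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
		return;		/* VMID still valid for this generation */

	if (kvm_next_vmid > VMID_MAX) {
		/* VMID space exhausted: start a new generation and flush
		 * TLBs/caches for all VMIDs. */
		atomic64_inc(&kvm_vmid_gen);
		kvm_next_vmid = 1;
		__kvm_flush_vm_context();
	}

	kvm->arch.vmid = kvm_next_vmid++;
	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
}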

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.

We pre-allocate page table memory to be able to synchronize using a
spinlock and be called under rcu_read_lock from the MMU notifiers.  We
steal the mmu_memory_cache implementation from x86 and adapt for our
specific usage.
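
The resulting pattern, condensed from kvm_phys_addr_ioremap() in the diff
below (the helper name stage2_map_one_page is made up for this sketch), is to
fill the cache while sleeping and allocating are still allowed, and to only
consume pre-allocated pages under the lock:

/* Condensed sketch; the real loop is in kvm_phys_addr_ioremap() below. */
static int stage2_map_one_page(struct kvm *kvm, phys_addr_t addr, pte_t pte)
{
	struct kvm_mmu_memory_cache cache = { 0, };
	int ret;

	/* May sleep and allocate with GFP flags. */
	ret = mmu_topup_memory_cache(&cache, 2, 2);
	if (ret)
		return ret;

	/* No allocation here: stage2_set_pte() only pops from the cache. */
	spin_lock(&kvm->arch.pgd_lock);
	stage2_set_pte(kvm, &cache, addr, &pte);
	spin_unlock(&kvm->arch.pgd_lock);

	mmu_free_memory_cache(&cache);
	return 0;
}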

We support MMU notifiers (thanks to Marc Zyngier) through
kvm_unmap_hva and kvm_set_spte_hva.

Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
which is used by VGIC support to map the virtual CPU interface registers
to the guest. This support is added by Marc Zyngier.
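
As a usage illustration (the guest IPA and host physical address below are
made-up example values; the real call sites are in the VGIC patches), mapping
one page of the GIC virtual CPU interface into a guest would look roughly
like:

/* Hypothetical example values, for illustration only. */
static int example_map_vgic_cpu_if(struct kvm *kvm)
{
	return kvm_phys_addr_ioremap(kvm,
				     0x2c002000,	/* guest IPA (made up) */
				     0x2c006000,	/* host PA (made up) */
				     PAGE_SIZE);
}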

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_asm.h  |    2 
 arch/arm/include/asm/kvm_host.h |   18 ++
 arch/arm/include/asm/kvm_mmu.h  |    9 +
 arch/arm/kvm/Kconfig            |    1 
 arch/arm/kvm/arm.c              |   38 ++++
 arch/arm/kvm/exports.c          |    1 
 arch/arm/kvm/interrupts.S       |    8 +
 arch/arm/kvm/mmu.c              |  377 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 453 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 6c40e55..201ec1f 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -29,6 +29,7 @@
 #define ARM_EXCEPTION_HVC	  7
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -43,6 +44,7 @@ extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 24959f4..f0c72b9 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -169,4 +169,22 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 struct kvm_one_reg;
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8252921..11f4c3a 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -33,4 +33,13 @@ int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index a07ddcc..47c5500 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM_ARM_HOST
 	depends on KVM
 	depends on MMU
 	depends on CPU_V7 && ARM_VIRT_EXT
+	select	MMU_NOTIFIER
 	---help---
 	  Provides host support for ARM processors.
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6f35aec..b97ebd0 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -82,12 +82,34 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	int ret = 0;
+
 	if (type)
 		return -EINVAL;
 
-	return 0;
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+	spin_lock_init(&kvm->arch.pgd_lock);
+
+	ret = create_hyp_mappings(kvm, kvm + 1);
+	if (ret)
+		goto out_free_stage2_pgd;
+
+	/* Mark the initial VMID generation invalid */
+	kvm->arch.vmid_gen = 0;
+
+	return ret;
+out_free_stage2_pgd:
+	kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+	return ret;
 }
 
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -105,10 +127,16 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 	return 0;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -190,7 +218,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	err = create_hyp_mappings(vcpu, vcpu + 1);
+	if (err)
+		goto vcpu_uninit;
+
 	return vcpu;
+vcpu_uninit:
+	kvm_vcpu_uninit(vcpu);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 out:
@@ -199,6 +233,8 @@ out:
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	kvm_mmu_free_memory_caches(vcpu);
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index 8ebdf07..f39f823 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -33,5 +33,6 @@ EXPORT_SYMBOL_GPL(__kvm_hyp_code_end);
 EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
 
 EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
+EXPORT_SYMBOL_GPL(__kvm_tlb_flush_vmid);
 
 EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index bf09801..edf9ed5 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -31,6 +31,14 @@ __kvm_hyp_code_start:
 	.globl __kvm_hyp_code_start
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Flush per-VMID TLBs
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_tlb_flush_vmid)
+	bx	lr
+ENDPROC(__kvm_tlb_flush_vmid)
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Flush TLBs and instruction caches of current CPU for all VMIDs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6a7dfd4..ea17a97 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -23,10 +23,43 @@
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_asm.h>
 #include <asm/mach/map.h>
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  int min, int max)
+{
+	void *page;
+
+	BUG_ON(max > KVM_NR_MEM_OBJS);
+	if (cache->nobjs >= min)
+		return 0;
+	while (cache->nobjs < max) {
+		page = (void *)__get_free_page(PGALLOC_GFP);
+		if (!page)
+			return -ENOMEM;
+		cache->objects[cache->nobjs++] = page;
+	}
+	return 0;
+}
+
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+	while (mc->nobjs)
+		free_page((unsigned long)mc->objects[--mc->nobjs]);
+}
+
+static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+{
+	void *p;
+
+	BUG_ON(!mc || !mc->nobjs);
+	p = mc->objects[--mc->nobjs];
+	return p;
+}
+
 static void free_ptes(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
@@ -200,7 +233,351 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
 	return __create_hyp_mappings(from, to, &pfn);
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates only the 1st level table, whose size is defined by PGD2_ORDER (it
+ * can support either full 40-bit input addresses or be limited to 32-bit
+ * input addresses). Clears the allocated pages.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * created, which can only be done once.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+
+	if (kvm->arch.pgd != NULL) {
+		kvm_err("kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
+	if (!pgd)
+		return -ENOMEM;
+
+	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
+	clean_dcache_area(pgd, PTRS_PER_PGD2 * sizeof(pgd_t));
+	kvm->arch.pgd = pgd;
+
+	return 0;
+}
+
+static void free_guest_pages(pte_t *pte, unsigned long addr)
+{
+	unsigned int i;
+	struct page *pte_page;
+
+	pte_page = virt_to_page(pte);
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (pte_present(*pte))
+			put_page(pte_page);
+		pte++;
+	}
+
+	WARN_ON(page_count(pte_page) != 1);
+}
+
+static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
+{
+	unsigned int i;
+	pte_t *pte;
+	struct page *pmd_page;
+
+	pmd_page = virt_to_page(pmd);
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		BUG_ON(pmd_sect(*pmd));
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			free_guest_pages(pte, addr);
+			pte_free_kernel(NULL, pte);
+
+			put_page(pmd_page);
+		}
+		pmd++;
+	}
+
+	WARN_ON(page_count(pmd_page) != 1);
+}
+
+/**
+ * kvm_free_stage2_pgd - free all stage-2 tables
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
+ * underlying level-2 and level-3 tables before freeing the actual level-1 table
+ * and setting the struct pointer to NULL.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * destroyed, which can only be done once.
+ */
+void kvm_free_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long long i, addr;
+	struct page *pud_page;
+
+	if (kvm->arch.pgd == NULL)
+		return;
+
+	/*
+	 * We do this slightly differently than in other places, since we need
+	 * more than 32 bits and, for instance, pgd_addr_end converts to unsigned long.
+	 */
+	addr = 0;
+	for (i = 0; i < PTRS_PER_PGD2; i++) {
+		addr = i * (unsigned long long)PGDIR_SIZE;
+		pgd = kvm->arch.pgd + i;
+		pud = pud_offset(pgd, addr);
+		pud_page = virt_to_page(pud);
+
+		if (pud_none(*pud))
+			continue;
+
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_stage2_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+		put_page(pud_page);
+	}
+
+	WARN_ON(page_count(pud_page) != 1);
+	free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
+	kvm->arch.pgd = NULL;
+}
+
+/**
+ * stage2_clear_pte -- Clear a stage-2 PTE.
+ * @kvm:  The VM pointer
+ * @addr: The physical address of the PTE
+ *
+ * Clear a stage-2 PTE, lowering the various ref-counts. Also takes
+ * care of invalidating the TLBs.  Must be called while holding
+ * pgd_lock; otherwise another faulting VCPU may come in and mess
+ * things up behind our back.
+ */
+static void stage2_clear_pte(struct kvm *kvm, phys_addr_t addr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	struct page *page;
+
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		return;
+
+	pmd = pmd_offset(pud, addr);
+	if (pmd_none(*pmd))
+		return;
+
+	pte = pte_offset_kernel(pmd, addr);
+	set_pte_ext(pte, __pte(0), 0);
+
+	page = virt_to_page(pte);
+	put_page(page);
+	if (page_count(page) != 1) {
+		__kvm_tlb_flush_vmid(kvm);
+		return;
+	}
+
+	/* Need to remove pte page */
+	pmd_clear(pmd);
+	pte_free_kernel(NULL, (pte_t *)((unsigned long)pte & PAGE_MASK));
+
+	page = virt_to_page(pmd);
+	put_page(page);
+	if (page_count(page) != 1) {
+		__kvm_tlb_flush_vmid(kvm);
+		return;
+	}
+
+	pud_clear(pud);
+	pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
+
+	page = virt_to_page(pud);
+	put_page(page);
+	__kvm_tlb_flush_vmid(kvm);
+}
+
+static void stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+			   phys_addr_t addr, const pte_t *new_pte)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte, old_pte;
+
+	/* Create 2nd stage page table mapping - Level 1 */
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		if (!cache)
+			return; /* ignore calls from kvm_set_spte_hva */
+		pmd = mmu_memory_cache_alloc(cache);
+		pud_populate(NULL, pud, pmd);
+		pmd += pmd_index(addr);
+		get_page(virt_to_page(pud));
+	} else
+		pmd = pmd_offset(pud, addr);
+
+	/* Create 2nd stage page table mapping - Level 2 */
+	if (pmd_none(*pmd)) {
+		if (!cache)
+			return; /* ignore calls from kvm_set_spte_hva */
+		pte = mmu_memory_cache_alloc(cache);
+		clean_pte_table(pte);
+		pmd_populate_kernel(NULL, pmd, pte);
+		pte += pte_index(addr);
+		get_page(virt_to_page(pmd));
+	} else
+		pte = pte_offset_kernel(pmd, addr);
+
+	/* Create 2nd stage page table mapping - Level 3 */
+	old_pte = *pte;
+	set_pte_ext(pte, *new_pte, 0);
+	if (pte_present(old_pte))
+		__kvm_tlb_flush_vmid(kvm);
+	else
+		get_page(virt_to_page(pte));
+}
+
+/**
+ * kvm_phys_addr_ioremap - map a device range to guest IPA
+ *
+ * @kvm:	The KVM pointer
+ * @guest_ipa:	The IPA at which to insert the mapping
+ * @pa:		The physical address of the device
+ * @size:	The size of the mapping
+ */
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size)
+{
+	phys_addr_t addr, end;
+	pgprot_t prot;
+	int ret = 0;
+	unsigned long pfn;
+	struct kvm_mmu_memory_cache cache = { 0, };
+
+	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
+	prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER |
+			L_PTE2_READ | L_PTE2_WRITE);
+	pfn = __phys_to_pfn(pa);
+
+	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
+		pte_t pte = pfn_pte(pfn, prot);
+
+		ret = mmu_topup_memory_cache(&cache, 2, 2);
+		if (ret)
+			goto out;
+		spin_lock(&kvm->arch.pgd_lock);
+		stage2_set_pte(kvm, &cache, addr, &pte);
+		spin_unlock(&kvm->arch.pgd_lock);
+
+		pfn++;
+	}
+
+out:
+	mmu_free_memory_cache(&cache);
+	return ret;
+}
+
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return -EINVAL;
 }
+
+static void handle_hva_to_gpa(struct kvm *kvm, unsigned long hva,
+			      void (*handler)(struct kvm *kvm, unsigned long hva,
+					      gpa_t gpa, void *data),
+			      void *data)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+
+	slots = kvm_memslots(kvm);
+
+	/* we only care about the pages that the guest sees */
+	kvm_for_each_memslot(memslot, slots) {
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gpa_t gpa;
+			gpa_t gpa_offset = hva - start;
+			gpa = (memslot->base_gfn << PAGE_SHIFT) + gpa_offset;
+			handler(kvm, hva, gpa, data);
+		}
+	}
+}
+
+static void kvm_unmap_hva_handler(struct kvm *kvm, unsigned long hva,
+				  gpa_t gpa, void *data)
+{
+	spin_lock(&kvm->arch.pgd_lock);
+	stage2_clear_pte(kvm, gpa);
+	spin_unlock(&kvm->arch.pgd_lock);
+}
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	if (!kvm->arch.pgd)
+		return 0;
+
+	handle_hva_to_gpa(kvm, hva, &kvm_unmap_hva_handler, NULL);
+
+	return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+	int ret;
+
+	BUG_ON((start | end) & (~PAGE_MASK));
+
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		ret = kvm_unmap_hva(kvm, addr);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static void kvm_set_spte_handler(struct kvm *kvm, unsigned long hva,
+				 gpa_t gpa, void *data)
+{
+	pte_t *pte = (pte_t *)data;
+
+	spin_lock(&kvm->arch.pgd_lock);
+	stage2_set_pte(kvm, NULL, gpa, pte);
+	spin_unlock(&kvm->arch.pgd_lock);
+}
+
+
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+	pte_t stage2_pte;
+
+	if (!kvm->arch.pgd)
+		return;
+
+	stage2_pte = pfn_pte(pte_pfn(pte), PAGE_KVM_GUEST);
+	handle_hva_to_gpa(kvm, hva, &kvm_set_spte_handler, &stage2_pte);
+}
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
+{
+	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+}


^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 08/15] KVM: ARM: Memory virtualization setup
@ 2012-09-15 15:35   ` Christoffer Dall
  0 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

From: Christoffer Dall <cdall@cs.columbia.edu>

This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/arm_mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Each entry in the TLBs and caches is tagged with a VMID identifier in
addition to the ASID. The VMIDs are assigned consecutively to VMs in the
order that the VMs are executed, and caches and TLBs are invalidated
when the VMID space has been exhausted, allowing more than 255
simultaneously running guests.

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.

We pre-allocate page table memory so that we can synchronize using a
spinlock and can be called under rcu_read_lock from the MMU notifiers.
We steal the mmu_memory_cache implementation from x86 and adapt it for
our specific usage.
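
A rough sketch of how that cache is meant to be used (all function and
field names below are from this patch; kvm, pfn, gpa and ret stand for
values the caller already has): fill the cache in a context that may
sleep, then consume it with the pgd_lock held:

	struct kvm_mmu_memory_cache cache = { 0, };
	pte_t new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);

	ret = mmu_topup_memory_cache(&cache, 2, 2);	/* may allocate/sleep */
	if (!ret) {
		spin_lock(&kvm->arch.pgd_lock);
		stage2_set_pte(kvm, &cache, gpa, &new_pte);	/* never sleeps */
		spin_unlock(&kvm->arch.pgd_lock);
	}
	mmu_free_memory_cache(&cache);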

We support MMU notifiers (thanks to Marc Zyngier) through
kvm_unmap_hva and kvm_set_spte_hva.

Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
which is used by VGIC support to map the virtual CPU interface registers
to the guest. This support is added by Marc Zyngier.
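
As a usage sketch (the guest IPA below is a made-up example address, not
one mandated by this series), the VGIC code could map the 4K virtual CPU
interface page into the guest roughly like this:

	/* vcpu_if_pa: host physical address of the GIC virtual CPU interface */
	ret = kvm_phys_addr_ioremap(kvm, 0x2c002000, vcpu_if_pa, PAGE_SIZE);
	if (ret)
		kvm_err("Unable to map the virtual CPU interface\n");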

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_asm.h  |    2 
 arch/arm/include/asm/kvm_host.h |   18 ++
 arch/arm/include/asm/kvm_mmu.h  |    9 +
 arch/arm/kvm/Kconfig            |    1 
 arch/arm/kvm/arm.c              |   38 ++++
 arch/arm/kvm/exports.c          |    1 
 arch/arm/kvm/interrupts.S       |    8 +
 arch/arm/kvm/mmu.c              |  377 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 453 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 6c40e55..201ec1f 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -29,6 +29,7 @@
 #define ARM_EXCEPTION_HVC	  7
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -43,6 +44,7 @@ extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 24959f4..f0c72b9 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -169,4 +169,22 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 struct kvm_one_reg;
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8252921..11f4c3a 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -33,4 +33,13 @@ int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index a07ddcc..47c5500 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM_ARM_HOST
 	depends on KVM
 	depends on MMU
 	depends on CPU_V7 && ARM_VIRT_EXT
+	select	MMU_NOTIFIER
 	---help---
 	  Provides host support for ARM processors.
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6f35aec..b97ebd0 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -82,12 +82,34 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	int ret = 0;
+
 	if (type)
 		return -EINVAL;
 
-	return 0;
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+	spin_lock_init(&kvm->arch.pgd_lock);
+
+	ret = create_hyp_mappings(kvm, kvm + 1);
+	if (ret)
+		goto out_free_stage2_pgd;
+
+	/* Mark the initial VMID generation invalid */
+	kvm->arch.vmid_gen = 0;
+
+	return ret;
+out_free_stage2_pgd:
+	kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+	return ret;
 }
 
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -105,10 +127,16 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 	return 0;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -190,7 +218,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	err = create_hyp_mappings(vcpu, vcpu + 1);
+	if (err)
+		goto vcpu_uninit;
+
 	return vcpu;
+vcpu_uninit:
+	kvm_vcpu_uninit(vcpu);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 out:
@@ -199,6 +233,8 @@ out:
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	kvm_mmu_free_memory_caches(vcpu);
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index 8ebdf07..f39f823 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -33,5 +33,6 @@ EXPORT_SYMBOL_GPL(__kvm_hyp_code_end);
 EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
 
 EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
+EXPORT_SYMBOL_GPL(__kvm_tlb_flush_vmid);
 
 EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index bf09801..edf9ed5 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -31,6 +31,14 @@ __kvm_hyp_code_start:
 	.globl __kvm_hyp_code_start
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Flush per-VMID TLBs
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+ENTRY(__kvm_tlb_flush_vmid)
+	bx	lr
+ENDPROC(__kvm_tlb_flush_vmid)
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Flush TLBs and instruction caches of current CPU for all VMIDs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6a7dfd4..ea17a97 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -23,10 +23,43 @@
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_asm.h>
 #include <asm/mach/map.h>
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  int min, int max)
+{
+	void *page;
+
+	BUG_ON(max > KVM_NR_MEM_OBJS);
+	if (cache->nobjs >= min)
+		return 0;
+	while (cache->nobjs < max) {
+		page = (void *)__get_free_page(PGALLOC_GFP);
+		if (!page)
+			return -ENOMEM;
+		cache->objects[cache->nobjs++] = page;
+	}
+	return 0;
+}
+
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+	while (mc->nobjs)
+		free_page((unsigned long)mc->objects[--mc->nobjs]);
+}
+
+static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+{
+	void *p;
+
+	BUG_ON(!mc || !mc->nobjs);
+	p = mc->objects[--mc->nobjs];
+	return p;
+}
+
 static void free_ptes(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
@@ -200,7 +233,351 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
 	return __create_hyp_mappings(from, to, &pfn);
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates only the 1st level table, whose size is defined by PGD2_ORDER
+ * (it can support either full 40-bit input addresses or be limited to
+ * 32-bit input addresses). Clears the allocated pages.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * created, which can only be done once.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+
+	if (kvm->arch.pgd != NULL) {
+		kvm_err("kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD2_ORDER);
+	if (!pgd)
+		return -ENOMEM;
+
+	memset(pgd, 0, PTRS_PER_PGD2 * sizeof(pgd_t));
+	clean_dcache_area(pgd, PTRS_PER_PGD2 * sizeof(pgd_t));
+	kvm->arch.pgd = pgd;
+
+	return 0;
+}
+
+static void free_guest_pages(pte_t *pte, unsigned long addr)
+{
+	unsigned int i;
+	struct page *pte_page;
+
+	pte_page = virt_to_page(pte);
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (pte_present(*pte))
+			put_page(pte_page);
+		pte++;
+	}
+
+	WARN_ON(page_count(pte_page) != 1);
+}
+
+static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
+{
+	unsigned int i;
+	pte_t *pte;
+	struct page *pmd_page;
+
+	pmd_page = virt_to_page(pmd);
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		BUG_ON(pmd_sect(*pmd));
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			free_guest_pages(pte, addr);
+			pte_free_kernel(NULL, pte);
+
+			put_page(pmd_page);
+		}
+		pmd++;
+	}
+
+	WARN_ON(page_count(pmd_page) != 1);
+}
+
+/**
+ * kvm_free_stage2_pgd - free all stage-2 tables
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
+ * underlying level-2 and level-3 tables before freeing the actual level-1 table
+ * and setting the struct pointer to NULL.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * destroyed, which can only be done once.
+ */
+void kvm_free_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long long i, addr;
+	struct page *pud_page;
+
+	if (kvm->arch.pgd == NULL)
+		return;
+
+	/*
+	 * We do this slightly differently than in other places, since we need
+	 * more than 32 bits and, for instance, pgd_addr_end converts to unsigned long.
+	 */
+	addr = 0;
+	for (i = 0; i < PTRS_PER_PGD2; i++) {
+		addr = i * (unsigned long long)PGDIR_SIZE;
+		pgd = kvm->arch.pgd + i;
+		pud = pud_offset(pgd, addr);
+		pud_page = virt_to_page(pud);
+
+		if (pud_none(*pud))
+			continue;
+
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_stage2_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+		put_page(pud_page);
+	}
+
+	WARN_ON(page_count(pud_page) != 1);
+	free_pages((unsigned long)kvm->arch.pgd, PGD2_ORDER);
+	kvm->arch.pgd = NULL;
+}
+
+/**
+ * stage2_clear_pte -- Clear a stage-2 PTE.
+ * @kvm:  The VM pointer
+ * @addr: The physical address of the PTE
+ *
+ * Clear a stage-2 PTE, lowering the various ref-counts. Also takes
+ * care of invalidating the TLBs.  Must be called while holding
+ * pgd_lock, otherwise another faulting VCPU may come in and mess
+ * things up behind our back.
+ */
+static void stage2_clear_pte(struct kvm *kvm, phys_addr_t addr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	struct page *page;
+
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		return;
+
+	pmd = pmd_offset(pud, addr);
+	if (pmd_none(*pmd))
+		return;
+
+	pte = pte_offset_kernel(pmd, addr);
+	set_pte_ext(pte, __pte(0), 0);
+
+	page = virt_to_page(pte);
+	put_page(page);
+	if (page_count(page) != 1) {
+		__kvm_tlb_flush_vmid(kvm);
+		return;
+	}
+
+	/* Need to remove pte page */
+	pmd_clear(pmd);
+	pte_free_kernel(NULL, (pte_t *)((unsigned long)pte & PAGE_MASK));
+
+	page = virt_to_page(pmd);
+	put_page(page);
+	if (page_count(page) != 1) {
+		__kvm_tlb_flush_vmid(kvm);
+		return;
+	}
+
+	pud_clear(pud);
+	pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
+
+	page = virt_to_page(pud);
+	put_page(page);
+	__kvm_tlb_flush_vmid(kvm);
+}
+
+static void stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+			   phys_addr_t addr, const pte_t *new_pte)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte, old_pte;
+
+	/* Create 2nd stage page table mapping - Level 1 */
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		if (!cache)
+			return; /* ignore calls from kvm_set_spte_hva */
+		pmd = mmu_memory_cache_alloc(cache);
+		pud_populate(NULL, pud, pmd);
+		pmd += pmd_index(addr);
+		get_page(virt_to_page(pud));
+	} else
+		pmd = pmd_offset(pud, addr);
+
+	/* Create 2nd stage page table mapping - Level 2 */
+	if (pmd_none(*pmd)) {
+		if (!cache)
+			return; /* ignore calls from kvm_set_spte_hva */
+		pte = mmu_memory_cache_alloc(cache);
+		clean_pte_table(pte);
+		pmd_populate_kernel(NULL, pmd, pte);
+		pte += pte_index(addr);
+		get_page(virt_to_page(pmd));
+	} else
+		pte = pte_offset_kernel(pmd, addr);
+
+	/* Create 2nd stage page table mapping - Level 3 */
+	old_pte = *pte;
+	set_pte_ext(pte, *new_pte, 0);
+	if (pte_present(old_pte))
+		__kvm_tlb_flush_vmid(kvm);
+	else
+		get_page(virt_to_page(pte));
+}
+
+/**
+ * kvm_phys_addr_ioremap - map a device range to guest IPA
+ *
+ * @kvm:	The KVM pointer
+ * @guest_ipa:	The IPA at which to insert the mapping
+ * @pa:		The physical address of the device
+ * @size:	The size of the mapping
+ */
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size)
+{
+	phys_addr_t addr, end;
+	pgprot_t prot;
+	int ret = 0;
+	unsigned long pfn;
+	struct kvm_mmu_memory_cache cache = { 0, };
+
+	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
+	prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER |
+			L_PTE2_READ | L_PTE2_WRITE);
+	pfn = __phys_to_pfn(pa);
+
+	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
+		pte_t pte = pfn_pte(pfn, prot);
+
+		ret = mmu_topup_memory_cache(&cache, 2, 2);
+		if (ret)
+			goto out;
+		spin_lock(&kvm->arch.pgd_lock);
+		stage2_set_pte(kvm, &cache, addr, &pte);
+		spin_unlock(&kvm->arch.pgd_lock);
+
+		pfn++;
+	}
+
+out:
+	mmu_free_memory_cache(&cache);
+	return ret;
+}
+
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return -EINVAL;
 }
+
+static void handle_hva_to_gpa(struct kvm *kvm, unsigned long hva,
+			      void (*handler)(struct kvm *kvm, unsigned long hva,
+					      gpa_t gpa, void *data),
+			      void *data)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+
+	slots = kvm_memslots(kvm);
+
+	/* we only care about the pages that the guest sees */
+	kvm_for_each_memslot(memslot, slots) {
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gpa_t gpa;
+			gpa_t gpa_offset = hva - start;
+			gpa = (memslot->base_gfn << PAGE_SHIFT) + gpa_offset;
+			handler(kvm, hva, gpa, data);
+		}
+	}
+}
+
+static void kvm_unmap_hva_handler(struct kvm *kvm, unsigned long hva,
+				  gpa_t gpa, void *data)
+{
+	spin_lock(&kvm->arch.pgd_lock);
+	stage2_clear_pte(kvm, gpa);
+	spin_unlock(&kvm->arch.pgd_lock);
+}
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	if (!kvm->arch.pgd)
+		return 0;
+
+	handle_hva_to_gpa(kvm, hva, &kvm_unmap_hva_handler, NULL);
+
+	return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+	int ret;
+
+	BUG_ON((start | end) & (~PAGE_MASK));
+
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		ret = kvm_unmap_hva(kvm, addr);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static void kvm_set_spte_handler(struct kvm *kvm, unsigned long hva,
+				 gpa_t gpa, void *data)
+{
+	pte_t *pte = (pte_t *)data;
+
+	spin_lock(&kvm->arch.pgd_lock);
+	stage2_set_pte(kvm, NULL, gpa, pte);
+	spin_unlock(&kvm->arch.pgd_lock);
+}
+
+
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+	pte_t stage2_pte;
+
+	if (!kvm->arch.pgd)
+		return;
+
+	stage2_pte = pfn_pte(pte_pfn(pte), PAGE_KVM_GUEST);
+	handle_hva_to_gpa(kvm, hva, &kvm_set_spte_handler, &stage2_pte);
+}
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
+{
+	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+}

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 09/15] KVM: ARM: Inject IRQs and FIQs from userspace
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

From: Christoffer Dall <cdall@cs.columbia.edu>

All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE.  This
works semantically well for the GIC as we in fact raise/lower a line on
a machine component (the gic).  The IOCTL uses the following struct.

struct kvm_irq_level {
	union {
		__u32 irq;     /* GSI */
		__s32 status;  /* not used for KVM_IRQ_LEVEL */
	};
	__u32 level;           /* 0 or 1 */
};

ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
specific cpus.  The irq field is interpreted like this:

  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |   irq_number   |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_number 0 is IRQ, irq_number 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_number between 32 and 1019 (incl.)
               (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_number between 16 and 31 (incl.)

The irq_number thus corresponds to the IRQ ID as defined in the GICv2 specs.

This is documented in Documentation/kvm/api.txt.
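
For illustration only (vm_fd stands for an open VM file descriptor and is
not part of this patch; the snippet assumes <sys/ioctl.h> and
<linux/kvm.h>), raising and then lowering the CPU IRQ line on VCPU 1 from
userspace would look roughly like:

	struct kvm_irq_level irq;

	irq.irq = (KVM_ARM_IRQ_TYPE_CPU << KVM_ARM_IRQ_TYPE_SHIFT) |
		  (1 << KVM_ARM_IRQ_VCPU_SHIFT) |	/* vcpu_index = 1 */
		  (KVM_ARM_IRQ_CPU_IRQ << KVM_ARM_IRQ_NUM_SHIFT);

	irq.level = 1;				/* raise the line... */
	ioctl(vm_fd, KVM_IRQ_LINE, &irq);
	irq.level = 0;				/* ...and lower it again */
	ioctl(vm_fd, KVM_IRQ_LINE, &irq);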

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |   25 +++++++++++--
 arch/arm/include/asm/kvm.h        |   21 +++++++++++
 arch/arm/include/asm/kvm_arm.h    |    1 +
 arch/arm/kvm/arm.c                |   70 +++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h              |   23 ++++++++++++
 include/linux/kvm.h               |    1 +
 6 files changed, 137 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 67640c6..605986f 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -615,15 +615,32 @@ created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
 
 Sets the level of a GSI input to the interrupt controller model in the kernel.
-Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
+On some architectures it is required that an interrupt controller model has
+been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
+interrupts require the level to be set to 1 and then back to 0.
+
+ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
+(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
+specific cpus.  The irq field is interpreted like this:
+
+  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
+  field: | irq_type  | vcpu_index |     irq_id     |
+
+The irq_type field has the following values:
+- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
+- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
+               (the vcpu_index field is ignored)
+- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
+
+(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
+
+In both cases, level is used to raise/lower the line.
 
 struct kvm_irq_level {
 	union {
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index a13b582..131e632 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -22,6 +22,7 @@
 #include <asm/types.h>
 
 #define __KVM_HAVE_GUEST_DEBUG
+#define __KVM_HAVE_IRQ_LINE
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
@@ -85,4 +86,24 @@ struct kvm_reg_list {
 #define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
 #define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
 
+/* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_TYPE_SHIFT		24
+#define KVM_ARM_IRQ_TYPE_MASK		0xff
+#define KVM_ARM_IRQ_VCPU_SHIFT		16
+#define KVM_ARM_IRQ_VCPU_MASK		0xff
+#define KVM_ARM_IRQ_NUM_SHIFT		0
+#define KVM_ARM_IRQ_NUM_MASK		0xffff
+
+/* irq_type field */
+#define KVM_ARM_IRQ_TYPE_CPU		0
+#define KVM_ARM_IRQ_TYPE_SPI		1
+#define KVM_ARM_IRQ_TYPE_PPI		2
+
+/* out-of-kernel GIC cpu interrupt injection irq_number field */
+#define KVM_ARM_IRQ_CPU_IRQ		0
+#define KVM_ARM_IRQ_CPU_FIQ		1
+
+/* Highest supported SPI, from VGIC_NR_IRQS */
+#define KVM_ARM_IRQ_GIC_MAX		127
+
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 6e46541..0f641c1 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -74,6 +74,7 @@
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
 			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
 			HCR_SWIO | HCR_TIDCP)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index b97ebd0..8a87fc7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -24,6 +24,7 @@
 #include <linux/fs.h>
 #include <linux/mman.h>
 #include <linux/sched.h>
+#include <linux/kvm.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -271,6 +272,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	vcpu->cpu = cpu;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -311,6 +313,74 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return -EINVAL;
 }
 
+static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
+{
+	int bit_index;
+	bool set;
+	unsigned long *ptr;
+
+	if (number == KVM_ARM_IRQ_CPU_IRQ)
+		bit_index = ffs(HCR_VI) - 1;
+	else /* KVM_ARM_IRQ_CPU_FIQ */
+		bit_index = ffs(HCR_VF) - 1;
+
+	ptr = (unsigned long *)&vcpu->arch.irq_lines;
+	if (level)
+		set = test_and_set_bit(bit_index, ptr);
+	else
+		set = test_and_clear_bit(bit_index, ptr);
+
+	/*
+	 * If we didn't change anything, no need to wake up or kick other CPUs
+	 */
+	if (set == level)
+		return 0;
+
+	/*
+	 * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
+	 * trigger a world-switch round on the running physical CPU to set the
+	 * virtual IRQ/FIQ fields in the HCR appropriately.
+	 */
+	kvm_vcpu_kick(vcpu);
+
+	return 0;
+}
+
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
+{
+	u32 irq = irq_level->irq;
+	unsigned int irq_type, vcpu_idx, irq_num;
+	int nrcpus = atomic_read(&kvm->online_vcpus);
+	struct kvm_vcpu *vcpu = NULL;
+	bool level = irq_level->level;
+
+	irq_type = (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK;
+	vcpu_idx = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
+	irq_num = (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK;
+
+	trace_kvm_irq_line(irq_type, vcpu_idx, irq_num, irq_level->level);
+
+	if (irq_type == KVM_ARM_IRQ_TYPE_CPU ||
+	    irq_type == KVM_ARM_IRQ_TYPE_PPI) {
+		if (vcpu_idx >= nrcpus)
+			return -EINVAL;
+
+		vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+		if (!vcpu)
+			return -EINVAL;
+	}
+
+	switch (irq_type) {
+	case KVM_ARM_IRQ_TYPE_CPU:
+		if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
+			return -EINVAL;
+
+		return vcpu_interrupt_line(vcpu, irq_num, level);
+	}
+
+	return -EINVAL;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index f8869c1..a283eef 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,7 +39,30 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_irq_line,
+	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
+	TP_ARGS(type, vcpu_idx, irq_num, level),
 
+	TP_STRUCT__entry(
+		__field(	unsigned int,	type		)
+		__field(	int,		vcpu_idx	)
+		__field(	int,		irq_num		)
+		__field(	int,		level		)
+	),
+
+	TP_fast_assign(
+		__entry->type		= type;
+		__entry->vcpu_idx	= vcpu_idx;
+		__entry->irq_num	= irq_num;
+		__entry->level		= level;
+	),
+
+	TP_printk("Inject %s interrupt (%d), vcpu->idx: %d, num: %d, level: %d",
+		  (__entry->type == KVM_ARM_IRQ_TYPE_CPU) ? "CPU" :
+		  (__entry->type == KVM_ARM_IRQ_TYPE_PPI) ? "VGIC PPI" :
+		  (__entry->type == KVM_ARM_IRQ_TYPE_SPI) ? "VGIC SPI" : "UNKNOWN",
+		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
+);
 
 #endif /* _TRACE_KVM_H */
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index a960f66..8091b1d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -115,6 +115,7 @@ struct kvm_irq_level {
 	 * ACPI gsi notion of irq.
 	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
 	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
+	 * For ARM: IRQ: irq = (2*vcpu_index). FIQ: irq = (2*vcpu_index + 1).
 	 */
 	union {
 		__u32 irq;


^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 09/15] KVM: ARM: Inject IRQs and FIQs from userspace
@ 2012-09-15 15:35   ` Christoffer Dall
  0 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

From: Christoffer Dall <cdall@cs.columbia.edu>

All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE.  This
works semantically well for the GIC as we in fact raise/lower a line on
a machine component (the gic).  The IOCTL uses the following struct.

struct kvm_irq_level {
	union {
		__u32 irq;     /* GSI */
		__s32 status;  /* not used for KVM_IRQ_LEVEL */
	};
	__u32 level;           /* 0 or 1 */
};

ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
specific cpus.  The irq field is interpreted like this:

  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |   irq_number   |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_number 0 is IRQ, irq_number 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_number between 32 and 1019 (incl.)
               (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_number between 16 and 31 (incl.)

The irq_number thus corresponds to the IRQ ID as defined in the GICv2 specs.

This is documented in Documentation/kvm/api.txt.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |   25 +++++++++++--
 arch/arm/include/asm/kvm.h        |   21 +++++++++++
 arch/arm/include/asm/kvm_arm.h    |    1 +
 arch/arm/kvm/arm.c                |   70 +++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h              |   23 ++++++++++++
 include/linux/kvm.h               |    1 +
 6 files changed, 137 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 67640c6..605986f 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -615,15 +615,32 @@ created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
 
 Sets the level of a GSI input to the interrupt controller model in the kernel.
-Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
+On some architectures it is required that an interrupt controller model has
+been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
+interrupts require the level to be set to 1 and then back to 0.
+
+ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
+(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
+specific cpus.  The irq field is interpreted like this:
+
+  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
+  field: | irq_type  | vcpu_index |     irq_id     |
+
+The irq_type field has the following values:
+- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
+- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
+               (the vcpu_index field is ignored)
+- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
+
+(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
+
+In both cases, level is used to raise/lower the line.
 
 struct kvm_irq_level {
 	union {
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index a13b582..131e632 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -22,6 +22,7 @@
 #include <asm/types.h>
 
 #define __KVM_HAVE_GUEST_DEBUG
+#define __KVM_HAVE_IRQ_LINE
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
@@ -85,4 +86,24 @@ struct kvm_reg_list {
 #define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
 #define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
 
+/* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_TYPE_SHIFT		24
+#define KVM_ARM_IRQ_TYPE_MASK		0xff
+#define KVM_ARM_IRQ_VCPU_SHIFT		16
+#define KVM_ARM_IRQ_VCPU_MASK		0xff
+#define KVM_ARM_IRQ_NUM_SHIFT		0
+#define KVM_ARM_IRQ_NUM_MASK		0xffff
+
+/* irq_type field */
+#define KVM_ARM_IRQ_TYPE_CPU		0
+#define KVM_ARM_IRQ_TYPE_SPI		1
+#define KVM_ARM_IRQ_TYPE_PPI		2
+
+/* out-of-kernel GIC cpu interrupt injection irq_number field */
+#define KVM_ARM_IRQ_CPU_IRQ		0
+#define KVM_ARM_IRQ_CPU_FIQ		1
+
+/* Highest supported SPI, from VGIC_NR_IRQS */
+#define KVM_ARM_IRQ_GIC_MAX		127
+
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 6e46541..0f641c1 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -74,6 +74,7 @@
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
 			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
 			HCR_SWIO | HCR_TIDCP)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index b97ebd0..8a87fc7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -24,6 +24,7 @@
 #include <linux/fs.h>
 #include <linux/mman.h>
 #include <linux/sched.h>
+#include <linux/kvm.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -271,6 +272,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	vcpu->cpu = cpu;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -311,6 +313,74 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return -EINVAL;
 }
 
+static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
+{
+	int bit_index;
+	bool set;
+	unsigned long *ptr;
+
+	if (number == KVM_ARM_IRQ_CPU_IRQ)
+		bit_index = ffs(HCR_VI) - 1;
+	else /* KVM_ARM_IRQ_CPU_FIQ */
+		bit_index = ffs(HCR_VF) - 1;
+
+	ptr = (unsigned long *)&vcpu->arch.irq_lines;
+	if (level)
+		set = test_and_set_bit(bit_index, ptr);
+	else
+		set = test_and_clear_bit(bit_index, ptr);
+
+	/*
+	 * If we didn't change anything, no need to wake up or kick other CPUs
+	 */
+	if (set == level)
+		return 0;
+
+	/*
+	 * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
+	 * trigger a world-switch round on the running physical CPU to set the
+	 * virtual IRQ/FIQ fields in the HCR appropriately.
+	 */
+	kvm_vcpu_kick(vcpu);
+
+	return 0;
+}
+
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
+{
+	u32 irq = irq_level->irq;
+	unsigned int irq_type, vcpu_idx, irq_num;
+	int nrcpus = atomic_read(&kvm->online_vcpus);
+	struct kvm_vcpu *vcpu = NULL;
+	bool level = irq_level->level;
+
+	irq_type = (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK;
+	vcpu_idx = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
+	irq_num = (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK;
+
+	trace_kvm_irq_line(irq_type, vcpu_idx, irq_num, irq_level->level);
+
+	if (irq_type == KVM_ARM_IRQ_TYPE_CPU ||
+	    irq_type == KVM_ARM_IRQ_TYPE_PPI) {
+		if (vcpu_idx >= nrcpus)
+			return -EINVAL;
+
+		vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+		if (!vcpu)
+			return -EINVAL;
+	}
+
+	switch (irq_type) {
+	case KVM_ARM_IRQ_TYPE_CPU:
+		if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
+			return -EINVAL;
+
+		return vcpu_interrupt_line(vcpu, irq_num, level);
+	}
+
+	return -EINVAL;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index f8869c1..a283eef 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,7 +39,30 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_irq_line,
+	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
+	TP_ARGS(type, vcpu_idx, irq_num, level),
 
+	TP_STRUCT__entry(
+		__field(	unsigned int,	type		)
+		__field(	int,		vcpu_idx	)
+		__field(	int,		irq_num		)
+		__field(	int,		level		)
+	),
+
+	TP_fast_assign(
+		__entry->type		= type;
+		__entry->vcpu_idx	= vcpu_idx;
+		__entry->irq_num	= irq_num;
+		__entry->level		= level;
+	),
+
+	TP_printk("Inject %s interrupt (%d), vcpu->idx: %d, num: %d, level: %d",
+		  (__entry->type == KVM_ARM_IRQ_TYPE_CPU) ? "CPU" :
+		  (__entry->type == KVM_ARM_IRQ_TYPE_PPI) ? "VGIC PPI" :
+		  (__entry->type == KVM_ARM_IRQ_TYPE_SPI) ? "VGIC SPI" : "UNKNOWN",
+		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
+);
 
 #endif /* _TRACE_KVM_H */
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index a960f66..8091b1d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -115,6 +115,7 @@ struct kvm_irq_level {
 	 * ACPI gsi notion of irq.
 	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
 	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
+	 * For ARM: IRQ: irq = (2*vcpu_index). FIQ: irq = (2*vcpu_index + 1).
 	 */
 	union {
 		__u32 irq;

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 10/15] KVM: ARM: World-switch implementation
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

From: Christoffer Dall <cdall@cs.columbia.edu>

Provides a complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture the necessary exception information and store it in the VCPU
and KVM structures.

The following Hyp-ABI is also documented in the code:

Hyp-ABI: Switching from host kernel to Hyp-mode:
   Switching to Hyp mode is done through a simple HVC #0 instruction. The
   exception vector code will check that the HVC comes from VMID==0 and if
   so will store the necessary state on the Hyp stack, which will look like
   this (growing downwards, see the hyp_hvc handler):
     ...
     stack_page + 4: spsr (Host-SVC cpsr)
     stack_page    : lr_usr
     --------------: stack bottom

Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
   When returning from Hyp mode to SVC mode, another HVC instruction is
   executed from Hyp mode, which is taken in the hyp_svc handler. The
   bottom of the Hyp stack is derived from the Hyp stack pointer (only a single
   page-aligned stack is used per CPU) and the initial SVC registers are
   used to restore the host state.

Hyp-ABI: Change the HVBAR:
   When removing the KVM module we want to reset our hold on Hyp mode.
   This is accomplished by calling HVC #0xff from the host kernel
   (VMID==0) with the desired new HVBAR in r0.

Otherwise, the world-switch is pretty straightforward. All state that
can be modified by the guest is first backed up on the Hyp stack, and the
VCPU values are loaded onto the hardware. State which is not loaded, but
is theoretically modifiable by the guest, is protected through the
virtualization features so that it generates a trap and causes software
emulation. Upon guest return, all state is restored from the hardware
onto the VCPU struct and the original state is restored from the Hyp
stack onto the hardware.

SMP support, using the VMPIDR calculated on the basis of the host MPIDR and
overriding the low bits with the KVM vcpu_id, was contributed by Marc Zyngier.

Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
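
As a sketch of what that per-VM state ends up looking like (this simply
mirrors the update_vttbr() logic further down in this patch, it is not
additional code), the stage-2 pgd base and the VMID are combined into the
VTTBR, with the base address in VTTBR[39:x] and the VMID in VTTBR[55:48]:

	pgd_phys = virt_to_phys(kvm->arch.pgd);
	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
			  & ~((2 << VTTBR_X) - 1);
	kvm->arch.vttbr |= (u64)(kvm->arch.vmid) << 48;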

To support VFP/NEON we trap those instructions using the HCPTR. When
we trap, we switch the FPU.  After a guest exit, the VFP state is
returned to the host.  When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state.  We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel; if it is not, we
still trap cp10 and cp11 in order to inject an undefined instruction
exception whenever the guest tries to use VFP/NEON. VFP/NEON support was
developed by Antonios Motakis and Rusty Russell.
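
A rough illustration of the trap configuration this describes, using the
HCPTR bits introduced by this patch (purely illustrative; the actual
programming of the register happens in the assembly world-switch code,
and write_hcptr() is a made-up helper name):

	/* Trap guest accesses to the trace registers and to cp10/cp11 (VFP/NEON) */
	u32 hcptr = HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11);
	write_hcptr(hcptr);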

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h  |   38 ++
 arch/arm/include/asm/kvm_host.h |    9 
 arch/arm/kernel/asm-offsets.c   |   44 ++
 arch/arm/kvm/arm.c              |  165 +++++++++
 arch/arm/kvm/interrupts.S       |  712 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 965 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 0f641c1..ee345a6 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -104,6 +104,18 @@
 #define TTBCR_T0SZ	3
 #define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
 
+/* Hyp System Trap Register */
+#define HSTR_T(x)	(1 << x)
+#define HSTR_TTEE	(1 << 16)
+#define HSTR_TJDBX	(1 << 17)
+
+/* Hyp Coprocessor Trap Register */
+#define HCPTR_TCP(x)	(1 << x)
+#define HCPTR_TCP_MASK	(0x3fff)
+#define HCPTR_TASE	(1 << 15)
+#define HCPTR_TTA	(1 << 20)
+#define HCPTR_TCPAC	(1 << 31)
+
 /* Hyp Debug Configuration Register bits */
 #define HDCR_TDRA	(1 << 11)
 #define HDCR_TDOSA	(1 << 10)
@@ -134,5 +146,31 @@
 #define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
 #endif
 
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT	(26)
+#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
+#define HSR_IL		(1U << 25)
+#define HSR_ISS		(HSR_IL - 1)
+#define HSR_ISV_SHIFT	(24)
+#define HSR_ISV		(1U << HSR_ISV_SHIFT)
+
+#define HSR_EC_UNKNOWN	(0x00)
+#define HSR_EC_WFI	(0x01)
+#define HSR_EC_CP15_32	(0x03)
+#define HSR_EC_CP15_64	(0x04)
+#define HSR_EC_CP14_MR	(0x05)
+#define HSR_EC_CP14_LS	(0x06)
+#define HSR_EC_CP_0_13	(0x07)
+#define HSR_EC_CP10_ID	(0x08)
+#define HSR_EC_JAZELLE	(0x09)
+#define HSR_EC_BXJ	(0x0A)
+#define HSR_EC_CP14_64	(0x0C)
+#define HSR_EC_SVC_HYP	(0x11)
+#define HSR_EC_HVC	(0x12)
+#define HSR_EC_SMC	(0x13)
+#define HSR_EC_IABT	(0x20)
+#define HSR_EC_IABT_HYP	(0x21)
+#define HSR_EC_DABT	(0x24)
+#define HSR_EC_DABT_HYP	(0x25)
 
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f0c72b9..ca4c079 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -20,6 +20,7 @@
 #define __ARM_KVM_HOST_H__
 
 #include <asm/kvm.h>
+#include <asm/fpstate.h>
 
 #define KVM_MAX_VCPUS 4
 #define KVM_MEMORY_SLOTS 32
@@ -137,6 +138,14 @@ struct kvm_vcpu_arch {
 	u32 hifar;		/* Hyp Inst. Fault Address Register */
 	u32 hpfar;		/* Hyp IPA Fault Address Register */
 
+	/* Floating point registers (VFP and Advanced SIMD/NEON) */
+	struct vfp_hard_struct vfp_guest;
+	struct vfp_hard_struct *vfp_host;
+
+	/*
+	 * Anything that is not used directly from assembly code goes
+	 * here.
+	 */
 	/* IO related fields */
 	struct {
 		bool sign_extend;	/* for byte/halfword loads */
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 1429d89..cd8fc86 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -13,6 +13,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/dma-mapping.h>
+#include <linux/kvm_host.h>
 #include <asm/cacheflush.h>
 #include <asm/glue-df.h>
 #include <asm/glue-pf.h>
@@ -144,5 +145,48 @@ int main(void)
   DEFINE(DMA_BIDIRECTIONAL,	DMA_BIDIRECTIONAL);
   DEFINE(DMA_TO_DEVICE,		DMA_TO_DEVICE);
   DEFINE(DMA_FROM_DEVICE,	DMA_FROM_DEVICE);
+#ifdef CONFIG_KVM_ARM_HOST
+  DEFINE(VCPU_KVM,		offsetof(struct kvm_vcpu, kvm));
+  DEFINE(VCPU_MIDR,		offsetof(struct kvm_vcpu, arch.midr));
+  DEFINE(VCPU_MPIDR,		offsetof(struct kvm_vcpu, arch.cp15[c0_MPIDR]));
+  DEFINE(VCPU_CSSELR,		offsetof(struct kvm_vcpu, arch.cp15[c0_CSSELR]));
+  DEFINE(VCPU_SCTLR,		offsetof(struct kvm_vcpu, arch.cp15[c1_SCTLR]));
+  DEFINE(VCPU_CPACR,		offsetof(struct kvm_vcpu, arch.cp15[c1_CPACR]));
+  DEFINE(VCPU_TTBR0,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR0]));
+  DEFINE(VCPU_TTBR1,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR1]));
+  DEFINE(VCPU_TTBCR,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBCR]));
+  DEFINE(VCPU_DACR,		offsetof(struct kvm_vcpu, arch.cp15[c3_DACR]));
+  DEFINE(VCPU_DFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_DFSR]));
+  DEFINE(VCPU_IFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_IFSR]));
+  DEFINE(VCPU_ADFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_ADFSR]));
+  DEFINE(VCPU_AIFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_AIFSR]));
+  DEFINE(VCPU_DFAR,		offsetof(struct kvm_vcpu, arch.cp15[c6_DFAR]));
+  DEFINE(VCPU_IFAR,		offsetof(struct kvm_vcpu, arch.cp15[c6_IFAR]));
+  DEFINE(VCPU_PRRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_PRRR]));
+  DEFINE(VCPU_NMRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_NMRR]));
+  DEFINE(VCPU_VBAR,		offsetof(struct kvm_vcpu, arch.cp15[c12_VBAR]));
+  DEFINE(VCPU_CID,		offsetof(struct kvm_vcpu, arch.cp15[c13_CID]));
+  DEFINE(VCPU_TID_URW,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URW]));
+  DEFINE(VCPU_TID_URO,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URO]));
+  DEFINE(VCPU_TID_PRIV,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_PRIV]));
+  DEFINE(VCPU_VFP_GUEST,	offsetof(struct kvm_vcpu, arch.vfp_guest));
+  DEFINE(VCPU_VFP_HOST,		offsetof(struct kvm_vcpu, arch.vfp_host));
+  DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
+  DEFINE(VCPU_USR_REGS,		offsetof(struct kvm_vcpu, arch.regs.usr_regs));
+  DEFINE(VCPU_SVC_REGS,		offsetof(struct kvm_vcpu, arch.regs.svc_regs));
+  DEFINE(VCPU_ABT_REGS,		offsetof(struct kvm_vcpu, arch.regs.abt_regs));
+  DEFINE(VCPU_UND_REGS,		offsetof(struct kvm_vcpu, arch.regs.und_regs));
+  DEFINE(VCPU_IRQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.irq_regs));
+  DEFINE(VCPU_FIQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
+  DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.pc));
+  DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.cpsr));
+  DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.hsr));
+  DEFINE(VCPU_HDFAR,		offsetof(struct kvm_vcpu, arch.hdfar));
+  DEFINE(VCPU_HIFAR,		offsetof(struct kvm_vcpu, arch.hifar));
+  DEFINE(VCPU_HPFAR,		offsetof(struct kvm_vcpu, arch.hpfar));
+  DEFINE(VCPU_HYP_PC,		offsetof(struct kvm_vcpu, arch.hyp_pc));
+  DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
+#endif
   return 0; 
 }
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 8a87fc7..087f9d1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -41,6 +41,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_emulate.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
@@ -50,6 +51,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
 static unsigned long hyp_default_vectors;
 
+/* The VMID used in the VTTBR */
+static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
+static u8 kvm_next_vmid;
+static DEFINE_SPINLOCK(kvm_vmid_lock);
 
 int kvm_arch_hardware_enable(void *garbage)
 {
@@ -273,6 +278,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
+	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -305,12 +311,169 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
 {
+	return v->mode == IN_GUEST_MODE;
+}
+
+static void reset_vm_context(void *info)
+{
+	__kvm_flush_vm_context();
+}
+
+/**
+ * need_new_vmid_gen - check that the VMID is still valid
+ * @kvm: The VM's VMID to check
+ *
+ * return true if there is a new generation of VMIDs being used
+ *
+ * The hardware supports only 256 values with the value zero reserved for the
+ * host, so we check if an assigned value belongs to a previous generation,
+ * which requires us to assign a new value. If we're the first to use a
+ * VMID for the new generation, we must flush necessary caches and TLBs on all
+ * CPUs.
+ */
+static bool need_new_vmid_gen(struct kvm *kvm)
+{
+	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+}
+
+/**
+ * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
+ * @kvm:	The guest that we are about to run
+ *
+ * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
+ * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
+ * caches and TLBs.
+ */
+static void update_vttbr(struct kvm *kvm)
+{
+	phys_addr_t pgd_phys;
+
+	if (!need_new_vmid_gen(kvm))
+		return;
+
+	spin_lock(&kvm_vmid_lock);
+
+	/* First user of a new VMID generation? */
+	if (unlikely(kvm_next_vmid == 0)) {
+		atomic64_inc(&kvm_vmid_gen);
+		kvm_next_vmid = 1;
+
+		/*
+		 * On SMP we know no other CPUs can use this CPU's or
+		 * each other's VMID since the kvm_vmid_lock blocks
+		 * them from reentry to the guest.
+		 */
+		on_each_cpu(reset_vm_context, NULL, 1);
+	}
+
+	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
+	kvm->arch.vmid = kvm_next_vmid;
+	kvm_next_vmid++;
+
+	/* update vttbr to be used with the new vmid */
+	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
+			  & ~((2 << VTTBR_X) - 1);
+	kvm->arch.vttbr |= (u64)(kvm->arch.vmid) << 48;
+
+	spin_unlock(&kvm_vmid_lock);
+}
+
+/*
+ * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
+ * proper exit to QEMU.
+ */
+static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		       int exception_index)
+{
+	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 	return 0;
 }
 
+/**
+ * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
+ * @vcpu:	The VCPU pointer
+ * @run:	The kvm_run structure pointer used for userspace state exchange
+ *
+ * This function is called through the VCPU_RUN ioctl called from user space. It
+ * will execute VM code in a loop until the time slice for the process is used
+ * up or some emulation is needed from user space, in which case the function will
+ * return with return value 0 and with the kvm_run structure filled in with the
+ * required data for the requested emulation.
+ */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	int ret;
+	sigset_t sigsaved;
+
+	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
+	if (unlikely(!vcpu->arch.target))
+		return -ENOEXEC;
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
+
+	ret = 1;
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	while (ret > 0) {
+		/*
+		 * Check conditions before entering the guest
+		 */
+		cond_resched();
+
+		update_vttbr(vcpu->kvm);
+
+		local_irq_disable();
+
+		/*
+		 * Re-check atomic conditions
+		 */
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			run->exit_reason = KVM_EXIT_INTR;
+		}
+
+		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+			local_irq_enable();
+			continue;
+		}
+
+		BUG_ON(__vcpu_mode(*vcpu_cpsr(vcpu)) == 0xf);
+
+		/**************************************************************
+		 * Enter the guest
+		 */
+		trace_kvm_entry(vcpu->arch.regs.pc);
+		kvm_guest_enter();
+		vcpu->mode = IN_GUEST_MODE;
+
+		ret = __kvm_vcpu_run(vcpu);
+
+		vcpu->mode = OUTSIDE_GUEST_MODE;
+		kvm_guest_exit();
+		trace_kvm_exit(vcpu->arch.regs.pc);
+		/*
+		 * We may have taken a host interrupt in HYP mode (ie
+		 * while executing the guest). This interrupt is still
+		 * pending, as we haven't serviced it yet!
+		 *
+		 * We're now back in SVC mode, with interrupts
+		 * disabled.  Enabling the interrupts now will have
+		 * the effect of taking the interrupt again, in SVC
+		 * mode this time.
+		 */
+		local_irq_enable();
+
+		/*
+		 * Back from guest
+		 *************************************************************/
+
+		ret = handle_exit(vcpu, run, ret);
+	}
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
+	return ret;
 }
 
 static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index edf9ed5..cc9448b 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -23,6 +23,12 @@
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
+#include <asm/vfpmacros.h>
+
+#define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
+#define VCPU_USR_SP		(VCPU_USR_REG(13))
+#define VCPU_FIQ_REG(_reg_nr)	(VCPU_FIQ_REGS + (_reg_nr * 4))
+#define VCPU_FIQ_SPSR		(VCPU_FIQ_REG(7))
 
 	.text
 	.align	PAGE_SHIFT
@@ -34,7 +40,33 @@ __kvm_hyp_code_start:
 @  Flush per-VMID TLBs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm);
+ *
+ * We rely on the hardware to broadcast the TLB invalidation to all CPUs
+ * inside the inner-shareable domain (which is the case for all v7
+ * implementations).  If we come across a non-IS SMP implementation, we'll
+ * have to use an IPI based mechanism. Until then, we stick to the simple
+ * hardware assisted version.
+ */
 ENTRY(__kvm_tlb_flush_vmid)
+	hvc	#0			@ Switch to Hyp mode
+	push	{r2, r3}
+
+	add	r0, r0, #KVM_VTTBR
+	ldrd	r2, r3, [r0]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+	isb
+	mcr     p15, 0, r0, c8, c3, 0	@ TLBIALLIS (rt ignored)
+	dsb
+	isb
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Back to VMID #0
+	isb
+
+	pop	{r2, r3}
+	hvc	#0			@ Back to SVC
 	bx	lr
 ENDPROC(__kvm_tlb_flush_vmid)
 
@@ -42,26 +74,702 @@ ENDPROC(__kvm_tlb_flush_vmid)
 @  Flush TLBs and instruction caches of current CPU for all VMIDs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * void __kvm_flush_vm_context(void);
+ */
 ENTRY(__kvm_flush_vm_context)
+	hvc	#0			@ switch to hyp-mode
+
+	mov	r0, #0			@ rn parameter for c15 flushes is SBZ
+	mcr     p15, 4, r0, c8, c7, 4   @ Invalidate Non-secure Non-Hyp TLB
+	mcr     p15, 0, r0, c7, c5, 0   @ Invalidate instruction caches
+	dsb
+	isb
+
+	hvc	#0			@ switch back to svc-mode, see hyp_svc
 	bx	lr
 ENDPROC(__kvm_flush_vm_context)
 
+/* Clobbers {r2-r6} */
+.macro store_vfp_state vfp_base
+	@ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
+	VFPFMRX	r2, FPEXC
+	@ Make sure VFP is enabled so we can touch the registers.
+	orr	r6, r2, #FPEXC_EN
+	VFPFMXR	FPEXC, r6
+
+	VFPFMRX	r3, FPSCR
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	@ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
+	@ we only need to save them if FPEXC_EX is set.
+	VFPFMRX r4, FPINST
+	tst	r2, #FPEXC_FP2V
+	VFPFMRX r5, FPINST2, ne		@ vmrsne
+	bic	r6, r2, #FPEXC_EX	@ FPEXC_EX disable
+	VFPFMXR	FPEXC, r6
+1:
+	VFPFSTMIA \vfp_base, r6		@ Save VFP registers
+	stm	\vfp_base, {r2-r5}	@ Save FPEXC, FPSCR, FPINST, FPINST2
+.endm
+
+/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
+.macro restore_vfp_state vfp_base
+	VFPFLDMIA \vfp_base, r6		@ Load VFP registers
+	ldm	\vfp_base, {r2-r5}	@ Load FPEXC, FPSCR, FPINST, FPINST2
+
+	VFPFMXR FPSCR, r3
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	VFPFMXR FPINST, r4
+	tst	r2, #FPEXC_FP2V
+	VFPFMXR FPINST2, r5, ne
+1:
+	VFPFMXR FPEXC, r2	@ FPEXC	(last, in case !EN)
+.endm
+
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor world-switch code
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/* These are simply for the macros to work - values don't have meaning */
+.equ usr, 0
+.equ svc, 1
+.equ abt, 2
+.equ und, 3
+.equ irq, 4
+.equ fiq, 5
+
+.macro store_mode_state base_reg, mode
+	.if \mode == usr
+	mrs	r2, SP_usr
+	mov	r3, lr
+	stmdb	\base_reg!, {r2, r3}
+	.elseif \mode != fiq
+	mrs	r2, SP_\mode
+	mrs	r3, LR_\mode
+	mrs	r4, SPSR_\mode
+	stmdb	\base_reg!, {r2, r3, r4}
+	.else
+	mrs	r2, r8_fiq
+	mrs	r3, r9_fiq
+	mrs	r4, r10_fiq
+	mrs	r5, r11_fiq
+	mrs	r6, r12_fiq
+	mrs	r7, SP_fiq
+	mrs	r8, LR_fiq
+	mrs	r9, SPSR_fiq
+	stmdb	\base_reg!, {r2-r9}
+	.endif
+.endm
+
+.macro load_mode_state base_reg, mode
+	.if \mode == usr
+	ldmia	\base_reg!, {r2, r3}
+	msr	SP_usr, r2
+	mov	lr, r3
+	.elseif \mode != fiq
+	ldmia	\base_reg!, {r2, r3, r4}
+	msr	SP_\mode, r2
+	msr	LR_\mode, r3
+	msr	SPSR_\mode, r4
+	.else
+	ldmia	\base_reg!, {r2-r9}
+	msr	r8_fiq, r2
+	msr	r9_fiq, r3
+	msr	r10_fiq, r4
+	msr	r11_fiq, r5
+	msr	r12_fiq, r6
+	msr	SP_fiq, r7
+	msr	LR_fiq, r8
+	msr	SPSR_fiq, r9
+	.endif
+.endm
+
+/* Reads cp15 registers from hardware and stores them in memory
+ * @vcpu:   If 0, registers are written in-order to the stack,
+ * 	    otherwise to the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro read_cp15_state vcpu=0, vcpup
+	mrc	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mrc	p15, 0, r3, c1, c0, 2	@ CPACR
+	mrc	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mrc	p15, 0, r5, c3, c0, 0	@ DACR
+	mrrc	p15, 0, r6, r7, c2	@ TTBR 0
+	mrrc	p15, 1, r8, r9, c2	@ TTBR 1
+	mrc	p15, 0, r10, c10, c2, 0	@ PRRR
+	mrc	p15, 0, r11, c10, c2, 1	@ NMRR
+	mrc	p15, 2, r12, c0, c0, 0	@ CSSELR
+
+	.if \vcpu == 0
+	push	{r2-r12}		@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_SCTLR]
+	str	r3, [\vcpup, #VCPU_CPACR]
+	str	r4, [\vcpup, #VCPU_TTBCR]
+	str	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	strd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	strd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	str	r10, [\vcpup, #VCPU_PRRR]
+	str	r11, [\vcpup, #VCPU_NMRR]
+	str	r12, [\vcpup, #VCPU_CSSELR]
+	.endif
+
+	mrc	p15, 0, r2, c13, c0, 1	@ CID
+	mrc	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mrc	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mrc	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	mrc	p15, 0, r6, c5, c0, 0	@ DFSR
+	mrc	p15, 0, r7, c5, c0, 1	@ IFSR
+	mrc	p15, 0, r8, c5, c1, 0	@ ADFSR
+	mrc	p15, 0, r9, c5, c1, 1	@ AIFSR
+	mrc	p15, 0, r10, c6, c0, 0	@ DFAR
+	mrc	p15, 0, r11, c6, c0, 2	@ IFAR
+	mrc	p15, 0, r12, c12, c0, 0	@ VBAR
+
+	.if \vcpu == 0
+	push	{r2-r12}		@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_CID]
+	str	r3, [\vcpup, #VCPU_TID_URW]
+	str	r4, [\vcpup, #VCPU_TID_URO]
+	str	r5, [\vcpup, #VCPU_TID_PRIV]
+	str	r6, [\vcpup, #VCPU_DFSR]
+	str	r7, [\vcpup, #VCPU_IFSR]
+	str	r8, [\vcpup, #VCPU_ADFSR]
+	str	r9, [\vcpup, #VCPU_AIFSR]
+	str	r10, [\vcpup, #VCPU_DFAR]
+	str	r11, [\vcpup, #VCPU_IFAR]
+	str	r12, [\vcpup, #VCPU_VBAR]
+	.endif
+.endm
+
+/* Reads cp15 registers from memory and writes them to hardware
+ * @vcpu:   If 0, registers are read in-order from the stack,
+ * 	    otherwise from the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro write_cp15_state vcpu=0, vcpup
+	.if \vcpu == 0
+	pop	{r2-r12}
+	.else
+	ldr	r2, [\vcpup, #VCPU_CID]
+	ldr	r3, [\vcpup, #VCPU_TID_URW]
+	ldr	r4, [\vcpup, #VCPU_TID_URO]
+	ldr	r5, [\vcpup, #VCPU_TID_PRIV]
+	ldr	r6, [\vcpup, #VCPU_DFSR]
+	ldr	r7, [\vcpup, #VCPU_IFSR]
+	ldr	r8, [\vcpup, #VCPU_ADFSR]
+	ldr	r9, [\vcpup, #VCPU_AIFSR]
+	ldr	r10, [\vcpup, #VCPU_DFAR]
+	ldr	r11, [\vcpup, #VCPU_IFAR]
+	ldr	r12, [\vcpup, #VCPU_VBAR]
+	.endif
+
+	mcr	p15, 0, r2, c13, c0, 1	@ CID
+	mcr	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mcr	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mcr	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	mcr	p15, 0, r6, c5, c0, 0	@ DFSR
+	mcr	p15, 0, r7, c5, c0, 1	@ IFSR
+	mcr	p15, 0, r8, c5, c1, 0	@ ADFSR
+	mcr	p15, 0, r9, c5, c1, 1	@ AIFSR
+	mcr	p15, 0, r10, c6, c0, 0	@ DFAR
+	mcr	p15, 0, r11, c6, c0, 2	@ IFAR
+	mcr	p15, 0, r12, c12, c0, 0	@ VBAR
+
+	.if \vcpu == 0
+	pop	{r2-r12}
+	.else
+	ldr	r2, [\vcpup, #VCPU_SCTLR]
+	ldr	r3, [\vcpup, #VCPU_CPACR]
+	ldr	r4, [\vcpup, #VCPU_TTBCR]
+	ldr	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	ldrd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	ldrd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	ldr	r10, [\vcpup, #VCPU_PRRR]
+	ldr	r11, [\vcpup, #VCPU_NMRR]
+	ldr	r12, [\vcpup, #VCPU_CSSELR]
+	.endif
+
+	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mcr	p15, 0, r3, c1, c0, 2	@ CPACR
+	mcr	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mcr	p15, 0, r5, c3, c0, 0	@ DACR
+	mcrr	p15, 0, r6, r7, c2	@ TTBR 0
+	mcrr	p15, 1, r8, r9, c2	@ TTBR 1
+	mcr	p15, 0, r10, c10, c2, 0	@ PRRR
+	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
+	mcr	p15, 2, r12, c0, c0, 0	@ CSSELR
+.endm
+
+/* Configures the HSTR (Hyp System Trap Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hstr entry
+	mrc	p15, 4, r2, c1, c1, 3
+	ldr	r3, =HSTR_T(15)
+	.if \entry == 1
+	orr	r2, r2, r3		@ Trap CR{15}
+	.else
+	bic	r2, r2, r3		@ Don't trap any CRx accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 3
+.endm
+
+/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
+ * (hardware reset value is 0). Keep previous value in r2. */
+.macro set_hcptr entry, mask
+	mrc	p15, 4, r2, c1, c1, 2
+	ldr	r3, =\mask
+	.if \entry == 1
+	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
+	.else
+	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
+	.endif
+	mcr	p15, 4, r3, c1, c1, 2
+.endm
+
+/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hdcr entry
+	mrc	p15, 4, r2, c1, c1, 1
+	ldr	r3, =(HDCR_TPM|HDCR_TPMCR)
+	.if \entry == 1
+	orr	r2, r2, r3		@ Trap some perfmon accesses
+	.else
+	bic	r2, r2, r3		@ Don't trap any perfmon accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 1
+.endm
+
+/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
+.macro configure_hyp_role entry, vcpu_ptr
+	mrc	p15, 4, r2, c1, c1, 0	@ HCR
+	bic	r2, r2, #HCR_VIRT_EXCP_MASK
+	ldr	r3, =HCR_GUEST_MASK
+	.if \entry == 1
+	orr	r2, r2, r3
+	ldr	r3, [\vcpu_ptr, #VCPU_IRQ_LINES]
+	orr	r2, r2, r3
+	.else
+	bic	r2, r2, r3
+	.endif
+	mcr	p15, 4, r2, c1, c1, 0
+.endm
+
+.macro load_vcpu reg
+	mrc	p15, 4, \reg, c13, c0, 2	@ HTPIDR
+.endm
+
+@ Arguments:
+@  r0: pointer to vcpu struct
 ENTRY(__kvm_vcpu_run)
-	bx	lr
+	hvc	#0			@ switch to hyp-mode
+
+	@ Save the vcpu pointer
+	mcr	p15, 4, r0, c13, c0, 2	@ HTPIDR
+
+	@ Now we're in Hyp-mode and lr_usr, spsr_hyp are on the stack
+	mrs	r2, sp_usr
+	push	{r2}			@ Push r13_usr
+	push	{r4-r12}		@ Push r4-r12
+
+	store_mode_state sp, svc
+	store_mode_state sp, abt
+	store_mode_state sp, und
+	store_mode_state sp, irq
+	store_mode_state sp, fiq
+
+	@ Store hardware CP15 state and load guest state
+	read_cp15_state
+	write_cp15_state 1, r0
+
+	@ If the host kernel has not been configured with VFPv3 support,
+	@ then it is safer if we deny guests from using it as well.
+#ifdef CONFIG_VFPv3
+	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
+	VFPFMRX r2, FPEXC		@ VMRS
+	push	{r2}
+	orr	r2, r2, #FPEXC_EN
+	VFPFMXR FPEXC, r2		@ VMSR
+#endif
+
+	@ Configure Hyp-role
+	configure_hyp_role 1, r0
+
+	@ Trap coprocessor CRx accesses
+	set_hstr 1
+	set_hcptr 1, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+	set_hdcr 1
+
+	@ Write configured ID register into MIDR alias
+	ldr	r1, [r0, #VCPU_MIDR]
+	mcr	p15, 4, r1, c0, c0, 0
+
+	@ Write guest view of MPIDR into VMPIDR
+	ldr	r1, [r0, #VCPU_MPIDR]
+	mcr	p15, 4, r1, c0, c0, 5
+
+	@ Load guest registers
+	add	r0, r0, #(VCPU_USR_SP)
+	load_mode_state r0, usr
+	load_mode_state r0, svc
+	load_mode_state r0, abt
+	load_mode_state r0, und
+	load_mode_state r0, irq
+	load_mode_state r0, fiq
+
+	@ Load return state (r0 now points to vcpu->arch.regs.pc)
+	ldmia	r0, {r2, r3}
+	msr	ELR_hyp, r2
+	msr	SPSR_cxsf, r3
+
+	@ Set up guest memory translation
+	sub	r1, r0, #(VCPU_PC - VCPU_KVM)	@ r1 points to kvm struct
+	ldr	r1, [r1]
+	add	r1, r1, #KVM_VTTBR
+	ldrd	r2, r3, [r1]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Load remaining registers and do the switch
+	sub	r0, r0, #(VCPU_PC - VCPU_USR_REGS)
+	ldmia	r0, {r0-r12}
+	clrex				@ Clear exclusive monitor
+	eret
+
+__kvm_vcpu_return:
+	@ Set VMID == 0
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Store return state
+	mrs	r2, ELR_hyp
+	mrs	r3, spsr
+	str	r2, [r1, #VCPU_PC]
+	str	r3, [r1, #VCPU_CPSR]
+
+	@ Store guest registers
+	add	r1, r1, #(VCPU_FIQ_SPSR + 4)
+	store_mode_state r1, fiq
+	store_mode_state r1, irq
+	store_mode_state r1, und
+	store_mode_state r1, abt
+	store_mode_state r1, svc
+	store_mode_state r1, usr
+	sub	r1, r1, #(VCPU_USR_REG(13))
+
+	@ Don't trap coprocessor accesses for host kernel
+	set_hstr 0
+	set_hdcr 0
+	set_hcptr 0, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+
+#ifdef CONFIG_VFPv3
+	@ Save floating point registers if we let the guest use them.
+	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
+	bne	after_vfp_restore
 
+	@ Switch VFP/NEON hardware state to the host's
+	add	r7, r1, #VCPU_VFP_GUEST
+	store_vfp_state r7
+	add	r7, r1, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	restore_vfp_state r7
+
+after_vfp_restore:
+	@ Restore FPEXC_EN which we clobbered on entry
+	pop	{r2}
+	VFPFMXR FPEXC, r2
+#endif
+
+	@ Reset Hyp-role
+	configure_hyp_role 0, r1
+
+	@ Let host read hardware MIDR
+	mrc	p15, 0, r2, c0, c0, 0
+	mcr	p15, 4, r2, c0, c0, 0
+
+	@ Back to hardware MPIDR
+	mrc	p15, 0, r2, c0, c0, 5
+	mcr	p15, 4, r2, c0, c0, 5
+
+	@ Store guest CP15 state and restore host state
+	read_cp15_state 1, r1
+	write_cp15_state
+
+	load_mode_state sp, fiq
+	load_mode_state sp, irq
+	load_mode_state sp, und
+	load_mode_state sp, abt
+	load_mode_state sp, svc
+
+	pop	{r4-r12}		@ Pop r4-r12
+	pop	{r2}			@ Pop r13_usr
+	msr	sp_usr, r2
+
+	hvc	#0			@ switch back to svc-mode, see hyp_svc
+
+	clrex				@ Clear exclusive monitor
+	bx	lr			@ return to IOCTL
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor exception vector and handlers
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * The KVM/ARM Hypervisor ABI is defined as follows:
+ *
+ * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
+ * instruction is issued since all traps are disabled when running the host
+ * kernel as per the Hyp-mode initialization at boot time.
+ *
+ * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
+ * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
+ * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
+ * instructions are called from within Hyp-mode.
+ *
+ * Hyp-ABI: Switching from host kernel to Hyp-mode:
+ *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
+ *    exception vector code will check that the HVC comes from VMID==0 and if
+ *    so will store the necessary state on the Hyp stack, which will look like
+ *    this (growing downwards, see the hyp_hvc handler):
+ *      ...
+ *      stack_page + 4: spsr (Host-SVC cpsr)
+ *      stack_page    : lr_usr
+ *      --------------: stack bottom
+ *
+ * Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
+ *    When returning from Hyp mode to SVC mode, another HVC instruction is
+ *    executed from Hyp mode, which is taken in the hyp_svc handler. The
+ *    bottom of the Hyp stack is derived from the Hyp stack pointer (only a single
+ *    page aligned stack is used per CPU) and the initial SVC registers are
+ *    used to restore the host state.
+ *
+ * Hyp-ABI: Change the HVBAR:
+ *    When removing the KVM module we want to reset our hold on Hyp mode.
+ *    This is accomplished by calling HVC #0xff from the host kernel
+ *    (VMID==0) with the desired new HVBAR in r0.
+ *
+ * Note that the above is used to execute code in Hyp-mode from a host-kernel
+ * point of view, and is a different concept from performing a world-switch and
+ * executing guest code in SVC mode (with a VMID != 0).
+ */
+
+@ Handle undef, svc, pabt, or dabt by crashing with a user notice
+.macro bad_exception exception_code, panic_str
+	mrrc	p15, 6, r2, r3, c2	@ Read VTTBR
+	lsr	r3, r3, #16
+	ands	r3, r3, #0xff
+
+	@ COND:neq means we're probably in the guest and we can try fetching
+	@ the vcpu pointer and stuff off the stack and keep our fingers crossed
+	beq	99f
+	mov	r0, #\exception_code
+	load_vcpu	r1		@ Load VCPU pointer
+	.if \exception_code == ARM_EXCEPTION_DATA_ABORT
+	mrc	p15, 4, r2, c5, c2, 0	@ HSR
+	mrc	p15, 4, r3, c6, c0, 0	@ HDFAR
+	str	r2, [r1, #VCPU_HSR]
+	str	r3, [r1, #VCPU_HDFAR]
+	.endif
+	.if \exception_code == ARM_EXCEPTION_PREF_ABORT
+	mrc	p15, 4, r2, c5, c2, 0	@ HSR
+	mrc	p15, 4, r3, c6, c0, 2	@ HIFAR
+	str	r2, [r1, #VCPU_HSR]
+	str	r3, [r1, #VCPU_HIFAR]
+	.endif
+	mrs	r2, ELR_hyp
+	str	r2, [r1, #VCPU_HYP_PC]
+	b	__kvm_vcpu_return
+
+	@ We were in the host already
+99:	hvc	#0	@ switch to SVC mode
+	ldr	r0, \panic_str
+	mrs	r1, ELR_hyp
+	b	panic
+
+.endm
+
+	.text
+
 	.align 5
 __kvm_hyp_vector:
 	.globl __kvm_hyp_vector
-	nop
+
+	@ Hyp-mode exception vector
+	W(b)	hyp_reset
+	W(b)	hyp_undef
+	W(b)	hyp_svc
+	W(b)	hyp_pabt
+	W(b)	hyp_dabt
+	W(b)	hyp_hvc
+	W(b)	hyp_irq
+	W(b)	hyp_fiq
+
+	.align
+hyp_reset:
+	b	hyp_reset
+
+	.align
+hyp_undef:
+	bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
+
+	.align
+hyp_svc:
+	@ Can only get here if HVC or SVC is called from Hyp mode, which means
+	@ we want to change mode back to SVC mode.
+	push	{r12}
+	mov	r12, sp
+	bic	r12, r12, #0x0ff
+	bic	r12, r12, #0xf00
+	ldr	lr, [r12, #4]
+	msr	SPSR_csxf, lr
+	ldr	lr, [r12]
+	pop	{r12}
+	eret
+
+	.align
+hyp_pabt:
+	bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
+
+	.align
+hyp_dabt:
+	bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
+
+	.align
+hyp_hvc:
+	@ Getting here is either because of a trap from a guest or from calling
+	@ HVC from the host kernel, which means "switch to Hyp mode".
+	push	{r0, r1, r2}
+
+	@ Check syndrome register
+	mrc	p15, 4, r0, c5, c2, 0	@ HSR
+	lsr	r1, r0, #HSR_EC_SHIFT
+#ifdef CONFIG_VFPv3
+	cmp	r1, #HSR_EC_CP_0_13
+	beq	switch_to_guest_vfp
+#endif
+	cmp	r1, #HSR_EC_HVC
+	bne	guest_trap		@ Not HVC instr.
+
+	@ Let's check if the HVC came from VMID 0 and allow simple
+	@ switch to Hyp mode
+	mrrc    p15, 6, r1, r2, c2
+	lsr     r2, r2, #16
+	and     r2, r2, #0xff
+	cmp     r2, #0
+	bne	guest_trap		@ Guest called HVC
+
+	@ HVC came from host. Check if this is a request to
+	@ switch HVBAR to another set of vectors (kvm_exit).
+	lsl	r0, r0, #16
+	lsr	r0, r0, #16
+	cmp	r0, #0xff
+	bne	host_switch_to_hyp	@ Not HVC #0xff
+
+	@ We're switching away from this hypervisor, let's blow the TLBs.
+	pop	{r0, r1, r2}
+	mcr	p15, 4, r0, c12, c0, 0  @ HVBAR
+	mcr	p15, 4, r0, c8, c7, 0   @ Flush Hyp TLB, r0 ignored
+	eret
+
+host_switch_to_hyp:
+	@ Store lr_usr,spsr (svc cpsr) on bottom of stack
+	mov	r1, sp
+	bic	r1, r1, #0x0ff
+	bic	r1, r1, #0xf00
+	str	lr, [r1]
+	mrs	lr, spsr
+	str	lr, [r1, #4]
+
+	pop	{r0, r1, r2}
+
+	@ Return to caller in Hyp mode
+	mrs	lr, ELR_hyp
+	mov	pc, lr
+
+guest_trap:
+	load_vcpu	r1		@ Load VCPU pointer
+	str	r0, [r1, #VCPU_HSR]
+	add	r1, r1, #VCPU_USR_REG(3)
+	stmia	r1, {r3-r12}
+	sub	r1, r1, #(VCPU_USR_REG(3) - VCPU_USR_REG(0))
+	pop	{r3, r4, r5}
+	stmia	r1, {r3, r4, r5}
+	sub	r1, r1, #VCPU_USR_REG(0)
+
+	@ Check if we need the fault information
+	lsr	r2, r0, #HSR_EC_SHIFT
+	cmp	r2, #HSR_EC_IABT
+	beq	2f
+	cmpne	r2, #HSR_EC_DABT
+	bne	1f
+
+2:	mrc	p15, 4, r2, c6, c0, 0	@ HDFAR
+	mrc	p15, 4, r3, c6, c0, 2	@ HIFAR
+	mrc	p15, 4, r4, c6, c0, 4	@ HPFAR
+	add	r5, r1, #VCPU_HDFAR
+	stmia	r5, {r2, r3, r4}
+
+1:	mov	r0, #ARM_EXCEPTION_HVC
+	b	__kvm_vcpu_return
+
+@ If VFPv3 support is not available, then we will not switch the VFP
+@ registers; however cp10 and cp11 accesses will still trap and fall back
+@ to the regular coprocessor emulation code, which currently will
+@ inject an undefined exception to the guest.
+#ifdef CONFIG_VFPv3
+switch_to_guest_vfp:
+	load_vcpu	r0		@ Load VCPU pointer
+	push	{r3-r7}
+
+	@ NEON/VFP used.  Turn on VFP access.
+	set_hcptr 0, (HCPTR_TCP(10) | HCPTR_TCP(11))
+
+	@ Switch VFP/NEON hardware state to the guest's
+	add	r7, r0, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	store_vfp_state r7
+	add	r7, r0, #VCPU_VFP_GUEST
+	restore_vfp_state r7
+
+	pop	{r3-r7}
+	pop	{r0-r2}
+	eret
+#endif
+
+	.align
+hyp_irq:
+	push	{r0}
+	load_vcpu	r0		@ Load VCPU pointer
+	add	r0, r0, #(VCPU_USR_REG(1))
+	stmia	r0, {r1-r12}
+	pop	{r0}
+	load_vcpu	r1		@ Load VCPU pointer again
+	str	r0, [r1, #VCPU_USR_REG(0)]
+
+	mov	r0, #ARM_EXCEPTION_IRQ
+	b	__kvm_vcpu_return
+
+	.align
+hyp_fiq:
+	b	hyp_fiq
+
+	.ltorg
+
+und_die_str:
+	.ascii	"unexpected undefined exception in Hyp mode at: %#08x"
+pabt_die_str:
+	.ascii	"unexpected prefetch abort in Hyp mode at: %#08x"
+dabt_die_str:
+	.ascii	"unexpected data abort in Hyp mode at: %#08x"
 
 /*
  * The below lines makes sure the HYP mode code fits in a single page (the



* [PATCH 10/15] KVM: ARM: World-switch implementation
@ 2012-09-15 15:35   ` Christoffer Dall
  0 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

From: Christoffer Dall <cdall@cs.columbia.edu>

Provides a complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture the necessary exception information and store it in the VCPU
and KVM structures.

The following Hyp-ABI is also documented in the code:

Hyp-ABI: Switching from host kernel to Hyp-mode:
   Switching to Hyp mode is done through a simple HVC #0 instruction. The
   exception vector code will check that the HVC comes from VMID==0 and if
   so will store the necessary state on the Hyp stack, which will look like
   this (growing downwards, see the hyp_hvc handler):
     ...
     stack_page + 4: spsr (Host-SVC cpsr)
     stack_page    : lr_usr
     --------------: stack bottom

Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
   When returning from Hyp mode to SVC mode, another HVC instruction is
   executed from Hyp mode, which is taken in the hyp_svc handler. The
   bottom of the Hyp stack is derived from the Hyp stack pointer (only a
   single page-aligned stack is used per CPU; see the sketch below) and
   the initial SVC registers are
   used to restore the host state.

Hyp-ABI: Change the HVBAR:
   When removing the KVM module we want to reset our hold on Hyp mode.
   This is accomplished by calling HVC #0xff from the host kernel
   (VMID==0) with the desired new HVBAR in r0.
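
The "bottom of the stack" derivation mentioned above is just a
page-alignment of the Hyp stack pointer. Roughly, in illustrative C
(assuming a single 4K page per-CPU Hyp stack; hyp_sp and the variable
names below are placeholders, not real kernel symbols):

	/* Sketch: masking off the low 12 bits of the Hyp SP yields the stack
	 * bottom, where lr_usr and the host SVC spsr were stored on entry
	 * (see the hyp_hvc and hyp_svc handlers). */
	u32 *stack_bottom = (u32 *)(hyp_sp & ~(PAGE_SIZE - 1));
	u32 host_lr_usr   = stack_bottom[0];	/* stack_page + 0 */
	u32 host_spsr     = stack_bottom[1];	/* stack_page + 4 */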

Otherwise, the world-switch is pretty straightforward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values are loaded onto the hardware. State which is not loaded, but
is theoretically modifiable by the guest, is protected through the
virtualization features to generate a trap and cause software emulation.
Upon guest return, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp stack onto the
hardware.

SMP support, using a VMPIDR calculated from the host MPIDR with the low
bits overridden by the KVM vcpu_id, was contributed by Marc Zyngier.
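
For illustration, the guest's VMPIDR view boils down to something like
the following sketch (not the actual reset code; read_cpuid_mpidr() and
the 8-bit low-field mask are assumptions here):

	/* Sketch: guest MPIDR = host MPIDR with the low affinity bits
	 * replaced by the KVM vcpu_id. */
	u32 vmpidr = (read_cpuid_mpidr() & ~0xff) | vcpu->vcpu_id;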

Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
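
As a rough sketch of how the VMID is consumed, update_vttbr() below
composes the 64-bit VTTBR from the stage-2 pgd base address and the
8-bit VMID in bits [55:48]:

	/* Sketch of the VTTBR composition performed in update_vttbr() below. */
	u64 vttbr;
	vttbr  = pgd_phys & ((1LLU << 40) - 1) & ~((2 << VTTBR_X) - 1);
	vttbr |= (u64)kvm->arch.vmid << 48;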

To support VFP/NEON we trap those instructions using the HPCTR. When
we trap, we switch the FPU.  After a guest exit, the VFP state is
returned to the host.  When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state.  We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel; if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON support was developed
by Antonios Motakis and Rusty Russell.
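
In outline, the lazy VFP/NEON switch implemented in interrupts.S looks
roughly like the C sketch below (the hcptr_*() and *_vfp_state() helpers
are illustrative placeholders for the set_hcptr, store_vfp_state and
restore_vfp_state assembler macros, not real functions):

	/* Sketch of the lazy VFP/NEON switch, not actual kernel code. */
	static void sketch_vcpu_entry(struct kvm_vcpu *vcpu)
	{
		/* Guest VFP/NEON use will trap to Hyp mode. */
		hcptr_set_traps(HCPTR_TCP(10) | HCPTR_TCP(11));
	}

	static void sketch_vfp_trap(struct kvm_vcpu *vcpu)
	{
		/* First guest VFP use: stop trapping and swap in guest state. */
		hcptr_clear_traps(HCPTR_TCP(10) | HCPTR_TCP(11));
		save_vfp_state(vcpu->arch.vfp_host);
		restore_vfp_state(&vcpu->arch.vfp_guest);
	}

	static void sketch_vcpu_exit(struct kvm_vcpu *vcpu)
	{
		/* Only swap back if the guest actually touched VFP/NEON. */
		if (!hcptr_traps_set(HCPTR_TCP(10) | HCPTR_TCP(11))) {
			save_vfp_state(&vcpu->arch.vfp_guest);
			restore_vfp_state(vcpu->arch.vfp_host);
		}
	}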

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h  |   38 ++
 arch/arm/include/asm/kvm_host.h |    9 
 arch/arm/kernel/asm-offsets.c   |   44 ++
 arch/arm/kvm/arm.c              |  165 +++++++++
 arch/arm/kvm/interrupts.S       |  712 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 965 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 0f641c1..ee345a6 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -104,6 +104,18 @@
 #define TTBCR_T0SZ	3
 #define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
 
+/* Hyp System Trap Register */
+#define HSTR_T(x)	(1 << x)
+#define HSTR_TTEE	(1 << 16)
+#define HSTR_TJDBX	(1 << 17)
+
+/* Hyp Coprocessor Trap Register */
+#define HCPTR_TCP(x)	(1 << x)
+#define HCPTR_TCP_MASK	(0x3fff)
+#define HCPTR_TASE	(1 << 15)
+#define HCPTR_TTA	(1 << 20)
+#define HCPTR_TCPAC	(1 << 31)
+
 /* Hyp Debug Configuration Register bits */
 #define HDCR_TDRA	(1 << 11)
 #define HDCR_TDOSA	(1 << 10)
@@ -134,5 +146,31 @@
 #define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
 #endif
 
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT	(26)
+#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
+#define HSR_IL		(1U << 25)
+#define HSR_ISS		(HSR_IL - 1)
+#define HSR_ISV_SHIFT	(24)
+#define HSR_ISV		(1U << HSR_ISV_SHIFT)
+
+#define HSR_EC_UNKNOWN	(0x00)
+#define HSR_EC_WFI	(0x01)
+#define HSR_EC_CP15_32	(0x03)
+#define HSR_EC_CP15_64	(0x04)
+#define HSR_EC_CP14_MR	(0x05)
+#define HSR_EC_CP14_LS	(0x06)
+#define HSR_EC_CP_0_13	(0x07)
+#define HSR_EC_CP10_ID	(0x08)
+#define HSR_EC_JAZELLE	(0x09)
+#define HSR_EC_BXJ	(0x0A)
+#define HSR_EC_CP14_64	(0x0C)
+#define HSR_EC_SVC_HYP	(0x11)
+#define HSR_EC_HVC	(0x12)
+#define HSR_EC_SMC	(0x13)
+#define HSR_EC_IABT	(0x20)
+#define HSR_EC_IABT_HYP	(0x21)
+#define HSR_EC_DABT	(0x24)
+#define HSR_EC_DABT_HYP	(0x25)
 
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f0c72b9..ca4c079 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -20,6 +20,7 @@
 #define __ARM_KVM_HOST_H__
 
 #include <asm/kvm.h>
+#include <asm/fpstate.h>
 
 #define KVM_MAX_VCPUS 4
 #define KVM_MEMORY_SLOTS 32
@@ -137,6 +138,14 @@ struct kvm_vcpu_arch {
 	u32 hifar;		/* Hyp Inst. Fault Address Register */
 	u32 hpfar;		/* Hyp IPA Fault Address Register */
 
+	/* Floating point registers (VFP and Advanced SIMD/NEON) */
+	struct vfp_hard_struct vfp_guest;
+	struct vfp_hard_struct *vfp_host;
+
+	/*
+	 * Anything that is not used directly from assembly code goes
+	 * here.
+	 */
 	/* IO related fields */
 	struct {
 		bool sign_extend;	/* for byte/halfword loads */
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 1429d89..cd8fc86 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -13,6 +13,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/dma-mapping.h>
+#include <linux/kvm_host.h>
 #include <asm/cacheflush.h>
 #include <asm/glue-df.h>
 #include <asm/glue-pf.h>
@@ -144,5 +145,48 @@ int main(void)
   DEFINE(DMA_BIDIRECTIONAL,	DMA_BIDIRECTIONAL);
   DEFINE(DMA_TO_DEVICE,		DMA_TO_DEVICE);
   DEFINE(DMA_FROM_DEVICE,	DMA_FROM_DEVICE);
+#ifdef CONFIG_KVM_ARM_HOST
+  DEFINE(VCPU_KVM,		offsetof(struct kvm_vcpu, kvm));
+  DEFINE(VCPU_MIDR,		offsetof(struct kvm_vcpu, arch.midr));
+  DEFINE(VCPU_MPIDR,		offsetof(struct kvm_vcpu, arch.cp15[c0_MPIDR]));
+  DEFINE(VCPU_CSSELR,		offsetof(struct kvm_vcpu, arch.cp15[c0_CSSELR]));
+  DEFINE(VCPU_SCTLR,		offsetof(struct kvm_vcpu, arch.cp15[c1_SCTLR]));
+  DEFINE(VCPU_CPACR,		offsetof(struct kvm_vcpu, arch.cp15[c1_CPACR]));
+  DEFINE(VCPU_TTBR0,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR0]));
+  DEFINE(VCPU_TTBR1,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR1]));
+  DEFINE(VCPU_TTBCR,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBCR]));
+  DEFINE(VCPU_DACR,		offsetof(struct kvm_vcpu, arch.cp15[c3_DACR]));
+  DEFINE(VCPU_DFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_DFSR]));
+  DEFINE(VCPU_IFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_IFSR]));
+  DEFINE(VCPU_ADFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_ADFSR]));
+  DEFINE(VCPU_AIFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_AIFSR]));
+  DEFINE(VCPU_DFAR,		offsetof(struct kvm_vcpu, arch.cp15[c6_DFAR]));
+  DEFINE(VCPU_IFAR,		offsetof(struct kvm_vcpu, arch.cp15[c6_IFAR]));
+  DEFINE(VCPU_PRRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_PRRR]));
+  DEFINE(VCPU_NMRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_NMRR]));
+  DEFINE(VCPU_VBAR,		offsetof(struct kvm_vcpu, arch.cp15[c12_VBAR]));
+  DEFINE(VCPU_CID,		offsetof(struct kvm_vcpu, arch.cp15[c13_CID]));
+  DEFINE(VCPU_TID_URW,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URW]));
+  DEFINE(VCPU_TID_URO,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URO]));
+  DEFINE(VCPU_TID_PRIV,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_PRIV]));
+  DEFINE(VCPU_VFP_GUEST,	offsetof(struct kvm_vcpu, arch.vfp_guest));
+  DEFINE(VCPU_VFP_HOST,		offsetof(struct kvm_vcpu, arch.vfp_host));
+  DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
+  DEFINE(VCPU_USR_REGS,		offsetof(struct kvm_vcpu, arch.regs.usr_regs));
+  DEFINE(VCPU_SVC_REGS,		offsetof(struct kvm_vcpu, arch.regs.svc_regs));
+  DEFINE(VCPU_ABT_REGS,		offsetof(struct kvm_vcpu, arch.regs.abt_regs));
+  DEFINE(VCPU_UND_REGS,		offsetof(struct kvm_vcpu, arch.regs.und_regs));
+  DEFINE(VCPU_IRQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.irq_regs));
+  DEFINE(VCPU_FIQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
+  DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.pc));
+  DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.cpsr));
+  DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.hsr));
+  DEFINE(VCPU_HDFAR,		offsetof(struct kvm_vcpu, arch.hdfar));
+  DEFINE(VCPU_HIFAR,		offsetof(struct kvm_vcpu, arch.hifar));
+  DEFINE(VCPU_HPFAR,		offsetof(struct kvm_vcpu, arch.hpfar));
+  DEFINE(VCPU_HYP_PC,		offsetof(struct kvm_vcpu, arch.hyp_pc));
+  DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
+#endif
   return 0; 
 }
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 8a87fc7..087f9d1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -41,6 +41,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_emulate.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
@@ -50,6 +51,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
 static unsigned long hyp_default_vectors;
 
+/* The VMID used in the VTTBR */
+static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
+static u8 kvm_next_vmid;
+static DEFINE_SPINLOCK(kvm_vmid_lock);
 
 int kvm_arch_hardware_enable(void *garbage)
 {
@@ -273,6 +278,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
+	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -305,12 +311,169 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
 {
+	return v->mode == IN_GUEST_MODE;
+}
+
+static void reset_vm_context(void *info)
+{
+	__kvm_flush_vm_context();
+}
+
+/**
+ * need_new_vmid_gen - check that the VMID is still valid
+ * @kvm: The VM's VMID to check
+ *
+ * Returns true if there is a new generation of VMIDs being used
+ *
+ * The hardware supports only 256 values with the value zero reserved for the
+ * host, so we check if an assigned value belongs to a previous generation,
+ * which requires us to assign a new value. If we're the first to use a
+ * VMID for the new generation, we must flush necessary caches and TLBs on all
+ * CPUs.
+ */
+static bool need_new_vmid_gen(struct kvm *kvm)
+{
+	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+}
+
+/**
+ * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
+ * @kvm:	The guest that we are about to run
+ *
+ * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
+ * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
+ * caches and TLBs.
+ */
+static void update_vttbr(struct kvm *kvm)
+{
+	phys_addr_t pgd_phys;
+
+	if (!need_new_vmid_gen(kvm))
+		return;
+
+	spin_lock(&kvm_vmid_lock);
+
+	/* First user of a new VMID generation? */
+	if (unlikely(kvm_next_vmid == 0)) {
+		atomic64_inc(&kvm_vmid_gen);
+		kvm_next_vmid = 1;
+
+		/*
+		 * On SMP we know no other CPUs can use this CPU's or
+		 * each other's VMID since the kvm_vmid_lock blocks
+		 * them from reentry to the guest.
+		 */
+		on_each_cpu(reset_vm_context, NULL, 1);
+	}
+
+	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
+	kvm->arch.vmid = kvm_next_vmid;
+	kvm_next_vmid++;
+
+	/* update vttbr to be used with the new vmid */
+	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
+			  & ~((2 << VTTBR_X) - 1);
+	kvm->arch.vttbr |= (u64)(kvm->arch.vmid) << 48;
+
+	spin_unlock(&kvm_vmid_lock);
+}
+
+/*
+ * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
+ * proper exit to QEMU.
+ */
+static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		       int exception_index)
+{
+	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 	return 0;
 }
 
+/**
+ * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
+ * @vcpu:	The VCPU pointer
+ * @run:	The kvm_run structure pointer used for userspace state exchange
+ *
+ * This function is called through the VCPU_RUN ioctl called from user space. It
+ * will execute VM code in a loop until the time slice for the process is used
+ * or some emulation is needed from user space in which case the function will
+ * return with return value 0 and with the kvm_run structure filled in with the
+ * required data for the requested emulation.
+ */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	int ret;
+	sigset_t sigsaved;
+
+	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
+	if (unlikely(!vcpu->arch.target))
+		return -ENOEXEC;
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
+
+	ret = 1;
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	while (ret > 0) {
+		/*
+		 * Check conditions before entering the guest
+		 */
+		cond_resched();
+
+		update_vttbr(vcpu->kvm);
+
+		local_irq_disable();
+
+		/*
+		 * Re-check atomic conditions
+		 */
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			run->exit_reason = KVM_EXIT_INTR;
+		}
+
+		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+			local_irq_enable();
+			continue;
+		}
+
+		BUG_ON(__vcpu_mode(*vcpu_cpsr(vcpu)) == 0xf);
+
+		/**************************************************************
+		 * Enter the guest
+		 */
+		trace_kvm_entry(vcpu->arch.regs.pc);
+		kvm_guest_enter();
+		vcpu->mode = IN_GUEST_MODE;
+
+		ret = __kvm_vcpu_run(vcpu);
+
+		vcpu->mode = OUTSIDE_GUEST_MODE;
+		kvm_guest_exit();
+		trace_kvm_exit(vcpu->arch.regs.pc);
+		/*
+		 * We may have taken a host interrupt in HYP mode (ie
+		 * while executing the guest). This interrupt is still
+		 * pending, as we haven't serviced it yet!
+		 *
+		 * We're now back in SVC mode, with interrupts
+		 * disabled.  Enabling the interrupts now will have
+		 * the effect of taking the interrupt again, in SVC
+		 * mode this time.
+		 */
+		local_irq_enable();
+
+		/*
+		 * Back from guest
+		 *************************************************************/
+
+		ret = handle_exit(vcpu, run, ret);
+	}
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
+	return ret;
 }
 
 static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index edf9ed5..cc9448b 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -23,6 +23,12 @@
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
+#include <asm/vfpmacros.h>
+
+#define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
+#define VCPU_USR_SP		(VCPU_USR_REG(13))
+#define VCPU_FIQ_REG(_reg_nr)	(VCPU_FIQ_REGS + (_reg_nr * 4))
+#define VCPU_FIQ_SPSR		(VCPU_FIQ_REG(7))
 
 	.text
 	.align	PAGE_SHIFT
@@ -34,7 +40,33 @@ __kvm_hyp_code_start:
 @  Flush per-VMID TLBs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm);
+ *
+ * We rely on the hardware to broadcast the TLB invalidation to all CPUs
+ * inside the inner-shareable domain (which is the case for all v7
+ * implementations).  If we come across a non-IS SMP implementation, we'll
+ * have to use an IPI based mechanism. Until then, we stick to the simple
+ * hardware assisted version.
+ */
 ENTRY(__kvm_tlb_flush_vmid)
+	hvc	#0			@ Switch to Hyp mode
+	push	{r2, r3}
+
+	add	r0, r0, #KVM_VTTBR
+	ldrd	r2, r3, [r0]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+	isb
+	mcr     p15, 0, r0, c8, c3, 0	@ TLBIALLIS (rt ignored)
+	dsb
+	isb
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Back to VMID #0
+	isb
+
+	pop	{r2, r3}
+	hvc	#0			@ Back to SVC
 	bx	lr
 ENDPROC(__kvm_tlb_flush_vmid)
 
@@ -42,26 +74,702 @@ ENDPROC(__kvm_tlb_flush_vmid)
 @  Flush TLBs and instruction caches of current CPU for all VMIDs
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * void __kvm_flush_vm_context(void);
+ */
 ENTRY(__kvm_flush_vm_context)
+	hvc	#0			@ switch to hyp-mode
+
+	mov	r0, #0			@ rn parameter for c15 flushes is SBZ
+	mcr     p15, 4, r0, c8, c7, 4   @ Invalidate Non-secure Non-Hyp TLB
+	mcr     p15, 0, r0, c7, c5, 0   @ Invalidate instruction caches
+	dsb
+	isb
+
+	hvc	#0			@ switch back to svc-mode, see hyp_svc
 	bx	lr
 ENDPROC(__kvm_flush_vm_context)
 
+/* Clobbers {r2-r6} */
+.macro store_vfp_state vfp_base
+	@ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
+	VFPFMRX	r2, FPEXC
+	@ Make sure VFP is enabled so we can touch the registers.
+	orr	r6, r2, #FPEXC_EN
+	VFPFMXR	FPEXC, r6
+
+	VFPFMRX	r3, FPSCR
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	@ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
+	@ we only need to save them if FPEXC_EX is set.
+	VFPFMRX r4, FPINST
+	tst	r2, #FPEXC_FP2V
+	VFPFMRX r5, FPINST2, ne		@ vmrsne
+	bic	r6, r2, #FPEXC_EX	@ FPEXC_EX disable
+	VFPFMXR	FPEXC, r6
+1:
+	VFPFSTMIA \vfp_base, r6		@ Save VFP registers
+	stm	\vfp_base, {r2-r5}	@ Save FPEXC, FPSCR, FPINST, FPINST2
+.endm
+
+/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
+.macro restore_vfp_state vfp_base
+	VFPFLDMIA \vfp_base, r6		@ Load VFP registers
+	ldm	\vfp_base, {r2-r5}	@ Load FPEXC, FPSCR, FPINST, FPINST2
+
+	VFPFMXR FPSCR, r3
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	VFPFMXR FPINST, r4
+	tst	r2, #FPEXC_FP2V
+	VFPFMXR FPINST2, r5, ne
+1:
+	VFPFMXR FPEXC, r2	@ FPEXC	(last, in case !EN)
+.endm
+
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor world-switch code
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/* These are simply for the macros to work - values don't have meaning */
+.equ usr, 0
+.equ svc, 1
+.equ abt, 2
+.equ und, 3
+.equ irq, 4
+.equ fiq, 5
+
+.macro store_mode_state base_reg, mode
+	.if \mode == usr
+	mrs	r2, SP_usr
+	mov	r3, lr
+	stmdb	\base_reg!, {r2, r3}
+	.elseif \mode != fiq
+	mrs	r2, SP_\mode
+	mrs	r3, LR_\mode
+	mrs	r4, SPSR_\mode
+	stmdb	\base_reg!, {r2, r3, r4}
+	.else
+	mrs	r2, r8_fiq
+	mrs	r3, r9_fiq
+	mrs	r4, r10_fiq
+	mrs	r5, r11_fiq
+	mrs	r6, r12_fiq
+	mrs	r7, SP_fiq
+	mrs	r8, LR_fiq
+	mrs	r9, SPSR_fiq
+	stmdb	\base_reg!, {r2-r9}
+	.endif
+.endm
+
+.macro load_mode_state base_reg, mode
+	.if \mode == usr
+	ldmia	\base_reg!, {r2, r3}
+	msr	SP_usr, r2
+	mov	lr, r3
+	.elseif \mode != fiq
+	ldmia	\base_reg!, {r2, r3, r4}
+	msr	SP_\mode, r2
+	msr	LR_\mode, r3
+	msr	SPSR_\mode, r4
+	.else
+	ldmia	\base_reg!, {r2-r9}
+	msr	r8_fiq, r2
+	msr	r9_fiq, r3
+	msr	r10_fiq, r4
+	msr	r11_fiq, r5
+	msr	r12_fiq, r6
+	msr	SP_fiq, r7
+	msr	LR_fiq, r8
+	msr	SPSR_fiq, r9
+	.endif
+.endm
+
+/* Reads cp15 registers from hardware and stores them in memory
+ * @vcpu:   If 0, registers are written in-order to the stack,
+ * 	    otherwise to the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro read_cp15_state vcpu=0, vcpup
+	mrc	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mrc	p15, 0, r3, c1, c0, 2	@ CPACR
+	mrc	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mrc	p15, 0, r5, c3, c0, 0	@ DACR
+	mrrc	p15, 0, r6, r7, c2	@ TTBR 0
+	mrrc	p15, 1, r8, r9, c2	@ TTBR 1
+	mrc	p15, 0, r10, c10, c2, 0	@ PRRR
+	mrc	p15, 0, r11, c10, c2, 1	@ NMRR
+	mrc	p15, 2, r12, c0, c0, 0	@ CSSELR
+
+	.if \vcpu == 0
+	push	{r2-r12}		@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_SCTLR]
+	str	r3, [\vcpup, #VCPU_CPACR]
+	str	r4, [\vcpup, #VCPU_TTBCR]
+	str	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	strd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	strd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	str	r10, [\vcpup, #VCPU_PRRR]
+	str	r11, [\vcpup, #VCPU_NMRR]
+	str	r12, [\vcpup, #VCPU_CSSELR]
+	.endif
+
+	mrc	p15, 0, r2, c13, c0, 1	@ CID
+	mrc	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mrc	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mrc	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	mrc	p15, 0, r6, c5, c0, 0	@ DFSR
+	mrc	p15, 0, r7, c5, c0, 1	@ IFSR
+	mrc	p15, 0, r8, c5, c1, 0	@ ADFSR
+	mrc	p15, 0, r9, c5, c1, 1	@ AIFSR
+	mrc	p15, 0, r10, c6, c0, 0	@ DFAR
+	mrc	p15, 0, r11, c6, c0, 2	@ IFAR
+	mrc	p15, 0, r12, c12, c0, 0	@ VBAR
+
+	.if \vcpu == 0
+	push	{r2-r12}		@ Push CP15 registers
+	.else
+	str	r2, [\vcpup, #VCPU_CID]
+	str	r3, [\vcpup, #VCPU_TID_URW]
+	str	r4, [\vcpup, #VCPU_TID_URO]
+	str	r5, [\vcpup, #VCPU_TID_PRIV]
+	str	r6, [\vcpup, #VCPU_DFSR]
+	str	r7, [\vcpup, #VCPU_IFSR]
+	str	r8, [\vcpup, #VCPU_ADFSR]
+	str	r9, [\vcpup, #VCPU_AIFSR]
+	str	r10, [\vcpup, #VCPU_DFAR]
+	str	r11, [\vcpup, #VCPU_IFAR]
+	str	r12, [\vcpup, #VCPU_VBAR]
+	.endif
+.endm
+
+/* Reads cp15 registers from memory and writes them to hardware
+ * @vcpu:   If 0, registers are read in-order from the stack,
+ * 	    otherwise from the VCPU struct pointed to by vcpup
+ * @vcpup:  Register pointing to VCPU struct
+ */
+.macro write_cp15_state vcpu=0, vcpup
+	.if \vcpu == 0
+	pop	{r2-r12}
+	.else
+	ldr	r2, [\vcpup, #VCPU_CID]
+	ldr	r3, [\vcpup, #VCPU_TID_URW]
+	ldr	r4, [\vcpup, #VCPU_TID_URO]
+	ldr	r5, [\vcpup, #VCPU_TID_PRIV]
+	ldr	r6, [\vcpup, #VCPU_DFSR]
+	ldr	r7, [\vcpup, #VCPU_IFSR]
+	ldr	r8, [\vcpup, #VCPU_ADFSR]
+	ldr	r9, [\vcpup, #VCPU_AIFSR]
+	ldr	r10, [\vcpup, #VCPU_DFAR]
+	ldr	r11, [\vcpup, #VCPU_IFAR]
+	ldr	r12, [\vcpup, #VCPU_VBAR]
+	.endif
+
+	mcr	p15, 0, r2, c13, c0, 1	@ CID
+	mcr	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mcr	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mcr	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	mcr	p15, 0, r6, c5, c0, 0	@ DFSR
+	mcr	p15, 0, r7, c5, c0, 1	@ IFSR
+	mcr	p15, 0, r8, c5, c1, 0	@ ADFSR
+	mcr	p15, 0, r9, c5, c1, 1	@ AIFSR
+	mcr	p15, 0, r10, c6, c0, 0	@ DFAR
+	mcr	p15, 0, r11, c6, c0, 2	@ IFAR
+	mcr	p15, 0, r12, c12, c0, 0	@ VBAR
+
+	.if \vcpu == 0
+	pop	{r2-r12}
+	.else
+	ldr	r2, [\vcpup, #VCPU_SCTLR]
+	ldr	r3, [\vcpup, #VCPU_CPACR]
+	ldr	r4, [\vcpup, #VCPU_TTBCR]
+	ldr	r5, [\vcpup, #VCPU_DACR]
+	add	\vcpup, \vcpup, #VCPU_TTBR0
+	ldrd	r6, r7, [\vcpup]
+	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	ldrd	r8, r9, [\vcpup]
+	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
+	ldr	r10, [\vcpup, #VCPU_PRRR]
+	ldr	r11, [\vcpup, #VCPU_NMRR]
+	ldr	r12, [\vcpup, #VCPU_CSSELR]
+	.endif
+
+	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mcr	p15, 0, r3, c1, c0, 2	@ CPACR
+	mcr	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mcr	p15, 0, r5, c3, c0, 0	@ DACR
+	mcrr	p15, 0, r6, r7, c2	@ TTBR 0
+	mcrr	p15, 1, r8, r9, c2	@ TTBR 1
+	mcr	p15, 0, r10, c10, c2, 0	@ PRRR
+	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
+	mcr	p15, 2, r12, c0, c0, 0	@ CSSELR
+.endm
+
+/* Configures the HSTR (Hyp System Trap Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hstr entry
+	mrc	p15, 4, r2, c1, c1, 3
+	ldr	r3, =HSTR_T(15)
+	.if \entry == 1
+	orr	r2, r2, r3		@ Trap CR{15}
+	.else
+	bic	r2, r2, r3		@ Don't trap any CRx accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 3
+.endm
+
+/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
+ * (hardware reset value is 0). Keep previous value in r2. */
+.macro set_hcptr entry, mask
+	mrc	p15, 4, r2, c1, c1, 2
+	ldr	r3, =\mask
+	.if \entry == 1
+	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
+	.else
+	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
+	.endif
+	mcr	p15, 4, r3, c1, c1, 2
+.endm
+
+/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hdcr entry
+	mrc	p15, 4, r2, c1, c1, 1
+	ldr	r3, =(HDCR_TPM|HDCR_TPMCR)
+	.if \entry == 1
+	orr	r2, r2, r3		@ Trap some perfmon accesses
+	.else
+	bic	r2, r2, r3		@ Don't trap any perfmon accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 1
+.endm
+
+/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
+.macro configure_hyp_role entry, vcpu_ptr
+	mrc	p15, 4, r2, c1, c1, 0	@ HCR
+	bic	r2, r2, #HCR_VIRT_EXCP_MASK
+	ldr	r3, =HCR_GUEST_MASK
+	.if \entry == 1
+	orr	r2, r2, r3
+	ldr	r3, [\vcpu_ptr, #VCPU_IRQ_LINES]
+	orr	r2, r2, r3
+	.else
+	bic	r2, r2, r3
+	.endif
+	mcr	p15, 4, r2, c1, c1, 0
+.endm
+
+.macro load_vcpu reg
+	mrc	p15, 4, \reg, c13, c0, 2	@ HTPIDR
+.endm
+
+@ Arguments:
+@  r0: pointer to vcpu struct
 ENTRY(__kvm_vcpu_run)
-	bx	lr
+	hvc	#0			@ switch to hyp-mode
+
+	@ Save the vcpu pointer
+	mcr	p15, 4, r0, c13, c0, 2	@ HTPIDR
+
+	@ Now we're in Hyp-mode and lr_usr, spsr_hyp are on the stack
+	mrs	r2, sp_usr
+	push	{r2}			@ Push r13_usr
+	push	{r4-r12}		@ Push r4-r12
+
+	store_mode_state sp, svc
+	store_mode_state sp, abt
+	store_mode_state sp, und
+	store_mode_state sp, irq
+	store_mode_state sp, fiq
+
+	@ Store hardware CP15 state and load guest state
+	read_cp15_state
+	write_cp15_state 1, r0
+
+	@ If the host kernel has not been configured with VFPv3 support,
+	@ then it is safer if we deny guests from using it as well.
+#ifdef CONFIG_VFPv3
+	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
+	VFPFMRX r2, FPEXC		@ VMRS
+	push	{r2}
+	orr	r2, r2, #FPEXC_EN
+	VFPFMXR FPEXC, r2		@ VMSR
+#endif
+
+	@ Configure Hyp-role
+	configure_hyp_role 1, r0
+
+	@ Trap coprocessor CRx accesses
+	set_hstr 1
+	set_hcptr 1, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+	set_hdcr 1
+
+	@ Write configured ID register into MIDR alias
+	ldr	r1, [r0, #VCPU_MIDR]
+	mcr	p15, 4, r1, c0, c0, 0
+
+	@ Write guest view of MPIDR into VMPIDR
+	ldr	r1, [r0, #VCPU_MPIDR]
+	mcr	p15, 4, r1, c0, c0, 5
+
+	@ Load guest registers
+	add	r0, r0, #(VCPU_USR_SP)
+	load_mode_state r0, usr
+	load_mode_state r0, svc
+	load_mode_state r0, abt
+	load_mode_state r0, und
+	load_mode_state r0, irq
+	load_mode_state r0, fiq
+
+	@ Load return state (r0 now points to vcpu->arch.regs.pc)
+	ldmia	r0, {r2, r3}
+	msr	ELR_hyp, r2
+	msr	SPSR_cxsf, r3
+
+	@ Set up guest memory translation
+	sub	r1, r0, #(VCPU_PC - VCPU_KVM)	@ r1 points to kvm struct
+	ldr	r1, [r1]
+	add	r1, r1, #KVM_VTTBR
+	ldrd	r2, r3, [r1]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Load remaining registers and do the switch
+	sub	r0, r0, #(VCPU_PC - VCPU_USR_REGS)
+	ldmia	r0, {r0-r12}
+	clrex				@ Clear exclusive monitor
+	eret
+
+__kvm_vcpu_return:
+	@ Set VMID == 0
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Store return state
+	mrs	r2, ELR_hyp
+	mrs	r3, spsr
+	str	r2, [r1, #VCPU_PC]
+	str	r3, [r1, #VCPU_CPSR]
+
+	@ Store guest registers
+	add	r1, r1, #(VCPU_FIQ_SPSR + 4)
+	store_mode_state r1, fiq
+	store_mode_state r1, irq
+	store_mode_state r1, und
+	store_mode_state r1, abt
+	store_mode_state r1, svc
+	store_mode_state r1, usr
+	sub	r1, r1, #(VCPU_USR_REG(13))
+
+	@ Don't trap coprocessor accesses for host kernel
+	set_hstr 0
+	set_hdcr 0
+	set_hcptr 0, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+
+#ifdef CONFIG_VFPv3
+	@ Save floating point registers if we let the guest use them.
+	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
+	bne	after_vfp_restore
 
+	@ Switch VFP/NEON hardware state to the host's
+	add	r7, r1, #VCPU_VFP_GUEST
+	store_vfp_state r7
+	add	r7, r1, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	restore_vfp_state r7
+
+after_vfp_restore:
+	@ Restore FPEXC_EN which we clobbered on entry
+	pop	{r2}
+	VFPFMXR FPEXC, r2
+#endif
+
+	@ Reset Hyp-role
+	configure_hyp_role 0, r1
+
+	@ Let host read hardware MIDR
+	mrc	p15, 0, r2, c0, c0, 0
+	mcr	p15, 4, r2, c0, c0, 0
+
+	@ Back to hardware MPIDR
+	mrc	p15, 0, r2, c0, c0, 5
+	mcr	p15, 4, r2, c0, c0, 5
+
+	@ Store guest CP15 state and restore host state
+	read_cp15_state 1, r1
+	write_cp15_state
+
+	load_mode_state sp, fiq
+	load_mode_state sp, irq
+	load_mode_state sp, und
+	load_mode_state sp, abt
+	load_mode_state sp, svc
+
+	pop	{r4-r12}		@ Pop r4-r12
+	pop	{r2}			@ Pop r13_usr
+	msr	sp_usr, r2
+
+	hvc	#0			@ switch back to svc-mode, see hyp_svc
+
+	clrex				@ Clear exclusive monitor
+	bx	lr			@ return to IOCTL
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor exception vector and handlers
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
+/*
+ * The KVM/ARM Hypervisor ABI is defined as follows:
+ *
+ * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
+ * instruction is issued since all traps are disabled when running the host
+ * kernel as per the Hyp-mode initialization at boot time.
+ *
+ * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
+ * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
+ * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
+ * instructions are called from within Hyp-mode.
+ *
+ * Hyp-ABI: Switching from host kernel to Hyp-mode:
+ *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
+ *    exception vector code will check that the HVC comes from VMID==0 and if
+ *    so will store the necessary state on the Hyp stack, which will look like
+ *    this (growing downwards, see the hyp_hvc handler):
+ *      ...
+ *      stack_page + 4: spsr (Host-SVC cpsr)
+ *      stack_page    : lr_usr
+ *      --------------: stack bottom
+ *
+ * Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
+ *    When returning from Hyp mode to SVC mode, another HVC instruction is
+ *    executed from Hyp mode, which is taken in the hyp_svc handler. The
+ *    bottom of the Hyp stack is derived from the Hyp stack pointer (only a single
+ *    page aligned stack is used per CPU) and the initial SVC registers are
+ *    used to restore the host state.
+ *
+ * Hyp-ABI: Change the HVBAR:
+ *    When removing the KVM module we want to reset our hold on Hyp mode.
+ *    This is accomplished by calling HVC #0xff from the host kernel
+ *    (VMID==0) with the desired new HVBAR in r0.
+ *
+ * Note that the above is used to execute code in Hyp-mode from a host-kernel
+ * point of view, and is a different concept from performing a world-switch and
+ * executing guest code in SVC mode (with a VMID != 0).
+ */
+
+@ Handle undef, svc, pabt, or dabt by crashing with a user notice
+.macro bad_exception exception_code, panic_str
+	mrrc	p15, 6, r2, r3, c2	@ Read VTTBR
+	lsr	r3, r3, #16
+	ands	r3, r3, #0xff
+
+	@ COND:neq means we're probably in the guest and we can try fetching
+	@ the vcpu pointer and stuff off the stack and keep our fingers crossed
+	beq	99f
+	mov	r0, #\exception_code
+	load_vcpu	r1		@ Load VCPU pointer
+	.if \exception_code == ARM_EXCEPTION_DATA_ABORT
+	mrc	p15, 4, r2, c5, c2, 0	@ HSR
+	mrc	p15, 4, r3, c6, c0, 0	@ HDFAR
+	str	r2, [r1, #VCPU_HSR]
+	str	r3, [r1, #VCPU_HDFAR]
+	.endif
+	.if \exception_code == ARM_EXCEPTION_PREF_ABORT
+	mrc	p15, 4, r2, c5, c2, 0	@ HSR
+	mrc	p15, 4, r3, c6, c0, 2	@ HIFAR
+	str	r2, [r1, #VCPU_HSR]
+	str	r3, [r1, #VCPU_HIFAR]
+	.endif
+	mrs	r2, ELR_hyp
+	str	r2, [r1, #VCPU_HYP_PC]
+	b	__kvm_vcpu_return
+
+	@ We were in the host already
+99:	hvc	#0	@ switch to SVC mode
+	ldr	r0, \panic_str
+	mrs	r1, ELR_hyp
+	b	panic
+
+.endm
+
+	.text
+
 	.align 5
 __kvm_hyp_vector:
 	.globl __kvm_hyp_vector
-	nop
+
+	@ Hyp-mode exception vector
+	W(b)	hyp_reset
+	W(b)	hyp_undef
+	W(b)	hyp_svc
+	W(b)	hyp_pabt
+	W(b)	hyp_dabt
+	W(b)	hyp_hvc
+	W(b)	hyp_irq
+	W(b)	hyp_fiq
+
+	.align
+hyp_reset:
+	b	hyp_reset
+
+	.align
+hyp_undef:
+	bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
+
+	.align
+hyp_svc:
+	@ Can only get here if HVC or SVC is called from Hyp mode, which means
+	@ we want to change mode back to SVC mode.
+	push	{r12}
+	mov	r12, sp
+	bic	r12, r12, #0x0ff
+	bic	r12, r12, #0xf00
+	ldr	lr, [r12, #4]
+	msr	SPSR_csxf, lr
+	ldr	lr, [r12]
+	pop	{r12}
+	eret
+
+	.align
+hyp_pabt:
+	bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
+
+	.align
+hyp_dabt:
+	bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
+
+	.align
+hyp_hvc:
+	@ Getting here is either because of a trap from a guest or from calling
+	@ HVC from the host kernel, which means "switch to Hyp mode".
+	push	{r0, r1, r2}
+
+	@ Check syndrome register
+	mrc	p15, 4, r0, c5, c2, 0	@ HSR
+	lsr	r1, r0, #HSR_EC_SHIFT
+#ifdef CONFIG_VFPv3
+	cmp	r1, #HSR_EC_CP_0_13
+	beq	switch_to_guest_vfp
+#endif
+	cmp	r1, #HSR_EC_HVC
+	bne	guest_trap		@ Not HVC instr.
+
+	@ Let's check if the HVC came from VMID 0 and allow simple
+	@ switch to Hyp mode
+	mrrc    p15, 6, r1, r2, c2
+	lsr     r2, r2, #16
+	and     r2, r2, #0xff
+	cmp     r2, #0
+	bne	guest_trap		@ Guest called HVC
+
+	@ HVC came from host. Check if this is a request to
+	@ switch HVBAR to another set of vectors (kvm_exit).
+	lsl	r0, r0, #16
+	lsr	r0, r0, #16
+	cmp	r0, #0xff
+	bne	host_switch_to_hyp	@ Not HVC #0xff
+
+	@ We're switching away from this hypervisor, let's blow the TLBs.
+	pop	{r0, r1, r2}
+	mcr	p15, 4, r0, c12, c0, 0  @ HVBAR
+	mcr	p15, 4, r0, c8, c7, 0   @ Flush Hyp TLB, r0 ignored
+	eret
+
+host_switch_to_hyp:
+	@ Store lr_usr,spsr (svc cpsr) on bottom of stack
+	mov	r1, sp
+	bic	r1, r1, #0x0ff
+	bic	r1, r1, #0xf00
+	str	lr, [r1]
+	mrs	lr, spsr
+	str	lr, [r1, #4]
+
+	pop	{r0, r1, r2}
+
+	@ Return to caller in Hyp mode
+	mrs	lr, ELR_hyp
+	mov	pc, lr
+
+guest_trap:
+	load_vcpu	r1		@ Load VCPU pointer
+	str	r0, [r1, #VCPU_HSR]
+	add	r1, r1, #VCPU_USR_REG(3)
+	stmia	r1, {r3-r12}
+	sub	r1, r1, #(VCPU_USR_REG(3) - VCPU_USR_REG(0))
+	pop	{r3, r4, r5}
+	stmia	r1, {r3, r4, r5}
+	sub	r1, r1, #VCPU_USR_REG(0)
+
+	@ Check if we need the fault information
+	lsr	r2, r0, #HSR_EC_SHIFT
+	cmp	r2, #HSR_EC_IABT
+	beq	2f
+	cmpne	r2, #HSR_EC_DABT
+	bne	1f
+
+2:	mrc	p15, 4, r2, c6, c0, 0	@ HDFAR
+	mrc	p15, 4, r3, c6, c0, 2	@ HIFAR
+	mrc	p15, 4, r4, c6, c0, 4	@ HPFAR
+	add	r5, r1, #VCPU_HDFAR
+	stmia	r5, {r2, r3, r4}
+
+1:	mov	r0, #ARM_EXCEPTION_HVC
+	b	__kvm_vcpu_return
+
+@ If VFPv3 support is not available, then we will not switch the VFP
+@ registers; however cp10 and cp11 accesses will still trap and fall back
+@ to the regular coprocessor emulation code, which currently will
+@ inject an undefined exception to the guest.
+#ifdef CONFIG_VFPv3
+switch_to_guest_vfp:
+	load_vcpu	r0		@ Load VCPU pointer
+	push	{r3-r7}
+
+	@ NEON/VFP used.  Turn on VFP access.
+	set_hcptr 0, (HCPTR_TCP(10) | HCPTR_TCP(11))
+
+	@ Switch VFP/NEON hardware state to the guest's
+	add	r7, r0, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	store_vfp_state r7
+	add	r7, r0, #VCPU_VFP_GUEST
+	restore_vfp_state r7
+
+	pop	{r3-r7}
+	pop	{r0-r2}
+	eret
+#endif
+
+	.align
+hyp_irq:
+	push	{r0}
+	load_vcpu	r0		@ Load VCPU pointer
+	add	r0, r0, #(VCPU_USR_REG(1))
+	stmia	r0, {r1-r12}
+	pop	{r0}
+	load_vcpu	r1		@ Load VCPU pointer again
+	str	r0, [r1, #VCPU_USR_REG(0)]
+
+	mov	r0, #ARM_EXCEPTION_IRQ
+	b	__kvm_vcpu_return
+
+	.align
+hyp_fiq:
+	b	hyp_fiq
+
+	.ltorg
+
+und_die_str:
+	.ascii	"unexpected undefined exception in Hyp mode at: %#08x"
+pabt_die_str:
+	.ascii	"unexpected prefetch abort in Hyp mode at: %#08x"
+dabt_die_str:
+	.ascii	"unexpected data abort in Hyp mode at: %#08x"
 
 /*
  * The below lines makes sure the HYP mode code fits in a single page (the


* [PATCH 11/15] KVM: ARM: Emulation framework and CP15 emulation
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

Adds an important new function in the main KVM/ARM code called
handle_exit(), which is called from kvm_arch_vcpu_ioctl_run() on return
from guest execution. This function examines the Hyp Syndrome Register
(HSR), which contains information telling KVM what caused the exit from
the guest.

Some of the reasons for an exit are CP15 accesses, which are not allowed
from the guest; this commit handles these exits by emulating the
intended operation in software and skipping the guest instruction.
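
For illustration, the dispatch in handle_exit() essentially extracts the
exception class (EC) field from the saved HSR and indexes a handler
table (a simplified sketch of the code added below, not the literal
error handling):

	/* Simplified sketch of the HSR-based exit dispatch. */
	u8 hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;

	if (hsr_ec >= ARRAY_SIZE(arm_exit_handlers) ||
	    !arm_exit_handlers[hsr_ec])
		return -EINVAL;		/* unknown exception class */

	return arm_exit_handlers[hsr_ec](vcpu, run);	/* > 0: re-enter guest */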

Minor notes about the coproc register reset:
1) We reserve a value of 0 as an invalid cp15 offset, to catch bugs in our
   table, at the cost of 4 bytes per vcpu.

2) Added comments on the table indicating how we handle each register, for
   simplicity of understanding.

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    9 +
 arch/arm/include/asm/kvm_coproc.h  |    7 
 arch/arm/include/asm/kvm_emulate.h |    6 
 arch/arm/include/asm/kvm_host.h    |    5 
 arch/arm/kvm/arm.c                 |  166 ++++++++++
 arch/arm/kvm/coproc.c              |  592 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/emulate.c             |  207 +++++++++++++
 arch/arm/kvm/trace.h               |   30 ++
 8 files changed, 1020 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index ee345a6..ae586c1 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -76,6 +76,11 @@
 			HCR_SWIO | HCR_TIDCP)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
+/* System Control Register (SCTLR) bits */
+#define SCTLR_TE	(1 << 30)
+#define SCTLR_EE	(1 << 25)
+#define SCTLR_V		(1 << 13)
+
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
 #define HSCTLR_EE	(1 << 25)
@@ -153,6 +158,10 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_CV_SHIFT	(24)
+#define HSR_CV		(1U << HSR_CV_SHIFT)
+#define HSR_COND_SHIFT	(20)
+#define HSR_COND	(0xfU << HSR_COND_SHIFT)
 
 #define HSR_EC_UNKNOWN	(0x00)
 #define HSR_EC_WFI	(0x01)
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index b6d023d..c451fb4 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -21,4 +21,11 @@
 
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_coproc_table_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 9e29335..6ddfae2 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -51,6 +51,12 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
 	return mode;
 }
 
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
+void kvm_inject_undefined(struct kvm_vcpu *vcpu);
+void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
  */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ca4c079..c98dcd7 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -109,6 +109,7 @@ enum cp15_regs {
 	c5_AIFSR,		/* Auxilary Instruction Fault Status Register */
 	c6_DFAR,		/* Data Fault Address Register */
 	c6_IFAR,		/* Instruction Fault Address Register */
+	c9_L2CTLR,		/* Cortex A15 L2 Control Register */
 	c10_PRRR,		/* Primary Region Remap Register */
 	c10_NMRR,		/* Normal Memory Remap Register */
 	c12_VBAR,		/* Vector Base Address Register */
@@ -146,6 +147,10 @@ struct kvm_vcpu_arch {
 	 * Anything that is not used directly from assembly code goes
 	 * here.
 	 */
+	/* dcache set/way operation pending */
+	int last_pcpu;
+	cpumask_t require_dcache_flush;
+
 	/* IO related fields */
 	struct {
 		bool sign_extend;	/* for byte/halfword loads */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 087f9d1..8dc5887 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -37,11 +37,14 @@
 #include <asm/cputype.h>
 #include <asm/idmap.h>
 #include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
 #include <asm/virt.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/opcodes.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
@@ -279,6 +282,17 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
 	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
+
+	/*
+	 * Check whether this vcpu requires the cache to be flushed on
+	 * this physical CPU. This is a consequence of doing dcache
+	 * operations by set/way on this vcpu. We do it here to be in
+	 * a non-preemptible section.
+	 */
+	if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
+		cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
+	}
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -379,6 +393,114 @@ static void update_vttbr(struct kvm *kvm)
 	spin_unlock(&kvm_vmid_lock);
 }
 
+static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* SVC called from Hyp mode should never get here */
+	kvm_debug("SVC called from Hyp mode shouldn't go here\n");
+	BUG();
+	return -EINVAL; /* Squash warning */
+}
+
+static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * Guest called HVC instruction:
+	 * Let it know we don't want that by injecting an undefined exception.
+	 */
+	kvm_debug("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
+				     vcpu->arch.regs.pc);
+	kvm_debug("         HSR: %8x", vcpu->arch.hsr);
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* We don't support SMC; don't do that. */
+	kvm_debug("smc: at %08x", vcpu->arch.regs.pc);
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_pabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* The hypervisor should never cause aborts */
+	kvm_err("Prefetch Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hifar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* This is either an error in the ws. code or an external abort */
+	kvm_err("Data Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hdfar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);
+static exit_handle_fn arm_exit_handlers[] = {
+	[HSR_EC_WFI]		= kvm_handle_wfi,
+	[HSR_EC_CP15_32]	= kvm_handle_cp15_32,
+	[HSR_EC_CP15_64]	= kvm_handle_cp15_64,
+	[HSR_EC_CP14_MR]	= kvm_handle_cp14_access,
+	[HSR_EC_CP14_LS]	= kvm_handle_cp14_load_store,
+	[HSR_EC_CP14_64]	= kvm_handle_cp14_access,
+	[HSR_EC_CP_0_13]	= kvm_handle_cp_0_13_access,
+	[HSR_EC_CP10_ID]	= kvm_handle_cp10_id,
+	[HSR_EC_SVC_HYP]	= handle_svc_hyp,
+	[HSR_EC_HVC]		= handle_hvc,
+	[HSR_EC_SMC]		= handle_smc,
+	[HSR_EC_IABT]		= kvm_handle_guest_abort,
+	[HSR_EC_IABT_HYP]	= handle_pabt_hyp,
+	[HSR_EC_DABT]		= kvm_handle_guest_abort,
+	[HSR_EC_DABT_HYP]	= handle_dabt_hyp,
+};
+
+/*
+ * A conditional instruction is allowed to trap, even though it
+ * wouldn't be executed.  So let's re-implement the hardware, in
+ * software!
+ */
+static bool kvm_condition_valid(struct kvm_vcpu *vcpu)
+{
+	unsigned long cpsr, cond, insn;
+
+	/*
+	 * Exception Code 0 can only happen if we set HCR.TGE to 1, to
+	 * catch undefined instructions, and then we won't get past
+	 * the arm_exit_handlers test anyway.
+	 */
+	BUG_ON(((vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT) == 0);
+
+	/* Top two bits non-zero?  Unconditional. */
+	if (vcpu->arch.hsr >> 30)
+		return true;
+
+	cpsr = *vcpu_cpsr(vcpu);
+
+	/* Is condition field valid? */
+	if ((vcpu->arch.hsr & HSR_CV) >> HSR_CV_SHIFT)
+		cond = (vcpu->arch.hsr & HSR_COND) >> HSR_COND_SHIFT;
+	else {
+		/* This can happen in Thumb mode: examine IT state. */
+		unsigned long it;
+
+		it = ((cpsr >> 8) & 0xFC) | ((cpsr >> 25) & 0x3);
+
+		/* it == 0 => unconditional. */
+		if (it == 0)
+			return true;
+
+		/* The cond for this insn works out as the top 4 bits. */
+		cond = (it >> 4);
+	}
+
+	/* Shift makes it look like an ARM-mode instruction */
+	insn = cond << 28;
+	return arm_check_condition(insn, cpsr) != ARM_OPCODE_CONDTEST_FAIL;
+}
+
 /*
  * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
  * proper exit to QEMU.
@@ -386,8 +508,46 @@ static void update_vttbr(struct kvm *kvm)
 static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		       int exception_index)
 {
-	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	return 0;
+	unsigned long hsr_ec;
+
+	switch (exception_index) {
+	case ARM_EXCEPTION_IRQ:
+		return 1;
+	case ARM_EXCEPTION_UNDEFINED:
+		kvm_err("Undefined exception in Hyp mode at: %#08x\n",
+			vcpu->arch.hyp_pc);
+		BUG();
+		panic("KVM: Hypervisor undefined exception!\n");
+	case ARM_EXCEPTION_DATA_ABORT:
+	case ARM_EXCEPTION_PREF_ABORT:
+	case ARM_EXCEPTION_HVC:
+		hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;
+
+		if (hsr_ec >= ARRAY_SIZE(arm_exit_handlers)
+		    || !arm_exit_handlers[hsr_ec]) {
+			kvm_err("Unknown exception class: %#08lx, "
+				"hsr: %#08x\n", hsr_ec,
+				(unsigned int)vcpu->arch.hsr);
+			BUG();
+		}
+
+		/*
+		 * See ARM ARM B1.14.1: "Hyp traps on instructions
+		 * that fail their condition code check"
+		 */
+		if (!kvm_condition_valid(vcpu)) {
+			bool is_wide = vcpu->arch.hsr & HSR_IL;
+			kvm_skip_instr(vcpu, is_wide);
+			return 1;
+		}
+
+		return arm_exit_handlers[hsr_ec](vcpu, run);
+	default:
+		kvm_pr_unimpl("Unsupported exception type: %d",
+			      exception_index);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		return 0;
+	}
 }
 
 /**
@@ -450,6 +610,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = __kvm_vcpu_run(vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
+		vcpu->arch.last_pcpu = smp_processor_id();
 		kvm_guest_exit();
 		trace_kvm_exit(vcpu->arch.regs.pc);
 		/*
@@ -772,6 +933,7 @@ int kvm_arch_init(void *opaque)
 	if (err)
 		goto out_err;
 
+	kvm_coproc_table_init();
 	return 0;
 out_err:
 	return err;
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 4b9dad8..a6d8bb0 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -15,8 +15,600 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+#include <linux/mm.h>
 #include <linux/kvm_host.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <trace/events/kvm.h>
 
+#include "trace.h"
+
+/******************************************************************************
+ * Co-processor emulation
+ *****************************************************************************/
+
+struct coproc_params {
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+	unsigned long Rt1;
+	unsigned long Rt2;
+	bool is_64bit;
+	bool is_write;
+};
+
+struct coproc_reg {
+	/* MRC/MCR/MRRC/MCRR instruction which accesses it. */
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+
+	bool is_64;
+
+	/* Trapped access from guest, if non-NULL. */
+	bool (*access)(struct kvm_vcpu *,
+		       const struct coproc_params *,
+		       const struct coproc_reg *);
+
+	/* Initialization for vcpu. */
+	void (*reset)(struct kvm_vcpu *, const struct coproc_reg *);
+
+	/* Index into vcpu->arch.cp15[], or 0 if we don't need to save it. */
+	enum cp15_regs reg;
+
+	/* Value (usually reset value) */
+	u64 val;
+};
+
+static void print_cp_instr(const struct coproc_params *p)
+{
+	/* Look, we even formatted it for you to paste into the table! */
+	if (p->is_64bit) {
+		kvm_pr_unimpl(" { CRm(%2lu), Op1(%2lu), is64, func_%s },\n",
+			      p->CRm, p->Op1, p->is_write ? "write" : "read");
+	} else {
+		kvm_pr_unimpl(" { CRn(%2lu), CRm(%2lu), Op1(%2lu), Op2(%2lu), is32,"
+			      " func_%s },\n",
+			      p->CRn, p->CRm, p->Op1, p->Op2,
+			      p->is_write ? "write" : "read");
+	}
+}
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * We can get here, if the host has been built without VFPv3 support,
+	 * but the guest attempted a floating point operation.
+	 */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static bool ignore_write(struct kvm_vcpu *vcpu, const struct coproc_params *p)
+{
+	return true;
+}
+
+static bool read_zero(struct kvm_vcpu *vcpu, const struct coproc_params *p)
+{
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+static bool write_to_read_only(struct kvm_vcpu *vcpu,
+			       const struct coproc_params *params)
+{
+	kvm_debug("CP15 write to read-only register at: %08x\n",
+		  vcpu->arch.regs.pc);
+	print_cp_instr(params);
+	return false;
+}
+
+static bool read_from_write_only(struct kvm_vcpu *vcpu,
+				 const struct coproc_params *params)
+{
+	kvm_debug("CP15 read to write-only register at: %08x\n",
+		  vcpu->arch.regs.pc);
+	print_cp_instr(params);
+	return false;
+}
+
+/* A15 TRM 4.3.48: R/O WI. */
+static bool access_l2ctlr(struct kvm_vcpu *vcpu,
+			  const struct coproc_params *p,
+			  const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c9_L2CTLR];
+	return true;
+}
+
+static void reset_l2ctlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 l2ctlr, ncores;
+
+	asm volatile("mrc p15, 1, %0, c9, c0, 2\n" : "=r" (l2ctlr));
+	l2ctlr &= ~(3 << 24);
+	ncores = atomic_read(&vcpu->kvm->online_vcpus) - 1;
+	l2ctlr |= (ncores & 3) << 24;
+
+	vcpu->arch.cp15[c9_L2CTLR] = l2ctlr;
+}
+
+/* A15 TRM 4.3.49: R/O WI (even if NSACR.NS_L2ERR, a write of 1 is ignored). */
+static bool access_l2ectlr(struct kvm_vcpu *vcpu,
+			   const struct coproc_params *p,
+			   const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+/* A15 TRM 4.3.60: R/O. */
+static bool access_cbar(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p);
+	return read_zero(vcpu, p);
+}
+
+/* A15 TRM 4.3.28: RO WI */
+static bool access_actlr(struct kvm_vcpu *vcpu,
+			 const struct coproc_params *p,
+			 const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c1_ACTLR];
+	return true;
+}
+
+static void reset_actlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 actlr;
+
+	/* ACTLR contains SMP bit: make sure you create all cpus first! */
+	asm volatile("mrc p15, 0, %0, c1, c0, 1\n" : "=r" (actlr));
+	/* Make the SMP bit consistent with the guest configuration */
+	if (atomic_read(&vcpu->kvm->online_vcpus) > 1)
+		actlr |= 1U << 6;
+	else
+		actlr &= ~(1U << 6);
+
+	vcpu->arch.cp15[c1_ACTLR] = actlr;
+}
+
+/* See note at ARM ARM B1.14.4 */
+static bool access_dcsw(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	u32 val;
+	int cpu;
+
+	if (!p->is_write)
+		return read_from_write_only(vcpu, p);
+
+	cpu = get_cpu();
+
+	cpumask_setall(&vcpu->arch.require_dcache_flush);
+	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+
+	/* If we were already preempted, take the long way around */
+	if (cpu != vcpu->arch.last_pcpu) {
+		flush_cache_all();
+		goto done;
+	}
+
+	val = *vcpu_reg(vcpu, p->Rt1);
+
+	switch (p->CRm) {
+	case 6:			/* Upgrade DCISW to DCCISW, as per HCR.SWIO */
+	case 14:		/* DCCISW */
+		asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
+		break;
+
+	case 10:		/* DCCSW */
+		asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
+		break;
+	}
+
+done:
+	put_cpu();
+
+	return true;
+}
+
+/*
+ * We could trap ID_DFR0 and tell the guest we don't support performance
+ * monitoring.  Unfortunately the patch to make the kernel check ID_DFR0 was
+ * NAKed, so it will read the PMCR anyway.
+ *
+ * Therefore we tell the guest we have 0 counters.  Unfortunately, we
+ * must always support PMCCNTR (the cycle counter): we just RAZ/WI for
+ * all PM registers, which doesn't crash the guest kernel at least.
+ */
+static bool pm_fake(struct kvm_vcpu *vcpu,
+		    const struct coproc_params *p,
+		    const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+	else
+		return read_zero(vcpu, p);
+}
+
+#define access_pmcr pm_fake
+#define access_pmcntenset pm_fake
+#define access_pmcntenclr pm_fake
+#define access_pmovsr pm_fake
+#define access_pmselr pm_fake
+#define access_pmceid0 pm_fake
+#define access_pmceid1 pm_fake
+#define access_pmccntr pm_fake
+#define access_pmxevtyper pm_fake
+#define access_pmxevcntr pm_fake
+#define access_pmuserenr pm_fake
+#define access_pmintenset pm_fake
+#define access_pmintenclr pm_fake
+
+/* Reset functions */
+static void reset_unknown(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+}
+
+static void reset_val(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = r->val;
+}
+
+static void reset_unknown64(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg + 1 >= ARRAY_SIZE(vcpu->arch.cp15));
+
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+	vcpu->arch.cp15[r->reg+1] = 0xd0c0ffee;
+}
+
+static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	/*
+	 * Compute guest MPIDR:
+	 * (Even if we present only one VCPU to the guest on an SMP
+	 * host we don't set the U bit in the MPIDR, or vice versa, as
+	 * revealing the underlying hardware properties is likely to
+	 * be the best choice).
+	 */
+	vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_CPUID)
+		| (vcpu->vcpu_id & MPIDR_CPUID);
+}
+
+#define CRn(_x)		.CRn = _x
+#define CRm(_x) 	.CRm = _x
+#define Op1(_x) 	.Op1 = _x
+#define Op2(_x) 	.Op2 = _x
+#define is64		.is_64 = true
+#define is32		.is_64 = false
+
+/* Architected CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg cp15_regs[] = {
+	/* CSSELR: swapped by interrupt.S. */
+	{ CRn( 0), CRm( 0), Op1( 2), Op2( 0), is32,
+			NULL, reset_unknown, c0_CSSELR },
+
+	/* TTBR0/TTBR1: swapped by interrupt.S. */
+	{ CRm( 2), Op1( 0), is64, NULL, reset_unknown64, c2_TTBR0 },
+	{ CRm( 2), Op1( 1), is64, NULL, reset_unknown64, c2_TTBR1 },
+
+	/* TTBCR: swapped by interrupt.S. */
+	{ CRn( 2), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_val, c2_TTBCR, 0x00000000 },
+
+	/* DACR: swapped by interrupt.S. */
+	{ CRn( 3), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c3_DACR },
+
+	/* DFSR/IFSR/ADFSR/AIFSR: swapped by interrupt.S. */
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_DFSR },
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_IFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_ADFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_AIFSR },
+
+	/* DFAR/IFAR: swapped by interrupt.S. */
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c6_DFAR },
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c6_IFAR },
+	/*
+	 * DC{C,I,CI}SW operations:
+	 */
+	{ CRn( 7), CRm( 6), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(10), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(14), Op1( 0), Op2( 2), is32, access_dcsw},
+	/*
+	 * Dummy performance monitor implementation.
+	 */
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 0), is32, access_pmcr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 1), is32, access_pmcntenset},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 2), is32, access_pmcntenclr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 3), is32, access_pmovsr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 5), is32, access_pmselr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 6), is32, access_pmceid0},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 7), is32, access_pmceid1},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 0), is32, access_pmccntr},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 1), is32, access_pmxevtyper},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 2), is32, access_pmxevcntr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 0), is32, access_pmuserenr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 1), is32, access_pmintenset},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 2), is32, access_pmintenclr},
+
+	/* PRRR/NMRR (aka MAIR0/MAIR1): swapped by interrupt.S. */
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c10_PRRR},
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c10_NMRR},
+
+	/* VBAR: swapped by interrupt.S. */
+	{ CRn(12), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c12_VBAR, 0x00000000 },
+
+	/* CONTEXTIDR/TPIDRURW/TPIDRURO/TPIDRPRW: swapped by interrupt.S. */
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_val, c13_CID, 0x00000000 },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c13_TID_URW },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 3), is32,
+			NULL, reset_unknown, c13_TID_URO },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 4), is32,
+			NULL, reset_unknown, c13_TID_PRIV },
+};
+
+/*
+ * A15-specific CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg cp15_cortex_a15_regs[] = {
+	/* MPIDR: we use VMPIDR for guest access. */
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 5), is32,
+			NULL, reset_mpidr, c0_MPIDR },
+
+	/* SCTLR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c1_SCTLR, 0x00C50078 },
+	/* ACTLR: trapped by HCR.TAC bit. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 1), is32,
+			access_actlr, reset_actlr, c1_ACTLR },
+	/* CPACR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_val, c1_CPACR, 0x00000000 },
+
+	/*
+	 * L2CTLR access (guest wants to know #CPUs).
+	 */
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 2), is32,
+			access_l2ctlr, reset_l2ctlr, c9_L2CTLR },
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 3), is32, access_l2ectlr},
+
+	/* The Configuration Base Address Register. */
+	{ CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
+};
+
+/* Get specific register table for this target. */
+static const struct coproc_reg *get_target_table(unsigned target, size_t *num)
+{
+	switch (target) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		*num = ARRAY_SIZE(cp15_cortex_a15_regs);
+		return cp15_cortex_a15_regs;
+	default:
+		*num = 0;
+		return NULL;
+	}
+}
+
+static const struct coproc_reg *find_reg(const struct coproc_params *params,
+					 const struct coproc_reg table[],
+					 unsigned int num)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++) {
+		const struct coproc_reg *r = &table[i];
+
+		if (params->is_64bit != r->is_64)
+			continue;
+		if (params->CRn != r->CRn)
+			continue;
+		if (params->CRm != r->CRm)
+			continue;
+		if (params->Op1 != r->Op1)
+			continue;
+		if (params->Op2 != r->Op2)
+			continue;
+
+		return r;
+	}
+	return NULL;
+}
+
+static int emulate_cp15(struct kvm_vcpu *vcpu,
+			const struct coproc_params *params)
+{
+	size_t num;
+	const struct coproc_reg *table, *r;
+
+	trace_kvm_emulate_cp15_imp(params->Op1, params->Rt1, params->CRn,
+				   params->CRm, params->Op2, params->is_write);
+
+	table = get_target_table(vcpu->arch.target, &num);
+
+	/* Search target-specific then generic table. */
+	r = find_reg(params, table, num);
+	if (!r)
+		r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	if (likely(r)) {
+		/* If we don't have an accessor, we should never get here! */
+		BUG_ON(!r->access);
+
+		if (likely(r->access(vcpu, params, r))) {
+			/* Skip instruction, since it was emulated */
+			kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
+			return 1;
+		}
+		/* If access function fails, it should complain. */
+	} else {
+		kvm_err("Unsupported guest CP15 access at: %08x\n",
+			vcpu->arch.regs.pc);
+		print_cp_instr(params);
+	}
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+/**
+ * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = true;
+
+	params.Op1 = (vcpu->arch.hsr >> 16) & 0xf;
+	params.Op2 = 0;
+	params.Rt2 = (vcpu->arch.hsr >> 10) & 0xf;
+	params.CRn = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+static void reset_coproc_regs(struct kvm_vcpu *vcpu,
+			      const struct coproc_reg *table, size_t num)
+{
+	unsigned long i;
+
+	for (i = 0; i < num; i++)
+		if (table[i].reset)
+			table[i].reset(vcpu, &table[i]);
+}
+
+/**
+ * kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = false;
+
+	params.CRn = (vcpu->arch.hsr >> 10) & 0xf;
+	params.Op1 = (vcpu->arch.hsr >> 14) & 0x7;
+	params.Op2 = (vcpu->arch.hsr >> 17) & 0x7;
+	params.Rt2 = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+static int cmp_reg(const struct coproc_reg *i1, const struct coproc_reg *i2)
+{
+	if (i1->CRn != i2->CRn)
+		return i1->CRn - i2->CRn;
+	if (i1->CRm != i2->CRm)
+		return i1->CRm - i2->CRm;
+	if (i1->Op1 != i2->Op1)
+		return i1->Op1 - i2->Op1;
+	return i1->Op2 - i2->Op2;
+}
+
+void kvm_coproc_table_init(void)
+{
+	unsigned int i;
+
+	/* Make sure tables are unique and in order. */
+	for (i = 1; i < ARRAY_SIZE(cp15_regs); i++)
+		BUG_ON(cmp_reg(&cp15_regs[i-1], &cp15_regs[i]) >= 0);
+	for (i = 1; i < ARRAY_SIZE(cp15_cortex_a15_regs); i++)
+		BUG_ON(cmp_reg(&cp15_cortex_a15_regs[i-1],
+			       &cp15_cortex_a15_regs[i]) >= 0);
+}
+
+/**
+ * kvm_reset_coprocs - sets cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architecturally defined reset values.
+ */
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
 {
+	size_t num;
+	const struct coproc_reg *table;
+
+	/* Catch someone adding a register without putting in reset entry. */
+	memset(vcpu->arch.cp15, 0x42, sizeof(vcpu->arch.cp15));
+
+	/* Generic chip reset first (so target could override). */
+	reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	table = get_target_table(vcpu->arch.target, &num);
+	reset_coproc_regs(vcpu, table, num);
+
+	for (num = 1; num < nr_cp15_regs; num++)
+		if (vcpu->arch.cp15[num] == 0x42424242)
+			panic("Didn't reset vcpu->arch.cp15[%zi]", num);
 }
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 690bbb3..9236d10 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -16,7 +16,13 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
+#include <linux/mm.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
 
 #define REG_OFFSET(_reg) \
 	(offsetof(struct kvm_regs, _reg) / sizeof(u32))
@@ -125,3 +131,204 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 
 	return reg_array + vcpu_reg_offsets[mode][reg_num];
 }
+
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return 0;
+}
+
+/**
+ * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
+ * @vcpu:	The VCPU pointer
+ *
+ * When exceptions occur while instructions are executed in Thumb IF-THEN
+ * blocks, the ITSTATE field of the CPSR is not advanced (updated), so we have
+ * to do this little bit of work manually. The fields map like this:
+ *
+ * IT[7:0] -> CPSR[26:25],CPSR[15:10]
+ */
+static void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
+{
+	unsigned long itbits, cond;
+	unsigned long cpsr = *vcpu_cpsr(vcpu);
+	bool is_arm = !(cpsr & PSR_T_BIT);
+
+	BUG_ON(is_arm && (cpsr & PSR_IT_MASK));
+
+	if (!(cpsr & PSR_IT_MASK))
+		return;
+
+	cond = (cpsr & 0xe000) >> 13;
+	itbits = (cpsr & 0x1c00) >> (10 - 2);
+	itbits |= (cpsr & (0x3 << 25)) >> 25;
+
+	/* Perform ITAdvance (see page A-52 in ARM DDI 0406C) */
+	if ((itbits & 0x7) == 0)
+		itbits = cond = 0;
+	else
+		itbits = (itbits << 1) & 0x1f;
+
+	cpsr &= ~PSR_IT_MASK;
+	cpsr |= cond << 13;
+	cpsr |= (itbits & 0x1c) << (10 - 2);
+	cpsr |= (itbits & 0x3) << 25;
+	*vcpu_cpsr(vcpu) = cpsr;
+}
+
+/**
+ * kvm_skip_instr - skip a trapped instruction and proceed to the next
+ * @vcpu: The vcpu pointer
+ */
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
+{
+	bool is_thumb;
+
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	if (is_thumb && !is_wide_instr)
+		*vcpu_pc(vcpu) += 2;
+	else
+		*vcpu_pc(vcpu) += 4;
+	kvm_adjust_itstate(vcpu);
+}
+
+
+/******************************************************************************
+ * Inject exceptions into the guest
+ */
+
+static u32 exc_vector_base(struct kvm_vcpu *vcpu)
+{
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	u32 vbar = vcpu->arch.cp15[c12_VBAR];
+
+	if (sctlr & SCTLR_V)
+		return 0xffff0000;
+	else /* always have security exceptions */
+		return vbar;
+}
+
+/**
+ * kvm_inject_undefined - inject an undefined exception into the guest
+ * @vcpu: The VCPU to receive the undefined exception
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ *
+ * Modelled after TakeUndefInstrException() pseudocode.
+ */
+void kvm_inject_undefined(struct kvm_vcpu *vcpu)
+{
+	u32 new_lr_value;
+	u32 new_spsr_value;
+	u32 cpsr = *vcpu_cpsr(vcpu);
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	bool is_thumb = (cpsr & PSR_T_BIT);
+	u32 vect_offset = 4;
+	u32 return_offset = (is_thumb) ? 2 : 4;
+
+	new_spsr_value = cpsr;
+	new_lr_value = *vcpu_pc(vcpu) - return_offset;
+
+	*vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | UND_MODE;
+	*vcpu_cpsr(vcpu) |= PSR_I_BIT;
+	*vcpu_cpsr(vcpu) &= ~(PSR_IT_MASK | PSR_J_BIT | PSR_E_BIT | PSR_T_BIT);
+
+	if (sctlr & SCTLR_TE)
+		*vcpu_cpsr(vcpu) |= PSR_T_BIT;
+	if (sctlr & SCTLR_EE)
+		*vcpu_cpsr(vcpu) |= PSR_E_BIT;
+
+	/* Note: These now point to UND banked copies */
+	*vcpu_spsr(vcpu) = cpsr;
+	*vcpu_reg(vcpu, 14) = new_lr_value;
+
+	/* Branch to exception vector */
+	*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
+}
+
+/*
+ * Modelled after TakeDataAbortException() and TakePrefetchAbortException
+ * pseudocode.
+ */
+static void inject_abt(struct kvm_vcpu *vcpu, bool is_pabt, unsigned long addr)
+{
+	u32 new_lr_value;
+	u32 new_spsr_value;
+	u32 cpsr = *vcpu_cpsr(vcpu);
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	bool is_thumb = (cpsr & PSR_T_BIT);
+	u32 vect_offset;
+	u32 return_offset = (is_thumb) ? 4 : 0;
+	bool is_lpae;
+
+	new_spsr_value = cpsr;
+	new_lr_value = *vcpu_pc(vcpu) + return_offset;
+
+	*vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | ABT_MODE;
+	*vcpu_cpsr(vcpu) |= PSR_I_BIT | PSR_A_BIT;
+	*vcpu_cpsr(vcpu) &= ~(PSR_IT_MASK | PSR_J_BIT | PSR_E_BIT | PSR_T_BIT);
+
+	if (sctlr & SCTLR_TE)
+		*vcpu_cpsr(vcpu) |= PSR_T_BIT;
+	if (sctlr & SCTLR_EE)
+		*vcpu_cpsr(vcpu) |= PSR_E_BIT;
+
+	/* Note: These now point to ABT banked copies */
+	*vcpu_spsr(vcpu) = cpsr;
+	*vcpu_reg(vcpu, 14) = new_lr_value;
+
+	if (is_pabt)
+		vect_offset = 12;
+	else
+		vect_offset = 16;
+
+	/* Branch to exception vector */
+	*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
+
+	if (is_pabt) {
+		/* Set IFAR and IFSR */
+		vcpu->arch.cp15[c6_IFAR] = addr;
+		is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31);
+		/* Always give debug fault for now - should give guest a clue */
+		if (is_lpae)
+			vcpu->arch.cp15[c5_IFSR] = 1 << 9 | 0x22;
+		else
+			vcpu->arch.cp15[c5_IFSR] = 2;
+	} else { /* !iabt */
+		/* Set DFAR and DFSR */
+		vcpu->arch.cp15[c6_DFAR] = addr;
+		is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31);
+		/* Always give debug fault for now - should give guest a clue */
+		if (is_lpae)
+			vcpu->arch.cp15[c5_DFSR] = 1 << 9 | 0x22;
+		else
+			vcpu->arch.cp15[c5_DFSR] = 2;
+	}
+
+}
+
+/**
+ * kvm_inject_dabt - inject a data abort into the guest
+ * @vcpu: The VCPU to receive the data abort
+ * @addr: The address to report in the DFAR
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
+{
+	inject_abt(vcpu, false, addr);
+}
+
+/**
+ * kvm_inject_pabt - inject a prefetch abort into the guest
+ * @vcpu: The VCPU to receive the prefetch abort
+ * @addr: The address to report in the IFAR
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
+{
+	inject_abt(vcpu, true, addr);
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index a283eef..772e045 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -64,6 +64,36 @@ TRACE_EVENT(kvm_irq_line,
 		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
 );
 
+/* Architecturally implementation defined CP15 register access */
+TRACE_EVENT(kvm_emulate_cp15_imp,
+	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
+		 unsigned long CRm, unsigned long Op2, bool is_write),
+	TP_ARGS(Op1, Rt1, CRn, CRm, Op2, is_write),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	Op1		)
+		__field(	unsigned int,	Rt1		)
+		__field(	unsigned int,	CRn		)
+		__field(	unsigned int,	CRm		)
+		__field(	unsigned int,	Op2		)
+		__field(	bool,		is_write	)
+	),
+
+	TP_fast_assign(
+		__entry->is_write		= is_write;
+		__entry->Op1			= Op1;
+		__entry->Rt1			= Rt1;
+		__entry->CRn			= CRn;
+		__entry->CRm			= CRm;
+		__entry->Op2			= Op2;
+	),
+
+	TP_printk("Implementation defined CP15: %s\tp15, %u, r%u, c%u, c%u, %u",
+			(__entry->is_write) ? "mcr" : "mrc",
+			__entry->Op1, __entry->Rt1, __entry->CRn,
+			__entry->CRm, __entry->Op2)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH


^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 11/15] KVM: ARM: Emulation framework and CP15 emulation
@ 2012-09-15 15:35   ` Christoffer Dall
  0 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Adds a new important function in the main KVM/ARM code called
handle_exit() which is called from kvm_arch_vcpu_ioctl_run() on returns
from guest execution. This function examines the Hyp-Syndrome-Register
(HSR), which contains information telling KVM what caused the exit from
the guest.

Some of the reasons for an exit are CP15 accesses, which are
not allowed from the guest; this commit handles these exits by
emulating the intended operation in software and skipping the guest
instruction.
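
To illustrate how the 32-bit CP15 trap syndrome is decoded, here is a small
standalone sketch (not part of the patch): the bit positions mirror those used
by kvm_handle_cp15_32() in the diff below, while the struct name, the helper
and the constant HSR value are made up for the example.

#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical decoder mirroring the ISS layout used by the patch:
 * Op2[19:17], Op1[16:14], CRn[13:10], Rt[8:5], CRm[4:1] and bit 0 as
 * the direction (0 = MCR/write, 1 = MRC/read).  Only the ISS-relevant
 * bits of the HSR are modelled here.
 */
struct demo_cp_params {
	unsigned long CRn, CRm, Op1, Op2, Rt;
	bool is_write;
};

static struct demo_cp_params demo_decode_cp15_32(unsigned long hsr)
{
	struct demo_cp_params p;

	p.CRm      = (hsr >> 1) & 0xf;
	p.Rt       = (hsr >> 5) & 0xf;
	p.CRn      = (hsr >> 10) & 0xf;
	p.Op1      = (hsr >> 14) & 0x7;
	p.Op2      = (hsr >> 17) & 0x7;
	p.is_write = ((hsr & 1) == 0);
	return p;
}

int main(void)
{
	/* Made-up ISS for a guest "mcr p15, 0, r3, c7, c10, 2" (DCCSW). */
	unsigned long hsr = (2UL << 17) | (0UL << 14) | (7UL << 10) |
			    (3UL << 5) | (10UL << 1);
	struct demo_cp_params p = demo_decode_cp15_32(hsr);

	printf("%s p15, %lu, r%lu, c%lu, c%lu, %lu\n",
	       p.is_write ? "mcr" : "mrc",
	       p.Op1, p.Rt, p.CRn, p.CRm, p.Op2);
	return 0;
}

Such an access would match the { CRn( 7), CRm(10), Op1( 0), Op2( 2) } entry in
the patch's cp15_regs table and be routed to access_dcsw().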

Minor notes about the coproc register reset:
1) We reserve a value of 0 as an invalid cp15 offset, to catch bugs in our
   table, at cost of 4 bytes per vcpu.

2) Added comments on the table indicating how we handle each register, for
   simplicity of understanding.

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    9 +
 arch/arm/include/asm/kvm_coproc.h  |    7 
 arch/arm/include/asm/kvm_emulate.h |    6 
 arch/arm/include/asm/kvm_host.h    |    5 
 arch/arm/kvm/arm.c                 |  166 ++++++++++
 arch/arm/kvm/coproc.c              |  592 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/emulate.c             |  207 +++++++++++++
 arch/arm/kvm/trace.h               |   30 ++
 8 files changed, 1020 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index ee345a6..ae586c1 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -76,6 +76,11 @@
 			HCR_SWIO | HCR_TIDCP)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
+/* System Control Register (SCTLR) bits */
+#define SCTLR_TE	(1 << 30)
+#define SCTLR_EE	(1 << 25)
+#define SCTLR_V		(1 << 13)
+
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
 #define HSCTLR_EE	(1 << 25)
@@ -153,6 +158,10 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_CV_SHIFT	(24)
+#define HSR_CV		(1U << HSR_CV_SHIFT)
+#define HSR_COND_SHIFT	(20)
+#define HSR_COND	(0xfU << HSR_COND_SHIFT)
 
 #define HSR_EC_UNKNOWN	(0x00)
 #define HSR_EC_WFI	(0x01)
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index b6d023d..c451fb4 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -21,4 +21,11 @@
 
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_coproc_table_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 9e29335..6ddfae2 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -51,6 +51,12 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
 	return mode;
 }
 
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
+void kvm_inject_undefined(struct kvm_vcpu *vcpu);
+void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
  */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ca4c079..c98dcd7 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -109,6 +109,7 @@ enum cp15_regs {
 	c5_AIFSR,		/* Auxilary Instruction Fault Status Register */
 	c6_DFAR,		/* Data Fault Address Register */
 	c6_IFAR,		/* Instruction Fault Address Register */
+	c9_L2CTLR,		/* Cortex A15 L2 Control Register */
 	c10_PRRR,		/* Primary Region Remap Register */
 	c10_NMRR,		/* Normal Memory Remap Register */
 	c12_VBAR,		/* Vector Base Address Register */
@@ -146,6 +147,10 @@ struct kvm_vcpu_arch {
 	 * Anything that is not used directly from assembly code goes
 	 * here.
 	 */
+	/* dcache set/way operation pending */
+	int last_pcpu;
+	cpumask_t require_dcache_flush;
+
 	/* IO related fields */
 	struct {
 		bool sign_extend;	/* for byte/halfword loads */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 087f9d1..8dc5887 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -37,11 +37,14 @@
 #include <asm/cputype.h>
 #include <asm/idmap.h>
 #include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
 #include <asm/virt.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/opcodes.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
@@ -279,6 +282,17 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
 	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
+
+	/*
+	 * Check whether this vcpu requires the cache to be flushed on
+	 * this physical CPU. This is a consequence of doing dcache
+	 * operations by set/way on this vcpu. We do it here to be in
+	 * a non-preemptible section.
+	 */
+	if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
+		cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
+	}
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -379,6 +393,114 @@ static void update_vttbr(struct kvm *kvm)
 	spin_unlock(&kvm_vmid_lock);
 }
 
+static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* SVC called from Hyp mode should never get here */
+	kvm_debug("SVC called from Hyp mode shouldn't go here\n");
+	BUG();
+	return -EINVAL; /* Squash warning */
+}
+
+static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * Guest called HVC instruction:
+	 * Let it know we don't want that by injecting an undefined exception.
+	 */
+	kvm_debug("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
+				     vcpu->arch.regs.pc);
+	kvm_debug("         HSR: %8x", vcpu->arch.hsr);
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* We don't support SMC; don't do that. */
+	kvm_debug("smc: at %08x", vcpu->arch.regs.pc);
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_pabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* The hypervisor should never cause aborts */
+	kvm_err("Prefetch Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hifar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* This is either an error in the ws. code or an external abort */
+	kvm_err("Data Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hdfar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);
+static exit_handle_fn arm_exit_handlers[] = {
+	[HSR_EC_WFI]		= kvm_handle_wfi,
+	[HSR_EC_CP15_32]	= kvm_handle_cp15_32,
+	[HSR_EC_CP15_64]	= kvm_handle_cp15_64,
+	[HSR_EC_CP14_MR]	= kvm_handle_cp14_access,
+	[HSR_EC_CP14_LS]	= kvm_handle_cp14_load_store,
+	[HSR_EC_CP14_64]	= kvm_handle_cp14_access,
+	[HSR_EC_CP_0_13]	= kvm_handle_cp_0_13_access,
+	[HSR_EC_CP10_ID]	= kvm_handle_cp10_id,
+	[HSR_EC_SVC_HYP]	= handle_svc_hyp,
+	[HSR_EC_HVC]		= handle_hvc,
+	[HSR_EC_SMC]		= handle_smc,
+	[HSR_EC_IABT]		= kvm_handle_guest_abort,
+	[HSR_EC_IABT_HYP]	= handle_pabt_hyp,
+	[HSR_EC_DABT]		= kvm_handle_guest_abort,
+	[HSR_EC_DABT_HYP]	= handle_dabt_hyp,
+};
+
+/*
+ * A conditional instruction is allowed to trap, even though it
+ * wouldn't be executed.  So let's re-implement the hardware, in
+ * software!
+ */
+static bool kvm_condition_valid(struct kvm_vcpu *vcpu)
+{
+	unsigned long cpsr, cond, insn;
+
+	/*
+	 * Exception Code 0 can only happen if we set HCR.TGE to 1, to
+	 * catch undefined instructions, and then we won't get past
+	 * the arm_exit_handlers test anyway.
+	 */
+	BUG_ON(((vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT) == 0);
+
+	/* Top two bits non-zero?  Unconditional. */
+	if (vcpu->arch.hsr >> 30)
+		return true;
+
+	cpsr = *vcpu_cpsr(vcpu);
+
+	/* Is condition field valid? */
+	if ((vcpu->arch.hsr & HSR_CV) >> HSR_CV_SHIFT)
+		cond = (vcpu->arch.hsr & HSR_COND) >> HSR_COND_SHIFT;
+	else {
+		/* This can happen in Thumb mode: examine IT state. */
+		unsigned long it;
+
+		it = ((cpsr >> 8) & 0xFC) | ((cpsr >> 25) & 0x3);
+
+		/* it == 0 => unconditional. */
+		if (it == 0)
+			return true;
+
+		/* The cond for this insn works out as the top 4 bits. */
+		cond = (it >> 4);
+	}
+
+	/* Shift makes it look like an ARM-mode instruction */
+	insn = cond << 28;
+	return arm_check_condition(insn, cpsr) != ARM_OPCODE_CONDTEST_FAIL;
+}
+
 /*
  * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
  * proper exit to QEMU.
@@ -386,8 +508,46 @@ static void update_vttbr(struct kvm *kvm)
 static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		       int exception_index)
 {
-	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	return 0;
+	unsigned long hsr_ec;
+
+	switch (exception_index) {
+	case ARM_EXCEPTION_IRQ:
+		return 1;
+	case ARM_EXCEPTION_UNDEFINED:
+		kvm_err("Undefined exception in Hyp mode at: %#08x\n",
+			vcpu->arch.hyp_pc);
+		BUG();
+		panic("KVM: Hypervisor undefined exception!\n");
+	case ARM_EXCEPTION_DATA_ABORT:
+	case ARM_EXCEPTION_PREF_ABORT:
+	case ARM_EXCEPTION_HVC:
+		hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;
+
+		if (hsr_ec >= ARRAY_SIZE(arm_exit_handlers)
+		    || !arm_exit_handlers[hsr_ec]) {
+			kvm_err("Unknown exception class: %#08lx, "
+				"hsr: %#08x\n", hsr_ec,
+				(unsigned int)vcpu->arch.hsr);
+			BUG();
+		}
+
+		/*
+		 * See ARM ARM B1.14.1: "Hyp traps on instructions
+		 * that fail their condition code check"
+		 */
+		if (!kvm_condition_valid(vcpu)) {
+			bool is_wide = vcpu->arch.hsr & HSR_IL;
+			kvm_skip_instr(vcpu, is_wide);
+			return 1;
+		}
+
+		return arm_exit_handlers[hsr_ec](vcpu, run);
+	default:
+		kvm_pr_unimpl("Unsupported exception type: %d",
+			      exception_index);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		return 0;
+	}
 }
 
 /**
@@ -450,6 +610,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = __kvm_vcpu_run(vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
+		vcpu->arch.last_pcpu = smp_processor_id();
 		kvm_guest_exit();
 		trace_kvm_exit(vcpu->arch.regs.pc);
 		/*
@@ -772,6 +933,7 @@ int kvm_arch_init(void *opaque)
 	if (err)
 		goto out_err;
 
+	kvm_coproc_table_init();
 	return 0;
 out_err:
 	return err;
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 4b9dad8..a6d8bb0 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -15,8 +15,600 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+#include <linux/mm.h>
 #include <linux/kvm_host.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <trace/events/kvm.h>
 
+#include "trace.h"
+
+/******************************************************************************
+ * Co-processor emulation
+ *****************************************************************************/
+
+struct coproc_params {
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+	unsigned long Rt1;
+	unsigned long Rt2;
+	bool is_64bit;
+	bool is_write;
+};
+
+struct coproc_reg {
+	/* MRC/MCR/MRRC/MCRR instruction which accesses it. */
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+
+	bool is_64;
+
+	/* Trapped access from guest, if non-NULL. */
+	bool (*access)(struct kvm_vcpu *,
+		       const struct coproc_params *,
+		       const struct coproc_reg *);
+
+	/* Initialization for vcpu. */
+	void (*reset)(struct kvm_vcpu *, const struct coproc_reg *);
+
+	/* Index into vcpu->arch.cp15[], or 0 if we don't need to save it. */
+	enum cp15_regs reg;
+
+	/* Value (usually reset value) */
+	u64 val;
+};
+
+static void print_cp_instr(const struct coproc_params *p)
+{
+	/* Look, we even formatted it for you to paste into the table! */
+	if (p->is_64bit) {
+		kvm_pr_unimpl(" { CRm(%2lu), Op1(%2lu), is64, func_%s },\n",
+			      p->CRm, p->Op1, p->is_write ? "write" : "read");
+	} else {
+		kvm_pr_unimpl(" { CRn(%2lu), CRm(%2lu), Op1(%2lu), Op2(%2lu), is32,"
+			      " func_%s },\n",
+			      p->CRn, p->CRm, p->Op1, p->Op2,
+			      p->is_write ? "write" : "read");
+	}
+}
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * We can get here, if the host has been built without VFPv3 support,
+	 * but the guest attempted a floating point operation.
+	 */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static bool ignore_write(struct kvm_vcpu *vcpu, const struct coproc_params *p)
+{
+	return true;
+}
+
+static bool read_zero(struct kvm_vcpu *vcpu, const struct coproc_params *p)
+{
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+static bool write_to_read_only(struct kvm_vcpu *vcpu,
+			       const struct coproc_params *params)
+{
+	kvm_debug("CP15 write to read-only register at: %08x\n",
+		  vcpu->arch.regs.pc);
+	print_cp_instr(params);
+	return false;
+}
+
+static bool read_from_write_only(struct kvm_vcpu *vcpu,
+				 const struct coproc_params *params)
+{
+	kvm_debug("CP15 read to write-only register at: %08x\n",
+		  vcpu->arch.regs.pc);
+	print_cp_instr(params);
+	return false;
+}
+
+/* A15 TRM 4.3.48: R/O WI. */
+static bool access_l2ctlr(struct kvm_vcpu *vcpu,
+			  const struct coproc_params *p,
+			  const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c9_L2CTLR];
+	return true;
+}
+
+static void reset_l2ctlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 l2ctlr, ncores;
+
+	asm volatile("mrc p15, 1, %0, c9, c0, 2\n" : "=r" (l2ctlr));
+	l2ctlr &= ~(3 << 24);
+	ncores = atomic_read(&vcpu->kvm->online_vcpus) - 1;
+	l2ctlr |= (ncores & 3) << 24;
+
+	vcpu->arch.cp15[c9_L2CTLR] = l2ctlr;
+}
+
+/* A15 TRM 4.3.49: R/O WI (even if NSACR.NS_L2ERR, a write of 1 is ignored). */
+static bool access_l2ectlr(struct kvm_vcpu *vcpu,
+			   const struct coproc_params *p,
+			   const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+/* A15 TRM 4.3.60: R/O. */
+static bool access_cbar(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p);
+	return read_zero(vcpu, p);
+}
+
+/* A15 TRM 4.3.28: RO WI */
+static bool access_actlr(struct kvm_vcpu *vcpu,
+			 const struct coproc_params *p,
+			 const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c1_ACTLR];
+	return true;
+}
+
+static void reset_actlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 actlr;
+
+	/* ACTLR contains SMP bit: make sure you create all cpus first! */
+	asm volatile("mrc p15, 0, %0, c1, c0, 1\n" : "=r" (actlr));
+	/* Make the SMP bit consistent with the guest configuration */
+	if (atomic_read(&vcpu->kvm->online_vcpus) > 1)
+		actlr |= 1U << 6;
+	else
+		actlr &= ~(1U << 6);
+
+	vcpu->arch.cp15[c1_ACTLR] = actlr;
+}
+
+/* See note at ARM ARM B1.14.4 */
+static bool access_dcsw(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	u32 val;
+	int cpu;
+
+	if (!p->is_write)
+		return read_from_write_only(vcpu, p);
+
+	cpu = get_cpu();
+
+	cpumask_setall(&vcpu->arch.require_dcache_flush);
+	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+
+	/* If we were already preempted, take the long way around */
+	if (cpu != vcpu->arch.last_pcpu) {
+		flush_cache_all();
+		goto done;
+	}
+
+	val = *vcpu_reg(vcpu, p->Rt1);
+
+	switch (p->CRm) {
+	case 6:			/* Upgrade DCISW to DCCISW, as per HCR.SWIO */
+	case 14:		/* DCCISW */
+		asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
+		break;
+
+	case 10:		/* DCCSW */
+		asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
+		break;
+	}
+
+done:
+	put_cpu();
+
+	return true;
+}
+
+/*
+ * We could trap ID_DFR0 and tell the guest we don't support performance
+ * monitoring.  Unfortunately the patch to make the kernel check ID_DFR0 was
+ * NAKed, so it will read the PMCR anyway.
+ *
+ * Therefore we tell the guest we have 0 counters.  Unfortunately, we
+ * must always support PMCCNTR (the cycle counter): we just RAZ/WI for
+ * all PM registers, which doesn't crash the guest kernel at least.
+ */
+static bool pm_fake(struct kvm_vcpu *vcpu,
+		    const struct coproc_params *p,
+		    const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+	else
+		return read_zero(vcpu, p);
+}
+
+#define access_pmcr pm_fake
+#define access_pmcntenset pm_fake
+#define access_pmcntenclr pm_fake
+#define access_pmovsr pm_fake
+#define access_pmselr pm_fake
+#define access_pmceid0 pm_fake
+#define access_pmceid1 pm_fake
+#define access_pmccntr pm_fake
+#define access_pmxevtyper pm_fake
+#define access_pmxevcntr pm_fake
+#define access_pmuserenr pm_fake
+#define access_pmintenset pm_fake
+#define access_pmintenclr pm_fake
+
+/* Reset functions */
+static void reset_unknown(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+}
+
+static void reset_val(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = r->val;
+}
+
+static void reset_unknown64(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg + 1 >= ARRAY_SIZE(vcpu->arch.cp15));
+
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+	vcpu->arch.cp15[r->reg+1] = 0xd0c0ffee;
+}
+
+static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	/*
+	 * Compute guest MPIDR:
+	 * (Even if we present only one VCPU to the guest on an SMP
+	 * host we don't set the U bit in the MPIDR, or vice versa, as
+	 * revealing the underlying hardware properties is likely to
+	 * be the best choice).
+	 */
+	vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_CPUID)
+		| (vcpu->vcpu_id & MPIDR_CPUID);
+}
+
+#define CRn(_x)		.CRn = _x
+#define CRm(_x) 	.CRm = _x
+#define Op1(_x) 	.Op1 = _x
+#define Op2(_x) 	.Op2 = _x
+#define is64		.is_64 = true
+#define is32		.is_64 = false
+
+/* Architected CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg cp15_regs[] = {
+	/* CSSELR: swapped by interrupt.S. */
+	{ CRn( 0), CRm( 0), Op1( 2), Op2( 0), is32,
+			NULL, reset_unknown, c0_CSSELR },
+
+	/* TTBR0/TTBR1: swapped by interrupt.S. */
+	{ CRm( 2), Op1( 0), is64, NULL, reset_unknown64, c2_TTBR0 },
+	{ CRm( 2), Op1( 1), is64, NULL, reset_unknown64, c2_TTBR1 },
+
+	/* TTBCR: swapped by interrupt.S. */
+	{ CRn( 2), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_val, c2_TTBCR, 0x00000000 },
+
+	/* DACR: swapped by interrupt.S. */
+	{ CRn( 3), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c3_DACR },
+
+	/* DFSR/IFSR/ADFSR/AIFSR: swapped by interrupt.S. */
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_DFSR },
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_IFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_ADFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_AIFSR },
+
+	/* DFAR/IFAR: swapped by interrupt.S. */
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c6_DFAR },
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c6_IFAR },
+	/*
+	 * DC{C,I,CI}SW operations:
+	 */
+	{ CRn( 7), CRm( 6), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(10), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(14), Op1( 0), Op2( 2), is32, access_dcsw},
+	/*
+	 * Dummy performance monitor implementation.
+	 */
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 0), is32, access_pmcr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 1), is32, access_pmcntenset},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 2), is32, access_pmcntenclr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 3), is32, access_pmovsr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 5), is32, access_pmselr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 6), is32, access_pmceid0},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 7), is32, access_pmceid1},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 0), is32, access_pmccntr},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 1), is32, access_pmxevtyper},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 2), is32, access_pmxevcntr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 0), is32, access_pmuserenr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 1), is32, access_pmintenset},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 2), is32, access_pmintenclr},
+
+	/* PRRR/NMRR (aka MAIR0/MAIR1): swapped by interrupt.S. */
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c10_PRRR},
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c10_NMRR},
+
+	/* VBAR: swapped by interrupt.S. */
+	{ CRn(12), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c12_VBAR, 0x00000000 },
+
+	/* CONTEXTIDR/TPIDRURW/TPIDRURO/TPIDRPRW: swapped by interrupt.S. */
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_val, c13_CID, 0x00000000 },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c13_TID_URW },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 3), is32,
+			NULL, reset_unknown, c13_TID_URO },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 4), is32,
+			NULL, reset_unknown, c13_TID_PRIV },
+};
+
+/*
+ * A15-specific CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg cp15_cortex_a15_regs[] = {
+	/* MPIDR: we use VMPIDR for guest access. */
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 5), is32,
+			NULL, reset_mpidr, c0_MPIDR },
+
+	/* SCTLR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c1_SCTLR, 0x00C50078 },
+	/* ACTLR: trapped by HCR.TAC bit. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 1), is32,
+			access_actlr, reset_actlr, c1_ACTLR },
+	/* CPACR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_val, c1_CPACR, 0x00000000 },
+
+	/*
+	 * L2CTLR access (guest wants to know #CPUs).
+	 */
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 2), is32,
+			access_l2ctlr, reset_l2ctlr, c9_L2CTLR },
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 3), is32, access_l2ectlr},
+
+	/* The Configuration Base Address Register. */
+	{ CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
+};
+
+/* Get specific register table for this target. */
+static const struct coproc_reg *get_target_table(unsigned target, size_t *num)
+{
+	switch (target) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		*num = ARRAY_SIZE(cp15_cortex_a15_regs);
+		return cp15_cortex_a15_regs;
+	default:
+		*num = 0;
+		return NULL;
+	}
+}
+
+static const struct coproc_reg *find_reg(const struct coproc_params *params,
+					 const struct coproc_reg table[],
+					 unsigned int num)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++) {
+		const struct coproc_reg *r = &table[i];
+
+		if (params->is_64bit != r->is_64)
+			continue;
+		if (params->CRn != r->CRn)
+			continue;
+		if (params->CRm != r->CRm)
+			continue;
+		if (params->Op1 != r->Op1)
+			continue;
+		if (params->Op2 != r->Op2)
+			continue;
+
+		return r;
+	}
+	return NULL;
+}
+
+static int emulate_cp15(struct kvm_vcpu *vcpu,
+			const struct coproc_params *params)
+{
+	size_t num;
+	const struct coproc_reg *table, *r;
+
+	trace_kvm_emulate_cp15_imp(params->Op1, params->Rt1, params->CRn,
+				   params->CRm, params->Op2, params->is_write);
+
+	table = get_target_table(vcpu->arch.target, &num);
+
+	/* Search target-specific then generic table. */
+	r = find_reg(params, table, num);
+	if (!r)
+		r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	if (likely(r)) {
+		/* If we don't have an accessor, we should never get here! */
+		BUG_ON(!r->access);
+
+		if (likely(r->access(vcpu, params, r))) {
+			/* Skip instruction, since it was emulated */
+			kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
+			return 1;
+		}
+		/* If access function fails, it should complain. */
+	} else {
+		kvm_err("Unsupported guest CP15 access at: %08x\n",
+			vcpu->arch.regs.pc);
+		print_cp_instr(params);
+	}
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+/**
+ * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = true;
+
+	params.Op1 = (vcpu->arch.hsr >> 16) & 0xf;
+	params.Op2 = 0;
+	params.Rt2 = (vcpu->arch.hsr >> 10) & 0xf;
+	params.CRn = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+static void reset_coproc_regs(struct kvm_vcpu *vcpu,
+			      const struct coproc_reg *table, size_t num)
+{
+	unsigned long i;
+
+	for (i = 0; i < num; i++)
+		if (table[i].reset)
+			table[i].reset(vcpu, &table[i]);
+}
+
+/**
+ * kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = false;
+
+	params.CRn = (vcpu->arch.hsr >> 10) & 0xf;
+	params.Op1 = (vcpu->arch.hsr >> 14) & 0x7;
+	params.Op2 = (vcpu->arch.hsr >> 17) & 0x7;
+	params.Rt2 = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+static int cmp_reg(const struct coproc_reg *i1, const struct coproc_reg *i2)
+{
+	if (i1->CRn != i2->CRn)
+		return i1->CRn - i2->CRn;
+	if (i1->CRm != i2->CRm)
+		return i1->CRm - i2->CRm;
+	if (i1->Op1 != i2->Op1)
+		return i1->Op1 - i2->Op1;
+	return i1->Op2 - i2->Op2;
+}
+
+void kvm_coproc_table_init(void)
+{
+	unsigned int i;
+
+	/* Make sure tables are unique and in order. */
+	for (i = 1; i < ARRAY_SIZE(cp15_regs); i++)
+		BUG_ON(cmp_reg(&cp15_regs[i-1], &cp15_regs[i]) >= 0);
+	for (i = 1; i < ARRAY_SIZE(cp15_cortex_a15_regs); i++)
+		BUG_ON(cmp_reg(&cp15_cortex_a15_regs[i-1],
+			       &cp15_cortex_a15_regs[i]) >= 0);
+}
+
+/**
+ * kvm_reset_coprocs - sets cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architecturally defined reset values.
+ */
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
 {
+	size_t num;
+	const struct coproc_reg *table;
+
+	/* Catch someone adding a register without putting in a reset entry. */
+	memset(vcpu->arch.cp15, 0x42, sizeof(vcpu->arch.cp15));
+
+	/* Generic chip reset first (so target could override). */
+	reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	table = get_target_table(vcpu->arch.target, &num);
+	reset_coproc_regs(vcpu, table, num);
+
+	for (num = 1; num < nr_cp15_regs; num++)
+		if (vcpu->arch.cp15[num] == 0x42424242)
+			panic("Didn't reset vcpu->arch.cp15[%zi]", num);
 }
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 690bbb3..9236d10 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -16,7 +16,13 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
+#include <linux/mm.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
 
 #define REG_OFFSET(_reg) \
 	(offsetof(struct kvm_regs, _reg) / sizeof(u32))
@@ -125,3 +131,204 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 
 	return reg_array + vcpu_reg_offsets[mode][reg_num];
 }
+
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return 0;
+}
+
+/**
+ * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
+ * @vcpu:	The VCPU pointer
+ *
+ * When exceptions occur while instructions are executed in Thumb IF-THEN
+ * blocks, the ITSTATE field of the CPSR is not advanced (updated), so we have
+ * to do this little bit of work manually. The fields map like this:
+ *
+ * IT[7:0] -> CPSR[26:25],CPSR[15:10]
+ */
+static void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
+{
+	unsigned long itbits, cond;
+	unsigned long cpsr = *vcpu_cpsr(vcpu);
+	bool is_arm = !(cpsr & PSR_T_BIT);
+
+	BUG_ON(is_arm && (cpsr & PSR_IT_MASK));
+
+	if (!(cpsr & PSR_IT_MASK))
+		return;
+
+	cond = (cpsr & 0xe000) >> 13;
+	itbits = (cpsr & 0x1c00) >> (10 - 2);
+	itbits |= (cpsr & (0x3 << 25)) >> 25;
+
+	/* Perform ITAdvance (see page A-52 in ARM DDI 0406C) */
+	if ((itbits & 0x7) == 0)
+		itbits = cond = 0;
+	else
+		itbits = (itbits << 1) & 0x1f;
+
+	cpsr &= ~PSR_IT_MASK;
+	cpsr |= cond << 13;
+	cpsr |= (itbits & 0x1c) << (10 - 2);
+	cpsr |= (itbits & 0x3) << 25;
+	*vcpu_cpsr(vcpu) = cpsr;
+}
+
+/**
+ * kvm_skip_instr - skip a trapped instruction and proceed to the next
+ * @vcpu: The vcpu pointer
+ */
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
+{
+	bool is_thumb;
+
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	if (is_thumb && !is_wide_instr)
+		*vcpu_pc(vcpu) += 2;
+	else
+		*vcpu_pc(vcpu) += 4;
+	kvm_adjust_itstate(vcpu);
+}
+
+
+/******************************************************************************
+ * Inject exceptions into the guest
+ */
+
+static u32 exc_vector_base(struct kvm_vcpu *vcpu)
+{
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	u32 vbar = vcpu->arch.cp15[c12_VBAR];
+
+	if (sctlr & SCTLR_V)
+		return 0xffff0000;
+	else /* always have security exceptions */
+		return vbar;
+}
+
+/**
+ * kvm_inject_undefined - inject an undefined exception into the guest
+ * @vcpu: The VCPU to receive the undefined exception
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ *
+ * Modelled after TakeUndefInstrException() pseudocode.
+ */
+void kvm_inject_undefined(struct kvm_vcpu *vcpu)
+{
+	u32 new_lr_value;
+	u32 new_spsr_value;
+	u32 cpsr = *vcpu_cpsr(vcpu);
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	bool is_thumb = (cpsr & PSR_T_BIT);
+	u32 vect_offset = 4;
+	u32 return_offset = (is_thumb) ? 2 : 4;
+
+	new_spsr_value = cpsr;
+	new_lr_value = *vcpu_pc(vcpu) - return_offset;
+
+	*vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | UND_MODE;
+	*vcpu_cpsr(vcpu) |= PSR_I_BIT;
+	*vcpu_cpsr(vcpu) &= ~(PSR_IT_MASK | PSR_J_BIT | PSR_E_BIT | PSR_T_BIT);
+
+	if (sctlr & SCTLR_TE)
+		*vcpu_cpsr(vcpu) |= PSR_T_BIT;
+	if (sctlr & SCTLR_EE)
+		*vcpu_cpsr(vcpu) |= PSR_E_BIT;
+
+	/* Note: These now point to UND banked copies */
+	*vcpu_spsr(vcpu) = cpsr;
+	*vcpu_reg(vcpu, 14) = new_lr_value;
+
+	/* Branch to exception vector */
+	*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
+}
+
+/*
+ * Modelled after TakeDataAbortException() and TakePrefetchAbortException()
+ * pseudocode.
+ */
+static void inject_abt(struct kvm_vcpu *vcpu, bool is_pabt, unsigned long addr)
+{
+	u32 new_lr_value;
+	u32 new_spsr_value;
+	u32 cpsr = *vcpu_cpsr(vcpu);
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	bool is_thumb = (cpsr & PSR_T_BIT);
+	u32 vect_offset;
+	u32 return_offset = (is_thumb) ? 4 : 0;
+	bool is_lpae;
+
+	new_spsr_value = cpsr;
+	new_lr_value = *vcpu_pc(vcpu) + return_offset;
+
+	*vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | ABT_MODE;
+	*vcpu_cpsr(vcpu) |= PSR_I_BIT | PSR_A_BIT;
+	*vcpu_cpsr(vcpu) &= ~(PSR_IT_MASK | PSR_J_BIT | PSR_E_BIT | PSR_T_BIT);
+
+	if (sctlr & SCTLR_TE)
+		*vcpu_cpsr(vcpu) |= PSR_T_BIT;
+	if (sctlr & SCTLR_EE)
+		*vcpu_cpsr(vcpu) |= PSR_E_BIT;
+
+	/* Note: These now point to ABT banked copies */
+	*vcpu_spsr(vcpu) = cpsr;
+	*vcpu_reg(vcpu, 14) = new_lr_value;
+
+	if (is_pabt)
+		vect_offset = 12;
+	else
+		vect_offset = 16;
+
+	/* Branch to exception vector */
+	*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
+
+	if (is_pabt) {
+		/* Set IFAR and IFSR */
+		vcpu->arch.cp15[c6_IFAR] = addr;
+		is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31);
+		/* Always give debug fault for now - should give guest a clue */
+		if (is_lpae)
+			vcpu->arch.cp15[c5_IFSR] = 1 << 9 | 0x22;
+		else
+			vcpu->arch.cp15[c5_IFSR] = 2;
+	} else { /* !iabt */
+		/* Set DFAR and DFSR */
+		vcpu->arch.cp15[c6_DFAR] = addr;
+		is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31);
+		/* Always give debug fault for now - should give guest a clue */
+		if (is_lpae)
+			vcpu->arch.cp15[c5_DFSR] = 1 << 9 | 0x22;
+		else
+			vcpu->arch.cp15[c5_DFSR] = 2;
+	}
+
+}
+
+/**
+ * kvm_inject_dabt - inject a data abort into the guest
+ * @vcpu: The VCPU to receive the data abort
+ * @addr: The address to report in the DFAR
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
+{
+	inject_abt(vcpu, false, addr);
+}
+
+/**
+ * kvm_inject_pabt - inject a prefetch abort into the guest
+ * @vcpu: The VCPU to receive the prefetch abort
+ * @addr: The address to report in the IFAR
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
+{
+	inject_abt(vcpu, true, addr);
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index a283eef..772e045 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -64,6 +64,36 @@ TRACE_EVENT(kvm_irq_line,
 		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
 );
 
+/* Architecturally implementation defined CP15 register access */
+TRACE_EVENT(kvm_emulate_cp15_imp,
+	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
+		 unsigned long CRm, unsigned long Op2, bool is_write),
+	TP_ARGS(Op1, Rt1, CRn, CRm, Op2, is_write),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	Op1		)
+		__field(	unsigned int,	Rt1		)
+		__field(	unsigned int,	CRn		)
+		__field(	unsigned int,	CRm		)
+		__field(	unsigned int,	Op2		)
+		__field(	bool,		is_write	)
+	),
+
+	TP_fast_assign(
+		__entry->is_write		= is_write;
+		__entry->Op1			= Op1;
+		__entry->Rt1			= Rt1;
+		__entry->CRn			= CRn;
+		__entry->CRm			= CRm;
+		__entry->Op2			= Op2;
+	),
+
+	TP_printk("Implementation defined CP15: %s\tp15, %u, r%u, c%u, c%u, %u",
+			(__entry->is_write) ? "mcr" : "mrc",
+			__entry->Op1, __entry->Rt1, __entry->CRn,
+			__entry->CRm, __entry->Op2)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 12/15] KVM: ARM: User space API for getting/setting co-proc registers
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

The following three ioctls are implemented:
 -  KVM_GET_REG_LIST
 -  KVM_GET_ONE_REG
 -  KVM_SET_ONE_REG

Now that we have a table for all the cp15 registers, we can drive a generic
API.

The register IDs carry the following encoding:

ARM registers are mapped using the lower 32 bits.  The upper 16 of that
is the register group type, or coprocessor number:

ARM 32-bit CP15 registers have the following id bit patterns:
  0x4002 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>

ARM 64-bit CP15 registers have the following id bit patterns:
  0x4003 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
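
To make the encoding concrete, a 32-bit index can be assembled from those
fields roughly as follows (an illustrative sketch only: the helper name is
invented here, and it simply mirrors the cp15_to_index() routine this patch
adds, using the KVM_REG_ARM_* shifts from the uapi header):

/* Sketch: index for a 32-bit CP15 register, e.g. CONTEXTIDR is
 * CRn=13, CRm=0, Op1=0, Op2=1.  Mirrors cp15_to_index(). */
static __u64 cp15_reg32_id(__u64 crn, __u64 crm, __u64 op1, __u64 op2)
{
	return KVM_REG_ARM | KVM_REG_SIZE_U32
		| (15 << KVM_REG_ARM_COPROC_SHIFT)	/* coprocessor number */
		| (crn << KVM_REG_ARM_32_CRN_SHIFT)
		| (crm << KVM_REG_ARM_CRM_SHIFT)
		| (op1 << KVM_REG_ARM_OPC1_SHIFT)
		| (op2 << KVM_REG_ARM_32_OPC2_SHIFT);
}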

For futureproofing, we need to tell QEMU about the CP15 registers the
host lets the guest access.

It will need this information to restore a current guest on a future
CPU or perhaps a future KVM which allows some of these to be changed.

We use a separate table for these, as they're only for the userspace API.
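
To show how userspace is expected to drive this, here is a rough sketch of
enumerating the registers and reading each one back (illustrative only, not
part of the patch; it assumes a vcpu fd that has already been through
KVM_ARM_VCPU_INIT, and headers from a kernel with this series applied):

#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int dump_cp15_regs(int vcpu_fd)
{
	struct kvm_reg_list probe = { .n = 0 }, *list;
	struct kvm_one_reg one;
	__u64 i, val;

	/* The first call fails with E2BIG but writes the required count to n. */
	ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe);

	list = malloc(sizeof(*list) + probe.n * sizeof(__u64));
	if (!list)
		return -1;
	list->n = probe.n;
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
		free(list);
		return -1;
	}

	for (i = 0; i < list->n; i++) {
		val = 0;
		one.id = list->reg[i];
		one.addr = (__u64)(unsigned long)&val;
		/* KVM copies KVM_REG_SIZE(id) bytes, i.e. 4 or 8 here. */
		if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &one) == 0)
			printf("reg %#llx = %#llx\n",
			       (unsigned long long)one.id,
			       (unsigned long long)val);
	}
	free(list);
	return 0;
}

KVM_SET_ONE_REG takes the same struct kvm_one_reg, with addr pointing at the
value to write.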

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |   46 +++++
 arch/arm/include/asm/kvm_coproc.h |    9 +
 arch/arm/include/asm/kvm_host.h   |    4 
 arch/arm/kvm/coproc.c             |  332 +++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/guest.c              |    9 +
 5 files changed, 396 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 605986f..035d4cb 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1756,6 +1756,13 @@ is the register group type, or coprocessor number:
 ARM core registers have the following id bit patterns:
   0x4002 0000 0010 <index into the kvm_regs struct:32>
 
+ARM 32-bit CP15 registers have the following id bit patterns:
+  0x4002 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
+
+ARM 64-bit CP15 registers have the following id bit patterns:
+  0x4003 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
+
+
 4.69 KVM_GET_ONE_REG
 
 Capability: KVM_CAP_ONE_REG
@@ -2048,6 +2055,45 @@ This ioctl returns the guest registers that are supported for the
 KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
 
 
+4.77 KVM_ARM_VCPU_INIT
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_vcpu_init (in)
+Returns: 0 on success; -1 on error
+Errors:
+  EINVAL:    the target is unknown, or the combination of features is invalid.
+  ENOENT:    a features bit specified is unknown.
+
+This tells KVM what type of CPU to present to the guest, and what
+optional features it should have.  This will cause a reset of the cpu
+registers to their initial values.  If this is not called, KVM_RUN will
+return ENOEXEC for that vcpu.
+
+Note that because some registers reflect machine topology, all vcpus
+should be created before this ioctl is invoked.
+
+4.78 KVM_GET_REG_LIST
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_reg_list (in/out)
+Returns: 0 on success; -1 on error
+Errors:
+  E2BIG:     the reg index list is too big to fit in the array specified by
+             the user (the number required will be written into n).
+
+struct kvm_reg_list {
+	__u64 n; /* number of registers in reg[] */
+	__u64 reg[0];
+};
+
+This ioctl returns the guest registers that are supported for the
+KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
+
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index c451fb4..05943ec 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -27,5 +27,14 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+unsigned long kvm_arm_num_guest_msrs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_msrindices(struct kvm_vcpu *vcpu, u64 __user *uindices);
 void kvm_coproc_table_init(void);
+
+struct kvm_one_reg;
+int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
+int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
+int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
+unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c98dcd7..97608d7 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #define KVM_MEMORY_SLOTS 32
 #define KVM_PRIVATE_MEM_SLOTS 4
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_HAVE_ONE_REG
 
 #define NUM_FEATURES 0
 
@@ -191,6 +192,9 @@ int kvm_unmap_hva_range(struct kvm *kvm,
 			unsigned long start, unsigned long end);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
+
 /* We do not have shadow page tables, hence the empty hooks */
 static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
 {
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index a6d8bb0..7eb0aed 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -17,6 +17,7 @@
  */
 #include <linux/mm.h>
 #include <linux/kvm_host.h>
+#include <linux/uaccess.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
@@ -564,8 +565,239 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return emulate_cp15(vcpu, &params);
 }
 
+/******************************************************************************
+ * Userspace API
+ *****************************************************************************/
+
+static bool index_to_params(u64 id, struct coproc_params *params)
+{
+	switch (id & KVM_REG_SIZE_MASK) {
+	case KVM_REG_SIZE_U32:
+		/* Any unused index bits means it's not valid. */
+		if (id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK
+			   | KVM_REG_ARM_COPROC_MASK
+			   | KVM_REG_ARM_32_CRN_MASK
+			   | KVM_REG_ARM_CRM_MASK
+			   | KVM_REG_ARM_OPC1_MASK
+			   | KVM_REG_ARM_32_OPC2_MASK))
+			return false;
+
+		params->is_64bit = false;
+		params->CRn = ((id & KVM_REG_ARM_32_CRN_MASK)
+			       >> KVM_REG_ARM_32_CRN_SHIFT);
+		params->CRm = ((id & KVM_REG_ARM_CRM_MASK)
+			       >> KVM_REG_ARM_CRM_SHIFT);
+		params->Op1 = ((id & KVM_REG_ARM_OPC1_MASK)
+			       >> KVM_REG_ARM_OPC1_SHIFT);
+		params->Op2 = ((id & KVM_REG_ARM_32_OPC2_MASK)
+			       >> KVM_REG_ARM_32_OPC2_SHIFT);
+		return true;
+	case KVM_REG_SIZE_U64:
+		/* Any unused index bits means it's not valid. */
+		if (id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK
+			      | KVM_REG_ARM_COPROC_MASK
+			      | KVM_REG_ARM_CRM_MASK
+			      | KVM_REG_ARM_OPC1_MASK))
+			return false;
+		params->is_64bit = true;
+		params->CRm = ((id & KVM_REG_ARM_CRM_MASK)
+			       >> KVM_REG_ARM_CRM_SHIFT);
+		params->Op1 = ((id & KVM_REG_ARM_OPC1_MASK)
+			       >> KVM_REG_ARM_OPC1_SHIFT);
+		params->Op2 = 0;
+		params->CRn = 0;
+		return true;
+	default:
+		return false;
+	}
+}
+
+/* Decode an index value, and find the cp15 coproc_reg entry. */
+static const struct coproc_reg *index_to_coproc_reg(struct kvm_vcpu *vcpu,
+						    u64 id)
+{
+	size_t num;
+	const struct coproc_reg *table, *r;
+	struct coproc_params params;
+
+	/* We only do cp15 for now. */
+	if ((id & KVM_REG_ARM_COPROC_MASK) >> KVM_REG_ARM_COPROC_SHIFT != 15)
+		return NULL;
+
+	if (!index_to_params(id, &params))
+		return NULL;
+
+	table = get_target_table(vcpu->arch.target, &num);
+	r = find_reg(&params, table, num);
+	if (!r)
+		r = find_reg(&params, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	/* Not saved in the cp15 array? */
+	if (r && !r->reg)
+		r = NULL;
+
+	return r;
+}
+
+/*
+ * These are the invariant cp15 registers: we let the guest see the host
+ * versions of these, so they're part of the guest state.
+ *
+ * A future CPU may provide a mechanism to present different values to
+ * the guest, or a future kvm may trap them.
+ */
+/* Unfortunately, there's no register-argument for mrc, so generate. */
+#define FUNCTION_FOR32(crn, crm, op1, op2, name)			\
+	static void get_##name(struct kvm_vcpu *v,			\
+			       const struct coproc_reg *r)		\
+	{								\
+		u32 val;						\
+									\
+		asm volatile("mrc p15, " __stringify(op1)		\
+			     ", %0, c" __stringify(crn)			\
+			     ", c" __stringify(crm)			\
+			     ", " __stringify(op2) "\n" : "=r" (val));	\
+		((struct coproc_reg *)r)->val = val;			\
+	}
+
+FUNCTION_FOR32(0, 0, 0, 0, MIDR)
+FUNCTION_FOR32(0, 0, 0, 1, CTR)
+FUNCTION_FOR32(0, 0, 0, 2, TCMTR)
+FUNCTION_FOR32(0, 0, 0, 3, TLBTR)
+FUNCTION_FOR32(0, 0, 0, 6, REVIDR)
+FUNCTION_FOR32(0, 1, 0, 0, ID_PFR0)
+FUNCTION_FOR32(0, 1, 0, 1, ID_PFR1)
+FUNCTION_FOR32(0, 1, 0, 2, ID_DFR0)
+FUNCTION_FOR32(0, 1, 0, 3, ID_AFR0)
+FUNCTION_FOR32(0, 1, 0, 4, ID_MMFR0)
+FUNCTION_FOR32(0, 1, 0, 5, ID_MMFR1)
+FUNCTION_FOR32(0, 1, 0, 6, ID_MMFR2)
+FUNCTION_FOR32(0, 1, 0, 7, ID_MMFR3)
+FUNCTION_FOR32(0, 2, 0, 0, ID_ISAR0)
+FUNCTION_FOR32(0, 2, 0, 1, ID_ISAR1)
+FUNCTION_FOR32(0, 2, 0, 2, ID_ISAR2)
+FUNCTION_FOR32(0, 2, 0, 3, ID_ISAR3)
+FUNCTION_FOR32(0, 2, 0, 4, ID_ISAR4)
+FUNCTION_FOR32(0, 2, 0, 5, ID_ISAR5)
+FUNCTION_FOR32(0, 0, 1, 1, CLIDR)
+FUNCTION_FOR32(0, 0, 1, 7, AIDR)
+
+/* ->val is filled in by kvm_invariant_coproc_table_init() */
+static struct coproc_reg invariant_cp15[] = {
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 0), is32, NULL, get_MIDR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 1), is32, NULL, get_CTR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 2), is32, NULL, get_TCMTR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 3), is32, NULL, get_TLBTR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 6), is32, NULL, get_REVIDR },
+
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 0), is32, NULL, get_ID_PFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 1), is32, NULL, get_ID_PFR1 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 2), is32, NULL, get_ID_DFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 3), is32, NULL, get_ID_AFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 4), is32, NULL, get_ID_MMFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 5), is32, NULL, get_ID_MMFR1 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 6), is32, NULL, get_ID_MMFR2 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 7), is32, NULL, get_ID_MMFR3 },
+
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 0), is32, NULL, get_ID_ISAR0 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 1), is32, NULL, get_ID_ISAR1 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 2), is32, NULL, get_ID_ISAR2 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 3), is32, NULL, get_ID_ISAR3 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 4), is32, NULL, get_ID_ISAR4 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 5), is32, NULL, get_ID_ISAR5 },
+
+	{ CRn( 0), CRm( 0), Op1( 1), Op2( 1), is32, NULL, get_CLIDR },
+	{ CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
+};
+
+static int reg_from_user(void *val, const void __user *uaddr, u64 id)
+{
+	/* This Just Works because we are little endian. */
+	if (copy_from_user(val, uaddr, KVM_REG_SIZE(id)) != 0)
+		return -EFAULT;
+	return 0;
+}
+
+static int reg_to_user(void __user *uaddr, const void *val, u64 id)
+{
+	/* This Just Works because we are little endian. */
+	if (copy_to_user(uaddr, val, KVM_REG_SIZE(id)) != 0)
+		return -EFAULT;
+	return 0;
+}
+
+static int get_invariant_cp15(u64 id, void __user *uaddr)
+{
+	struct coproc_params params;
+	const struct coproc_reg *r;
+
+	if (!index_to_params(id, &params))
+		return -ENOENT;
+
+	r = find_reg(&params, invariant_cp15, ARRAY_SIZE(invariant_cp15));
+	if (!r)
+		return -ENOENT;
+
+	return reg_to_user(uaddr, &r->val, id);
+}
+
+static int set_invariant_cp15(u64 id, void __user *uaddr)
+{
+	struct coproc_params params;
+	const struct coproc_reg *r;
+	int err;
+	u64 val = 0; /* Make sure high bits are 0 for 32-bit regs */
+
+	if (!index_to_params(id, &params))
+		return -ENOENT;
+	r = find_reg(&params, invariant_cp15, ARRAY_SIZE(invariant_cp15));
+	if (!r)
+		return -ENOENT;
+
+	err = reg_from_user(&val, uaddr, id);
+	if (err)
+		return err;
+
+	/* This is what we mean by invariant: you can't change it. */
+	if (r->val != val)
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	const struct coproc_reg *r;
+	void __user *uaddr = (void __user *)(long)reg->addr;
+
+	r = index_to_coproc_reg(vcpu, reg->id);
+	if (!r)
+		return get_invariant_cp15(reg->id, uaddr);
+
+	/* Note: copies two regs if size is 64 bit. */
+	return reg_to_user(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
+}
+
+int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	const struct coproc_reg *r;
+	void __user *uaddr = (void __user *)(long)reg->addr;
+
+	r = index_to_coproc_reg(vcpu, reg->id);
+	if (!r)
+		return set_invariant_cp15(reg->id, uaddr);
+
+	/* Note: copies two regs if size is 64 bit */
+	return reg_from_user(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
+}
+
 static int cmp_reg(const struct coproc_reg *i1, const struct coproc_reg *i2)
 {
+	BUG_ON(i1 == i2);
+	if (!i1)
+		return 1;
+	else if (!i2)
+		return -1;
 	if (i1->CRn != i2->CRn)
 		return i1->CRn - i2->CRn;
 	if (i1->CRm != i2->CRm)
@@ -575,6 +807,102 @@ static int cmp_reg(const struct coproc_reg *i1, const struct coproc_reg *i2)
 	return i1->Op2 - i2->Op2;
 }
 
+static u64 cp15_to_index(const struct coproc_reg *reg)
+{
+	u64 val = KVM_REG_ARM | (15 << KVM_REG_ARM_COPROC_SHIFT);
+	if (reg->is_64) {
+		val |= KVM_REG_SIZE_U64;
+		val |= (reg->Op1 << KVM_REG_ARM_OPC1_SHIFT);
+		val |= (reg->CRm << KVM_REG_ARM_CRM_SHIFT);
+	} else {
+		val |= KVM_REG_SIZE_U32;
+		val |= (reg->Op1 << KVM_REG_ARM_OPC1_SHIFT);
+		val |= (reg->Op2 << KVM_REG_ARM_32_OPC2_SHIFT);
+		val |= (reg->CRm << KVM_REG_ARM_CRM_SHIFT);
+		val |= (reg->CRn << KVM_REG_ARM_32_CRN_SHIFT);
+	}
+	return val;
+}
+
+static bool copy_reg_to_user(const struct coproc_reg *reg, u64 __user **uind)
+{
+	if (!*uind)
+		return true;
+
+	if (put_user(cp15_to_index(reg), *uind))
+		return false;
+
+	(*uind)++;
+	return true;
+}
+
+/* Assumed ordered tables, see kvm_coproc_table_init. */
+static int walk_cp15(struct kvm_vcpu *vcpu, u64 __user *uind)
+{
+	const struct coproc_reg *i1, *i2, *end1, *end2;
+	unsigned int total = 0;
+	size_t num;
+
+	/* We check for duplicates here, to allow arch-specific overrides. */
+	i1 = get_target_table(vcpu->arch.target, &num);
+	end1 = i1 + num;
+	i2 = cp15_regs;
+	end2 = cp15_regs + ARRAY_SIZE(cp15_regs);
+
+	BUG_ON(i1 == end1 || i2 == end2);
+
+	/* Walk carefully, as both tables may refer to the same register. */
+	while (i1 || i2) {
+		int cmp = cmp_reg(i1, i2);
+		/* target-specific overrides generic entry. */
+		if (cmp <= 0) {
+			/* Ignore registers we trap but don't save. */
+			if (i1->reg) {
+				if (!copy_reg_to_user(i1, &uind))
+					return -EFAULT;
+				total++;
+			}
+		} else {
+			/* Ignore registers we trap but don't save. */
+			if (i2->reg) {
+				if (!copy_reg_to_user(i2, &uind))
+					return -EFAULT;
+				total++;
+			}
+		}
+
+		if (cmp <= 0 && ++i1 == end1)
+			i1 = NULL;
+		if (cmp >= 0 && ++i2 == end2)
+			i2 = NULL;
+	}
+	return total;
+}
+
+unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu)
+{
+	return ARRAY_SIZE(invariant_cp15)
+		+ walk_cp15(vcpu, (u64 __user *)NULL);
+}
+
+int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
+{
+	unsigned int i;
+	int err;
+
+	/* Then give them all the invariant registers' indices. */
+	for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++) {
+		if (put_user(cp15_to_index(&invariant_cp15[i]), uindices))
+			return -EFAULT;
+		uindices++;
+	}
+
+	err = walk_cp15(vcpu, uindices);
+	if (err > 0)
+		err = 0;
+	return err;
+}
+
 void kvm_coproc_table_init(void)
 {
 	unsigned int i;
@@ -585,6 +913,10 @@ void kvm_coproc_table_init(void)
 	for (i = 1; i < ARRAY_SIZE(cp15_cortex_a15_regs); i++)
 		BUG_ON(cmp_reg(&cp15_cortex_a15_regs[i-1],
 			       &cp15_cortex_a15_regs[i]) >= 0);
+
+	/* We abuse the reset function to overwrite the table itself. */
+	for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++)
+		invariant_cp15[i].reset(NULL, &invariant_cp15[i]);
 }
 
 /**
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 19a5389..8ad3df7 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -26,6 +26,7 @@
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
 
 #define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
 #define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
@@ -109,7 +110,7 @@ static unsigned long num_core_regs(void)
  */
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
 {
-	return num_core_regs();
+	return num_core_regs() + kvm_arm_num_coproc_regs(vcpu);
 }
 
 /**
@@ -128,7 +129,7 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 		uindices++;
 	}
 
-	return 0;
+	return kvm_arm_copy_coproc_indices(vcpu, uindices);
 }
 
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@@ -141,7 +142,7 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
 		return get_core_reg(vcpu, reg);
 
-	return -EINVAL;
+	return kvm_arm_coproc_get_reg(vcpu, reg);
 }
 
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@@ -154,7 +155,7 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
 		return set_core_reg(vcpu, reg);
 
-	return -EINVAL;
+	return kvm_arm_coproc_set_reg(vcpu, reg);
 }
 
 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,


^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

From: Christoffer Dall <cdall@cs.columbia.edu>

Handles the guest faults in KVM by mapping in corresponding user pages
in the 2nd stage page tables.

We invalidate the instruction cache by MVA whenever we map a page to the
guest (no, we cannot only do it when we have an iabt because the guest
may happily read/write a page before hitting the icache) if the hardware
uses VIPT or PIPT.  In the latter case, we can invalidate only that
physical page.  In the former case, all bets are off and we simply must
invalidate the whole affair.  Note that VIVT icaches are tagged with
vmids, and we are out of the woods on that one.  Alexander Graf was nice
enough to remind us of this massive pain.

There may be a subtle bug hidden in here, which we currently hide by marking
all pages dirty even when the pages are only mapped read-only.  The
current hypothesis is that marking pages dirty may exercise the IO system
and data cache more and therefore we don't see stale data in the guest,
but it's purely guesswork.  The bug is manifested by seemingly random
kernel crashes in guests when the host is under extreme memory pressure and
swapping is enabled.
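
In code form, the policy described above boils down to the following (this is
just a condensed restatement of the coherent_icache_guest_page() helper added
by this patch, shown here to make the prose concrete):

static void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
{
	if (icache_is_pipt()) {
		/* PIPT: only this physical page can alias, flush it by MVA. */
		unsigned long hva = gfn_to_hva(kvm, gfn);
		__cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
	} else if (!icache_is_vivt_asid_tagged()) {
		/* Any VIPT flavour: we cannot tell which lines alias, so
		 * invalidate the whole icache. */
		__flush_icache_all();
	}
	/* ASID-tagged VIVT: lines are tagged with vmids, nothing to do. */
}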

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h |    9 ++
 arch/arm/include/asm/kvm_asm.h |    2 +
 arch/arm/kvm/mmu.c             |  155 ++++++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h           |   28 +++++++
 4 files changed, 193 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index ae586c1..4cff3b7 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -158,11 +158,20 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_FSC		(0x3f)
+#define HSR_FSC_TYPE	(0x3c)
+#define HSR_WNR		(1 << 6)
 #define HSR_CV_SHIFT	(24)
 #define HSR_CV		(1U << HSR_CV_SHIFT)
 #define HSR_COND_SHIFT	(20)
 #define HSR_COND	(0xfU << HSR_COND_SHIFT)
 
+#define FSC_FAULT	(0x04)
+#define FSC_PERM	(0x0c)
+
+/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
+#define HPFAR_MASK	(~0xf)
+
 #define HSR_EC_UNKNOWN	(0x00)
 #define HSR_EC_WFI	(0x01)
 #define HSR_EC_CP15_32	(0x03)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 201ec1f..40ee099 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -43,6 +43,8 @@ extern char __kvm_hyp_vector[];
 extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index ea17a97..52cc280 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -21,10 +21,16 @@
 #include <linux/io.h>
 #include <asm/idmap.h>
 #include <asm/pgalloc.h>
+#include <asm/cacheflush.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
 #include <asm/mach/map.h>
+#include <asm/kvm_asm.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
@@ -490,9 +496,156 @@ out:
 	return ret;
 }
 
+static void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
+{
+	/*
+	 * If we are going to insert an instruction page and the icache is
+	 * either VIPT or PIPT, there is a potential problem where the host
+	 * (or another VM) may have used this page at the same virtual address
+	 * as this guest, and we read incorrect data from the icache.  If
+	 * we're using a PIPT cache, we can invalidate just that page, but if
+	 * we are using a VIPT cache we need to invalidate the entire icache -
+	 * damn shame - as written in the ARM ARM (DDI 0406C - Page B3-1384)
+	 */
+	if (icache_is_pipt()) {
+		unsigned long hva = gfn_to_hva(kvm, gfn);
+		__cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
+	} else if (!icache_is_vivt_asid_tagged()) {
+		/* any kind of VIPT cache */
+		__flush_icache_all();
+	}
+}
+
+static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			  gfn_t gfn, struct kvm_memory_slot *memslot,
+			  bool is_iabt, unsigned long fault_status)
+{
+	pte_t new_pte;
+	pfn_t pfn, pfn_existing = KVM_PFN_ERR_BAD;
+	int ret;
+	bool write_fault, writable;
+	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+
+	if (is_iabt)
+		write_fault = false;
+	else if ((vcpu->arch.hsr & HSR_ISV) && !(vcpu->arch.hsr & HSR_WNR))
+		write_fault = false;
+	else
+		write_fault = true;
+
+	if (fault_status == FSC_PERM && !write_fault) {
+		kvm_err("Unexpected L2 read permission error\n");
+		return -EFAULT;
+	}
+
+	/*
+	 * If this is a write fault (think COW) we need to make sure the
+	 * existing page, which other CPUs might still read, doesn't go away
+	 * from under us, by calling gfn_to_pfn_prot(write_fault=true).
+	 * Therefore, we call gfn_to_pfn_prot(write_fault=false), which will
+	 * pin the existing page, then we get a new page for the user space
+	 * pte and map this in the stage-2 table where we also make sure to
+	 * flush the TLB for the VM, if there was an existing entry (the entry
+	 * was updated setting the write flag to the potentially new page).
+	 */
+	if (fault_status == FSC_PERM) {
+		pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false, NULL);
+		if (is_error_pfn(pfn_existing))
+			return -EFAULT;
+	}
+
+	pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
+	if (is_error_pfn(pfn)) {
+		ret = -EFAULT;
+		goto out_put_existing;
+	}
+
+	/* We need minimum second+third level pages */
+	ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
+	if (ret)
+		goto out;
+	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
+	if (writable)
+		pte_val(new_pte) |= L_PTE2_WRITE;
+	coherent_icache_guest_page(vcpu->kvm, gfn);
+
+	spin_lock(&vcpu->kvm->arch.pgd_lock);
+	stage2_set_pte(vcpu->kvm, memcache, fault_ipa, &new_pte);
+	spin_unlock(&vcpu->kvm->arch.pgd_lock);
+
+out:
+	/*
+	 * XXX TODO FIXME:
+	 * This is _really_ *weird* !!!
+	 * We should only be calling the _dirty version when we map something
+	 * writable, but this causes memory failures in guests under heavy
+	 * memory pressure on the host and heavy swapping.
+	 */
+	kvm_release_pfn_dirty(pfn);
+out_put_existing:
+	if (!is_error_pfn(pfn_existing))
+		kvm_release_pfn_clean(pfn_existing);
+	return ret;
+}
+
+/**
+ * kvm_handle_guest_abort - handles all 2nd stage aborts
+ * @vcpu:	the VCPU pointer
+ * @run:	the kvm_run structure
+ *
+ * Any abort that gets to the host is almost guaranteed to be caused by a
+ * missing second stage translation table entry, which can mean that either the
+ * guest simply needs more memory and we must allocate an appropriate page or it
+ * can mean that the guest tried to access I/O memory, which is emulated by user
+ * space. The distinction is based on the IPA causing the fault and whether this
+ * memory region has been registered as standard RAM by user space.
+ */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	unsigned long hsr_ec;
+	unsigned long fault_status;
+	phys_addr_t fault_ipa;
+	struct kvm_memory_slot *memslot = NULL;
+	bool is_iabt;
+	gfn_t gfn;
+	int ret;
+
+	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
+	is_iabt = (hsr_ec == HSR_EC_IABT);
+	fault_ipa = ((phys_addr_t)vcpu->arch.hpfar & HPFAR_MASK) << 8;
+
+	trace_kvm_guest_fault(*vcpu_pc(vcpu), vcpu->arch.hsr,
+			      vcpu->arch.hdfar, vcpu->arch.hifar, fault_ipa);
+
+	/* Check the stage-2 fault is a translation fault or a permission fault */
+	fault_status = (vcpu->arch.hsr & HSR_FSC_TYPE);
+	if (fault_status != FSC_FAULT && fault_status != FSC_PERM) {
+		kvm_err("Unsupported fault status: EC=%#lx DFCS=%#lx\n",
+			hsr_ec, fault_status);
+		return -EFAULT;
+	}
+
+	gfn = fault_ipa >> PAGE_SHIFT;
+	if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
+		if (is_iabt) {
+			/* Prefetch Abort on I/O address */
+			kvm_inject_pabt(vcpu, vcpu->arch.hifar);
+			return 1;
+		}
+
+		kvm_pr_unimpl("I/O address abort...");
+		return 0;
+	}
+
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot->user_alloc) {
+		kvm_err("non user-alloc memslots not supported\n");
+		return -EINVAL;
+	}
+
+	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot,
+			     is_iabt, fault_status);
+	return ret ? ret : 1;
 }
 
 static void handle_hva_to_gpa(struct kvm *kvm, unsigned long hva,
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 772e045..40606c9 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,6 +39,34 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_guest_fault,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long hsr,
+		 unsigned long hdfar, unsigned long hifar,
+		 unsigned long ipa),
+	TP_ARGS(vcpu_pc, hsr, hdfar, hifar, ipa),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+		__field(	unsigned long,	hsr		)
+		__field(	unsigned long,	hdfar		)
+		__field(	unsigned long,	hifar		)
+		__field(	unsigned long,	ipa		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+		__entry->hsr			= hsr;
+		__entry->hdfar			= hdfar;
+		__entry->hifar			= hifar;
+		__entry->ipa			= ipa;
+	),
+
+	TP_printk("guest fault at PC %#08lx (hdfar %#08lx, hifar %#08lx, "
+		  "hsr %#08lx, ipa %#08lx)",
+		  __entry->vcpu_pc, __entry->hdfar, __entry->hifar,
+		  __entry->hsr, __entry->ipa)
+);
+
 TRACE_EVENT(kvm_irq_line,
 	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
 	TP_ARGS(type, vcpu_idx, irq_num, level),


^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 14/15] KVM: ARM: Handle I/O aborts
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:35   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

When the guest accesses I/O memory, this creates data abort exceptions;
they are handled by decoding the HSR information (physical address,
read/write, length, register) and forwarding reads and writes to QEMU,
which performs the device emulation.
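
To make the decode step concrete, here is a minimal sketch with made-up
names (struct mmio_access, decode_hsr_iss are illustrative only); the real
code is decode_hsr() in mmu.c below, and the bit positions come from the
HSR_SRT_*/HSR_SSE defines added in this patch plus HSR_WNR from the
previous patch:

	struct mmio_access {			/* illustrative only */
		bool write;
		bool sign_extend;
		unsigned long len;
		unsigned long reg;
	};

	static int decode_hsr_iss(unsigned long hsr, struct mmio_access *a)
	{
		switch ((hsr >> 22) & 0x3) {	/* SAS: access size */
		case 0: a->len = 1; break;
		case 1: a->len = 2; break;
		case 2: a->len = 4; break;
		default: return -EFAULT;	/* SAS == 0b11 is reserved */
		}
		a->write = !!(hsr & HSR_WNR);	/* write, not read */
		a->sign_extend = !!(hsr & HSR_SSE);
		a->reg = (hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
		return 0;
	}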

For certain classes of load/store operations the HSR does not provide
syndrome information, so we must be able to fetch the offending
instruction from guest memory and decode it manually.

We only support instruction decoding for valid, reasonable MMIO operations
where trapping them does not provide sufficient information in the HSR (no
16-bit Thumb instructions provide register writeback that we care about).

The following instruction types are NOT supported for MMIO operations
despite the HSR not containing decode info:
 - any Load/Store multiple
 - any load/store exclusive
 - any load/store dual
 - anything with the PC as the dest register

This requires changing the general flow somewhat, since new calls to run
the VCPU must check if there's a pending MMIO load and perform the
register write-back after userspace has made the data available.
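
In outline, that completion path looks like the sketch below, simplified
from kvm_handle_mmio_return() in this patch (error checking omitted):

	if (run->exit_reason == KVM_EXIT_MMIO && !run->mmio.is_write) {
		u32 *dest = vcpu_reg(vcpu, vcpu->arch.mmio.rd);

		*dest = 0;
		memcpy(dest, run->mmio.data, run->mmio.len);	/* 1, 2 or 4 bytes */
		if (vcpu->arch.mmio.sign_extend && run->mmio.len < 4) {
			u32 mask = 1U << (run->mmio.len * 8 - 1);
			*dest = (*dest ^ mask) - mask;		/* sign-extend */
		}
	}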

Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
(1) Guest complicated mmio instruction traps.
(2) The hardware doesn't tell us enough, so we need to read the actual
    instruction which was being executed.
(3) KVM maps the instruction virtual address to a physical address.
(4) The guest (SMP) swaps out that page, and fills it with something else.
(5) We read the physical address, but now that's the wrong thing.
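
The fix, in outline, pauses every VCPU before translating the PC and
copying the instruction, so the guest cannot remap the page in between.
A rough sketch of copy_current_insn() in this patch, with declarations
and error handling omitted:

	spin_lock(&vcpu->kvm->mmu_lock);	/* don't cross with IPIs */
	kvm_for_each_vcpu(i, v, vcpu->kvm)
		v->arch.pause = true;		/* no further guest entries */
	smp_mb();				/* set pause before reading ->mode */
	kvm_for_each_vcpu(i, v, vcpu->kvm)	/* kick VCPUs already in the guest */
		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE)
			smp_call_function_single(v->cpu, do_nothing, NULL, 1);
	/* ... va->pa map and copy the instruction while everyone is parked ... */
	kvm_for_each_vcpu(i, v, vcpu->kvm)
		v->arch.pause = false;
	spin_unlock(&vcpu->kvm->mmu_lock);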

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    3 
 arch/arm/include/asm/kvm_asm.h     |    2 
 arch/arm/include/asm/kvm_emulate.h |   23 ++
 arch/arm/include/asm/kvm_host.h    |    3 
 arch/arm/include/asm/kvm_mmu.h     |    1 
 arch/arm/kvm/arm.c                 |   14 +
 arch/arm/kvm/emulate.c             |  448 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/exports.c             |    1 
 arch/arm/kvm/interrupts.S          |   40 +++
 arch/arm/kvm/mmu.c                 |  266 +++++++++++++++++++++
 arch/arm/kvm/trace.h               |   21 ++
 11 files changed, 819 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 4cff3b7..21cb240 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -158,8 +158,11 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_SRT_SHIFT	(16)
+#define HSR_SRT_MASK	(0xf << HSR_SRT_SHIFT)
 #define HSR_FSC		(0x3f)
 #define HSR_FSC_TYPE	(0x3c)
+#define HSR_SSE		(1 << 21)
 #define HSR_WNR		(1 << 6)
 #define HSR_CV_SHIFT	(24)
 #define HSR_CV		(1U << HSR_CV_SHIFT)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 40ee099..5315c69 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -49,6 +49,8 @@ extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+
+extern u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv);
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 6ddfae2..68489f2 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -22,6 +22,27 @@
 #include <linux/kvm_host.h>
 #include <asm/kvm_asm.h>
 
+/*
+ * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
+ * which is an anonymous type. Use our own type instead.
+ */
+struct kvm_exit_mmio {
+	phys_addr_t	phys_addr;
+	u8		data[8];
+	u32		len;
+	bool		is_write;
+};
+
+static inline void kvm_prepare_mmio(struct kvm_run *run,
+				    struct kvm_exit_mmio *mmio)
+{
+	run->mmio.phys_addr	= mmio->phys_addr;
+	run->mmio.len		= mmio->len;
+	run->mmio.is_write	= mmio->is_write;
+	memcpy(run->mmio.data, mmio->data, mmio->len);
+	run->exit_reason	= KVM_EXIT_MMIO;
+}
+
 u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
 
 static inline u8 __vcpu_mode(u32 cpsr)
@@ -52,6 +73,8 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
 }
 
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr, struct kvm_exit_mmio *mmio);
 void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 97608d7..3fec9ad 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -152,6 +152,9 @@ struct kvm_vcpu_arch {
 	int last_pcpu;
 	cpumask_t require_dcache_flush;
 
+	/* Don't run the guest: see copy_current_insn() */
+	bool pause;
+
 	/* IO related fields */
 	struct {
 		bool sign_extend;	/* for byte/halfword loads */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 11f4c3a..c3f90b0 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -38,6 +38,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size);
 
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 8dc5887..06a3368 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -570,6 +570,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (unlikely(!vcpu->arch.target))
 		return -ENOEXEC;
 
+	if (run->exit_reason == KVM_EXIT_MMIO) {
+		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+		if (ret)
+			return ret;
+	}
+
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 
@@ -607,7 +613,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_guest_enter();
 		vcpu->mode = IN_GUEST_MODE;
 
-		ret = __kvm_vcpu_run(vcpu);
+		smp_mb(); /* set mode before reading vcpu->arch.pause */
+		if (unlikely(vcpu->arch.pause)) {
+			/* This means ignore, try again. */
+			ret = ARM_EXCEPTION_IRQ;
+		} else {
+			ret = __kvm_vcpu_run(vcpu);
+		}
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->arch.last_pcpu = smp_processor_id();
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 9236d10..2670679 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -132,11 +132,459 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 	return reg_array + vcpu_reg_offsets[mode][reg_num];
 }
 
+/******************************************************************************
+ * Utility functions common for all emulation code
+ *****************************************************************************/
+
+/*
+ * This one accepts a matrix where the first element is the
+ * bits as they must be, and the second element is the bitmask.
+ */
+#define INSTR_NONE	-1
+static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
+{
+	int i;
+	u32 mask;
+
+	for (i = 0; i < table_entries; i++) {
+		mask = table[i][1];
+		if ((table[i][0] & mask) == (instr & mask))
+			return i;
+	}
+	return INSTR_NONE;
+}
+
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return 0;
 }
 
+
+/******************************************************************************
+ * Load-Store instruction emulation
+ *****************************************************************************/
+
+/*
+ * Must be ordered with LOADS first and WRITES afterwards
+ * for easy distinction when doing MMIO.
+ */
+#define NUM_LD_INSTR  9
+enum INSTR_LS_INDEXES {
+	INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
+	INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
+	INSTR_LS_LDRSH,
+	INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
+	INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
+	NUM_LS_INSTR
+};
+
+static u32 ls_instr[NUM_LS_INSTR][2] = {
+	{0x04700000, 0x0d700000}, /* LDRBT */
+	{0x04300000, 0x0d700000}, /* LDRT  */
+	{0x04100000, 0x0c500000}, /* LDR   */
+	{0x04500000, 0x0c500000}, /* LDRB  */
+	{0x000000d0, 0x0e1000f0}, /* LDRD  */
+	{0x01900090, 0x0ff000f0}, /* LDREX */
+	{0x001000b0, 0x0e1000f0}, /* LDRH  */
+	{0x001000d0, 0x0e1000f0}, /* LDRSB */
+	{0x001000f0, 0x0e1000f0}, /* LDRSH */
+	{0x04600000, 0x0d700000}, /* STRBT */
+	{0x04200000, 0x0d700000}, /* STRT  */
+	{0x04000000, 0x0c500000}, /* STR   */
+	{0x04400000, 0x0c500000}, /* STRB  */
+	{0x000000f0, 0x0e1000f0}, /* STRD  */
+	{0x01800090, 0x0ff000f0}, /* STREX */
+	{0x000000b0, 0x0e1000f0}  /* STRH  */
+};
+
+static inline int get_arm_ls_instr_index(u32 instr)
+{
+	return kvm_instr_index(instr, ls_instr, NUM_LS_INSTR);
+}
+
+/*
+ * Load-Store instruction decoding
+ */
+#define INSTR_LS_TYPE_BIT		26
+#define INSTR_LS_RD_MASK		0x0000f000
+#define INSTR_LS_RD_SHIFT		12
+#define INSTR_LS_RN_MASK		0x000f0000
+#define INSTR_LS_RN_SHIFT		16
+#define INSTR_LS_RM_MASK		0x0000000f
+#define INSTR_LS_OFFSET12_MASK		0x00000fff
+
+#define INSTR_LS_BIT_P			24
+#define INSTR_LS_BIT_U			23
+#define INSTR_LS_BIT_B			22
+#define INSTR_LS_BIT_W			21
+#define INSTR_LS_BIT_L			20
+#define INSTR_LS_BIT_S			 6
+#define INSTR_LS_BIT_H			 5
+
+/*
+ * ARM addressing mode defines
+ */
+#define OFFSET_IMM_MASK			0x0e000000
+#define OFFSET_IMM_VALUE		0x04000000
+#define OFFSET_REG_MASK			0x0e000ff0
+#define OFFSET_REG_VALUE		0x06000000
+#define OFFSET_SCALE_MASK		0x0e000010
+#define OFFSET_SCALE_VALUE		0x06000000
+
+#define SCALE_SHIFT_MASK		0x000000a0
+#define SCALE_SHIFT_SHIFT		5
+#define SCALE_SHIFT_LSL			0x0
+#define SCALE_SHIFT_LSR			0x1
+#define SCALE_SHIFT_ASR			0x2
+#define SCALE_SHIFT_ROR_RRX		0x3
+#define SCALE_SHIFT_IMM_MASK		0x00000f80
+#define SCALE_SHIFT_IMM_SHIFT		6
+
+#define PSR_BIT_C			29
+
+static unsigned long ls_word_calc_offset(struct kvm_vcpu *vcpu,
+					 unsigned long instr)
+{
+	int offset = 0;
+
+	if ((instr & OFFSET_IMM_MASK) == OFFSET_IMM_VALUE) {
+		/* Immediate offset/index */
+		offset = instr & INSTR_LS_OFFSET12_MASK;
+
+		if (!(instr & (1U << INSTR_LS_BIT_U)))
+			offset = -offset;
+	}
+
+	if ((instr & OFFSET_REG_MASK) == OFFSET_REG_VALUE) {
+		/* Register offset/index */
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		offset = *vcpu_reg(vcpu, rm);
+
+		if (!(instr & (1U << INSTR_LS_BIT_P)))
+			offset = 0;
+	}
+
+	if ((instr & OFFSET_SCALE_MASK) == OFFSET_SCALE_VALUE) {
+		/* Scaled register offset */
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		u8 shift = (instr & SCALE_SHIFT_MASK) >> SCALE_SHIFT_SHIFT;
+		u32 shift_imm = (instr & SCALE_SHIFT_IMM_MASK)
+				>> SCALE_SHIFT_IMM_SHIFT;
+		offset = *vcpu_reg(vcpu, rm);
+
+		switch (shift) {
+		case SCALE_SHIFT_LSL:
+			offset = offset << shift_imm;
+			break;
+		case SCALE_SHIFT_LSR:
+			if (shift_imm == 0)
+				offset = 0;
+			else
+				offset = ((u32)offset) >> shift_imm;
+			break;
+		case SCALE_SHIFT_ASR:
+			if (shift_imm == 0) {
+				if (offset & (1U << 31))
+					offset = 0xffffffff;
+				else
+					offset = 0;
+			} else {
+				/* Ensure arithmetic shift */
+				asm("mov %[r], %[op], ASR %[s]" :
+				    [r] "=r" (offset) :
+				    [op] "r" (offset), [s] "r" (shift_imm));
+			}
+			break;
+		case SCALE_SHIFT_ROR_RRX:
+			if (shift_imm == 0) {
+				u32 C = (vcpu->arch.regs.cpsr &
+						(1U << PSR_BIT_C));
+				offset = (C << 31) | offset >> 1;
+			} else {
+				/* Ensure arithmetic shift */
+				asm("mov %[r], %[op], ASR %[s]" :
+				    [r] "=r" (offset) :
+				    [op] "r" (offset), [s] "r" (shift_imm));
+			}
+			break;
+		}
+
+		if (instr & (1U << INSTR_LS_BIT_U))
+			return offset;
+		else
+			return -offset;
+	}
+
+	if (instr & (1U << INSTR_LS_BIT_U))
+		return offset;
+	else
+		return -offset;
+
+	BUG();
+}
+
+static int kvm_ls_length(struct kvm_vcpu *vcpu, u32 instr)
+{
+	int index;
+
+	index = get_arm_ls_instr_index(instr);
+
+	if (instr & (1U << INSTR_LS_TYPE_BIT)) {
+		/* LS word or unsigned byte */
+		if (instr & (1U << INSTR_LS_BIT_B))
+			return sizeof(unsigned char);
+		else
+			return sizeof(u32);
+	} else {
+		/* LS halfword, doubleword or signed byte */
+		u32 H = (instr & (1U << INSTR_LS_BIT_H));
+		u32 S = (instr & (1U << INSTR_LS_BIT_S));
+		u32 L = (instr & (1U << INSTR_LS_BIT_L));
+
+		if (!L && S) {
+			kvm_err("WARNING: d-word for MMIO\n");
+			return 2 * sizeof(u32);
+		} else if (L && S && !H)
+			return sizeof(char);
+		else
+			return sizeof(u16);
+	}
+
+	BUG();
+}
+
+static bool kvm_decode_arm_ls(struct kvm_vcpu *vcpu, unsigned long instr,
+			      struct kvm_exit_mmio *mmio)
+{
+	int index;
+	bool is_write;
+	unsigned long rd, rn, offset, len;
+
+	index = get_arm_ls_instr_index(instr);
+	if (index == INSTR_NONE)
+		return false;
+
+	is_write = (index < NUM_LD_INSTR) ? false : true;
+	rd = (instr & INSTR_LS_RD_MASK) >> INSTR_LS_RD_SHIFT;
+	len = kvm_ls_length(vcpu, instr);
+
+	mmio->is_write = is_write;
+	mmio->len = len;
+
+	vcpu->arch.mmio.sign_extend = false;
+	vcpu->arch.mmio.rd = rd;
+
+	/* Handle base register writeback */
+	if (!(instr & (1U << INSTR_LS_BIT_P)) ||
+	     (instr & (1U << INSTR_LS_BIT_W))) {
+		rn = (instr & INSTR_LS_RN_MASK) >> INSTR_LS_RN_SHIFT;
+		offset = ls_word_calc_offset(vcpu, instr);
+		*vcpu_reg(vcpu, rn) += offset;
+	}
+
+	return true;
+}
+
+struct thumb_instr {
+	bool is32;
+
+	union {
+		struct {
+			u8 opcode;
+			u8 mask;
+		} t16;
+
+		struct {
+			u8 op1;
+			u8 op2;
+			u8 op2_mask;
+		} t32;
+	};
+
+	bool (*decode)(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+		       unsigned long instr, const struct thumb_instr *ti);
+};
+
+static bool decode_thumb_wb(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+			    unsigned long instr)
+{
+	bool P = (instr >> 10) & 1;
+	bool U = (instr >> 9) & 1;
+	u8 imm8 = instr & 0xff;
+	u32 offset_addr = vcpu->arch.hdfar;
+	u8 Rn = (instr >> 16) & 0xf;
+
+	vcpu->arch.mmio.rd = (instr >> 12) & 0xf;
+
+	if (Rn == 15)
+		return false;
+
+	/* Handle Writeback */
+	if (!P && U)
+		*vcpu_reg(vcpu, Rn) = offset_addr + imm8;
+	else if (!P && !U)
+		*vcpu_reg(vcpu, Rn) = offset_addr - imm8;
+	return true;
+}
+
+static bool decode_thumb_str(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+			     unsigned long instr, const struct thumb_instr *ti)
+{
+	u8 op1 = (instr >> (16 + 5)) & 0x7;
+	u8 op2 = (instr >> 6) & 0x3f;
+
+	mmio->is_write = true;
+	vcpu->arch.mmio.sign_extend = false;
+
+	switch (op1) {
+	case 0x0: mmio->len = 1; break;
+	case 0x1: mmio->len = 2; break;
+	case 0x2: mmio->len = 4; break;
+	default:
+		  return false; /* Only register write-back versions! */
+	}
+
+	if ((op2 & 0x24) == 0x24) {
+		/* STRB (immediate, thumb, W=1) */
+		return decode_thumb_wb(vcpu, mmio, instr);
+	}
+
+	return false;
+}
+
+static bool decode_thumb_ldr(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+			     unsigned long instr, const struct thumb_instr *ti)
+{
+	u8 op1 = (instr >> (16 + 7)) & 0x3;
+	u8 op2 = (instr >> 6) & 0x3f;
+
+	mmio->is_write = false;
+
+	switch (ti->t32.op2 & 0x7) {
+	case 0x1: mmio->len = 1; break;
+	case 0x3: mmio->len = 2; break;
+	case 0x5: mmio->len = 4; break;
+	}
+
+	if (op1 == 0x0)
+		vcpu->arch.mmio.sign_extend = false;
+	else if (op1 == 0x2 && (ti->t32.op2 & 0x7) != 0x5)
+		vcpu->arch.mmio.sign_extend = true;
+	else
+		return false; /* Only register write-back versions! */
+
+	if ((op2 & 0x24) == 0x24) {
+		/* LDR{S}X (immediate, thumb, W=1) */
+		return decode_thumb_wb(vcpu, mmio, instr);
+	}
+
+	return false;
+}
+
+/*
+ * We only support instruction decoding for valid, reasonable MMIO operations
+ * where trapping them does not provide sufficient information in the HSR (no
+ * 16-bit Thumb instructions provide register writeback that we care about).
+ *
+ * The following instruction types are NOT supported for MMIO operations
+ * despite the HSR not containing decode info:
+ *  - any Load/Store multiple
+ *  - any load/store exclusive
+ *  - any load/store dual
+ *  - anything with the PC as the dest register
+ */
+static const struct thumb_instr thumb_instr[] = {
+	/**************** 32-bit Thumb instructions **********************/
+	/* Store single data item:	Op1 == 11, Op2 == 000xxx0 */
+	{ .is32 = true,  .t32 = { 3, 0x00, 0x71}, decode_thumb_str	},
+	/* Load byte:			Op1 == 11, Op2 == 00xx001 */
+	{ .is32 = true,  .t32 = { 3, 0x01, 0x67}, decode_thumb_ldr	},
+	/* Load halfword:		Op1 == 11, Op2 == 00xx011 */
+	{ .is32 = true,  .t32 = { 3, 0x03, 0x67}, decode_thumb_ldr	},
+	/* Load word:			Op1 == 11, Op2 == 00xx101 */
+	{ .is32 = true,  .t32 = { 3, 0x05, 0x67}, decode_thumb_ldr	},
+};
+
+
+
+static bool kvm_decode_thumb_ls(struct kvm_vcpu *vcpu, unsigned long instr,
+				struct kvm_exit_mmio *mmio)
+{
+	bool is32 = is_wide_instruction(instr);
+	bool is16 = !is32;
+	struct thumb_instr tinstr; /* re-use to pass on already decoded info */
+	int i;
+
+	if (is16) {
+		tinstr.t16.opcode = (instr >> 10) & 0x3f;
+	} else {
+		tinstr.t32.op1 = (instr >> (16 + 11)) & 0x3;
+		tinstr.t32.op2 = (instr >> (16 + 4)) & 0x7f;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(thumb_instr); i++) {
+		const struct thumb_instr *ti = &thumb_instr[i];
+		if (ti->is32 != is32)
+			continue;
+
+		if (is16) {
+			if ((tinstr.t16.opcode & ti->t16.mask) != ti->t16.opcode)
+				continue;
+		} else {
+			if (ti->t32.op1 != tinstr.t32.op1)
+				continue;
+			if ((ti->t32.op2_mask & tinstr.t32.op2) != ti->t32.op2)
+				continue;
+		}
+
+		return ti->decode(vcpu, mmio, instr, &tinstr);
+	}
+
+	return false;
+}
+
+/**
+ * kvm_emulate_mmio_ls - emulates load/store instructions made to I/O memory
+ * @vcpu:	The vcpu pointer
+ * @fault_ipa:	The IPA that caused the 2nd stage fault
+ * @instr:	The instruction that caused the fault
+ *
+ * Handles emulation of load/store instructions which cannot be emulated through
+ * information found in the HSR on faults. It is necessary in this case to
+ * simply decode the offending instruction in software and determine the
+ * required operands.
+ */
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr, struct kvm_exit_mmio *mmio)
+{
+	bool is_thumb;
+
+	trace_kvm_mmio_emulate(vcpu->arch.regs.pc, instr, vcpu->arch.regs.cpsr);
+
+	mmio->phys_addr = fault_ipa;
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	if (!is_thumb && !kvm_decode_arm_ls(vcpu, instr, mmio)) {
+		kvm_debug("Unable to decode inst: %#08lx (cpsr: %#08x (T=0) "
+			  "pc: %#08x)\n",
+			  instr, *vcpu_cpsr(vcpu), *vcpu_pc(vcpu));
+		kvm_inject_dabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	} else if (is_thumb && !kvm_decode_thumb_ls(vcpu, instr, mmio)) {
+		kvm_debug("Unable to decode inst: %#08lx (cpsr: %#08x (T=1) "
+			  "pc: %#08x)\n",
+			  instr, *vcpu_cpsr(vcpu), *vcpu_pc(vcpu));
+		kvm_inject_dabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	}
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	kvm_skip_instr(vcpu, is_wide_instruction(instr));
+	return 0;
+}
+
 /**
  * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
  * @vcpu:	The VCPU pointer
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index f39f823..843c3bc 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -34,5 +34,6 @@ EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
 
 EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
 EXPORT_SYMBOL_GPL(__kvm_tlb_flush_vmid);
+EXPORT_SYMBOL_GPL(__kvm_va_to_pa);
 
 EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index cc9448b..ab78477 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -128,6 +128,7 @@ ENDPROC(__kvm_flush_vm_context)
 	VFPFMXR FPEXC, r2	@ FPEXC	(last, in case !EN)
 .endm
 
+
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor world-switch code
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@ -520,6 +521,45 @@ after_vfp_restore:
 	bx	lr			@ return to IOCTL
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Translate VA to PA
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+@ Arguments:
+@  r0: pointer to vcpu struct
+@  r1: virtual address to map (rounded to page)
+@  r2: 1 = P1 (read) mapping, 0 = P0 (read) mapping.
+@ Returns 64 bit PAR value.
+ENTRY(__kvm_va_to_pa)
+	hvc	#0			@ switch to hyp-mode
+
+	push	{r4-r12}
+
+	@ Fold flag into r1, easier than using stack.
+	cmp	r2, #0
+	movne	r2, #1
+	orr	r1, r1, r2
+
+	@ This swaps too many registers, but we're in the slow path anyway.
+	read_cp15_state
+	write_cp15_state 1, r0
+
+	ands	r2, r1, #1
+	bic	r1, r1, r2
+	mcrne	p15, 0, r1, c7, c8, 0	@ VA to PA, ATS1CPR
+	mcreq	p15, 0, r1, c7, c8, 2	@ VA to PA, ATS1CUR
+	isb
+
+	@ Restore host state.
+	read_cp15_state 1, r0
+	write_cp15_state
+
+	mrrc	p15, 0, r0, r1, c7	@ PAR
+	pop	{r4-r12}
+	hvc	#0			@ Back to SVC
+	bx	lr
+
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor exception vector and handlers
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 52cc280..dc760bb 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -19,6 +19,7 @@
 #include <linux/mman.h>
 #include <linux/kvm_host.h>
 #include <linux/io.h>
+#include <trace/events/kvm.h>
 #include <asm/idmap.h>
 #include <asm/pgalloc.h>
 #include <asm/cacheflush.h>
@@ -589,6 +590,266 @@ out_put_existing:
 }
 
 /**
+ * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
+ * @vcpu: The VCPU pointer
+ * @run:  The VCPU run struct containing the mmio data
+ *
+ * This should only be called after returning from userspace for MMIO load
+ * emulation.
+ */
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	int *dest;
+	unsigned int len;
+	int mask;
+
+	if (!run->mmio.is_write) {
+		dest = vcpu_reg(vcpu, vcpu->arch.mmio.rd);
+		memset(dest, 0, sizeof(int));
+
+		len = run->mmio.len;
+		if (len > 4)
+			return -EINVAL;
+
+		memcpy(dest, run->mmio.data, len);
+
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
+				*((u64 *)run->mmio.data));
+
+		if (vcpu->arch.mmio.sign_extend && len < 4) {
+			mask = 1U << ((len * 8) - 1);
+			*dest = (*dest ^ mask) - mask;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * copy_from_guest_va - copy memory from guest (very slow!)
+ * @vcpu:	vcpu pointer
+ * @dest:	memory to copy into
+ * @gva:	virtual address in guest to copy from
+ * @len:	length to copy
+ * @priv:	use guest PL1 (ie. kernel) mappings
+ *              otherwise use guest PL0 mappings.
+ *
+ * Returns true on success, false on failure (unlikely, but retry).
+ */
+static bool copy_from_guest_va(struct kvm_vcpu *vcpu,
+			       void *dest, unsigned long gva, size_t len,
+			       bool priv)
+{
+	u64 par;
+	phys_addr_t pc_ipa;
+	int err;
+
+	BUG_ON((gva & PAGE_MASK) != ((gva + len - 1) & PAGE_MASK));
+	par = __kvm_va_to_pa(vcpu, gva & PAGE_MASK, priv);
+	if (par & 1) {
+		kvm_err("IO abort from invalid instruction address"
+			" %#lx!\n", gva);
+		return false;
+	}
+
+	BUG_ON(!(par & (1U << 11)));
+	pc_ipa = par & PAGE_MASK & ((1ULL << 32) - 1);
+	pc_ipa += gva & ~PAGE_MASK;
+
+
+	err = kvm_read_guest(vcpu->kvm, pc_ipa, dest, len);
+	if (unlikely(err))
+		return false;
+
+	return true;
+}
+
+/* Just ensure we're not running the guest. */
+static void do_nothing(void *info)
+{
+}
+
+/*
+ * We have to be very careful copying memory from a running (ie. SMP) guest.
+ * Another CPU may remap the page (eg. swap out a userspace text page) as we
+ * read the instruction.  Unlike normal hardware operation, to emulate an
+ * instruction we map the virtual to physical address then read that memory
+ * as separate steps, thus not atomic.
+ *
+ * Fortunately this is so rare (we don't usually need the instruction), we
+ * can go very slowly and no one will mind.
+ */
+static bool copy_current_insn(struct kvm_vcpu *vcpu, unsigned long *instr)
+{
+	int i;
+	bool ret;
+	struct kvm_vcpu *v;
+	bool is_thumb;
+	size_t instr_len;
+
+	/* Don't cross with IPIs in kvm_main.c */
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/* Tell them all to pause, so no more will enter guest. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		v->arch.pause = true;
+
+	/* Set ->pause before we read ->mode */
+	smp_mb();
+
+	/* Kick out any which are still running. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm) {
+		/* Guest could exit now, making cpu wrong. That's OK. */
+		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE)
+			smp_call_function_single(v->cpu, do_nothing, NULL, 1);
+	}
+
+
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	instr_len = (is_thumb) ? 2 : 4;
+
+	BUG_ON(!is_thumb && vcpu->arch.regs.pc & 0x3);
+
+	/* Now guest isn't running, we can va->pa map and copy atomically. */
+	ret = copy_from_guest_va(vcpu, instr, vcpu->arch.regs.pc, instr_len,
+				 vcpu_mode_priv(vcpu));
+	if (!ret)
+		goto out;
+
+	/* A 32-bit thumb2 instruction can actually go over a page boundary! */
+	if (is_thumb && is_wide_instruction(*instr)) {
+		*instr = *instr << 16;
+		ret = copy_from_guest_va(vcpu, instr, vcpu->arch.regs.pc + 2, 2,
+					 vcpu_mode_priv(vcpu));
+	}
+
+out:
+	/* Release them all. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		v->arch.pause = false;
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return ret;
+}
+
+/**
+ * invalid_io_mem_abort -- Handle I/O aborts when the ISV bit is clear
+ *
+ * @vcpu:      The vcpu pointer
+ * @fault_ipa: The IPA that caused the 2nd stage fault
+ * @mmio:      Pointer to struct to hold decode information
+ *
+ * Some load/store instructions cannot be emulated using the information
+ * presented in the HSR, for instance, register write-back instructions are not
+ * supported. We therefore need to fetch the instruction, decode it, and then
+ * emulate its behavior.
+ */
+static int invalid_io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+				struct kvm_exit_mmio *mmio)
+{
+	unsigned long instr = 0;
+
+	/* If it fails (SMP race?), we reenter guest for it to retry. */
+	if (!copy_current_insn(vcpu, &instr))
+		return 1;
+
+	return kvm_emulate_mmio_ls(vcpu, fault_ipa, instr, mmio);
+}
+
+static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+		      struct kvm_exit_mmio *mmio)
+{
+	unsigned long rd, len;
+	bool is_write, sign_extend;
+
+	if ((vcpu->arch.hsr >> 8) & 1) {
+		/* cache operation on I/O addr, tell guest unsupported */
+		kvm_inject_dabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	}
+
+	if ((vcpu->arch.hsr >> 7) & 1) {
+		/* page table accesses IO mem: tell guest to fix its TTBR */
+		kvm_inject_dabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	}
+
+	switch ((vcpu->arch.hsr >> 22) & 0x3) {
+	case 0:
+		len = 1;
+		break;
+	case 1:
+		len = 2;
+		break;
+	case 2:
+		len = 4;
+		break;
+	default:
+		kvm_err("Hardware is weird: SAS 0b11 is reserved\n");
+		return -EFAULT;
+	}
+
+	is_write = vcpu->arch.hsr & HSR_WNR;
+	sign_extend = vcpu->arch.hsr & HSR_SSE;
+	rd = (vcpu->arch.hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
+
+	if (rd == 15) {
+		/* IO memory trying to read/write pc */
+		kvm_inject_pabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	}
+
+	mmio->is_write = is_write;
+	mmio->phys_addr = fault_ipa;
+	mmio->len = len;
+	vcpu->arch.mmio.sign_extend = sign_extend;
+	vcpu->arch.mmio.rd = rd;
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
+	return 0;
+}
+
+static int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
+{
+	struct kvm_exit_mmio mmio;
+	unsigned long rd;
+	int ret;
+
+	/*
+	 * Prepare MMIO operation. First stash it in a private
+	 * structure that we can use for in-kernel emulation. If the
+	 * kernel can't handle it, copy it into run->mmio and let user
+	 * space do its magic.
+	 */
+
+	if (vcpu->arch.hsr & HSR_ISV)
+		ret = decode_hsr(vcpu, fault_ipa, &mmio);
+	else
+		ret = invalid_io_mem_abort(vcpu, fault_ipa, &mmio);
+
+	if (ret != 0)
+		return ret;
+
+	rd = vcpu->arch.mmio.rd;
+	trace_kvm_mmio((mmio.is_write) ? KVM_TRACE_MMIO_WRITE :
+					 KVM_TRACE_MMIO_READ_UNSATISFIED,
+			mmio.len, fault_ipa,
+			(mmio.is_write) ? *vcpu_reg(vcpu, rd) : 0);
+
+	if (mmio.is_write)
+		memcpy(mmio.data, vcpu_reg(vcpu, rd), mmio.len);
+
+	kvm_prepare_mmio(run, &mmio);
+	return 0;
+}
+
+/**
  * kvm_handle_guest_abort - handles all 2nd stage aborts
  * @vcpu:	the VCPU pointer
  * @run:	the kvm_run structure
@@ -633,8 +894,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			return 1;
 		}
 
-		kvm_pr_unimpl("I/O address abort...");
-		return 0;
+		/* Adjust page offset */
+		fault_ipa |= vcpu->arch.hdfar & ~PAGE_MASK;
+		return io_mem_abort(vcpu, run, fault_ipa, memslot);
 	}
 
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 40606c9..7199b58 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -92,6 +92,27 @@ TRACE_EVENT(kvm_irq_line,
 		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
 );
 
+TRACE_EVENT(kvm_mmio_emulate,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
+		 unsigned long cpsr),
+	TP_ARGS(vcpu_pc, instr, cpsr),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+		__field(	unsigned long,	instr		)
+		__field(	unsigned long,	cpsr		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+		__entry->instr			= instr;
+		__entry->cpsr			= cpsr;
+	),
+
+	TP_printk("Emulate MMIO at: 0x%08lx (instr: %08lx, cpsr: %08lx)",
+		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
+);
+
 /* Architecturally implementation defined CP15 register access */
 TRACE_EVENT(kvm_emulate_cp15_imp,
 	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,


^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 14/15] KVM: ARM: Handle I/O aborts
@ 2012-09-15 15:35   ` Christoffer Dall
  0 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

When the guest accesses I/O memory this will create data abort
exceptions and they are handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding reads
and writes to QEMU which performs the device emulation.

Certain classes of load/store operations do not support the syndrome
information provided in the HSR and we therefore must be able to fetch
the offending instruction from guest memory and decode it manually.

We only support instruction decoding for valid reasonable MMIO operations
where trapping them do not provide sufficient information in the HSR (no
16-bit Thumb instructions provide register writeback that we care about).

The following instruciton types are NOT supported for MMIO operations
despite the HSR not containing decode info:
 - any Load/Store multiple
 - any load/store exclusive
 - any load/store dual
 - anything with the PC as the dest register

This requires changing the general flow somewhat since new calls to run
the VCPU must check if there's a pending MMIO load and perform the write
after userspace has made the data available.

Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
(1) Guest complicated mmio instruction traps.
(2) The hardware doesn't tell us enough, so we need to read the actual
    instruction which was being exectuted.
(3) KVM maps the instruction virtual address to a physical address.
(4) The guest (SMP) swaps out that page, and fills it with something else.
(5) We read the physical address, but now that's the wrong thing.

Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    3 
 arch/arm/include/asm/kvm_asm.h     |    2 
 arch/arm/include/asm/kvm_emulate.h |   23 ++
 arch/arm/include/asm/kvm_host.h    |    3 
 arch/arm/include/asm/kvm_mmu.h     |    1 
 arch/arm/kvm/arm.c                 |   14 +
 arch/arm/kvm/emulate.c             |  448 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/exports.c             |    1 
 arch/arm/kvm/interrupts.S          |   40 +++
 arch/arm/kvm/mmu.c                 |  266 +++++++++++++++++++++
 arch/arm/kvm/trace.h               |   21 ++
 11 files changed, 819 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 4cff3b7..21cb240 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -158,8 +158,11 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_SRT_SHIFT	(16)
+#define HSR_SRT_MASK	(0xf << HSR_SRT_SHIFT)
 #define HSR_FSC		(0x3f)
 #define HSR_FSC_TYPE	(0x3c)
+#define HSR_SSE		(1 << 21)
 #define HSR_WNR		(1 << 6)
 #define HSR_CV_SHIFT	(24)
 #define HSR_CV		(1U << HSR_CV_SHIFT)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 40ee099..5315c69 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -49,6 +49,8 @@ extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+
+extern u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv);
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 6ddfae2..68489f2 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -22,6 +22,27 @@
 #include <linux/kvm_host.h>
 #include <asm/kvm_asm.h>
 
+/*
+ * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
+ * which is an anonymous type. Use our own type instead.
+ */
+struct kvm_exit_mmio {
+	phys_addr_t	phys_addr;
+	u8		data[8];
+	u32		len;
+	bool		is_write;
+};
+
+static inline void kvm_prepare_mmio(struct kvm_run *run,
+				    struct kvm_exit_mmio *mmio)
+{
+	run->mmio.phys_addr	= mmio->phys_addr;
+	run->mmio.len		= mmio->len;
+	run->mmio.is_write	= mmio->is_write;
+	memcpy(run->mmio.data, mmio->data, mmio->len);
+	run->exit_reason	= KVM_EXIT_MMIO;
+}
+
 u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
 
 static inline u8 __vcpu_mode(u32 cpsr)
@@ -52,6 +73,8 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
 }
 
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr, struct kvm_exit_mmio *mmio);
 void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 97608d7..3fec9ad 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -152,6 +152,9 @@ struct kvm_vcpu_arch {
 	int last_pcpu;
 	cpumask_t require_dcache_flush;
 
+	/* Don't run the guest: see copy_current_insn() */
+	bool pause;
+
 	/* IO related fields */
 	struct {
 		bool sign_extend;	/* for byte/halfword loads */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 11f4c3a..c3f90b0 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -38,6 +38,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size);
 
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 8dc5887..06a3368 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -570,6 +570,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (unlikely(!vcpu->arch.target))
 		return -ENOEXEC;
 
+	if (run->exit_reason == KVM_EXIT_MMIO) {
+		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+		if (ret)
+			return ret;
+	}
+
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 
@@ -607,7 +613,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_guest_enter();
 		vcpu->mode = IN_GUEST_MODE;
 
-		ret = __kvm_vcpu_run(vcpu);
+		smp_mb(); /* set mode before reading vcpu->arch.pause */
+		if (unlikely(vcpu->arch.pause)) {
+			/* This means ignore, try again. */
+			ret = ARM_EXCEPTION_IRQ;
+		} else {
+			ret = __kvm_vcpu_run(vcpu);
+		}
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->arch.last_pcpu = smp_processor_id();
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 9236d10..2670679 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -132,11 +132,459 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 	return reg_array + vcpu_reg_offsets[mode][reg_num];
 }
 
+/******************************************************************************
+ * Utility functions common for all emulation code
+ *****************************************************************************/
+
+/*
+ * This one accepts a matrix where the first element is the
+ * bits as they must be, and the second element is the bitmask.
+ */
+#define INSTR_NONE	-1
+static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
+{
+	int i;
+	u32 mask;
+
+	for (i = 0; i < table_entries; i++) {
+		mask = table[i][1];
+		if ((table[i][0] & mask) == (instr & mask))
+			return i;
+	}
+	return INSTR_NONE;
+}
+
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return 0;
 }
 
+
+/******************************************************************************
+ * Load-Store instruction emulation
+ *****************************************************************************/
+
+/*
+ * Must be ordered with LOADS first and WRITES afterwards
+ * for easy distinction when doing MMIO.
+ */
+#define NUM_LD_INSTR  9
+enum INSTR_LS_INDEXES {
+	INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
+	INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
+	INSTR_LS_LDRSH,
+	INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
+	INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
+	NUM_LS_INSTR
+};
+
+static u32 ls_instr[NUM_LS_INSTR][2] = {
+	{0x04700000, 0x0d700000}, /* LDRBT */
+	{0x04300000, 0x0d700000}, /* LDRT  */
+	{0x04100000, 0x0c500000}, /* LDR   */
+	{0x04500000, 0x0c500000}, /* LDRB  */
+	{0x000000d0, 0x0e1000f0}, /* LDRD  */
+	{0x01900090, 0x0ff000f0}, /* LDREX */
+	{0x001000b0, 0x0e1000f0}, /* LDRH  */
+	{0x001000d0, 0x0e1000f0}, /* LDRSB */
+	{0x001000f0, 0x0e1000f0}, /* LDRSH */
+	{0x04600000, 0x0d700000}, /* STRBT */
+	{0x04200000, 0x0d700000}, /* STRT  */
+	{0x04000000, 0x0c500000}, /* STR   */
+	{0x04400000, 0x0c500000}, /* STRB  */
+	{0x000000f0, 0x0e1000f0}, /* STRD  */
+	{0x01800090, 0x0ff000f0}, /* STREX */
+	{0x000000b0, 0x0e1000f0}  /* STRH  */
+};
+
+static inline int get_arm_ls_instr_index(u32 instr)
+{
+	return kvm_instr_index(instr, ls_instr, NUM_LS_INSTR);
+}
+
+/*
+ * Load-Store instruction decoding
+ */
+#define INSTR_LS_TYPE_BIT		26
+#define INSTR_LS_RD_MASK		0x0000f000
+#define INSTR_LS_RD_SHIFT		12
+#define INSTR_LS_RN_MASK		0x000f0000
+#define INSTR_LS_RN_SHIFT		16
+#define INSTR_LS_RM_MASK		0x0000000f
+#define INSTR_LS_OFFSET12_MASK		0x00000fff
+
+#define INSTR_LS_BIT_P			24
+#define INSTR_LS_BIT_U			23
+#define INSTR_LS_BIT_B			22
+#define INSTR_LS_BIT_W			21
+#define INSTR_LS_BIT_L			20
+#define INSTR_LS_BIT_S			 6
+#define INSTR_LS_BIT_H			 5
+
+/*
+ * ARM addressing mode defines
+ */
+#define OFFSET_IMM_MASK			0x0e000000
+#define OFFSET_IMM_VALUE		0x04000000
+#define OFFSET_REG_MASK			0x0e000ff0
+#define OFFSET_REG_VALUE		0x06000000
+#define OFFSET_SCALE_MASK		0x0e000010
+#define OFFSET_SCALE_VALUE		0x06000000
+
+#define SCALE_SHIFT_MASK		0x000000a0
+#define SCALE_SHIFT_SHIFT		5
+#define SCALE_SHIFT_LSL			0x0
+#define SCALE_SHIFT_LSR			0x1
+#define SCALE_SHIFT_ASR			0x2
+#define SCALE_SHIFT_ROR_RRX		0x3
+#define SCALE_SHIFT_IMM_MASK		0x00000f80
+#define SCALE_SHIFT_IMM_SHIFT		6
+
+#define PSR_BIT_C			29
+
+static unsigned long ls_word_calc_offset(struct kvm_vcpu *vcpu,
+					 unsigned long instr)
+{
+	int offset = 0;
+
+	if ((instr & OFFSET_IMM_MASK) == OFFSET_IMM_VALUE) {
+		/* Immediate offset/index */
+		offset = instr & INSTR_LS_OFFSET12_MASK;
+
+		if (!(instr & (1U << INSTR_LS_BIT_U)))
+			offset = -offset;
+	}
+
+	if ((instr & OFFSET_REG_MASK) == OFFSET_REG_VALUE) {
+		/* Register offset/index */
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		offset = *vcpu_reg(vcpu, rm);
+
+		if (!(instr & (1U << INSTR_LS_BIT_P)))
+			offset = 0;
+	}
+
+	if ((instr & OFFSET_SCALE_MASK) == OFFSET_SCALE_VALUE) {
+		/* Scaled register offset */
+		u8 rm = instr & INSTR_LS_RM_MASK;
+		u8 shift = (instr & SCALE_SHIFT_MASK) >> SCALE_SHIFT_SHIFT;
+		u32 shift_imm = (instr & SCALE_SHIFT_IMM_MASK)
+				>> SCALE_SHIFT_IMM_SHIFT;
+		offset = *vcpu_reg(vcpu, rm);
+
+		switch (shift) {
+		case SCALE_SHIFT_LSL:
+			offset = offset << shift_imm;
+			break;
+		case SCALE_SHIFT_LSR:
+			if (shift_imm == 0)
+				offset = 0;
+			else
+				offset = ((u32)offset) >> shift_imm;
+			break;
+		case SCALE_SHIFT_ASR:
+			if (shift_imm == 0) {
+				if (offset & (1U << 31))
+					offset = 0xffffffff;
+				else
+					offset = 0;
+			} else {
+				/* Ensure arithmetic shift */
+				asm("mov %[r], %[op], ASR %[s]" :
+				    [r] "=r" (offset) :
+				    [op] "r" (offset), [s] "r" (shift_imm));
+			}
+			break;
+		case SCALE_SHIFT_ROR_RRX:
+			if (shift_imm == 0) {
+				u32 C = (vcpu->arch.regs.cpsr &
+						(1U << PSR_BIT_C));
+				offset = (C << 31) | offset >> 1;
+			} else {
+				/* Ensure arithmetic shift */
+				asm("mov %[r], %[op], ASR %[s]" :
+				    [r] "=r" (offset) :
+				    [op] "r" (offset), [s] "r" (shift_imm));
+			}
+			break;
+		}
+
+		if (instr & (1U << INSTR_LS_BIT_U))
+			return offset;
+		else
+			return -offset;
+	}
+
+	if (instr & (1U << INSTR_LS_BIT_U))
+		return offset;
+	else
+		return -offset;
+
+	BUG();
+}
+
+static int kvm_ls_length(struct kvm_vcpu *vcpu, u32 instr)
+{
+	int index;
+
+	index = get_arm_ls_instr_index(instr);
+
+	if (instr & (1U << INSTR_LS_TYPE_BIT)) {
+		/* LS word or unsigned byte */
+		if (instr & (1U << INSTR_LS_BIT_B))
+			return sizeof(unsigned char);
+		else
+			return sizeof(u32);
+	} else {
+		/* LS halfword, doubleword or signed byte */
+		u32 H = (instr & (1U << INSTR_LS_BIT_H));
+		u32 S = (instr & (1U << INSTR_LS_BIT_S));
+		u32 L = (instr & (1U << INSTR_LS_BIT_L));
+
+		if (!L && S) {
+			kvm_err("WARNING: d-word for MMIO\n");
+			return 2 * sizeof(u32);
+		} else if (L && S && !H)
+			return sizeof(char);
+		else
+			return sizeof(u16);
+	}
+
+	BUG();
+}
+
+static bool kvm_decode_arm_ls(struct kvm_vcpu *vcpu, unsigned long instr,
+			      struct kvm_exit_mmio *mmio)
+{
+	int index;
+	bool is_write;
+	unsigned long rd, rn, offset, len;
+
+	index = get_arm_ls_instr_index(instr);
+	if (index == INSTR_NONE)
+		return false;
+
+	is_write = (index < NUM_LD_INSTR) ? false : true;
+	rd = (instr & INSTR_LS_RD_MASK) >> INSTR_LS_RD_SHIFT;
+	len = kvm_ls_length(vcpu, instr);
+
+	mmio->is_write = is_write;
+	mmio->len = len;
+
+	vcpu->arch.mmio.sign_extend = false;
+	vcpu->arch.mmio.rd = rd;
+
+	/* Handle base register writeback */
+	if (!(instr & (1U << INSTR_LS_BIT_P)) ||
+	     (instr & (1U << INSTR_LS_BIT_W))) {
+		rn = (instr & INSTR_LS_RN_MASK) >> INSTR_LS_RN_SHIFT;
+		offset = ls_word_calc_offset(vcpu, instr);
+		*vcpu_reg(vcpu, rn) += offset;
+	}
+
+	return true;
+}
+
+struct thumb_instr {
+	bool is32;
+
+	union {
+		struct {
+			u8 opcode;
+			u8 mask;
+		} t16;
+
+		struct {
+			u8 op1;
+			u8 op2;
+			u8 op2_mask;
+		} t32;
+	};
+
+	bool (*decode)(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+		       unsigned long instr, const struct thumb_instr *ti);
+};
+
+static bool decode_thumb_wb(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+			    unsigned long instr)
+{
+	bool P = (instr >> 10) & 1;
+	bool U = (instr >> 9) & 1;
+	u8 imm8 = instr & 0xff;
+	u32 offset_addr = vcpu->arch.hdfar;
+	u8 Rn = (instr >> 16) & 0xf;
+
+	vcpu->arch.mmio.rd = (instr >> 12) & 0xf;
+
+	if (Rn == 15)
+		return false;
+
+	/* Handle Writeback */
+	if (!P && U)
+		*vcpu_reg(vcpu, Rn) = offset_addr + imm8;
+	else if (!P && !U)
+		*vcpu_reg(vcpu, Rn) = offset_addr - imm8;
+	return true;
+}
+
+static bool decode_thumb_str(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+			     unsigned long instr, const struct thumb_instr *ti)
+{
+	u8 op1 = (instr >> (16 + 5)) & 0x7;
+	u8 op2 = (instr >> 6) & 0x3f;
+
+	mmio->is_write = true;
+	vcpu->arch.mmio.sign_extend = false;
+
+	switch (op1) {
+	case 0x0: mmio->len = 1; break;
+	case 0x1: mmio->len = 2; break;
+	case 0x2: mmio->len = 4; break;
+	default:
+		  return false; /* Only register write-back versions! */
+	}
+
+	if ((op2 & 0x24) == 0x24) {
+		/* STRB (immediate, thumb, W=1) */
+		return decode_thumb_wb(vcpu, mmio, instr);
+	}
+
+	return false;
+}
+
+static bool decode_thumb_ldr(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+			     unsigned long instr, const struct thumb_instr *ti)
+{
+	u8 op1 = (instr >> (16 + 7)) & 0x3;
+	u8 op2 = (instr >> 6) & 0x3f;
+
+	mmio->is_write = false;
+
+	switch (ti->t32.op2 & 0x7) {
+	case 0x1: mmio->len = 1; break;
+	case 0x3: mmio->len = 2; break;
+	case 0x5: mmio->len = 4; break;
+	}
+
+	if (op1 == 0x0)
+		vcpu->arch.mmio.sign_extend = false;
+	else if (op1 == 0x2 && (ti->t32.op2 & 0x7) != 0x5)
+		vcpu->arch.mmio.sign_extend = true;
+	else
+		return false; /* Only register write-back versions! */
+
+	if ((op2 & 0x24) == 0x24) {
+		/* LDR{S}X (immediate, thumb, W=1) */
+		return decode_thumb_wb(vcpu, mmio, instr);
+	}
+
+	return false;
+}
+
+/*
+ * We only support instruction decoding for valid reasonable MMIO operations
+ * where trapping them do not provide sufficient information in the HSR (no
+ * 16-bit Thumb instructions provide register writeback that we care about).
+ *
+ * The following instruciton types are NOT supported for MMIO operations
+ * despite the HSR not containing decode info:
+ *  - any Load/Store multiple
+ *  - any load/store exclusive
+ *  - any load/store dual
+ *  - anything with the PC as the dest register
+ */
+static const struct thumb_instr thumb_instr[] = {
+	/**************** 32-bit Thumb instructions **********************/
+	/* Store single data item:	Op1 == 11, Op2 == 000xxx0 */
+	{ .is32 = true,  .t32 = { 3, 0x00, 0x71}, decode_thumb_str	},
+	/* Load byte:			Op1 == 11, Op2 == 00xx001 */
+	{ .is32 = true,  .t32 = { 3, 0x01, 0x67}, decode_thumb_ldr	},
+	/* Load halfword:		Op1 == 11, Op2 == 00xx011 */
+	{ .is32 = true,  .t32 = { 3, 0x03, 0x67}, decode_thumb_ldr	},
+	/* Load word:			Op1 == 11, Op2 == 00xx101 */
+	{ .is32 = true,  .t32 = { 3, 0x05, 0x67}, decode_thumb_ldr	},
+};
+
+
+
+static bool kvm_decode_thumb_ls(struct kvm_vcpu *vcpu, unsigned long instr,
+				struct kvm_exit_mmio *mmio)
+{
+	bool is32 = is_wide_instruction(instr);
+	bool is16 = !is32;
+	struct thumb_instr tinstr; /* re-use to pass on already decoded info */
+	int i;
+
+	if (is16) {
+		tinstr.t16.opcode = (instr >> 10) & 0x3f;
+	} else {
+		tinstr.t32.op1 = (instr >> (16 + 11)) & 0x3;
+		tinstr.t32.op2 = (instr >> (16 + 4)) & 0x7f;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(thumb_instr); i++) {
+		const struct thumb_instr *ti = &thumb_instr[i];
+		if (ti->is32 != is32)
+			continue;
+
+		if (is16) {
+			if ((tinstr.t16.opcode & ti->t16.mask) != ti->t16.opcode)
+				continue;
+		} else {
+			if (ti->t32.op1 != tinstr.t32.op1)
+				continue;
+			if ((ti->t32.op2_mask & tinstr.t32.op2) != ti->t32.op2)
+				continue;
+		}
+
+		return ti->decode(vcpu, mmio, instr, &tinstr);
+	}
+
+	return false;
+}
+
+/**
+ * kvm_emulate_mmio_ls - emulates load/store instructions made to I/O memory
+ * @vcpu:	The vcpu pointer
+ * @fault_ipa:	The IPA that caused the 2nd stage fault
+ * @instr:	The instruction that caused the fault
+ *
+ * Handles emulation of load/store instructions which cannot be emulated through
+ * information found in the HSR on faults. It is necessary in this case to
+ * simply decode the offending instruction in software and determine the
+ * required operands.
+ */
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			unsigned long instr, struct kvm_exit_mmio *mmio)
+{
+	bool is_thumb;
+
+	trace_kvm_mmio_emulate(vcpu->arch.regs.pc, instr, vcpu->arch.regs.cpsr);
+
+	mmio->phys_addr = fault_ipa;
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	if (!is_thumb && !kvm_decode_arm_ls(vcpu, instr, mmio)) {
+		kvm_debug("Unable to decode inst: %#08lx (cpsr: %#08x (T=0) "
+			  "pc: %#08x)\n",
+			  instr, *vcpu_cpsr(vcpu), *vcpu_pc(vcpu));
+		kvm_inject_dabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	} else if (is_thumb && !kvm_decode_thumb_ls(vcpu, instr, mmio)) {
+		kvm_debug("Unable to decode inst: %#08lx (cpsr: %#08x (T=1) "
+			  "pc: %#08x)\n",
+			  instr, *vcpu_cpsr(vcpu), *vcpu_pc(vcpu));
+		kvm_inject_dabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	}
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	kvm_skip_instr(vcpu, is_wide_instruction(instr));
+	return 0;
+}
+
 /**
  * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
  * @vcpu:	The VCPU pointer
diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
index f39f823..843c3bc 100644
--- a/arch/arm/kvm/exports.c
+++ b/arch/arm/kvm/exports.c
@@ -34,5 +34,6 @@ EXPORT_SYMBOL_GPL(__kvm_vcpu_run);
 
 EXPORT_SYMBOL_GPL(__kvm_flush_vm_context);
 EXPORT_SYMBOL_GPL(__kvm_tlb_flush_vmid);
+EXPORT_SYMBOL_GPL(__kvm_va_to_pa);
 
 EXPORT_SYMBOL_GPL(smp_send_reschedule);
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index cc9448b..ab78477 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -128,6 +128,7 @@ ENDPROC(__kvm_flush_vm_context)
 	VFPFMXR FPEXC, r2	@ FPEXC	(last, in case !EN)
 .endm
 
+
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor world-switch code
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@ -520,6 +521,45 @@ after_vfp_restore:
 	bx	lr			@ return to IOCTL
 
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+@  Translate VA to PA
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+
+@ Arguments:
+@  r0: pointer to vcpu struct
+@  r1: virtual address to map (rounded to page)
+@  r2: 1 = PL1 (privileged) read mapping, 0 = PL0 (user) read mapping.
+@ Returns 64 bit PAR value.
+ENTRY(__kvm_va_to_pa)
+	hvc	#0			@ switch to hyp-mode
+
+	push	{r4-r12}
+
+	@ Fold flag into r1, easier than using stack.
+	cmp	r2, #0
+	movne	r2, #1
+	orr	r1, r1, r2
+
+	@ This swaps too many registers, but we're in the slow path anyway.
+	read_cp15_state
+	write_cp15_state 1, r0
+
+	ands	r2, r1, #1
+	bic	r1, r1, r2
+	mcrne	p15, 0, r1, c7, c8, 0	@ VA to PA, ATS1CPR
+	mcreq	p15, 0, r1, c7, c8, 2	@ VA to PA, ATS1CUR
+	isb
+
+	@ Restore host state.
+	read_cp15_state 1, r0
+	write_cp15_state
+
+	mrrc	p15, 0, r0, r1, c7	@ PAR
+	pop	{r4-r12}
+	hvc	#0			@ Back to SVC
+	bx	lr
+
+
+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @  Hypervisor exception vector and handlers
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 52cc280..dc760bb 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -19,6 +19,7 @@
 #include <linux/mman.h>
 #include <linux/kvm_host.h>
 #include <linux/io.h>
+#include <trace/events/kvm.h>
 #include <asm/idmap.h>
 #include <asm/pgalloc.h>
 #include <asm/cacheflush.h>
@@ -589,6 +590,266 @@ out_put_existing:
 }
 
 /**
+ * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
+ * @vcpu: The VCPU pointer
+ * @run:  The VCPU run struct containing the mmio data
+ *
+ * This should only be called after returning from userspace for MMIO load
+ * emulation.
+ */
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	int *dest;
+	unsigned int len;
+	int mask;
+
+	if (!run->mmio.is_write) {
+		dest = vcpu_reg(vcpu, vcpu->arch.mmio.rd);
+		memset(dest, 0, sizeof(int));
+
+		len = run->mmio.len;
+		if (len > 4)
+			return -EINVAL;
+
+		memcpy(dest, run->mmio.data, len);
+
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
+				*((u64 *)run->mmio.data));
+
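+		/*
+		 * Sign-extend narrow loads: XOR with the sign-bit mask and
+		 * subtract it, which replicates the loaded value's top bit
+		 * into the upper bits of the destination register.
+		 */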
+		if (vcpu->arch.mmio.sign_extend && len < 4) {
+			mask = 1U << ((len * 8) - 1);
+			*dest = (*dest ^ mask) - mask;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * copy_from_guest_va - copy memory from guest (very slow!)
+ * @vcpu:	vcpu pointer
+ * @dest:	memory to copy into
+ * @gva:	virtual address in guest to copy from
+ * @len:	length to copy
+ * @priv:	use guest PL1 (i.e. kernel) mappings if true,
+ *              otherwise use guest PL0 mappings.
+ *
+ * Returns true on success, false on failure (unlikely, but retry).
+ */
+static bool copy_from_guest_va(struct kvm_vcpu *vcpu,
+			       void *dest, unsigned long gva, size_t len,
+			       bool priv)
+{
+	u64 par;
+	phys_addr_t pc_ipa;
+	int err;
+
+	BUG_ON((gva & PAGE_MASK) != ((gva + len) & PAGE_MASK));
+	par = __kvm_va_to_pa(vcpu, gva & PAGE_MASK, priv);
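+	/* PAR bit 0 (F) set means the address translation aborted. */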
+	if (par & 1) {
+		kvm_err("IO abort from invalid instruction address"
+			" %#lx!\n", gva);
+		return false;
+	}
+
+	BUG_ON(!(par & (1U << 11)));
+	pc_ipa = par & PAGE_MASK & ((1ULL << 32) - 1);
+	pc_ipa += gva & ~PAGE_MASK;
+
+
+	err = kvm_read_guest(vcpu->kvm, pc_ipa, dest, len);
+	if (unlikely(err))
+		return false;
+
+	return true;
+}
+
+/* Just ensure we're not running the guest. */
+static void do_nothing(void *info)
+{
+}
+
+/*
+ * We have to be very careful copying memory from a running (i.e. SMP) guest.
+ * Another CPU may remap the page (e.g. swap out a userspace text page) as we
+ * read the instruction.  Unlike normal hardware operation, emulating an
+ * instruction requires us to map the virtual address to a physical address
+ * and then read that memory as two separate, non-atomic steps.
+ *
+ * Fortunately this is so rare (we don't usually need the instruction) that
+ * we can go very slowly and no one will mind.
+ */
+static bool copy_current_insn(struct kvm_vcpu *vcpu, unsigned long *instr)
+{
+	int i;
+	bool ret;
+	struct kvm_vcpu *v;
+	bool is_thumb;
+	size_t instr_len;
+
+	/* Don't cross with IPIs in kvm_main.c */
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/* Tell them all to pause, so no more will enter guest. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		v->arch.pause = true;
+
+	/* Set ->pause before we read ->mode */
+	smp_mb();
+
+	/* Kick out any which are still running. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm) {
+		/* Guest could exit now, making cpu wrong. That's OK. */
+		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE)
+			smp_call_function_single(v->cpu, do_nothing, NULL, 1);
+	}
+
+
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	instr_len = (is_thumb) ? 2 : 4;
+
+	BUG_ON(!is_thumb && vcpu->arch.regs.pc & 0x3);
+
+	/* Now guest isn't running, we can va->pa map and copy atomically. */
+	ret = copy_from_guest_va(vcpu, instr, vcpu->arch.regs.pc, instr_len,
+				 vcpu_mode_priv(vcpu));
+	if (!ret)
+		goto out;
+
+	/* A 32-bit thumb2 instruction can actually go over a page boundary! */
+	if (is_thumb && is_wide_instruction(*instr)) {
+		*instr = *instr << 16;
+		ret = copy_from_guest_va(vcpu, instr, vcpu->arch.regs.pc + 2, 2,
+					 vcpu_mode_priv(vcpu));
+	}
+
+out:
+	/* Release them all. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		v->arch.pause = false;
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return ret;
+}
+
+/**
+ * invalid_io_mem_abort -- Handle I/O aborts when the HSR ISV bit is clear
+ *
+ * @vcpu:      The vcpu pointer
+ * @fault_ipa: The IPA that caused the 2nd stage fault
+ * @mmio:      Pointer to struct to hold decode information
+ *
+ * Some load/store instructions cannot be emulated using the information
+ * presented in the HSR, for instance, register write-back instructions are not
+ * supported. We therefore need to fetch the instruction, decode it, and then
+ * emulate its behavior.
+ */
+static int invalid_io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+				struct kvm_exit_mmio *mmio)
+{
+	unsigned long instr = 0;
+
+	/* If it fails (SMP race?), we reenter guest for it to retry. */
+	if (!copy_current_insn(vcpu, &instr))
+		return 1;
+
+	return kvm_emulate_mmio_ls(vcpu, fault_ipa, instr, mmio);
+}
+
+static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+		      struct kvm_exit_mmio *mmio)
+{
+	unsigned long rd, len;
+	bool is_write, sign_extend;
+
+	if ((vcpu->arch.hsr >> 8) & 1) {
+		/* cache operation on I/O addr, tell guest unsupported */
+		kvm_inject_dabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	}
+
+	if ((vcpu->arch.hsr >> 7) & 1) {
+		/* page table accesses IO mem: tell guest to fix its TTBR */
+		kvm_inject_dabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	}
+
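+	/* HSR.SAS (bits 23:22) encodes the size of the faulting access. */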
+	switch ((vcpu->arch.hsr >> 22) & 0x3) {
+	case 0:
+		len = 1;
+		break;
+	case 1:
+		len = 2;
+		break;
+	case 2:
+		len = 4;
+		break;
+	default:
+		kvm_err("Hardware is weird: SAS 0b11 is reserved\n");
+		return -EFAULT;
+	}
+
+	is_write = vcpu->arch.hsr & HSR_WNR;
+	sign_extend = vcpu->arch.hsr & HSR_SSE;
+	rd = (vcpu->arch.hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
+
+	if (rd == 15) {
+		/* IO memory trying to read/write pc */
+		kvm_inject_pabt(vcpu, vcpu->arch.hdfar);
+		return 1;
+	}
+
+	mmio->is_write = is_write;
+	mmio->phys_addr = fault_ipa;
+	mmio->len = len;
+	vcpu->arch.mmio.sign_extend = sign_extend;
+	vcpu->arch.mmio.rd = rd;
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
+	return 0;
+}
+
+static int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
+{
+	struct kvm_exit_mmio mmio;
+	unsigned long rd;
+	int ret;
+
+	/*
+	 * Prepare MMIO operation. First stash it in a private
+	 * structure that we can use for in-kernel emulation. If the
+	 * kernel can't handle it, copy it into run->mmio and let user
+	 * space do its magic.
+	 */
+
+	if (vcpu->arch.hsr & HSR_ISV)
+		ret = decode_hsr(vcpu, fault_ipa, &mmio);
+	else
+		ret = invalid_io_mem_abort(vcpu, fault_ipa, &mmio);
+
+	if (ret != 0)
+		return ret;
+
+	rd = vcpu->arch.mmio.rd;
+	trace_kvm_mmio((mmio.is_write) ? KVM_TRACE_MMIO_WRITE :
+					 KVM_TRACE_MMIO_READ_UNSATISFIED,
+			mmio.len, fault_ipa,
+			(mmio.is_write) ? *vcpu_reg(vcpu, rd) : 0);
+
+	if (mmio.is_write)
+		memcpy(mmio.data, vcpu_reg(vcpu, rd), mmio.len);
+
+	kvm_prepare_mmio(run, &mmio);
+	return 0;
+}
+
+/**
  * kvm_handle_guest_abort - handles all 2nd stage aborts
  * @vcpu:	the VCPU pointer
  * @run:	the kvm_run structure
@@ -633,8 +894,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			return 1;
 		}
 
-		kvm_pr_unimpl("I/O address abort...");
-		return 0;
+		/* Adjust page offset */
+		fault_ipa |= vcpu->arch.hdfar & ~PAGE_MASK;
+		return io_mem_abort(vcpu, run, fault_ipa, memslot);
 	}
 
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 40606c9..7199b58 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -92,6 +92,27 @@ TRACE_EVENT(kvm_irq_line,
 		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
 );
 
+TRACE_EVENT(kvm_mmio_emulate,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
+		 unsigned long cpsr),
+	TP_ARGS(vcpu_pc, instr, cpsr),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+		__field(	unsigned long,	instr		)
+		__field(	unsigned long,	cpsr		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+		__entry->instr			= instr;
+		__entry->cpsr			= cpsr;
+	),
+
+	TP_printk("Emulate MMIO at: 0x%08lx (instr: %08lx, cpsr: %08lx)",
+		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
+);
+
 /* Architecturally implementation defined CP15 register access */
 TRACE_EVENT(kvm_emulate_cp15_imp,
 	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* [PATCH 15/15] KVM: ARM: Guest wait-for-interrupts (WFI) support
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-15 15:36   ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-15 15:36 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

From: Christoffer Dall <cdall@cs.columbia.edu>

When the guest executes a WFI instruction the operation is trapped to
KVM, which emulates the instruction in software. There is no correlation
between a guest executing a WFI instruction and actually putting the
hardware into a low-power mode, since a KVM guest is essentially a
process and the WFI instruction can be seen as a 'sleep' call from this
process. Therefore, we block the vcpu when the guest executes a wfi
instruction and the IRQ or FIQ lines are not raised.

When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we
signal the VCPU thread and clear the wait-for-interrupts state so the VCPU
stops blocking.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/kvm/arm.c     |   10 ++++++++--
 arch/arm/kvm/emulate.c |   13 ++++++++++++-
 arch/arm/kvm/trace.h   |   16 ++++++++++++++++
 3 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 06a3368..64fbec7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -318,9 +318,16 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+/**
+ * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled
+ * @v:		The VCPU pointer
+ *
+ * If the guest CPU is not waiting for interrupts or an interrupt line is
+ * asserted, the CPU is by definition runnable.
+ */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return 0;
+	return !!v->arch.irq_lines;
 }
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
@@ -586,7 +593,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * Check conditions before entering the guest
 		 */
 		cond_resched();
-
 		update_vttbr(vcpu->kvm);
 
 		local_irq_disable();
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 2670679..fc0fcd3 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -154,9 +154,20 @@ static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
 	return INSTR_NONE;
 }
 
+/**
+ * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * @vcpu:	the vcpu pointer
+ * @run:	the kvm_run structure pointer
+ *
+ * Simply sets the wait_for_interrupts flag on the vcpu structure, which will
+ * halt execution of world-switches and schedule other host processes until
+ * there is an incoming IRQ or FIQ to the VM.
+ */
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return 0;
+	trace_kvm_wfi(vcpu->arch.regs.pc);
+	kvm_vcpu_block(vcpu);
+	return 1;
 }
 
 
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 7199b58..b371138 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -143,6 +143,22 @@ TRACE_EVENT(kvm_emulate_cp15_imp,
 			__entry->CRm, __entry->Op2)
 );
 
+TRACE_EVENT(kvm_wfi,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("guest executed wfi at: 0x%08lx", __entry->vcpu_pc)
+);
+
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH


^ permalink raw reply related	[flat|nested] 164+ messages in thread

* Re: [PATCH 00/15] KVM/ARM Implementation
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-18 12:21   ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-18 12:21 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Hi Christoffer,

On Sat, Sep 15, 2012 at 04:34:29PM +0100, Christoffer Dall wrote:
> The following series implements KVM support for ARM processors,
> specifically on the Cortex A-15 platform.  We feel this is ready to be
> merged.
> 
> Work is done in collaboration between Columbia University, Virtual Open
> Systems and ARM/Linaro.
> 
> The patch series applies to Linux 3.6-rc5 with a number of merges:
>  1. git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git
>         branch: io (fc8a08c3a3a)

This is a rebasing branch. If you want to merge it, I can push a separate
stable branch for you if you like? That said, I imagine the only patch you
want from there is the non-writeback I/O accessors, so I could just stick
that somewhere else until it hits mainline.

Let me know,

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 01/15] ARM: add mem_type prot_pte accessor
  2012-09-15 15:34   ` Christoffer Dall
@ 2012-09-18 12:23     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-18 12:23 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 04:34:36PM +0100, Christoffer Dall wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>
> 
> The KVM hypervisor mmu code requires access to the mem_type prot_pte
> field when setting up page tables pointing to a device. Unfortunately,
> the mem_type structure is opaque.
> 
> Add an accessor (get_mem_type_prot_pte()) to retrieve the prot_pte
> value.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/mach/map.h |    1 +
>  arch/arm/mm/mmu.c               |    6 ++++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
> index a6efcdd..3787c9f 100644
> --- a/arch/arm/include/asm/mach/map.h
> +++ b/arch/arm/include/asm/mach/map.h
> @@ -37,6 +37,7 @@ extern void iotable_init(struct map_desc *, int);
>  
>  struct mem_type;
>  extern const struct mem_type *get_mem_type(unsigned int type);
> +extern pteval_t get_mem_type_prot_pte(unsigned int type);
>  /*
>   * external interface to remap single page with appropriate type
>   */
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 4c2d045..76bf4f5 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -301,6 +301,12 @@ const struct mem_type *get_mem_type(unsigned int type)
>  }
>  EXPORT_SYMBOL(get_mem_type);
>  
> +pteval_t get_mem_type_prot_pte(unsigned int type)
> +{
> +	return get_mem_type(type)->prot_pte;
> +}
> +EXPORT_SYMBOL(get_mem_type_prot_pte);
> +

get_mem_type can return NULL, so you should probably pass the error through
rather than dereferencing it.
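
Something along these lines, perhaps (untested sketch only; the 0 return for
the failure case is just a placeholder, pick whatever error convention fits):

	pteval_t get_mem_type_prot_pte(unsigned int type)
	{
		const struct mem_type *mt = get_mem_type(type);

		return mt ? mt->prot_pte : 0;
	}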

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 00/15] KVM/ARM Implementation
  2012-09-18 12:21   ` Will Deacon
@ 2012-09-18 12:32     ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-18 12:32 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 18, 2012 at 8:21 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Christoffer,
>
> On Sat, Sep 15, 2012 at 04:34:29PM +0100, Christoffer Dall wrote:
>> The following series implements KVM support for ARM processors,
>> specifically on the Cortex A-15 platform.  We feel this is ready to be
>> merged.
>>
>> Work is done in collaboration between Columbia University, Virtual Open
>> Systems and ARM/Linaro.
>>
>> The patch series applies to Linux 3.6-rc5 with a number of merges:
>>  1. git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git
>>         branch: io (fc8a08c3a3a)
>
> This is a rebasing branch. If you want to merge it, I can push a separate
> stable branch for you if you like? That said, I imagine the only patch you
> want from there is the non-writeback I/O accessors, so I could just stick
> that somewhere else until it hits mainline.
>
> Let me know,
>
Hi Will,

I actually merged this branch so Marc could merge from my branches, so
I included it in the base for this patch series as well, to minimize
the diff between the HEAD of kvm-arm-v11-vgic-times and kvm-arm-master
at time of sending these patches out. You don't have to prepare the
patch separately, as the kernel boots just fine without it now when we
have thumb mmio instruction decoding, albeit somewhat slower.

Thanks!
-Christoffer

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 02/15] ARM: Add page table and page defines needed by KVM
  2012-09-15 15:34   ` Christoffer Dall
@ 2012-09-18 12:47     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-18 12:47 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 04:34:43PM +0100, Christoffer Dall wrote:
> KVM uses the stage-2 page tables and the Hyp page table format,
> so let's define the fields we need to access in KVM.
> 
> We use pgprot_guest to indicate stage-2 entries.
> 
> Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/pgtable-3level.h |   13 +++++++++++++
>  arch/arm/include/asm/pgtable.h        |    5 +++++
>  arch/arm/mm/mmu.c                     |    3 +++
>  3 files changed, 21 insertions(+)
> 
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index b249035..7351eee 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -102,11 +102,24 @@
>   */
>  #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
>  
> +/*
> + * 2-nd stage PTE definitions for LPAE.
> + */

Minor nit: 2nd

> +#define L_PTE2_SHARED		L_PTE_SHARED
> +#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
> +#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */

This is actually HAP[2:1], not HAP[1:0]. Also, can you follow what we do for
stage 1 translation and name these RDONLY and WRONLY (do you even use
that?).

> +#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
> +#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */

Another minor nit: PTE2 looks awful. Maybe L_PTE_HYP_* instead?

>  #ifndef __ASSEMBLY__
>  
>  #define pud_none(pud)		(!pud_val(pud))
>  #define pud_bad(pud)		(!(pud_val(pud) & 2))
>  #define pud_present(pud)	(pud_val(pud))
> +#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> +						 PMD_TYPE_TABLE)
> +#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
> +						 PMD_TYPE_SECT)
>  
>  #define pud_clear(pudp)			\
>  	do {				\
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index 41dc31f..c422f62 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -70,6 +70,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
>  
>  extern pgprot_t		pgprot_user;
>  extern pgprot_t		pgprot_kernel;
> +extern pgprot_t		pgprot_guest;
>  
>  #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
>  
> @@ -82,6 +83,10 @@ extern pgprot_t		pgprot_kernel;
>  #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
>  #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
>  #define PAGE_KERNEL_EXEC	pgprot_kernel
> +#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)

Just define L_PTE_HYP to L_PTE_USER, otherwise that's confusing.

> +#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
> +					  L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
> +					  L_PTE2_SHARED)

It would be cleaner to separate the cacheability attributes out from here
and into the cache_policies array. Then you just need L_PTE_HYP_RDONLY here.

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 03/15] ARM: Section based HYP idmap
  2012-09-15 15:34   ` Christoffer Dall
@ 2012-09-18 13:00     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-18 13:00 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 04:34:49PM +0100, Christoffer Dall wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>
> 
> Add a HYP pgd to the core code (so it can benefit all Linux
> hypervisors).
> 
> Populate this pgd with an identity mapping of the code contained
> in the .hyp.idmap.text section
> 
> Offer a method to drop the this identity mapping through
> hyp_idmap_teardown and re-create it through hyp_idmap_setup.
> 
> Make all the above depend on CONFIG_ARM_VIRT_EXT
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

I didn't get CC'd on this -- please check your git send-email configuration.

> ---
>  arch/arm/include/asm/idmap.h                |    7 ++
>  arch/arm/include/asm/pgtable-3level-hwdef.h |    1 
>  arch/arm/kernel/vmlinux.lds.S               |    6 ++
>  arch/arm/mm/idmap.c                         |   88 +++++++++++++++++++++++----
>  4 files changed, 89 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
> index bf863ed..a1ab8d6 100644
> --- a/arch/arm/include/asm/idmap.h
> +++ b/arch/arm/include/asm/idmap.h
> @@ -11,4 +11,11 @@ extern pgd_t *idmap_pgd;
>  
>  void setup_mm_for_reboot(void);
>  
> +#ifdef CONFIG_ARM_VIRT_EXT
> +extern pgd_t *hyp_pgd;
> +
> +void hyp_idmap_teardown(void);
> +void hyp_idmap_setup(void);
> +#endif
> +
>  #endif	/* __ASM_IDMAP_H */
> diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
> index d795282..a2d404e 100644
> --- a/arch/arm/include/asm/pgtable-3level-hwdef.h
> +++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
> @@ -44,6 +44,7 @@
>  #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
>  #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
>  #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
> +#define PMD_SECT_AP1		(_AT(pmdval_t, 1) << 6)
>  #define PMD_SECT_TEX(x)		(_AT(pmdval_t, 0))

If we're making an idmap for the hyp, shouldn't this just follow the
L_PTE_HYP_RDONLY bit (L_PTE2_READ?) from the previous patch except with PMD
in the name (the bit is identical)?

> diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
> index ab88ed4..7a944af 100644
> --- a/arch/arm/mm/idmap.c
> +++ b/arch/arm/mm/idmap.c
> @@ -1,4 +1,6 @@
> +#include <linux/module.h>
>  #include <linux/kernel.h>
> +#include <linux/slab.h>
>  
>  #include <asm/cputype.h>
>  #include <asm/idmap.h>
> @@ -59,11 +61,20 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>  	} while (pud++, addr = next, addr != end);
>  }
>  
> -static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
> +static void identity_mapping_add(pgd_t *pgd, const char *text_start,
> +				 const char *text_end, unsigned long prot)
>  {
> -	unsigned long prot, next;
> +	unsigned long addr, end;
> +	unsigned long next;
> +
> +	addr = virt_to_phys(text_start);
> +	end = virt_to_phys(text_end);
> +
> +	pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
> +		prot ? "HYP " : "",
> +		(long long)addr, (long long)end);
> +	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
>  
> -	prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
>  	if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
>  		prot |= PMD_BIT4;
>  
> @@ -78,24 +89,77 @@ extern char  __idmap_text_start[], __idmap_text_end[];
>  
>  static int __init init_static_idmap(void)
>  {
> -	phys_addr_t idmap_start, idmap_end;
> -
>  	idmap_pgd = pgd_alloc(&init_mm);
>  	if (!idmap_pgd)
>  		return -ENOMEM;
>  
> -	/* Add an identity mapping for the physical address of the section. */
> -	idmap_start = virt_to_phys((void *)__idmap_text_start);
> -	idmap_end = virt_to_phys((void *)__idmap_text_end);
> -
> -	pr_info("Setting up static identity map for 0x%llx - 0x%llx\n",
> -		(long long)idmap_start, (long long)idmap_end);
> -	identity_mapping_add(idmap_pgd, idmap_start, idmap_end);
> +	identity_mapping_add(idmap_pgd, __idmap_text_start,
> +			     __idmap_text_end, 0);
>  
>  	return 0;
>  }
>  early_initcall(init_static_idmap);

I think it would be cleaner just to have a separate initcall for the hyp pgd
initialisation, leaving the print like it is. That way we also don't have to
infer the exception level from the prot, which is pretty fragile.

> +#ifdef CONFIG_ARM_VIRT_EXT
> +pgd_t *hyp_pgd;
> +EXPORT_SYMBOL_GPL(hyp_pgd);
> +
> +static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
> +{
> +	pud_t *pud;
> +	pmd_t *pmd;
> +
> +	pud = pud_offset(pgd, addr);
> +	pmd = pmd_offset(pud, addr);
> +	pud_clear(pud);
> +	clean_pmd_entry(pmd);
> +	pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
> +}
> +
> +extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
> +
> +/*
> + * This version actually frees the underlying pmds for all pgds in range and
> + * clear the pgds themselves afterwards.
> + */
> +void hyp_idmap_teardown(void)
> +{
> +	unsigned long addr, end;
> +	unsigned long next;
> +	pgd_t *pgd = hyp_pgd;
> +
> +	addr = virt_to_phys(__hyp_idmap_text_start);
> +	end = virt_to_phys(__hyp_idmap_text_end);
> +
> +	pgd += pgd_index(addr);
> +	do {
> +		next = pgd_addr_end(addr, end);
> +		if (!pgd_none_or_clear_bad(pgd))
> +			hyp_idmap_del_pmd(pgd, addr);
> +	} while (pgd++, addr = next, addr < end);
> +}
> +EXPORT_SYMBOL_GPL(hyp_idmap_teardown);
> +
> +void hyp_idmap_setup(void)
> +{
> +	identity_mapping_add(hyp_pgd, __hyp_idmap_text_start,
> +			     __hyp_idmap_text_end, PMD_SECT_AP1);
> +}
> +EXPORT_SYMBOL_GPL(hyp_idmap_setup);
> +
> +static int __init hyp_init_static_idmap(void)
> +{
> +	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
> +	if (!hyp_pgd)
> +		return -ENOMEM;
> +
> +	hyp_idmap_setup();
> +
> +	return 0;
> +}
> +early_initcall(hyp_init_static_idmap);
> +#endif

I'd rather the alloc/free functions for the hyp pgd were somewhere else,
like they are for standard pgds. Then we can just call them here without
having to encode knowledge of PGD size etc in the mapping code.
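
Something like this, purely as a sketch (names made up), so the idmap code
only ever sees an opaque allocator:

	pgd_t *hyp_pgd_alloc(void)
	{
		return kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
	}

	void hyp_pgd_free(pgd_t *hyp_pgd)
	{
		kfree(hyp_pgd);
	}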

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 04/15] ARM: idmap: only initialize HYP idmap when HYP mode is available
  2012-09-15 15:34   ` Christoffer Dall
@ 2012-09-18 13:03     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-18 13:03 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 04:34:55PM +0100, Christoffer Dall wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/mm/idmap.c |    4 ++++
>  1 file changed, 4 insertions(+)

Just fold this into the previous patch.

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 05/15] ARM: Expose PMNC bitfields for KVM use
  2012-09-15 15:35   ` Christoffer Dall
@ 2012-09-18 13:08     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-18 13:08 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 04:35:02PM +0100, Christoffer Dall wrote:
> From: Rusty Russell <rusty.russell@linaro.org>
> 
> We want some of these for use in KVM, so pull them out of
> arch/arm/kernel/perf_event_v7.c into their own asm/perf_bits.h.
> 
> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/perf_bits.h |   56 ++++++++++++++++++++++++++++++++++++++
>  arch/arm/kernel/perf_event_v7.c  |   51 +----------------------------------
>  2 files changed, 57 insertions(+), 50 deletions(-)
>  create mode 100644 arch/arm/include/asm/perf_bits.h

I don't like this I'm afraid. These bit definitions, although useful for
kvm, are only applicable to ARMv7 PMUs. Perf does a reasonable job of
separating the low-level CPU-specific code and adding the v7 definitions
into their own global header feels like a step backwards. I also want to
move a load of this into drivers/ at some point and this won't help with
that effort.

Is KVM just using this for world switch? If so, why does it care about the
bit definitions (and what do you do for things like debug regs)? Is there
anything I could add to perf that you could call instead?

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 02/15] ARM: Add page table and page defines needed by KVM
  2012-09-18 12:47     ` Will Deacon
@ 2012-09-18 14:06       ` Catalin Marinas
  -1 siblings, 0 replies; 164+ messages in thread
From: Catalin Marinas @ 2012-09-18 14:06 UTC (permalink / raw)
  To: Will Deacon; +Cc: Christoffer Dall, linux-arm-kernel, kvm, kvmarm

On 18 September 2012 13:47, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:34:43PM +0100, Christoffer Dall wrote:
>> +#define L_PTE2_SHARED                L_PTE_SHARED
>> +#define L_PTE2_READ          (_AT(pteval_t, 1) << 6) /* HAP[0] */
>> +#define L_PTE2_WRITE         (_AT(pteval_t, 1) << 7) /* HAP[1] */
>
> This is actually HAP[2:1], not HAP[1:0]. Also, can you follow what we do for
> stage 1 translation and name these RDONLY and WRONLY (do you even use
> that?).

We can't use RDONLY as this would have value 0 as the HAP attributes
(stage 2 overriding stage 1 translation attributes). Unless you add 4
definitions like NOACCESS, RDONLY, WRONLY and RDWR to cover all the
bit combinations.
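
Sketch of what I mean (names made up; the field sits in bits [7:6] of the
stage 2 descriptor, matching the HAP bits quoted above):

	#define L_PTE_S2_NOACCESS	(_AT(pteval_t, 0) << 6)
	#define L_PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)
	#define L_PTE_S2_WRONLY		(_AT(pteval_t, 2) << 6)
	#define L_PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)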

>> +#define L_PTE2_NORM_WB               (_AT(pteval_t, 3) << 4) /* MemAttr[3:2] */
>> +#define L_PTE2_INNER_WB              (_AT(pteval_t, 3) << 2) /* MemAttr[1:0] */
>
> Another minor nit: PTE2 looks awful. Maybe L_PTE_HYP_* instead?

L_PTE_HYP may be confused with the Stage 1 Hyp translation which is
different from the guest Stage 2.

But I have another minor nit - just write them in the ascending bit
order as other definitions in this file.

-- 
Catalin

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 02/15] ARM: Add page table and page defines needed by KVM
  2012-09-18 14:06       ` Catalin Marinas
@ 2012-09-18 15:05         ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-18 15:05 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Will Deacon, linux-arm-kernel, kvm, kvmarm

On Tue, Sep 18, 2012 at 10:06 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On 18 September 2012 13:47, Will Deacon <will.deacon@arm.com> wrote:
>> On Sat, Sep 15, 2012 at 04:34:43PM +0100, Christoffer Dall wrote:
>>> +#define L_PTE2_SHARED                L_PTE_SHARED
>>> +#define L_PTE2_READ          (_AT(pteval_t, 1) << 6) /* HAP[0] */
>>> +#define L_PTE2_WRITE         (_AT(pteval_t, 1) << 7) /* HAP[1] */
>>
>> This is actually HAP[2:1], not HAP[1:0]. Also, can you follow what we do for
>> stage 1 translation and name these RDONLY and WRONLY (do you even use
>> that?).
>
> We can't use RDONLY as this would have value 0 as the HAP attributes
> (stage 2 overriding stage 1 translation attributes). Unless you add 4
> definitions like NOACCESS, RDONLY, WRONLY and RDWR to cover all the
> bit combinations.
>
>>> +#define L_PTE2_NORM_WB               (_AT(pteval_t, 3) << 4) /* MemAttr[3:2] */
>>> +#define L_PTE2_INNER_WB              (_AT(pteval_t, 3) << 2) /* MemAttr[1:0] */
>>
>> Another minor nit: PTE2 looks awful. Maybe L_PTE_HYP_* instead?
>
> L_PTE_HYP may be confused with the Stage 1 Hyp translation which is
> different from the guest Stage 2.

exactly, it's misleading. How about L_PTE_STAGE2? A little verbose,
but clear...?

>
> But I have another minor nit - just write them in the ascending bit
> order as other definitions in this file.
>

ok, will fix.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 02/15] ARM: Add page table and page defines needed by KVM
  2012-09-18 15:05         ` Christoffer Dall
@ 2012-09-18 15:07           ` Catalin Marinas
  -1 siblings, 0 replies; 164+ messages in thread
From: Catalin Marinas @ 2012-09-18 15:07 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Will Deacon, linux-arm-kernel, kvm, kvmarm

On Tue, Sep 18, 2012 at 04:05:13PM +0100, Christoffer Dall wrote:
> On Tue, Sep 18, 2012 at 10:06 AM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On 18 September 2012 13:47, Will Deacon <will.deacon@arm.com> wrote:
> >> On Sat, Sep 15, 2012 at 04:34:43PM +0100, Christoffer Dall wrote:
> >>> +#define L_PTE2_SHARED                L_PTE_SHARED
> >>> +#define L_PTE2_READ          (_AT(pteval_t, 1) << 6) /* HAP[0] */
> >>> +#define L_PTE2_WRITE         (_AT(pteval_t, 1) << 7) /* HAP[1] */
> >>
> >> This is actually HAP[2:1], not HAP[1:0]. Also, can you follow what we do for
> >> stage 1 translation and name these RDONLY and WRONLY (do you even use
> >> that?).
> >
> > We can't use RDONLY as this would have value 0 as the HAP attributes
> > (stage 2 overriding stage 1 translation attributes). Unless you add 4
> > definitions like NOACCESS, RDONLY, WRONLY and RDWR to cover all the
> > bit combinations.
> >
> >>> +#define L_PTE2_NORM_WB               (_AT(pteval_t, 3) << 4) /* MemAttr[3:2] */
> >>> +#define L_PTE2_INNER_WB              (_AT(pteval_t, 3) << 2) /* MemAttr[1:0] */
> >>
> >> Another minor nit: PTE2 looks awful. Maybe L_PTE_HYP_* instead?
> >
> > L_PTE_HYP may be confused with the Stage 1 Hyp translation which is
> > different from the guest Stage 2.
> 
> exactly, it's misleading, how about L_PTE_STAGE2, a little verbose,
> but clear...?

I don't mind any of them (apart from L_PTE_HYP_, which would be confusing)
for stage 2. You could just use S2 to make it shorter.

-- 
Catalin

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 02/15] ARM: Add page table and page defines needed by KVM
  2012-09-18 15:07           ` Catalin Marinas
@ 2012-09-18 15:10             ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-18 15:10 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Will Deacon, linux-arm-kernel, kvm, kvmarm

On Tue, Sep 18, 2012 at 11:07 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Tue, Sep 18, 2012 at 04:05:13PM +0100, Christoffer Dall wrote:
>> On Tue, Sep 18, 2012 at 10:06 AM, Catalin Marinas
>> <catalin.marinas@arm.com> wrote:
>> > On 18 September 2012 13:47, Will Deacon <will.deacon@arm.com> wrote:
>> >> On Sat, Sep 15, 2012 at 04:34:43PM +0100, Christoffer Dall wrote:
>> >>> +#define L_PTE2_SHARED                L_PTE_SHARED
>> >>> +#define L_PTE2_READ          (_AT(pteval_t, 1) << 6) /* HAP[0] */
>> >>> +#define L_PTE2_WRITE         (_AT(pteval_t, 1) << 7) /* HAP[1] */
>> >>
>> >> This is actually HAP[2:1], not HAP[1:0]. Also, can you follow what we do for
>> >> stage 1 translation and name these RDONLY and WRONLY (do you even use
>> >> that?).
>> >
>> > We can't use RDONLY as this would have value 0 as the HAP attributes
>> > (stage 2 overriding stage 1 translation attributes). Unless you add 4
>> > definitions like NOACCESS, RDONLY, WRONLY and RDWR to cover all the
>> > bit combinations.
>> >
>> >>> +#define L_PTE2_NORM_WB               (_AT(pteval_t, 3) << 4) /* MemAttr[3:2] */
>> >>> +#define L_PTE2_INNER_WB              (_AT(pteval_t, 3) << 2) /* MemAttr[1:0] */
>> >>
>> >> Another minor nit: PTE2 looks awful. Maybe L_PTE_HYP_* instead?
>> >
>> > L_PTE_HYP may be confused with the Stage 1 Hyp translation which is
>> > different from the guest Stage 2.
>>
>> exactly, it's misleading, how about L_PTE_STAGE2, a little verbose,
>> but clear...?
>
> I don't mind any (apart from L_PTE_HYP_ would be confusing) for stage 2.
> You could just use S2 to make it shorter.
>
I'm good with that, done.

-Christoffer

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 01/15] ARM: add mem_type prot_pte accessor
  2012-09-18 12:23     ` Will Deacon
@ 2012-09-18 19:18       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-18 19:18 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 18, 2012 at 8:23 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:34:36PM +0100, Christoffer Dall wrote:
>> From: Marc Zyngier <marc.zyngier@arm.com>
>>
>> The KVM hypervisor mmu code requires access to the mem_type prot_pte
>> field when setting up page tables pointing to a device. Unfortunately,
>> the mem_type structure is opaque.
>>
>> Add an accessor (get_mem_type_prot_pte()) to retrieve the prot_pte
>> value.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  arch/arm/include/asm/mach/map.h |    1 +
>>  arch/arm/mm/mmu.c               |    6 ++++++
>>  2 files changed, 7 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
>> index a6efcdd..3787c9f 100644
>> --- a/arch/arm/include/asm/mach/map.h
>> +++ b/arch/arm/include/asm/mach/map.h
>> @@ -37,6 +37,7 @@ extern void iotable_init(struct map_desc *, int);
>>
>>  struct mem_type;
>>  extern const struct mem_type *get_mem_type(unsigned int type);
>> +extern pteval_t get_mem_type_prot_pte(unsigned int type);
>>  /*
>>   * external interface to remap single page with appropriate type
>>   */
>> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
>> index 4c2d045..76bf4f5 100644
>> --- a/arch/arm/mm/mmu.c
>> +++ b/arch/arm/mm/mmu.c
>> @@ -301,6 +301,12 @@ const struct mem_type *get_mem_type(unsigned int type)
>>  }
>>  EXPORT_SYMBOL(get_mem_type);
>>
>> +pteval_t get_mem_type_prot_pte(unsigned int type)
>> +{
>> +     return get_mem_type(type)->prot_pte;
>> +}
>> +EXPORT_SYMBOL(get_mem_type_prot_pte);
>> +
>
> get_mem_type can return NULL, so you should probably pass the error through
> rather than dereferencing it.
>
right, I guess callers can check against 0, since L_PTE_PRESENT should
always be there.

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index a153fd4..f2b6287 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -305,7 +305,9 @@ EXPORT_SYMBOL(get_mem_type);

 pteval_t get_mem_type_prot_pte(unsigned int type)
 {
-       return get_mem_type(type)->prot_pte;
+       if (get_mem_type(type))
+               return get_mem_type(type)->prot_pte;
+       return 0;
 }
 EXPORT_SYMBOL(get_mem_type_prot_pte);
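
A caller-side check could then look something like this (just a sketch,
not part of the patch):

	pteval_t prot = get_mem_type_prot_pte(MT_DEVICE);
	if (!prot)
		return -EINVAL;	/* mem_type lookup failed */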

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* Re: [PATCH 01/15] ARM: add mem_type prot_pte accessor
  2012-09-15 15:34   ` Christoffer Dall
@ 2012-09-18 21:04     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 164+ messages in thread
From: Russell King - ARM Linux @ 2012-09-18 21:04 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 11:34:36AM -0400, Christoffer Dall wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>
> 
> The KVM hypervisor mmu code requires access to the mem_type prot_pte
> field when setting up page tables pointing to a device. Unfortunately,
> the mem_type structure is opaque.
> 
> Add an accessor (get_mem_type_prot_pte()) to retrieve the prot_pte
> value.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

Is there a reason why we need this to be exposed, along with all the
page table manipulation in patch 7?

Is there a reason why we can't have new MT_ types for PAGE_HYP and
the HYP MT_DEVICE type (which is the same as MT_DEVICE but with
PTE_USER set) and have the standard ARM/generic kernel code build
those mappings?

That would (it seems) also avoid the need to export the pXd_clear_bad()
accessors too...
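
For illustration, such an entry in mem_types[] might look roughly like
the following; the MT_DEVICE_HYP name and exact attribute set are
hypothetical, not something posted in this series:

	[MT_DEVICE_HYP] = {	  /* hypothetical: MT_DEVICE + L_PTE_USER for HYP */
		.prot_pte	= PROT_PTE_DEVICE | L_PTE_MT_DEV_SHARED |
				  L_PTE_SHARED | L_PTE_USER,
		.prot_l1	= PMD_TYPE_TABLE,
		.prot_sect	= PROT_SECT_DEVICE | PMD_SECT_S,
		.domain		= DOMAIN_IO,
	},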

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 01/15] ARM: add mem_type prot_pte accessor
  2012-09-18 21:04     ` Russell King - ARM Linux
@ 2012-09-18 21:53       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-18 21:53 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 18, 2012 at 5:04 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sat, Sep 15, 2012 at 11:34:36AM -0400, Christoffer Dall wrote:
>> From: Marc Zyngier <marc.zyngier@arm.com>
>>
>> The KVM hypervisor mmu code requires access to the mem_type prot_pte
>> field when setting up page tables pointing to a device. Unfortunately,
>> the mem_type structure is opaque.
>>
>> Add an accessor (get_mem_type_prot_pte()) to retrieve the prot_pte
>> value.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>
> Is there a reason why we need this to be exposed, along with all the
> page table manipulation in patch 7?
>
> Is there a reason why we can't have new MT_ types for PAGE_HYP and
> the HYP MT_DEVICE type (which is the same as MT_DEVICE but with
> PTE_USER set) and have the standard ARM/generic kernel code build
> those mappings?

For hyp mode we can do this, but we cannot do it for the cpu
interfaces that need to be mapped into each VM, as the VMs each have
their own pgd. We can move the Hyp mode mappings, and I was playing
with the thought of having a PAGE_KVM_DEVICE defined in pgtable.h and a
pgprot_guest_device set up in build_mem_type_table.
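
Roughly something like this (only a sketch of the idea; neither name is
implemented anywhere yet):

	/* pgtable.h */
	extern pgprot_t		pgprot_guest_device;
	#define PAGE_KVM_DEVICE	_MOD_PROT(pgprot_guest_device, L_PTE2_READ | L_PTE2_WRITE)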

What do you think?

>
> That would (it seems) also avoid the need to export the pXd_clear_bad()
> accessors too...

for hyp mode mappings this would be true, then.

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 02/15] ARM: Add page table and page defines needed by KVM
  2012-09-18 12:47     ` Will Deacon
@ 2012-09-18 22:01       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-18 22:01 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 18, 2012 at 8:47 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:34:43PM +0100, Christoffer Dall wrote:
>> KVM uses the stage-2 page tables and the Hyp page table format,
>> so let's define the fields we need to access in KVM.
>>
>> We use pgprot_guest to indicate stage-2 entries.
>>
>> Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  arch/arm/include/asm/pgtable-3level.h |   13 +++++++++++++
>>  arch/arm/include/asm/pgtable.h        |    5 +++++
>>  arch/arm/mm/mmu.c                     |    3 +++
>>  3 files changed, 21 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> index b249035..7351eee 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -102,11 +102,24 @@
>>   */
>>  #define L_PGD_SWAPPER                (_AT(pgdval_t, 1) << 55)        /* swapper_pg_dir entry */
>>
>> +/*
>> + * 2-nd stage PTE definitions for LPAE.
>> + */
>
> Minor nit: 2nd
>
>> +#define L_PTE2_SHARED                L_PTE_SHARED
>> +#define L_PTE2_READ          (_AT(pteval_t, 1) << 6) /* HAP[0] */
>> +#define L_PTE2_WRITE         (_AT(pteval_t, 1) << 7) /* HAP[1] */
>
> This is actually HAP[2:1], not HAP[1:0]. Also, can you follow what we do for
> stage 1 translation and name these RDONLY and WRONLY (do you even use
> that?).
>

The ARM ARM is actually ambiguous: B3-1335 defines it as HAP[2:1], but
B3-1355 defines it as HAP[1:0]. I chose the latter as it is clearer to
most people who don't know that, for historical reasons, {H}AP[0] is
not defined. If there's a consensus for the other choice here, then I'm
good with that.

Also, these bits have a different meaning for stage-2: HAP[2] (ok, in
this case it's less misleading with bit index 2) actually means you can
write this, and two clear bits mean access denied, not read/write, so
it made more sense to me to do:

prot = READ | WRITE;

than

prot = RDONLY | WRONLY; // (not mutually exclusive). See my point?

>> +#define L_PTE2_NORM_WB               (_AT(pteval_t, 3) << 4) /* MemAttr[3:2] */
>> +#define L_PTE2_INNER_WB              (_AT(pteval_t, 3) << 2) /* MemAttr[1:0] */
>
> Another minor nit: PTE2 looks awful. Maybe L_PTE_HYP_* instead?
>
>>  #ifndef __ASSEMBLY__
>>
>>  #define pud_none(pud)                (!pud_val(pud))
>>  #define pud_bad(pud)         (!(pud_val(pud) & 2))
>>  #define pud_present(pud)     (pud_val(pud))
>> +#define pmd_table(pmd)               ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>> +                                              PMD_TYPE_TABLE)
>> +#define pmd_sect(pmd)                ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>> +                                              PMD_TYPE_SECT)
>>
>>  #define pud_clear(pudp)                      \
>>       do {                            \
>> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
>> index 41dc31f..c422f62 100644
>> --- a/arch/arm/include/asm/pgtable.h
>> +++ b/arch/arm/include/asm/pgtable.h
>> @@ -70,6 +70,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
>>
>>  extern pgprot_t              pgprot_user;
>>  extern pgprot_t              pgprot_kernel;
>> +extern pgprot_t              pgprot_guest;
>>
>>  #define _MOD_PROT(p, b)      __pgprot(pgprot_val(p) | (b))
>>
>> @@ -82,6 +83,10 @@ extern pgprot_t            pgprot_kernel;
>>  #define PAGE_READONLY_EXEC   _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
>>  #define PAGE_KERNEL          _MOD_PROT(pgprot_kernel, L_PTE_XN)
>>  #define PAGE_KERNEL_EXEC     pgprot_kernel
>> +#define PAGE_HYP             _MOD_PROT(pgprot_kernel, L_PTE_USER)
>
> Just define L_PTE_HYP to L_PTE_USER, otherwise that's confusing.
>
>> +#define PAGE_KVM_GUEST               _MOD_PROT(pgprot_guest, L_PTE2_READ | \
>> +                                       L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
>> +                                       L_PTE2_SHARED)
>
> It would be cleaner to separate the cacheability attributes out from here
> and into the cache_policies array. Then you just need L_PTE_HYP_RDONLY here.
>

ok, below is an attempt to rework all this, comments please:


diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 7351eee..6df235c 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -103,13 +103,19 @@
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */

 /*
- * 2-nd stage PTE definitions for LPAE.
+ * 2nd stage PTE definitions for LPAE.
  */
-#define L_PTE2_SHARED		L_PTE_SHARED
-#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
-#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */
-#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
-#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */
+#define L_PTE_S2_SHARED		L_PTE_SHARED
+#define L_PTE_S2_READ		(_AT(pteval_t, 1) << 6)	  /* HAP[1] */
+#define L_PTE_S2_WRITE		(_AT(pteval_t, 1) << 7)	  /* HAP[2] */
+#define L_PTE_S2_MT_UNCACHED	(_AT(pteval_t, 0x5) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_MT_WRTHROUGH	(_AT(pteval_t, 0xa) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_MT_WRBACK	(_AT(pteval_t, 0xf) << 2) /* MemAttr[3:0] */
+
+/*
+ * Hyp-mode PL2 PTE definitions for LPAE.
+ */
+#define L_PTE_HYP		L_PTE_USER

 #ifndef __ASSEMBLY__

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index c422f62..6ab276b 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -83,10 +83,8 @@ extern pgprot_t		pgprot_guest;
 #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
-#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
-#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
-					  L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
-					  L_PTE2_SHARED)
+#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_HYP)
+#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE_S2_READ)

 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 82d0edf..3ff427b 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -476,7 +476,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,

 	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
 	prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER |
-			L_PTE2_READ | L_PTE2_WRITE);
+			L_PTE_S2_READ | L_PTE_S2_WRITE);
 	pfn = __phys_to_pfn(pa);

 	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
@@ -567,7 +567,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		goto out;
 	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
 	if (writable)
-		pte_val(new_pte) |= L_PTE2_WRITE;
+		pte_val(new_pte) |= L_PTE_S2_WRITE;
 	coherent_icache_guest_page(vcpu->kvm, gfn);

 	spin_lock(&vcpu->kvm->arch.pgd_lock);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index f2b6287..a06f3496 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -67,6 +67,7 @@ struct cachepolicy {
 	unsigned int	cr_mask;
 	pmdval_t	pmd;
 	pteval_t	pte;
+	pteval_t	pte_s2;
 };

 static struct cachepolicy cache_policies[] __initdata = {
@@ -75,26 +76,31 @@ static struct cachepolicy cache_policies[] __initdata = {
 		.cr_mask	= CR_W|CR_C,
 		.pmd		= PMD_SECT_UNCACHED,
 		.pte		= L_PTE_MT_UNCACHED,
+		.pte_s2		= L_PTE_S2_MT_UNCACHED,
 	}, {
 		.policy		= "buffered",
 		.cr_mask	= CR_C,
 		.pmd		= PMD_SECT_BUFFERED,
 		.pte		= L_PTE_MT_BUFFERABLE,
+		.pte_s2		= L_PTE_S2_MT_UNCACHED,
 	}, {
 		.policy		= "writethrough",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WT,
 		.pte		= L_PTE_MT_WRITETHROUGH,
+		.pte_s2		= L_PTE_S2_MT_WRTHROUGH,
 	}, {
 		.policy		= "writeback",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WB,
 		.pte		= L_PTE_MT_WRITEBACK,
+		.pte_s2		= L_PTE_S2_MT_WRBACK,
 	}, {
 		.policy		= "writealloc",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WBWA,
 		.pte		= L_PTE_MT_WRITEALLOC,
+		.pte_s2		= L_PTE_S2_MT_WRBACK,
 	}
 };

@@ -318,7 +324,7 @@ static void __init build_mem_type_table(void)
 {
 	struct cachepolicy *cp;
 	unsigned int cr = get_cr();
-	pteval_t user_pgprot, kern_pgprot, vecs_pgprot;
+	pteval_t user_pgprot, kern_pgprot, vecs_pgprot, guest_pgprot;
 	int cpu_arch = cpu_architecture();
 	int i;

@@ -430,6 +436,7 @@ static void __init build_mem_type_table(void)
 	 */
 	cp = &cache_policies[cachepolicy];
 	vecs_pgprot = kern_pgprot = user_pgprot = cp->pte;
+	guest_pgprot = cp->pte_s2;

 	/*
 	 * Enable CPU-specific coherency if supported.
@@ -464,6 +471,7 @@ static void __init build_mem_type_table(void)
 			user_pgprot |= L_PTE_SHARED;
 			kern_pgprot |= L_PTE_SHARED;
 			vecs_pgprot |= L_PTE_SHARED;
+			guest_pgprot |= L_PTE_SHARED;
 			mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
 			mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
@@ -518,7 +526,7 @@ static void __init build_mem_type_table(void)
 	pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
 	pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
 				 L_PTE_DIRTY | kern_pgprot);
-	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG);
+	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | guest_pgprot);

 	mem_types[MT_LOW_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* Re: [PATCH 05/15] ARM: Expose PMNC bitfields for KVM use
  2012-09-18 13:08     ` Will Deacon
@ 2012-09-18 22:13       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-18 22:13 UTC (permalink / raw)
  To: Will Deacon, Rusty Russell; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 18, 2012 at 9:08 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:35:02PM +0100, Christoffer Dall wrote:
>> From: Rusty Russell <rusty.russell@linaro.org>
>>
>> We want some of these for use in KVM, so pull them out of
>> arch/arm/kernel/perf_event_v7.c into their own asm/perf_bits.h.
>>
>> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  arch/arm/include/asm/perf_bits.h |   56 ++++++++++++++++++++++++++++++++++++++
>>  arch/arm/kernel/perf_event_v7.c  |   51 +----------------------------------
>>  2 files changed, 57 insertions(+), 50 deletions(-)
>>  create mode 100644 arch/arm/include/asm/perf_bits.h
>
> I don't like this I'm afraid. These bit definitions, although useful for
> kvm, are only applicable to ARMv7 PMUs. Perf does a reasonable job of
> separating the low-level CPU-specific code and adding the v7 definitions
> into their own global header feels like a step backwards. I also want to
> move a load of this into drivers/ at some point and this won't help with
> that effort.
>
> Is KVM just using this for world switch? If so, why does it care about the
> bit definitions (and what do you do for things like debug regs)? Is there
> anything I could add to perf that you could call instead?
>
I'm going to let Rusty reply to this one...

-Christoffer

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 05/15] ARM: Expose PMNC bitfields for KVM use
  2012-09-18 13:08     ` Will Deacon
@ 2012-09-19  4:09       ` Rusty Russell
  -1 siblings, 0 replies; 164+ messages in thread
From: Rusty Russell @ 2012-09-19  4:09 UTC (permalink / raw)
  To: Will Deacon, Christoffer Dall; +Cc: linux-arm-kernel, kvm, kvmarm

Will Deacon <will.deacon@arm.com> writes:
> On Sat, Sep 15, 2012 at 04:35:02PM +0100, Christoffer Dall wrote:
>> From: Rusty Russell <rusty.russell@linaro.org>
>> 
>> We want some of these for use in KVM, so pull them out of
>> arch/arm/kernel/perf_event_v7.c into their own asm/perf_bits.h.
>> 
>> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  arch/arm/include/asm/perf_bits.h |   56 ++++++++++++++++++++++++++++++++++++++
>>  arch/arm/kernel/perf_event_v7.c  |   51 +----------------------------------
>>  2 files changed, 57 insertions(+), 50 deletions(-)
>>  create mode 100644 arch/arm/include/asm/perf_bits.h
>
> I don't like this I'm afraid. These bit definitions, although useful for
> kvm, are only applicable to ARMv7 PMUs. Perf does a reasonable job of
> separating the low-level CPU-specific code and adding the v7 definitions
> into their own global header feels like a step backwards. I also want to
> move a load of this into drivers/ at some point and this won't help with
> that effort.
>
> Is KVM just using this for world switch? If so, why does it care about the
> bit definitions (and what do you do for things like debug regs)? Is there
> anything I could add to perf that you could call instead?

No, we need these definitions if we ever want to actually implement
PMU for the guest.[1]

But we don't do this yet, so you can defer this patch until then if you
want.

Cheers,
Rusty.
[1] Which we should do, since you NAKed the patch which would allow the
    guest to detect that we don't have a PMU, insisting that "all A15s have
    a PMU", despite the fact that we don't.  I assume this means you're
    busy implementing it right now :)

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 02/15] ARM: Add page table and page defines needed by KVM
  2012-09-18 22:01       ` Christoffer Dall
@ 2012-09-19  9:21         ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-19  9:21 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 18, 2012 at 11:01:27PM +0100, Christoffer Dall wrote:
> On Tue, Sep 18, 2012 at 8:47 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Sat, Sep 15, 2012 at 04:34:43PM +0100, Christoffer Dall wrote:
> >> +#define L_PTE2_SHARED                L_PTE_SHARED
> >> +#define L_PTE2_READ          (_AT(pteval_t, 1) << 6) /* HAP[0] */
> >> +#define L_PTE2_WRITE         (_AT(pteval_t, 1) << 7) /* HAP[1] */
> >
> > This is actually HAP[2:1], not HAP[1:0]. Also, can you follow what we do for
> > stage 1 translation and name these RDONLY and WRONLY (do you even use
> > that?).
> >
> 
> The ARM arm is actually ambiguous, B3-1335 defines it as HAP[2:1], but
> B3-1355 defines it as HAP[1:0] and I chose the latter as it is more
> clear to most people not knowing that for historical reasons {H}AP[0]
> is not defined. If there's a consensus for the other choice here, then
> I'm good with that.

Just checked the latest ARM ARM (rev C) and "HAP[1:0]" doesn't appear anywhere
in the document, so it looks like it's been fixed.

> Also, these bits have a different meaning for stage-2, HAP[2] (ok, in
> this case it's less misleading with bit index 2), HAP[2] actually
> means you can write this, two clear bits means access denied, not
> read/write, so it made more sense to me to do:
> 
> prot = READ | WRITE;
> 
> than
> 
> prt = RDONLY | WRONLY; // (not mutually exclusive). See my point?

If you define the bits like:

  L_PTE_S2_RDONLY	(_AT(pteval_t, 1) << 6)
  L_PTE_S2_WRONLY	(_AT(pteval_t, 2) << 6)

then I think it's fairly clear and it also matches the ARM ARM descriptions
for the HAP permissions. You could also add L_PTE_S2_RDWR if you don't like
orring the things elsewhere.
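
E.g. (sketch, built on the two definitions above):

  #define L_PTE_S2_RDWR		(L_PTE_S2_RDONLY | L_PTE_S2_WRONLY)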

> >
> > It would be cleaner to separate the cacheability attributes out from here
> > and into the cache_policies array. Then you just need L_PTE_HYP_RDONLY here.
> >
> 
> ok, below is an attempt to rework all this, comments please:
> 
> 
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 7351eee..6df235c 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -103,13 +103,19 @@
>  #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
> 
>  /*
> - * 2-nd stage PTE definitions for LPAE.
> + * 2nd stage PTE definitions for LPAE.
>   */
> -#define L_PTE2_SHARED		L_PTE_SHARED
> -#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
> -#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */
> -#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
> -#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */
> +#define L_PTE_S2_SHARED		L_PTE_SHARED
> +#define L_PTE_S2_READ		(_AT(pteval_t, 1) << 6)	  /* HAP[1] */
> +#define L_PTE_S2_WRITE		(_AT(pteval_t, 1) << 7)	  /* HAP[2] */
> +#define L_PTE_S2_MT_UNCACHED	(_AT(pteval_t, 0x5) << 2) /* MemAttr[3:0] */
> +#define L_PTE_S2_MT_WRTHROUGH	(_AT(pteval_t, 0xa) << 2) /* MemAttr[3:0] */
> +#define L_PTE_S2_MT_WRBACK	(_AT(pteval_t, 0xf) << 2) /* MemAttr[3:0] */

Again, just use the same names as we do for stage 1. It really makes the
code easier to read (L_PTE_S2_MT_WRITETHROUGH etc.).

> +/*
> + * Hyp-mode PL2 PTE definitions for LPAE.
> + */
> +#define L_PTE_HYP		L_PTE_USER
> 
>  #ifndef __ASSEMBLY__
> 
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index c422f62..6ab276b 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -83,10 +83,8 @@ extern pgprot_t		pgprot_guest;
>  #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
>  #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
>  #define PAGE_KERNEL_EXEC	pgprot_kernel
> -#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
> -#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
> -					  L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
> -					  L_PTE2_SHARED)
> +#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_HYP)
> +#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE_S2_READ)
> 
>  #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
>  #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 82d0edf..3ff427b 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -476,7 +476,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> 
>  	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
>  	prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER |
> -			L_PTE2_READ | L_PTE2_WRITE);
> +			L_PTE_S2_READ | L_PTE_S2_WRITE);
>  	pfn = __phys_to_pfn(pa);
> 
>  	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
> @@ -567,7 +567,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		goto out;
>  	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
>  	if (writable)
> -		pte_val(new_pte) |= L_PTE2_WRITE;
> +		pte_val(new_pte) |= L_PTE_S2_WRITE;
>  	coherent_icache_guest_page(vcpu->kvm, gfn);
> 
>  	spin_lock(&vcpu->kvm->arch.pgd_lock);
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index f2b6287..a06f3496 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -67,6 +67,7 @@ struct cachepolicy {
>  	unsigned int	cr_mask;
>  	pmdval_t	pmd;
>  	pteval_t	pte;
> +	pteval_t	pte_s2;
>  };
> 
>  static struct cachepolicy cache_policies[] __initdata = {
> @@ -75,26 +76,31 @@ static struct cachepolicy cache_policies[] __initdata = {
>  		.cr_mask	= CR_W|CR_C,
>  		.pmd		= PMD_SECT_UNCACHED,
>  		.pte		= L_PTE_MT_UNCACHED,
> +		.pte_s2		= L_PTE_S2_MT_UNCACHED,
>  	}, {
>  		.policy		= "buffered",
>  		.cr_mask	= CR_C,
>  		.pmd		= PMD_SECT_BUFFERED,
>  		.pte		= L_PTE_MT_BUFFERABLE,
> +		.pte_s2		= L_PTE_S2_MT_UNCACHED,
>  	}, {
>  		.policy		= "writethrough",
>  		.cr_mask	= 0,
>  		.pmd		= PMD_SECT_WT,
>  		.pte		= L_PTE_MT_WRITETHROUGH,
> +		.pte_s2		= L_PTE_S2_MT_WRTHROUGH,
>  	}, {
>  		.policy		= "writeback",
>  		.cr_mask	= 0,
>  		.pmd		= PMD_SECT_WB,
>  		.pte		= L_PTE_MT_WRITEBACK,
> +		.pte_s2		= L_PTE_S2_MT_WRBACK,
>  	}, {
>  		.policy		= "writealloc",
>  		.cr_mask	= 0,
>  		.pmd		= PMD_SECT_WBWA,
>  		.pte		= L_PTE_MT_WRITEALLOC,
> +		.pte_s2		= L_PTE_S2_MT_WRBACK,
>  	}
>  };

Does this still compile for classic MMU? It might be nicer to use arrays for
the pte types instead of the extra field too -- I assume you'll want
something similar for the pmd if/when you map stage2 translations using
sections?
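
Roughly this shape, for example (sketch of the array idea only):

	struct cachepolicy {
		const char	policy[16];
		unsigned int	cr_mask;
		pmdval_t	pmd;
		pteval_t	pte[2];		/* [0]: stage 1, [1]: stage 2 */
	};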

> @@ -318,7 +324,7 @@ static void __init build_mem_type_table(void)
>  {
>  	struct cachepolicy *cp;
>  	unsigned int cr = get_cr();
> -	pteval_t user_pgprot, kern_pgprot, vecs_pgprot;
> +	pteval_t user_pgprot, kern_pgprot, vecs_pgprot, guest_pgprot;
>  	int cpu_arch = cpu_architecture();
>  	int i;
> 
> @@ -430,6 +436,7 @@ static void __init build_mem_type_table(void)
>  	 */
>  	cp = &cache_policies[cachepolicy];
>  	vecs_pgprot = kern_pgprot = user_pgprot = cp->pte;
> +	guest_pgprot = cp->pte_s2;
> 
>  	/*
>  	 * Enable CPU-specific coherency if supported.
> @@ -464,6 +471,7 @@ static void __init build_mem_type_table(void)
>  			user_pgprot |= L_PTE_SHARED;
>  			kern_pgprot |= L_PTE_SHARED;
>  			vecs_pgprot |= L_PTE_SHARED;
> +			guest_pgprot |= L_PTE_SHARED;

If we're using L_PTE_SHARED directly, do we even need L_PTE_S2_SHARED to be
defined?

>  			mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
>  			mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
>  			mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
> @@ -518,7 +526,7 @@ static void __init build_mem_type_table(void)
>  	pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
>  	pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
>  				 L_PTE_DIRTY | kern_pgprot);
> -	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG);
> +	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | guest_pgprot);

We don't have L_PTE_S2_PRESENT, for example.

I'll try and get on to the meat of the code at some point, but there's an
awful lot of it and it's hard to see how it fits together.

Cheers,

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 05/15] ARM: Expose PMNC bitfields for KVM use
  2012-09-19  4:09       ` Rusty Russell
@ 2012-09-19  9:30         ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-19  9:30 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm

On Wed, Sep 19, 2012 at 05:09:34AM +0100, Rusty Russell wrote:
> Will Deacon <will.deacon@arm.com> writes:
> > On Sat, Sep 15, 2012 at 04:35:02PM +0100, Christoffer Dall wrote:
> >> From: Rusty Russell <rusty.russell@linaro.org>
> >> 
> >> We want some of these for use in KVM, so pull them out of
> >> arch/arm/kernel/perf_event_v7.c into their own asm/perf_bits.h.
> >> 
> >> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> >> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> >> ---
> >>  arch/arm/include/asm/perf_bits.h |   56 ++++++++++++++++++++++++++++++++++++++
> >>  arch/arm/kernel/perf_event_v7.c  |   51 +----------------------------------
> >>  2 files changed, 57 insertions(+), 50 deletions(-)
> >>  create mode 100644 arch/arm/include/asm/perf_bits.h
> >
> > I don't like this I'm afraid. These bit definitions, although useful for
> > kvm, are only applicable to ARMv7 PMUs. Perf does a reasonable job of
> > separating the low-level CPU-specific code and adding the v7 definitions
> > into their own global header feels like a step backwards. I also want to
> > move a load of this into drivers/ at some point and this won't help with
> > that effort.
> >
> > Is KVM just using this for world switch? If so, why does it care about the
> > bit definitions (and what do you do for things like debug regs)? Is there
> > anything I could add to perf that you could call instead?
> 
> No, we need these definitions if we ever want to actually implement
> PMU for the guest.[1]
> 
> But we don't do this yet, so you can defer this patch until then if you
> want.

Ok, let's postpone the pain for the moment.

> [1] Which we should do, since you NAKed the patch which would allow the
>     guest to detect that we don't have a PMU, insisting that "all A15s have
>     a PMU", despite the fact that we don't.  I assume this means you're
>     busy implementing it right now :)

Yeah, right! Happy to add hooks to perf if you need them though, rather than
expose the raw bit definitions.

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 00/15] KVM/ARM Implementation
  2012-09-15 15:34 ` Christoffer Dall
@ 2012-09-19 12:44   ` Avi Kivity
  -1 siblings, 0 replies; 164+ messages in thread
From: Avi Kivity @ 2012-09-19 12:44 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On 09/15/2012 06:34 PM, Christoffer Dall wrote:
> The following series implements KVM support for ARM processors,
> specifically on the Cortex A-15 platform.  We feel this is ready to be
> merged.
> 
> Work is done in collaboration between Columbia University, Virtual Open
> Systems and ARM/Linaro.
> 

From my point of view this looks ready to merge.  Once all outstanding
comments have been addressed, please collect ACKs from the ARM
maintainers (esp. for the non-kvm parts), and we can finally merge this.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 02/15] ARM: Add page table and page defines needed by KVM
  2012-09-19  9:21         ` Will Deacon
@ 2012-09-20  0:10           ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-20  0:10 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm

On Wed, Sep 19, 2012 at 5:21 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Sep 18, 2012 at 11:01:27PM +0100, Christoffer Dall wrote:
>> On Tue, Sep 18, 2012 at 8:47 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Sat, Sep 15, 2012 at 04:34:43PM +0100, Christoffer Dall wrote:
>> >> +#define L_PTE2_SHARED                L_PTE_SHARED
>> >> +#define L_PTE2_READ          (_AT(pteval_t, 1) << 6) /* HAP[0] */
>> >> +#define L_PTE2_WRITE         (_AT(pteval_t, 1) << 7) /* HAP[1] */
>> >
>> > This is actually HAP[2:1], not HAP[1:0]. Also, can you follow what we do for
>> > stage 1 translation and name these RDONLY and WRONLY (do you even use
>> > that?).
>> >
>>
>> The ARM arm is actually ambiguous, B3-1335 defines it as HAP[2:1], but
>> B3-1355 defines it as HAP[1:0] and I chose the latter as it is more
>> clear to most people not knowing that for historical reasons {H}AP[0]
>> is not defined. If there's a consensus for the other choice here, then
>> I'm good with that.
>
> Just checked the latest ARM ARM (rev C) and "HAP[1:0]" doesn't appear anywhere
> in the document, so it looks like it's been fixed.
>
>> Also, these bits have a different meaning for stage-2, HAP[2] (ok, in
>> this case it's less misleading with bit index 2), HAP[2] actually
>> means you can write this, two clear bits means access denied, not
>> read/write, so it made more sense to me to do:
>>
>> prot = READ | WRITE;
>>
>> than
>>
>> prt = RDONLY | WRONLY; // (not mutually exclusive). See my point?
>
> If you define the bits like:
>
>   L_PTE_S2_RDONLY       (_AT(pteval_t, 1) << 6)
>   L_PTE_S2_WRONLY       (_AT(pteval_t, 2) << 6)
>
> then I think it's fairly clear and it also matches the ARM ARM descriptions
> for the HAP permissions. You could also add L_PTE_S2_RDWR if you don't like
> orring the things elsewhere.
>

ok

>> >
>> > It would be cleaner to separate the cacheability attributes out from here
>> > and into the cache_policies array. Then you just need L_PTE_HYP_RDONLY here.
>> >
>>
>> ok, below is an attempt to rework all this, comments please:
>>
>>
>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> index 7351eee..6df235c 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -103,13 +103,19 @@
>>  #define L_PGD_SWAPPER                (_AT(pgdval_t, 1) << 55)        /* swapper_pg_dir entry */
>>
>>  /*
>> - * 2-nd stage PTE definitions for LPAE.
>> + * 2nd stage PTE definitions for LPAE.
>>   */
>> -#define L_PTE2_SHARED                L_PTE_SHARED
>> -#define L_PTE2_READ          (_AT(pteval_t, 1) << 6) /* HAP[0] */
>> -#define L_PTE2_WRITE         (_AT(pteval_t, 1) << 7) /* HAP[1] */
>> -#define L_PTE2_NORM_WB               (_AT(pteval_t, 3) << 4) /* MemAttr[3:2] */
>> -#define L_PTE2_INNER_WB              (_AT(pteval_t, 3) << 2) /* MemAttr[1:0] */
>> +#define L_PTE_S2_SHARED              L_PTE_SHARED
>> +#define L_PTE_S2_READ                (_AT(pteval_t, 1) << 6)   /* HAP[1] */
>> +#define L_PTE_S2_WRITE               (_AT(pteval_t, 1) << 7)   /* HAP[2] */
>> +#define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x5) << 2) /* MemAttr[3:0] */
>> +#define L_PTE_S2_MT_WRTHROUGH        (_AT(pteval_t, 0xa) << 2) /* MemAttr[3:0] */
>> +#define L_PTE_S2_MT_WRBACK   (_AT(pteval_t, 0xf) << 2) /* MemAttr[3:0] */
>
> Again, just use the same names as we do for stage 1. It really makes the
> code easier to read (L_PTE_S2_MT_WRITETHROUGH etc.).
>

ok, I've changed it, they're just really long

>> +/*
>> + * Hyp-mode PL2 PTE definitions for LPAE.
>> + */
>> +#define L_PTE_HYP            L_PTE_USER
>>
>>  #ifndef __ASSEMBLY__
>>
>> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
>> index c422f62..6ab276b 100644
>> --- a/arch/arm/include/asm/pgtable.h
>> +++ b/arch/arm/include/asm/pgtable.h
>> @@ -83,10 +83,8 @@ extern pgprot_t            pgprot_guest;
>>  #define PAGE_READONLY_EXEC   _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
>>  #define PAGE_KERNEL          _MOD_PROT(pgprot_kernel, L_PTE_XN)
>>  #define PAGE_KERNEL_EXEC     pgprot_kernel
>> -#define PAGE_HYP             _MOD_PROT(pgprot_kernel, L_PTE_USER)
>> -#define PAGE_KVM_GUEST               _MOD_PROT(pgprot_guest, L_PTE2_READ | \
>> -                                       L_PTE2_NORM_WB | L_PTE2_INNER_WB | \
>> -                                       L_PTE2_SHARED)
>> +#define PAGE_HYP             _MOD_PROT(pgprot_kernel, L_PTE_HYP)
>> +#define PAGE_KVM_GUEST               _MOD_PROT(pgprot_guest, L_PTE_S2_READ)
>>
>>  #define __PAGE_NONE          __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
>>  #define __PAGE_SHARED                __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 82d0edf..3ff427b 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -476,7 +476,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>>
>>       end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
>>       prot = __pgprot(get_mem_type_prot_pte(MT_DEVICE) | L_PTE_USER |
>> -                     L_PTE2_READ | L_PTE2_WRITE);
>> +                     L_PTE_S2_READ | L_PTE_S2_WRITE);
>>       pfn = __phys_to_pfn(pa);
>>
>>       for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
>> @@ -567,7 +567,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>               goto out;
>>       new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
>>       if (writable)
>> -             pte_val(new_pte) |= L_PTE2_WRITE;
>> +             pte_val(new_pte) |= L_PTE_S2_WRITE;
>>       coherent_icache_guest_page(vcpu->kvm, gfn);
>>
>>       spin_lock(&vcpu->kvm->arch.pgd_lock);
>> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
>> index f2b6287..a06f3496 100644
>> --- a/arch/arm/mm/mmu.c
>> +++ b/arch/arm/mm/mmu.c
>> @@ -67,6 +67,7 @@ struct cachepolicy {
>>       unsigned int    cr_mask;
>>       pmdval_t        pmd;
>>       pteval_t        pte;
>> +     pteval_t        pte_s2;
>>  };
>>
>>  static struct cachepolicy cache_policies[] __initdata = {
>> @@ -75,26 +76,31 @@ static struct cachepolicy cache_policies[] __initdata = {
>>               .cr_mask        = CR_W|CR_C,
>>               .pmd            = PMD_SECT_UNCACHED,
>>               .pte            = L_PTE_MT_UNCACHED,
>> +             .pte_s2         = L_PTE_S2_MT_UNCACHED,
>>       }, {
>>               .policy         = "buffered",
>>               .cr_mask        = CR_C,
>>               .pmd            = PMD_SECT_BUFFERED,
>>               .pte            = L_PTE_MT_BUFFERABLE,
>> +             .pte_s2         = L_PTE_S2_MT_UNCACHED,
>>       }, {
>>               .policy         = "writethrough",
>>               .cr_mask        = 0,
>>               .pmd            = PMD_SECT_WT,
>>               .pte            = L_PTE_MT_WRITETHROUGH,
>> +             .pte_s2         = L_PTE_S2_MT_WRTHROUGH,
>>       }, {
>>               .policy         = "writeback",
>>               .cr_mask        = 0,
>>               .pmd            = PMD_SECT_WB,
>>               .pte            = L_PTE_MT_WRITEBACK,
>> +             .pte_s2         = L_PTE_S2_MT_WRBACK,
>>       }, {
>>               .policy         = "writealloc",
>>               .cr_mask        = 0,
>>               .pmd            = PMD_SECT_WBWA,
>>               .pte            = L_PTE_MT_WRITEALLOC,
>> +             .pte_s2         = L_PTE_S2_MT_WRBACK,
>>       }
>>  };
>
> Does this still compile for classic MMU? It might be nicer to use arrays for
> the pte types instead of the extra field too -- I assume you'll want
> something similar for the pmd if/when you map stage2 translations using
> sections?
>

It definitely did not compile for classic MMU -- it was just an RFC. I
don't think an opaque array is nicer than an explicit field: the field is
clearer and requires fewer code changes, and I think we will add a pmd_s2
field when we need it, so we don't merge untested code. Perhaps I am
missing some good reason to use an array.

>> @@ -318,7 +324,7 @@ static void __init build_mem_type_table(void)
>>  {
>>       struct cachepolicy *cp;
>>       unsigned int cr = get_cr();
>> -     pteval_t user_pgprot, kern_pgprot, vecs_pgprot;
>> +     pteval_t user_pgprot, kern_pgprot, vecs_pgprot, guest_pgprot;
>>       int cpu_arch = cpu_architecture();
>>       int i;
>>
>> @@ -430,6 +436,7 @@ static void __init build_mem_type_table(void)
>>        */
>>       cp = &cache_policies[cachepolicy];
>>       vecs_pgprot = kern_pgprot = user_pgprot = cp->pte;
>> +     guest_pgprot = cp->pte_s2;
>>
>>       /*
>>        * Enable CPU-specific coherency if supported.
>> @@ -464,6 +471,7 @@ static void __init build_mem_type_table(void)
>>                       user_pgprot |= L_PTE_SHARED;
>>                       kern_pgprot |= L_PTE_SHARED;
>>                       vecs_pgprot |= L_PTE_SHARED;
>> +                     guest_pgprot |= L_PTE_SHARED;
>
> If we're using L_PTE_SHARED directly, do we even need L_PTE_S2_SHARED to be
> defined?
>

nope, removed it.

>>                       mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
>>                       mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
>>                       mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
>> @@ -518,7 +526,7 @@ static void __init build_mem_type_table(void)
>>       pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
>>       pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
>>                                L_PTE_DIRTY | kern_pgprot);
>> -     pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG);
>> +     pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | guest_pgprot);
>
> We don't have L_PTE_S2_PRESENT, for example.
>
> I'll try and get on to the meat of the code at some point, but there's an
> awful lot of it and it's hard to see how it fits together.
>
I don't think it's that bad once you understand the KVM structure (which
you will need to understand anyway to review the code), and it's very much
self-contained. In any case, I'll be available to answer questions and
clarify things if needed. I appreciate the review very much!

Here's an updated version of the entire patch that compiles both with and
without LPAE:


Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Wed Sep 19 17:50:07 2012 -0400

    ARM: Add page table and page defines needed by KVM

    KVM uses the stage-2 page tables and the Hyp page table format,
    so let's define the fields we need to access in KVM.

    We use pgprot_guest to indicate stage-2 entries.

    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index b249035..eaba5a4 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -102,11 +102,29 @@
  */
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */

+/*
+ * 2nd stage PTE definitions for LPAE.
+ */
+#define L_PTE_S2_MT_UNCACHED	 (_AT(pteval_t, 0x5) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_MT_WRITEBACK	 (_AT(pteval_t, 0xf) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_RDONLY		 (_AT(pteval_t, 1) << 6)   /* HAP[1]   */
+#define L_PTE_S2_RDWR		 (_AT(pteval_t, 2) << 6)   /* HAP[2:1] */
+
+/*
+ * Hyp-mode PL2 PTE definitions for LPAE.
+ */
+#define L_PTE_HYP		L_PTE_USER
+
 #ifndef __ASSEMBLY__

 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & 2))
 #define pud_present(pud)	(pud_val(pud))
+#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_TABLE)
+#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_SECT)

 #define pud_clear(pudp)			\
 	do {				\
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 41dc31f..f9a124a 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -70,6 +70,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);

 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_guest;

 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))

@@ -82,6 +83,8 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
+#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_HYP)
+#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE_S2_RDONLY)

 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 57f21de..6267f38 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -56,43 +56,57 @@ static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
+pgprot_t pgprot_guest;

 EXPORT_SYMBOL(pgprot_user);
 EXPORT_SYMBOL(pgprot_kernel);
+EXPORT_SYMBOL(pgprot_guest);

 struct cachepolicy {
 	const char	policy[16];
 	unsigned int	cr_mask;
 	pmdval_t	pmd;
 	pteval_t	pte;
+	pteval_t	pte_s2;
 };

+#ifdef CONFIG_ARM_LPAE
+#define s2_policy(policy)	policy
+#else
+#define s2_policy(policy)	0
+#endif
+
 static struct cachepolicy cache_policies[] __initdata = {
 	{
 		.policy		= "uncached",
 		.cr_mask	= CR_W|CR_C,
 		.pmd		= PMD_SECT_UNCACHED,
 		.pte		= L_PTE_MT_UNCACHED,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_UNCACHED),
 	}, {
 		.policy		= "buffered",
 		.cr_mask	= CR_C,
 		.pmd		= PMD_SECT_BUFFERED,
 		.pte		= L_PTE_MT_BUFFERABLE,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_UNCACHED),
 	}, {
 		.policy		= "writethrough",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WT,
 		.pte		= L_PTE_MT_WRITETHROUGH,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_WRITETHROUGH),
 	}, {
 		.policy		= "writeback",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WB,
 		.pte		= L_PTE_MT_WRITEBACK,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_WRITEBACK),
 	}, {
 		.policy		= "writealloc",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WBWA,
 		.pte		= L_PTE_MT_WRITEALLOC,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_WRITEBACK),
 	}
 };

@@ -314,7 +328,7 @@ static void __init build_mem_type_table(void)
 {
 	struct cachepolicy *cp;
 	unsigned int cr = get_cr();
-	pteval_t user_pgprot, kern_pgprot, vecs_pgprot;
+	pteval_t user_pgprot, kern_pgprot, vecs_pgprot, guest_pgprot;
 	int cpu_arch = cpu_architecture();
 	int i;

@@ -426,6 +440,7 @@ static void __init build_mem_type_table(void)
 	 */
 	cp = &cache_policies[cachepolicy];
 	vecs_pgprot = kern_pgprot = user_pgprot = cp->pte;
+	guest_pgprot = cp->pte_s2;

 	/*
 	 * Enable CPU-specific coherency if supported.
@@ -460,6 +475,7 @@ static void __init build_mem_type_table(void)
 			user_pgprot |= L_PTE_SHARED;
 			kern_pgprot |= L_PTE_SHARED;
 			vecs_pgprot |= L_PTE_SHARED;
+			guest_pgprot |= L_PTE_SHARED;
 			mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
 			mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
@@ -514,6 +530,7 @@ static void __init build_mem_type_table(void)
 	pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
 	pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
 				 L_PTE_DIRTY | kern_pgprot);
+	pgprot_guest  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | guest_pgprot);

 	mem_types[MT_LOW_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
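
[Editor's sketch: how these definitions are expected to compose in the
stage-2 fault path, based on the arch/arm/kvm/mmu.c hunks quoted earlier
in the thread; the surrounding handler code is elided and the exact call
site is an assumption.]

	/* Read-only stage-2 mapping using the configured cache policy. */
	pte_t new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);

	/*
	 * OR-ing L_PTE_S2_RDWR on top of the L_PTE_S2_RDONLY bit already
	 * present in PAGE_KVM_GUEST sets HAP[2:1] = 0b11, i.e. the guest
	 * gets both read and write access.
	 */
	if (writable)
		pte_val(new_pte) |= L_PTE_S2_RDWR;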

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* Re: [PATCH 04/15] ARM: idmap: only initialize HYP idmap when HYP mode is available
  2012-09-18 13:03     ` Will Deacon
@ 2012-09-20  0:11       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-20  0:11 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 18, 2012 at 9:03 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:34:55PM +0100, Christoffer Dall wrote:
>> From: Marc Zyngier <marc.zyngier@arm.com>
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/mm/idmap.c |    4 ++++
>>  1 file changed, 4 insertions(+)
>
> Just fold this into the previous patch.
>
yep, that was silly.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 01/15] ARM: add mem_type prot_pte accessor
  2012-09-18 21:53       ` Christoffer Dall
@ 2012-09-20 10:01         ` Marc Zyngier
  -1 siblings, 0 replies; 164+ messages in thread
From: Marc Zyngier @ 2012-09-20 10:01 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Russell King - ARM Linux, linux-arm-kernel, kvm, kvmarm

On 18/09/12 22:53, Christoffer Dall wrote:
> On Tue, Sep 18, 2012 at 5:04 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Sat, Sep 15, 2012 at 11:34:36AM -0400, Christoffer Dall wrote:
>>> From: Marc Zyngier <marc.zyngier@arm.com>
>>>
>>> The KVM hypervisor mmu code requires access to the mem_type prot_pte
>>> field when setting up page tables pointing to a device. Unfortunately,
>>> the mem_type structure is opaque.
>>>
>>> Add an accessor (get_mem_type_prot_pte()) to retrieve the prot_pte
>>> value.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>>
>> Is there a reason why we need this to be exposed, along with all the
>> page table manipulation in patch 7?
>>
>> Is there a reason why we can't have new MT_ types for PAGE_HYP and
>> the HYP MT_DEVICE type (which is the same as MT_DEVICE but with
>> PTE_USER set) and have the standard ARM/generic kernel code build
>> those mappings?
> 
> For hyp mode we can do this, but we cannot do this for the cpu
> interfaces that need to be mapped into each VM as they have each their
> own pgd. 

Isn't that the same problem? The HYP mode has its own pgd too. I think
this is the main issue with the generic code. If we can come up with an
interface that allows the generic code to work on alternative pgds, we
could pretty much do what Russell suggests here.
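
[Editor's sketch: the kind of interface being alluded to here -- a mapping
helper parameterised on the pgd instead of being hard-wired to init_mm;
the name and signature are hypothetical and do not exist in this series.]

	static void __init create_mapping_on_pgd(pgd_t *pgd_base,
						 struct map_desc *md)
	{
		/*
		 * Same walk as create_mapping(), but starting from the
		 * caller-supplied pgd_base (init_mm, the Hyp pgd, or a
		 * per-VM stage-2 pgd) rather than swapper_pg_dir.
		 */
	}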

	M.
-- 
Jazz is not dead. It just smells funny...


^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 01/15] ARM: add mem_type prot_pte accessor
  2012-09-20 10:01         ` Marc Zyngier
@ 2012-09-20 13:21           ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-20 13:21 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Russell King - ARM Linux, linux-arm-kernel, kvm, kvmarm

On Thu, Sep 20, 2012 at 6:01 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 18/09/12 22:53, Christoffer Dall wrote:
>> On Tue, Sep 18, 2012 at 5:04 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>>> On Sat, Sep 15, 2012 at 11:34:36AM -0400, Christoffer Dall wrote:
>>>> From: Marc Zyngier <marc.zyngier@arm.com>
>>>>
>>>> The KVM hypervisor mmu code requires access to the mem_type prot_pte
>>>> field when setting up page tables pointing to a device. Unfortunately,
>>>> the mem_type structure is opaque.
>>>>
>>>> Add an accessor (get_mem_type_prot_pte()) to retrieve the prot_pte
>>>> value.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>>>
>>> Is there a reason why we need this to be exposed, along with all the
>>> page table manipulation in patch 7?
>>>
>>> Is there a reason why we can't have new MT_ types for PAGE_HYP and
>>> the HYP MT_DEVICE type (which is the same as MT_DEVICE but with
>>> PTE_USER set) and have the standard ARM/generic kernel code build
>>> those mappings?
>>
>> For hyp mode we can do this, but we cannot do this for the cpu
>> interfaces that need to be mapped into each VM as they have each their
>> own pgd.
>
> Isn't that the same problem? The HYP mode has its own pgd too. I think
> this is the main issue with the generic code. If we can come up with an
> interface that allows the generic code to work on alternative pgds, we
> could pretty much do what Russell suggests here.
>
Hyp mode has its own pgd, but there will only ever be one of them, and it
can be allocated and set up at boot in mmu.c.

This will not be the case for guest pages.  On the other hand, that locks
us to only one user of such mappings, and more users could potentially (I
know it's a stretch) clutter mmu.c later on, which is why I suggest the
PAGE_KVM_DEVICE approach for now; it should then probably be renamed
PAGE_S2_DEVICE.
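
[Editor's sketch: what such a dedicated stage-2 device mapping type could
look like, following the PAGE_HYP / PAGE_KVM_GUEST pattern from the
updated patch earlier in the thread; the MemAttr encoding and the define
itself are assumptions, not part of this series.]

	/* Stage-2 MemAttr[3:0] = 0b0001 is assumed to encode Device memory. */
	#define L_PTE_S2_MT_DEV		(_AT(pteval_t, 0x1) << 2)
	#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_guest, \
						  L_PTE_S2_RDONLY | L_PTE_S2_MT_DEV)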

^ permalink raw reply	[flat|nested] 164+ messages in thread

* RE: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
  2012-09-15 15:35   ` Christoffer Dall
@ 2012-09-25 11:11     ` Min-gyu Kim
  -1 siblings, 0 replies; 164+ messages in thread
From: Min-gyu Kim @ 2012-09-25 11:11 UTC (permalink / raw)
  To: 'Christoffer Dall', kvm, linux-arm-kernel, kvmarm
  Cc: 김창환



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Christoffer Dall
> Sent: Sunday, September 16, 2012 12:36 AM
> To: kvm@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> kvmarm@lists.cs.columbia.edu
> Subject: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
> 
> From: Christoffer Dall <cdall@cs.columbia.edu>
> 
> Handles the guest faults in KVM by mapping in corresponding user pages in
> the 2nd stage page tables.
> 
> We invalidate the instruction cache by MVA whenever we map a page to the
> guest (no, we cannot only do it when we have an iabt because the guest may
> happily read/write a page before hitting the icache) if the hardware uses
> VIPT or PIPT.  In the latter case, we can invalidate only that physical
> page.  In the first case, all bets are off and we simply must invalidate
> the whole affair.  Note that VIVT icaches are tagged with vmids, and we are
> out of the woods on that one.  Alexander Graf was nice enough to remind us
> of this massive pain.
> 
> There may be a subtle bug hidden in here, which we currently hide by marking
> all pages dirty even when the pages are only mapped read-only.  The
> current hypothesis is that marking pages dirty may exercise the IO system
> and data cache more and therefore we don't see stale data in the guest,
> but it's purely guesswork.  The bug is manifested by seemingly random
> kernel crashes in guests when the host is under extreme memory pressure
> and swapping is enabled.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/kvm_arm.h |    9 ++
>  arch/arm/include/asm/kvm_asm.h |    2 +
>  arch/arm/kvm/mmu.c             |  155
> ++++++++++++++++++++++++++++++++++++++++
>  arch/arm/kvm/trace.h           |   28 +++++++
>  4 files changed, 193 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h
> b/arch/arm/include/asm/kvm_arm.h index ae586c1..4cff3b7 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -158,11 +158,20 @@
>  #define HSR_ISS		(HSR_IL - 1)
>  #define HSR_ISV_SHIFT	(24)
>  #define HSR_ISV		(1U << HSR_ISV_SHIFT)
> +#define HSR_FSC		(0x3f)
> +#define HSR_FSC_TYPE	(0x3c)
> +#define HSR_WNR		(1 << 6)
>  #define HSR_CV_SHIFT	(24)
>  #define HSR_CV		(1U << HSR_CV_SHIFT)
>  #define HSR_COND_SHIFT	(20)
>  #define HSR_COND	(0xfU << HSR_COND_SHIFT)
> 
> +#define FSC_FAULT	(0x04)
> +#define FSC_PERM	(0x0c)
> +
> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> +#define HPFAR_MASK	(~0xf)
> +
>  #define HSR_EC_UNKNOWN	(0x00)
>  #define HSR_EC_WFI	(0x01)
>  #define HSR_EC_CP15_32	(0x03)
> diff --git a/arch/arm/include/asm/kvm_asm.h
> b/arch/arm/include/asm/kvm_asm.h index 201ec1f..40ee099 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -43,6 +43,8 @@ extern char __kvm_hyp_vector[];  extern char
> __kvm_hyp_code_start[];  extern char __kvm_hyp_code_end[];
> 
> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> +
>  extern void __kvm_flush_vm_context(void);  extern void
> __kvm_tlb_flush_vmid(struct kvm *kvm);
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index
> ea17a97..52cc280 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -21,10 +21,16 @@
>  #include <linux/io.h>
>  #include <asm/idmap.h>
>  #include <asm/pgalloc.h>
> +#include <asm/cacheflush.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_mmu.h>
>  #include <asm/kvm_asm.h>
> +#include <asm/kvm_emulate.h>
>  #include <asm/mach/map.h>
> +#include <asm/kvm_asm.h>
> +#include <trace/events/kvm.h>
> +
> +#include "trace.h"
> 
>  static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
> 
> @@ -490,9 +496,156 @@ out:
>  	return ret;
>  }
> 
> +static void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> +{
> +	/*
> +	 * If we are going to insert an instruction page and the icache is
> +	 * either VIPT or PIPT, there is a potential problem where the host
> +	 * (or another VM) may have used this page at the same virtual address
> +	 * as this guest, and we read incorrect data from the icache.  If
> +	 * we're using a PIPT cache, we can invalidate just that page, but if
> +	 * we are using a VIPT cache we need to invalidate the entire icache -
> +	 * damn shame - as written in the ARM ARM (DDI 0406C - Page B3-1384)
> +	 */
> +	if (icache_is_pipt()) {
> +		unsigned long hva = gfn_to_hva(kvm, gfn);
> +		__cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
> +	} else if (!icache_is_vivt_asid_tagged()) {
> +		/* any kind of VIPT cache */
> +		__flush_icache_all();
> +	}
> +}
> +
> +static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +			  gfn_t gfn, struct kvm_memory_slot *memslot,
> +			  bool is_iabt, unsigned long fault_status)
> +{
> +	pte_t new_pte;
> +	pfn_t pfn, pfn_existing = KVM_PFN_ERR_BAD;
> +	int ret;
> +	bool write_fault, writable;
> +	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> +
> +	if (is_iabt)
> +		write_fault = false;
> +	else if ((vcpu->arch.hsr & HSR_ISV) && !(vcpu->arch.hsr & HSR_WNR))
> +		write_fault = false;
> +	else
> +		write_fault = true;
> +
> +	if (fault_status == FSC_PERM && !write_fault) {
> +		kvm_err("Unexpected L2 read permission error\n");
> +		return -EFAULT;
> +	}
> +
> +	/*
> +	 * If this is a write fault (think COW) we need to make sure the
> +	 * existing page, which other CPUs might still read, doesn't go away
> +	 * from under us, by calling gfn_to_pfn_prot(write_fault=true).
> +	 * Therefore, we call gfn_to_pfn_prot(write_fault=false), which will
> +	 * pin the existing page, then we get a new page for the user space
> +	 * pte and map this in the stage-2 table where we also make sure to
> +	 * flush the TLB for the VM, if there was an existing entry (the entry
> +	 * was updated setting the write flag to the potentially new page).
> +	 */
> +	if (fault_status == FSC_PERM) {
> +		pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false, NULL);
> +		if (is_error_pfn(pfn_existing))
> +			return -EFAULT;
> +	}
> +
> +	pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
> +	if (is_error_pfn(pfn)) {
> +		ret = -EFAULT;
> +		goto out_put_existing;
> +	}
> +
> +	/* We need minimum second+third level pages */
> +	ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
> +	if (ret)
> +		goto out;
> +	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
> +	if (writable)
> +		pte_val(new_pte) |= L_PTE2_WRITE;
> +	coherent_icache_guest_page(vcpu->kvm, gfn);

Why don't you flush the icache only when the guest has mapped an executable
page, as the __sync_icache_dcache() function currently does?


> +
> +	spin_lock(&vcpu->kvm->arch.pgd_lock);
> +	stage2_set_pte(vcpu->kvm, memcache, fault_ipa, &new_pte);
> +	spin_unlock(&vcpu->kvm->arch.pgd_lock);
> +
> +out:
> +	/*
> +	 * XXX TODO FIXME:
> +	 * This is _really_ *weird* !!!
> +	 * We should only be calling the _dirty version when we map something
> +	 * writable, but this causes memory failures in guests under heavy
> +	 * memory pressure on the host and heavy swapping.
> +	 */
> +	kvm_release_pfn_dirty(pfn);
> +out_put_existing:
> +	if (!is_error_pfn(pfn_existing))
> +		kvm_release_pfn_clean(pfn_existing);
> +	return ret;
> +}
> +
> +/**
> + * kvm_handle_guest_abort - handles all 2nd stage aborts
> + * @vcpu:	the VCPU pointer
> + * @run:	the kvm_run structure
> + *
> + * Any abort that gets to the host is almost guaranteed to be caused by a
> + * missing second stage translation table entry, which can mean that either
> + * the guest simply needs more memory and we must allocate an appropriate
> + * page or it can mean that the guest tried to access I/O memory, which is
> + * emulated by user space. The distinction is based on the IPA causing the
> + * fault and whether this memory region has been registered as standard RAM
> + * by user space.
> + */
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)  {
> -	return -EINVAL;
> +	unsigned long hsr_ec;
> +	unsigned long fault_status;
> +	phys_addr_t fault_ipa;
> +	struct kvm_memory_slot *memslot = NULL;
> +	bool is_iabt;
> +	gfn_t gfn;
> +	int ret;
> +
> +	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
> +	is_iabt = (hsr_ec == HSR_EC_IABT);
> +	fault_ipa = ((phys_addr_t)vcpu->arch.hpfar & HPFAR_MASK) << 8;
> +
> +	trace_kvm_guest_fault(*vcpu_pc(vcpu), vcpu->arch.hsr,
> +			      vcpu->arch.hdfar, vcpu->arch.hifar, fault_ipa);
> +
> +	/* Check the stage-2 fault is trans. fault or write fault */
> +	fault_status = (vcpu->arch.hsr & HSR_FSC_TYPE);
> +	if (fault_status != FSC_FAULT && fault_status != FSC_PERM) {
> +		kvm_err("Unsupported fault status: EC=%#lx DFCS=%#lx\n",
> +			hsr_ec, fault_status);
> +		return -EFAULT;
> +	}
> +
> +	gfn = fault_ipa >> PAGE_SHIFT;
> +	if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
> +		if (is_iabt) {
> +			/* Prefetch Abort on I/O address */
> +			kvm_inject_pabt(vcpu, vcpu->arch.hifar);
> +			return 1;
> +		}
> +
> +		kvm_pr_unimpl("I/O address abort...");
> +		return 0;
> +	}
> +
> +	memslot = gfn_to_memslot(vcpu->kvm, gfn);
> +	if (!memslot->user_alloc) {
> +		kvm_err("non user-alloc memslots not supported\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot,
> +			     is_iabt, fault_status);
> +	return ret ? ret : 1;
>  }
> 
>  static void handle_hva_to_gpa(struct kvm *kvm, unsigned long hva, diff --
> git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h index 772e045..40606c9
> 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -39,6 +39,34 @@ TRACE_EVENT(kvm_exit,
>  	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)  );
> 
> +TRACE_EVENT(kvm_guest_fault,
> +	TP_PROTO(unsigned long vcpu_pc, unsigned long hsr,
> +		 unsigned long hdfar, unsigned long hifar,
> +		 unsigned long ipa),
> +	TP_ARGS(vcpu_pc, hsr, hdfar, hifar, ipa),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	vcpu_pc		)
> +		__field(	unsigned long,	hsr		)
> +		__field(	unsigned long,	hdfar		)
> +		__field(	unsigned long,	hifar		)
> +		__field(	unsigned long,	ipa		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu_pc		= vcpu_pc;
> +		__entry->hsr			= hsr;
> +		__entry->hdfar			= hdfar;
> +		__entry->hifar			= hifar;
> +		__entry->ipa			= ipa;
> +	),
> +
> +	TP_printk("guest fault at PC %#08lx (hdfar %#08lx, hifar %#08lx, "
> +		  "ipa %#08lx, hsr %#08lx)",
> +		  __entry->vcpu_pc, __entry->hdfar, __entry->hifar,
> +		  __entry->ipa, __entry->hsr)
> +);
> +
>  TRACE_EVENT(kvm_irq_line,
>  	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
>  	TP_ARGS(type, vcpu_idx, irq_num, level),
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
  2012-09-25 11:11     ` Min-gyu Kim
@ 2012-09-25 12:38       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-25 12:38 UTC (permalink / raw)
  To: Min-gyu Kim; +Cc: kvm, linux-arm-kernel, kvmarm, 김창환

>> +
>> +     /*
>> +      * If this is a write fault (think COW) we need to make sure the
>> +      * existing page, which other CPUs might still read, doesn't go away
>> +      * from under us, by calling gfn_to_pfn_prot(write_fault=true).
>> +      * Therefore, we call gfn_to_pfn_prot(write_fault=false), which will
>> +      * pin the existing page, then we get a new page for the user space
>> +      * pte and map this in the stage-2 table where we also make sure to
>> +      * flush the TLB for the VM, if there was an existing entry (the entry
>> +      * was updated setting the write flag to the potentially new page).
>> +      */
>> +     if (fault_status == FSC_PERM) {
>> +             pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false, NULL);
>> +             if (is_error_pfn(pfn_existing))
>> +                     return -EFAULT;
>> +     }
>> +
>> +     pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
>> +     if (is_error_pfn(pfn)) {
>> +             ret = -EFAULT;
>> +             goto out_put_existing;
>> +     }
>> +
>> +     /* We need minimum second+third level pages */
>> +     ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
>> +     if (ret)
>> +             goto out;
>> +     new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
>> +     if (writable)
>> +             pte_val(new_pte) |= L_PTE2_WRITE;
>> +     coherent_icache_guest_page(vcpu->kvm, gfn);
>
> Why don't you flush the icache only when the guest has mapped an executable
> page, as the __sync_icache_dcache() function currently does?
>
>

Because we don't know whether the guest will map the page executable. The
guest may read the page through a normal load, which causes the fault, and
subsequently execute it (possibly even through a different guest mapping).
The only way to catch this would be to mark all pages as non-executable and
handle the fault when it occurs - unfortunately the HPFAR, which gives us
the IPA, is not populated on execute-never faults, so we would have to
translate the PC's VA to an IPA using cp15 functionality when that happens,
which is also racy with other CPUs. So it's questionable whether this would
even be an optimization, but it's definitely something that requires further
investigation.

-Christoffer

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-15 15:35   ` Christoffer Dall
@ 2012-09-25 15:20     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-25 15:20 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 04:35:08PM +0100, Christoffer Dall wrote:
> Targets KVM support for Cortex A-15 processors.
> 
> Contains all the framework components, make files, header files, some
> tracing functionality, and basic user space API.
> 
> Only supported core is Cortex-A15 for now.
> 
> Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
> 
> "Nothing to see here. Move along, move along..."

[...]

> diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
> new file mode 100644
> index 0000000..a13b582
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm.h
> @@ -0,0 +1,88 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_H__
> +#define __ARM_KVM_H__
> +
> +#include <asm/types.h>
> +
> +#define __KVM_HAVE_GUEST_DEBUG
> +
> +#define KVM_REG_SIZE(id)                                               \
> +       (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> +
> +struct kvm_regs {
> +       __u32 usr_regs[15];     /* R0_usr - R14_usr */
> +       __u32 svc_regs[3];      /* SP_svc, LR_svc, SPSR_svc */
> +       __u32 abt_regs[3];      /* SP_abt, LR_abt, SPSR_abt */
> +       __u32 und_regs[3];      /* SP_und, LR_und, SPSR_und */
> +       __u32 irq_regs[3];      /* SP_irq, LR_irq, SPSR_irq */
> +       __u32 fiq_regs[8];      /* R8_fiq - R14_fiq, SPSR_fiq */
> +       __u32 pc;               /* The program counter (r15) */
> +       __u32 cpsr;             /* The guest CPSR */
> +};
> +
> +/* Supported Processor Types */
> +#define KVM_ARM_TARGET_CORTEX_A15      (0xC0F)

So there's this define...

> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> new file mode 100644
> index 0000000..2f9d28e
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -0,0 +1,28 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_ARM_H__
> +#define __ARM_KVM_ARM_H__
> +
> +/* Supported Processor Types */
> +#define CORTEX_A15     (0xC0F)

... and then this one. Do we really need both?

> +/* Multiprocessor Affinity Register */
> +#define MPIDR_CPUID    (0x3 << 0)

I'm fairly sure we already have code under arch/arm/ for dealing with the
mpidr. Let's re-use that rather than reinventing it here.

> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> new file mode 100644
> index 0000000..44591f9
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -0,0 +1,30 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_ASM_H__
> +#define __ARM_KVM_ASM_H__
> +
> +#define ARM_EXCEPTION_RESET      0
> +#define ARM_EXCEPTION_UNDEFINED   1
> +#define ARM_EXCEPTION_SOFTWARE    2
> +#define ARM_EXCEPTION_PREF_ABORT  3
> +#define ARM_EXCEPTION_DATA_ABORT  4
> +#define ARM_EXCEPTION_IRQ        5
> +#define ARM_EXCEPTION_FIQ        6

Again, you have these defines (which look more suited to an enum type), but
later (in kvm_host.h) you have:

> +#define EXCEPTION_NONE      0
> +#define EXCEPTION_RESET     0x80
> +#define EXCEPTION_UNDEFINED 0x40
> +#define EXCEPTION_SOFTWARE  0x20
> +#define EXCEPTION_PREFETCH  0x10
> +#define EXCEPTION_DATA      0x08
> +#define EXCEPTION_IMPRECISE 0x04
> +#define EXCEPTION_IRQ       0x02
> +#define EXCEPTION_FIQ       0x01

Why the noise?

> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> new file mode 100644
> index 0000000..9e29335
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -0,0 +1,108 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_EMULATE_H__
> +#define __ARM_KVM_EMULATE_H__
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_asm.h>
> +
> +u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
> +
> +static inline u8 __vcpu_mode(u32 cpsr)
> +{
> +       u8 modes_table[32] = {
> +               0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
> +               0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
> +               MODE_USR,       /* 0x0 */
> +               MODE_FIQ,       /* 0x1 */
> +               MODE_IRQ,       /* 0x2 */
> +               MODE_SVC,       /* 0x3 */
> +               0xf, 0xf, 0xf,
> +               MODE_ABT,       /* 0x7 */
> +               0xf, 0xf, 0xf,
> +               MODE_UND,       /* 0xb */
> +               0xf, 0xf, 0xf,
> +               MODE_SYS        /* 0xf */
> +       };

These MODE_* definitions sound like our *_MODE definitions... except they're
not. It would probably be far more readable as a switch, but at least change
the name of those things!
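
For what it's worth, a rough sketch of the switch version, reusing the
existing *_MODE encodings from asm/ptrace.h (illustrative only, not tested):

	static inline u8 __vcpu_mode(u32 cpsr)
	{
		switch (cpsr & MODE_MASK) {
		case USR_MODE:		return MODE_USR;
		case FIQ_MODE:		return MODE_FIQ;
		case IRQ_MODE:		return MODE_IRQ;
		case SVC_MODE:		return MODE_SVC;
		case ABT_MODE:		return MODE_ABT;
		case UND_MODE:		return MODE_UND;
		case SYSTEM_MODE:	return MODE_SYS;
		default:		return 0xf;	/* invalid */
		}
	}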

> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
> +{
> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
> +       BUG_ON(mode == 0xf);
> +       return mode;
> +}

I noticed that you have a fair few BUG_ONs throughout the series. Fair
enough, but for hyp code is that really the right thing to do? Killing the
guest could make more sense, perhaps?
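
Something along these lines, say (sketch only; how the failure is reported
back to the run loop is left open):

	static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
	{
		u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);

		/* don't BUG() the host; let the caller fail/kill the guest */
		WARN_ON(mode == 0xf);
		return mode;
	}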

> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
> +{
> +       return vcpu_reg(vcpu, 15);
> +}

If you stick a struct pt_regs into struct kvm_regs, you could reuse ARM_pc
here etc.
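
For instance (a sketch, assuming kvm_regs simply embeds a struct pt_regs for
the user-visible state; the layout is illustrative, not the final ABI):

	struct kvm_regs {
		struct pt_regs usr_regs;	/* r0-r14, pc, cpsr */
		__u32 svc_regs[3];		/* SP_svc, LR_svc, SPSR_svc */
		/* ... remaining banked registers as before ... */
	};

	static inline unsigned long *vcpu_pc(struct kvm_vcpu *vcpu)
	{
		return &vcpu->arch.regs.usr_regs.ARM_pc;
	}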

> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> new file mode 100644
> index 0000000..24959f4
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -0,0 +1,172 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_HOST_H__
> +#define __ARM_KVM_HOST_H__
> +
> +#include <asm/kvm.h>
> +
> +#define KVM_MAX_VCPUS 4

NR_CPUS?

> +#define KVM_MEMORY_SLOTS 32
> +#define KVM_PRIVATE_MEM_SLOTS 4
> +#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> +
> +#define NUM_FEATURES 0

Ha! No idea what that means, but hopefully there's less code to review
because of it :)

> +
> +/* We don't currently support large pages. */
> +#define KVM_HPAGE_GFN_SHIFT(x) 0
> +#define KVM_NR_PAGE_SIZES      1
> +#define KVM_PAGES_PER_HPAGE(x) (1UL<<31)
> +
> +struct kvm_vcpu;
> +u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> +int kvm_target_cpu(void);
> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> +void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
> +
> +struct kvm_arch {
> +       /* The VMID generation used for the virt. memory system */
> +       u64    vmid_gen;
> +       u32    vmid;
> +
> +       /* 1-level 2nd stage table and lock */
> +       spinlock_t pgd_lock;
> +       pgd_t *pgd;
> +
> +       /* VTTBR value associated with above pgd and vmid */
> +       u64    vttbr;
> +};
> +
> +#define EXCEPTION_NONE      0
> +#define EXCEPTION_RESET     0x80
> +#define EXCEPTION_UNDEFINED 0x40
> +#define EXCEPTION_SOFTWARE  0x20
> +#define EXCEPTION_PREFETCH  0x10
> +#define EXCEPTION_DATA      0x08
> +#define EXCEPTION_IMPRECISE 0x04
> +#define EXCEPTION_IRQ       0x02
> +#define EXCEPTION_FIQ       0x01
> +
> +#define KVM_NR_MEM_OBJS     40
> +
> +/*
> + * We don't want allocation failures within the mmu code, so we preallocate
> + * enough memory for a single page fault in a cache.
> + */
> +struct kvm_mmu_memory_cache {
> +       int nobjs;
> +       void *objects[KVM_NR_MEM_OBJS];
> +};
> +
> +/*
> + * Modes used for short-hand mode determination in the world-switch code and
> + * in emulation code.
> + *
> + * Note: These indices do NOT correspond to the value of the CPSR mode bits!
> + */
> +enum vcpu_mode {
> +       MODE_FIQ = 0,
> +       MODE_IRQ,
> +       MODE_SVC,
> +       MODE_ABT,
> +       MODE_UND,
> +       MODE_USR,
> +       MODE_SYS
> +};

So the need for this enum is for indexing the array of modes, right? But
accesses to that array are already hidden behind an accessor function from
what I can tell, so I'd rather the arithmetic from cpsr -> index was
restricted to that function and the rest of the code just passed either the
raw mode or the full cpsr around.
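
I.e. something like this sketch, where the caller just hands over the cpsr
(illustrative only):

	u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 cpsr)
	{
		u32 *reg_array = (u32 *)&vcpu->arch.regs;
		u8 mode = __vcpu_mode(cpsr);	/* cpsr -> index stays in here */

		return reg_array + vcpu_reg_offsets[mode][reg_num];
	}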

> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> new file mode 100644
> index 0000000..fd6fa9b
> --- /dev/null
> +++ b/arch/arm/kvm/arm.c
> @@ -0,0 +1,345 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <linux/module.h>
> +#include <linux/vmalloc.h>
> +#include <linux/fs.h>
> +#include <linux/mman.h>
> +#include <linux/sched.h>
> +#include <trace/events/kvm.h>
> +
> +#define CREATE_TRACE_POINTS
> +#include "trace.h"
> +
> +#include <asm/unified.h>
> +#include <asm/uaccess.h>
> +#include <asm/ptrace.h>
> +#include <asm/mman.h>
> +#include <asm/cputype.h>
> +
> +#ifdef REQUIRES_VIRT
> +__asm__(".arch_extension       virt");
> +#endif
> +
> +int kvm_arch_hardware_enable(void *garbage)
> +{
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> +{
> +       return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> +}
> +
> +void kvm_arch_hardware_disable(void *garbage)
> +{
> +}
> +
> +int kvm_arch_hardware_setup(void)
> +{
> +       return 0;
> +}
> +
> +void kvm_arch_hardware_unsetup(void)
> +{
> +}
> +
> +void kvm_arch_check_processor_compat(void *rtn)
> +{
> +       *(int *)rtn = 0;
> +}
> +
> +void kvm_arch_sync_events(struct kvm *kvm)
> +{
> +}
> +
> +int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> +{
> +       if (type)
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> +       return VM_FAULT_SIGBUS;
> +}
> +
> +void kvm_arch_free_memslot(struct kvm_memory_slot *free,
> +                          struct kvm_memory_slot *dont)
> +{
> +}
> +
> +int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
> +{
> +       return 0;
> +}
> +
> +void kvm_arch_destroy_vm(struct kvm *kvm)
> +{
> +       int i;
> +
> +       for (i = 0; i < KVM_MAX_VCPUS; ++i) {
> +               if (kvm->vcpus[i]) {
> +                       kvm_arch_vcpu_free(kvm->vcpus[i]);
> +                       kvm->vcpus[i] = NULL;
> +               }
> +       }
> +}
> +
> +int kvm_dev_ioctl_check_extension(long ext)
> +{
> +       int r;
> +       switch (ext) {
> +       case KVM_CAP_USER_MEMORY:
> +       case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
> +       case KVM_CAP_ONE_REG:
> +               r = 1;
> +               break;
> +       case KVM_CAP_COALESCED_MMIO:
> +               r = KVM_COALESCED_MMIO_PAGE_OFFSET;
> +               break;
> +       default:
> +               r = 0;
> +               break;
> +       }
> +       return r;
> +}
> +
> +long kvm_arch_dev_ioctl(struct file *filp,
> +                       unsigned int ioctl, unsigned long arg)
> +{
> +       return -EINVAL;
> +}
> +
> +int kvm_arch_set_memory_region(struct kvm *kvm,
> +                              struct kvm_userspace_memory_region *mem,
> +                              struct kvm_memory_slot old,
> +                              int user_alloc)
> +{
> +       return 0;
> +}
> +
> +int kvm_arch_prepare_memory_region(struct kvm *kvm,
> +                                  struct kvm_memory_slot *memslot,
> +                                  struct kvm_memory_slot old,
> +                                  struct kvm_userspace_memory_region *mem,
> +                                  int user_alloc)
> +{
> +       return 0;
> +}
> +
> +void kvm_arch_commit_memory_region(struct kvm *kvm,
> +                                  struct kvm_userspace_memory_region *mem,
> +                                  struct kvm_memory_slot old,
> +                                  int user_alloc)
> +{
> +}
> +
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +}
> +
> +void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> +                                  struct kvm_memory_slot *slot)
> +{
> +}
> +
> +struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> +{
> +       int err;
> +       struct kvm_vcpu *vcpu;
> +
> +       vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
> +       if (!vcpu) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       err = kvm_vcpu_init(vcpu, kvm, id);
> +       if (err)
> +               goto free_vcpu;
> +
> +       return vcpu;
> +free_vcpu:
> +       kmem_cache_free(kvm_vcpu_cache, vcpu);
> +out:
> +       return ERR_PTR(err);
> +}
> +
> +void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
> +{
> +}
> +
> +void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> +{
> +       kvm_arch_vcpu_free(vcpu);
> +}
> +
> +int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
> +{
> +       return 0;
> +}
> +
> +int __attribute_const__ kvm_target_cpu(void)
> +{
> +       unsigned int midr;
> +
> +       midr = read_cpuid_id();
> +       switch ((midr >> 4) & 0xfff) {
> +       case KVM_ARM_TARGET_CORTEX_A15:
> +               return KVM_ARM_TARGET_CORTEX_A15;

I have this code already in perf_event.c. Can we move it somewhere common
and share it? You should also check that the implementor field is 0x41.
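
For reference, the implementor check could look roughly like this (sketch
only; the refactoring into common code is not shown):

	midr = read_cpuid_id();
	/* implementor field 0x41 == 'A' (ARM Ltd.) */
	if ((midr >> 24) != 0x41)
		return -EINVAL;

	switch ((midr >> 4) & 0xfff) {
	case KVM_ARM_TARGET_CORTEX_A15:
		return KVM_ARM_TARGET_CORTEX_A15;
	default:
		return -EINVAL;
	}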

> +       default:
> +               return -EINVAL;
> +       }
> +}
> +
> +int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> +{
> +       return 0;
> +}
> +
> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> +{
> +}
> +
> +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> +{
> +}
> +
> +void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> +{
> +}
> +
> +int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
> +                                       struct kvm_guest_debug *dbg)
> +{
> +       return -EINVAL;
> +}
> +
> +
> +int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> +                                   struct kvm_mp_state *mp_state)
> +{
> +       return -EINVAL;
> +}
> +
> +int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> +                                   struct kvm_mp_state *mp_state)
> +{
> +       return -EINVAL;
> +}
> +
> +int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> +{
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
> +{
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> +       return -EINVAL;
> +}
> +
> +long kvm_arch_vcpu_ioctl(struct file *filp,
> +                        unsigned int ioctl, unsigned long arg)
> +{
> +       struct kvm_vcpu *vcpu = filp->private_data;
> +       void __user *argp = (void __user *)arg;
> +
> +       switch (ioctl) {
> +       case KVM_ARM_VCPU_INIT: {
> +               struct kvm_vcpu_init init;
> +
> +               if (copy_from_user(&init, argp, sizeof init))
> +                       return -EFAULT;
> +
> +               return kvm_vcpu_set_target(vcpu, &init);
> +
> +       }
> +       case KVM_SET_ONE_REG:
> +       case KVM_GET_ONE_REG: {
> +               struct kvm_one_reg reg;
> +               if (copy_from_user(&reg, argp, sizeof(reg)))
> +                       return -EFAULT;
> +               if (ioctl == KVM_SET_ONE_REG)
> +                       return kvm_arm_set_reg(vcpu, &reg);
> +               else
> +                       return kvm_arm_get_reg(vcpu, &reg);
> +       }
> +       case KVM_GET_REG_LIST: {
> +               struct kvm_reg_list __user *user_list = argp;
> +               struct kvm_reg_list reg_list;
> +               unsigned n;
> +
> +               if (copy_from_user(&reg_list, user_list, sizeof reg_list))
> +                       return -EFAULT;
> +               n = reg_list.n;
> +               reg_list.n = kvm_arm_num_regs(vcpu);
> +               if (copy_to_user(user_list, &reg_list, sizeof reg_list))
> +                       return -EFAULT;
> +               if (n < reg_list.n)
> +                       return -E2BIG;
> +               return kvm_arm_copy_reg_indices(vcpu, user_list->reg);

kvm_reg_list sounds like it could be done using a regset instead.

> diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
> new file mode 100644
> index 0000000..690bbb3
> --- /dev/null
> +++ b/arch/arm/kvm/emulate.c
> @@ -0,0 +1,127 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <asm/kvm_emulate.h>
> +
> +#define REG_OFFSET(_reg) \
> +       (offsetof(struct kvm_regs, _reg) / sizeof(u32))
> +
> +#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs[_num])
> +
> +static const unsigned long vcpu_reg_offsets[MODE_SYS + 1][16] = {
> +       /* FIQ Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7),
> +               REG_OFFSET(fiq_regs[0]), /* r8 */
> +               REG_OFFSET(fiq_regs[1]), /* r9 */
> +               REG_OFFSET(fiq_regs[2]), /* r10 */
> +               REG_OFFSET(fiq_regs[3]), /* r11 */
> +               REG_OFFSET(fiq_regs[4]), /* r12 */
> +               REG_OFFSET(fiq_regs[5]), /* r13 */
> +               REG_OFFSET(fiq_regs[6]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* IRQ Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(irq_regs[0]), /* r13 */
> +               REG_OFFSET(irq_regs[1]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* SVC Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(svc_regs[0]), /* r13 */
> +               REG_OFFSET(svc_regs[1]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* ABT Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(abt_regs[0]), /* r13 */
> +               REG_OFFSET(abt_regs[1]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* UND Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(und_regs[0]), /* r13 */
> +               REG_OFFSET(und_regs[1]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* USR Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(usr_regs[13]), /* r13 */
> +               REG_OFFSET(usr_regs[14]), /* r14 */
> +               REG_OFFSET(pc)            /* r15 */
> +       },
> +
> +       /* SYS Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(usr_regs[13]), /* r13 */
> +               REG_OFFSET(usr_regs[14]), /* r14 */
> +               REG_OFFSET(pc)            /* r15 */
> +       },
> +};
> +
> +/*
> + * Return a pointer to the register number valid in the specified mode of
> + * the virtual CPU.
> + */
> +u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
> +{
> +       u32 *reg_array = (u32 *)&vcpu->arch.regs;
> +
> +       BUG_ON(reg_num > 15);
> +       BUG_ON(mode > MODE_SYS);

Again, BUG_ON seems a bit OTT here. Also, this is where the mode => idx
calculation should happen.

> +       return reg_array + vcpu_reg_offsets[mode][reg_num];
> +}
> diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
> new file mode 100644
> index 0000000..3e38c95
> --- /dev/null
> +++ b/arch/arm/kvm/exports.c
> @@ -0,0 +1,21 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <linux/module.h>
> +
> +EXPORT_SYMBOL_GPL(smp_send_reschedule);

Erm...

We already have arch/arm/kernel/armksyms.c for exports -- please use that.
However, exporting such low-level operations sounds like a bad idea. How
realistic is kvm-as-a-module on ARM anyway?

> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> new file mode 100644
> index 0000000..19a5389
> --- /dev/null
> +++ b/arch/arm/kvm/guest.c
> @@ -0,0 +1,211 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <linux/module.h>
> +#include <linux/vmalloc.h>
> +#include <linux/fs.h>
> +#include <asm/uaccess.h>
> +#include <asm/kvm.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_emulate.h>
> +
> +#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
> +#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
> +
> +struct kvm_stats_debugfs_item debugfs_entries[] = {
> +       { NULL }
> +};
> +
> +int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
> +{
> +       return 0;
> +}
> +
> +static u64 core_reg_offset_from_id(u64 id)
> +{
> +       return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
> +}
> +
> +static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> +       u32 __user *uaddr = (u32 __user *)(long)reg->addr;
> +       struct kvm_regs *regs = &vcpu->arch.regs;
> +       u64 off;
> +
> +       if (KVM_REG_SIZE(reg->id) != 4)
> +               return -ENOENT;
> +
> +       /* Our ID is an index into the kvm_regs struct. */
> +       off = core_reg_offset_from_id(reg->id);
> +       if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
> +               return -ENOENT;
> +
> +       return put_user(((u32 *)regs)[off], uaddr);
> +}
> +
> +static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> +       u32 __user *uaddr = (u32 __user *)(long)reg->addr;
> +       struct kvm_regs *regs = &vcpu->arch.regs;
> +       u64 off, val;
> +
> +       if (KVM_REG_SIZE(reg->id) != 4)
> +               return -ENOENT;
> +
> +       /* Our ID is an index into the kvm_regs struct. */
> +       off = core_reg_offset_from_id(reg->id);
> +       if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
> +               return -ENOENT;
> +
> +       if (get_user(val, uaddr) != 0)
> +               return -EFAULT;
> +
> +       if (off == KVM_REG_ARM_CORE_REG(cpsr)) {
> +               if (__vcpu_mode(val) == 0xf)
> +                       return -EINVAL;
> +       }
> +
> +       ((u32 *)regs)[off] = val;
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> +{
> +       return -EINVAL;
> +}
> +
> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> +{
> +       return -EINVAL;
> +}

Again, all looks like this should be implemented using regsets from what I
can tell.

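For reference, the regset interface lives in <linux/regset.h>. A very rough sketch of what a core-register description could look like is below; note the hooks are written against task_struct targets, so the vcpu_regs_of() helper that recovers the vcpu is entirely hypothetical:

static int core_regs_get(struct task_struct *target,
			 const struct user_regset *regset,
			 unsigned int pos, unsigned int count,
			 void *kbuf, void __user *ubuf)
{
	struct kvm_regs *regs = vcpu_regs_of(target);	/* hypothetical helper */

	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
				   regs, 0, sizeof(*regs));
}

static const struct user_regset kvm_core_regset = {
	.n	= sizeof(struct kvm_regs) / sizeof(u32),
	.size	= sizeof(u32),
	.align	= sizeof(u32),
	.get	= core_regs_get,
};

Whether that actually buys much over the ONE_REG index scheme is a separate question.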
> diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
> new file mode 100644
> index 0000000..290a13d
> --- /dev/null
> +++ b/arch/arm/kvm/reset.c
> @@ -0,0 +1,74 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include <linux/compiler.h>
> +#include <linux/errno.h>
> +#include <linux/sched.h>
> +#include <linux/kvm_host.h>
> +#include <linux/kvm.h>
> +
> +#include <asm/unified.h>
> +#include <asm/ptrace.h>
> +#include <asm/cputype.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_coproc.h>
> +
> +/******************************************************************************
> + * Cortex-A15 Reset Values
> + */
> +
> +static const int a15_max_cpu_idx = 3;
> +
> +static struct kvm_regs a15_regs_reset = {
> +       .cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
> +};
> +
> +
> +/*******************************************************************************
> + * Exported reset function
> + */
> +
> +/**
> + * kvm_reset_vcpu - sets core registers and cp15 registers to reset value
> + * @vcpu: The VCPU pointer
> + *
> + * This function finds the right table above and sets the registers on the
> + * virtual CPU struct to their architecturally defined reset values.
> + */
> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> +{
> +       struct kvm_regs *cpu_reset;
> +
> +       switch (vcpu->arch.target) {
> +       case KVM_ARM_TARGET_CORTEX_A15:
> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
> +                       return -EINVAL;
> +               cpu_reset = &a15_regs_reset;
> +               vcpu->arch.midr = read_cpuid_id();
> +               break;
> +       default:
> +               return -ENODEV;
> +       }
> +
> +       /* Reset core registers */
> +       memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
> +
> +       /* Reset CP15 registers */
> +       kvm_reset_coprocs(vcpu);
> +
> +       return 0;
> +}

This is a nice way to plug in new CPUs but the way the rest of the code is
currently written, all the ARMv7 and Cortex-A15 code is merged together. I
*strongly* suggest you isolate this from the start, as it will help you see
what is architected and what is implementation-specific.

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
@ 2012-09-25 15:20     ` Will Deacon
  0 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-25 15:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Sep 15, 2012 at 04:35:08PM +0100, Christoffer Dall wrote:
> Targets KVM support for Cortex A-15 processors.
> 
> Contains all the framework components, make files, header files, some
> tracing functionality, and basic user space API.
> 
> Only supported core is Cortex-A15 for now.
> 
> Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
> 
> "Nothing to see here. Move along, move along..."

[...]

> diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
> new file mode 100644
> index 0000000..a13b582
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm.h
> @@ -0,0 +1,88 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_H__
> +#define __ARM_KVM_H__
> +
> +#include <asm/types.h>
> +
> +#define __KVM_HAVE_GUEST_DEBUG
> +
> +#define KVM_REG_SIZE(id)                                               \
> +       (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> +
> +struct kvm_regs {
> +       __u32 usr_regs[15];     /* R0_usr - R14_usr */
> +       __u32 svc_regs[3];      /* SP_svc, LR_svc, SPSR_svc */
> +       __u32 abt_regs[3];      /* SP_abt, LR_abt, SPSR_abt */
> +       __u32 und_regs[3];      /* SP_und, LR_und, SPSR_und */
> +       __u32 irq_regs[3];      /* SP_irq, LR_irq, SPSR_irq */
> +       __u32 fiq_regs[8];      /* R8_fiq - R14_fiq, SPSR_fiq */
> +       __u32 pc;               /* The program counter (r15) */
> +       __u32 cpsr;             /* The guest CPSR */
> +};
> +
> +/* Supported Processor Types */
> +#define KVM_ARM_TARGET_CORTEX_A15      (0xC0F)

So there's this define...

> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> new file mode 100644
> index 0000000..2f9d28e
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -0,0 +1,28 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_ARM_H__
> +#define __ARM_KVM_ARM_H__
> +
> +/* Supported Processor Types */
> +#define CORTEX_A15     (0xC0F)

... and then this one. Do we really need both?

> +/* Multiprocessor Affinity Register */
> +#define MPIDR_CPUID    (0x3 << 0)

I'm fairly sure we already have code under arch/arm/ for dealing with the
mpidr. Let's re-use that rather than reinventing it here.

> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> new file mode 100644
> index 0000000..44591f9
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -0,0 +1,30 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_ASM_H__
> +#define __ARM_KVM_ASM_H__
> +
> +#define ARM_EXCEPTION_RESET      0
> +#define ARM_EXCEPTION_UNDEFINED   1
> +#define ARM_EXCEPTION_SOFTWARE    2
> +#define ARM_EXCEPTION_PREF_ABORT  3
> +#define ARM_EXCEPTION_DATA_ABORT  4
> +#define ARM_EXCEPTION_IRQ        5
> +#define ARM_EXCEPTION_FIQ        6

Again, you have these defines (which look more suited to an enum type), but
later (in kvm_host.h) you have:

> +#define EXCEPTION_NONE      0
> +#define EXCEPTION_RESET     0x80
> +#define EXCEPTION_UNDEFINED 0x40
> +#define EXCEPTION_SOFTWARE  0x20
> +#define EXCEPTION_PREFETCH  0x10
> +#define EXCEPTION_DATA      0x08
> +#define EXCEPTION_IMPRECISE 0x04
> +#define EXCEPTION_IRQ       0x02
> +#define EXCEPTION_FIQ       0x01

Why the noise?

> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> new file mode 100644
> index 0000000..9e29335
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -0,0 +1,108 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_EMULATE_H__
> +#define __ARM_KVM_EMULATE_H__
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_asm.h>
> +
> +u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
> +
> +static inline u8 __vcpu_mode(u32 cpsr)
> +{
> +       u8 modes_table[32] = {
> +               0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
> +               0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
> +               MODE_USR,       /* 0x0 */
> +               MODE_FIQ,       /* 0x1 */
> +               MODE_IRQ,       /* 0x2 */
> +               MODE_SVC,       /* 0x3 */
> +               0xf, 0xf, 0xf,
> +               MODE_ABT,       /* 0x7 */
> +               0xf, 0xf, 0xf,
> +               MODE_UND,       /* 0xb */
> +               0xf, 0xf, 0xf,
> +               MODE_SYS        /* 0xf */
> +       };

These MODE_* definitions sound like our *_MODE definitions... except they're
not. It would probably be far more readable as a switch, but at least change
the name of those things!

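A sketch of the switch-based version, keyed on the architectural *_MODE encodings from <asm/ptrace.h> instead of a private lookup table (illustrative only):

static inline u8 __vcpu_mode(u32 cpsr)
{
	switch (cpsr & MODE_MASK) {
	case USR_MODE:		return MODE_USR;
	case FIQ_MODE:		return MODE_FIQ;
	case IRQ_MODE:		return MODE_IRQ;
	case SVC_MODE:		return MODE_SVC;
	case ABT_MODE:		return MODE_ABT;
	case UND_MODE:		return MODE_UND;
	case SYSTEM_MODE:	return MODE_SYS;
	default:		return 0xf;	/* invalid or unhandled mode */
	}
}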
> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
> +{
> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
> +       BUG_ON(mode == 0xf);
> +       return mode;
> +}

I noticed that you have a fair few BUG_ONs throughout the series. Fair
enough, but for hyp code is that really the right thing to do? Killing the
guest could make more sense, perhaps?

> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
> +{
> +       return vcpu_reg(vcpu, 15);
> +}

If you stick a struct pt_regs into struct kvm_regs, you could reuse ARM_pc
here etc.

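That is, a hypothetical layout along these lines (not what the patch currently defines) would let the ARM_pc/ARM_sp/ARM_cpsr accessors from <asm/ptrace.h> be reused directly:

struct kvm_regs {
	struct pt_regs usr_regs;	/* r0-r12, sp, lr, pc, cpsr */
	__u32 svc_regs[3];		/* SP_svc, LR_svc, SPSR_svc */
	/* ... remaining banked registers as before ... */
};

static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
{
	return (u32 *)&vcpu->arch.regs.usr_regs.ARM_pc;
}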
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> new file mode 100644
> index 0000000..24959f4
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -0,0 +1,172 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_HOST_H__
> +#define __ARM_KVM_HOST_H__
> +
> +#include <asm/kvm.h>
> +
> +#define KVM_MAX_VCPUS 4

NR_CPUS?

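That is, presumably just

	#define KVM_MAX_VCPUS	NR_CPUS

so the limit tracks the host configuration rather than a hard-coded 4 (sketch only; a dedicated Kconfig option would be another way to do it).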
> +#define KVM_MEMORY_SLOTS 32
> +#define KVM_PRIVATE_MEM_SLOTS 4
> +#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> +
> +#define NUM_FEATURES 0

Ha! No idea what that means, but hopefully there's less code to review
because of it :)

> +
> +/* We don't currently support large pages. */
> +#define KVM_HPAGE_GFN_SHIFT(x) 0
> +#define KVM_NR_PAGE_SIZES      1
> +#define KVM_PAGES_PER_HPAGE(x) (1UL<<31)
> +
> +struct kvm_vcpu;
> +u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> +int kvm_target_cpu(void);
> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> +void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
> +
> +struct kvm_arch {
> +       /* The VMID generation used for the virt. memory system */
> +       u64    vmid_gen;
> +       u32    vmid;
> +
> +       /* 1-level 2nd stage table and lock */
> +       spinlock_t pgd_lock;
> +       pgd_t *pgd;
> +
> +       /* VTTBR value associated with above pgd and vmid */
> +       u64    vttbr;
> +};
> +
> +#define EXCEPTION_NONE      0
> +#define EXCEPTION_RESET     0x80
> +#define EXCEPTION_UNDEFINED 0x40
> +#define EXCEPTION_SOFTWARE  0x20
> +#define EXCEPTION_PREFETCH  0x10
> +#define EXCEPTION_DATA      0x08
> +#define EXCEPTION_IMPRECISE 0x04
> +#define EXCEPTION_IRQ       0x02
> +#define EXCEPTION_FIQ       0x01
> +
> +#define KVM_NR_MEM_OBJS     40
> +
> +/*
> + * We don't want allocation failures within the mmu code, so we preallocate
> + * enough memory for a single page fault in a cache.
> + */
> +struct kvm_mmu_memory_cache {
> +       int nobjs;
> +       void *objects[KVM_NR_MEM_OBJS];
> +};
> +
> +/*
> + * Modes used for short-hand mode determinition in the world-switch code and
> + * in emulation code.
> + *
> + * Note: These indices do NOT correspond to the value of the CPSR mode bits!
> + */
> +enum vcpu_mode {
> +       MODE_FIQ = 0,
> +       MODE_IRQ,
> +       MODE_SVC,
> +       MODE_ABT,
> +       MODE_UND,
> +       MODE_USR,
> +       MODE_SYS
> +};

So the need for this enum is for indexing the array of modes, right? But
accesses to that array are already hidden behind an accessor function from
what I can tell, so I'd rather the arithmetic from cpsr -> index was
restricted to that function and the rest of the code just passed either the
raw mode or the full cpsr around.

> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> new file mode 100644
> index 0000000..fd6fa9b
> --- /dev/null
> +++ b/arch/arm/kvm/arm.c
> @@ -0,0 +1,345 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <linux/module.h>
> +#include <linux/vmalloc.h>
> +#include <linux/fs.h>
> +#include <linux/mman.h>
> +#include <linux/sched.h>
> +#include <trace/events/kvm.h>
> +
> +#define CREATE_TRACE_POINTS
> +#include "trace.h"
> +
> +#include <asm/unified.h>
> +#include <asm/uaccess.h>
> +#include <asm/ptrace.h>
> +#include <asm/mman.h>
> +#include <asm/cputype.h>
> +
> +#ifdef REQUIRES_VIRT
> +__asm__(".arch_extension       virt");
> +#endif
> +
> +int kvm_arch_hardware_enable(void *garbage)
> +{
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> +{
> +       return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> +}
> +
> +void kvm_arch_hardware_disable(void *garbage)
> +{
> +}
> +
> +int kvm_arch_hardware_setup(void)
> +{
> +       return 0;
> +}
> +
> +void kvm_arch_hardware_unsetup(void)
> +{
> +}
> +
> +void kvm_arch_check_processor_compat(void *rtn)
> +{
> +       *(int *)rtn = 0;
> +}
> +
> +void kvm_arch_sync_events(struct kvm *kvm)
> +{
> +}
> +
> +int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> +{
> +       if (type)
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> +       return VM_FAULT_SIGBUS;
> +}
> +
> +void kvm_arch_free_memslot(struct kvm_memory_slot *free,
> +                          struct kvm_memory_slot *dont)
> +{
> +}
> +
> +int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
> +{
> +       return 0;
> +}
> +
> +void kvm_arch_destroy_vm(struct kvm *kvm)
> +{
> +       int i;
> +
> +       for (i = 0; i < KVM_MAX_VCPUS; ++i) {
> +               if (kvm->vcpus[i]) {
> +                       kvm_arch_vcpu_free(kvm->vcpus[i]);
> +                       kvm->vcpus[i] = NULL;
> +               }
> +       }
> +}
> +
> +int kvm_dev_ioctl_check_extension(long ext)
> +{
> +       int r;
> +       switch (ext) {
> +       case KVM_CAP_USER_MEMORY:
> +       case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
> +       case KVM_CAP_ONE_REG:
> +               r = 1;
> +               break;
> +       case KVM_CAP_COALESCED_MMIO:
> +               r = KVM_COALESCED_MMIO_PAGE_OFFSET;
> +               break;
> +       default:
> +               r = 0;
> +               break;
> +       }
> +       return r;
> +}
> +
> +long kvm_arch_dev_ioctl(struct file *filp,
> +                       unsigned int ioctl, unsigned long arg)
> +{
> +       return -EINVAL;
> +}
> +
> +int kvm_arch_set_memory_region(struct kvm *kvm,
> +                              struct kvm_userspace_memory_region *mem,
> +                              struct kvm_memory_slot old,
> +                              int user_alloc)
> +{
> +       return 0;
> +}
> +
> +int kvm_arch_prepare_memory_region(struct kvm *kvm,
> +                                  struct kvm_memory_slot *memslot,
> +                                  struct kvm_memory_slot old,
> +                                  struct kvm_userspace_memory_region *mem,
> +                                  int user_alloc)
> +{
> +       return 0;
> +}
> +
> +void kvm_arch_commit_memory_region(struct kvm *kvm,
> +                                  struct kvm_userspace_memory_region *mem,
> +                                  struct kvm_memory_slot old,
> +                                  int user_alloc)
> +{
> +}
> +
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +}
> +
> +void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> +                                  struct kvm_memory_slot *slot)
> +{
> +}
> +
> +struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> +{
> +       int err;
> +       struct kvm_vcpu *vcpu;
> +
> +       vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
> +       if (!vcpu) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       err = kvm_vcpu_init(vcpu, kvm, id);
> +       if (err)
> +               goto free_vcpu;
> +
> +       return vcpu;
> +free_vcpu:
> +       kmem_cache_free(kvm_vcpu_cache, vcpu);
> +out:
> +       return ERR_PTR(err);
> +}
> +
> +void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
> +{
> +}
> +
> +void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> +{
> +       kvm_arch_vcpu_free(vcpu);
> +}
> +
> +int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
> +{
> +       return 0;
> +}
> +
> +int __attribute_const__ kvm_target_cpu(void)
> +{
> +       unsigned int midr;
> +
> +       midr = read_cpuid_id();
> +       switch ((midr >> 4) & 0xfff) {
> +       case KVM_ARM_TARGET_CORTEX_A15:
> +               return KVM_ARM_TARGET_CORTEX_A15;

I have this code already in perf_event.c. Can we move it somewhere common
and share it? You should also check that the implementor field is 0x41.

> +       default:
> +               return -EINVAL;
> +       }
> +}
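A sketch of the implementer check asked for above, using the MIDR field layout from the ARM ARM (bits [31:24] implementer, bits [15:4] primary part number):

int __attribute_const__ kvm_target_cpu(void)
{
	unsigned int midr = read_cpuid_id();

	if ((midr >> 24) != 0x41)	/* 0x41 == 'A', ARM Ltd. */
		return -EINVAL;

	switch ((midr >> 4) & 0xfff) {
	case KVM_ARM_TARGET_CORTEX_A15:
		return KVM_ARM_TARGET_CORTEX_A15;
	default:
		return -EINVAL;
	}
}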
> +
> +int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> +{
> +       return 0;
> +}
> +
> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> +{
> +}
> +
> +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> +{
> +}
> +
> +void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> +{
> +}
> +
> +int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
> +                                       struct kvm_guest_debug *dbg)
> +{
> +       return -EINVAL;
> +}
> +
> +
> +int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> +                                   struct kvm_mp_state *mp_state)
> +{
> +       return -EINVAL;
> +}
> +
> +int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> +                                   struct kvm_mp_state *mp_state)
> +{
> +       return -EINVAL;
> +}
> +
> +int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> +{
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
> +{
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> +       return -EINVAL;
> +}
> +
> +long kvm_arch_vcpu_ioctl(struct file *filp,
> +                        unsigned int ioctl, unsigned long arg)
> +{
> +       struct kvm_vcpu *vcpu = filp->private_data;
> +       void __user *argp = (void __user *)arg;
> +
> +       switch (ioctl) {
> +       case KVM_ARM_VCPU_INIT: {
> +               struct kvm_vcpu_init init;
> +
> +               if (copy_from_user(&init, argp, sizeof init))
> +                       return -EFAULT;
> +
> +               return kvm_vcpu_set_target(vcpu, &init);
> +
> +       }
> +       case KVM_SET_ONE_REG:
> +       case KVM_GET_ONE_REG: {
> +               struct kvm_one_reg reg;
> +               if (copy_from_user(&reg, argp, sizeof(reg)))
> +                       return -EFAULT;
> +               if (ioctl == KVM_SET_ONE_REG)
> +                       return kvm_arm_set_reg(vcpu, &reg);
> +               else
> +                       return kvm_arm_get_reg(vcpu, &reg);
> +       }
> +       case KVM_GET_REG_LIST: {
> +               struct kvm_reg_list __user *user_list = argp;
> +               struct kvm_reg_list reg_list;
> +               unsigned n;
> +
> +               if (copy_from_user(&reg_list, user_list, sizeof reg_list))
> +                       return -EFAULT;
> +               n = reg_list.n;
> +               reg_list.n = kvm_arm_num_regs(vcpu);
> +               if (copy_to_user(user_list, &reg_list, sizeof reg_list))
> +                       return -EFAULT;
> +               if (n < reg_list.n)
> +                       return -E2BIG;
> +               return kvm_arm_copy_reg_indices(vcpu, user_list->reg);

kvm_reg_list sounds like it could be done using a regset instead.

> diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
> new file mode 100644
> index 0000000..690bbb3
> --- /dev/null
> +++ b/arch/arm/kvm/emulate.c
> @@ -0,0 +1,127 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <asm/kvm_emulate.h>
> +
> +#define REG_OFFSET(_reg) \
> +       (offsetof(struct kvm_regs, _reg) / sizeof(u32))
> +
> +#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs[_num])
> +
> +static const unsigned long vcpu_reg_offsets[MODE_SYS + 1][16] = {
> +       /* FIQ Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7),
> +               REG_OFFSET(fiq_regs[0]), /* r8 */
> +               REG_OFFSET(fiq_regs[1]), /* r9 */
> +               REG_OFFSET(fiq_regs[2]), /* r10 */
> +               REG_OFFSET(fiq_regs[3]), /* r11 */
> +               REG_OFFSET(fiq_regs[4]), /* r12 */
> +               REG_OFFSET(fiq_regs[5]), /* r13 */
> +               REG_OFFSET(fiq_regs[6]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* IRQ Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(irq_regs[0]), /* r13 */
> +               REG_OFFSET(irq_regs[1]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* SVC Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(svc_regs[0]), /* r13 */
> +               REG_OFFSET(svc_regs[1]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* ABT Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(abt_regs[0]), /* r13 */
> +               REG_OFFSET(abt_regs[1]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* UND Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(und_regs[0]), /* r13 */
> +               REG_OFFSET(und_regs[1]), /* r14 */
> +               REG_OFFSET(pc)           /* r15 */
> +       },
> +
> +       /* USR Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(usr_regs[13]), /* r13 */
> +               REG_OFFSET(usr_regs[14]), /* r14 */
> +               REG_OFFSET(pc)            /* r15 */
> +       },
> +
> +       /* SYS Registers */
> +       {
> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
> +               USR_REG_OFFSET(12),
> +               REG_OFFSET(usr_regs[13]), /* r13 */
> +               REG_OFFSET(usr_regs[14]), /* r14 */
> +               REG_OFFSET(pc)            /* r15 */
> +       },
> +};
> +
> +/*
> + * Return a pointer to the register number valid in the specified mode of
> + * the virtual CPU.
> + */
> +u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
> +{
> +       u32 *reg_array = (u32 *)&vcpu->arch.regs;
> +
> +       BUG_ON(reg_num > 15);
> +       BUG_ON(mode > MODE_SYS);

Again, BUG_ON seems a bit OTT here. Also, this is where the mode => idx
calculation should happen.

> +       return reg_array + vcpu_reg_offsets[mode][reg_num];
> +}
> diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
> new file mode 100644
> index 0000000..3e38c95
> --- /dev/null
> +++ b/arch/arm/kvm/exports.c
> @@ -0,0 +1,21 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <linux/module.h>
> +
> +EXPORT_SYMBOL_GPL(smp_send_reschedule);

Erm...

We already have arch/arm/kernel/armksyms.c for exports -- please use that.
However, exporting such low-level operations sounds like a bad idea. How
realistic is kvm-as-a-module on ARM anyway?

> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> new file mode 100644
> index 0000000..19a5389
> --- /dev/null
> +++ b/arch/arm/kvm/guest.c
> @@ -0,0 +1,211 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <linux/module.h>
> +#include <linux/vmalloc.h>
> +#include <linux/fs.h>
> +#include <asm/uaccess.h>
> +#include <asm/kvm.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_emulate.h>
> +
> +#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
> +#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
> +
> +struct kvm_stats_debugfs_item debugfs_entries[] = {
> +       { NULL }
> +};
> +
> +int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
> +{
> +       return 0;
> +}
> +
> +static u64 core_reg_offset_from_id(u64 id)
> +{
> +       return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
> +}
> +
> +static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> +       u32 __user *uaddr = (u32 __user *)(long)reg->addr;
> +       struct kvm_regs *regs = &vcpu->arch.regs;
> +       u64 off;
> +
> +       if (KVM_REG_SIZE(reg->id) != 4)
> +               return -ENOENT;
> +
> +       /* Our ID is an index into the kvm_regs struct. */
> +       off = core_reg_offset_from_id(reg->id);
> +       if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
> +               return -ENOENT;
> +
> +       return put_user(((u32 *)regs)[off], uaddr);
> +}
> +
> +static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> +       u32 __user *uaddr = (u32 __user *)(long)reg->addr;
> +       struct kvm_regs *regs = &vcpu->arch.regs;
> +       u64 off, val;
> +
> +       if (KVM_REG_SIZE(reg->id) != 4)
> +               return -ENOENT;
> +
> +       /* Our ID is an index into the kvm_regs struct. */
> +       off = core_reg_offset_from_id(reg->id);
> +       if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
> +               return -ENOENT;
> +
> +       if (get_user(val, uaddr) != 0)
> +               return -EFAULT;
> +
> +       if (off == KVM_REG_ARM_CORE_REG(cpsr)) {
> +               if (__vcpu_mode(val) == 0xf)
> +                       return -EINVAL;
> +       }
> +
> +       ((u32 *)regs)[off] = val;
> +       return 0;
> +}
> +
> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> +{
> +       return -EINVAL;
> +}
> +
> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> +{
> +       return -EINVAL;
> +}

Again, all looks like this should be implemented using regsets from what I
can tell.

> diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
> new file mode 100644
> index 0000000..290a13d
> --- /dev/null
> +++ b/arch/arm/kvm/reset.c
> @@ -0,0 +1,74 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include <linux/compiler.h>
> +#include <linux/errno.h>
> +#include <linux/sched.h>
> +#include <linux/kvm_host.h>
> +#include <linux/kvm.h>
> +
> +#include <asm/unified.h>
> +#include <asm/ptrace.h>
> +#include <asm/cputype.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_coproc.h>
> +
> +/******************************************************************************
> + * Cortex-A15 Reset Values
> + */
> +
> +static const int a15_max_cpu_idx = 3;
> +
> +static struct kvm_regs a15_regs_reset = {
> +       .cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
> +};
> +
> +
> +/*******************************************************************************
> + * Exported reset function
> + */
> +
> +/**
> + * kvm_reset_vcpu - sets core registers and cp15 registers to reset value
> + * @vcpu: The VCPU pointer
> + *
> + * This function finds the right table above and sets the registers on the
> + * virtual CPU struct to their architecturally defined reset values.
> + */
> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> +{
> +       struct kvm_regs *cpu_reset;
> +
> +       switch (vcpu->arch.target) {
> +       case KVM_ARM_TARGET_CORTEX_A15:
> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
> +                       return -EINVAL;
> +               cpu_reset = &a15_regs_reset;
> +               vcpu->arch.midr = read_cpuid_id();
> +               break;
> +       default:
> +               return -ENODEV;
> +       }
> +
> +       /* Reset core registers */
> +       memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
> +
> +       /* Reset CP15 registers */
> +       kvm_reset_coprocs(vcpu);
> +
> +       return 0;
> +}

This is a nice way to plug in new CPUs but the way the rest of the code is
currently written, all the ARMv7 and Cortex-A15 code is merged together. I
*strongly* suggest you isolate this from the start, as it will help you see
what is architected and what is implementation-specific.

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 09/15] KVM: ARM: Inject IRQs and FIQs from userspace
  2012-09-15 15:35   ` Christoffer Dall
@ 2012-09-25 15:55     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-25 15:55 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 04:35:27PM +0100, Christoffer Dall wrote:
> diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
> index a13b582..131e632 100644
> --- a/arch/arm/include/asm/kvm.h
> +++ b/arch/arm/include/asm/kvm.h
> @@ -22,6 +22,7 @@
>  #include <asm/types.h>
>  
>  #define __KVM_HAVE_GUEST_DEBUG
> +#define __KVM_HAVE_IRQ_LINE
>  
>  #define KVM_REG_SIZE(id)						\
>  	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> @@ -85,4 +86,24 @@ struct kvm_reg_list {
>  #define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
>  #define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
>  
> +/* KVM_IRQ_LINE irq field index values */
> +#define KVM_ARM_IRQ_TYPE_SHIFT		24
> +#define KVM_ARM_IRQ_TYPE_MASK		0xff
> +#define KVM_ARM_IRQ_VCPU_SHIFT		16
> +#define KVM_ARM_IRQ_VCPU_MASK		0xff
> +#define KVM_ARM_IRQ_NUM_SHIFT		0
> +#define KVM_ARM_IRQ_NUM_MASK		0xffff
> +
> +/* irq_type field */
> +#define KVM_ARM_IRQ_TYPE_CPU		0
> +#define KVM_ARM_IRQ_TYPE_SPI		1
> +#define KVM_ARM_IRQ_TYPE_PPI		2
> +
> +/* out-of-kernel GIC cpu interrupt injection irq_number field */
> +#define KVM_ARM_IRQ_CPU_IRQ		0
> +#define KVM_ARM_IRQ_CPU_FIQ		1
> +
> +/* Highest supported SPI, from VGIC_NR_IRQS */
> +#define KVM_ARM_IRQ_GIC_MAX		127

This define, and those referring to PPIs and SPIs sound highly GIC-specific.
Is that really appropriate for kvm.h? Do you mandate a single GIC as the
only interrupt controller?

> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index 6e46541..0f641c1 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -74,6 +74,7 @@
>  #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
>  			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
>  			HCR_SWIO | HCR_TIDCP)
> +#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>  
>  /* Hyp System Control Register (HSCTLR) bits */
>  #define HSCTLR_TE	(1 << 30)
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index b97ebd0..8a87fc7 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -24,6 +24,7 @@
>  #include <linux/fs.h>
>  #include <linux/mman.h>
>  #include <linux/sched.h>
> +#include <linux/kvm.h>
>  #include <trace/events/kvm.h>
>  
>  #define CREATE_TRACE_POINTS
> @@ -271,6 +272,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
> +	vcpu->cpu = cpu;
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -311,6 +313,74 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	return -EINVAL;
>  }
>  
> +static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
> +{
> +	int bit_index;
> +	bool set;
> +	unsigned long *ptr;
> +
> +	if (number == KVM_ARM_IRQ_CPU_IRQ)
> +		bit_index = ffs(HCR_VI) - 1;
> +	else /* KVM_ARM_IRQ_CPU_FIQ */
> +		bit_index = ffs(HCR_VF) - 1;

__ffs

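For a mask with exactly one bit set the two are interchangeable; the difference is only that ffs() returns a 1-based position, hence the "- 1" above, while __ffs() already gives the 0-based bit index:

	bit_index = __ffs(HCR_VI);	/* rather than ffs(HCR_VI) - 1 */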
Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 10/15] KVM: ARM: World-switch implementation
  2012-09-15 15:35   ` Christoffer Dall
@ 2012-09-25 17:00     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-25 17:00 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 04:35:33PM +0100, Christoffer Dall wrote:
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index 1429d89..cd8fc86 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -13,6 +13,7 @@
>  #include <linux/sched.h>
>  #include <linux/mm.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/kvm_host.h>
>  #include <asm/cacheflush.h>
>  #include <asm/glue-df.h>
>  #include <asm/glue-pf.h>
> @@ -144,5 +145,48 @@ int main(void)
>    DEFINE(DMA_BIDIRECTIONAL,    DMA_BIDIRECTIONAL);
>    DEFINE(DMA_TO_DEVICE,                DMA_TO_DEVICE);
>    DEFINE(DMA_FROM_DEVICE,      DMA_FROM_DEVICE);
> +#ifdef CONFIG_KVM_ARM_HOST
> +  DEFINE(VCPU_KVM,             offsetof(struct kvm_vcpu, kvm));
> +  DEFINE(VCPU_MIDR,            offsetof(struct kvm_vcpu, arch.midr));
> +  DEFINE(VCPU_MPIDR,           offsetof(struct kvm_vcpu, arch.cp15[c0_MPIDR]));
> +  DEFINE(VCPU_CSSELR,          offsetof(struct kvm_vcpu, arch.cp15[c0_CSSELR]));
> +  DEFINE(VCPU_SCTLR,           offsetof(struct kvm_vcpu, arch.cp15[c1_SCTLR]));
> +  DEFINE(VCPU_CPACR,           offsetof(struct kvm_vcpu, arch.cp15[c1_CPACR]));
> +  DEFINE(VCPU_TTBR0,           offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR0]));
> +  DEFINE(VCPU_TTBR1,           offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR1]));
> +  DEFINE(VCPU_TTBCR,           offsetof(struct kvm_vcpu, arch.cp15[c2_TTBCR]));
> +  DEFINE(VCPU_DACR,            offsetof(struct kvm_vcpu, arch.cp15[c3_DACR]));
> +  DEFINE(VCPU_DFSR,            offsetof(struct kvm_vcpu, arch.cp15[c5_DFSR]));
> +  DEFINE(VCPU_IFSR,            offsetof(struct kvm_vcpu, arch.cp15[c5_IFSR]));
> +  DEFINE(VCPU_ADFSR,           offsetof(struct kvm_vcpu, arch.cp15[c5_ADFSR]));
> +  DEFINE(VCPU_AIFSR,           offsetof(struct kvm_vcpu, arch.cp15[c5_AIFSR]));
> +  DEFINE(VCPU_DFAR,            offsetof(struct kvm_vcpu, arch.cp15[c6_DFAR]));
> +  DEFINE(VCPU_IFAR,            offsetof(struct kvm_vcpu, arch.cp15[c6_IFAR]));
> +  DEFINE(VCPU_PRRR,            offsetof(struct kvm_vcpu, arch.cp15[c10_PRRR]));
> +  DEFINE(VCPU_NMRR,            offsetof(struct kvm_vcpu, arch.cp15[c10_NMRR]));
> +  DEFINE(VCPU_VBAR,            offsetof(struct kvm_vcpu, arch.cp15[c12_VBAR]));
> +  DEFINE(VCPU_CID,             offsetof(struct kvm_vcpu, arch.cp15[c13_CID]));
> +  DEFINE(VCPU_TID_URW,         offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URW]));
> +  DEFINE(VCPU_TID_URO,         offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URO]));
> +  DEFINE(VCPU_TID_PRIV,                offsetof(struct kvm_vcpu, arch.cp15[c13_TID_PRIV]));

Could you instead define an offset for arch.cp15, then use scaled offsets
from that in the assembly code?

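That is, roughly (assuming the c*_* indices are plain #defines visible to the assembler; the registers in the asm line are purely illustrative):

  DEFINE(VCPU_CP15,		offsetof(struct kvm_vcpu, arch.cp15));

and then in interrupts.S, since each cp15 entry is a u32:

	ldr	r2, [r0, #(VCPU_CP15 + (c1_SCTLR * 4))]	@ vcpu->arch.cp15[c1_SCTLR]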
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 8a87fc7..087f9d1 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -41,6 +41,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_emulate.h>
> 
>  #ifdef REQUIRES_VIRT
>  __asm__(".arch_extension       virt");
> @@ -50,6 +51,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>  static unsigned long hyp_default_vectors;
> 
> +/* The VMID used in the VTTBR */
> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> +static u8 kvm_next_vmid;
> +static DEFINE_SPINLOCK(kvm_vmid_lock);
> 
>  int kvm_arch_hardware_enable(void *garbage)
>  {
> @@ -273,6 +278,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>         vcpu->cpu = cpu;
> +       vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>  }
> 
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -305,12 +311,169 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> 
>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
>  {
> +       return v->mode == IN_GUEST_MODE;
> +}
> +
> +static void reset_vm_context(void *info)
> +{
> +       __kvm_flush_vm_context();
> +}
> +
> +/**
> + * need_new_vmid_gen - check that the VMID is still valid
> + * @kvm: The VM's VMID to check
> + *
> + * return true if there is a new generation of VMIDs being used
> + *
> + * The hardware supports only 256 values with the value zero reserved for the
> + * host, so we check if an assigned value belongs to a previous generation,
> + * which requires us to assign a new value. If we're the first to use a
> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> + * CPUs.
> + */
> +static bool need_new_vmid_gen(struct kvm *kvm)
> +{
> +       return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> +}
> +
> +/**
> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> + * @kvm        The guest that we are about to run
> + *
> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> + * caches and TLBs.
> + */
> +static void update_vttbr(struct kvm *kvm)
> +{
> +       phys_addr_t pgd_phys;
> +
> +       if (!need_new_vmid_gen(kvm))
> +               return;
> +
> +       spin_lock(&kvm_vmid_lock);
> +
> +       /* First user of a new VMID generation? */
> +       if (unlikely(kvm_next_vmid == 0)) {
> +               atomic64_inc(&kvm_vmid_gen);
> +               kvm_next_vmid = 1;
> +
> +               /*
> +                * On SMP we know no other CPUs can use this CPU's or
> +                * each other's VMID since the kvm_vmid_lock blocks
> +                * them from reentry to the guest.
> +                */
> +               on_each_cpu(reset_vm_context, NULL, 1);

Why on_each_cpu? The maintenance operations should be broadcast, right?

> +       }
> +
> +       kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> +       kvm->arch.vmid = kvm_next_vmid;
> +       kvm_next_vmid++;
> +
> +       /* update vttbr to be used with the new vmid */
> +       pgd_phys = virt_to_phys(kvm->arch.pgd);
> +       kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
> +                         & ~((2 << VTTBR_X) - 1);
> +       kvm->arch.vttbr |= (u64)(kvm->arch.vmid) << 48;
> +
> +       spin_unlock(&kvm_vmid_lock);
> +}

This smells like a watered-down version of the ASID allocator. Now, I can't
really see much code sharing going on here, but perhaps your case is
simpler... do you anticipate running more than 255 VMs in parallel? If not,
then you could just invalidate the relevant TLB entries on VM shutdown and
avoid the rollover case.

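A very rough sketch of that alternative (names are made up, sizes and locking are glossed over; __kvm_tlb_flush_vmid() is the hyp helper added earlier in this patch): hand out a VMID once per VM and invalidate its TLB entries on teardown:

static DECLARE_BITMAP(vmid_map, 256);	/* 8-bit VMIDs, 0 reserved for the host */

static int alloc_vmid(struct kvm *kvm)
{
	int vmid = find_next_zero_bit(vmid_map, 256, 1);

	if (vmid >= 256)
		return -EBUSY;		/* more than 255 live VMs */
	set_bit(vmid, vmid_map);
	kvm->arch.vmid = vmid;
	return 0;
}

static void free_vmid(struct kvm *kvm)
{
	__kvm_tlb_flush_vmid(kvm);	/* drop this VMID's stage-2 TLB entries */
	clear_bit(kvm->arch.vmid, vmid_map);
}

That trades the rollover machinery for a hard cap of 255 live VMs, which is what the question above boils down to.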
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index edf9ed5..cc9448b 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -23,6 +23,12 @@
>  #include <asm/asm-offsets.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_arm.h>
> +#include <asm/vfpmacros.h>
> +
> +#define VCPU_USR_REG(_reg_nr)  (VCPU_USR_REGS + (_reg_nr * 4))
> +#define VCPU_USR_SP            (VCPU_USR_REG(13))
> +#define VCPU_FIQ_REG(_reg_nr)  (VCPU_FIQ_REGS + (_reg_nr * 4))
> +#define VCPU_FIQ_SPSR          (VCPU_FIQ_REG(7))
> 
>         .text
>         .align  PAGE_SHIFT
> @@ -34,7 +40,33 @@ __kvm_hyp_code_start:
>  @  Flush per-VMID TLBs
>  @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

This comment syntax crops up a few times in your .S files but doesn't match
anything currently under arch/arm/. Please can you follow what we do there
and use /* */ ?

>  ENTRY(__kvm_tlb_flush_vmid)
> +       hvc     #0                      @ Switch to Hyp mode
> +       push    {r2, r3}
> +
> +       add     r0, r0, #KVM_VTTBR
> +       ldrd    r2, r3, [r0]
> +       mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
> +       isb
> +       mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
> +       dsb
> +       isb
> +       mov     r2, #0
> +       mov     r3, #0
> +       mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
> +       isb

Do you need this isb, given that you're about to do an hvc?

> +       pop     {r2, r3}
> +       hvc     #0                      @ Back to SVC
>         bx      lr
>  ENDPROC(__kvm_tlb_flush_vmid)
> 
> @@ -42,26 +74,702 @@ ENDPROC(__kvm_tlb_flush_vmid)
>  @  Flush TLBs and instruction caches of current CPU for all VMIDs
>  @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> 
> +/*
> + * void __kvm_flush_vm_context(void);
> + */
>  ENTRY(__kvm_flush_vm_context)
> +       hvc     #0                      @ switch to hyp-mode
> +
> +       mov     r0, #0                  @ rn parameter for c15 flushes is SBZ
> +       mcr     p15, 4, r0, c8, c7, 4   @ Invalidate Non-secure Non-Hyp TLB
> +       mcr     p15, 0, r0, c7, c5, 0   @ Invalidate instruction caches
> +       dsb
> +       isb

Likewise.

> +       hvc     #0                      @ switch back to svc-mode, see hyp_svc
>         bx      lr
>  ENDPROC(__kvm_flush_vm_context)
> 
> +/* These are simply for the macros to work - values don't have meaning */
> +.equ usr, 0
> +.equ svc, 1
> +.equ abt, 2
> +.equ und, 3
> +.equ irq, 4
> +.equ fiq, 5
> +
> +.macro store_mode_state base_reg, mode
> +       .if \mode == usr
> +       mrs     r2, SP_usr
> +       mov     r3, lr
> +       stmdb   \base_reg!, {r2, r3}
> +       .elseif \mode != fiq
> +       mrs     r2, SP_\mode
> +       mrs     r3, LR_\mode
> +       mrs     r4, SPSR_\mode
> +       stmdb   \base_reg!, {r2, r3, r4}
> +       .else
> +       mrs     r2, r8_fiq
> +       mrs     r3, r9_fiq
> +       mrs     r4, r10_fiq
> +       mrs     r5, r11_fiq
> +       mrs     r6, r12_fiq
> +       mrs     r7, SP_fiq
> +       mrs     r8, LR_fiq
> +       mrs     r9, SPSR_fiq
> +       stmdb   \base_reg!, {r2-r9}
> +       .endif
> +.endm

Perhaps you could stick the assembly macros into a separate file, like we do
in assembler.h, so the code is more readable and they can be reused if
need-be?

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* [PATCH 10/15] KVM: ARM: World-switch implementation
@ 2012-09-25 17:00     ` Will Deacon
  0 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-25 17:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Sep 15, 2012 at 04:35:33PM +0100, Christoffer Dall wrote:
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index 1429d89..cd8fc86 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -13,6 +13,7 @@
>  #include <linux/sched.h>
>  #include <linux/mm.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/kvm_host.h>
>  #include <asm/cacheflush.h>
>  #include <asm/glue-df.h>
>  #include <asm/glue-pf.h>
> @@ -144,5 +145,48 @@ int main(void)
>    DEFINE(DMA_BIDIRECTIONAL,    DMA_BIDIRECTIONAL);
>    DEFINE(DMA_TO_DEVICE,                DMA_TO_DEVICE);
>    DEFINE(DMA_FROM_DEVICE,      DMA_FROM_DEVICE);
> +#ifdef CONFIG_KVM_ARM_HOST
> +  DEFINE(VCPU_KVM,             offsetof(struct kvm_vcpu, kvm));
> +  DEFINE(VCPU_MIDR,            offsetof(struct kvm_vcpu, arch.midr));
> +  DEFINE(VCPU_MPIDR,           offsetof(struct kvm_vcpu, arch.cp15[c0_MPIDR]));
> +  DEFINE(VCPU_CSSELR,          offsetof(struct kvm_vcpu, arch.cp15[c0_CSSELR]));
> +  DEFINE(VCPU_SCTLR,           offsetof(struct kvm_vcpu, arch.cp15[c1_SCTLR]));
> +  DEFINE(VCPU_CPACR,           offsetof(struct kvm_vcpu, arch.cp15[c1_CPACR]));
> +  DEFINE(VCPU_TTBR0,           offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR0]));
> +  DEFINE(VCPU_TTBR1,           offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR1]));
> +  DEFINE(VCPU_TTBCR,           offsetof(struct kvm_vcpu, arch.cp15[c2_TTBCR]));
> +  DEFINE(VCPU_DACR,            offsetof(struct kvm_vcpu, arch.cp15[c3_DACR]));
> +  DEFINE(VCPU_DFSR,            offsetof(struct kvm_vcpu, arch.cp15[c5_DFSR]));
> +  DEFINE(VCPU_IFSR,            offsetof(struct kvm_vcpu, arch.cp15[c5_IFSR]));
> +  DEFINE(VCPU_ADFSR,           offsetof(struct kvm_vcpu, arch.cp15[c5_ADFSR]));
> +  DEFINE(VCPU_AIFSR,           offsetof(struct kvm_vcpu, arch.cp15[c5_AIFSR]));
> +  DEFINE(VCPU_DFAR,            offsetof(struct kvm_vcpu, arch.cp15[c6_DFAR]));
> +  DEFINE(VCPU_IFAR,            offsetof(struct kvm_vcpu, arch.cp15[c6_IFAR]));
> +  DEFINE(VCPU_PRRR,            offsetof(struct kvm_vcpu, arch.cp15[c10_PRRR]));
> +  DEFINE(VCPU_NMRR,            offsetof(struct kvm_vcpu, arch.cp15[c10_NMRR]));
> +  DEFINE(VCPU_VBAR,            offsetof(struct kvm_vcpu, arch.cp15[c12_VBAR]));
> +  DEFINE(VCPU_CID,             offsetof(struct kvm_vcpu, arch.cp15[c13_CID]));
> +  DEFINE(VCPU_TID_URW,         offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URW]));
> +  DEFINE(VCPU_TID_URO,         offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URO]));
> +  DEFINE(VCPU_TID_PRIV,                offsetof(struct kvm_vcpu, arch.cp15[c13_TID_PRIV]));

Could you instead define an offset for arch.cp15, then use scaled offsets
from that in the assembly code?
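
A stand-alone illustration of that suggestion, i.e. one exported VCPU_CP15
offset plus index scaling instead of a DEFINE per register (the struct
layout and index names below are placeholders, not the real kvm_vcpu):

/* Self-contained illustration of the suggestion above: export a single
 * VCPU_CP15 offset and let the assembly scale by the register index.
 * Struct layout and index names are stand-ins, not the real definitions.
 */
#include <stddef.h>
#include <stdio.h>

enum { c0_MPIDR = 0, c1_SCTLR, c2_TTBR0, NR_CP15_REGS };

struct vcpu_arch { unsigned int cp15[NR_CP15_REGS]; };
struct kvm_vcpu  { long pad; struct vcpu_arch arch; };

#define VCPU_CP15		offsetof(struct kvm_vcpu, arch.cp15)
#define VCPU_CP15_REG(idx)	(VCPU_CP15 + (idx) * sizeof(unsigned int))

int main(void)
{
	/* The asm would then do e.g. ldr rX, [vcpu, #VCPU_CP15_REG(c1_SCTLR)]
	 * with only VCPU_CP15 coming from asm-offsets.c. */
	printf("VCPU_CP15         = %zu\n", VCPU_CP15);
	printf("VCPU_CP15(SCTLR)  = %zu\n", VCPU_CP15_REG(c1_SCTLR));
	return 0;
}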

> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 8a87fc7..087f9d1 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -41,6 +41,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_emulate.h>
> 
>  #ifdef REQUIRES_VIRT
>  __asm__(".arch_extension       virt");
> @@ -50,6 +51,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>  static unsigned long hyp_default_vectors;
> 
> +/* The VMID used in the VTTBR */
> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> +static u8 kvm_next_vmid;
> +static DEFINE_SPINLOCK(kvm_vmid_lock);
> 
>  int kvm_arch_hardware_enable(void *garbage)
>  {
> @@ -273,6 +278,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>         vcpu->cpu = cpu;
> +       vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>  }
> 
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -305,12 +311,169 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> 
>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
>  {
> +       return v->mode == IN_GUEST_MODE;
> +}
> +
> +static void reset_vm_context(void *info)
> +{
> +       __kvm_flush_vm_context();
> +}
> +
> +/**
> + * need_new_vmid_gen - check that the VMID is still valid
> + * @kvm: The VM's VMID to check
> + *
> + * return true if there is a new generation of VMIDs being used
> + *
> + * The hardware supports only 256 values with the value zero reserved for the
> + * host, so we check if an assigned value belongs to a previous generation,
> + * which requires us to assign a new value. If we're the first to use a
> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> + * CPUs.
> + */
> +static bool need_new_vmid_gen(struct kvm *kvm)
> +{
> +       return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> +}
> +
> +/**
> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> + * @kvm:       The guest that we are about to run
> + *
> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> + * caches and TLBs.
> + */
> +static void update_vttbr(struct kvm *kvm)
> +{
> +       phys_addr_t pgd_phys;
> +
> +       if (!need_new_vmid_gen(kvm))
> +               return;
> +
> +       spin_lock(&kvm_vmid_lock);
> +
> +       /* First user of a new VMID generation? */
> +       if (unlikely(kvm_next_vmid == 0)) {
> +               atomic64_inc(&kvm_vmid_gen);
> +               kvm_next_vmid = 1;
> +
> +               /*
> +                * On SMP we know no other CPUs can use this CPU's or
> +                * each other's VMID since the kvm_vmid_lock blocks
> +                * them from reentry to the guest.
> +                */
> +               on_each_cpu(reset_vm_context, NULL, 1);

Why on_each_cpu? The maintenance operations should be broadcast, right?

> +       }
> +
> +       kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> +       kvm->arch.vmid = kvm_next_vmid;
> +       kvm_next_vmid++;
> +
> +       /* update vttbr to be used with the new vmid */
> +       pgd_phys = virt_to_phys(kvm->arch.pgd);
> +       kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
> +                         & ~((2 << VTTBR_X) - 1);
> +       kvm->arch.vttbr |= (u64)(kvm->arch.vmid) << 48;
> +
> +       spin_unlock(&kvm_vmid_lock);
> +}

This smells like a watered-down version of the ASID allocator. Now, I can't
really see much code sharing going on here, but perhaps your case is
simpler... do you anticipate running more than 255 VMs in parallel? If not,
then you could just invalidate the relevant TLB entries on VM shutdown and
avoid the rollover case.
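
The generation logic itself is small enough to model in user space; this
sketch mirrors the quoted need_new_vmid_gen()/update_vttbr() bookkeeping
with the lock and the real flush left out:

/* Toy, single-threaded model of the generation scheme quoted above: 255
 * usable VMIDs per generation; when they run out, bump the generation
 * (which invalidates every VM's VMID), flush, and hand out VMIDs from 1
 * again.  The spinlock and the real TLB/icache flush are omitted.
 */
#include <stdint.h>
#include <stdio.h>

struct vm { uint64_t vmid_gen; uint8_t vmid; };

static uint64_t kvm_vmid_gen = 1;
static uint8_t  kvm_next_vmid = 1;

static void flush_all_stage2_tlbs(void) { printf("  [flush on all CPUs]\n"); }

static void assign_vmid(struct vm *vm)
{
	if (vm->vmid_gen == kvm_vmid_gen)	/* need_new_vmid_gen() == false */
		return;

	if (kvm_next_vmid == 0) {		/* first user of a new generation */
		kvm_vmid_gen++;
		kvm_next_vmid = 1;
		flush_all_stage2_tlbs();
	}

	vm->vmid_gen = kvm_vmid_gen;
	vm->vmid = kvm_next_vmid++;		/* u8: wraps to 0 after 255 */
}

int main(void)
{
	struct vm vms[300] = { { 0 } };

	for (int i = 0; i < 300; i++) {
		assign_vmid(&vms[i]);
		if (i < 2 || i == 254 || i == 255)
			printf("vm %3d -> vmid %3u, gen %llu\n", i,
			       (unsigned)vms[i].vmid,
			       (unsigned long long)vms[i].vmid_gen);
	}
	return 0;
}

The rollover branch is exactly the case being questioned here: with well
under 255 VMs alive at once it never fires, and invalidating a VM's entries
at teardown would be the alternative.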

> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index edf9ed5..cc9448b 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -23,6 +23,12 @@
>  #include <asm/asm-offsets.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_arm.h>
> +#include <asm/vfpmacros.h>
> +
> +#define VCPU_USR_REG(_reg_nr)  (VCPU_USR_REGS + (_reg_nr * 4))
> +#define VCPU_USR_SP            (VCPU_USR_REG(13))
> +#define VCPU_FIQ_REG(_reg_nr)  (VCPU_FIQ_REGS + (_reg_nr * 4))
> +#define VCPU_FIQ_SPSR          (VCPU_FIQ_REG(7))
> 
>         .text
>         .align  PAGE_SHIFT
> @@ -34,7 +40,33 @@ __kvm_hyp_code_start:
>  @  Flush per-VMID TLBs
>  @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

This comment syntax crops up a few times in your .S files but doesn't match
anything currently under arch/arm/. Please can you follow what we do there
and use /* */ ?

>  ENTRY(__kvm_tlb_flush_vmid)
> +       hvc     #0                      @ Switch to Hyp mode
> +       push    {r2, r3}
> +
> +       add     r0, r0, #KVM_VTTBR
> +       ldrd    r2, r3, [r0]
> +       mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
> +       isb
> +       mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
> +       dsb
> +       isb
> +       mov     r2, #0
> +       mov     r3, #0
> +       mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
> +       isb

Do you need this isb, given that you're about to do an hvc?

> +       pop     {r2, r3}
> +       hvc     #0                      @ Back to SVC
>         bx      lr
>  ENDPROC(__kvm_tlb_flush_vmid)
> 
> @@ -42,26 +74,702 @@ ENDPROC(__kvm_tlb_flush_vmid)
>  @  Flush TLBs and instruction caches of current CPU for all VMIDs
>  @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> 
> +/*
> + * void __kvm_flush_vm_context(void);
> + */
>  ENTRY(__kvm_flush_vm_context)
> +       hvc     #0                      @ switch to hyp-mode
> +
> +       mov     r0, #0                  @ rn parameter for c15 flushes is SBZ
> +       mcr     p15, 4, r0, c8, c7, 4   @ Invalidate Non-secure Non-Hyp TLB
> +       mcr     p15, 0, r0, c7, c5, 0   @ Invalidate instruction caches
> +       dsb
> +       isb

Likewise.

> +       hvc     #0                      @ switch back to svc-mode, see hyp_svc
>         bx      lr
>  ENDPROC(__kvm_flush_vm_context)
> 
> +/* These are simply for the macros to work - the values don't have meaning */
> +.equ usr, 0
> +.equ svc, 1
> +.equ abt, 2
> +.equ und, 3
> +.equ irq, 4
> +.equ fiq, 5
> +
> +.macro store_mode_state base_reg, mode
> +       .if \mode == usr
> +       mrs     r2, SP_usr
> +       mov     r3, lr
> +       stmdb   \base_reg!, {r2, r3}
> +       .elseif \mode != fiq
> +       mrs     r2, SP_\mode
> +       mrs     r3, LR_\mode
> +       mrs     r4, SPSR_\mode
> +       stmdb   \base_reg!, {r2, r3, r4}
> +       .else
> +       mrs     r2, r8_fiq
> +       mrs     r3, r9_fiq
> +       mrs     r4, r10_fiq
> +       mrs     r5, r11_fiq
> +       mrs     r6, r12_fiq
> +       mrs     r7, SP_fiq
> +       mrs     r8, LR_fiq
> +       mrs     r9, SPSR_fiq
> +       stmdb   \base_reg!, {r2-r9}
> +       .endif
> +.endm

Perhaps you could stick the assembly macros into a separate file, like we do
in assembler.h, so the code is more readable and they can be reused if
need-be?

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 15/15] KVM: ARM: Guest wait-for-interrupts (WFI) support
  2012-09-15 15:36   ` Christoffer Dall
@ 2012-09-25 17:04     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-25 17:04 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

On Sat, Sep 15, 2012 at 04:36:05PM +0100, Christoffer Dall wrote:
> From: Christoffer Dall <cdall@cs.columbia.edu>
> 
> When the guest executes a WFI instruction the operation is trapped to
> KVM, which emulates the instruction in software. There is no correlation
> between a guest executing a WFI instruction and actually putting the
> hardware into a low-power mode, since a KVM guest is essentially a
> process and the WFI instruction can be seen as a 'sleep' call from this
> process. Therefore, we block the vcpu when the guest executes a wfi
> instruction and the IRQ or FIQ lines are not raised.
> 
> When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we
> signal the VCPU thread and unflag the VCPU to no longer wait for
> interrupts.
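
The blocking behaviour described here, reduced to a stand-alone user-space
model (built with -pthread; this illustrates the description only, not the
patch code):

/* User-space model of the behaviour described above: a "vcpu" thread that
 * hits WFI blocks until an "irq line" is raised from another thread.  This
 * only models the description; it is not the code from the patch.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static bool irq_pending;

static void *vcpu_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	printf("vcpu: WFI with no pending interrupt, blocking\n");
	while (!irq_pending)			/* nothing to do: sleep */
		pthread_cond_wait(&wake, &lock);
	irq_pending = false;
	pthread_mutex_unlock(&lock);
	printf("vcpu: interrupt raised, resuming guest\n");
	return NULL;
}

int main(void)
{
	pthread_t vcpu;

	pthread_create(&vcpu, NULL, vcpu_thread, NULL);
	sleep(1);				/* only to order the printfs */

	pthread_mutex_lock(&lock);		/* the KVM_IRQ_LINE side */
	irq_pending = true;
	pthread_cond_signal(&wake);
	pthread_mutex_unlock(&lock);

	pthread_join(vcpu, NULL);
	return 0;
}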

Seems a bit strange tagging this small addition on the end of this series.
Can you merge it in with the rest?

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 10/15] KVM: ARM: World-switch implementation
  2012-09-25 17:00     ` Will Deacon
@ 2012-09-25 17:15       ` Peter Maydell
  -1 siblings, 0 replies; 164+ messages in thread
From: Peter Maydell @ 2012-09-25 17:15 UTC (permalink / raw)
  To: Will Deacon; +Cc: Christoffer Dall, linux-arm-kernel, kvm, kvmarm

On 25 September 2012 18:00, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:35:33PM +0100, Christoffer Dall wrote:
>>  ENTRY(__kvm_tlb_flush_vmid)
>> +       hvc     #0                      @ Switch to Hyp mode
>> +       push    {r2, r3}
>> +
>> +       add     r0, r0, #KVM_VTTBR
>> +       ldrd    r2, r3, [r0]
>> +       mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>> +       isb
>> +       mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
>> +       dsb
>> +       isb
>> +       mov     r2, #0
>> +       mov     r3, #0
>> +       mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
>> +       isb
>
> Do you need this isb, given that you're about to do an hvc?
>
>> +       pop     {r2, r3}
>> +       hvc     #0                      @ Back to SVC
>>         bx      lr

...you probably don't want to do the memory accesses involved
in the 'pop' under the wrong VMID ?

-- PMM

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 10/15] KVM: ARM: World-switch implementation
  2012-09-25 17:15       ` Peter Maydell
@ 2012-09-25 17:42         ` Marc Zyngier
  -1 siblings, 0 replies; 164+ messages in thread
From: Marc Zyngier @ 2012-09-25 17:42 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Will Deacon, Christoffer Dall, kvm, linux-arm-kernel, kvmarm

On Tue, 25 Sep 2012 18:15:50 +0100, Peter Maydell
<peter.maydell@linaro.org> wrote:
> On 25 September 2012 18:00, Will Deacon <will.deacon@arm.com> wrote:
>> On Sat, Sep 15, 2012 at 04:35:33PM +0100, Christoffer Dall wrote:
>>>  ENTRY(__kvm_tlb_flush_vmid)
>>> +       hvc     #0                      @ Switch to Hyp mode
>>> +       push    {r2, r3}
>>> +
>>> +       add     r0, r0, #KVM_VTTBR
>>> +       ldrd    r2, r3, [r0]
>>> +       mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>>> +       isb
>>> +       mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
>>> +       dsb
>>> +       isb
>>> +       mov     r2, #0
>>> +       mov     r3, #0
>>> +       mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
>>> +       isb
>>
>> Do you need this isb, given that you're about to do an hvc?
>>
>>> +       pop     {r2, r3}
>>> +       hvc     #0                      @ Back to SVC
>>>         bx      lr
> 
> ...you probably don't want to do the memory accesses involved
> in the 'pop' under the wrong VMID ?

Well, we're still in HYP mode when performing the pop, so the VMID is
pretty much irrelevant. Same for the initial push, actually. As long as
we're sure VTTBR has been updated when we do the exception return, I think
we're safe.

        M.
-- 
Fast, cheap, reliable. Pick two.

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-25 15:20     ` Will Deacon
@ 2012-09-26  1:43       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-26  1:43 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm, rusty.russell, avi

On 09/25/2012 11:20 AM, Will Deacon wrote:
> On Sat, Sep 15, 2012 at 04:35:08PM +0100, Christoffer Dall wrote:
>> Targets KVM support for Cortex A-15 processors.
>>
>> Contains all the framework components, make files, header files, some
>> tracing functionality, and basic user space API.
>>
>> Only supported core is Cortex-A15 for now.
>>
>> Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
>>
>> “Nothing to see here. Move along, move along..."
>
> [...]
>
>> diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
>> new file mode 100644
>> index 0000000..a13b582
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm.h
>> @@ -0,0 +1,88 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_H__
>> +#define __ARM_KVM_H__
>> +
>> +#include <asm/types.h>
>> +
>> +#define __KVM_HAVE_GUEST_DEBUG
>> +
>> +#define KVM_REG_SIZE(id)                                               \
>> +       (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
>> +
>> +struct kvm_regs {
>> +       __u32 usr_regs[15];     /* R0_usr - R14_usr */
>> +       __u32 svc_regs[3];      /* SP_svc, LR_svc, SPSR_svc */
>> +       __u32 abt_regs[3];      /* SP_abt, LR_abt, SPSR_abt */
>> +       __u32 und_regs[3];      /* SP_und, LR_und, SPSR_und */
>> +       __u32 irq_regs[3];      /* SP_irq, LR_irq, SPSR_irq */
>> +       __u32 fiq_regs[8];      /* R8_fiq - R14_fiq, SPSR_fiq */
>> +       __u32 pc;               /* The program counter (r15) */
>> +       __u32 cpsr;             /* The guest CPSR */
>> +};
>> +
>> +/* Supported Processor Types */
>> +#define KVM_ARM_TARGET_CORTEX_A15      (0xC0F)
>
> So there's this define...
>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> new file mode 100644
>> index 0000000..2f9d28e
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -0,0 +1,28 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_ARM_H__
>> +#define __ARM_KVM_ARM_H__
>> +
>> +/* Supported Processor Types */
>> +#define CORTEX_A15     (0xC0F)
>
> ... and then this one. Do we really need both?
>

no, we don't

>> +/* Multiprocessor Affinity Register */
>> +#define MPIDR_CPUID    (0x3 << 0)
>
> I'm fairly sure we already have code under arch/arm/ for dealing with the
> mpidr. Let's re-use that rather than reinventing it here.
>

I see some defines in topology.c - do you want some of these factored 
out into a header file that we can then also use from kvm? If so, where?

>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>> new file mode 100644
>> index 0000000..44591f9
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -0,0 +1,30 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_ASM_H__
>> +#define __ARM_KVM_ASM_H__
>> +
>> +#define ARM_EXCEPTION_RESET      0
>> +#define ARM_EXCEPTION_UNDEFINED   1
>> +#define ARM_EXCEPTION_SOFTWARE    2
>> +#define ARM_EXCEPTION_PREF_ABORT  3
>> +#define ARM_EXCEPTION_DATA_ABORT  4
>> +#define ARM_EXCEPTION_IRQ        5
>> +#define ARM_EXCEPTION_FIQ        6
>
> Again, you have these defines (which look more suited to an enum type), but
> later (in kvm_host.h) you have:

well, unless I'm missing some known trick, assembly code doesn't like enums?

>
>> +#define EXCEPTION_NONE      0
>> +#define EXCEPTION_RESET     0x80
>> +#define EXCEPTION_UNDEFINED 0x40
>> +#define EXCEPTION_SOFTWARE  0x20
>> +#define EXCEPTION_PREFETCH  0x10
>> +#define EXCEPTION_DATA      0x08
>> +#define EXCEPTION_IMPRECISE 0x04
>> +#define EXCEPTION_IRQ       0x02
>> +#define EXCEPTION_FIQ       0x01
>
> Why the noise?
>

these are simply cruft from a previous life of KVM/ARM.

>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>> new file mode 100644
>> index 0000000..9e29335
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -0,0 +1,108 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_EMULATE_H__
>> +#define __ARM_KVM_EMULATE_H__
>> +
>> +#include <linux/kvm_host.h>
>> +#include <asm/kvm_asm.h>
>> +
>> +u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
>> +
>> +static inline u8 __vcpu_mode(u32 cpsr)
>> +{
>> +       u8 modes_table[32] = {
>> +               0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
>> +               0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
>> +               MODE_USR,       /* 0x0 */
>> +               MODE_FIQ,       /* 0x1 */
>> +               MODE_IRQ,       /* 0x2 */
>> +               MODE_SVC,       /* 0x3 */
>> +               0xf, 0xf, 0xf,
>> +               MODE_ABT,       /* 0x7 */
>> +               0xf, 0xf, 0xf,
>> +               MODE_UND,       /* 0xb */
>> +               0xf, 0xf, 0xf,
>> +               MODE_SYS        /* 0xf */
>> +       };
>
> These MODE_* definitions sound like our *_MODE definitions... except they're
> not. It would probably be far more readable as a switch, but at least change
> the name of those things!


fair enough, they're renamed to VCPU_XXX_MODE

>> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
>> +{
>> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
>> +       BUG_ON(mode == 0xf);
>> +       return mode;
>> +}
>
> I noticed that you have a fair few BUG_ONs throughout the series. Fair
> enough, but for hyp code is that really the right thing to do? Killing the
> guest could make more sense, perhaps?

the idea is that the remaining BUG_ONs are genuine bugs that we want to
catch explicitly on the host. We have made a pass over the code to change
all the BUG_ONs that can be provoked by the guest so that they inject the
proper exceptions into the guest instead. If you find places where this is
not the case, it should be changed, so do let me know.

>
>> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
>> +{
>> +       return vcpu_reg(vcpu, 15);
>> +}
>
> If you stick a struct pt_regs into struct kvm_regs, you could reuse ARM_pc
> here etc.
>

I prefer not to, because we'd have those registers presumably for usr
mode and then only define the others explicitly. I think it's much
clearer to look at kvm_regs as it is today.


>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> new file mode 100644
>> index 0000000..24959f4
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -0,0 +1,172 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_HOST_H__
>> +#define __ARM_KVM_HOST_H__
>> +
>> +#include <asm/kvm.h>
>> +
>> +#define KVM_MAX_VCPUS 4
>
> NR_CPUS?
>

well this is defined by KVM generic code, and is common for other
architectures.

>> +#define KVM_MEMORY_SLOTS 32
>> +#define KVM_PRIVATE_MEM_SLOTS 4
>> +#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>> +
>> +#define NUM_FEATURES 0
>
> Ha! No idea what that means, but hopefully there's less code to review
> because of it :)
>

that's actually true.

will rename to KVM_VCPU_MAX_FEATURES
(or do you want NR in this case? :-\)

>> +
>> +/* We don't currently support large pages. */
>> +#define KVM_HPAGE_GFN_SHIFT(x) 0
>> +#define KVM_NR_PAGE_SIZES      1
>> +#define KVM_PAGES_PER_HPAGE(x) (1UL<<31)
>> +
>> +struct kvm_vcpu;
>> +u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>> +int kvm_target_cpu(void);
>> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>> +void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
>> +
>> +struct kvm_arch {
>> +       /* The VMID generation used for the virt. memory system */
>> +       u64    vmid_gen;
>> +       u32    vmid;
>> +
>> +       /* 1-level 2nd stage table and lock */
>> +       spinlock_t pgd_lock;
>> +       pgd_t *pgd;
>> +
>> +       /* VTTBR value associated with above pgd and vmid */
>> +       u64    vttbr;
>> +};
>> +
>> +#define EXCEPTION_NONE      0
>> +#define EXCEPTION_RESET     0x80
>> +#define EXCEPTION_UNDEFINED 0x40
>> +#define EXCEPTION_SOFTWARE  0x20
>> +#define EXCEPTION_PREFETCH  0x10
>> +#define EXCEPTION_DATA      0x08
>> +#define EXCEPTION_IMPRECISE 0x04
>> +#define EXCEPTION_IRQ       0x02
>> +#define EXCEPTION_FIQ       0x01
>> +
>> +#define KVM_NR_MEM_OBJS     40
>> +
>> +/*
>> + * We don't want allocation failures within the mmu code, so we preallocate
>> + * enough memory for a single page fault in a cache.
>> + */
>> +struct kvm_mmu_memory_cache {
>> +       int nobjs;
>> +       void *objects[KVM_NR_MEM_OBJS];
>> +};
>> +
>> +/*
>> + * Modes used for short-hand mode determination in the world-switch code and
>> + * in emulation code.
>> + *
>> + * Note: These indices do NOT correspond to the value of the CPSR mode bits!
>> + */
>> +enum vcpu_mode {
>> +       MODE_FIQ = 0,
>> +       MODE_IRQ,
>> +       MODE_SVC,
>> +       MODE_ABT,
>> +       MODE_UND,
>> +       MODE_USR,
>> +       MODE_SYS
>> +};
>
> So the need for this enum is for indexing the array of modes, right? But
> accesses to that array are already hidden behind an accessor function from
> what I can tell, so I'd rather the arithmetic from cpsr -> index was
> restricted to that function and the rest of the code just passed either the
> raw mode or the full cpsr around.
>

good point, this was really useful in that prior life of kvm/arm where 
we did a bunch of emulation and decoding all over the place. I'll send 
out a v2 with this reworked.
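
One possible shape of that rework, sketched stand-alone: callers pass the
raw CPSR (or the vcpu) and the accessor does the mode-to-row mapping itself,
using the architectural CPSR mode encodings:

/* One possible shape of the rework: the accessor takes the raw CPSR and
 * does the mode -> table-row mapping itself, so callers never see the
 * index.  The offsets table itself is elided; only the lookup changes.
 */
#include <stdio.h>

enum { ROW_FIQ, ROW_IRQ, ROW_SVC, ROW_ABT, ROW_UND, ROW_USR, ROW_SYS };

static int mode_to_row(unsigned int cpsr)
{
	switch (cpsr & 0x1f) {		/* architectural CPSR.M encodings */
	case 0x10: return ROW_USR;
	case 0x11: return ROW_FIQ;
	case 0x12: return ROW_IRQ;
	case 0x13: return ROW_SVC;
	case 0x17: return ROW_ABT;
	case 0x1b: return ROW_UND;
	case 0x1f: return ROW_SYS;
	default:   return -1;		/* unused/unpredictable encoding */
	}
}

/*
 * vcpu_reg() would then look roughly like:
 *
 *	u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num)
 *	{
 *		int row = mode_to_row(vcpu->arch.regs.cpsr);
 *		...
 *		return (u32 *)&vcpu->arch.regs +
 *			vcpu_reg_offsets[row][reg_num];
 *	}
 */
int main(void)
{
	printf("SVC (0x13) -> row %d\n", mode_to_row(0x13));
	printf("FIQ (0x11) -> row %d\n", mode_to_row(0x11));
	return 0;
}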

>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> new file mode 100644
>> index 0000000..fd6fa9b
>> --- /dev/null
>> +++ b/arch/arm/kvm/arm.c
>> @@ -0,0 +1,345 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#include <linux/errno.h>
>> +#include <linux/err.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/module.h>
>> +#include <linux/vmalloc.h>
>> +#include <linux/fs.h>
>> +#include <linux/mman.h>
>> +#include <linux/sched.h>
>> +#include <trace/events/kvm.h>
>> +
>> +#define CREATE_TRACE_POINTS
>> +#include "trace.h"
>> +
>> +#include <asm/unified.h>
>> +#include <asm/uaccess.h>
>> +#include <asm/ptrace.h>
>> +#include <asm/mman.h>
>> +#include <asm/cputype.h>
>> +
>> +#ifdef REQUIRES_VIRT
>> +__asm__(".arch_extension       virt");
>> +#endif
>> +
>> +int kvm_arch_hardware_enable(void *garbage)
>> +{
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>> +{
>> +       return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
>> +}
>> +
>> +void kvm_arch_hardware_disable(void *garbage)
>> +{
>> +}
>> +
>> +int kvm_arch_hardware_setup(void)
>> +{
>> +       return 0;
>> +}
>> +
>> +void kvm_arch_hardware_unsetup(void)
>> +{
>> +}
>> +
>> +void kvm_arch_check_processor_compat(void *rtn)
>> +{
>> +       *(int *)rtn = 0;
>> +}
>> +
>> +void kvm_arch_sync_events(struct kvm *kvm)
>> +{
>> +}
>> +
>> +int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>> +{
>> +       if (type)
>> +               return -EINVAL;
>> +
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>> +{
>> +       return VM_FAULT_SIGBUS;
>> +}
>> +
>> +void kvm_arch_free_memslot(struct kvm_memory_slot *free,
>> +                          struct kvm_memory_slot *dont)
>> +{
>> +}
>> +
>> +int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
>> +{
>> +       return 0;
>> +}
>> +
>> +void kvm_arch_destroy_vm(struct kvm *kvm)
>> +{
>> +       int i;
>> +
>> +       for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>> +               if (kvm->vcpus[i]) {
>> +                       kvm_arch_vcpu_free(kvm->vcpus[i]);
>> +                       kvm->vcpus[i] = NULL;
>> +               }
>> +       }
>> +}
>> +
>> +int kvm_dev_ioctl_check_extension(long ext)
>> +{
>> +       int r;
>> +       switch (ext) {
>> +       case KVM_CAP_USER_MEMORY:
>> +       case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
>> +       case KVM_CAP_ONE_REG:
>> +               r = 1;
>> +               break;
>> +       case KVM_CAP_COALESCED_MMIO:
>> +               r = KVM_COALESCED_MMIO_PAGE_OFFSET;
>> +               break;
>> +       default:
>> +               r = 0;
>> +               break;
>> +       }
>> +       return r;
>> +}
>> +
>> +long kvm_arch_dev_ioctl(struct file *filp,
>> +                       unsigned int ioctl, unsigned long arg)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +int kvm_arch_set_memory_region(struct kvm *kvm,
>> +                              struct kvm_userspace_memory_region *mem,
>> +                              struct kvm_memory_slot old,
>> +                              int user_alloc)
>> +{
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_prepare_memory_region(struct kvm *kvm,
>> +                                  struct kvm_memory_slot *memslot,
>> +                                  struct kvm_memory_slot old,
>> +                                  struct kvm_userspace_memory_region *mem,
>> +                                  int user_alloc)
>> +{
>> +       return 0;
>> +}
>> +
>> +void kvm_arch_commit_memory_region(struct kvm *kvm,
>> +                                  struct kvm_userspace_memory_region *mem,
>> +                                  struct kvm_memory_slot old,
>> +                                  int user_alloc)
>> +{
>> +}
>> +
>> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
>> +{
>> +}
>> +
>> +void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>> +                                  struct kvm_memory_slot *slot)
>> +{
>> +}
>> +
>> +struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>> +{
>> +       int err;
>> +       struct kvm_vcpu *vcpu;
>> +
>> +       vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
>> +       if (!vcpu) {
>> +               err = -ENOMEM;
>> +               goto out;
>> +       }
>> +
>> +       err = kvm_vcpu_init(vcpu, kvm, id);
>> +       if (err)
>> +               goto free_vcpu;
>> +
>> +       return vcpu;
>> +free_vcpu:
>> +       kmem_cache_free(kvm_vcpu_cache, vcpu);
>> +out:
>> +       return ERR_PTR(err);
>> +}
>> +
>> +void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
>> +{
>> +}
>> +
>> +void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>> +{
>> +       kvm_arch_vcpu_free(vcpu);
>> +}
>> +
>> +int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>> +{
>> +       return 0;
>> +}
>> +
>> +int __attribute_const__ kvm_target_cpu(void)
>> +{
>> +       unsigned int midr;
>> +
>> +       midr = read_cpuid_id();
>> +       switch ((midr >> 4) & 0xfff) {
>> +       case KVM_ARM_TARGET_CORTEX_A15:
>> +               return KVM_ARM_TARGET_CORTEX_A15;
>
> I have this code already in perf_event.c. Can we move it somewhere common
> and share it? You should also check that the implementor field is 0x41.
>

by all means, you can probably suggest a good place better than I can...
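
For illustration, the extra check amounts to something like the sketch
below; the macro and constant names are local to the example rather than
taken from an existing header:

/* Sketch of the extra check asked for above: only accept the 0xC0F part
 * number when the implementer field says ARM (0x41).  The MIDR value is
 * hard-coded here; in the kernel it would come from read_cpuid_id().
 */
#include <stdio.h>

#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xff)
#define MIDR_PART(midr)		(((midr) >> 4) & 0xfff)

#define ARM_CPU_IMP_ARM		0x41
#define ARM_CPU_PART_CORTEX_A15	0xC0F

static int target_cpu(unsigned int midr)
{
	if (MIDR_IMPLEMENTER(midr) != ARM_CPU_IMP_ARM)
		return -1;

	switch (MIDR_PART(midr)) {
	case ARM_CPU_PART_CORTEX_A15:
		return ARM_CPU_PART_CORTEX_A15;
	default:
		return -1;
	}
}

int main(void)
{
	unsigned int midr = 0x412fc0f0;	/* an example Cortex-A15 MIDR */

	printf("target = %#x\n", target_cpu(midr));
	return 0;
}

In the kernel the shared helper would of course read the id register itself,
wherever that helper ends up living.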

>> +       default:
>> +               return -EINVAL;
>> +       }
>> +}
>> +
>> +int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>> +{
>> +       return 0;
>> +}
>> +
>> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>> +{
>> +}
>> +
>> +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> +{
>> +}
>> +
>> +void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> +{
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>> +                                       struct kvm_guest_debug *dbg)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +
>> +int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>> +                                   struct kvm_mp_state *mp_state)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>> +                                   struct kvm_mp_state *mp_state)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>> +{
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
>> +{
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +long kvm_arch_vcpu_ioctl(struct file *filp,
>> +                        unsigned int ioctl, unsigned long arg)
>> +{
>> +       struct kvm_vcpu *vcpu = filp->private_data;
>> +       void __user *argp = (void __user *)arg;
>> +
>> +       switch (ioctl) {
>> +       case KVM_ARM_VCPU_INIT: {
>> +               struct kvm_vcpu_init init;
>> +
>> +               if (copy_from_user(&init, argp, sizeof init))
>> +                       return -EFAULT;
>> +
>> +               return kvm_vcpu_set_target(vcpu, &init);
>> +
>> +       }
>> +       case KVM_SET_ONE_REG:
>> +       case KVM_GET_ONE_REG: {
>> +               struct kvm_one_reg reg;
>> +               if (copy_from_user(&reg, argp, sizeof(reg)))
>> +                       return -EFAULT;
>> +               if (ioctl == KVM_SET_ONE_REG)
>> +                       return kvm_arm_set_reg(vcpu, &reg);
>> +               else
>> +                       return kvm_arm_get_reg(vcpu, &reg);
>> +       }
>> +       case KVM_GET_REG_LIST: {
>> +               struct kvm_reg_list __user *user_list = argp;
>> +               struct kvm_reg_list reg_list;
>> +               unsigned n;
>> +
>> +               if (copy_from_user(&reg_list, user_list, sizeof reg_list))
>> +                       return -EFAULT;
>> +               n = reg_list.n;
>> +               reg_list.n = kvm_arm_num_regs(vcpu);
>> +               if (copy_to_user(user_list, &reg_list, sizeof reg_list))
>> +                       return -EFAULT;
>> +               if (n < reg_list.n)
>> +                       return -E2BIG;
>> +               return kvm_arm_copy_reg_indices(vcpu, user_list->reg);
>
> kvm_reg_list sounds like it could be done using a regset instead.
>
>> diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
>> new file mode 100644
>> index 0000000..690bbb3
>> --- /dev/null
>> +++ b/arch/arm/kvm/emulate.c
>> @@ -0,0 +1,127 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#include <asm/kvm_emulate.h>
>> +
>> +#define REG_OFFSET(_reg) \
>> +       (offsetof(struct kvm_regs, _reg) / sizeof(u32))
>> +
>> +#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs[_num])
>> +
>> +static const unsigned long vcpu_reg_offsets[MODE_SYS + 1][16] = {
>> +       /* FIQ Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7),
>> +               REG_OFFSET(fiq_regs[0]), /* r8 */
>> +               REG_OFFSET(fiq_regs[1]), /* r9 */
>> +               REG_OFFSET(fiq_regs[2]), /* r10 */
>> +               REG_OFFSET(fiq_regs[3]), /* r11 */
>> +               REG_OFFSET(fiq_regs[4]), /* r12 */
>> +               REG_OFFSET(fiq_regs[5]), /* r13 */
>> +               REG_OFFSET(fiq_regs[6]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* IRQ Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(irq_regs[0]), /* r13 */
>> +               REG_OFFSET(irq_regs[1]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* SVC Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(svc_regs[0]), /* r13 */
>> +               REG_OFFSET(svc_regs[1]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* ABT Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(abt_regs[0]), /* r13 */
>> +               REG_OFFSET(abt_regs[1]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* UND Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(und_regs[0]), /* r13 */
>> +               REG_OFFSET(und_regs[1]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* USR Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(usr_regs[13]), /* r13 */
>> +               REG_OFFSET(usr_regs[14]), /* r14 */
>> +               REG_OFFSET(pc)            /* r15 */
>> +       },
>> +
>> +       /* SYS Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(usr_regs[13]), /* r13 */
>> +               REG_OFFSET(usr_regs[14]), /* r14 */
>> +               REG_OFFSET(pc)            /* r15 */
>> +       },
>> +};
>> +
>> +/*
>> + * Return a pointer to the register number valid in the specified mode of
>> + * the virtual CPU.
>> + */
>> +u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
>> +{
>> +       u32 *reg_array = (u32 *)&vcpu->arch.regs;
>> +
>> +       BUG_ON(reg_num > 15);
>> +       BUG_ON(mode > MODE_SYS);
>
> Again, BUG_ON seems a bit OTT here. Also, this is where the mode => idx
> calculation should happen.
>
>> +       return reg_array + vcpu_reg_offsets[mode][reg_num];
>> +}
>> diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
>> new file mode 100644
>> index 0000000..3e38c95
>> --- /dev/null
>> +++ b/arch/arm/kvm/exports.c
>> @@ -0,0 +1,21 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#include <linux/module.h>
>> +
>> +EXPORT_SYMBOL_GPL(smp_send_reschedule);
>
> Erm...
>
> We already have arch/arm/kernel/armksyms.c for exports -- please use that.
> However, exporting such low-level operations sounds like a bad idea. How
> realistic is kvm-as-a-module on ARM anyway?
>

at this point it's broken, so I'll just remove this and leave it as a
fun project for some poor soul, should anyone ever need half the code
outside the kernel as a module (the other half needs to be compiled in
anyway).

>> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
>> new file mode 100644
>> index 0000000..19a5389
>> --- /dev/null
>> +++ b/arch/arm/kvm/guest.c
>> @@ -0,0 +1,211 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#include <linux/errno.h>
>> +#include <linux/err.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/module.h>
>> +#include <linux/vmalloc.h>
>> +#include <linux/fs.h>
>> +#include <asm/uaccess.h>
>> +#include <asm/kvm.h>
>> +#include <asm/kvm_asm.h>
>> +#include <asm/kvm_emulate.h>
>> +
>> +#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
>> +#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
>> +
>> +struct kvm_stats_debugfs_item debugfs_entries[] = {
>> +       { NULL }
>> +};
>> +
>> +int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
>> +{
>> +       return 0;
>> +}
>> +
>> +static u64 core_reg_offset_from_id(u64 id)
>> +{
>> +       return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
>> +}
>> +
>> +static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>> +{
>> +       u32 __user *uaddr = (u32 __user *)(long)reg->addr;
>> +       struct kvm_regs *regs = &vcpu->arch.regs;
>> +       u64 off;
>> +
>> +       if (KVM_REG_SIZE(reg->id) != 4)
>> +               return -ENOENT;
>> +
>> +       /* Our ID is an index into the kvm_regs struct. */
>> +       off = core_reg_offset_from_id(reg->id);
>> +       if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
>> +               return -ENOENT;
>> +
>> +       return put_user(((u32 *)regs)[off], uaddr);
>> +}
>> +
>> +static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>> +{
>> +       u32 __user *uaddr = (u32 __user *)(long)reg->addr;
>> +       struct kvm_regs *regs = &vcpu->arch.regs;
>> +       u64 off, val;
>> +
>> +       if (KVM_REG_SIZE(reg->id) != 4)
>> +               return -ENOENT;
>> +
>> +       /* Our ID is an index into the kvm_regs struct. */
>> +       off = core_reg_offset_from_id(reg->id);
>> +       if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
>> +               return -ENOENT;
>> +
>> +       if (get_user(val, uaddr) != 0)
>> +               return -EFAULT;
>> +
>> +       if (off == KVM_REG_ARM_CORE_REG(cpsr)) {
>> +               if (__vcpu_mode(val) == 0xf)
>> +                       return -EINVAL;
>> +       }
>> +
>> +       ((u32 *)regs)[off] = val;
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>> +{
>> +       return -EINVAL;
>> +}
>
> Again, all looks like this should be implemented using regsets from what I
> can tell.
>

this API has been discussed to death on the KVM lists, and we can of 
course revive that if the regset makes it nicer - I'd prefer getting 
this upstream the way that it is now though, where GET_REG / SET_REG 
seems to be the way forward from a KVM perspective.
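
As a stand-alone illustration of how those ids index into kvm_regs (the
struct is copied from the quoted kvm.h; the id handling is reduced to the
offset part, not the full uapi encoding):

/* Sketch of the ONE_REG core-register indexing above: the low bits of a
 * core-reg id are simply "offsetof(struct kvm_regs, x) / 4", and the
 * accessors bounds-check that index before copying the word.
 */
#include <stddef.h>
#include <stdio.h>

struct kvm_regs {
	unsigned int usr_regs[15];	/* R0_usr - R14_usr */
	unsigned int svc_regs[3];	/* SP_svc, LR_svc, SPSR_svc */
	unsigned int abt_regs[3];
	unsigned int und_regs[3];
	unsigned int irq_regs[3];
	unsigned int fiq_regs[8];	/* R8_fiq - R14_fiq, SPSR_fiq */
	unsigned int pc;
	unsigned int cpsr;
};

#define CORE_REG(name)	(offsetof(struct kvm_regs, name) / sizeof(unsigned int))

static int get_core_reg(const struct kvm_regs *regs, unsigned long off,
			unsigned int *val)
{
	if (off >= sizeof(*regs) / sizeof(unsigned int))
		return -1;			/* -ENOENT in the patch */
	*val = ((const unsigned int *)regs)[off];
	return 0;
}

int main(void)
{
	struct kvm_regs regs = { .pc = 0x80008000, .cpsr = 0x1d3 };
	unsigned int val;

	if (!get_core_reg(&regs, CORE_REG(pc), &val))
		printf("pc (index %zu) = %#x\n", CORE_REG(pc), val);
	return 0;
}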

>> diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
>> new file mode 100644
>> index 0000000..290a13d
>> --- /dev/null
>> +++ b/arch/arm/kvm/reset.c
>> @@ -0,0 +1,74 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virrtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +#include <linux/compiler.h>
>> +#include <linux/errno.h>
>> +#include <linux/sched.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/kvm.h>
>> +
>> +#include <asm/unified.h>
>> +#include <asm/ptrace.h>
>> +#include <asm/cputype.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/kvm_coproc.h>
>> +
>> +/******************************************************************************
>> + * Cortex-A15 Reset Values
>> + */
>> +
>> +static const int a15_max_cpu_idx = 3;
>> +
>> +static struct kvm_regs a15_regs_reset = {
>> +       .cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
>> +};
>> +
>> +
>> +/*******************************************************************************
>> + * Exported reset function
>> + */
>> +
>> +/**
>> + * kvm_reset_vcpu - sets core registers and cp15 registers to reset value
>> + * @vcpu: The VCPU pointer
>> + *
>> + * This function finds the right table above and sets the registers on the
>> + * virtual CPU struct to their architectually defined reset values.
>> + */
>> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>> +{
>> +       struct kvm_regs *cpu_reset;
>> +
>> +       switch (vcpu->arch.target) {
>> +       case KVM_ARM_TARGET_CORTEX_A15:
>> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
>> +                       return -EINVAL;
>> +               cpu_reset = &a15_regs_reset;
>> +               vcpu->arch.midr = read_cpuid_id();
>> +               break;
>> +       default:
>> +               return -ENODEV;
>> +       }
>> +
>> +       /* Reset core registers */
>> +       memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
>> +
>> +       /* Reset CP15 registers */
>> +       kvm_reset_coprocs(vcpu);
>> +
>> +       return 0;
>> +}
>
> This is a nice way to plug in new CPUs but the way the rest of the code is
> currently written, all the ARMv7 and Cortex-A15 code is merged together. I
> *strongly* suggest you isolate this from the start, as it will help you see
> what is architected and what is implementation-specific.
>

not entirely sure what you mean. You want a separate coproc.c file for 
Cortex-A15 specific stuff like coproc_a15.c?
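
One way to read the suggestion, purely as an illustrative sketch: a
per-target descriptor so the Cortex-A15 values can live in their own file,
with the generic reset path doing no more than a table lookup:

/* Entirely illustrative structure: per-target reset descriptors so the
 * implementation-specific values sit in their own table (and file), and
 * the generic code only looks the target up.
 */
#include <stdio.h>
#include <string.h>

struct kvm_regs { unsigned int cpsr; };

struct target_desc {
	unsigned int		target;		/* e.g. 0xC0F */
	unsigned int		max_vcpu_idx;
	const struct kvm_regs	*reset_regs;	/* implementation-specific */
};

/* This table would live in e.g. a reset_a15.c / coproc_a15.c style file. */
static const struct kvm_regs a15_reset = { .cpsr = 0x1d3 /* SVC, A/I/F set */ };
static const struct target_desc targets[] = {
	{ .target = 0xC0F, .max_vcpu_idx = 3, .reset_regs = &a15_reset },
};

static int reset_vcpu(unsigned int target, unsigned int vcpu_id,
		      struct kvm_regs *regs)
{
	for (unsigned int i = 0; i < sizeof(targets) / sizeof(targets[0]); i++) {
		if (targets[i].target != target ||
		    vcpu_id > targets[i].max_vcpu_idx)
			continue;
		memcpy(regs, targets[i].reset_regs, sizeof(*regs));
		return 0;
	}
	return -1;				/* -ENODEV / -EINVAL in the patch */
}

int main(void)
{
	struct kvm_regs regs;

	if (!reset_vcpu(0xC0F, 0, &regs))
		printf("reset cpsr = %#x\n", regs.cpsr);
	return 0;
}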

Thanks a bunch for the review!
-Christoffer


^ permalink raw reply	[flat|nested] 164+ messages in thread

* [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
@ 2012-09-26  1:43       ` Christoffer Dall
  0 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-26  1:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/25/2012 11:20 AM, Will Deacon wrote:
> On Sat, Sep 15, 2012 at 04:35:08PM +0100, Christoffer Dall wrote:
>> Targets KVM support for Cortex A-15 processors.
>>
>> Contains all the framework components, make files, header files, some
>> tracing functionality, and basic user space API.
>>
>> Only supported core is Cortex-A15 for now.
>>
>> Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
>>
>> "Nothing to see here. Move along, move along..."
>
> [...]
>
>> diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
>> new file mode 100644
>> index 0000000..a13b582
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm.h
>> @@ -0,0 +1,88 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_H__
>> +#define __ARM_KVM_H__
>> +
>> +#include <asm/types.h>
>> +
>> +#define __KVM_HAVE_GUEST_DEBUG
>> +
>> +#define KVM_REG_SIZE(id)                                               \
>> +       (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
>> +
>> +struct kvm_regs {
>> +       __u32 usr_regs[15];     /* R0_usr - R14_usr */
>> +       __u32 svc_regs[3];      /* SP_svc, LR_svc, SPSR_svc */
>> +       __u32 abt_regs[3];      /* SP_abt, LR_abt, SPSR_abt */
>> +       __u32 und_regs[3];      /* SP_und, LR_und, SPSR_und */
>> +       __u32 irq_regs[3];      /* SP_irq, LR_irq, SPSR_irq */
>> +       __u32 fiq_regs[8];      /* R8_fiq - R14_fiq, SPSR_fiq */
>> +       __u32 pc;               /* The program counter (r15) */
>> +       __u32 cpsr;             /* The guest CPSR */
>> +};
>> +
>> +/* Supported Processor Types */
>> +#define KVM_ARM_TARGET_CORTEX_A15      (0xC0F)
>
> So there's this define...
>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> new file mode 100644
>> index 0000000..2f9d28e
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -0,0 +1,28 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_ARM_H__
>> +#define __ARM_KVM_ARM_H__
>> +
>> +/* Supported Processor Types */
>> +#define CORTEX_A15     (0xC0F)
>
> ... and then this one. Do we really need both?
>

no, we don't

>> +/* Multiprocessor Affinity Register */
>> +#define MPIDR_CPUID    (0x3 << 0)
>
> I'm fairly sure we already have code under arch/arm/ for dealing with the
> mpidr. Let's re-use that rather than reinventing it here.
>

I see some defines in topology.c - do you want some of these factored 
out into a header file that we can then also use from kvm? If so, where?
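
For illustration only, something along these lines could sit next to the
existing MIDR accessors (the placement and macro names below are a guess,
not existing kernel definitions):

/* hypothetical, e.g. in asm/cputype.h or a shared topology header */
#define MPIDR_AFF0_MASK         0x000000ff      /* CPU id within a cluster */
#define MPIDR_AFF1_MASK         0x0000ff00      /* cluster id */
#define MPIDR_AFF0(mpidr)       ((mpidr) & MPIDR_AFF0_MASK)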

>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>> new file mode 100644
>> index 0000000..44591f9
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -0,0 +1,30 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_ASM_H__
>> +#define __ARM_KVM_ASM_H__
>> +
>> +#define ARM_EXCEPTION_RESET      0
>> +#define ARM_EXCEPTION_UNDEFINED   1
>> +#define ARM_EXCEPTION_SOFTWARE    2
>> +#define ARM_EXCEPTION_PREF_ABORT  3
>> +#define ARM_EXCEPTION_DATA_ABORT  4
>> +#define ARM_EXCEPTION_IRQ        5
>> +#define ARM_EXCEPTION_FIQ        6
>
> Again, you have these defines (which look more suited to an enum type), but
> later (in kvm_host.h) you have:

well, unless I miss some known trick, assembly code doesn't like enums?
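
For illustration, the usual pattern is to keep the raw #defines for the
assembly side and, if type checking is wanted, mirror them in an enum for
C - a sketch only, not what the series does today:

/* values usable from both .c and .S files */
#define ARM_EXCEPTION_RESET     0
#define ARM_EXCEPTION_IRQ       5

#ifndef __ASSEMBLY__
/* C code can still get an enum over the very same values */
enum arm_exception {
        ARM_EXC_RESET = ARM_EXCEPTION_RESET,
        ARM_EXC_IRQ   = ARM_EXCEPTION_IRQ,
};
#endif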

>
>> +#define EXCEPTION_NONE      0
>> +#define EXCEPTION_RESET     0x80
>> +#define EXCEPTION_UNDEFINED 0x40
>> +#define EXCEPTION_SOFTWARE  0x20
>> +#define EXCEPTION_PREFETCH  0x10
>> +#define EXCEPTION_DATA      0x08
>> +#define EXCEPTION_IMPRECISE 0x04
>> +#define EXCEPTION_IRQ       0x02
>> +#define EXCEPTION_FIQ       0x01
>
> Why the noise?
>

these are simply cruft from a previous life of KVM/ARM.

>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>> new file mode 100644
>> index 0000000..9e29335
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -0,0 +1,108 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_EMULATE_H__
>> +#define __ARM_KVM_EMULATE_H__
>> +
>> +#include <linux/kvm_host.h>
>> +#include <asm/kvm_asm.h>
>> +
>> +u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, enum vcpu_mode mode);
>> +
>> +static inline u8 __vcpu_mode(u32 cpsr)
>> +{
>> +       u8 modes_table[32] = {
>> +               0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
>> +               0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
>> +               MODE_USR,       /* 0x0 */
>> +               MODE_FIQ,       /* 0x1 */
>> +               MODE_IRQ,       /* 0x2 */
>> +               MODE_SVC,       /* 0x3 */
>> +               0xf, 0xf, 0xf,
>> +               MODE_ABT,       /* 0x7 */
>> +               0xf, 0xf, 0xf,
>> +               MODE_UND,       /* 0xb */
>> +               0xf, 0xf, 0xf,
>> +               MODE_SYS        /* 0xf */
>> +       };
>
> These MODE_* definitions sound like our *_MODE definitions... except they're
> not. It would probably be far more readable as a switch, but at least change
> the name of those things!


fair enough, they're renamed to VCPU_XXX_MODE

>> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
>> +{
>> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
>> +       BUG_ON(mode == 0xf);
>> +       return mode;
>> +}
>
> I noticed that you have a fair few BUG_ONs throughout the series. Fair
> enough, but for hyp code is that really the right thing to do? Killing the
> guest could make more sense, perhaps?

the idea is to keep only the BUG_ONs for conditions that really are host
bugs and that we want to catch explicitly on the host. We have had a pass
over the code to change all the BUG_ONs that can be provoked by the guest
so that the proper exceptions are injected into the guest instead. If you
find places where this is not the case, they should be changed, so do let me know.
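
As a rough illustration of that split (the injection helper name here is
made up, not necessarily what the series uses):

        /* guest-provokable condition: reflect it back into the guest */
        if (!emulation_ok) {
                kvm_inject_undefined_exception(vcpu);   /* hypothetical helper */
                return 1;                               /* resume the guest */
        }

        /* invariant the guest cannot influence: a genuine host bug */
        BUG_ON(__vcpu_mode(vcpu->arch.regs.cpsr) == 0xf);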

>
>> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
>> +{
>> +       return vcpu_reg(vcpu, 15);
>> +}
>
> If you stick a struct pt_regs into struct kvm_regs, you could reuse ARM_pc
> here etc.
>

I prefer not to, because we'd presumably have those registers for usr 
mode and then only define the others explicitly. I think it's much 
clearer to look at kvm_regs as it is today.


>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> new file mode 100644
>> index 0000000..24959f4
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -0,0 +1,172 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_HOST_H__
>> +#define __ARM_KVM_HOST_H__
>> +
>> +#include <asm/kvm.h>
>> +
>> +#define KVM_MAX_VCPUS 4
>
> NR_CPUS?
>

well this is defined by KVM generic code, and is common for the other 
architectures.

>> +#define KVM_MEMORY_SLOTS 32
>> +#define KVM_PRIVATE_MEM_SLOTS 4
>> +#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>> +
>> +#define NUM_FEATURES 0
>
> Ha! No idea what that means, but hopefully there's less code to review
> because of it :)
>

that's actually true.

will rename to KVM_VCPU_MAX_FEATURES
(or do you want NR in this case? :-\)

>> +
>> +/* We don't currently support large pages. */
>> +#define KVM_HPAGE_GFN_SHIFT(x) 0
>> +#define KVM_NR_PAGE_SIZES      1
>> +#define KVM_PAGES_PER_HPAGE(x) (1UL<<31)
>> +
>> +struct kvm_vcpu;
>> +u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>> +int kvm_target_cpu(void);
>> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>> +void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
>> +
>> +struct kvm_arch {
>> +       /* The VMID generation used for the virt. memory system */
>> +       u64    vmid_gen;
>> +       u32    vmid;
>> +
>> +       /* 1-level 2nd stage table and lock */
>> +       spinlock_t pgd_lock;
>> +       pgd_t *pgd;
>> +
>> +       /* VTTBR value associated with above pgd and vmid */
>> +       u64    vttbr;
>> +};
>> +
>> +#define EXCEPTION_NONE      0
>> +#define EXCEPTION_RESET     0x80
>> +#define EXCEPTION_UNDEFINED 0x40
>> +#define EXCEPTION_SOFTWARE  0x20
>> +#define EXCEPTION_PREFETCH  0x10
>> +#define EXCEPTION_DATA      0x08
>> +#define EXCEPTION_IMPRECISE 0x04
>> +#define EXCEPTION_IRQ       0x02
>> +#define EXCEPTION_FIQ       0x01
>> +
>> +#define KVM_NR_MEM_OBJS     40
>> +
>> +/*
>> + * We don't want allocation failures within the mmu code, so we preallocate
>> + * enough memory for a single page fault in a cache.
>> + */
>> +struct kvm_mmu_memory_cache {
>> +       int nobjs;
>> +       void *objects[KVM_NR_MEM_OBJS];
>> +};
>> +
>> +/*
>> + * Modes used for short-hand mode determination in the world-switch code and
>> + * in emulation code.
>> + *
>> + * Note: These indices do NOT correspond to the value of the CPSR mode bits!
>> + */
>> +enum vcpu_mode {
>> +       MODE_FIQ = 0,
>> +       MODE_IRQ,
>> +       MODE_SVC,
>> +       MODE_ABT,
>> +       MODE_UND,
>> +       MODE_USR,
>> +       MODE_SYS
>> +};
>
> So the need for this enum is for indexing the array of modes, right? But
> accesses to that array are already hidden behind an accessor function from
> what I can tell, so I'd rather the arithmetic from cpsr -> index was
> restricted to that function and the rest of the code just passed either the
> raw mode or the full cpsr around.
>

good point, this was really useful in that prior life of kvm/arm where 
we did a bunch of emulation and decoding all over the place. I'll send 
out a v2 with this reworked.
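
Roughly what the reworked accessor could look like - a sketch of the
direction only, not the actual v2 code:

u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num)
{
        u32 *reg_array = (u32 *)&vcpu->arch.regs;
        unsigned long idx;

        /* the cpsr -> table-row mapping lives here, so callers only
         * pass the register number */
        switch (vcpu->arch.regs.cpsr & MODE_MASK) {
        case FIQ_MODE:          idx = 0; break;
        case IRQ_MODE:          idx = 1; break;
        case SVC_MODE:          idx = 2; break;
        case ABT_MODE:          idx = 3; break;
        case UND_MODE:          idx = 4; break;
        case USR_MODE:          idx = 5; break;
        case SYSTEM_MODE:       idx = 6; break;
        default:                return NULL;    /* invalid guest mode */
        }

        return reg_array + vcpu_reg_offsets[idx][reg_num];
}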

>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> new file mode 100644
>> index 0000000..fd6fa9b
>> --- /dev/null
>> +++ b/arch/arm/kvm/arm.c
>> @@ -0,0 +1,345 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#include <linux/errno.h>
>> +#include <linux/err.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/module.h>
>> +#include <linux/vmalloc.h>
>> +#include <linux/fs.h>
>> +#include <linux/mman.h>
>> +#include <linux/sched.h>
>> +#include <trace/events/kvm.h>
>> +
>> +#define CREATE_TRACE_POINTS
>> +#include "trace.h"
>> +
>> +#include <asm/unified.h>
>> +#include <asm/uaccess.h>
>> +#include <asm/ptrace.h>
>> +#include <asm/mman.h>
>> +#include <asm/cputype.h>
>> +
>> +#ifdef REQUIRES_VIRT
>> +__asm__(".arch_extension       virt");
>> +#endif
>> +
>> +int kvm_arch_hardware_enable(void *garbage)
>> +{
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>> +{
>> +       return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
>> +}
>> +
>> +void kvm_arch_hardware_disable(void *garbage)
>> +{
>> +}
>> +
>> +int kvm_arch_hardware_setup(void)
>> +{
>> +       return 0;
>> +}
>> +
>> +void kvm_arch_hardware_unsetup(void)
>> +{
>> +}
>> +
>> +void kvm_arch_check_processor_compat(void *rtn)
>> +{
>> +       *(int *)rtn = 0;
>> +}
>> +
>> +void kvm_arch_sync_events(struct kvm *kvm)
>> +{
>> +}
>> +
>> +int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>> +{
>> +       if (type)
>> +               return -EINVAL;
>> +
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>> +{
>> +       return VM_FAULT_SIGBUS;
>> +}
>> +
>> +void kvm_arch_free_memslot(struct kvm_memory_slot *free,
>> +                          struct kvm_memory_slot *dont)
>> +{
>> +}
>> +
>> +int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
>> +{
>> +       return 0;
>> +}
>> +
>> +void kvm_arch_destroy_vm(struct kvm *kvm)
>> +{
>> +       int i;
>> +
>> +       for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>> +               if (kvm->vcpus[i]) {
>> +                       kvm_arch_vcpu_free(kvm->vcpus[i]);
>> +                       kvm->vcpus[i] = NULL;
>> +               }
>> +       }
>> +}
>> +
>> +int kvm_dev_ioctl_check_extension(long ext)
>> +{
>> +       int r;
>> +       switch (ext) {
>> +       case KVM_CAP_USER_MEMORY:
>> +       case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
>> +       case KVM_CAP_ONE_REG:
>> +               r = 1;
>> +               break;
>> +       case KVM_CAP_COALESCED_MMIO:
>> +               r = KVM_COALESCED_MMIO_PAGE_OFFSET;
>> +               break;
>> +       default:
>> +               r = 0;
>> +               break;
>> +       }
>> +       return r;
>> +}
>> +
>> +long kvm_arch_dev_ioctl(struct file *filp,
>> +                       unsigned int ioctl, unsigned long arg)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +int kvm_arch_set_memory_region(struct kvm *kvm,
>> +                              struct kvm_userspace_memory_region *mem,
>> +                              struct kvm_memory_slot old,
>> +                              int user_alloc)
>> +{
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_prepare_memory_region(struct kvm *kvm,
>> +                                  struct kvm_memory_slot *memslot,
>> +                                  struct kvm_memory_slot old,
>> +                                  struct kvm_userspace_memory_region *mem,
>> +                                  int user_alloc)
>> +{
>> +       return 0;
>> +}
>> +
>> +void kvm_arch_commit_memory_region(struct kvm *kvm,
>> +                                  struct kvm_userspace_memory_region *mem,
>> +                                  struct kvm_memory_slot old,
>> +                                  int user_alloc)
>> +{
>> +}
>> +
>> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
>> +{
>> +}
>> +
>> +void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>> +                                  struct kvm_memory_slot *slot)
>> +{
>> +}
>> +
>> +struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>> +{
>> +       int err;
>> +       struct kvm_vcpu *vcpu;
>> +
>> +       vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
>> +       if (!vcpu) {
>> +               err = -ENOMEM;
>> +               goto out;
>> +       }
>> +
>> +       err = kvm_vcpu_init(vcpu, kvm, id);
>> +       if (err)
>> +               goto free_vcpu;
>> +
>> +       return vcpu;
>> +free_vcpu:
>> +       kmem_cache_free(kvm_vcpu_cache, vcpu);
>> +out:
>> +       return ERR_PTR(err);
>> +}
>> +
>> +void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
>> +{
>> +}
>> +
>> +void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>> +{
>> +       kvm_arch_vcpu_free(vcpu);
>> +}
>> +
>> +int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>> +{
>> +       return 0;
>> +}
>> +
>> +int __attribute_const__ kvm_target_cpu(void)
>> +{
>> +       unsigned int midr;
>> +
>> +       midr = read_cpuid_id();
>> +       switch ((midr >> 4) & 0xfff) {
>> +       case KVM_ARM_TARGET_CORTEX_A15:
>> +               return KVM_ARM_TARGET_CORTEX_A15;
>
> I have this code already in perf_event.c. Can we move it somewhere common
> and share it? You should also check that the implementor field is 0x41.
>

by all means, you can probably suggest a good place better than I can...
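
Something like the below could live in asm/cputype.h and be shared with
perf (the helper and macro names are only a suggestion, not existing
kernel API):

#define ARM_CPU_IMP_ARM                 0x41
#define ARM_CPU_PART_CORTEX_A15         0xc0f

static inline unsigned int read_cpuid_implementor(void)
{
        return (read_cpuid_id() >> 24) & 0xff;
}

static inline unsigned int read_cpuid_part_number(void)
{
        return (read_cpuid_id() >> 4) & 0xfff;
}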

>> +       default:
>> +               return -EINVAL;
>> +       }
>> +}
>> +
>> +int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>> +{
>> +       return 0;
>> +}
>> +
>> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>> +{
>> +}
>> +
>> +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> +{
>> +}
>> +
>> +void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> +{
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>> +                                       struct kvm_guest_debug *dbg)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +
>> +int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>> +                                   struct kvm_mp_state *mp_state)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>> +                                   struct kvm_mp_state *mp_state)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>> +{
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
>> +{
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +long kvm_arch_vcpu_ioctl(struct file *filp,
>> +                        unsigned int ioctl, unsigned long arg)
>> +{
>> +       struct kvm_vcpu *vcpu = filp->private_data;
>> +       void __user *argp = (void __user *)arg;
>> +
>> +       switch (ioctl) {
>> +       case KVM_ARM_VCPU_INIT: {
>> +               struct kvm_vcpu_init init;
>> +
>> +               if (copy_from_user(&init, argp, sizeof init))
>> +                       return -EFAULT;
>> +
>> +               return kvm_vcpu_set_target(vcpu, &init);
>> +
>> +       }
>> +       case KVM_SET_ONE_REG:
>> +       case KVM_GET_ONE_REG: {
>> +               struct kvm_one_reg reg;
>> +               if (copy_from_user(&reg, argp, sizeof(reg)))
>> +                       return -EFAULT;
>> +               if (ioctl == KVM_SET_ONE_REG)
>> +                       return kvm_arm_set_reg(vcpu, &reg);
>> +               else
>> +                       return kvm_arm_get_reg(vcpu, &reg);
>> +       }
>> +       case KVM_GET_REG_LIST: {
>> +               struct kvm_reg_list __user *user_list = argp;
>> +               struct kvm_reg_list reg_list;
>> +               unsigned n;
>> +
>> +               if (copy_from_user(&reg_list, user_list, sizeof reg_list))
>> +                       return -EFAULT;
>> +               n = reg_list.n;
>> +               reg_list.n = kvm_arm_num_regs(vcpu);
>> +               if (copy_to_user(user_list, &reg_list, sizeof reg_list))
>> +                       return -EFAULT;
>> +               if (n < reg_list.n)
>> +                       return -E2BIG;
>> +               return kvm_arm_copy_reg_indices(vcpu, user_list->reg);
>
> kvm_reg_list sounds like it could be done using a regset instead.
>
>> diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
>> new file mode 100644
>> index 0000000..690bbb3
>> --- /dev/null
>> +++ b/arch/arm/kvm/emulate.c
>> @@ -0,0 +1,127 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#include <asm/kvm_emulate.h>
>> +
>> +#define REG_OFFSET(_reg) \
>> +       (offsetof(struct kvm_regs, _reg) / sizeof(u32))
>> +
>> +#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs[_num])
>> +
>> +static const unsigned long vcpu_reg_offsets[MODE_SYS + 1][16] = {
>> +       /* FIQ Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7),
>> +               REG_OFFSET(fiq_regs[0]), /* r8 */
>> +               REG_OFFSET(fiq_regs[1]), /* r9 */
>> +               REG_OFFSET(fiq_regs[2]), /* r10 */
>> +               REG_OFFSET(fiq_regs[3]), /* r11 */
>> +               REG_OFFSET(fiq_regs[4]), /* r12 */
>> +               REG_OFFSET(fiq_regs[5]), /* r13 */
>> +               REG_OFFSET(fiq_regs[6]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* IRQ Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(irq_regs[0]), /* r13 */
>> +               REG_OFFSET(irq_regs[1]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* SVC Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(svc_regs[0]), /* r13 */
>> +               REG_OFFSET(svc_regs[1]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* ABT Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(abt_regs[0]), /* r13 */
>> +               REG_OFFSET(abt_regs[1]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* UND Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(und_regs[0]), /* r13 */
>> +               REG_OFFSET(und_regs[1]), /* r14 */
>> +               REG_OFFSET(pc)           /* r15 */
>> +       },
>> +
>> +       /* USR Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(usr_regs[13]), /* r13 */
>> +               REG_OFFSET(usr_regs[14]), /* r14 */
>> +               REG_OFFSET(pc)            /* r15 */
>> +       },
>> +
>> +       /* SYS Registers */
>> +       {
>> +               USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
>> +               USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
>> +               USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
>> +               USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
>> +               USR_REG_OFFSET(12),
>> +               REG_OFFSET(usr_regs[13]), /* r13 */
>> +               REG_OFFSET(usr_regs[14]), /* r14 */
>> +               REG_OFFSET(pc)            /* r15 */
>> +       },
>> +};
>> +
>> +/*
>> + * Return a pointer to the register number valid in the specified mode of
>> + * the virtual CPU.
>> + */
>> +u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
>> +{
>> +       u32 *reg_array = (u32 *)&vcpu->arch.regs;
>> +
>> +       BUG_ON(reg_num > 15);
>> +       BUG_ON(mode > MODE_SYS);
>
> Again, BUG_ON seems a bit OTT here. Also, this is where the mode => idx
> calculation should happen.
>
>> +       return reg_array + vcpu_reg_offsets[mode][reg_num];
>> +}
>> diff --git a/arch/arm/kvm/exports.c b/arch/arm/kvm/exports.c
>> new file mode 100644
>> index 0000000..3e38c95
>> --- /dev/null
>> +++ b/arch/arm/kvm/exports.c
>> @@ -0,0 +1,21 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#include <linux/module.h>
>> +
>> +EXPORT_SYMBOL_GPL(smp_send_reschedule);
>
> Erm...
>
> We already have arch/arm/kernel/armksyms.c for exports -- please use that.
> However, exporting such low-level operations sounds like a bad idea. How
> realistic is kvm-as-a-module on ARM anyway?
>

at this point it's broken, so I'll just remove this and leave it as a 
fun project for some poor soul, if anyone ever needs half the code 
outside the kernel as a module (the other half needs to be compiled in 
anyway)

>> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
>> new file mode 100644
>> index 0000000..19a5389
>> --- /dev/null
>> +++ b/arch/arm/kvm/guest.c
>> @@ -0,0 +1,211 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#include <linux/errno.h>
>> +#include <linux/err.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/module.h>
>> +#include <linux/vmalloc.h>
>> +#include <linux/fs.h>
>> +#include <asm/uaccess.h>
>> +#include <asm/kvm.h>
>> +#include <asm/kvm_asm.h>
>> +#include <asm/kvm_emulate.h>
>> +
>> +#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
>> +#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
>> +
>> +struct kvm_stats_debugfs_item debugfs_entries[] = {
>> +       { NULL }
>> +};
>> +
>> +int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
>> +{
>> +       return 0;
>> +}
>> +
>> +static u64 core_reg_offset_from_id(u64 id)
>> +{
>> +       return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
>> +}
>> +
>> +static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>> +{
>> +       u32 __user *uaddr = (u32 __user *)(long)reg->addr;
>> +       struct kvm_regs *regs = &vcpu->arch.regs;
>> +       u64 off;
>> +
>> +       if (KVM_REG_SIZE(reg->id) != 4)
>> +               return -ENOENT;
>> +
>> +       /* Our ID is an index into the kvm_regs struct. */
>> +       off = core_reg_offset_from_id(reg->id);
>> +       if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
>> +               return -ENOENT;
>> +
>> +       return put_user(((u32 *)regs)[off], uaddr);
>> +}
>> +
>> +static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>> +{
>> +       u32 __user *uaddr = (u32 __user *)(long)reg->addr;
>> +       struct kvm_regs *regs = &vcpu->arch.regs;
>> +       u64 off, val;
>> +
>> +       if (KVM_REG_SIZE(reg->id) != 4)
>> +               return -ENOENT;
>> +
>> +       /* Our ID is an index into the kvm_regs struct. */
>> +       off = core_reg_offset_from_id(reg->id);
>> +       if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
>> +               return -ENOENT;
>> +
>> +       if (get_user(val, uaddr) != 0)
>> +               return -EFAULT;
>> +
>> +       if (off == KVM_REG_ARM_CORE_REG(cpsr)) {
>> +               if (__vcpu_mode(val) == 0xf)
>> +                       return -EINVAL;
>> +       }
>> +
>> +       ((u32 *)regs)[off] = val;
>> +       return 0;
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>> +{
>> +       return -EINVAL;
>> +}
>> +
>> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>> +{
>> +       return -EINVAL;
>> +}
>
> Again, all looks like this should be implemented using regsets from what I
> can tell.
>

this API has been discussed to death on the KVM lists, and we can of 
course revive that if the regset makes it nicer - I'd prefer getting 
this upstream the way that it is now though, where GET_REG / SET_REG 
seems to be the way forward from a KVM perspective.
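
For reference, the userspace side of GET_REG/SET_REG ends up looking
roughly like this (the exact register id encoding shown is illustrative,
not a statement of the final ABI):

        struct kvm_one_reg reg;
        __u32 val;

        reg.id   = KVM_REG_ARM | KVM_REG_SIZE_U32 |
                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(pc);
        reg.addr = (__u64)(unsigned long)&val;

        if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
                err(1, "KVM_GET_ONE_REG");

        /* ... tweak val ... */

        if (ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg) < 0)
                err(1, "KVM_SET_ONE_REG");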

>> diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
>> new file mode 100644
>> index 0000000..290a13d
>> --- /dev/null
>> +++ b/arch/arm/kvm/reset.c
>> @@ -0,0 +1,74 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +#include <linux/compiler.h>
>> +#include <linux/errno.h>
>> +#include <linux/sched.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/kvm.h>
>> +
>> +#include <asm/unified.h>
>> +#include <asm/ptrace.h>
>> +#include <asm/cputype.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/kvm_coproc.h>
>> +
>> +/******************************************************************************
>> + * Cortex-A15 Reset Values
>> + */
>> +
>> +static const int a15_max_cpu_idx = 3;
>> +
>> +static struct kvm_regs a15_regs_reset = {
>> +       .cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
>> +};
>> +
>> +
>> +/*******************************************************************************
>> + * Exported reset function
>> + */
>> +
>> +/**
>> + * kvm_reset_vcpu - sets core registers and cp15 registers to reset value
>> + * @vcpu: The VCPU pointer
>> + *
>> + * This function finds the right table above and sets the registers on the
>> + * virtual CPU struct to their architecturally defined reset values.
>> + */
>> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>> +{
>> +       struct kvm_regs *cpu_reset;
>> +
>> +       switch (vcpu->arch.target) {
>> +       case KVM_ARM_TARGET_CORTEX_A15:
>> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
>> +                       return -EINVAL;
>> +               cpu_reset = &a15_regs_reset;
>> +               vcpu->arch.midr = read_cpuid_id();
>> +               break;
>> +       default:
>> +               return -ENODEV;
>> +       }
>> +
>> +       /* Reset core registers */
>> +       memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
>> +
>> +       /* Reset CP15 registers */
>> +       kvm_reset_coprocs(vcpu);
>> +
>> +       return 0;
>> +}
>
> This is a nice way to plug in new CPUs but the way the rest of the code is
> currently written, all the ARMv7 and Cortex-A15 code is merged together. I
> *strongly* suggest you isolate this from the start, as it will help you see
> what is architected and what is implementation-specific.
>

not entirely sure what you mean. You want a separate coproc.c file for 
Cortex-A15 specific stuff like coproc_a15.c?

Thanks a bunch for the review!
-Christoffer

^ permalink raw reply	[flat|nested] 164+ messages in thread

* RE: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
  2012-09-25 12:38       ` Christoffer Dall
@ 2012-09-27  3:11         ` Min-gyu Kim
  -1 siblings, 0 replies; 164+ messages in thread
From: Min-gyu Kim @ 2012-09-27  3:11 UTC (permalink / raw)
  To: 'Christoffer Dall'
  Cc: kvm, linux-arm-kernel, kvmarm, '김창환'



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Christoffer Dall
> Sent: Tuesday, September 25, 2012 9:39 PM
> To: Min-gyu Kim
> Cc: kvm@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> kvmarm@lists.cs.columbia.edu; 김창환
> Subject: Re: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
> 
> >> +
> >> +     /*
> >> +      * If this is a write fault (think COW) we need to make sure the
> >> +      * existing page, which other CPUs might still read, doesn't go
> >> away
> >> +      * from under us, by calling gfn_to_pfn_prot(write_fault=true).
> >> +      * Therefore, we call gfn_to_pfn_prot(write_fault=false), which
> >> will
> >> +      * pin the existing page, then we get a new page for the user
> space
> >> +      * pte and map this in the stage-2 table where we also make sure
> to
> >> +      * flush the TLB for the VM, if there was an existing entry
> >> + (the
> >> entry
> >> +      * was updated setting the write flag to the potentially new
page).
> >> +      */
> >> +     if (fault_status == FSC_PERM) {
> >> +             pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false,
NULL);
> >> +             if (is_error_pfn(pfn_existing))
> >> +                     return -EFAULT;
> >> +     }
> >> +
> >> +     pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
> >> +     if (is_error_pfn(pfn)) {
> >> +             ret = -EFAULT;
> >> +             goto out_put_existing;
> >> +     }
> >> +
> >> +     /* We need minimum second+third level pages */
> >> +     ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
> >> +     if (ret)
> >> +             goto out;
> >> +     new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
> >> +     if (writable)
> >> +             pte_val(new_pte) |= L_PTE2_WRITE;
> >> +     coherent_icache_guest_page(vcpu->kvm, gfn);
> >
> > why don't you flush icache only when guest has mapped executable page
> > as __sync_icache_dcache function does currently?
> >
> >
> 
> because we don't know if the guest will map the page executable. The guest
> may read the page through a normal load, which causes the fault, and
> subsequently execute it (even possible through different guest mappings).
> The only way to see this happening would be to mark all pages as non-
> executable and catch the fault when it occurs - unfortunately the HPFAR
> which gives us the IPA is not populated on execute never faults, so we
> would have to translate the PC's va to ipa using cp15 functionality when
> this happens, which is then also racy with other CPUs. So the question is
> really if this will even be an optimization, but it's definitely something
> that requires further investigation.

OK. I understand your point.

But if the guest maps a page for execution, the guest will flush the icache
from __sync_icache_dcache, so coherent_icache_guest_page doesn't seem to be
necessary again. One case I'm not sure about is when the guest maps a kernel
executable page (module loading) and later reuses a kernel executable page
from the host (module unloading). But even in that case, I think it is
possible to reduce the number of flushes by limiting the address range that
is flushed.


> 
> -Christoffer
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
  2012-09-27  3:11         ` Min-gyu Kim
@ 2012-09-27  5:35           ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-27  5:35 UTC (permalink / raw)
  To: Min-gyu Kim; +Cc: kvm, linux-arm-kernel, kvmarm, 김창환

On Wed, Sep 26, 2012 at 11:11 PM, Min-gyu Kim <mingyu84.kim@samsung.com> wrote:
>
>
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Christoffer Dall
>> Sent: Tuesday, September 25, 2012 9:39 PM
>> To: Min-gyu Kim
>> Cc: kvm@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>> kvmarm@lists.cs.columbia.edu; 김창환
>> Subject: Re: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
>>
>> >> +
>> >> +     /*
>> >> +      * If this is a write fault (think COW) we need to make sure the
>> >> +      * existing page, which other CPUs might still read, doesn't go
>> >> away
>> >> +      * from under us, by calling gfn_to_pfn_prot(write_fault=true).
>> >> +      * Therefore, we call gfn_to_pfn_prot(write_fault=false), which
>> >> will
>> >> +      * pin the existing page, then we get a new page for the user
>> space
>> >> +      * pte and map this in the stage-2 table where we also make sure
>> to
>> >> +      * flush the TLB for the VM, if there was an existing entry
>> >> + (the
>> >> entry
>> >> +      * was updated setting the write flag to the potentially new
> page).
>> >> +      */
>> >> +     if (fault_status == FSC_PERM) {
>> >> +             pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false,
> NULL);
>> >> +             if (is_error_pfn(pfn_existing))
>> >> +                     return -EFAULT;
>> >> +     }
>> >> +
>> >> +     pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
>> >> +     if (is_error_pfn(pfn)) {
>> >> +             ret = -EFAULT;
>> >> +             goto out_put_existing;
>> >> +     }
>> >> +
>> >> +     /* We need minimum second+third level pages */
>> >> +     ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
>> >> +     if (ret)
>> >> +             goto out;
>> >> +     new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
>> >> +     if (writable)
>> >> +             pte_val(new_pte) |= L_PTE2_WRITE;
>> >> +     coherent_icache_guest_page(vcpu->kvm, gfn);
>> >
>> > why don't you flush icache only when guest has mapped executable page
>> > as __sync_icache_dcache function does currently?
>> >
>> >
>>
>> because we don't know if the guest will map the page executable. The guest
>> may read the page through a normal load, which causes the fault, and
>> subsequently execute it (even possible through different guest mappings).
>> The only way to see this happening would be to mark all pages as non-
>> executable and catch the fault when it occurs - unfortunately the HPFAR
>> which gives us the IPA is not populated on execute never faults, so we
>> would have to translate the PC's va to ipa using cp15 functionality when
>> this happens, which is then also racy with other CPUs. So the question is
>> really if this will even be an optimization, but it's definitely something
>> that requires further investigation.
>
> OK. I understand your point.
>
> But if guest maps a page for execution, guest will flush Icache
> from __sync_icache_dcache. Then coherent_icache_guest_page doesn't seem to
> be
> necessary again. One thing I'm not sure in this case is when guest maps
> for kernel executable page(module loading) and it reuses the kernel
> executable page
> from host(module unloading). But in that case, I think it is possible to
> reduce
> the number of flush by limiting the address range for flush.
>
>
the guest kernel will flush the dcache when it maps the page
initially. However, when we swap on the host we might use that same
page at the same virtual address as the original guest in another
guest or on the host, and the icache will now contain incorrect code
that can be executed from the guest in case of vipt caches. In the
case of pipt caches, if the page is used for instructions on any
virtual address, incorrect entries can be executed from the icache
once this page is used for guest instructions again. If you have
suggestions on how to optimize this, that would be great, but I see no
good way around it.
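
To make the vipt/pipt cases above concrete, here is a simplified sketch of
what the flush helper has to do (not the literal code from the series):

static void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
{
        if (icache_is_pipt()) {
                /* pipt: flushing the lines backing this page is enough */
                unsigned long hva = gfn_to_hva(kvm, gfn);
                flush_icache_range(hva, hva + PAGE_SIZE);
        } else if (!icache_is_vivt_asid_tagged()) {
                /* aliasing vipt: stale code may sit under any alias,
                 * so invalidate the whole icache */
                __flush_icache_all();
        }
        /* vivt asid-tagged icaches are invalidated on ASID rollover,
         * so nothing to do here */
}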

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
  2012-09-25 12:38       ` Christoffer Dall
@ 2012-09-27 12:39         ` Catalin Marinas
  -1 siblings, 0 replies; 164+ messages in thread
From: Catalin Marinas @ 2012-09-27 12:39 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: 김창환, kvmarm, kvm, linux-arm-kernel, Min-gyu Kim

On 25 September 2012 13:38, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
>>> +
>>> +     /*
>>> +      * If this is a write fault (think COW) we need to make sure the
>>> +      * existing page, which other CPUs might still read, doesn't go
>>> away
>>> +      * from under us, by calling gfn_to_pfn_prot(write_fault=true).
>>> +      * Therefore, we call gfn_to_pfn_prot(write_fault=false), which
>>> will
>>> +      * pin the existing page, then we get a new page for the user space
>>> +      * pte and map this in the stage-2 table where we also make sure to
>>> +      * flush the TLB for the VM, if there was an existing entry (the
>>> entry
>>> +      * was updated setting the write flag to the potentially new page).
>>> +      */
>>> +     if (fault_status == FSC_PERM) {
>>> +             pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false, NULL);
>>> +             if (is_error_pfn(pfn_existing))
>>> +                     return -EFAULT;
>>> +     }
>>> +
>>> +     pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
>>> +     if (is_error_pfn(pfn)) {
>>> +             ret = -EFAULT;
>>> +             goto out_put_existing;
>>> +     }
>>> +
>>> +     /* We need minimum second+third level pages */
>>> +     ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
>>> +     if (ret)
>>> +             goto out;
>>> +     new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
>>> +     if (writable)
>>> +             pte_val(new_pte) |= L_PTE2_WRITE;
>>> +     coherent_icache_guest_page(vcpu->kvm, gfn);
>>
>> why don't you flush icache only when guest has mapped executable page
>> as __sync_icache_dcache function does currently?
>
> because we don't know if the guest will map the page executable. The
> guest may read the page through a normal load, which causes the fault,
> and subsequently execute it (even possible through different guest
> mappings). The only way to see this happening would be to mark all
> pages as non-executable and catch the fault when it occurs -
> unfortunately the HPFAR which gives us the IPA is not populated on
> execute never faults, so we would have to translate the PC's va to ipa
> using cp15 functionality when this happens, which is then also racy
> with other CPUs.

I think you can avoid the race in the stage 2 XN case. In the Hyp
exception handler entered because of a stage 2 XN bit you can get the
IPA via the CP15 ATS1CPR and PAR registers. If the address translation
failed because the same guest running on another CPU changed the stage 1
page table, you can simply return to the guest rather than switching
to the host with incomplete information. The guest may handle its own
stage 1 fault and eventually trigger another stage 2 permission fault, and
Hyp will try the address translation again. That's a very rare
situation, so just returning without handling it would not cause any
performance issues.
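
A rough sketch of that lookup as it might appear in the Hyp abort path
(C with inline asm; the function name and surrounding plumbing are
invented for illustration):

static bool guest_va_to_ipa(unsigned long va, phys_addr_t *ipa)
{
        unsigned long par;

        asm volatile("mcr p15, 0, %0, c7, c8, 0" : : "r" (va)); /* ATS1CPR */
        asm volatile("isb" : : : "memory");
        asm volatile("mrc p15, 0, %0, c7, c4, 0" : "=r" (par)); /* read PAR */

        if (par & 1)            /* PAR.F: stage 1 walk failed under our feet */
                return false;   /* resume the guest and let it retry */

        *ipa = (par & PAGE_MASK) | (va & ~PAGE_MASK);
        return true;
}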

-- 
Catalin

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-26  1:43       ` Christoffer Dall
@ 2012-09-27 14:13         ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-27 14:13 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, rusty.russell, avi, marc.zyngier

On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
> On 09/25/2012 11:20 AM, Will Deacon wrote:
> >> +/* Multiprocessor Affinity Register */
> >> +#define MPIDR_CPUID    (0x3 << 0)
> >
> > I'm fairly sure we already have code under arch/arm/ for dealing with the
> > mpidr. Let's re-use that rather than reinventing it here.
> >
>
> I see some defines in topology.c - do you want some of these factored
> out into a header file that we can then also use from kvm? If so, where?

I guess either in topology.h or a new header (topology-bits.h).
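
Something along these lines, for instance (purely illustrative; the names
and the exact header are what is being discussed here, so treat them as
assumptions):

/* shared MPIDR affinity accessors */
#define MPIDR_LEVEL_BITS	8
#define MPIDR_LEVEL_MASK	((1 << MPIDR_LEVEL_BITS) - 1)
#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
	(((mpidr) >> ((level) << 3)) & MPIDR_LEVEL_MASK)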

> >> +#define EXCEPTION_NONE      0
> >> +#define EXCEPTION_RESET     0x80
> >> +#define EXCEPTION_UNDEFINED 0x40
> >> +#define EXCEPTION_SOFTWARE  0x20
> >> +#define EXCEPTION_PREFETCH  0x10
> >> +#define EXCEPTION_DATA      0x08
> >> +#define EXCEPTION_IMPRECISE 0x04
> >> +#define EXCEPTION_IRQ       0x02
> >> +#define EXCEPTION_FIQ       0x01
> >
> > Why the noise?
> >
>
> these are simply cruft from a previous life of KVM/ARM.

Ok, then please get rid of them.

> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
> >> +{
> >> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
> >> +       BUG_ON(mode == 0xf);
> >> +       return mode;
> >> +}
> >
> > I noticed that you have a fair few BUG_ONs throughout the series. Fair
> > enough, but for hyp code is that really the right thing to do? Killing the
> > guest could make more sense, perhaps?
>
> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
> catch explicitly on the host. We have had a pass over the code to change
> all the BUG_ONs that can be provoked by the guest and inject the proper
> exceptions into the guest in this case. If you find places where this is
> not the case, it should be changed, and do let me know.

Ok, so are you saying that a BUG_ON due to some detected inconsistency with
one guest may not necessarily terminate the other guests? BUG_ONs in the
host seem like a bad idea if the host is able to continue with a subset of
guests.

> >
> >> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
> >> +{
> >> +       return vcpu_reg(vcpu, 15);
> >> +}
> >
> > If you stick a struct pt_regs into struct kvm_regs, you could reuse ARM_pc
> > here etc.
> >
>
> I prefer not to, because we'd have those registers presumably for usr
> mode and then we only define the others explicit. I think it's much
> clearer to look at kvm_regs today.

I disagree and think that you should reuse as much of the arch/arm/ code as
possible. Not only does it make it easier to read by people who are familiar
with that code (and in turn get you more reviewers) but it also means that
we limit the amount of duplication that we have.

I think Marc (CC'd) had a go at this with some success.
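
To make the suggestion concrete, a rough sketch of what embedding struct
pt_regs could look like (field names and the banked-register grouping are
assumptions for illustration, not the series' actual layout):

struct kvm_regs {
	struct pt_regs usr_regs;	/* r0-r12, sp, lr, pc, cpsr */
	unsigned long svc_regs[3];	/* sp_svc, lr_svc, spsr_svc */
	unsigned long abt_regs[3];	/* sp_abt, lr_abt, spsr_abt */
	unsigned long und_regs[3];	/* sp_und, lr_und, spsr_und */
	unsigned long irq_regs[3];	/* sp_irq, lr_irq, spsr_irq */
	unsigned long fiq_regs[8];	/* r8-r12_fiq, sp_fiq, lr_fiq, spsr_fiq */
};

static inline unsigned long *vcpu_pc(struct kvm_vcpu *vcpu)
{
	/* ARM_pc comes from asm/ptrace.h, so no private #define is needed */
	return &vcpu->arch.regs.usr_regs.ARM_pc;
}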

> >> +#ifndef __ARM_KVM_HOST_H__
> >> +#define __ARM_KVM_HOST_H__
> >> +
> >> +#include <asm/kvm.h>
> >> +
> >> +#define KVM_MAX_VCPUS 4
> >
> > NR_CPUS?
> >
>
> well this is defined by KVM generic code, and is common for other
> architecture.

I mean #define KVM_MAX_CPUS NR_CPUS. The 4 seems arbitrary.

> >> +int __attribute_const__ kvm_target_cpu(void)
> >> +{
> >> +       unsigned int midr;
> >> +
> >> +       midr = read_cpuid_id();
> >> +       switch ((midr >> 4) & 0xfff) {
> >> +       case KVM_ARM_TARGET_CORTEX_A15:
> >> +               return KVM_ARM_TARGET_CORTEX_A15;
> >
> > I have this code already in perf_event.c. Can we move it somewhere common
> > and share it? You should also check that the implementor field is 0x41.
> >
>
> by all means, you can probably suggest a good place better than I can...

cputype.h?
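
For instance (macro names here are illustrative assumptions, not existing
definitions):

/* shared MIDR decoding helpers for cputype.h */
#define ARM_CPU_IMP_ARM			0x41
#define ARM_CPU_PART_CORTEX_A15		0xc0f

#define MIDR_IMPLEMENTOR(midr)		(((midr) >> 24) & 0xff)
#define MIDR_PARTNUM(midr)		(((midr) >> 4) & 0xfff)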

> >> +#include <linux/module.h>
> >> +
> >> +EXPORT_SYMBOL_GPL(smp_send_reschedule);
> >
> > Erm...
> >
> > We already have arch/arm/kernel/armksyms.c for exports -- please use that.
> > However, exporting such low-level operations sounds like a bad idea. How
> > realistic is kvm-as-a-module on ARM anyway?
> >
>
> at this point it's broken, so I'll just remove this and leave this for a
> fun project for some poor soul at some point if anyone ever needs half
> the code outside the kernel as a module (the other half needs to be
> compiled in anyway)

Ok, that suits me. If it's broken, let's not include it in the initial
submission.

> >> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> >> +{
> >> +       return -EINVAL;
> >> +}
> >> +
> >> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> >> +{
> >> +       return -EINVAL;
> >> +}
> >
> > Again, all looks like this should be implemented using regsets from what I
> > can tell.
> >
>
> this API has been discussed to death on the KVM lists, and we can of
> course revive that if the regset makes it nicer - I'd prefer getting
> this upstream the way that it is now though, where GET_REG / SET_REG
> seems to be the way forward from a KVM perspective.

I'm sure the API has been discussed, but I've not seen that conversation and
I'm also not aware about whether or not regsets were considered as a
possibility for this stuff. The advantages of using them are:

	1. It's less code for the arch to implement (and most of what you
	need, you already have).

	2. You can move the actual copying code into core KVM, like we have
	for ptrace.

	3. New KVM ports (e.g. arm64) can reuse the core copying code
	easily.

Furthermore, some registers (typically floating point and GPRs) will already
have regsets for the ptrace code, so that can be reused if you share the
datatypes.

The big problem with getting things upstream and then changing it later is
that you will break the ABI. I highly doubt that's feasible, so can we not
just use regsets from the start for ARM?

> >> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >> +{
> >> +       struct kvm_regs *cpu_reset;
> >> +
> >> +       switch (vcpu->arch.target) {
> >> +       case KVM_ARM_TARGET_CORTEX_A15:
> >> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
> >> +                       return -EINVAL;
> >> +               cpu_reset = &a15_regs_reset;
> >> +               vcpu->arch.midr = read_cpuid_id();
> >> +               break;
> >> +       default:
> >> +               return -ENODEV;
> >> +       }
> >> +
> >> +       /* Reset core registers */
> >> +       memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
> >> +
> >> +       /* Reset CP15 registers */
> >> +       kvm_reset_coprocs(vcpu);
> >> +
> >> +       return 0;
> >> +}
> >
> > This is a nice way to plug in new CPUs but the way the rest of the code is
> > currently written, all the ARMv7 and Cortex-A15 code is merged together. I
> > *strongly* suggest you isolate this from the start, as it will help you see
> > what is architected and what is implementation-specific.
> >
>
> not entirely sure what you mean. You want a separate coproc.c file for
> Cortex-A15 specific stuff like coproc_a15.c?

Indeed. I think it will make adding new CPUs a lot clearer and separate the
architecture from the implementation.

Cheers,

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-27 14:13         ` Will Deacon
@ 2012-09-27 14:39           ` Marc Zyngier
  -1 siblings, 0 replies; 164+ messages in thread
From: Marc Zyngier @ 2012-09-27 14:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm, rusty.russell, avi

On 27/09/12 15:13, Will Deacon wrote:
> On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
>> On 09/25/2012 11:20 AM, Will Deacon wrote:
>>>
>>>> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       return vcpu_reg(vcpu, 15);
>>>> +}
>>>
>>> If you stick a struct pt_regs into struct kvm_regs, you could reuse ARM_pc
>>> here etc.
>>>
>>
>> I prefer not to, because we'd have those registers presumably for usr
>> mode and then we only define the others explicit. I think it's much
>> clearer to look at kvm_regs today.
> 
> I disagree and think that you should reuse as much of the arch/arm/ code as
> possible. Not only does it make it easier to read by people who are familiar
> with that code (and in turn get you more reviewers) but it also means that
> we limit the amount of duplication that we have.
> 
> I think Marc (CC'd) had a go at this with some success.

Yup, I have it converted already. It requires a number of changes, but I
took this opportunity to do some other cleanup (world switch
save/restore code, mostly).

Patches are at the top of my kvm-cleanup branch.

	M.
-- 
Jazz is not dead. It just smells funny...


^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-27 14:13         ` Will Deacon
@ 2012-09-27 14:45           ` Peter Maydell
  -1 siblings, 0 replies; 164+ messages in thread
From: Peter Maydell @ 2012-09-27 14:45 UTC (permalink / raw)
  To: Will Deacon
  Cc: Christoffer Dall, kvm, rusty.russell, kvmarm, linux-arm-kernel

On 27 September 2012 15:13, Will Deacon <will.deacon@arm.com> wrote:
> On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
>> this API has been discussed to death on the KVM lists, and we can of
>> course revive that if the regset makes it nicer - I'd prefer getting
>> this upstream the way that it is now though, where GET_REG / SET_REG
>> seems to be the way forward from a KVM perspective.
>
> I'm sure the API has been discussed, but I've not seen that conversation and
> I'm also not aware about whether or not regsets were considered as a
> possibility for this stuff.

Can you point me at some documentation for regsets? It's a bit
difficult to have a sensible conversation about their suitability
otherwise...

(The potentially tricky area to handle is the cp15 registers, where
it's quite likely that new registers might be added later and so
userspace has to be able to query the kernel for what is supported
and possibly deal with the kernel reporting attempts to set read
only bits within registers, etc. Using the same ABI for simpler
cases like the GPRs and FP registers is then just a matter of
being consistent in the interface we expose to userspace.)

-- PMM

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
  2012-09-15 15:35   ` Christoffer Dall
@ 2012-09-27 15:11     ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-27 15:11 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm, dave.martin

On Sat, Sep 15, 2012 at 04:35:59PM +0100, Christoffer Dall wrote:
> When the guest accesses I/O memory this will create data abort
> exceptions and they are handled by decoding the HSR information
> (physical address, read/write, length, register) and forwarding reads
> and writes to QEMU which performs the device emulation.
> 
> Certain classes of load/store operations do not support the syndrome
> information provided in the HSR and we therefore must be able to fetch
> the offending instruction from guest memory and decode it manually.
> 
> We only support instruction decoding for valid reasonable MMIO operations
> where trapping them does not provide sufficient information in the HSR (no
> 16-bit Thumb instructions provide register writeback that we care about).
> 
> The following instruction types are NOT supported for MMIO operations
> despite the HSR not containing decode info:
>  - any Load/Store multiple
>  - any load/store exclusive
>  - any load/store dual
>  - anything with the PC as the dest register

[...]

> +
> +/******************************************************************************
> + * Load-Store instruction emulation
> + *****************************************************************************/
> +
> +/*
> + * Must be ordered with LOADS first and WRITES afterwards
> + * for easy distinction when doing MMIO.
> + */
> +#define NUM_LD_INSTR  9
> +enum INSTR_LS_INDEXES {
> +       INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
> +       INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
> +       INSTR_LS_LDRSH,
> +       INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
> +       INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
> +       NUM_LS_INSTR
> +};
> +
> +static u32 ls_instr[NUM_LS_INSTR][2] = {
> +       {0x04700000, 0x0d700000}, /* LDRBT */
> +       {0x04300000, 0x0d700000}, /* LDRT  */
> +       {0x04100000, 0x0c500000}, /* LDR   */
> +       {0x04500000, 0x0c500000}, /* LDRB  */
> +       {0x000000d0, 0x0e1000f0}, /* LDRD  */
> +       {0x01900090, 0x0ff000f0}, /* LDREX */
> +       {0x001000b0, 0x0e1000f0}, /* LDRH  */
> +       {0x001000d0, 0x0e1000f0}, /* LDRSB */
> +       {0x001000f0, 0x0e1000f0}, /* LDRSH */
> +       {0x04600000, 0x0d700000}, /* STRBT */
> +       {0x04200000, 0x0d700000}, /* STRT  */
> +       {0x04000000, 0x0c500000}, /* STR   */
> +       {0x04400000, 0x0c500000}, /* STRB  */
> +       {0x000000f0, 0x0e1000f0}, /* STRD  */
> +       {0x01800090, 0x0ff000f0}, /* STREX */
> +       {0x000000b0, 0x0e1000f0}  /* STRH  */
> +};
> +
> +static inline int get_arm_ls_instr_index(u32 instr)
> +{
> +       return kvm_instr_index(instr, ls_instr, NUM_LS_INSTR);
> +}
> +
> +/*
> + * Load-Store instruction decoding
> + */
> +#define INSTR_LS_TYPE_BIT              26
> +#define INSTR_LS_RD_MASK               0x0000f000
> +#define INSTR_LS_RD_SHIFT              12
> +#define INSTR_LS_RN_MASK               0x000f0000
> +#define INSTR_LS_RN_SHIFT              16
> +#define INSTR_LS_RM_MASK               0x0000000f
> +#define INSTR_LS_OFFSET12_MASK         0x00000fff
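
For reference, a minimal sketch of the pattern/mask lookup that
get_arm_ls_instr_index() relies on above; the series' real kvm_instr_index()
is not shown in this hunk and may differ:

static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
{
	int i;

	/* return the index of the first {pattern, mask} pair that matches */
	for (i = 0; i < table_entries; i++) {
		if ((instr & table[i][1]) == table[i][0])
			return i;
	}
	return -1;	/* no match: not an instruction we emulate */
}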

I'm afraid you're not going to thank me much for this, but it's high time we
unified the various instruction decoding functions we have under arch/arm/
and this seems like a good opportunity for that. For example, look at the
following snippets (there is much more in the files I list) in addition to
what you have:


asm/ptrace.h
-------------
#define PSR_T_BIT	0x00000020
#define PSR_F_BIT	0x00000040
#define PSR_I_BIT	0x00000080
#define PSR_A_BIT	0x00000100
#define PSR_E_BIT	0x00000200
#define PSR_J_BIT	0x01000000
#define PSR_Q_BIT	0x08000000
#define PSR_V_BIT	0x10000000
#define PSR_C_BIT	0x20000000
#define PSR_Z_BIT	0x40000000
#define PSR_N_BIT	0x80000000

mm/alignment.c
--------------
#define LDST_I_BIT(i)	(i & (1 << 26))		/* Immediate constant	*/
#define LDST_P_BIT(i)	(i & (1 << 24))		/* Preindex		*/
#define LDST_U_BIT(i)	(i & (1 << 23))		/* Add offset		*/
#define LDST_W_BIT(i)	(i & (1 << 21))		/* Writeback		*/
#define LDST_L_BIT(i)	(i & (1 << 20))		/* Load			*/

kernel/kprobes*.c
-----------------
static void __kprobes
emulate_ldr(struct kprobe *p, struct pt_regs *regs)
{
	kprobe_opcode_t insn = p->opcode;
	unsigned long pc = (unsigned long)p->addr + 8;
	int rt = (insn >> 12) & 0xf;
	int rn = (insn >> 16) & 0xf;
	int rm = insn & 0xf;

kernel/opcodes.c
----------------
static const unsigned short cc_map[16] = {
	0xF0F0,			/* EQ == Z set            */
	0x0F0F,			/* NE                     */
	0xCCCC,			/* CS == C set            */
	0x3333,			/* CC                     */
	0xFF00,			/* MI == N set            */
	0x00FF,			/* PL                     */
	0xAAAA,			/* VS == V set            */
	0x5555,			/* VC                     */
	0x0C0C,			/* HI == C set && Z clear */
	0xF3F3,			/* LS == C clear || Z set */
	0xAA55,			/* GE == (N==V)           */
	0x55AA,			/* LT == (N!=V)           */
	0x0A05,			/* GT == (!Z && (N==V))   */
	0xF5FA,			/* LE == (Z || (N!=V))    */
	0xFFFF,			/* AL always              */
	0			/* NV                     */
};

kernel/swp_emulate.c
--------------------
#define EXTRACT_REG_NUM(instruction, offset) \
	(((instruction) & (0xf << (offset))) >> (offset))
#define RN_OFFSET  16
#define RT_OFFSET  12
#define RT2_OFFSET  0


There are also bits and pieces with the patching frameworks and module
relocations that could benefit from some code sharing. Now, I think Dave had
some ideas about moving a load of this code into a common disassembler under
arch/arm/ so it would be great to tie that in here and implement that for
load/store instructions. Then other users can augment the number of
supported instruction classes as and when it is required.

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-27 14:45           ` Peter Maydell
@ 2012-09-27 15:20             ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-27 15:20 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Christoffer Dall, kvm, rusty.russell, kvmarm, linux-arm-kernel

On Thu, Sep 27, 2012 at 03:45:50PM +0100, Peter Maydell wrote:
> On 27 September 2012 15:13, Will Deacon <will.deacon@arm.com> wrote:
> > On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
> >> this API has been discussed to death on the KVM lists, and we can of
> >> course revive that if the regset makes it nicer - I'd prefer getting
> >> this upstream the way that it is now though, where GET_REG / SET_REG
> >> seems to be the way forward from a KVM perspective.
> >
> > I'm sure the API has been discussed, but I've not seen that conversation and
> > I'm also not aware about whether or not regsets were considered as a
> > possibility for this stuff.
> 
> Can you point me at some documentation for regsets? It's a bit
> difficult to have a sensible conversation about their suitability
> otherwise...

The actual regset structure (struct user_regset) is internal to the kernel,
so it's not very well documented. As far as userspace interaction goes, the
usual method is via an iovec (see readv/writev) which is well documented, but
of course the kvm ioctl would still need documenting. For ptrace, that's just
a small paragraph in a user header:

/*
 * Generic ptrace interface that exports the architecture specific regsets
 * using the corresponding NT_* types (which are also used in the core dump).
 * Please note that the NT_PRSTATUS note type in a core dump contains a full
 * 'struct elf_prstatus'. But the user_regset for NT_PRSTATUS contains just the
 * elf_gregset_t that is the pr_reg field of 'struct elf_prstatus'. For all the
 * other user_regset flavors, the user_regset layout and the ELF core dump note
 * payload are exactly the same layout.
 *
 * This interface usage is as follows:
 *      struct iovec iov = { buf, len};
 *
 *      ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);
 *
 * On the successful completion, iov.len will be updated by the kernel,
 * specifying how much the kernel has written/read to/from the user's iov.buf.
 */

but obviously you'd probably have something under Documentation/.
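
As a concrete illustration of that usage pattern from userspace (ptrace
flavour only, nothing KVM-specific; read_gprs() is a made-up helper name):

#include <elf.h>		/* NT_PRSTATUS */
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

/* fetch the general-purpose registers of a traced task via a regset;
 * 18 words covers the ARM core registers (r0-r15, cpsr, orig_r0) */
static long read_gprs(pid_t pid, unsigned long gprs[18])
{
	struct iovec iov = {
		.iov_base = gprs,
		.iov_len  = 18 * sizeof(unsigned long),
	};

	/* on success the kernel updates iov.iov_len with the bytes copied */
	return ptrace(PTRACE_GETREGSET, pid, (void *)NT_PRSTATUS, &iov);
}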

> (The potentially tricky area to handle is the cp15 registers, where
> it's quite likely that new registers might be added later and so
> userspace has to be able to query the kernel for what is supported
> and possibly deal with the kernel reporting attempts to set read
> only bits within registers, etc. Using the same ABI for simpler
> cases like the GPRs and FP registers is then just a matter of
> being consistent in the interface we expose to userspace.)

You *could* split up cp15 into lots of regsets but realistically that's going
to hurt down the line. GPR and FP would be good though.

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
  2012-09-27  3:11         ` Min-gyu Kim
@ 2012-09-27 15:26           ` Marc Zyngier
  -1 siblings, 0 replies; 164+ messages in thread
From: Marc Zyngier @ 2012-09-27 15:26 UTC (permalink / raw)
  To: Min-gyu Kim
  Cc: 'Christoffer Dall', '김창환',
	linux-arm-kernel, kvm, kvmarm

On 27/09/12 04:11, Min-gyu Kim wrote:
> 
> 
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Christoffer Dall
>> Sent: Tuesday, September 25, 2012 9:39 PM
>> To: Min-gyu Kim
>> Cc: kvm@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>> kvmarm@lists.cs.columbia.edu; 김창환
>> Subject: Re: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
>>
>>>> +
>>>> +     /*
>>>> +      * If this is a write fault (think COW) we need to make sure the
>>>> +      * existing page, which other CPUs might still read, doesn't go away
>>>> +      * from under us, by calling gfn_to_pfn_prot(write_fault=true).
>>>> +      * Therefore, we call gfn_to_pfn_prot(write_fault=false), which will
>>>> +      * pin the existing page, then we get a new page for the user space
>>>> +      * pte and map this in the stage-2 table where we also make sure to
>>>> +      * flush the TLB for the VM, if there was an existing entry (the entry
>>>> +      * was updated setting the write flag to the potentially new page).
>>>> +      */
>>>> +     if (fault_status == FSC_PERM) {
>>>> +             pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false, NULL);
>>>> +             if (is_error_pfn(pfn_existing))
>>>> +                     return -EFAULT;
>>>> +     }
>>>> +
>>>> +     pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
>>>> +     if (is_error_pfn(pfn)) {
>>>> +             ret = -EFAULT;
>>>> +             goto out_put_existing;
>>>> +     }
>>>> +
>>>> +     /* We need minimum second+third level pages */
>>>> +     ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
>>>> +     if (ret)
>>>> +             goto out;
>>>> +     new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
>>>> +     if (writable)
>>>> +             pte_val(new_pte) |= L_PTE2_WRITE;
>>>> +     coherent_icache_guest_page(vcpu->kvm, gfn);
>>>
>>> why don't you flush icache only when guest has mapped executable page
>>> as __sync_icache_dcache function does currently?
>>>
>>>
>>
>> because we don't know if the guest will map the page executable. The guest
>> may read the page through a normal load, which causes the fault, and
>> subsequently execute it (even possible through different guest mappings).
>> The only way to see this happening would be to mark all pages as non-
>> executable and catch the fault when it occurs - unfortunately the HPFAR
>> which gives us the IPA is not populated on execute never faults, so we
>> would have to translate the PC's va to ipa using cp15 functionality when
>> this happens, which is then also racy with other CPUs. So the question is
>> really if this will even be an optimization, but it's definitely something
>> that requires further investigation.
> 
> OK. I understand your point.
> 
> But if the guest maps a page for execution, the guest will flush the Icache
> from __sync_icache_dcache. Then coherent_icache_guest_page doesn't seem to be
> necessary again. One thing I'm not sure about in this case is when the guest
> maps a kernel executable page (module loading) and it reuses the kernel
> executable page from the host (module unloading). But in that case, I think
> it is possible to reduce the number of flushes by limiting the address range
> for the flush.

I think you're missing the major point:
When the guest maps a page for execution, it knows it has to synchronize
Icache and Dcache. But the guest never knows when we swap out a page
because of memory pressure.

When the guest eventually faults that page back in, chances are it will
be a different physical page, and the cache content may be inconsistent.
We must then sync Icache/Dcache for this page.
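
To make that concrete, a rough sketch of the maintenance needed on that
path (illustrative only; gfn_to_hva() is the generic KVM helper, and the
series' actual coherent_icache_guest_page() may well be smarter, e.g.
skipping the global invalidate for a PIPT I-cache):

static void sync_guest_icache_sketch(struct kvm *kvm, gfn_t gfn)
{
	unsigned long hva = gfn_to_hva(kvm, gfn);

	/* write the page's data back so instruction fetches observe it */
	__cpuc_flush_dcache_area((void *)hva, PAGE_SIZE);

	/* drop any stale lines a VIPT I-cache may still hold for the page */
	__flush_icache_all();
}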

Now, as Christoffer mentioned, there's a number of schemes we could
potentially use to mitigate this effect (using the XN bit in the Stage2
page tables), but it remains to be seen how effective it will be.

	M.
-- 
Jazz is not dead. It just smells funny...


^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
  2012-09-27 12:39         ` Catalin Marinas
@ 2012-09-27 17:15           ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-27 17:15 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Min-gyu Kim, 김창환, linux-arm-kernel, kvm, kvmarm

On Thu, Sep 27, 2012 at 8:39 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On 25 September 2012 13:38, Christoffer Dall
> <c.dall@virtualopensystems.com> wrote:
>>>> +
>>>> +     /*
>>>> +      * If this is a write fault (think COW) we need to make sure the
>>>> +      * existing page, which other CPUs might still read, doesn't go away
>>>> +      * from under us, by calling gfn_to_pfn_prot(write_fault=true).
>>>> +      * Therefore, we call gfn_to_pfn_prot(write_fault=false), which will
>>>> +      * pin the existing page, then we get a new page for the user space
>>>> +      * pte and map this in the stage-2 table where we also make sure to
>>>> +      * flush the TLB for the VM, if there was an existing entry (the entry
>>>> +      * was updated setting the write flag to the potentially new page).
>>>> +      */
>>>> +     if (fault_status == FSC_PERM) {
>>>> +             pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false, NULL);
>>>> +             if (is_error_pfn(pfn_existing))
>>>> +                     return -EFAULT;
>>>> +     }
>>>> +
>>>> +     pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
>>>> +     if (is_error_pfn(pfn)) {
>>>> +             ret = -EFAULT;
>>>> +             goto out_put_existing;
>>>> +     }
>>>> +
>>>> +     /* We need minimum second+third level pages */
>>>> +     ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
>>>> +     if (ret)
>>>> +             goto out;
>>>> +     new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
>>>> +     if (writable)
>>>> +             pte_val(new_pte) |= L_PTE2_WRITE;
>>>> +     coherent_icache_guest_page(vcpu->kvm, gfn);
>>>
>>> why don't you flush icache only when guest has mapped executable page
>>> as __sync_icache_dcache function does currently?
>>
>> because we don't know if the guest will map the page executable. The
>> guest may read the page through a normal load, which causes the fault,
>> and subsequently execute it (even possible through different guest
>> mappings). The only way to see this happening would be to mark all
>> pages as non-executable and catch the fault when it occurs -
>> unfortunately the HPFAR which gives us the IPA is not populated on
>> execute never faults, so we would have to translate the PC's va to ipa
>> using cp15 functionality when this happens, which is then also racy
>> with other CPUs.
>
> I think you can avoid the race in the stage 2 XN case. In the Hyp
> exception handler entered because of a stage 2 XN bit you can get the
> IPA via the CP15 ATS1CPR and PAR registers. If the address translation
> failed because the same guest running on other CPU changed the stage 1
> page table, you can simply return to the guest rather than switching
> to host with incomplete information. The guest may handle its own
> stage 1 fault and eventually trigger another stage 2 permission and
> Hyp will try the address translation again. That's a very rare
> situation, so just returning without handling it would not cause any
> performance issues.
>
you're right that the race is not a big issue, but it's not clear to
me that the trapping + ATS1CPR will be faster than just flushing
icache - we'll have to measure this.

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 13/15] KVM: ARM: Handle guest faults in KVM
  2012-09-27 17:15           ` Christoffer Dall
@ 2012-09-27 17:21             ` Catalin Marinas
  -1 siblings, 0 replies; 164+ messages in thread
From: Catalin Marinas @ 2012-09-27 17:21 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Min-gyu Kim, linux-arm-kernel, kvm, kvmarm

On Thu, Sep 27, 2012 at 06:15:05PM +0100, Christoffer Dall wrote:
> On Thu, Sep 27, 2012 at 8:39 AM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On 25 September 2012 13:38, Christoffer Dall
> > <c.dall@virtualopensystems.com> wrote:
> >>>> +
> >>>> +     /*
> >>>> +      * If this is a write fault (think COW) we need to make sure the
> >>>> +      * existing page, which other CPUs might still read, doesn't go
> >>>> away
> >>>> +      * from under us, by calling gfn_to_pfn_prot(write_fault=true).
> >>>> +      * Therefore, we call gfn_to_pfn_prot(write_fault=false), which
> >>>> will
> >>>> +      * pin the existing page, then we get a new page for the user space
> >>>> +      * pte and map this in the stage-2 table where we also make sure to
> >>>> +      * flush the TLB for the VM, if there was an existing entry (the
> >>>> entry
> >>>> +      * was updated setting the write flag to the potentially new page).
> >>>> +      */
> >>>> +     if (fault_status == FSC_PERM) {
> >>>> +             pfn_existing = gfn_to_pfn_prot(vcpu->kvm, gfn, false, NULL);
> >>>> +             if (is_error_pfn(pfn_existing))
> >>>> +                     return -EFAULT;
> >>>> +     }
> >>>> +
> >>>> +     pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
> >>>> +     if (is_error_pfn(pfn)) {
> >>>> +             ret = -EFAULT;
> >>>> +             goto out_put_existing;
> >>>> +     }
> >>>> +
> >>>> +     /* We need minimum second+third level pages */
> >>>> +     ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
> >>>> +     if (ret)
> >>>> +             goto out;
> >>>> +     new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
> >>>> +     if (writable)
> >>>> +             pte_val(new_pte) |= L_PTE2_WRITE;
> >>>> +     coherent_icache_guest_page(vcpu->kvm, gfn);
> >>>
> >>> why don't you flush icache only when guest has mapped executable page
> >>> as __sync_icache_dcache function does currently?
> >>
> >> because we don't know if the guest will map the page executable. The
> >> guest may read the page through a normal load, which causes the fault,
> >> and subsequently execute it (even possible through different guest
> >> mappings). The only way to see this happening would be to mark all
> >> pages as non-executable and catch the fault when it occurs -
> >> unfortunately the HPFAR which gives us the IPA is not populated on
> >> execute never faults, so we would have to translate the PC's va to ipa
> >> using cp15 functionality when this happens, which is then also racy
> >> with other CPUs.
> >
> > I think you can avoid the race in the stage 2 XN case. In the Hyp
> > exception handler entered because of a stage 2 XN bit you can get the
> > IPA via the CP15 ATS1CPR and PAR registers. If the address translation
> > failed because the same guest running on other CPU changed the stage 1
> > page table, you can simply return to the guest rather than switching
> > to host with incomplete information. The guest may handle its own
> > stage 1 fault and eventually trigger another stage 2 permission and
> > Hyp will try the address translation again. That's a very rare
> > situation, so just returning without handling it would not cause any
> > performance issues.
> >
> you're right that the race is not a big issue, but it's not clear to
> me that the trapping + ATS1CPR will be faster than just flushing
> icache - we'll have to measure this.

I agree, it needs measuring first as it may not be worth the hassle.

-- 
Catalin

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 09/15] KVM: ARM: Inject IRQs and FIQs from userspace
  2012-09-25 15:55     ` Will Deacon
@ 2012-09-29 15:50       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-29 15:50 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 25, 2012 at 11:55 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:35:27PM +0100, Christoffer Dall wrote:
>> diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
>> index a13b582..131e632 100644
>> --- a/arch/arm/include/asm/kvm.h
>> +++ b/arch/arm/include/asm/kvm.h
>> @@ -22,6 +22,7 @@
>>  #include <asm/types.h>
>>
>>  #define __KVM_HAVE_GUEST_DEBUG
>> +#define __KVM_HAVE_IRQ_LINE
>>
>>  #define KVM_REG_SIZE(id)                                             \
>>       (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
>> @@ -85,4 +86,24 @@ struct kvm_reg_list {
>>  #define KVM_REG_ARM_CORE             (0x0010 << KVM_REG_ARM_COPROC_SHIFT)
>>  #define KVM_REG_ARM_CORE_REG(name)   (offsetof(struct kvm_regs, name) / 4)
>>
>> +/* KVM_IRQ_LINE irq field index values */
>> +#define KVM_ARM_IRQ_TYPE_SHIFT               24
>> +#define KVM_ARM_IRQ_TYPE_MASK                0xff
>> +#define KVM_ARM_IRQ_VCPU_SHIFT               16
>> +#define KVM_ARM_IRQ_VCPU_MASK                0xff
>> +#define KVM_ARM_IRQ_NUM_SHIFT                0
>> +#define KVM_ARM_IRQ_NUM_MASK         0xffff
>> +
>> +/* irq_type field */
>> +#define KVM_ARM_IRQ_TYPE_CPU         0
>> +#define KVM_ARM_IRQ_TYPE_SPI         1
>> +#define KVM_ARM_IRQ_TYPE_PPI         2
>> +
>> +/* out-of-kernel GIC cpu interrupt injection irq_number field */
>> +#define KVM_ARM_IRQ_CPU_IRQ          0
>> +#define KVM_ARM_IRQ_CPU_FIQ          1
>> +
>> +/* Highest supported SPI, from VGIC_NR_IRQS */
>> +#define KVM_ARM_IRQ_GIC_MAX          127
>
> This define, and those referring to PPIs and SPIs sound highly GIC-specific.
> Is that really appropriate for kvm.h? Do you mandate a single GIC as the
> only interrupt controller?
>

you can add another GIC with another in-kernel GIC emulation, if
someone makes one that differs from the VGIC specification, by
defining another irq type.

devices must interact with a gic available in the kernel, so I think
referring to PPIs and SPIs is very appropriate in kvm.h for a user
space device emulation that must inject either a PPI or an SPI.

We can call them TYPE_GIC_V2_XXX or something like that if you feel
this is cleaner.
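
As a side note, a small sketch of how a user space VMM could pack the irq
field for KVM_IRQ_LINE from these defines (assuming the patched asm/kvm.h
is installed; the helper name and error handling are illustrative):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Encode (type, vcpu, number) into the KVM_IRQ_LINE irq field. */
static int kvm_arm_set_irq(int vm_fd, unsigned int type, unsigned int vcpu_idx,
			   unsigned int num, int level)
{
	struct kvm_irq_level irq_level = {
		.irq   = ((type & KVM_ARM_IRQ_TYPE_MASK) << KVM_ARM_IRQ_TYPE_SHIFT) |
			 ((vcpu_idx & KVM_ARM_IRQ_VCPU_MASK) << KVM_ARM_IRQ_VCPU_SHIFT) |
			 ((num & KVM_ARM_IRQ_NUM_MASK) << KVM_ARM_IRQ_NUM_SHIFT),
		.level = level,
	};

	/*
	 * e.g. raise the virtual IRQ line on VCPU 0:
	 *   kvm_arm_set_irq(vm_fd, KVM_ARM_IRQ_TYPE_CPU, 0, KVM_ARM_IRQ_CPU_IRQ, 1);
	 */
	return ioctl(vm_fd, KVM_IRQ_LINE, &irq_level);
}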

>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index 6e46541..0f641c1 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -74,6 +74,7 @@
>>  #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
>>                       HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
>>                       HCR_SWIO | HCR_TIDCP)
>> +#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>>
>>  /* Hyp System Control Register (HSCTLR) bits */
>>  #define HSCTLR_TE    (1 << 30)
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index b97ebd0..8a87fc7 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/fs.h>
>>  #include <linux/mman.h>
>>  #include <linux/sched.h>
>> +#include <linux/kvm.h>
>>  #include <trace/events/kvm.h>
>>
>>  #define CREATE_TRACE_POINTS
>> @@ -271,6 +272,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>>
>>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  {
>> +     vcpu->cpu = cpu;
>>  }
>>
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> @@ -311,6 +313,74 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>       return -EINVAL;
>>  }
>>
>> +static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>> +{
>> +     int bit_index;
>> +     bool set;
>> +     unsigned long *ptr;
>> +
>> +     if (number == KVM_ARM_IRQ_CPU_IRQ)
>> +             bit_index = ffs(HCR_VI) - 1;
>> +     else /* KVM_ARM_IRQ_CPU_FIQ */
>> +             bit_index = ffs(HCR_VF) - 1;
>
> __ffs
>
fixed


-Christoffer
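
(In case the ffs()/__ffs() distinction is unfamiliar: ffs() is 1-based and
returns 0 for a zero argument, while __ffs() is 0-based and only defined for
a non-zero argument, so for a constant single-bit mask like HCR_VI the two
spellings below compute the same index. Illustrative only:)

#include <linux/bitops.h>

static inline int hcr_bit_with_ffs(unsigned long mask)
{
	return ffs(mask) - 1;	/* ffs() counts from 1; ffs(0) == 0 */
}

static inline int hcr_bit_with___ffs(unsigned long mask)
{
	return __ffs(mask);	/* __ffs() counts from 0; mask must be non-zero */
}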

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 15/15] KVM: ARM: Guest wait-for-interrupts (WFI) support
  2012-09-25 17:04     ` Will Deacon
@ 2012-09-29 23:00       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-29 23:00 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 25, 2012 at 1:04 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:36:05PM +0100, Christoffer Dall wrote:
>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>
>> When the guest executes a WFI instruction the operation is trapped to
>> KVM, which emulates the instruction in software. There is no correlation
>> between a guest executing a WFI instruction and actually putting the
>> hardware into a low-power mode, since a KVM guest is essentially a
>> process and the WFI instruction can be seen as a 'sleep' call from this
>> process. Therefore, we block the vcpu when the guest executes a WFI
>> instruction and the IRQ or FIQ lines are not raised.
>>
>> When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we
>> signal the VCPU thread and unflag the VCPU to no longer wait for
>> interrupts.
>
> Seems a bit strange tagging this small addition on the end of this series.
> Can you merge it in with the rest?
>

sure, I'll fold it in with the world-switch or emulation patches.

-Christoffer
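
(A rough sketch of the mechanism the commit message above describes; this is
not code from the series. A WFI trap handler along these lines blocks the
vcpu thread until an interrupt injection wakes it. kvm_vcpu_block() is the
generic KVM primitive for this; the pending-interrupt predicate is
illustrative.)

#include <linux/kvm_host.h>

static int handle_wfi_sketch(struct kvm_vcpu *vcpu)
{
	/*
	 * Only block when no virtual IRQ/FIQ is already pending; otherwise
	 * the guest should observe the interrupt as soon as it re-enters.
	 */
	if (!vcpu_interrupt_pending(vcpu))	/* illustrative predicate */
		kvm_vcpu_block(vcpu);		/* sleep until KVM_IRQ_LINE kicks this vcpu */

	return 1;	/* handled in-kernel, resume the guest */
}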

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 10/15] KVM: ARM: World-switch implementation
  2012-09-25 17:42         ` Marc Zyngier
@ 2012-09-30  0:33           ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-30  0:33 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Peter Maydell, Will Deacon, kvm, linux-arm-kernel, kvmarm

On Tue, Sep 25, 2012 at 1:42 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On Tue, 25 Sep 2012 18:15:50 +0100, Peter Maydell
> <peter.maydell@linaro.org> wrote:
>> On 25 September 2012 18:00, Will Deacon <will.deacon@arm.com> wrote:
>>> On Sat, Sep 15, 2012 at 04:35:33PM +0100, Christoffer Dall wrote:
>>>>  ENTRY(__kvm_tlb_flush_vmid)
>>>> +       hvc     #0                      @ Switch to Hyp mode
>>>> +       push    {r2, r3}
>>>> +
>>>> +       add     r0, r0, #KVM_VTTBR
>>>> +       ldrd    r2, r3, [r0]
>>>> +       mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>>>> +       isb
>>>> +       mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
>>>> +       dsb
>>>> +       isb
>>>> +       mov     r2, #0
>>>> +       mov     r3, #0
>>>> +       mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
>>>> +       isb
>>>
>>> Do you need this isb, given that you're about to do an hvc?
>>>
>>>> +       pop     {r2, r3}
>>>> +       hvc     #0                      @ Back to SVC
>>>>         bx      lr
>>
>> ...you probably don't want to do the memory accesses involved
>> in the 'pop' under the wrong VMID ?
>
> Well, we're still in HYP mode when performing the pop, so the VMID is
> pretty much irrelevant. Same for the initial push, actually. As long as
> we're sure VTTBR has been updated when we do the exception return, I think
> we're safe.
>
yeah we're safe, but I can't find anywhere that says the ISB is
implied by exception handling/hvc, even though I seem to recall
having read this, so I put it here so as not to worry other readers.

-Christoffer

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 10/15] KVM: ARM: World-switch implementation
  2012-09-30  0:33           ` Christoffer Dall
@ 2012-09-30  9:48             ` Peter Maydell
  -1 siblings, 0 replies; 164+ messages in thread
From: Peter Maydell @ 2012-09-30  9:48 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, Will Deacon, kvm, linux-arm-kernel, kvmarm

On 30 September 2012 01:33, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> On Tue, Sep 25, 2012 at 1:42 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> Well, we're still in HYP mode when performing the pop, so the VMID is
>> pretty much irrelevant. Same for the initial push, actually. As long as
>> we're sure VTTBR has been updated when we do the exception return, I think
>> we're safe.
>>
> yeah we're safe, but I can't find anywhere that says the ISB is
> implied from exception handling/hvc, even though I seem to recall
> having read this, so I put this here not to worry other readers.

"TLB maintenance operations and the memory order model" in section
B3.10.1 of the ARM ARM says that exception entry forces completion of
TLB maintenance ops. Section B5.6.3 ("Synchronization of changes to
system control registers") says that taking an exception synchronizes
context, so the VTTBR update is guaranteed to have happened.

-- PMM

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 09/15] KVM: ARM: Inject IRQs and FIQs from userspace
  2012-09-29 15:50       ` Christoffer Dall
@ 2012-09-30 12:48         ` Will Deacon
  -1 siblings, 0 replies; 164+ messages in thread
From: Will Deacon @ 2012-09-30 12:48 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm

Hi Christoffer,

On Sat, Sep 29, 2012 at 04:50:25PM +0100, Christoffer Dall wrote:
> On Tue, Sep 25, 2012 at 11:55 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Sat, Sep 15, 2012 at 04:35:27PM +0100, Christoffer Dall wrote:
> >> diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
> >> index a13b582..131e632 100644
> >> --- a/arch/arm/include/asm/kvm.h
> >> +++ b/arch/arm/include/asm/kvm.h
> >> @@ -22,6 +22,7 @@
> >>  #include <asm/types.h>
> >>
> >>  #define __KVM_HAVE_GUEST_DEBUG
> >> +#define __KVM_HAVE_IRQ_LINE
> >>
> >>  #define KVM_REG_SIZE(id)                                             \
> >>       (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> >> @@ -85,4 +86,24 @@ struct kvm_reg_list {
> >>  #define KVM_REG_ARM_CORE             (0x0010 << KVM_REG_ARM_COPROC_SHIFT)
> >>  #define KVM_REG_ARM_CORE_REG(name)   (offsetof(struct kvm_regs, name) / 4)
> >>
> >> +/* KVM_IRQ_LINE irq field index values */
> >> +#define KVM_ARM_IRQ_TYPE_SHIFT               24
> >> +#define KVM_ARM_IRQ_TYPE_MASK                0xff
> >> +#define KVM_ARM_IRQ_VCPU_SHIFT               16
> >> +#define KVM_ARM_IRQ_VCPU_MASK                0xff
> >> +#define KVM_ARM_IRQ_NUM_SHIFT                0
> >> +#define KVM_ARM_IRQ_NUM_MASK         0xffff
> >> +
> >> +/* irq_type field */
> >> +#define KVM_ARM_IRQ_TYPE_CPU         0
> >> +#define KVM_ARM_IRQ_TYPE_SPI         1
> >> +#define KVM_ARM_IRQ_TYPE_PPI         2
> >> +
> >> +/* out-of-kernel GIC cpu interrupt injection irq_number field */
> >> +#define KVM_ARM_IRQ_CPU_IRQ          0
> >> +#define KVM_ARM_IRQ_CPU_FIQ          1
> >> +
> >> +/* Highest supported SPI, from VGIC_NR_IRQS */
> >> +#define KVM_ARM_IRQ_GIC_MAX          127
> >
> > This define, and those referring to PPIs and SPIs sound highly GIC-specific.
> > Is that really appropriate for kvm.h? Do you mandate a single GIC as the
> > only interrupt controller?
> >
> 
> you can add a another gic with another in-kernel gic emulation if
> someone makes such one that's different from the vgic specifications
> by defining another irq type.
> 
> devices must interact with a gic available in the kernel, so I think
> referring to PPIs and SPIs is very appropriate in kvm.h for a user
> space device emulation that must inject either a PPI or an SPI.
> 
> We can call them TYPE_GIC_V2_XXX or something like that if you feel
> this is cleaner.

It's more that the GIC isn't really part of the architecture, so it would be
cleaner to have the GIC-specifics separated out from the architectural part
of KVM. That will also make it easier when adding support for future
versions of the GIC.

If core KVM needs the concept of a per-cpu interrupt, just call it
IRQ_TYPE_PERCPU or something rather than PPI.

Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 10/15] KVM: ARM: World-switch implementation
  2012-09-30  9:48             ` Peter Maydell
@ 2012-09-30 14:31               ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-30 14:31 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Marc Zyngier, Will Deacon, kvm, linux-arm-kernel, kvmarm

On Sun, Sep 30, 2012 at 5:48 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 30 September 2012 01:33, Christoffer Dall
> <c.dall@virtualopensystems.com> wrote:
>> On Tue, Sep 25, 2012 at 1:42 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> Well, we're still in HYP mode when performing the pop, so the VMID is
>>> pretty much irrelevant. Same for the initial push, actually. As long as
>>> we're sure VTTBR has been updated when we do the exception return, I think
>>> we're safe.
>>>
>> yeah we're safe, but I can't find anywhere that says the ISB is
>> implied from exception handling/hvc, even though I seem to recall
>> having read this, so I put this here not to worry other readers.
>
> "TLB maintenance operations and the memory order model" in section
> B3.10.1 of the ARM ARM says that exception entry forces completion of
> TLB maintenance ops. Section B5.6.3 ("Synchronization of changes to
> system control registers") says that taking an exception synchronizes
> context, so the VTTBR update is guaranteed to have happened.
>
thanks - I was counting on your help here :)

-Christoffer

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 09/15] KVM: ARM: Inject IRQs and FIQs from userspace
  2012-09-30 12:48         ` Will Deacon
@ 2012-09-30 14:34           ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-30 14:34 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm

On Sun, Sep 30, 2012 at 8:48 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Christoffer,
>
> On Sat, Sep 29, 2012 at 04:50:25PM +0100, Christoffer Dall wrote:
>> On Tue, Sep 25, 2012 at 11:55 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Sat, Sep 15, 2012 at 04:35:27PM +0100, Christoffer Dall wrote:
>> >> diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
>> >> index a13b582..131e632 100644
>> >> --- a/arch/arm/include/asm/kvm.h
>> >> +++ b/arch/arm/include/asm/kvm.h
>> >> @@ -22,6 +22,7 @@
>> >>  #include <asm/types.h>
>> >>
>> >>  #define __KVM_HAVE_GUEST_DEBUG
>> >> +#define __KVM_HAVE_IRQ_LINE
>> >>
>> >>  #define KVM_REG_SIZE(id)                                             \
>> >>       (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
>> >> @@ -85,4 +86,24 @@ struct kvm_reg_list {
>> >>  #define KVM_REG_ARM_CORE             (0x0010 << KVM_REG_ARM_COPROC_SHIFT)
>> >>  #define KVM_REG_ARM_CORE_REG(name)   (offsetof(struct kvm_regs, name) / 4)
>> >>
>> >> +/* KVM_IRQ_LINE irq field index values */
>> >> +#define KVM_ARM_IRQ_TYPE_SHIFT               24
>> >> +#define KVM_ARM_IRQ_TYPE_MASK                0xff
>> >> +#define KVM_ARM_IRQ_VCPU_SHIFT               16
>> >> +#define KVM_ARM_IRQ_VCPU_MASK                0xff
>> >> +#define KVM_ARM_IRQ_NUM_SHIFT                0
>> >> +#define KVM_ARM_IRQ_NUM_MASK         0xffff
>> >> +
>> >> +/* irq_type field */
>> >> +#define KVM_ARM_IRQ_TYPE_CPU         0
>> >> +#define KVM_ARM_IRQ_TYPE_SPI         1
>> >> +#define KVM_ARM_IRQ_TYPE_PPI         2
>> >> +
>> >> +/* out-of-kernel GIC cpu interrupt injection irq_number field */
>> >> +#define KVM_ARM_IRQ_CPU_IRQ          0
>> >> +#define KVM_ARM_IRQ_CPU_FIQ          1
>> >> +
>> >> +/* Highest supported SPI, from VGIC_NR_IRQS */
>> >> +#define KVM_ARM_IRQ_GIC_MAX          127
>> >
>> > This define, and those referring to PPIs and SPIs sound highly GIC-specific.
>> > Is that really appropriate for kvm.h? Do you mandate a single GIC as the
>> > only interrupt controller?
>> >
>>
>> you can add a another gic with another in-kernel gic emulation if
>> someone makes such one that's different from the vgic specifications
>> by defining another irq type.
>>
>> devices must interact with a gic available in the kernel, so I think
>> referring to PPIs and SPIs is very appropriate in kvm.h for a user
>> space device emulation that must inject either a PPI or an SPI.
>>
>> We can call them TYPE_GIC_V2_XXX or something like that if you feel
>> this is cleaner.
>
> It's more that the GIC isn't really part of the architecture, so it would be
> cleaner to have the GIC-specifics separated out from the architectural part
> of KVM. That will also make it easier when adding support for future
> versions of the GIC.
>
> If core KVM needs the concept of a per-cpu interrupt, just call it
> IRQ_TYPE_PERCPU or something rather than PPI.
>
we already have that: KVM_ARM_IRQ_TYPE_CPU

then how do you propose an interface for user space emulation of a
board that uses the vgic, where a device needs to inject a PPI into
the kernel's vgic emulation?

I don't see the dire need for this separation: the API is extendable
and covers all the needs at this point.

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 10/15] KVM: ARM: World-switch implementation
  2012-09-25 17:00     ` Will Deacon
@ 2012-09-30 17:47       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-30 17:47 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm

On Tue, Sep 25, 2012 at 1:00 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:35:33PM +0100, Christoffer Dall wrote:
>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>> index 1429d89..cd8fc86 100644
>> --- a/arch/arm/kernel/asm-offsets.c
>> +++ b/arch/arm/kernel/asm-offsets.c
>> @@ -13,6 +13,7 @@
>>  #include <linux/sched.h>
>>  #include <linux/mm.h>
>>  #include <linux/dma-mapping.h>
>> +#include <linux/kvm_host.h>
>>  #include <asm/cacheflush.h>
>>  #include <asm/glue-df.h>
>>  #include <asm/glue-pf.h>
>> @@ -144,5 +145,48 @@ int main(void)
>>    DEFINE(DMA_BIDIRECTIONAL,    DMA_BIDIRECTIONAL);
>>    DEFINE(DMA_TO_DEVICE,                DMA_TO_DEVICE);
>>    DEFINE(DMA_FROM_DEVICE,      DMA_FROM_DEVICE);
>> +#ifdef CONFIG_KVM_ARM_HOST
>> +  DEFINE(VCPU_KVM,             offsetof(struct kvm_vcpu, kvm));
>> +  DEFINE(VCPU_MIDR,            offsetof(struct kvm_vcpu, arch.midr));
>> +  DEFINE(VCPU_MPIDR,           offsetof(struct kvm_vcpu, arch.cp15[c0_MPIDR]));
>> +  DEFINE(VCPU_CSSELR,          offsetof(struct kvm_vcpu, arch.cp15[c0_CSSELR]));
>> +  DEFINE(VCPU_SCTLR,           offsetof(struct kvm_vcpu, arch.cp15[c1_SCTLR]));
>> +  DEFINE(VCPU_CPACR,           offsetof(struct kvm_vcpu, arch.cp15[c1_CPACR]));
>> +  DEFINE(VCPU_TTBR0,           offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR0]));
>> +  DEFINE(VCPU_TTBR1,           offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR1]));
>> +  DEFINE(VCPU_TTBCR,           offsetof(struct kvm_vcpu, arch.cp15[c2_TTBCR]));
>> +  DEFINE(VCPU_DACR,            offsetof(struct kvm_vcpu, arch.cp15[c3_DACR]));
>> +  DEFINE(VCPU_DFSR,            offsetof(struct kvm_vcpu, arch.cp15[c5_DFSR]));
>> +  DEFINE(VCPU_IFSR,            offsetof(struct kvm_vcpu, arch.cp15[c5_IFSR]));
>> +  DEFINE(VCPU_ADFSR,           offsetof(struct kvm_vcpu, arch.cp15[c5_ADFSR]));
>> +  DEFINE(VCPU_AIFSR,           offsetof(struct kvm_vcpu, arch.cp15[c5_AIFSR]));
>> +  DEFINE(VCPU_DFAR,            offsetof(struct kvm_vcpu, arch.cp15[c6_DFAR]));
>> +  DEFINE(VCPU_IFAR,            offsetof(struct kvm_vcpu, arch.cp15[c6_IFAR]));
>> +  DEFINE(VCPU_PRRR,            offsetof(struct kvm_vcpu, arch.cp15[c10_PRRR]));
>> +  DEFINE(VCPU_NMRR,            offsetof(struct kvm_vcpu, arch.cp15[c10_NMRR]));
>> +  DEFINE(VCPU_VBAR,            offsetof(struct kvm_vcpu, arch.cp15[c12_VBAR]));
>> +  DEFINE(VCPU_CID,             offsetof(struct kvm_vcpu, arch.cp15[c13_CID]));
>> +  DEFINE(VCPU_TID_URW,         offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URW]));
>> +  DEFINE(VCPU_TID_URO,         offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URO]));
>> +  DEFINE(VCPU_TID_PRIV,                offsetof(struct kvm_vcpu, arch.cp15[c13_TID_PRIV]));
>
> Could you instead define an offset for arch.cp15, then use scaled offsets
> from that in the assembly code?
>

that would require changing the enum in kvm_host.h to defines and
either wrapping that whole file in #ifndef __ASSEMBLY__ or moving
those defines to kvm_asm.h; I'm not sure which is cleaner:

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 5315c69..99c0faf 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -19,6 +19,34 @@
 #ifndef __ARM_KVM_ASM_H__
 #define __ARM_KVM_ASM_H__

+/* 0 is reserved as an invalid value. */
+#define c0_MPIDR	1	/* MultiProcessor ID Register */
+#define c0_CSSELR	2	/* Cache Size Selection Register */
+#define c1_SCTLR	3	/* System Control Register */
+#define c1_ACTLR	4	/* Auxilliary Control Register */
+#define c1_CPACR	5	/* Coprocessor Access Control */
+#define c2_TTBR0	6	/* Translation Table Base Register 0 */
+#define c2_TTBR0_high	7	/* TTBR0 top 32 bits */
+#define c2_TTBR1	8	/* Translation Table Base Register 1 */
+#define c2_TTBR1_high	9	/* TTBR1 top 32 bits */
+#define c2_TTBCR	10	/* Translation Table Base Control R. */
+#define c3_DACR		11	/* Domain Access Control Register */
+#define c5_DFSR		12	/* Data Fault Status Register */
+#define c5_IFSR		13	/* Instruction Fault Status Register */
+#define c5_ADFSR	14	/* Auxilary Data Fault Status R */
+#define c5_AIFSR	15	/* Auxilary Instrunction Fault Status R */
+#define c6_DFAR		16	/* Data Fault Address Register */
+#define c6_IFAR		17	/* Instruction Fault Address Register */
+#define c9_L2CTLR	18	/* Cortex A15 L2 Control Register */
+#define c10_PRRR	19	/* Primary Region Remap Register */
+#define c10_NMRR	20	/* Normal Memory Remap Register */
+#define c12_VBAR	21	/* Vector Base Address Register */
+#define c13_CID		22	/* Context ID Register */
+#define c13_TID_URW	23	/* Thread ID, User R/W */
+#define c13_TID_URO	24	/* Thread ID, User R/O */
+#define c13_TID_PRIV	25	/* Thread ID, Priveleged */
+#define NR_CP15_REGS	26	/* Number of regs (incl. invalid) */
+
 #define ARM_EXCEPTION_RESET	  0
 #define ARM_EXCEPTION_UNDEFINED   1
 #define ARM_EXCEPTION_SOFTWARE    2
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 9c4fbd4..f9b2ca6 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -21,6 +21,7 @@

 #include <asm/kvm.h>
 #include <asm/fpstate.h>
+#include <asm/kvm_asm.h>

 #define KVM_MAX_VCPUS NR_CPUS
 #define KVM_MEMORY_SLOTS 32
@@ -73,37 +74,6 @@ struct kvm_mmu_memory_cache {
 	void *objects[KVM_NR_MEM_OBJS];
 };

-/* 0 is reserved as an invalid value. */
-enum cp15_regs {
-	c0_MPIDR=1,		/* MultiProcessor ID Register */
-	c0_CSSELR,		/* Cache Size Selection Register */
-	c1_SCTLR,		/* System Control Register */
-	c1_ACTLR,		/* Auxilliary Control Register */
-	c1_CPACR,		/* Coprocessor Access Control */
-	c2_TTBR0,		/* Translation Table Base Register 0 */
-	c2_TTBR0_high,		/* TTBR0 top 32 bits */
-	c2_TTBR1,		/* Translation Table Base Register 1 */
-	c2_TTBR1_high,		/* TTBR1 top 32 bits */
-	c2_TTBCR,		/* Translation Table Base Control R. */
-	c3_DACR,		/* Domain Access Control Register */
-	c5_DFSR,		/* Data Fault Status Register */
-	c5_IFSR,		/* Instruction Fault Status Register */
-	c5_ADFSR,		/* Auxilary Data Fault Status Register */
-	c5_AIFSR,		/* Auxilary Instruction Fault Status Register */
-	c6_DFAR,		/* Data Fault Address Register */
-	c6_IFAR,		/* Instruction Fault Address Register */
-	c9_L2CTLR,		/* Cortex A15 L2 Control Register */
-	c10_PRRR,		/* Primary Region Remap Register */
-	c10_NMRR,		/* Normal Memory Remap Register */
-	c12_VBAR,		/* Vector Base Address Register */
-	c13_CID,		/* Context ID Register */
-	c13_TID_URW,		/* Thread ID, User R/W */
-	c13_TID_URO,		/* Thread ID, User R/O */
-	c13_TID_PRIV,		/* Thread ID, Priveleged */
-
-	nr_cp15_regs
-};
-
 struct kvm_vcpu_arch {
 	struct kvm_regs regs;

@@ -111,7 +81,7 @@ struct kvm_vcpu_arch {
 	DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES);

 	/* System control coprocessor (cp15) */
-	u32 cp15[nr_cp15_regs];
+	u32 cp15[NR_CP15_REGS];

 	/* The CPU type we expose to the VM */
 	u32 midr;
@@ -203,4 +173,5 @@ unsigned long kvm_arm_num_coproc_regs(struct
kvm_vcpu *vcpu);
 struct kvm_one_reg;
 int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
 int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index cf0b50e..1c4181e 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -148,27 +148,7 @@ int main(void)
 #ifdef CONFIG_KVM_ARM_HOST
   DEFINE(VCPU_KVM,		offsetof(struct kvm_vcpu, kvm));
   DEFINE(VCPU_MIDR,		offsetof(struct kvm_vcpu, arch.midr));
-  DEFINE(VCPU_MPIDR,		offsetof(struct kvm_vcpu, arch.cp15[c0_MPIDR]));
-  DEFINE(VCPU_CSSELR,		offsetof(struct kvm_vcpu, arch.cp15[c0_CSSELR]));
-  DEFINE(VCPU_SCTLR,		offsetof(struct kvm_vcpu, arch.cp15[c1_SCTLR]));
-  DEFINE(VCPU_CPACR,		offsetof(struct kvm_vcpu, arch.cp15[c1_CPACR]));
-  DEFINE(VCPU_TTBR0,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR0]));
-  DEFINE(VCPU_TTBR1,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBR1]));
-  DEFINE(VCPU_TTBCR,		offsetof(struct kvm_vcpu, arch.cp15[c2_TTBCR]));
-  DEFINE(VCPU_DACR,		offsetof(struct kvm_vcpu, arch.cp15[c3_DACR]));
-  DEFINE(VCPU_DFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_DFSR]));
-  DEFINE(VCPU_IFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_IFSR]));
-  DEFINE(VCPU_ADFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_ADFSR]));
-  DEFINE(VCPU_AIFSR,		offsetof(struct kvm_vcpu, arch.cp15[c5_AIFSR]));
-  DEFINE(VCPU_DFAR,		offsetof(struct kvm_vcpu, arch.cp15[c6_DFAR]));
-  DEFINE(VCPU_IFAR,		offsetof(struct kvm_vcpu, arch.cp15[c6_IFAR]));
-  DEFINE(VCPU_PRRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_PRRR]));
-  DEFINE(VCPU_NMRR,		offsetof(struct kvm_vcpu, arch.cp15[c10_NMRR]));
-  DEFINE(VCPU_VBAR,		offsetof(struct kvm_vcpu, arch.cp15[c12_VBAR]));
-  DEFINE(VCPU_CID,		offsetof(struct kvm_vcpu, arch.cp15[c13_CID]));
-  DEFINE(VCPU_TID_URW,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URW]));
-  DEFINE(VCPU_TID_URO,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URO]));
-  DEFINE(VCPU_TID_PRIV,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_PRIV]));
+  DEFINE(VCPU_CP15,		offsetof(struct kvm_vcpu, arch.cp15));
   DEFINE(VCPU_VFP_GUEST,	offsetof(struct kvm_vcpu, arch.vfp_guest));
   DEFINE(VCPU_VFP_HOST,		offsetof(struct kvm_vcpu, arch.vfp_host));
   DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 15977a6..759396a 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -61,7 +61,7 @@ struct coproc_reg {
 	void (*reset)(struct kvm_vcpu *, const struct coproc_reg *);

 	/* Index into vcpu->arch.cp15[], or 0 if we don't need to save it. */
-	enum cp15_regs reg;
+	unsigned long reg;

 	/* Value (usually reset value) */
 	u64 val;
@@ -1097,7 +1097,7 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
 	table = get_target_table(vcpu->arch.target, &num);
 	reset_coproc_regs(vcpu, table, num);

-	for (num = 1; num < nr_cp15_regs; num++)
+	for (num = 1; num < NR_CP15_REGS; num++)
 		if (vcpu->arch.cp15[num] == 0x42424242)
 			panic("Didn't reset vcpu->arch.cp15[%zi]", num);
 }
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index f32e2f7..2839afa 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -29,6 +29,7 @@
 #define VCPU_USR_SP		(VCPU_USR_REG(13))
 #define VCPU_FIQ_REG(_reg_nr)	(VCPU_FIQ_REGS + (_reg_nr * 4))
 #define VCPU_FIQ_SPSR		(VCPU_FIQ_REG(7))
+#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))

 	.text
 	.align	PAGE_SHIFT
@@ -202,18 +203,18 @@ ENDPROC(__kvm_flush_vm_context)
 	.if \vcpu == 0
 	push	{r2-r12}		@ Push CP15 registers
 	.else
-	str	r2, [\vcpup, #VCPU_SCTLR]
-	str	r3, [\vcpup, #VCPU_CPACR]
-	str	r4, [\vcpup, #VCPU_TTBCR]
-	str	r5, [\vcpup, #VCPU_DACR]
-	add	\vcpup, \vcpup, #VCPU_TTBR0
+	str	r2, [\vcpup, #CP15_OFFSET(c1_SCTLR)]
+	str	r3, [\vcpup, #CP15_OFFSET(c1_CPACR)]
+	str	r4, [\vcpup, #CP15_OFFSET(c2_TTBCR)]
+	str	r5, [\vcpup, #CP15_OFFSET(c3_DACR)]
+	add	\vcpup, \vcpup, #CP15_OFFSET(c2_TTBR0)
 	strd	r6, r7, [\vcpup]
-	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	add	\vcpup, \vcpup, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
 	strd	r8, r9, [\vcpup]
-	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
-	str	r10, [\vcpup, #VCPU_PRRR]
-	str	r11, [\vcpup, #VCPU_NMRR]
-	str	r12, [\vcpup, #VCPU_CSSELR]
+	sub	\vcpup, \vcpup, #CP15_OFFSET(c2_TTBR1)
+	str	r10, [\vcpup, #CP15_OFFSET(c10_PRRR)]
+	str	r11, [\vcpup, #CP15_OFFSET(c10_NMRR)]
+	str	r12, [\vcpup, #CP15_OFFSET(c0_CSSELR)]
 	.endif

 	mrc	p15, 0, r2, c13, c0, 1	@ CID
@@ -231,17 +232,17 @@ ENDPROC(__kvm_flush_vm_context)
 	.if \vcpu == 0
 	push	{r2-r12}		@ Push CP15 registers
 	.else
-	str	r2, [\vcpup, #VCPU_CID]
-	str	r3, [\vcpup, #VCPU_TID_URW]
-	str	r4, [\vcpup, #VCPU_TID_URO]
-	str	r5, [\vcpup, #VCPU_TID_PRIV]
-	str	r6, [\vcpup, #VCPU_DFSR]
-	str	r7, [\vcpup, #VCPU_IFSR]
-	str	r8, [\vcpup, #VCPU_ADFSR]
-	str	r9, [\vcpup, #VCPU_AIFSR]
-	str	r10, [\vcpup, #VCPU_DFAR]
-	str	r11, [\vcpup, #VCPU_IFAR]
-	str	r12, [\vcpup, #VCPU_VBAR]
+	str	r2, [\vcpup, #CP15_OFFSET(c13_CID)]
+	str	r3, [\vcpup, #CP15_OFFSET(c13_TID_URW)]
+	str	r4, [\vcpup, #CP15_OFFSET(c13_TID_URO)]
+	str	r5, [\vcpup, #CP15_OFFSET(c13_TID_PRIV)]
+	str	r6, [\vcpup, #CP15_OFFSET(c5_DFSR)]
+	str	r7, [\vcpup, #CP15_OFFSET(c5_IFSR)]
+	str	r8, [\vcpup, #CP15_OFFSET(c5_ADFSR)]
+	str	r9, [\vcpup, #CP15_OFFSET(c5_AIFSR)]
+	str	r10, [\vcpup, #CP15_OFFSET(c6_DFAR)]
+	str	r11, [\vcpup, #CP15_OFFSET(c6_IFAR)]
+	str	r12, [\vcpup, #CP15_OFFSET(c12_VBAR)]
 	.endif
 .endm

@@ -254,17 +255,17 @@ ENDPROC(__kvm_flush_vm_context)
 	.if \vcpu == 0
 	pop	{r2-r12}
 	.else
-	ldr	r2, [\vcpup, #VCPU_CID]
-	ldr	r3, [\vcpup, #VCPU_TID_URW]
-	ldr	r4, [\vcpup, #VCPU_TID_URO]
-	ldr	r5, [\vcpup, #VCPU_TID_PRIV]
-	ldr	r6, [\vcpup, #VCPU_DFSR]
-	ldr	r7, [\vcpup, #VCPU_IFSR]
-	ldr	r8, [\vcpup, #VCPU_ADFSR]
-	ldr	r9, [\vcpup, #VCPU_AIFSR]
-	ldr	r10, [\vcpup, #VCPU_DFAR]
-	ldr	r11, [\vcpup, #VCPU_IFAR]
-	ldr	r12, [\vcpup, #VCPU_VBAR]
+	ldr	r2, [\vcpup, #CP15_OFFSET(c13_CID)]
+	ldr	r3, [\vcpup, #CP15_OFFSET(c13_TID_URW)]
+	ldr	r4, [\vcpup, #CP15_OFFSET(c13_TID_URO)]
+	ldr	r5, [\vcpup, #CP15_OFFSET(c13_TID_PRIV)]
+	ldr	r6, [\vcpup, #CP15_OFFSET(c5_DFSR)]
+	ldr	r7, [\vcpup, #CP15_OFFSET(c5_IFSR)]
+	ldr	r8, [\vcpup, #CP15_OFFSET(c5_ADFSR)]
+	ldr	r9, [\vcpup, #CP15_OFFSET(c5_AIFSR)]
+	ldr	r10, [\vcpup, #CP15_OFFSET(c6_DFAR)]
+	ldr	r11, [\vcpup, #CP15_OFFSET(c6_IFAR)]
+	ldr	r12, [\vcpup, #CP15_OFFSET(c12_VBAR)]
 	.endif

 	mcr	p15, 0, r2, c13, c0, 1	@ CID
@@ -282,18 +283,18 @@ ENDPROC(__kvm_flush_vm_context)
 	.if \vcpu == 0
 	pop	{r2-r12}
 	.else
-	ldr	r2, [\vcpup, #VCPU_SCTLR]
-	ldr	r3, [\vcpup, #VCPU_CPACR]
-	ldr	r4, [\vcpup, #VCPU_TTBCR]
-	ldr	r5, [\vcpup, #VCPU_DACR]
-	add	\vcpup, \vcpup, #VCPU_TTBR0
+	ldr	r2, [\vcpup, #CP15_OFFSET(c1_SCTLR)]
+	ldr	r3, [\vcpup, #CP15_OFFSET(c1_CPACR)]
+	ldr	r4, [\vcpup, #CP15_OFFSET(c2_TTBCR)]
+	ldr	r5, [\vcpup, #CP15_OFFSET(c3_DACR)]
+	add	\vcpup, \vcpup, #CP15_OFFSET(c2_TTBR0)
 	ldrd	r6, r7, [\vcpup]
-	add	\vcpup, \vcpup, #(VCPU_TTBR1 - VCPU_TTBR0)
+	add	\vcpup, \vcpup, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
 	ldrd	r8, r9, [\vcpup]
-	sub	\vcpup, \vcpup, #(VCPU_TTBR1)
-	ldr	r10, [\vcpup, #VCPU_PRRR]
-	ldr	r11, [\vcpup, #VCPU_NMRR]
-	ldr	r12, [\vcpup, #VCPU_CSSELR]
+	sub	\vcpup, \vcpup, #CP15_OFFSET(c2_TTBR1)
+	ldr	r10, [\vcpup, #CP15_OFFSET(c10_PRRR)]
+	ldr	r11, [\vcpup, #CP15_OFFSET(c10_NMRR)]
+	ldr	r12, [\vcpup, #CP15_OFFSET(c0_CSSELR)]
 	.endif

 	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
@@ -556,7 +557,7 @@ ENTRY(__kvm_vcpu_run)
 	mcr	p15, 4, r1, c0, c0, 0

 	@ Write guest view of MPIDR into VMPIDR
-	ldr	r1, [r0, #VCPU_MPIDR]
+	ldr	r1, [r0, #CP15_OFFSET(c0_MPIDR)]
 	mcr	p15, 4, r1, c0, c0, 5

 	@ Load guest registers
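
(For reference: CP15_OFFSET() above simply scales a cp15[] index into a
byte offset from the vcpu pointer, so each mrc/mcr value lands in the
right array slot.  A sketch of the equivalence, assuming 4-byte cp15
entries and the VCPU_CP15 constant emitted by asm-offsets.c:

	/* sketch: CP15_OFFSET(idx) == offsetof(struct kvm_vcpu, arch.cp15[idx]) */
	#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))

The add/sub dance around the ldrd/strd pairs is presumably only there
because the doubleword accesses take a much smaller immediate offset
than plain ldr/str, so the base pointer is temporarily moved to the
TTBR slots instead.)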


>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 8a87fc7..087f9d1 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -41,6 +41,7 @@
>>  #include <asm/kvm_arm.h>
>>  #include <asm/kvm_asm.h>
>>  #include <asm/kvm_mmu.h>
>> +#include <asm/kvm_emulate.h>
>>
>>  #ifdef REQUIRES_VIRT
>>  __asm__(".arch_extension       virt");
>> @@ -50,6 +51,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>>  static unsigned long hyp_default_vectors;
>>
>> +/* The VMID used in the VTTBR */
>> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
>> +static u8 kvm_next_vmid;
>> +static DEFINE_SPINLOCK(kvm_vmid_lock);
>>
>>  int kvm_arch_hardware_enable(void *garbage)
>>  {
>> @@ -273,6 +278,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  {
>>         vcpu->cpu = cpu;
>> +       vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>>  }
>>
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> @@ -305,12 +311,169 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>>
>>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
>>  {
>> +       return v->mode == IN_GUEST_MODE;
>> +}
>> +
>> +static void reset_vm_context(void *info)
>> +{
>> +       __kvm_flush_vm_context();
>> +}
>> +
>> +/**
>> + * need_new_vmid_gen - check that the VMID is still valid
>> + * @kvm: The VM's VMID to check
>> + *
>> + * return true if there is a new generation of VMIDs being used
>> + *
>> + * The hardware supports only 256 values with the value zero reserved for the
>> + * host, so we check if an assigned value belongs to a previous generation,
>> + * which requires us to assign a new value. If we're the first to use a
>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
>> + * CPUs.
>> + */
>> +static bool need_new_vmid_gen(struct kvm *kvm)
>> +{
>> +       return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>> +}
>> +
>> +/**
>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>> + * @kvm:       The guest that we are about to run
>> + *
>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
>> + * caches and TLBs.
>> + */
>> +static void update_vttbr(struct kvm *kvm)
>> +{
>> +       phys_addr_t pgd_phys;
>> +
>> +       if (!need_new_vmid_gen(kvm))
>> +               return;
>> +
>> +       spin_lock(&kvm_vmid_lock);
>> +
>> +       /* First user of a new VMID generation? */
>> +       if (unlikely(kvm_next_vmid == 0)) {
>> +               atomic64_inc(&kvm_vmid_gen);
>> +               kvm_next_vmid = 1;
>> +
>> +               /*
>> +                * On SMP we know no other CPUs can use this CPU's or
>> +                * each other's VMID since the kvm_vmid_lock blocks
>> +                * them from reentry to the guest.
>> +                */
>> +               on_each_cpu(reset_vm_context, NULL, 1);
>
> Why on_each_cpu? The maintenance operations should be broadcast, right?
>

We need each CPU (that runs guests) to exit the guest and pick up the new
VMID in its VTTBR.

>> +       }
>> +
>> +       kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>> +       kvm->arch.vmid = kvm_next_vmid;
>> +       kvm_next_vmid++;
>> +
>> +       /* update vttbr to be used with the new vmid */
>> +       pgd_phys = virt_to_phys(kvm->arch.pgd);
>> +       kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1)
>> +                         & ~((2 << VTTBR_X) - 1);
>> +       kvm->arch.vttbr |= (u64)(kvm->arch.vmid) << 48;
>> +
>> +       spin_unlock(&kvm_vmid_lock);
>> +}
>
> This smells like a watered-down version of the ASID allocator. Now, I can't
> really see much code sharing going on here, but perhaps your case is
> simpler... do you anticipate running more than 255 VMs in parallel? If not,
> then you could just invalidate the relevant TLB entries on VM shutdown and
> avoid the rollover case.
>

I want to support running more than 255 VMs in parallel. I think
trying to share code with the ASID allocator complicates things
without any real benefit.
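
For context, the VTTBR that the new VMID ends up in is composed roughly
like this (a sketch only, not the exact patch code; make_vttbr() is a
made-up name, and VTTBR_X comes from the patch headers):

	/* sketch: stage-2 pgd base in VTTBR[39:x], 8-bit VMID in VTTBR[55:48] */
	static u64 make_vttbr(phys_addr_t pgd_phys, u8 vmid)
	{
		u64 baddr = pgd_phys & ((1ULL << 40) - 1) & ~((2ULL << VTTBR_X) - 1);

		return baddr | ((u64)vmid << 48);
	}

so a rollover only needs to bump the 64-bit generation counter and flush;
the 8-bit hardware VMIDs are then handed out again lazily as each VM
re-enters the guest.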

>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index edf9ed5..cc9448b 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -23,6 +23,12 @@
>>  #include <asm/asm-offsets.h>
>>  #include <asm/kvm_asm.h>
>>  #include <asm/kvm_arm.h>
>> +#include <asm/vfpmacros.h>
>> +
>> +#define VCPU_USR_REG(_reg_nr)  (VCPU_USR_REGS + (_reg_nr * 4))
>> +#define VCPU_USR_SP            (VCPU_USR_REG(13))
>> +#define VCPU_FIQ_REG(_reg_nr)  (VCPU_FIQ_REGS + (_reg_nr * 4))
>> +#define VCPU_FIQ_SPSR          (VCPU_FIQ_REG(7))
>>
>>         .text
>>         .align  PAGE_SHIFT
>> @@ -34,7 +40,33 @@ __kvm_hyp_code_start:
>>  @  Flush per-VMID TLBs
>>  @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>
> This comment syntax crops up a few times in your .S files but doesn't match
> anything currently under arch/arm/. Please can you follow what we do there
> and use /* */ ?
>

sure

>>  ENTRY(__kvm_tlb_flush_vmid)
>> +       hvc     #0                      @ Switch to Hyp mode
>> +       push    {r2, r3}
>> +
>> +       add     r0, r0, #KVM_VTTBR
>> +       ldrd    r2, r3, [r0]
>> +       mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>> +       isb
>> +       mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
>> +       dsb
>> +       isb
>> +       mov     r2, #0
>> +       mov     r3, #0
>> +       mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
>> +       isb
>
> Do you need this isb, given that you're about to do an hvc?
>

they're gone

>> +       pop     {r2, r3}
>> +       hvc     #0                      @ Back to SVC
>>         bx      lr
>>  ENDPROC(__kvm_tlb_flush_vmid)
>>
>> @@ -42,26 +74,702 @@ ENDPROC(__kvm_tlb_flush_vmid)
>>  @  Flush TLBs and instruction caches of current CPU for all VMIDs
>>  @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>>
>> +/*
>> + * void __kvm_flush_vm_context(void);
>> + */
>>  ENTRY(__kvm_flush_vm_context)
>> +       hvc     #0                      @ switch to hyp-mode
>> +
>> +       mov     r0, #0                  @ rn parameter for c15 flushes is SBZ
>> +       mcr     p15, 4, r0, c8, c7, 4   @ Invalidate Non-secure Non-Hyp TLB
>> +       mcr     p15, 0, r0, c7, c5, 0   @ Invalidate instruction caches
>> +       dsb
>> +       isb
>
> Likewise.
>

ditto

>> +       hvc     #0                      @ switch back to svc-mode, see hyp_svc
>>         bx      lr
>>  ENDPROC(__kvm_flush_vm_context)
>>
>> +/* These are simply for the macros to work - the values don't have meaning */
>> +.equ usr, 0
>> +.equ svc, 1
>> +.equ abt, 2
>> +.equ und, 3
>> +.equ irq, 4
>> +.equ fiq, 5
>> +
>> +.macro store_mode_state base_reg, mode
>> +       .if \mode == usr
>> +       mrs     r2, SP_usr
>> +       mov     r3, lr
>> +       stmdb   \base_reg!, {r2, r3}
>> +       .elseif \mode != fiq
>> +       mrs     r2, SP_\mode
>> +       mrs     r3, LR_\mode
>> +       mrs     r4, SPSR_\mode
>> +       stmdb   \base_reg!, {r2, r3, r4}
>> +       .else
>> +       mrs     r2, r8_fiq
>> +       mrs     r3, r9_fiq
>> +       mrs     r4, r10_fiq
>> +       mrs     r5, r11_fiq
>> +       mrs     r6, r12_fiq
>> +       mrs     r7, SP_fiq
>> +       mrs     r8, LR_fiq
>> +       mrs     r9, SPSR_fiq
>> +       stmdb   \base_reg!, {r2-r9}
>> +       .endif
>> +.endm
>
> Perhaps you could stick the assembly macros into a separate file, like we do
> in assembler.h, so the code is more readable and they can be reused if
> need-be?
>
This is a lot of code to stick in a header file (hard to read within C
constructs, no assembly syntax highlighting, cannot use @ for
end-of-line comments), but I factored it out to interrupts_header.S,
which makes it nicer to read interrupts.S and it should be easy to
factor out pieces if ever needed anywhere else.

-Christoffer

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-27 14:13         ` Will Deacon
@ 2012-09-30 19:21           ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-30 19:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm, rusty.russell, avi, marc.zyngier

On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
>> On 09/25/2012 11:20 AM, Will Deacon wrote:
>> >> +/* Multiprocessor Affinity Register */
>> >> +#define MPIDR_CPUID    (0x3 << 0)
>> >
>> > I'm fairly sure we already have code under arch/arm/ for dealing with the
>> > mpidr. Let's re-use that rather than reinventing it here.
>> >
>>
>> I see some defines in topology.c - do you want some of these factored
>> out into a header file that we can then also use from kvm? If so, where?
>
> I guess either in topology.h or a new header (topology-bits.h).
>
>> >> +#define EXCEPTION_NONE      0
>> >> +#define EXCEPTION_RESET     0x80
>> >> +#define EXCEPTION_UNDEFINED 0x40
>> >> +#define EXCEPTION_SOFTWARE  0x20
>> >> +#define EXCEPTION_PREFETCH  0x10
>> >> +#define EXCEPTION_DATA      0x08
>> >> +#define EXCEPTION_IMPRECISE 0x04
>> >> +#define EXCEPTION_IRQ       0x02
>> >> +#define EXCEPTION_FIQ       0x01
>> >
>> > Why the noise?
>> >
>>
>> these are simply cruft from a previous life of KVM/ARM.
>
> Ok, then please get rid of them.
>
>> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
>> >> +{
>> >> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
>> >> +       BUG_ON(mode == 0xf);
>> >> +       return mode;
>> >> +}
>> >
>> > I noticed that you have a fair few BUG_ONs throughout the series. Fair
>> > enough, but for hyp code is that really the right thing to do? Killing the
>> > guest could make more sense, perhaps?
>>
>> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
>> catch explicitly on the host. We have had a pass over the code to change
>> all the BUG_ONs that can be provoked by the guest and inject the proper
>> exceptions into the guest in this case. If you find places where this is
>> not the case, it should be changed, and do let me know.
>
> Ok, so are you saying that a BUG_ON due to some detected inconsistency with
> one guest may not necessarily terminate the other guests? BUG_ONs in the
> host seem like a bad idea if the host is able to continue with a subset of
> guests.
>

No, I'm saying a BUG_ON marks an actual bug: it should never trigger, and
there should be no place where a guest can cause a BUG_ON to occur in
the host, because that would itself be a bug.

We basically never kill a guest unless really extreme things happen
(like we cannot allocate a pte, in which case we return -ENOMEM). If a
guest does something really weird, that guest will receive the
appropriate exception (undefined, prefetch abort, ...)
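
(Illustration only, with made-up helper names, of the policy described
above:

	/* hypothetical sketch -- not the actual helpers in this series */
	if (fault_is_guest_triggerable(vcpu))
		inject_fault_into_guest(vcpu);	/* undef/pabt/dabt back to the guest */
	else
		BUG();				/* host invariant broken: genuine host bug */

i.e. BUG_ON() is reserved for host-side invariants that a guest cannot
reach.)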

>> >
>> >> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
>> >> +{
>> >> +       return vcpu_reg(vcpu, 15);
>> >> +}
>> >
>> > If you stick a struct pt_regs into struct kvm_regs, you could reuse ARM_pc
>> > here etc.
>> >
>>
>> I prefer not to, because we'd have those registers presumably for usr
>> mode and then we only define the others explicitly. I think it's much
>> clearer to look at kvm_regs today.
>
> I disagree and think that you should reuse as much of the arch/arm/ code as
> possible. Not only does it make it easier to read by people who are familiar
> with that code (and in turn get you more reviewers) but it also means that
> we limit the amount of duplication that we have.

Reusing a struct just for the sake of reusing it is not necessarily an
improvement. Sometimes it complicates things, and sometimes it's
misleading. To me, pt_regs carries the meaning that these are the
registers of a user space process that traps into the kernel; in KVM
we emulate a virtual CPU, and the current definition is quite clear.

The argument that more people will review the code if the struct
contains a pt_regs field rather than a usr_regs field is completely
invalid: I'm sure everyone who reviews virtualization code knows that
user mode is a mode on the CPU, that it has some registers, and that
this is the state we store when we context switch a VM. If anything,
pt_regs could be read as the regs we stored in whatever mode the VM
happened to be in when we took an exception, which I would think is
crazy, and probably not what you suggest.

Writing the literal 15 for the PC register is not really a problem in
terms of duplication - it's nothing that requires separate
maintenance.

At this point the priority should really be correctness, readability,
and performance, imho.

>
> I think Marc (CC'd) had a go at this with some success.
>

Great. If this improves the code, then I suggest someone rebases an
appropriate patch and sends it to the kvmarm mailing list so we can
have a look at it. But there are users out there looking to try
kvm/arm, and we should try to give it to them.

>> >> +#ifndef __ARM_KVM_HOST_H__
>> >> +#define __ARM_KVM_HOST_H__
>> >> +
>> >> +#include <asm/kvm.h>
>> >> +
>> >> +#define KVM_MAX_VCPUS 4
>> >
>> > NR_CPUS?
>> >
>>
>> well this is defined by KVM generic code, and is common for other
>> architectures.
>
> I mean #define KVM_MAX_CPUS NR_CPUS. The 4 seems arbitrary.
>
>> >> +int __attribute_const__ kvm_target_cpu(void)
>> >> +{
>> >> +       unsigned int midr;
>> >> +
>> >> +       midr = read_cpuid_id();
>> >> +       switch ((midr >> 4) & 0xfff) {
>> >> +       case KVM_ARM_TARGET_CORTEX_A15:
>> >> +               return KVM_ARM_TARGET_CORTEX_A15;
>> >
>> > I have this code already in perf_event.c. Can we move it somewhere common
>> > and share it? You should also check that the implementor field is 0x41.
>> >
>>
>> by all means, you can probably suggest a good place better than I can...
>
> cputype.h?
>
>> >> +#include <linux/module.h>
>> >> +
>> >> +EXPORT_SYMBOL_GPL(smp_send_reschedule);
>> >
>> > Erm...
>> >
>> > We already have arch/arm/kernel/armksyms.c for exports -- please use that.
>> > However, exporting such low-level operations sounds like a bad idea. How
>> > realistic is kvm-as-a-module on ARM anyway?
>> >
>>
>> at this point it's broken, so I'll just remove this and leave this for a
>> fun project for some poor soul at some point if anyone ever needs half
>> the code outside the kernel as a module (the other half needs to be
>> compiled in anyway)
>
> Ok, that suits me. If it's broken, let's not include it in the initial
> submission.
>
>> >> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>> >> +{
>> >> +       return -EINVAL;
>> >> +}
>> >> +
>> >> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>> >> +{
>> >> +       return -EINVAL;
>> >> +}
>> >
>> > Again, all looks like this should be implemented using regsets from what I
>> > can tell.
>> >
>>
>> this API has been discussed to death on the KVM lists, and we can of
>> course revive that if the regset makes it nicer - I'd prefer getting
>> this upstream the way that it is now though, where GET_REG / SET_REG
>> seems to be the way forward from a KVM perspective.
>
> I'm sure the API has been discussed, but I've not seen that conversation and
> I'm also not aware about whether or not regsets were considered as a
> possibility for this stuff. The advantages of using them are:
>
>         1. It's less code for the arch to implement (and most of what you
>         need, you already have).
>
>         2. You can move the actual copying code into core KVM, like we have
>         for ptrace.
>
>         3. New KVM ports (e.g. arm64) can reuse the core copying code
>         easily.
>
> Furthermore, some registers (typically) floating point and GPRs will already
> have regsets for the ptrace code, so that can be reused if you share the
> datatypes.
>
> The big problem with getting things upstream and then changing it later is
> that you will break the ABI. I highly doubt that's feasible, so can we not
> just use regsets from the start for ARM?
>
>> >> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>> >> +{
>> >> +       struct kvm_regs *cpu_reset;
>> >> +
>> >> +       switch (vcpu->arch.target) {
>> >> +       case KVM_ARM_TARGET_CORTEX_A15:
>> >> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
>> >> +                       return -EINVAL;
>> >> +               cpu_reset = &a15_regs_reset;
>> >> +               vcpu->arch.midr = read_cpuid_id();
>> >> +               break;
>> >> +       default:
>> >> +               return -ENODEV;
>> >> +       }
>> >> +
>> >> +       /* Reset core registers */
>> >> +       memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
>> >> +
>> >> +       /* Reset CP15 registers */
>> >> +       kvm_reset_coprocs(vcpu);
>> >> +
>> >> +       return 0;
>> >> +}
>> >
>> > This is a nice way to plug in new CPUs but the way the rest of the code is
>> > currently written, all the ARMv7 and Cortex-A15 code is merged together. I
>> > *strongly* suggest you isolate this from the start, as it will help you see
>> > what is architected and what is implementation-specific.
>> >
>>
>> not entirely sure what you mean. You want a separate coproc.c file for
>> Cortex-A15 specific stuff like coproc_a15.c?
>
> Indeed. I think it will make adding new CPUs a lot clearer and separate the
> architecture from the implementation.
>
> Cheers,
>
> Will

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
  2012-09-27 15:11     ` Will Deacon
@ 2012-09-30 21:49       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-09-30 21:49 UTC (permalink / raw)
  To: Will Deacon; +Cc: dave.martin, linux-arm-kernel, kvm, kvmarm

On Thu, Sep 27, 2012 at 11:11 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Sep 15, 2012 at 04:35:59PM +0100, Christoffer Dall wrote:
>> When the guest accesses I/O memory this will create data abort
>> exceptions and they are handled by decoding the HSR information
>> (physical address, read/write, length, register) and forwarding reads
>> and writes to QEMU which performs the device emulation.
>>
>> Certain classes of load/store operations do not support the syndrome
>> information provided in the HSR and we therefore must be able to fetch
>> the offending instruction from guest memory and decode it manually.
>>
>> We only support instruction decoding for valid reasonable MMIO operations
>> where trapping them does not provide sufficient information in the HSR (no
>> 16-bit Thumb instructions provide register writeback that we care about).
>>
>> The following instruction types are NOT supported for MMIO operations
>> despite the HSR not containing decode info:
>>  - any Load/Store multiple
>>  - any load/store exclusive
>>  - any load/store dual
>>  - anything with the PC as the dest register
>
> [...]
>
>> +
>> +/******************************************************************************
>> + * Load-Store instruction emulation
>> + *****************************************************************************/
>> +
>> +/*
>> + * Must be ordered with LOADS first and WRITES afterwards
>> + * for easy distinction when doing MMIO.
>> + */
>> +#define NUM_LD_INSTR  9
>> +enum INSTR_LS_INDEXES {
>> +       INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
>> +       INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
>> +       INSTR_LS_LDRSH,
>> +       INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
>> +       INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
>> +       NUM_LS_INSTR
>> +};
>> +
>> +static u32 ls_instr[NUM_LS_INSTR][2] = {
>> +       {0x04700000, 0x0d700000}, /* LDRBT */
>> +       {0x04300000, 0x0d700000}, /* LDRT  */
>> +       {0x04100000, 0x0c500000}, /* LDR   */
>> +       {0x04500000, 0x0c500000}, /* LDRB  */
>> +       {0x000000d0, 0x0e1000f0}, /* LDRD  */
>> +       {0x01900090, 0x0ff000f0}, /* LDREX */
>> +       {0x001000b0, 0x0e1000f0}, /* LDRH  */
>> +       {0x001000d0, 0x0e1000f0}, /* LDRSB */
>> +       {0x001000f0, 0x0e1000f0}, /* LDRSH */
>> +       {0x04600000, 0x0d700000}, /* STRBT */
>> +       {0x04200000, 0x0d700000}, /* STRT  */
>> +       {0x04000000, 0x0c500000}, /* STR   */
>> +       {0x04400000, 0x0c500000}, /* STRB  */
>> +       {0x000000f0, 0x0e1000f0}, /* STRD  */
>> +       {0x01800090, 0x0ff000f0}, /* STREX */
>> +       {0x000000b0, 0x0e1000f0}  /* STRH  */
>> +};
>> +
>> +static inline int get_arm_ls_instr_index(u32 instr)
>> +{
>> +       return kvm_instr_index(instr, ls_instr, NUM_LS_INSTR);
>> +}
>> +
>> +/*
>> + * Load-Store instruction decoding
>> + */
>> +#define INSTR_LS_TYPE_BIT              26
>> +#define INSTR_LS_RD_MASK               0x0000f000
>> +#define INSTR_LS_RD_SHIFT              12
>> +#define INSTR_LS_RN_MASK               0x000f0000
>> +#define INSTR_LS_RN_SHIFT              16
>> +#define INSTR_LS_RM_MASK               0x0000000f
>> +#define INSTR_LS_OFFSET12_MASK         0x00000fff
>
> I'm afraid you're not going to thank me much for this, but it's high time we
> unified the various instruction decoding functions we have under arch/arm/
> and this seems like a good opportunity for that. For example, look at the
> following snippets (there is much more in the files I list) in addition to
> what you have:
>

I think it would be great if we had a set of unified decoding functions!

However, I think it's a shame if we can't merge KVM because we want to
clean up this code. Nothing, such as an ABI break, would prevent us
from cleaning this up after the code has been merged.

Please consider reviewing the code for correctness and consider if the
code could be merged as is.

On the other hand, if you or Dave or anyone else wants to create a set
of patches that cleans this up in a timely manner, I will be happy to
merge them for my code as well ;)
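
For anyone skimming the quoted table above: each ls_instr[] entry is a
{pattern, mask} pair, and kvm_instr_index() presumably just checks
(instr & mask) == pattern for each entry until it finds a match.  A
sketch of that logic (the "no match" return value is my assumption, not
the actual code):

	static int kvm_instr_index(u32 instr, u32 table[][2], int num)
	{
		int i;

		for (i = 0; i < num; i++) {
			if ((instr & table[i][1]) == table[i][0])
				return i;
		}
		return -1;	/* assumed: no matching load/store encoding */
	}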

>
> asm/ptrace.h
> -------------
> #define PSR_T_BIT       0x00000020
> #define PSR_F_BIT       0x00000040
> #define PSR_I_BIT       0x00000080
> #define PSR_A_BIT       0x00000100
> #define PSR_E_BIT       0x00000200
> #define PSR_J_BIT       0x01000000
> #define PSR_Q_BIT       0x08000000
> #define PSR_V_BIT       0x10000000
> #define PSR_C_BIT       0x20000000
> #define PSR_Z_BIT       0x40000000
> #define PSR_N_BIT       0x80000000
>
> mm/alignment.c
> --------------
> #define LDST_I_BIT(i)   (i & (1 << 26))         /* Immediate constant   */
> #define LDST_P_BIT(i)   (i & (1 << 24))         /* Preindex             */
> #define LDST_U_BIT(i)   (i & (1 << 23))         /* Add offset           */
> #define LDST_W_BIT(i)   (i & (1 << 21))         /* Writeback            */
> #define LDST_L_BIT(i)   (i & (1 << 20))         /* Load                 */
>
> kernel/kprobes*.c
> -----------------
> static void __kprobes
> emulate_ldr(struct kprobe *p, struct pt_regs *regs)
> {
>         kprobe_opcode_t insn = p->opcode;
>         unsigned long pc = (unsigned long)p->addr + 8;
>         int rt = (insn >> 12) & 0xf;
>         int rn = (insn >> 16) & 0xf;
>         int rm = insn & 0xf;
>
> kernel/opcodes.c
> ----------------
> static const unsigned short cc_map[16] = {
>         0xF0F0,                 /* EQ == Z set            */
>         0x0F0F,                 /* NE                     */
>         0xCCCC,                 /* CS == C set            */
>         0x3333,                 /* CC                     */
>         0xFF00,                 /* MI == N set            */
>         0x00FF,                 /* PL                     */
>         0xAAAA,                 /* VS == V set            */
>         0x5555,                 /* VC                     */
>         0x0C0C,                 /* HI == C set && Z clear */
>         0xF3F3,                 /* LS == C clear || Z set */
>         0xAA55,                 /* GE == (N==V)           */
>         0x55AA,                 /* LT == (N!=V)           */
>         0x0A05,                 /* GT == (!Z && (N==V))   */
>         0xF5FA,                 /* LE == (Z || (N!=V))    */
>         0xFFFF,                 /* AL always              */
>         0                       /* NV                     */
> };
>
> kernel/swp_emulate.c
> --------------------
> #define EXTRACT_REG_NUM(instruction, offset) \
>         (((instruction) & (0xf << (offset))) >> (offset))
> #define RN_OFFSET  16
> #define RT_OFFSET  12
> #define RT2_OFFSET  0
>
>
> There are also bits and pieces with the patching frameworks and module
> relocations that could benefit from some code sharing. Now, I think Dave had
> some ideas about moving a load of this code into a common disassembler under
> arch/arm/ so it would be great to tie that in here and implement that for
> load/store instructions. Then other users can augment the number of
> supported instruction classes as and when it is required.
>
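
For illustration of the overlap: the register-field extraction in every one of
the snippets above boils down to the same couple of shifts and masks.  A
trivial shared helper (arm_insn_reg is a hypothetical name, not taken from any
of the files listed) would already cover most of them:

        /* Extract a 4-bit register field from an ARM instruction word. */
        static inline unsigned int arm_insn_reg(u32 insn, unsigned int shift)
        {
                return (insn >> shift) & 0xf;
        }

        /* e.g.: rt = arm_insn_reg(insn, 12); rn = arm_insn_reg(insn, 16); */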

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 03/15] ARM: Section based HYP idmap
  2012-09-18 13:00     ` Will Deacon
@ 2012-10-01  2:19       ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-10-01  2:19 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier

[snip]

>> +
>> +static int __init hyp_init_static_idmap(void)
>> +{
>> +     hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
>> +     if (!hyp_pgd)
>> +             return -ENOMEM;
>> +
>> +     hyp_idmap_setup();
>> +
>> +     return 0;
>> +}
>> +early_initcall(hyp_init_static_idmap);
>> +#endif
>
> I'd rather the alloc/free functions for the hyp pgd were somewhere else,
> like they are for standard pgds. Then we can just call them here without
> having to encode knowledge of PGD size etc in the mapping code.
>
This used to be the case, but Marc changed it IIRC, so I'm just cc'ing
him. The following is an attempt at what you're looking for if I
understand you correctly:

diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
index a1ab8d6..36708ba 100644
--- a/arch/arm/include/asm/idmap.h
+++ b/arch/arm/include/asm/idmap.h
@@ -12,10 +12,8 @@ extern pgd_t *idmap_pgd;
 void setup_mm_for_reboot(void);

 #ifdef CONFIG_ARM_VIRT_EXT
-extern pgd_t *hyp_pgd;
-
-void hyp_idmap_teardown(void);
-void hyp_idmap_setup(void);
+void hyp_idmap_teardown(pgd_t *hyp_pgd);
+void hyp_idmap_setup(pgd_t *hyp_pgd);
 #endif

 #endif	/* __ASM_IDMAP_H */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index c3f90b0..ecfaaf0 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -43,4 +43,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu,
struct kvm_run *run);

 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);

+unsigned long kvm_mmu_get_httbr(void);
+int kvm_mmu_init(void);
+void kvm_mmu_exit(void);
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 7e11280..d64ce2a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -35,7 +35,6 @@
 #include <asm/ptrace.h>
 #include <asm/mman.h>
 #include <asm/cputype.h>
-#include <asm/idmap.h>
 #include <asm/tlbflush.h>
 #include <asm/cacheflush.h>
 #include <asm/virt.h>
@@ -887,7 +886,7 @@ static void cpu_init_hyp_mode(void *vector)
 	/* Switch from the HYP stub to our own HYP init vector */
 	__hyp_set_vectors((unsigned long)vector);

-	pgd_ptr = virt_to_phys(hyp_pgd);
+	pgd_ptr = kvm_mmu_get_httbr();
 	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
 	hyp_stack_ptr = stack_page + PAGE_SIZE;
 	vector_ptr = (unsigned long)__kvm_hyp_vector;
@@ -918,6 +917,13 @@ static int init_hyp_mode(void)
 	int err = 0;

 	/*
+	 * Allocate Hyp PGD and setup Hyp identity mapping
+	 */
+	err = kvm_mmu_init();
+	if (err)
+		return err;
+
+	/*
 	 * It is probably enough to obtain the default on one
 	 * CPU. It's unlikely to be different on the others.
 	 */
@@ -954,7 +960,7 @@ static int init_hyp_mode(void)
 	/*
 	 * Unmap the identity mapping
 	 */
-	hyp_idmap_teardown();
+	kvm_mmu_exit();

 	/*
 	 * Map the Hyp-code called directly from the host
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 9499d4d..a35a8a9 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -34,6 +34,7 @@
 #include "trace.h"

 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
+static pgd_t *hyp_pgd;

 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 				  int min, int max)
@@ -994,3 +995,23 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
 	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
 }
+
+unsigned long kvm_mmu_get_httbr(void)
+{
+	return virt_to_phys(hyp_pgd);
+}
+
+int kvm_mmu_init(void)
+{
+	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	if (!hyp_pgd)
+		return -ENOMEM;
+
+	hyp_idmap_setup(hyp_pgd);
+	return 0;
+}
+
+void kvm_mmu_exit(void)
+{
+	hyp_idmap_teardown(hyp_pgd);
+}
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index f0ab339..ea7430e 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -102,9 +102,6 @@ static int __init init_static_idmap(void)
 early_initcall(init_static_idmap);

 #if defined(CONFIG_ARM_VIRT_EXT) && defined(CONFIG_ARM_LPAE)
-pgd_t *hyp_pgd;
-EXPORT_SYMBOL_GPL(hyp_pgd);
-
 static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
 {
 	pud_t *pud;
@@ -123,7 +120,7 @@ extern char  __hyp_idmap_text_start[],
__hyp_idmap_text_end[];
  * This version actually frees the underlying pmds for all pgds in range and
  * clear the pgds themselves afterwards.
  */
-void hyp_idmap_teardown(void)
+void hyp_idmap_teardown(pgd_t *hyp_pgd)
 {
 	unsigned long addr, end;
 	unsigned long next;
@@ -141,27 +138,12 @@ void hyp_idmap_teardown(void)
 }
 EXPORT_SYMBOL_GPL(hyp_idmap_teardown);

-void hyp_idmap_setup(void)
+void hyp_idmap_setup(pgd_t *hyp_pgd)
 {
 	identity_mapping_add(hyp_pgd, __hyp_idmap_text_start,
 			     __hyp_idmap_text_end, PMD_SECT_AP1);
 }
 EXPORT_SYMBOL_GPL(hyp_idmap_setup);
-
-static int __init hyp_init_static_idmap(void)
-{
-	if (!is_hyp_mode_available())
-		return 0;
-
-	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
-	if (!hyp_pgd)
-		return -ENOMEM;
-
-	hyp_idmap_setup();
-
-	return 0;
-}
-early_initcall(hyp_init_static_idmap);
 #endif

 /*

^ permalink raw reply related	[flat|nested] 164+ messages in thread

* Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
  2012-09-30 21:49       ` Christoffer Dall
@ 2012-10-01 12:53         ` Dave Martin
  -1 siblings, 0 replies; 164+ messages in thread
From: Dave Martin @ 2012-10-01 12:53 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Will Deacon, Tixy, kvm, linux-arm-kernel, kvmarm

On Sun, Sep 30, 2012 at 05:49:21PM -0400, Christoffer Dall wrote:
> On Thu, Sep 27, 2012 at 11:11 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Sat, Sep 15, 2012 at 04:35:59PM +0100, Christoffer Dall wrote:
> >> When the guest accesses I/O memory this will create data abort
> >> exceptions and they are handled by decoding the HSR information
> >> (physical address, read/write, length, register) and forwarding reads
> >> and writes to QEMU which performs the device emulation.
> >>
> >> Certain classes of load/store operations do not support the syndrome
> >> information provided in the HSR and we therefore must be able to fetch
> >> the offending instruction from guest memory and decode it manually.
> >>
> >> We only support instruction decoding for valid reasonable MMIO operations
> >> where trapping them does not provide sufficient information in the HSR (no
> >> 16-bit Thumb instructions provide register writeback that we care about).
> >>
> >> The following instruction types are NOT supported for MMIO operations
> >> despite the HSR not containing decode info:
> >>  - any Load/Store multiple
> >>  - any load/store exclusive
> >>  - any load/store dual
> >>  - anything with the PC as the dest register
> >
> > [...]
> >
> >> +
> >> +/******************************************************************************
> >> + * Load-Store instruction emulation
> >> + *****************************************************************************/
> >> +
> >> +/*
> >> + * Must be ordered with LOADS first and WRITES afterwards
> >> + * for easy distinction when doing MMIO.
> >> + */
> >> +#define NUM_LD_INSTR  9
> >> +enum INSTR_LS_INDEXES {
> >> +       INSTR_LS_LDRBT, INSTR_LS_LDRT, INSTR_LS_LDR, INSTR_LS_LDRB,
> >> +       INSTR_LS_LDRD, INSTR_LS_LDREX, INSTR_LS_LDRH, INSTR_LS_LDRSB,
> >> +       INSTR_LS_LDRSH,
> >> +       INSTR_LS_STRBT, INSTR_LS_STRT, INSTR_LS_STR, INSTR_LS_STRB,
> >> +       INSTR_LS_STRD, INSTR_LS_STREX, INSTR_LS_STRH,
> >> +       NUM_LS_INSTR
> >> +};
> >> +
> >> +static u32 ls_instr[NUM_LS_INSTR][2] = {
> >> +       {0x04700000, 0x0d700000}, /* LDRBT */
> >> +       {0x04300000, 0x0d700000}, /* LDRT  */
> >> +       {0x04100000, 0x0c500000}, /* LDR   */
> >> +       {0x04500000, 0x0c500000}, /* LDRB  */
> >> +       {0x000000d0, 0x0e1000f0}, /* LDRD  */
> >> +       {0x01900090, 0x0ff000f0}, /* LDREX */
> >> +       {0x001000b0, 0x0e1000f0}, /* LDRH  */
> >> +       {0x001000d0, 0x0e1000f0}, /* LDRSB */
> >> +       {0x001000f0, 0x0e1000f0}, /* LDRSH */
> >> +       {0x04600000, 0x0d700000}, /* STRBT */
> >> +       {0x04200000, 0x0d700000}, /* STRT  */
> >> +       {0x04000000, 0x0c500000}, /* STR   */
> >> +       {0x04400000, 0x0c500000}, /* STRB  */
> >> +       {0x000000f0, 0x0e1000f0}, /* STRD  */
> >> +       {0x01800090, 0x0ff000f0}, /* STREX */
> >> +       {0x000000b0, 0x0e1000f0}  /* STRH  */
> >> +};
> >> +
> >> +static inline int get_arm_ls_instr_index(u32 instr)
> >> +{
> >> +       return kvm_instr_index(instr, ls_instr, NUM_LS_INSTR);
> >> +}
> >> +
> >> +/*
> >> + * Load-Store instruction decoding
> >> + */
> >> +#define INSTR_LS_TYPE_BIT              26
> >> +#define INSTR_LS_RD_MASK               0x0000f000
> >> +#define INSTR_LS_RD_SHIFT              12
> >> +#define INSTR_LS_RN_MASK               0x000f0000
> >> +#define INSTR_LS_RN_SHIFT              16
> >> +#define INSTR_LS_RM_MASK               0x0000000f
> >> +#define INSTR_LS_OFFSET12_MASK         0x00000fff
> >
> > I'm afraid you're not going to thank me much for this, but it's high time we
> > unified the various instruction decoding functions we have under arch/arm/
> > and this seems like a good opportunity for that. For example, look at the
> > following snippets (there is much more in the files I list) in addition to
> > what you have:
> >
> 
> I think it would be great if we had a set of unified decoding functions!
> 
> However, I think it's a shame if we can't merge KVM because we want to
> clean up this code. There would be nothing in the way of breaking API
> or anything like that preventing us from cleaning this up after the
> code has been merged.
> 
> Please consider reviewing the code for correctness and consider if the
> code could be merged as is.
> 
> On the other hand, if you or Dave or anyone else wants to create a set
> of patches that cleans this up in a timely manner, I will be happy to
> merge them for my code as well ;)

The time I would have available to put into this is rather limited, but
I have some initial ideas, as outlined below.

Tixy (who did the kprobes implementation, which is probably the most
sophisticated opcode handling we have in the kernel right now) may have
ideas too.  I would hope that any common framework could reuse a fair
chunk of his implementation and test coverage.



I think that a common framework would have to start out a lot more
generic than the special-purpose code in current subsystems, at least
initially.  We should try to move all the knowledge about how
instructions are encoded out of subsystems and into the common
framework, so far as possible.


We might end up with an interface like this:

Instruction data in memory <-> __arm_mem_to_opcode*() and friends <-> canonical form

canonical form -> __arm_opcode_decode() -> decoded form

decoded form -> __arm_opcode_encode() -> canonical form


The decoded form might look something like this:

struct arm_insn {
	u32			opcode;
	u32			flags;		/* common flags */
	enum arm_insn_type	type;
	u32			insn_flags;	/* insn specific flags */
	enum arm_reg		Rd;
	enum arm_reg		Rn;
	enum arm_reg		Rm;
	enum arm_reg		Rs;
	enum arm_reg		Rt;
	enum arm_reg		Ra;

	/* ... */
};

... and so on.  This is just a sketch, and deliberately very incomplete.

The basic principle here is that the client gets the architectural,
encoding-independent view of the instruction: as far as possible, client
code should never need to know anything about how the encoding works.
The client code should not even need to know for most purposes whether
the instruction is ARM or Thumb, once decoded.

Identifying instruction classes (e.g., is this a VFP/NEON instruction,
does this access memory, does this change the PC) etc. should all be
abstracted as helper functions / macros.

All cleverness involving tables and hashes to accelerate decode and
identify instruction classes should move into the framework, but
we can start out with something stupid and simple: if we only need
to distinguish a few instruction types to begin with, a simple switch
statement for decode may be enough to get started.  However, as
things mature, a more sophisticated table-driven approach could be


A good starting point would be load/store emulation as this seems to be a
common theme, and we would need a credible deployment for any new
framework so that we know it's fit for purpose.

So just refactoring the code that appears in the KVM patches could be a
good way of getting such a framework off the ground.

Subsystems could be migrated to the new framework incrementally after
its initial creation, while extending the framework as needed.  This
means that the initial framework could be pretty simple
-- we don't need all the functionality all at once.


[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-30 19:21           ` Christoffer Dall
@ 2012-10-01 13:03             ` Marc Zyngier
  -1 siblings, 0 replies; 164+ messages in thread
From: Marc Zyngier @ 2012-10-01 13:03 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: linux-arm-kernel, Will Deacon, rusty.russell, kvm, kvmarm

On Sun, 30 Sep 2012 15:21:54 -0400, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon <will.deacon@arm.com> wrote:
>>
>> I think Marc (CC'd) had a go at this with some success.
>>
> 
> great, if this improves the code, then I suggest someone rebases an
> appropriate patch and sends it to the kvmarm mailing list so we can
> have a look at it, but there are users out there looking to try
> kvm/arm and we should try to give it to them.

Incoming.

        M.
-- 
Who you jivin' with that Cosmik Debris?

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
  2012-10-01 12:53         ` Dave Martin
@ 2012-10-01 15:12           ` Jon Medhurst (Tixy)
  -1 siblings, 0 replies; 164+ messages in thread
From: Jon Medhurst (Tixy) @ 2012-10-01 15:12 UTC (permalink / raw)
  To: Dave Martin; +Cc: Christoffer Dall, Will Deacon, linux-arm-kernel, kvm, kvmarm

On Mon, 2012-10-01 at 13:53 +0100, Dave Martin wrote:
> On Sun, Sep 30, 2012 at 05:49:21PM -0400, Christoffer Dall wrote:
> > On Thu, Sep 27, 2012 at 11:11 AM, Will Deacon <will.deacon@arm.com> wrote:
> > > On Sat, Sep 15, 2012 at 04:35:59PM +0100, Christoffer Dall wrote:
> > > I'm afraid you're not going to thank me much for this, but it's high time we
> > > unified the various instruction decoding functions we have under arch/arm/
> > > and this seems like a good opportunity for that. For example, look at the
> > > following snippets (there is much more in the files I list) in addition to
> > > what you have:
> > >
> > 
> > I think it would be great if we had a set of unified decoding functions!
> > 
> > However, I think it's a shame if we can't merge KVM because we want to
> > clean up this code. There would be nothing in the way of breaking API
> > or anything like that preventing us from cleaning this up after the
> > code has been merged.
> > 
> > Please consider reviewing the code for correctness and consider if the
> > code could be merged as is.
> > 
> > On the other hand, if you or Dave or anyone else wants to create a set
> > of patches that cleans this up in a timely manner, I will be happy to
> > merge them for my code as well ;)
> 
> The time I would have available to put into this is rather limited, but
> I have some initial ideas, as outlined below.
> 
> Tixy (who did the kprobes implementation, which is probably the most
> sophisticated opcode handling we have in the kernel right now) may have
> ideas too.  I would hope that any common framework could reuse a fair
> chunk of his implementation and test coverage.

To my thinking, the kprobes code is very tailored to the job it needs to
do, and turning it into something generic is just going to make
everything bigger and more complex - because a generic framework would
be bigger (as it's trying to be generic) and then things like kprobes
will probably end up having an additional framework layered over the top
to bend it to its purposes. Perhaps I'm being too pessimistic.

It would also require an inordinate amount of time to thrash out
requirements, design, prototype, and to implement. (I don't think I'm
being overly pessimistic about that ;-)

So, unless someone has serious quantities of spare time lying around...

-- 
Tixy

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
  2012-10-01 15:12           ` Jon Medhurst (Tixy)
@ 2012-10-01 16:07             ` Dave Martin
  -1 siblings, 0 replies; 164+ messages in thread
From: Dave Martin @ 2012-10-01 16:07 UTC (permalink / raw)
  To: Jon Medhurst (Tixy)
  Cc: Christoffer Dall, Will Deacon, kvm, linux-arm-kernel, kvmarm

On Mon, Oct 01, 2012 at 04:12:09PM +0100, Jon Medhurst (Tixy) wrote:
> On Mon, 2012-10-01 at 13:53 +0100, Dave Martin wrote:
> > On Sun, Sep 30, 2012 at 05:49:21PM -0400, Christoffer Dall wrote:
> > > On Thu, Sep 27, 2012 at 11:11 AM, Will Deacon <will.deacon@arm.com> wrote:
> > > > On Sat, Sep 15, 2012 at 04:35:59PM +0100, Christoffer Dall wrote:
> > > > I'm afraid you're not going to thank me much for this, but it's high time we
> > > > unified the various instruction decoding functions we have under arch/arm/
> > > > and this seems like a good opportunity for that. For example, look at the
> > > > following snippets (there is much more in the files I list) in addition to
> > > > what you have:
> > > >
> > > 
> > > I think it would be great if we had a set of unified decoding functions!
> > > 
> > > However, I think it's a shame if we can't merge KVM because we want to
> > > clean up this code. There would be nothing in the way of breaking API
> > > or anything like that preventing us from cleaning this up after the
> > > code has been merged.
> > > 
> > > Please consider reviewing the code for correctness and consider if the
> > > code could be merged as is.
> > > 
> > > On the other hand, if you or Dave or anyone else wants to create a set
> > > of patches that cleans this up in a timely manner, I will be happy to
> > > merge them for my code as well ;)
> > 
> > The time I would have available to put into this is rather limited, but
> > I have some initial ideas, as outlined below.
> > 
> > Tixy (who did the kprobes implementation, which is probably the most
> > sophisticated opcode handling we have in the kernel right now) may have
> > ideas too.  I would hope that any common framework could reuse a fair
> > chunk of his implementation and test coverage.
> 
> To my thinking, the kprobes code is very tailored to the job it needs to
> do, and turning it into something generic is just going to make
> everything bigger and more complex - because a generic framework would
> be bigger (as it's trying to be generic) and then things like kprobes
> will probably end up having an additional framework layered over the top
> to bend it to its purposes. Perhaps I'm being too pessimistic.

Perhaps kprobes is a bit of a double-edged example.  It's an example of
an implementation with some good features, but because it is larger,
the amount of adaptation required to convert it to a common framework
would necessarily be larger also.

Yet, kprobes isn't trying to solve radically different problems from
other subsystems in the kernel.  It doesn't just want to decode and
manipulate the properties of instructions; it is actually interested in
many of the same properties (for example, whether an instruction is
a load or store, whether it modifies the PC etc.) as some other
subsystems.

I worry that by default every implementation of this ends up rather
deeply tailored to its corresponding subsystem -- so we gradually
accumulate more incompatible partially-overlapping duplicates of this
functionality over time.  This doesn't feel like a good thing.

> It would also require an inordinate amount of time to thrash out
> requirements, design, prototype, and to implement. (I don't think I'm
> being overly pessimistic about that ;-)
> 
> So, unless someone has serious quantities of spare time lying around...

Well, I don't suggest that we should expect to get there in one go:
such an effort won't ever get off the ground, for sure.

If we can consolidate a few simpler subsystems' opcode handling then
that would still be a step in the right direction, even if integrating
kprobes could not happen until much later.

If we do nothing, the situation will just gradually get worse.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 164+ messages in thread

* RE: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-30 19:21           ` Christoffer Dall
@ 2012-10-04 13:02             ` Min-gyu Kim
  -1 siblings, 0 replies; 164+ messages in thread
From: Min-gyu Kim @ 2012-10-04 13:02 UTC (permalink / raw)
  To: 'Christoffer Dall', 'Will Deacon'
  Cc: kvm, linux-arm-kernel, kvmarm, rusty.russell, avi, marc.zyngier,
	김창환



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Christoffer Dall
> Sent: Monday, October 01, 2012 4:22 AM
> To: Will Deacon
> Cc: kvm@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> kvmarm@lists.cs.columbia.edu; rusty.russell@linaro.org; avi@redhat.com;
> marc.zyngier@arm.com
> Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
> support
> 
> On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
> >> On 09/25/2012 11:20 AM, Will Deacon wrote:
> >> >> +/* Multiprocessor Affinity Register */
> >> >> +#define MPIDR_CPUID    (0x3 << 0)
> >> >
> >> > I'm fairly sure we already have code under arch/arm/ for dealing
> >> > with the mpidr. Let's re-use that rather than reinventing it here.
> >> >
> >>
> >> I see some defines in topology.c - do you want some of these factored
> >> out into a header file that we can then also use from kvm? If so,
where?
> >
> > I guess either in topology.h or a new header (topology-bits.h).
> >
> >> >> +#define EXCEPTION_NONE      0
> >> >> +#define EXCEPTION_RESET     0x80
> >> >> +#define EXCEPTION_UNDEFINED 0x40
> >> >> +#define EXCEPTION_SOFTWARE  0x20
> >> >> +#define EXCEPTION_PREFETCH  0x10
> >> >> +#define EXCEPTION_DATA      0x08
> >> >> +#define EXCEPTION_IMPRECISE 0x04
> >> >> +#define EXCEPTION_IRQ       0x02
> >> >> +#define EXCEPTION_FIQ       0x01
> >> >
> >> > Why the noise?
> >> >
> >>
> >> these are simply cruft from a previous life of KVM/ARM.
> >
> > Ok, then please get rid of them.
> >
> >> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
> >> >> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
> >> >> +       BUG_ON(mode == 0xf);
> >> >> +       return mode;
> >> >> +}
> >> >
> >> > I noticed that you have a fair few BUG_ONs throughout the series.
> >> > Fair enough, but for hyp code is that really the right thing to do?
> >> > Killing the guest could make more sense, perhaps?
> >>
> >> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
> >> catch explicitly on the host. We have had a pass over the code to
> >> change all the BUG_ONs that can be provoked by the guest and inject
> >> the proper exceptions into the guest in this case. If you find places
> >> where this is not the case, it should be changed, and do let me know.
> >
> > Ok, so are you saying that a BUG_ON due to some detected inconsistency
> > with one guest may not necessarily terminate the other guests? BUG_ONs
> > in the host seem like a bad idea if the host is able to continue with
> > a subset of guests.
> >
> 
> No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
> should be nowhere where a guest can cause a BUG_ON to occur in the host,
> because that would be a bug.
> 
> We basically never kill a guest unless really extreme things happen (like
> we cannot allocate a pte, in which case we return -ENOMEM). If a guest
> does something really weird, that guest will receive the appropriate
> exception (undefined, prefetch abort, ...)
> 

I agree with Will. It seems like an overreaction to kill the entire system.

From the code above, the BUG_ON case clearly indicates that a serious bug
has occurred. However, killing the corresponding VM may not cause any
further problems, so leaving some logs for debugging and killing the VM
seems to be enough.

Let's assume KVM for ARM is distributed with a critical bug.
If the case is handled by BUG_ON, it will cause the host to shut down.
If it is handled by killing the VM, only that VM shuts down.
In my opinion, the latter seems better.

I looked for a guide on BUG_ON and found this:
     http://yarchive.net/comp/linux/BUG.html
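
What "killing the VM instead" could look like, using the vcpu_mode() helper
quoted above as an example (a purely illustrative sketch; only __vcpu_mode()
and the mode check are from the quoted code, everything else is hypothetical):

        /* Instead of BUG_ON(mode == 0xf): warn once and fail only this VM. */
        static inline int vcpu_mode_checked(struct kvm_vcpu *vcpu)
        {
                u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);

                if (WARN_ON_ONCE(mode == 0xf))
                        return -EINVAL; /* caller can exit to userspace,
                                         * e.g. with KVM_EXIT_INTERNAL_ERROR */
                return mode;
        }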


> >> >
> >> >> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu) {
> >> >> +       return vcpu_reg(vcpu, 15); }
> >> >
> >> > If you stick a struct pt_regs into struct kvm_regs, you could reuse
> >> > ARM_pc here etc.
> >> >
> >>
> >> I prefer not to, because we'd have those registers presumably for usr
> >> mode and then we only define the others explicit. I think it's much
> >> clearer to look at kvm_regs today.
> >
> > I disagree and think that you should reuse as much of the arch/arm/
> > code as possible. Not only does it make it easier to read by people
> > who are familiar with that code (and in turn get you more reviewers)
> > but it also means that we limit the amount of duplication that we have.
> 
> Reusing a struct just for the sake of reusing is not necessarily an
> improvement. Sometimes it complicates things, and sometimes it's
> misleading. To me, pt_regs carry the meaning that these are the registers
> for a user space process that traps into the kernel - in KVM we emulate a
> virtual CPU and that current definition is quite clear.
> 
> The argument that more people will review the code if the struct contains
> a pt_regs field rather than a usr_regs field is completely invalid,
> because I'm sure everyone that reviews virtualization code will know that
> user mode is a mode on the cpu and it has some registers and this is the
> state we store when we context switch a VM - pt_regs could be read as the
> regs that we stored in the mode that the VM happened to be in when we took
> an exception, which I would think is crazy, and probably not what you
> suggest.
> 
> Writing the literal 15 for the PC register is not really a problem in
> terms of duplication - it's nothing that requires separate maintenance.
> 
> At this point the priority should really be correctness, readability, and
> performance, imho.
> 
> >
> > I think Marc (CC'd) had a go at this with some success.
> >
> 
> great, if this improves the code, then I suggest someone rebases an
> appropriate patch and sends it to the kvmarm mailing list so we can have a
> look at it, but there are users out there looking to try kvm/arm and we
> should try to give it to them.
> 
> >> >> +#ifndef __ARM_KVM_HOST_H__
> >> >> +#define __ARM_KVM_HOST_H__
> >> >> +
> >> >> +#include <asm/kvm.h>
> >> >> +
> >> >> +#define KVM_MAX_VCPUS 4
> >> >
> >> > NR_CPUS?
> >> >
> >>
> >> well this is defined by KVM generic code, and is common for other
> >> architecture.
> >
> > I mean #define KVM_MAX_VCPUS NR_CPUS. The 4 seems arbitrary.
> >
> >> >> +int __attribute_const__ kvm_target_cpu(void) {
> >> >> +       unsigned int midr;
> >> >> +
> >> >> +       midr = read_cpuid_id();
> >> >> +       switch ((midr >> 4) & 0xfff) {
> >> >> +       case KVM_ARM_TARGET_CORTEX_A15:
> >> >> +               return KVM_ARM_TARGET_CORTEX_A15;
> >> >
> >> > I have this code already in perf_event.c. Can we move it somewhere
> >> > common and share it? You should also check that the implementor field
> is 0x41.
> >> >
> >>
> >> by all means, you can probably suggest a good place better than I can...
> >
> > cputype.h?
> >
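
For reference, the implementor check Will asks for is a small addition.  A
sketch of what the quoted kvm_target_cpu() might look like with that check
(MIDR field positions are from the ARM ARM, 0xc0f is the Cortex-A15 primary
part number; this is only illustrative, not the actual patch):

        int __attribute_const__ kvm_target_cpu(void)
        {
                unsigned int midr = read_cpuid_id();

                if (((midr >> 24) & 0xff) != 0x41)      /* implementer: ARM Ltd. */
                        return -ENODEV;

                switch ((midr >> 4) & 0xfff) {          /* primary part number */
                case 0xc0f:                             /* Cortex-A15 */
                        return KVM_ARM_TARGET_CORTEX_A15;
                default:
                        return -ENODEV;
                }
        }
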
> >> >> +#include <linux/module.h>
> >> >> +
> >> >> +EXPORT_SYMBOL_GPL(smp_send_reschedule);
> >> >
> >> > Erm...
> >> >
> >> > We already have arch/arm/kernel/armksyms.c for exports -- please use
> >> > that.
> >> > However, exporting such low-level operations sounds like a bad
> >> > idea. How realistic is kvm-as-a-module on ARM anyway?
> >> >
> >>
> >> at this point it's broken, so I'll just remove this and leave this
> >> for a fun project for some poor soul at some point if anyone ever
> >> needs half the code outside the kernel as a module (the other half
> >> needs to be compiled in anyway)
> >
> > Ok, that suits me. If it's broken, let's not include it in the initial
> > submission.
> >
> >> >> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct
> >> >> +kvm_regs *regs) {
> >> >> +       return -EINVAL;
> >> >> +}
> >> >> +
> >> >> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct
> >> >> +kvm_regs *regs) {
> >> >> +       return -EINVAL;
> >> >> +}
> >> >
> >> > Again, all looks like this should be implemented using regsets from
> >> > what I can tell.
> >> >
> >>
> >> this API has been discussed to death on the KVM lists, and we can of
> >> course revive that if the regset makes it nicer - I'd prefer getting
> >> this upstream the way that it is now though, where GET_REG / SET_REG
> >> seems to be the way forward from a KVM perspective.
> >
> > I'm sure the API has been discussed, but I've not seen that
> > conversation and I'm also not aware about whether or not regsets were
> > considered as a possibility for this stuff. The advantages of using them
> are:
> >
> >         1. It's less code for the arch to implement (and most of what
> >         you need, you already have).
> >
> >         2. You can move the actual copying code into core KVM, like we
> >         have for ptrace.
> >
> >         3. New KVM ports (e.g. arm64) can reuse the core copying code
> >         easily.
> >
> > Furthermore, some registers (typically) floating point and GPRs will
> > already have regsets for the ptrace code, so that can be reused if you
> > share the datatypes.
> >
> > The big problem with getting things upstream and then changing it
> > later is that you will break the ABI. I highly doubt that's feasible,
> > so can we not just use regsets from the start for ARM?
> >
> >> >> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu) {
> >> >> +       struct kvm_regs *cpu_reset;
> >> >> +
> >> >> +       switch (vcpu->arch.target) {
> >> >> +       case KVM_ARM_TARGET_CORTEX_A15:
> >> >> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
> >> >> +                       return -EINVAL;
> >> >> +               cpu_reset = &a15_regs_reset;
> >> >> +               vcpu->arch.midr = read_cpuid_id();
> >> >> +               break;
> >> >> +       default:
> >> >> +               return -ENODEV;
> >> >> +       }
> >> >> +
> >> >> +       /* Reset core registers */
> >> >> +       memcpy(&vcpu->arch.regs, cpu_reset,
> >> >> + sizeof(vcpu->arch.regs));
> >> >> +
> >> >> +       /* Reset CP15 registers */
> >> >> +       kvm_reset_coprocs(vcpu);
> >> >> +
> >> >> +       return 0;
> >> >> +}
> >> >
> >> > This is a nice way to plug in new CPUs but the way the rest of the
> >> > code is currently written, all the ARMv7 and Cortex-A15 code is
> >> > merged together. I
> >> > *strongly* suggest you isolate this from the start, as it will help
> >> > you see what is architected and what is implementation-specific.
> >> >
> >>
> >> not entirely sure what you mean. You want a separate coproc.c file
> >> for
> >> Cortex-A15 specific stuff like coproc_a15.c?
> >
> > Indeed. I think it will make adding new CPUs a lot clearer and
> > separate the architecture from the implementation.
> >
> > Cheers,
> >
> > Will
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
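
A rough sketch of the shared MIDR decode suggested in the kvm_target_cpu()
exchange quoted above, with the implementer check folded in; the macro and
helper names below are invented for illustration and are not the existing
cputype.h or perf_event.c definitions:

/* MIDR layout: bits [31:24] implementer, bits [15:4] primary part number. */
#define MIDR_IMPLEMENTER(midr)		(((midr) >> 24) & 0xff)
#define MIDR_PART_NUMBER(midr)		(((midr) >> 4) & 0xfff)

#define IMPLEMENTER_ARM			0x41	/* ASCII 'A' */
#define PART_NUMBER_CORTEX_A15		0xc0f

/* True only for a genuine ARM Cortex-A15. */
static inline bool cpu_is_cortex_a15(unsigned int midr)
{
	return MIDR_IMPLEMENTER(midr) == IMPLEMENTER_ARM &&
	       MIDR_PART_NUMBER(midr) == PART_NUMBER_CORTEX_A15;
}

With a helper of this shape in a shared header, kvm_target_cpu() would only
need to call it on read_cpuid_id() instead of open-coding the shift and mask.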


^ permalink raw reply	[flat|nested] 164+ messages in thread

* [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
@ 2012-10-04 13:02             ` Min-gyu Kim
  0 siblings, 0 replies; 164+ messages in thread
From: Min-gyu Kim @ 2012-10-04 13:02 UTC (permalink / raw)
  To: linux-arm-kernel



> -----Original Message-----
> From: kvm-owner at vger.kernel.org [mailto:kvm-owner at vger.kernel.org] On
> Behalf Of Christoffer Dall
> Sent: Monday, October 01, 2012 4:22 AM
> To: Will Deacon
> Cc: kvm at vger.kernel.org; linux-arm-kernel at lists.infradead.org;
> kvmarm at lists.cs.columbia.edu; rusty.russell at linaro.org; avi at redhat.com;
> marc.zyngier at arm.com
> Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
> support
> 
> On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
> >> On 09/25/2012 11:20 AM, Will Deacon wrote:
> >> >> +/* Multiprocessor Affinity Register */
> >> >> +#define MPIDR_CPUID    (0x3 << 0)
> >> >
> >> > I'm fairly sure we already have code under arch/arm/ for dealing
> >> > with the mpidr. Let's re-use that rather than reinventing it here.
> >> >
> >>
> >> I see some defines in topology.c - do you want some of these factored
> >> out into a header file that we can then also use from kvm? If so,
> >> where?
> >
> > I guess either in topology.h or a new header (topology-bits.h).
> >
> >> >> +#define EXCEPTION_NONE      0
> >> >> +#define EXCEPTION_RESET     0x80
> >> >> +#define EXCEPTION_UNDEFINED 0x40
> >> >> +#define EXCEPTION_SOFTWARE  0x20
> >> >> +#define EXCEPTION_PREFETCH  0x10
> >> >> +#define EXCEPTION_DATA      0x08
> >> >> +#define EXCEPTION_IMPRECISE 0x04
> >> >> +#define EXCEPTION_IRQ       0x02
> >> >> +#define EXCEPTION_FIQ       0x01
> >> >
> >> > Why the noise?
> >> >
> >>
> >> these are simply cruft from a previous life of KVM/ARM.
> >
> > Ok, then please get rid of them.
> >
> >> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
> >> >> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
> >> >> +       BUG_ON(mode == 0xf);
> >> >> +       return mode;
> >> >> +}
> >> >
> >> > I noticed that you have a fair few BUG_ONs throughout the series.
> >> > Fair enough, but for hyp code is that really the right thing to do?
> >> > Killing the guest could make more sense, perhaps?
> >>
> >> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
> >> catch explicitly on the host. We have had a pass over the code to
> >> change all the BUG_ONs that can be provoked by the guest and inject
> >> the proper exceptions into the guest in this case. If you find places
> >> where this is not the case, it should be changed, and do let me know.
> >
> > Ok, so are you saying that a BUG_ON due to some detected inconsistency
> > with one guest may not necessarily terminate the other guests? BUG_ONs
> > in the host seem like a bad idea if the host is able to continue with
> > a subset of guests.
> >
> 
> No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
> should be nowhere where a guest can cause a BUG_ON to occur in the host,
> because that would be a bug.
> 
> We basically never kill a guest unless really extreme things happen (like
> we cannot allocate a pte, in which case we return -ENOMEM). If a guest
> does something really weird, that guest will receive the appropriate
> exception (undefined, prefetch abort, ...)
> 

I agree with Will. It seems to be overreacting to kill the entire system.

From the code above, the BUG_ON clearly indicates that a serious bug has
occurred. However, killing the corresponding VM may not cause any further
problems.
In that case, leaving some logs for debugging and killing the VM seems to be enough.

Let's assume KVM for ARM is distributed with a critical bug.
If the case is defended by a BUG_ON, it will cause the host to shut down.
If the case is defended by killing the VM, only that VM will shut down.
In my opinion, the latter seems better.

I looked for a guide on BUG_ON and found this:
     http://yarchive.net/comp/linux/BUG.html


> >> >
> >> >> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu) {
> >> >> +       return vcpu_reg(vcpu, 15); }
> >> >
> >> > If you stick a struct pt_regs into struct kvm_regs, you could reuse
> >> > ARM_pc here etc.
> >> >
> >>
> >> I prefer not to, because we'd have those registers presumably for usr
> >> mode and then we only define the others explicit. I think it's much
> >> clearer to look at kvm_regs today.
> >
> > I disagree and think that you should reuse as much of the arch/arm/
> > code as possible. Not only does it make it easier to read by people
> > who are familiar with that code (and in turn get you more reviewers)
> > but it also means that we limit the amount of duplication that we have.
> 
> Reusing a struct just for the sake of reusing is not necessarily an
> improvement. Some times it complicates things, and some times it's
> misleading. To me, pt_regs carry the meaning that these are the registers
> for a user space process that traps into the kernel - in KVM we emulate a
> virtual CPU and that current definition is quite clear.
> 
> The argument that more people will review the code if the struct contains
> a pt_regs field rather than a usr_regs field is completely invalid,
> because I'm sure everyone that reviews virtualization code will know that
> user mode is a mode on the cpu and it has some registers and this is the
> state we store when we context switch a VM - pt_regs could be read as the
> regs that we stored in the mode that the VM happened to be in when we took
> an exception, which I would think is crazy, and probably not what you
> suggest.
> 
> Writing the literal 15 for the PC register is not really a problem in
> terms of duplication - it's nothing that requires separate maintenance.
> 
> At this point the priority should really be correctness, readability, and
> performance, imho.
> 
> >
> > I think Marc (CC'd) had a go at this with some success.
> >
> 
> great, if this improves the code, then I suggest someone rebases an
> appropriate patch and sends it to the kvmarm mailing list so we can have a
> look at it, but there are users out there looking to try kvm/arm and we
> should try to give it to them.
> 
> >> >> +#ifndef __ARM_KVM_HOST_H__
> >> >> +#define __ARM_KVM_HOST_H__
> >> >> +
> >> >> +#include <asm/kvm.h>
> >> >> +
> >> >> +#define KVM_MAX_VCPUS 4
> >> >
> >> > NR_CPUS?
> >> >
> >>
> >> well this is defined by KVM generic code, and is common for other
> >> architecture.
> >
> > I mean #define KVM_MAX_CPUS NR_CPUS. The 4 seems arbitrary.
> >
> >> >> +int __attribute_const__ kvm_target_cpu(void) {
> >> >> +       unsigned int midr;
> >> >> +
> >> >> +       midr = read_cpuid_id();
> >> >> +       switch ((midr >> 4) & 0xfff) {
> >> >> +       case KVM_ARM_TARGET_CORTEX_A15:
> >> >> +               return KVM_ARM_TARGET_CORTEX_A15;
> >> >
> >> > I have this code already in perf_event.c. Can we move it somewhere
> >> > common and share it? You should also check that the implementor field
> >> > is 0x41.
> >> >
> >>
> >> by all means, you can probably suggest a good place better than I
> >> can...
> >
> > cputype.h?
> >
> >> >> +#include <linux/module.h>
> >> >> +
> >> >> +EXPORT_SYMBOL_GPL(smp_send_reschedule);
> >> >
> >> > Erm...
> >> >
> >> > We already have arch/arm/kernel/armksyms.c for exports -- please use
> >> > that.
> >> > However, exporting such low-level operations sounds like a bad
> >> > idea. How realistic is kvm-as-a-module on ARM anyway?
> >> >
> >>
> >> at this point it's broken, so I'll just remove this and leave this
> >> for a fun project for some poor soul at some point if anyone ever
> >> needs half the code outside the kernel as a module (the other half
> >> needs to be compiled in anyway)
> >
> > Ok, that suits me. If it's broken, let's not include it in the initial
> > submission.
> >
> >> >> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct
> >> >> +kvm_regs *regs) {
> >> >> +       return -EINVAL;
> >> >> +}
> >> >> +
> >> >> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct
> >> >> +kvm_regs *regs) {
> >> >> +       return -EINVAL;
> >> >> +}
> >> >
> >> > Again, all looks like this should be implemented using regsets from
> >> > what I can tell.
> >> >
> >>
> >> this API has been discussed to death on the KVM lists, and we can of
> >> course revive that if the regset makes it nicer - I'd prefer getting
> >> this upstream the way that it is now though, where GET_REG / SET_REG
> >> seems to be the way forward from a KVM perspective.
> >
> > I'm sure the API has been discussed, but I've not seen that
> > conversation and I'm also not aware about whether or not regsets were
> > considered as a possibility for this stuff. The advantages of using them
> are:
> >
> >         1. It's less code for the arch to implement (and most of what
> >         you need, you already have).
> >
> >         2. You can move the actual copying code into core KVM, like we
> >         have for ptrace.
> >
> >         3. New KVM ports (e.g. arm64) can reuse the core copying code
> >         easily.
> >
> > Furthermore, some registers (typically) floating point and GPRs will
> > already have regsets for the ptrace code, so that can be reused if you
> > share the datatypes.
> >
> > The big problem with getting things upstream and then changing it
> > later is that you will break the ABI. I highly doubt that's feasible,
> > so can we not just use regsets from the start for ARM?
> >
> >> >> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu) {
> >> >> +       struct kvm_regs *cpu_reset;
> >> >> +
> >> >> +       switch (vcpu->arch.target) {
> >> >> +       case KVM_ARM_TARGET_CORTEX_A15:
> >> >> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
> >> >> +                       return -EINVAL;
> >> >> +               cpu_reset = &a15_regs_reset;
> >> >> +               vcpu->arch.midr = read_cpuid_id();
> >> >> +               break;
> >> >> +       default:
> >> >> +               return -ENODEV;
> >> >> +       }
> >> >> +
> >> >> +       /* Reset core registers */
> >> >> +       memcpy(&vcpu->arch.regs, cpu_reset,
> >> >> + sizeof(vcpu->arch.regs));
> >> >> +
> >> >> +       /* Reset CP15 registers */
> >> >> +       kvm_reset_coprocs(vcpu);
> >> >> +
> >> >> +       return 0;
> >> >> +}
> >> >
> >> > This is a nice way to plug in new CPUs but the way the rest of the
> >> > code is currently written, all the ARMv7 and Cortex-A15 code is
> >> > merged together. I
> >> > *strongly* suggest you isolate this from the start, as it will help
> >> > you see what is architected and what is implementation-specific.
> >> >
> >>
> >> not entirely sure what you mean. You want a separate coproc.c file
> >> for
> >> Cortex-A15 specific stuff like coproc_a15.c?
> >
> > Indeed. I think it will make adding new CPUs a lot clearer and
> > separate the architecture from the implementation.
> >
> > Cheers,
> >
> > Will
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the body
> of a message to majordomo at vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-10-04 13:02             ` Min-gyu Kim
@ 2012-10-04 13:35               ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-10-04 13:35 UTC (permalink / raw)
  To: Min-gyu Kim
  Cc: Will Deacon, kvm, linux-arm-kernel, kvmarm, rusty.russell, avi,
	marc.zyngier, 김창환

On Thu, Oct 4, 2012 at 9:02 AM, Min-gyu Kim <mingyu84.kim@samsung.com> wrote:
>
>
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Christoffer Dall
>> Sent: Monday, October 01, 2012 4:22 AM
>> To: Will Deacon
>> Cc: kvm@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>> kvmarm@lists.cs.columbia.edu; rusty.russell@linaro.org; avi@redhat.com;
>> marc.zyngier@arm.com
>> Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
>> support
>>
>> On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
>> >> On 09/25/2012 11:20 AM, Will Deacon wrote:
>> >> >> +/* Multiprocessor Affinity Register */
>> >> >> +#define MPIDR_CPUID    (0x3 << 0)
>> >> >
>> >> > I'm fairly sure we already have code under arch/arm/ for dealing
>> >> > with the mpidr. Let's re-use that rather than reinventing it here.
>> >> >
>> >>
>> >> I see some defines in topology.c - do you want some of these factored
>> >> out into a header file that we can then also use from kvm? If so,
>> >> where?
>> >
>> > I guess either in topology.h or a new header (topology-bits.h).
>> >
>> >> >> +#define EXCEPTION_NONE      0
>> >> >> +#define EXCEPTION_RESET     0x80
>> >> >> +#define EXCEPTION_UNDEFINED 0x40
>> >> >> +#define EXCEPTION_SOFTWARE  0x20
>> >> >> +#define EXCEPTION_PREFETCH  0x10
>> >> >> +#define EXCEPTION_DATA      0x08
>> >> >> +#define EXCEPTION_IMPRECISE 0x04
>> >> >> +#define EXCEPTION_IRQ       0x02
>> >> >> +#define EXCEPTION_FIQ       0x01
>> >> >
>> >> > Why the noise?
>> >> >
>> >>
>> >> these are simply cruft from a previous life of KVM/ARM.
>> >
>> > Ok, then please get rid of them.
>> >
>> >> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
>> >> >> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
>> >> >> +       BUG_ON(mode == 0xf);
>> >> >> +       return mode;
>> >> >> +}
>> >> >
>> >> > I noticed that you have a fair few BUG_ONs throughout the series.
>> >> > Fair enough, but for hyp code is that really the right thing to do?
>> >> > Killing the guest could make more sense, perhaps?
>> >>
>> >> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
>> >> catch explicitly on the host. We have had a pass over the code to
>> >> change all the BUG_ONs that can be provoked by the guest and inject
>> >> the proper exceptions into the guest in this case. If you find places
>> >> where this is not the case, it should be changed, and do let me know.
>> >
>> > Ok, so are you saying that a BUG_ON due to some detected inconsistency
>> > with one guest may not necessarily terminate the other guests? BUG_ONs
>> > in the host seem like a bad idea if the host is able to continue with
>> > a subset of guests.
>> >
>>
>> No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
>> should be nowhere where a guest can cause a BUG_ON to occur in the host,
>> because that would be a bug.
>>
>> We basically never kill a guest unless really extreme things happen (like
>> we cannot allocate a pte, in which case we return -ENOMEM). If a guest
>> does something really weird, that guest will receive the appropriate
>> exception (undefined, prefetch abort, ...)
>>
>
> I agree with Will. It seems to be overreacting to kill the entire system.
>
> From the code above, BUG_ON case clearly points out that there happened a
> serious bug case. However, killing the corresponding VM may not cause any
> further problem.
> Then leave some logs for debugging and killing the VM seems to be enough.
>
> Let's assume KVM for ARM is distributed with a critical bug.
> If the case is defended by BUG_ON, it will cause host to shutdown.
> If the case is defended by killing VM, it will cause VM to shutdown.
> In my opinion, latter case seems to be better.
>
> I looked for a guide on BUG_ON and found this:
>      http://yarchive.net/comp/linux/BUG.html
>
>

I completely agree with all this, no further argument is needed. The
point of a BUG_ON is to state explicitly the reason for a bug that
would anyhow cause the host kernel to malfunction overall. The above
BUG_ON statement is long gone (see the new patches), and if you see
other cases like this, once the code has been tested we can remove
the BUG_ON.
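
As a rough sketch of that policy (the helper names below are placeholders,
not necessarily the functions used in the series): a guest-triggerable
fault is reflected back into the guest, while BUG_ON() stays reserved for
host-internal corruption that no guest action should be able to cause.

static int handle_guest_undef(struct kvm_vcpu *vcpu)
{
	/*
	 * Guest-triggerable condition: deliver the architecturally
	 * correct exception to the guest and keep the host running.
	 */
	kvm_inject_undefined(vcpu);
	return 1;		/* resume the guest */
}

/*
 * By contrast, BUG_ON() is only for conditions that a bug in the
 * host/hypervisor code itself can produce (e.g. corrupted world-switch
 * state), where continuing would risk damaging the rest of the system.
 * Truly exceptional resource failures, such as failing to allocate a
 * pte, are reported to userspace as errors (-ENOMEM) instead.
 */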

>> >> >
>> >> >> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu) {
>> >> >> +       return vcpu_reg(vcpu, 15); }
>> >> >
>> >> > If you stick a struct pt_regs into struct kvm_regs, you could reuse
>> >> > ARM_pc here etc.
>> >> >
>> >>
>> >> I prefer not to, because we'd have those registers presumably for usr
>> >> mode and then we only define the others explicit. I think it's much
>> >> clearer to look at kvm_regs today.
>> >
>> > I disagree and think that you should reuse as much of the arch/arm/
>> > code as possible. Not only does it make it easier to read by people
>> > who are familiar with that code (and in turn get you more reviewers)
>> > but it also means that we limit the amount of duplication that we have.
>>
>> Reusing a struct just for the sake of reusing is not necessarily an
>> improvement. Some times it complicates things, and some times it's
>> misleading. To me, pt_regs carry the meaning that these are the registers
>> for a user space process that traps into the kernel - in KVM we emulate a
>> virtual CPU and that current definition is quite clear.
>>
>> The argument that more people will review the code if the struct contains
>> a pt_regs field rather than a usr_regs field is completely invalid,
>> because I'm sure everyone that reviews virtualization code will know that
>> user mode is a mode on the cpu and it has some registers and this is the
>> state we store when we context switch a VM - pt_regs could be read as the
>> regs that we stored in the mode that the VM happened to be in when we took
>> an exception, which I would think is crazy, and probably not what you
>> suggest.
>>
>> Writing the literal 15 for the PC register is not really a problem in
>> terms of duplication - it's nothing that requires separate maintenance.
>>
>> At this point the priority should really be correctness, readability, and
>> performance, imho.
>>
>> >
>> > I think Marc (CC'd) had a go at this with some success.
>> >
>>
>> great, if this improves the code, then I suggest someone rebases an
>> appropriate patch and sends it to the kvmarm mailing list so we can have a
>> look at it, but there are users out there looking to try kvm/arm and we
>> should try to give it to them.
>>
>> >> >> +#ifndef __ARM_KVM_HOST_H__
>> >> >> +#define __ARM_KVM_HOST_H__
>> >> >> +
>> >> >> +#include <asm/kvm.h>
>> >> >> +
>> >> >> +#define KVM_MAX_VCPUS 4
>> >> >
>> >> > NR_CPUS?
>> >> >
>> >>
>> >> well this is defined by KVM generic code, and is common for other
>> >> architecture.
>> >
>> > I mean #define KVM_MAX_CPUS NR_CPUS. The 4 seems arbitrary.
>> >
>> >> >> +int __attribute_const__ kvm_target_cpu(void) {
>> >> >> +       unsigned int midr;
>> >> >> +
>> >> >> +       midr = read_cpuid_id();
>> >> >> +       switch ((midr >> 4) & 0xfff) {
>> >> >> +       case KVM_ARM_TARGET_CORTEX_A15:
>> >> >> +               return KVM_ARM_TARGET_CORTEX_A15;
>> >> >
>> >> > I have this code already in perf_event.c. Can we move it somewhere
>> >> > common and share it? You should also check that the implementor field
>> >> > is 0x41.
>> >> >
>> >>
>> >> by all means, you can probably suggest a good place better than I
>> >> can...
>> >
>> > cputype.h?
>> >
>> >> >> +#include <linux/module.h>
>> >> >> +
>> >> >> +EXPORT_SYMBOL_GPL(smp_send_reschedule);
>> >> >
>> >> > Erm...
>> >> >
>> >> > We already have arch/arm/kernel/armksyms.c for exports -- please use
>> >> > that.
>> >> > However, exporting such low-level operations sounds like a bad
>> >> > idea. How realistic is kvm-as-a-module on ARM anyway?
>> >> >
>> >>
>> >> at this point it's broken, so I'll just remove this and leave this
>> >> for a fun project for some poor soul at some point if anyone ever
>> >> needs half the code outside the kernel as a module (the other half
>> >> needs to be compiled in anyway)
>> >
>> > Ok, that suits me. If it's broken, let's not include it in the initial
>> > submission.
>> >
>> >> >> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct
>> >> >> +kvm_regs *regs) {
>> >> >> +       return -EINVAL;
>> >> >> +}
>> >> >> +
>> >> >> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct
>> >> >> +kvm_regs *regs) {
>> >> >> +       return -EINVAL;
>> >> >> +}
>> >> >
>> >> > Again, all looks like this should be implemented using regsets from
>> >> > what I can tell.
>> >> >
>> >>
>> >> this API has been discussed to death on the KVM lists, and we can of
>> >> course revive that if the regset makes it nicer - I'd prefer getting
>> >> this upstream the way that it is now though, where GET_REG / SET_REG
>> >> seems to be the way forward from a KVM perspective.
>> >
>> > I'm sure the API has been discussed, but I've not seen that
>> > conversation and I'm also not aware about whether or not regsets were
>> > considered as a possibility for this stuff. The advantages of using them
>> are:
>> >
>> >         1. It's less code for the arch to implement (and most of what
>> >         you need, you already have).
>> >
>> >         2. You can move the actual copying code into core KVM, like we
>> >         have for ptrace.
>> >
>> >         3. New KVM ports (e.g. arm64) can reuse the core copying code
>> >         easily.
>> >
>> > Furthermore, some registers (typically) floating point and GPRs will
>> > already have regsets for the ptrace code, so that can be reused if you
>> > share the datatypes.
>> >
>> > The big problem with getting things upstream and then changing it
>> > later is that you will break the ABI. I highly doubt that's feasible,
>> > so can we not just use regsets from the start for ARM?
>> >
>> >> >> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu) {
>> >> >> +       struct kvm_regs *cpu_reset;
>> >> >> +
>> >> >> +       switch (vcpu->arch.target) {
>> >> >> +       case KVM_ARM_TARGET_CORTEX_A15:
>> >> >> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
>> >> >> +                       return -EINVAL;
>> >> >> +               cpu_reset = &a15_regs_reset;
>> >> >> +               vcpu->arch.midr = read_cpuid_id();
>> >> >> +               break;
>> >> >> +       default:
>> >> >> +               return -ENODEV;
>> >> >> +       }
>> >> >> +
>> >> >> +       /* Reset core registers */
>> >> >> +       memcpy(&vcpu->arch.regs, cpu_reset,
>> >> >> + sizeof(vcpu->arch.regs));
>> >> >> +
>> >> >> +       /* Reset CP15 registers */
>> >> >> +       kvm_reset_coprocs(vcpu);
>> >> >> +
>> >> >> +       return 0;
>> >> >> +}
>> >> >
>> >> > This is a nice way to plug in new CPUs but the way the rest of the
>> >> > code is currently written, all the ARMv7 and Cortex-A15 code is
>> >> > merged together. I
>> >> > *strongly* suggest you isolate this from the start, as it will help
>> >> > you see what is architected and what is implementation-specific.
>> >> >
>> >>
>> >> not entirely sure what you mean. You want a separate coproc.c file
>> >> for
>> >> Cortex-A15 specific stuff like coproc_a15.c?
>> >
>> > Indeed. I think it will make adding new CPUs a lot clearer and
>> > separate the architecture from the implementation.
>> >
>> > Cheers,
>> >
>> > Will
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 164+ messages in thread

* [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
@ 2012-10-04 13:35               ` Christoffer Dall
  0 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-10-04 13:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 4, 2012 at 9:02 AM, Min-gyu Kim <mingyu84.kim@samsung.com> wrote:
>
>
>> -----Original Message-----
>> From: kvm-owner at vger.kernel.org [mailto:kvm-owner at vger.kernel.org] On
>> Behalf Of Christoffer Dall
>> Sent: Monday, October 01, 2012 4:22 AM
>> To: Will Deacon
>> Cc: kvm at vger.kernel.org; linux-arm-kernel at lists.infradead.org;
>> kvmarm at lists.cs.columbia.edu; rusty.russell at linaro.org; avi at redhat.com;
>> marc.zyngier at arm.com
>> Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
>> support
>>
>> On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
>> >> On 09/25/2012 11:20 AM, Will Deacon wrote:
>> >> >> +/* Multiprocessor Affinity Register */
>> >> >> +#define MPIDR_CPUID    (0x3 << 0)
>> >> >
>> >> > I'm fairly sure we already have code under arch/arm/ for dealing
>> >> > with the mpidr. Let's re-use that rather than reinventing it here.
>> >> >
>> >>
>> >> I see some defines in topology.c - do you want some of these factored
>> >> out into a header file that we can then also use from kvm? If so,
>> >> where?
>> >
>> > I guess either in topology.h or a new header (topology-bits.h).
>> >
>> >> >> +#define EXCEPTION_NONE      0
>> >> >> +#define EXCEPTION_RESET     0x80
>> >> >> +#define EXCEPTION_UNDEFINED 0x40
>> >> >> +#define EXCEPTION_SOFTWARE  0x20
>> >> >> +#define EXCEPTION_PREFETCH  0x10
>> >> >> +#define EXCEPTION_DATA      0x08
>> >> >> +#define EXCEPTION_IMPRECISE 0x04
>> >> >> +#define EXCEPTION_IRQ       0x02
>> >> >> +#define EXCEPTION_FIQ       0x01
>> >> >
>> >> > Why the noise?
>> >> >
>> >>
>> >> these are simply cruft from a previous life of KVM/ARM.
>> >
>> > Ok, then please get rid of them.
>> >
>> >> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
>> >> >> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
>> >> >> +       BUG_ON(mode == 0xf);
>> >> >> +       return mode;
>> >> >> +}
>> >> >
>> >> > I noticed that you have a fair few BUG_ONs throughout the series.
>> >> > Fair enough, but for hyp code is that really the right thing to do?
>> >> > Killing the guest could make more sense, perhaps?
>> >>
>> >> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
>> >> catch explicitly on the host. We have had a pass over the code to
>> >> change all the BUG_ONs that can be provoked by the guest and inject
>> >> the proper exceptions into the guest in this case. If you find places
>> >> where this is not the case, it should be changed, and do let me know.
>> >
>> > Ok, so are you saying that a BUG_ON due to some detected inconsistency
>> > with one guest may not necessarily terminate the other guests? BUG_ONs
>> > in the host seem like a bad idea if the host is able to continue with
>> > a subset of guests.
>> >
>>
>> No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
>> should be nowhere where a guest can cause a BUG_ON to occur in the host,
>> because that would be a bug.
>>
>> We basically never kill a guest unless really extreme things happen (like
>> we cannot allocate a pte, in which case we return -ENOMEM). If a guest
>> does something really weird, that guest will receive the appropriate
>> exception (undefined, prefetch abort, ...)
>>
>
> I agree with Will. It seems to be overreacting to kill the entire system.
>
> From the code above, BUG_ON case clearly points out that there happened a
> serious bug case. However, killing the corresponding VM may not cause any
> further problem.
> Then leave some logs for debugging and killing the VM seems to be enough.
>
> Let's assume KVM for ARM is distributed with a critical bug.
> If the case is defended by BUG_ON, it will cause host to shutdown.
> If the case is defended by killing VM, it will cause VM to shutdown.
> In my opinion, latter case seems to be better.
>
> I looked for a guide on BUG_ON and found this:
>      http://yarchive.net/comp/linux/BUG.html
>
>

I completely agree with all this, no further argument is needed. The
point of a BUG_ON is to state explicitly the reason for a bug that
would anyhow cause the host kernel to malfunction overall. The above
BUG_ON statement is long gone (see the new patches), and if you see
other cases like this, once the code has been tested we can remove
the BUG_ON.

>> >> >
>> >> >> +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu) {
>> >> >> +       return vcpu_reg(vcpu, 15); }
>> >> >
>> >> > If you stick a struct pt_regs into struct kvm_regs, you could reuse
>> >> > ARM_pc here etc.
>> >> >
>> >>
>> >> I prefer not to, because we'd have those registers presumably for usr
>> >> mode and then we only define the others explicit. I think it's much
>> >> clearer to look at kvm_regs today.
>> >
>> > I disagree and think that you should reuse as much of the arch/arm/
>> > code as possible. Not only does it make it easier to read by people
>> > who are familiar with that code (and in turn get you more reviewers)
>> > but it also means that we limit the amount of duplication that we have.
>>
>> Reusing a struct just for the sake of reusing is not necessarily an
>> improvement. Some times it complicates things, and some times it's
>> misleading. To me, pt_regs carry the meaning that these are the registers
>> for a user space process that traps into the kernel - in KVM we emulate a
>> virtual CPU and that current definition is quite clear.
>>
>> The argument that more people will review the code if the struct contains
>> a pt_regs field rather than a usr_regs field is completely invalid,
>> because I'm sure everyone that reviews virtualization code will know that
>> user mode is a mode on the cpu and it has some registers and this is the
>> state we store when we context switch a VM - pt_regs could be read as the
>> regs that we stored in the mode that the VM happened to be in when we took
>> an exception, which I would think is crazy, and probably not what you
>> suggest.
>>
>> Writing the literal 15 for the PC register is not really a problem in
>> terms of duplication - it's nothing that requires separate maintenance.
>>
>> At this point the priority should really be correctness, readability, and
>> performance, imho.
>>
>> >
>> > I think Marc (CC'd) had a go at this with some success.
>> >
>>
>> great, if this improves the code, then I suggest someone rebases an
>> appropriate patch and sends it to the kvmarm mailing list so we can have a
>> look at it, but there are users out there looking to try kvm/arm and we
>> should try to give it to them.
>>
>> >> >> +#ifndef __ARM_KVM_HOST_H__
>> >> >> +#define __ARM_KVM_HOST_H__
>> >> >> +
>> >> >> +#include <asm/kvm.h>
>> >> >> +
>> >> >> +#define KVM_MAX_VCPUS 4
>> >> >
>> >> > NR_CPUS?
>> >> >
>> >>
>> >> well this is defined by KVM generic code, and is common for other
>> >> architecture.
>> >
>> > I mean #define KVM_MAX_CPUS NR_CPUS. The 4 seems arbitrary.
>> >
>> >> >> +int __attribute_const__ kvm_target_cpu(void) {
>> >> >> +       unsigned int midr;
>> >> >> +
>> >> >> +       midr = read_cpuid_id();
>> >> >> +       switch ((midr >> 4) & 0xfff) {
>> >> >> +       case KVM_ARM_TARGET_CORTEX_A15:
>> >> >> +               return KVM_ARM_TARGET_CORTEX_A15;
>> >> >
>> >> > I have this code already in perf_event.c. Can we move it somewhere
>> >> > common and share it? You should also check that the implementor field
>> >> > is 0x41.
>> >> >
>> >>
>> >> by all means, you can probably suggest a good place better than I
>> >> can...
>> >
>> > cputype.h?
>> >
>> >> >> +#include <linux/module.h>
>> >> >> +
>> >> >> +EXPORT_SYMBOL_GPL(smp_send_reschedule);
>> >> >
>> >> > Erm...
>> >> >
>> >> > We already have arch/arm/kernel/armksyms.c for exports -- please use
>> >> > that.
>> >> > However, exporting such low-level operations sounds like a bad
>> >> > idea. How realistic is kvm-as-a-module on ARM anyway?
>> >> >
>> >>
>> >> at this point it's broken, so I'll just remove this and leave this
>> >> for a fun project for some poor soul at some point if anyone ever
>> >> needs half the code outside the kernel as a module (the other half
>> >> needs to be compiled in anyway)
>> >
>> > Ok, that suits me. If it's broken, let's not include it in the initial
>> > submission.
>> >
>> >> >> +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct
>> >> >> +kvm_regs *regs) {
>> >> >> +       return -EINVAL;
>> >> >> +}
>> >> >> +
>> >> >> +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct
>> >> >> +kvm_regs *regs) {
>> >> >> +       return -EINVAL;
>> >> >> +}
>> >> >
>> >> > Again, all looks like this should be implemented using regsets from
>> >> > what I can tell.
>> >> >
>> >>
>> >> this API has been discussed to death on the KVM lists, and we can of
>> >> course revive that if the regset makes it nicer - I'd prefer getting
>> >> this upstream the way that it is now though, where GET_REG / SET_REG
>> >> seems to be the way forward from a KVM perspective.
>> >
>> > I'm sure the API has been discussed, but I've not seen that
>> > conversation and I'm also not aware about whether or not regsets were
>> > considered as a possibility for this stuff. The advantages of using them
>> are:
>> >
>> >         1. It's less code for the arch to implement (and most of what
>> >         you need, you already have).
>> >
>> >         2. You can move the actual copying code into core KVM, like we
>> >         have for ptrace.
>> >
>> >         3. New KVM ports (e.g. arm64) can reuse the core copying code
>> >         easily.
>> >
>> > Furthermore, some registers (typically) floating point and GPRs will
>> > already have regsets for the ptrace code, so that can be reused if you
>> > share the datatypes.
>> >
>> > The big problem with getting things upstream and then changing it
>> > later is that you will break the ABI. I highly doubt that's feasible,
>> > so can we not just use regsets from the start for ARM?
>> >
>> >> >> +int kvm_reset_vcpu(struct kvm_vcpu *vcpu) {
>> >> >> +       struct kvm_regs *cpu_reset;
>> >> >> +
>> >> >> +       switch (vcpu->arch.target) {
>> >> >> +       case KVM_ARM_TARGET_CORTEX_A15:
>> >> >> +               if (vcpu->vcpu_id > a15_max_cpu_idx)
>> >> >> +                       return -EINVAL;
>> >> >> +               cpu_reset = &a15_regs_reset;
>> >> >> +               vcpu->arch.midr = read_cpuid_id();
>> >> >> +               break;
>> >> >> +       default:
>> >> >> +               return -ENODEV;
>> >> >> +       }
>> >> >> +
>> >> >> +       /* Reset core registers */
>> >> >> +       memcpy(&vcpu->arch.regs, cpu_reset,
>> >> >> + sizeof(vcpu->arch.regs));
>> >> >> +
>> >> >> +       /* Reset CP15 registers */
>> >> >> +       kvm_reset_coprocs(vcpu);
>> >> >> +
>> >> >> +       return 0;
>> >> >> +}
>> >> >
>> >> > This is a nice way to plug in new CPUs but the way the rest of the
>> >> > code is currently written, all the ARMv7 and Cortex-A15 code is
>> >> > merged together. I
>> >> > *strongly* suggest you isolate this from the start, as it will help
>> >> > you see what is architected and what is implementation-specific.
>> >> >
>> >>
>> >> not entirely sure what you mean. You want a separate coproc.c file
>> >> for
>> >> Cortex-A15 specific stuff like coproc_a15.c?
>> >
>> > Indeed. I think it will make adding new CPUs a lot clearer and
>> > separate the architecture from the implementation.
>> >
>> > Cheers,
>> >
>> > Will
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in the body
>> of a message to majordomo at vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [kvmarm] [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-09-25 15:20     ` Will Deacon
@ 2012-10-04 13:44       ` Avi Kivity
  -1 siblings, 0 replies; 164+ messages in thread
From: Avi Kivity @ 2012-10-04 13:44 UTC (permalink / raw)
  To: Will Deacon; +Cc: Christoffer Dall, linux-arm-kernel, kvm, kvmarm

On 09/25/2012 05:20 PM, Will Deacon wrote:
>> +       case KVM_GET_REG_LIST: {
>> +               struct kvm_reg_list __user *user_list = argp;
>> +               struct kvm_reg_list reg_list;
>> +               unsigned n;
>> +
>> +               if (copy_from_user(&reg_list, user_list, sizeof reg_list))
>> +                       return -EFAULT;
>> +               n = reg_list.n;
>> +               reg_list.n = kvm_arm_num_regs(vcpu);
>> +               if (copy_to_user(user_list, &reg_list, sizeof reg_list))
>> +                       return -EFAULT;
>> +               if (n < reg_list.n)
>> +                       return -E2BIG;
>> +               return kvm_arm_copy_reg_indices(vcpu, user_list->reg);
> 
> kvm_reg_list sounds like it could be done using a regset instead.

Wouldn't those regsets be userspace-oriented?

For example, the GPRs returned here include all the shadowed interrupt
registers (or however they're called), while most user-oriented APIs
would only include the user-visible registers.

FWIW, we're trying to move to an architecture independent ABI for KVM
registers, but that's a lot of work since we need to make sure all the
weird x86 registers (and non-register state) fit into that.  Maybe that
ABI will be regset based, but I don't want to block the ARM port on this.

-- 
error compiling committee.c: too many arguments to function
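
For reference, a minimal userspace sketch of the two-call KVM_GET_REG_LIST
pattern in the quoted hunk, plus reading one register via KVM_GET_ONE_REG.
Error handling is trimmed, vcpu_fd is assumed to be an already-open vcpu
file descriptor, and the register read is assumed to be at most 64 bits
wide:

#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static struct kvm_reg_list *get_reg_list(int vcpu_fd)
{
	struct kvm_reg_list probe, *list;

	/*
	 * First call with n == 0: the kernel writes the real count back
	 * into n and fails with E2BIG because the buffer is too small.
	 */
	memset(&probe, 0, sizeof(probe));
	ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe);

	list = calloc(1, sizeof(*list) + probe.n * sizeof(__u64));
	if (!list)
		return NULL;
	list->n = probe.n;

	/* Second call fills in the register indices. */
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
}

static int get_one_reg(int vcpu_fd, __u64 id, __u64 *value)
{
	struct kvm_one_reg reg = {
		.id   = id,
		.addr = (__u64)(unsigned long)value,
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}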

^ permalink raw reply	[flat|nested] 164+ messages in thread

* [kvmarm] [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
@ 2012-10-04 13:44       ` Avi Kivity
  0 siblings, 0 replies; 164+ messages in thread
From: Avi Kivity @ 2012-10-04 13:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/25/2012 05:20 PM, Will Deacon wrote:
>> +       case KVM_GET_REG_LIST: {
>> +               struct kvm_reg_list __user *user_list = argp;
>> +               struct kvm_reg_list reg_list;
>> +               unsigned n;
>> +
>> +               if (copy_from_user(&reg_list, user_list, sizeof reg_list))
>> +                       return -EFAULT;
>> +               n = reg_list.n;
>> +               reg_list.n = kvm_arm_num_regs(vcpu);
>> +               if (copy_to_user(user_list, &reg_list, sizeof reg_list))
>> +                       return -EFAULT;
>> +               if (n < reg_list.n)
>> +                       return -E2BIG;
>> +               return kvm_arm_copy_reg_indices(vcpu, user_list->reg);
> 
> kvm_reg_list sounds like it could be done using a regset instead.

Wouldn't those regsets be userspace-oriented?

For example, the GPRs returned here include all the shadowed interrupt
registers (or however they're called), while most user-oriented APIs
would only include the user-visible registers.

FWIW, we're trying to move to an architecture independent ABI for KVM
registers, but that's a lot of work since we need to make sure all the
weird x86 registers (and non-register state) fit into that.  Maybe that
ABI will be regset based, but I don't want to block the ARM port on this.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 164+ messages in thread

* RE: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
  2012-10-04 13:02             ` Min-gyu Kim
@ 2012-10-05  6:28               ` Rusty Russell
  -1 siblings, 0 replies; 164+ messages in thread
From: Rusty Russell @ 2012-10-05  6:28 UTC (permalink / raw)
  To: Min-gyu Kim, 'Christoffer Dall', 'Will Deacon'
  Cc: kvm, linux-arm-kernel, kvmarm, rusty.russell, avi, marc.zyngier,
	김창환

Min-gyu Kim <mingyu84.kim@samsung.com> writes:
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Christoffer Dall
>> Sent: Monday, October 01, 2012 4:22 AM
>> To: Will Deacon
>> Cc: kvm@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>> kvmarm@lists.cs.columbia.edu; rusty.russell@linaro.org; avi@redhat.com;
>> marc.zyngier@arm.com
>> Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
>> support
>> 
>> On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
>> >> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
>> >> >> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
>> >> >> +       BUG_ON(mode == 0xf);
>> >> >> +       return mode;
>> >> >> +}
>> >> >
>> >> > I noticed that you have a fair few BUG_ONs throughout the series.
>> >> > Fair enough, but for hyp code is that really the right thing to do?
>> >> > Killing the guest could make more sense, perhaps?
>> >>
>> >> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
>> >> catch explicitly on the host. We have had a pass over the code to
>> >> change all the BUG_ONs that can be provoked by the guest and inject
>> >> the proper exceptions into the guest in this case. If you find places
>> >> where this is not the case, it should be changed, and do let me know.
>> >
>> > Ok, so are you saying that a BUG_ON due to some detected inconsistency
>> > with one guest may not necessarily terminate the other guests? BUG_ONs
>> > in the host seem like a bad idea if the host is able to continue with
>> > a subset of guests.
>> >
>> 
>> No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
>> should be nowhere where a guest can cause a BUG_ON to occur in the host,
>> because that would be a bug.
>> 
>> We basically never kill a guest unless really extreme things happen (like
>> we cannot allocate a pte, in which case we return -ENOMEM). If a guest
>> does something really weird, that guest will receive the appropriate
>> exception (undefined, prefetch abort, ...)
>> 
>
> I agree with Will. It seems to be overreacting to kill the entire system.

No.  If we manage to put the guest in an undefined state, we don't know
what has happened.  Something has gone horribly wrong.  Most likely,
vcpu isn't a vcpu pointer at all, or has been so horribly corrupted that
"killing the guest" is just a complicated way of messing ourselves up
further.

The system by this stage is so damaged that we're best off panicking, and
avoiding corrupting things further.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 164+ messages in thread

* [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support
@ 2012-10-05  6:28               ` Rusty Russell
  0 siblings, 0 replies; 164+ messages in thread
From: Rusty Russell @ 2012-10-05  6:28 UTC (permalink / raw)
  To: linux-arm-kernel

Min-gyu Kim <mingyu84.kim@samsung.com> writes:
>> -----Original Message-----
>> From: kvm-owner at vger.kernel.org [mailto:kvm-owner at vger.kernel.org] On
>> Behalf Of Christoffer Dall
>> Sent: Monday, October 01, 2012 4:22 AM
>> To: Will Deacon
>> Cc: kvm at vger.kernel.org; linux-arm-kernel at lists.infradead.org;
>> kvmarm at lists.cs.columbia.edu; rusty.russell at linaro.org; avi at redhat.com;
>> marc.zyngier at arm.com
>> Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
>> support
>> 
>> On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
>> >> >> +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
>> >> >> +       u8 mode = __vcpu_mode(vcpu->arch.regs.cpsr);
>> >> >> +       BUG_ON(mode == 0xf);
>> >> >> +       return mode;
>> >> >> +}
>> >> >
>> >> > I noticed that you have a fair few BUG_ONs throughout the series.
>> >> > Fair enough, but for hyp code is that really the right thing to do?
>> >> > Killing the guest could make more sense, perhaps?
>> >>
>> >> the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
>> >> catch explicitly on the host. We have had a pass over the code to
>> >> change all the BUG_ONs that can be provoked by the guest and inject
>> >> the proper exceptions into the guest in this case. If you find places
>> >> where this is not the case, it should be changed, and do let me know.
>> >
>> > Ok, so are you saying that a BUG_ON due to some detected inconsistency
>> > with one guest may not necessarily terminate the other guests? BUG_ONs
>> > in the host seem like a bad idea if the host is able to continue with
>> > a subset of guests.
>> >
>> 
>> No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
>> should be nowhere where a guest can cause a BUG_ON to occur in the host,
>> because that would be a bug.
>> 
>> We basically never kill a guest unless really extreme things happen (like
>> we cannot allocate a pte, in which case we return -ENOMEM). If a guest
>> does something really weird, that guest will receive the appropriate
>> exception (undefined, prefetch abort, ...)
>> 
>
> I agree with Will. It seems to be overreacting to kill the entire system.

No.  If we manage to put the guest in an undefined state, we don't know
what has happened.  Something has gone horribly wrong.  Most likely,
vcpu isn't a vcpu pointer at all, or has been so horribly corrupted that
"killing the guest" is just a complicated way of messing ourselves up
further.

The system by this stage is so damaged that we're best off panicking, and
avoiding corrupting things further.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
  2012-10-01 12:53         ` Dave Martin
@ 2012-10-05  9:00           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 164+ messages in thread
From: Russell King - ARM Linux @ 2012-10-05  9:00 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Tixy, Will Deacon, linux-arm-kernel, kvm, kvmarm

On Mon, Oct 01, 2012 at 01:53:26PM +0100, Dave Martin wrote:
> A good starting point would be load/store emulation as this seems to be a
> common theme, and we would need a credible deployment for any new
> framework so that we know it's fit for purpose.

Probably not actually, that code is written to be fast, because things
like IP stack throughput depend on it - particularly when your network
card can only DMA packets to 32-bit aligned addresses (resulting in
virtually all network data being misaligned.)

^ permalink raw reply	[flat|nested] 164+ messages in thread

* [PATCH 14/15] KVM: ARM: Handle I/O aborts
@ 2012-10-05  9:00           ` Russell King - ARM Linux
  0 siblings, 0 replies; 164+ messages in thread
From: Russell King - ARM Linux @ 2012-10-05  9:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 01, 2012 at 01:53:26PM +0100, Dave Martin wrote:
> A good starting point would be load/store emulation as this seems to be a
> common theme, and we would need a credible deployment for any new
> framework so that we know it's fit for purpose.

Probably not actually, that code is written to be fast, because things
like IP stack throughput depend on it - particularly when your network
card can only DMA packets to 32-bit aligned addresses (resulting in
virtually all network data being misaligned.)

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
  2012-10-05  9:00           ` Russell King - ARM Linux
@ 2012-10-08 10:04             ` Dave Martin
  -1 siblings, 0 replies; 164+ messages in thread
From: Dave Martin @ 2012-10-08 10:04 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Christoffer Dall, Tixy, Will Deacon, linux-arm-kernel, kvm, kvmarm

On Fri, Oct 05, 2012 at 10:00:25AM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 01, 2012 at 01:53:26PM +0100, Dave Martin wrote:
> > A good starting point would be load/store emulation as this seems to be a
> > common theme, and we would need a credible deployment for any new
> > framework so that we know it's fit for purpose.
> 
> Probably not actually, that code is written to be fast, because things
> like IP stack throughput depend on it - particularly when your network
> card can only DMA packets to 32-bit aligned addresses (resulting in
> virtually all network data being misaligned.)

A fair point, but surely it would still be worth a try?
 
We might decide that a few particular cases of instruction decode
should not use the generic framework for performance reasons, but in
most cases being critically dependent on fault-driven software
emulation for performance would be a serious mistake in the first place
(discussions about the network code notwithstanding).

This is not an argument for being slower just for the sake of it, but
it can make sense to factor code on paths where performance is not an
issue.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 164+ messages in thread

* [PATCH 14/15] KVM: ARM: Handle I/O aborts
@ 2012-10-08 10:04             ` Dave Martin
  0 siblings, 0 replies; 164+ messages in thread
From: Dave Martin @ 2012-10-08 10:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 05, 2012 at 10:00:25AM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 01, 2012 at 01:53:26PM +0100, Dave Martin wrote:
> > A good starting point would be load/store emulation as this seems to be a
> > common theme, and we would need a credible deployment for any new
> > framework so that we know it's fit for purpose.
> 
> Probably not actually, that code is written to be fast, because things
> like IP stack throughput depend on it - particularly when your network
> card can only DMA packets to 32-bit aligned addresses (resulting in
> virtually all network data being misaligned.)

A fair point, but surely it would still be worth a try?
 
We might decide that a few particular cases of instruction decode
should not use the generic framework for performance reasons, but in
most cases being critically dependent on fault-driven software
emulation for performance would be a serious mistake in the first place
(discussions about the network code notwithstanding).

This is not an argument for being slower just for the sake of it, but
it can make sense to factor code on paths where performance is not an
issue.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 164+ messages in thread

* Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
  2012-10-08 10:04             ` Dave Martin
@ 2012-10-08 21:52               ` Christoffer Dall
  -1 siblings, 0 replies; 164+ messages in thread
From: Christoffer Dall @ 2012-10-08 21:52 UTC (permalink / raw)
  To: Dave Martin
  Cc: Russell King - ARM Linux, Tixy, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Mon, Oct 8, 2012 at 6:04 AM, Dave Martin <dave.martin@linaro.org> wrote:
> On Fri, Oct 05, 2012 at 10:00:25AM +0100, Russell King - ARM Linux wrote:
>> On Mon, Oct 01, 2012 at 01:53:26PM +0100, Dave Martin wrote:
>> > A good starting point would be load/store emulation as this seems to be a
>> > common theme, and we would need a credible deployment for any new
>> > framework so that we know it's fit for purpose.
>>
>> Probably not actually, that code is written to be fast, because things
>> like IP stack throughput depend on it - particularly when your network
>> card can only DMA packets to 32-bit aligned addresses (resulting in
>> virtually all network data being misaligned.)
>
> A fair point, but surely it would still be worth a try?
>
> We might decide that a few particular cases of instruction decode
> should not use the generic framework for performance reaons, but in
> most cases being critically dependent on fault-driven software
> emulation for performance would be a serious mistake in the first place
> (discussions about the network code notwithstanding).
>
> This is not an argument for being slower just for the sake of it, but
> it can make sense to factor code on paths where performance is not an
> issue.
>

I'm all for unifying this stuff, but I still think it doesn't justify
holding back on merging the KVM patches. The ARM mode instruction
decoding can definitely be cleaned up, though, to look more like the
Thumb2 mode decoding, which will be a good step before refactoring to
use a more common framework. Currently we decode too many types of
instructions (not just the ones with HSR.ISV cleared) in ARM mode, so
the complexity of that code can be reduced.

I'll give that a go before re-sending the KVM patch series.

-Christoffer
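
As background for the decoding discussion above, a rough sketch of the
split between accesses fully described by the fault syndrome and those
that have to be decoded in software. The field layout below follows the
ARMv7 HSR data-abort syndrome encoding but should be checked against the
ARM ARM; the struct and helper names are invented for this illustration.

/* ARMv7 HSR data-abort syndrome fields (illustrative subset). */
#define HSR_ISV		(1U << 24)	/* instruction syndrome valid */
#define HSR_SAS_SHIFT	22		/* access size: 0=byte, 1=half, 2=word */
#define HSR_SAS_MASK	(3U << HSR_SAS_SHIFT)
#define HSR_SRT_SHIFT	16		/* register transfer number */
#define HSR_SRT_MASK	(0xfU << HSR_SRT_SHIFT)
#define HSR_WNR		(1U << 6)	/* write, not read */

struct mmio_decode {
	bool	is_write;
	int	len;		/* access size in bytes  */
	int	rt;		/* guest register number */
};

/*
 * Returns 0 when the hardware-provided syndrome fully describes the
 * access; a negative value means the faulting instruction has to be
 * fetched and decoded in software (the case discussed above).
 */
static int decode_from_syndrome(u32 hsr, struct mmio_decode *d)
{
	if (!(hsr & HSR_ISV))
		return -1;

	d->is_write = !!(hsr & HSR_WNR);
	d->len      = 1 << ((hsr & HSR_SAS_MASK) >> HSR_SAS_SHIFT);
	d->rt       = (hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
	return 0;
}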

^ permalink raw reply	[flat|nested] 164+ messages in thread

end of thread, other threads:[~2012-10-08 21:52 UTC | newest]

Thread overview: 164+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-15 15:34 [PATCH 00/15] KVM/ARM Implementation Christoffer Dall
2012-09-15 15:34 ` Christoffer Dall
2012-09-15 15:34 ` [PATCH 01/15] ARM: add mem_type prot_pte accessor Christoffer Dall
2012-09-15 15:34   ` Christoffer Dall
2012-09-18 12:23   ` Will Deacon
2012-09-18 12:23     ` Will Deacon
2012-09-18 19:18     ` Christoffer Dall
2012-09-18 19:18       ` Christoffer Dall
2012-09-18 21:04   ` Russell King - ARM Linux
2012-09-18 21:04     ` Russell King - ARM Linux
2012-09-18 21:53     ` Christoffer Dall
2012-09-18 21:53       ` Christoffer Dall
2012-09-20 10:01       ` Marc Zyngier
2012-09-20 10:01         ` Marc Zyngier
2012-09-20 13:21         ` Christoffer Dall
2012-09-20 13:21           ` Christoffer Dall
2012-09-15 15:34 ` [PATCH 02/15] ARM: Add page table and page defines needed by KVM Christoffer Dall
2012-09-15 15:34   ` Christoffer Dall
2012-09-18 12:47   ` Will Deacon
2012-09-18 12:47     ` Will Deacon
2012-09-18 14:06     ` Catalin Marinas
2012-09-18 14:06       ` Catalin Marinas
2012-09-18 15:05       ` Christoffer Dall
2012-09-18 15:05         ` Christoffer Dall
2012-09-18 15:07         ` Catalin Marinas
2012-09-18 15:07           ` Catalin Marinas
2012-09-18 15:10           ` Christoffer Dall
2012-09-18 15:10             ` Christoffer Dall
2012-09-18 22:01     ` Christoffer Dall
2012-09-18 22:01       ` Christoffer Dall
2012-09-19  9:21       ` Will Deacon
2012-09-19  9:21         ` Will Deacon
2012-09-20  0:10         ` Christoffer Dall
2012-09-20  0:10           ` Christoffer Dall
2012-09-15 15:34 ` [PATCH 03/15] ARM: Section based HYP idmap Christoffer Dall
2012-09-15 15:34   ` Christoffer Dall
2012-09-18 13:00   ` Will Deacon
2012-09-18 13:00     ` Will Deacon
2012-10-01  2:19     ` Christoffer Dall
2012-10-01  2:19       ` Christoffer Dall
2012-09-15 15:34 ` [PATCH 04/15] ARM: idmap: only initialize HYP idmap when HYP mode is available Christoffer Dall
2012-09-15 15:34   ` Christoffer Dall
2012-09-18 13:03   ` Will Deacon
2012-09-18 13:03     ` Will Deacon
2012-09-20  0:11     ` Christoffer Dall
2012-09-20  0:11       ` Christoffer Dall
2012-09-15 15:35 ` [PATCH 05/15] ARM: Expose PMNC bitfields for KVM use Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-18 13:08   ` Will Deacon
2012-09-18 13:08     ` Will Deacon
2012-09-18 22:13     ` Christoffer Dall
2012-09-18 22:13       ` Christoffer Dall
2012-09-19  4:09     ` [kvmarm] " Rusty Russell
2012-09-19  4:09       ` Rusty Russell
2012-09-19  9:30       ` Will Deacon
2012-09-19  9:30         ` Will Deacon
2012-09-15 15:35 ` [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-25 15:20   ` Will Deacon
2012-09-25 15:20     ` Will Deacon
2012-09-26  1:43     ` Christoffer Dall
2012-09-26  1:43       ` Christoffer Dall
2012-09-27 14:13       ` Will Deacon
2012-09-27 14:13         ` Will Deacon
2012-09-27 14:39         ` Marc Zyngier
2012-09-27 14:39           ` Marc Zyngier
2012-09-27 14:45         ` [kvmarm] " Peter Maydell
2012-09-27 14:45           ` Peter Maydell
2012-09-27 15:20           ` Will Deacon
2012-09-27 15:20             ` Will Deacon
2012-09-30 19:21         ` Christoffer Dall
2012-09-30 19:21           ` Christoffer Dall
2012-10-01 13:03           ` [kvmarm] " Marc Zyngier
2012-10-01 13:03             ` Marc Zyngier
2012-10-04 13:02           ` Min-gyu Kim
2012-10-04 13:02             ` Min-gyu Kim
2012-10-04 13:35             ` Christoffer Dall
2012-10-04 13:35               ` Christoffer Dall
2012-10-05  6:28             ` Rusty Russell
2012-10-05  6:28               ` Rusty Russell
2012-10-04 13:44     ` [kvmarm] " Avi Kivity
2012-10-04 13:44       ` Avi Kivity
2012-09-15 15:35 ` [PATCH 07/15] KVM: ARM: Hypervisor inititalization Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-15 15:35 ` [PATCH 08/15] KVM: ARM: Memory virtualization setup Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-15 15:35 ` [PATCH 09/15] KVM: ARM: Inject IRQs and FIQs from userspace Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-25 15:55   ` Will Deacon
2012-09-25 15:55     ` Will Deacon
2012-09-29 15:50     ` Christoffer Dall
2012-09-29 15:50       ` Christoffer Dall
2012-09-30 12:48       ` Will Deacon
2012-09-30 12:48         ` Will Deacon
2012-09-30 14:34         ` Christoffer Dall
2012-09-30 14:34           ` Christoffer Dall
2012-09-15 15:35 ` [PATCH 10/15] KVM: ARM: World-switch implementation Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-25 17:00   ` Will Deacon
2012-09-25 17:00     ` Will Deacon
2012-09-25 17:15     ` [kvmarm] " Peter Maydell
2012-09-25 17:15       ` Peter Maydell
2012-09-25 17:42       ` Marc Zyngier
2012-09-25 17:42         ` Marc Zyngier
2012-09-30  0:33         ` Christoffer Dall
2012-09-30  0:33           ` Christoffer Dall
2012-09-30  9:48           ` Peter Maydell
2012-09-30  9:48             ` Peter Maydell
2012-09-30 14:31             ` Christoffer Dall
2012-09-30 14:31               ` Christoffer Dall
2012-09-30 17:47     ` Christoffer Dall
2012-09-30 17:47       ` Christoffer Dall
2012-09-15 15:35 ` [PATCH 11/15] KVM: ARM: Emulation framework and CP15 emulation Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-15 15:35 ` [PATCH 12/15] KVM: ARM: User space API for getting/setting co-proc registers Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-15 15:35 ` [PATCH 13/15] KVM: ARM: Handle guest faults in KVM Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-25 11:11   ` Min-gyu Kim
2012-09-25 11:11     ` Min-gyu Kim
2012-09-25 12:38     ` Christoffer Dall
2012-09-25 12:38       ` Christoffer Dall
2012-09-27  3:11       ` Min-gyu Kim
2012-09-27  3:11         ` Min-gyu Kim
2012-09-27  5:35         ` Christoffer Dall
2012-09-27  5:35           ` Christoffer Dall
2012-09-27 15:26         ` [kvmarm] " Marc Zyngier
2012-09-27 15:26           ` Marc Zyngier
2012-09-27 12:39       ` Catalin Marinas
2012-09-27 12:39         ` Catalin Marinas
2012-09-27 17:15         ` Christoffer Dall
2012-09-27 17:15           ` Christoffer Dall
2012-09-27 17:21           ` Catalin Marinas
2012-09-27 17:21             ` Catalin Marinas
2012-09-15 15:35 ` [PATCH 14/15] KVM: ARM: Handle I/O aborts Christoffer Dall
2012-09-15 15:35   ` Christoffer Dall
2012-09-27 15:11   ` Will Deacon
2012-09-27 15:11     ` Will Deacon
2012-09-30 21:49     ` Christoffer Dall
2012-09-30 21:49       ` Christoffer Dall
2012-10-01 12:53       ` Dave Martin
2012-10-01 12:53         ` Dave Martin
2012-10-01 15:12         ` Jon Medhurst (Tixy)
2012-10-01 15:12           ` Jon Medhurst (Tixy)
2012-10-01 16:07           ` Dave Martin
2012-10-01 16:07             ` Dave Martin
2012-10-05  9:00         ` Russell King - ARM Linux
2012-10-05  9:00           ` Russell King - ARM Linux
2012-10-08 10:04           ` Dave Martin
2012-10-08 10:04             ` Dave Martin
2012-10-08 21:52             ` Christoffer Dall
2012-10-08 21:52               ` Christoffer Dall
2012-09-15 15:36 ` [PATCH 15/15] KVM: ARM: Guest wait-for-interrupts (WFI) support Christoffer Dall
2012-09-15 15:36   ` Christoffer Dall
2012-09-25 17:04   ` Will Deacon
2012-09-25 17:04     ` Will Deacon
2012-09-29 23:00     ` Christoffer Dall
2012-09-29 23:00       ` Christoffer Dall
2012-09-18 12:21 ` [PATCH 00/15] KVM/ARM Implementation Will Deacon
2012-09-18 12:21   ` Will Deacon
2012-09-18 12:32   ` Christoffer Dall
2012-09-18 12:32     ` Christoffer Dall
2012-09-19 12:44 ` Avi Kivity
2012-09-19 12:44   ` Avi Kivity
