* [PATCH v5 00/14] KVM/ARM Implementation
From: Christoffer Dall @ 2013-01-08 18:38 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm

The following series implements KVM support for ARM processors,
specifically on the Cortex-A15 platform.

Work is done in collaboration between Columbia University, Virtual Open
Systems and ARM/Linaro.

The patch series applies to Linux 3.8-rc2 with kvm/next merged:
 git://git.kernel.org/pub/scm/virt/kvm/kvm.git
        branch: next (commit e11ae1a102b)

The series relies on two additional patches in Will Deacon's perf tree:
    ARM: Define CPU part numbers and implementors
    ARM: Use implementor and part defines from cputype.h

This is Version 15 of the patch series; the first 10 versions were
reviewed on the KVM/ARM and KVM mailing lists. Changes can also be
pulled from:
    git://github.com/virtualopensystems/linux-kvm-arm.git
        branch: kvm-arm-v15
        branch: kvm-arm-v15-vgic
        branch: kvm-arm-v15-vgic-timers

A non-flattened edition of the patch series, which can always be merged,
can be found at:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-master

This patch series requires a compatible QEMU.  Use the branch
 git://github.com/virtualopensystems/qemu.git kvm-arm

There are also WIP QEMU patches to support virtio on ARM:
 git://github.com/virtualopensystems/qemu.git kvm-arm-virtio

There is also a rebasing WIP branch with support for huge pages:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-hugetlb

Finally there is kvmtool support available for the mach-virt machine:
 git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git

Following this patch series, which implements core KVM support, are two
other patch series implementing Virtual Generic Interrupt Controller
(VGIC) support and Architected Generic Timers.  All three patch series
should be applied for full QEMU compatibility.

The implementation is broken up into a logical set of patches; the first
two are preparatory patches:
  1. ARM: Add page table defines for KVM
  2. ARM: Section based HYP idmaps

The main implementation is broken up into separate patches, the first
containing a skeleton of files, makefile changes, the basic user space
interface and KVM architecture specific stubs.  Subsequent patches
implement parts of the system as listed:
  3. Skeleton and reset hooks
  4. Hypervisor initialization
  5. Memory virtualization setup (hyp mode mappings and 2nd stage)
  6. Inject IRQs and FIQs from userspace
  7. World-switch implementation and Hyp exception vectors
  8. Emulation framework and coproc emulation
  9. Coproc user space API
 10. Demux multiplexed coproc registers
 11. User space API to get/set VFP registers
 12. Handle guest user memory aborts
 13. Handle guest MMIO aborts
 14. Add an entry in the MAINTAINERS file

Testing:
 Tested on the Versatile Express TC2 devboard and on the Arndale board,
 running simultaneous VMs, all running SMP, on an SMP host, each
 VM running hackbench and cyclictest, and with extreme memory pressure
 applied to the host with swapping enabled to provoke page eviction.
 Also tested KSM merging and swapping on the host.  Fully boots both Ubuntu
 (user space Thumb-2) and Debian (user space ARM) guests, each of which
 can run a number of workloads such as apache, mysql, kernel compile, network
 tests, and more.

For a guide on how to set up a testing environment and try out these
patches, see:
 http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf

Changes since v14:
 - Fixed permission fault handling by correctly retrieving the IPA on
   Stage-2 permission faults
 - Fix compile error when !CONFIG_KVM_ARM_HOST
 - Support building into separate object directory
 - Fixed the Voodoo Bug (see
   https://github.com/virtualopensystems/linux-kvm-arm/wiki/Voodoo-Bug)
 - Improved some tracepoint debugs
 - Improved and cleaned up VTCR and VTTBR initialization
 - Clarified and unified Stage-2 page table clearing
 - Addressed a large number of concerns from Will Deacon's review,
   including fixing a race condition and removing unused exports.
 - Be a little more verbose when something goes wrong during the init
   process.

Changes since v13:
 - Fix VTTBR mask bug
 - Change KVM_MAX_VCPUS to config option (default 4)
 - Go back to struct pt_regs in kvm_regs struct
 - Factor out mmio instruction decoding to a separate file with non
   kvm-specific data structures as the interface.
 - Update kvm_device_address struct to use 64-bit fields
 - Various cleanups and compile fixes

Changes since v12:
 - Documentation updates
 - Change Hyp-ABI to function call based paradigm
 - Cleanup world-switch code
 - Unify HIFAR/HDFAR on the vcpu struct
 - Simplify vcpu register access in software
 - Enforce use of vcpu field accessors
 - Factor out mmio handling into separate file
 - Check for overlaps in mmio address mappings
 - Bugfix in mmio decoding
 - Complete rework of ARM mmio load/store instruction decoding

Changes since v11:
 - Memory setup and page table defines reworked
 - We do not export unused perf bitfields anymore
 - No module support anymore and following cleanup
 - Hide vcpu register accessors
 - Fix unmap range mmu notifier race condition
 - Factored out A15 coprocs in separate file
 - Factored out world-switch assembly macros to separate file
 - Add demux of multiplexed coprocs to user space
 - Add VFP get/set interface to user space
 - Addressed various cleanup comments from reviewers

Changes since v10:
 - Boot in Hyp mode and use HVC to initialize HVBAR
 - Support VGIC
 - Support Arch timers
 - Support Thumb-2 mmio instruction decoding
 - Transition to GET_ONE/SET_ONE register API
 - Added KVM_VCPU_GET_REG_LIST
 - New interrupt injection API
 - Don't pin guest pages anymore
 - Fix race condition in page fault handler
 - Cleanup guest instruction copying.
 - Fix race when copying SMP guest instructions
 - Inject data/prefetch aborts when guest does something strange

Changes since v9:
 - Addressed reviewer comments (see mailing list archive)
 - Limit the use of .arch_extension sec/virt for compilers that need them
 - VFP/Neon Support (Antonios Motakis)
 - Run exit handling under preemption and still handle guest cache ops
 - Add support for IO mapping at Hyp level (VGIC prep)
 - Add support for IO mapping at Guest level (VGIC prep)
 - Remove backdoor call to irq_svc
 - Complete rework of CP15 handling and register reset (Rusty Russell)
 - Don't use HSTR for anything other than CR 15
 - New ioctl to set emulation target core (only A15 supported for now)
 - Support KVM_GET_MSRS / KVM_SET_MSRS
 - Add page accounting and page table eviction
 - Change pgd lock to spinlock and fix sleeping in atomic bugs
 - Check kvm_condition_valid for HVC traps of undefs
 - Added a naive implementation of kvm_unmap_hva_range

Changes since v8:
 - Support cache maintenance on SMP through set/way
 - Hyp mode idmaps are now section based and happen at kernel init
 - Handle aborts in Hyp mode
 - Inject undefined exceptions into the guest on error
 - Kernel-side reset of all crucial registers
 - Specifically state which target CPU is being virtualized
 - Exit statistics in debugfs
 - Some L2CTLR cp15 emulation cleanups
 - Support spte_hva for MMU notifiers and take write faults
 - FIX: Race condition in VMID generation
 - BUG: Run exit handling code with disabled preemption
 - Save/Restore abort fault register during world switch

Changes since v7:
 - Trap accesses to ACTLR
 - Do not trap WFE execution
 - Upgrade barriers and TLB operations to inner-shareable domain
 - Restructure hyp_pgd related code to be more opaque
 - Random SMP fixes
 - Random BUG fixes
 - Improve commenting
 - Support module loading/unloading of KVM/ARM
 - Thumb-2 support for host kernel and KVM
 - Unaligned cross-page wide guest Thumb instruction fetching
 - Support ITSTATE fields in CPSR for Thumb guests
 - Document HCR settings

Changes since v6:
 - Support for MMU notifiers to not pin user pages in memory
 - Support build with log debugging
 - Bugfix: v6 clobbered r7 in init code
 - Simplify hyp code mapping
 - Cleanup of register access code
 - Table-based CP15 emulation from Rusty Russell
 - Various other bug fixes and cleanups

Changes since v5:
 - General bugfixes and nit fixes from reviews
 - Implemented re-use of VMIDs
 - Cleaned up the Hyp-mapping code to be readable by non-mm hackers
   (including myself)
 - Integrated preliminary SMP support in base patches
 - Lock-less interrupt injection and WFI support
 - Fixed signal handling while in guest (increases overall stability)

Changes since v4:
 - Addressed reviewer comments from v4
    * cleanup debug and trace code
    * remove printks
    * fixup kvm_arch_vcpu_ioctl_run
    * add trace details to mmio emulation
 - Fix from Marc Zyngier: Move kvm_guest_enter/exit into non-preemptible
   section (squashed into world-switch patch)
 - Cleanup create_hyp_mappings/remove_hyp_mappings from Marc Zyngier
   (squashed into hypervisor initialization patch)
 - Removed the remove_hyp_mappings feature. Removing hypervisor mappings
   could potentially unmap other important data shared in the same page.
 - Removed the arm_ prefix from the arch-specific files.
 - Initial SMP host/guest support

Changes since v3:
 - v4 actually works, fully boots a guest
 - Support compiling as a module
 - Use static inlines instead of macros for vcpu_reg and friends
 - Optimize kvm_vcpu_reg function
 - Use Ftrace for trace capabilities
 - Updated documentation and commenting
 - Use KVM_IRQ_LINE instead of KVM_INTERRUPT
 - Emulates load/store instructions not supported through HSR
  syndrome information.
 - Frees 2nd stage translation tables on VM teardown
 - Handles IRQ/FIQ instructions
 - Handles more CP15 accesses
 - Support guest WFI calls
 - Uses debugfs instead of /proc
 - Support compiling in Thumb mode

Changes since v2:
 - Implements the world-switch code
 - Maps guest memory using 2nd stage translation
 - Emulates co-processor 15 instructions
 - Forwards I/O faults to QEMU.

---

Christoffer Dall (13):
      ARM: Add page table and page defines needed by KVM
      ARM: Section based HYP idmap
      KVM: ARM: Initial skeleton to compile KVM support
      KVM: ARM: Hypervisor initialization
      KVM: ARM: Memory virtualization setup
      KVM: ARM: Inject IRQs and FIQs from userspace
      KVM: ARM: World-switch implementation
      KVM: ARM: Emulation framework and CP15 emulation
      KVM: ARM: User space API for getting/setting co-proc registers
      KVM: ARM: Demux CCSIDR in the userspace API
      KVM: ARM: Handle guest faults in KVM
      KVM: ARM: Handle I/O aborts
      KVM: ARM: Add maintainer entry for KVM/ARM

Rusty Russell (1):
      KVM: ARM: VFP userspace interface


 Documentation/virtual/kvm/api.txt           |   95 ++
 MAINTAINERS                                 |    8 
 arch/arm/Kconfig                            |    2 
 arch/arm/Makefile                           |    1 
 arch/arm/include/asm/idmap.h                |    1 
 arch/arm/include/asm/kvm_arm.h              |  212 +++++
 arch/arm/include/asm/kvm_asm.h              |   84 ++
 arch/arm/include/asm/kvm_coproc.h           |   47 +
 arch/arm/include/asm/kvm_decode.h           |   47 +
 arch/arm/include/asm/kvm_emulate.h          |   64 ++
 arch/arm/include/asm/kvm_host.h             |  158 ++++
 arch/arm/include/asm/kvm_mmio.h             |   51 +
 arch/arm/include/asm/kvm_mmu.h              |   50 +
 arch/arm/include/asm/pgtable-3level-hwdef.h |    5 
 arch/arm/include/asm/pgtable-3level.h       |   18 
 arch/arm/include/asm/pgtable.h              |    7 
 arch/arm/include/uapi/asm/kvm.h             |  148 ++++
 arch/arm/kernel/asm-offsets.c               |   25 +
 arch/arm/kernel/vmlinux.lds.S               |    6 
 arch/arm/kvm/Kconfig                        |   56 +
 arch/arm/kvm/Makefile                       |   21 +
 arch/arm/kvm/arm.c                          |  998 ++++++++++++++++++++++++++
 arch/arm/kvm/coproc.c                       | 1046 +++++++++++++++++++++++++++
 arch/arm/kvm/coproc.h                       |  153 ++++
 arch/arm/kvm/coproc_a15.c                   |  162 ++++
 arch/arm/kvm/decode.c                       |  462 ++++++++++++
 arch/arm/kvm/emulate.c                      |  542 ++++++++++++++
 arch/arm/kvm/guest.c                        |  222 ++++++
 arch/arm/kvm/init.S                         |  114 +++
 arch/arm/kvm/interrupts.S                   |  494 +++++++++++++
 arch/arm/kvm/interrupts_head.S              |  443 +++++++++++
 arch/arm/kvm/mmio.c                         |  154 ++++
 arch/arm/kvm/mmu.c                          |  777 ++++++++++++++++++++
 arch/arm/kvm/reset.c                        |   74 ++
 arch/arm/kvm/trace.h                        |  215 ++++++
 arch/arm/mm/idmap.c                         |   54 +
 arch/arm/mm/mmu.c                           |   22 +
 include/uapi/linux/kvm.h                    |    8 
 38 files changed, 7026 insertions(+), 20 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_decode.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/asm/kvm_mmio.h
 create mode 100644 arch/arm/include/asm/kvm_mmu.h
 create mode 100644 arch/arm/include/uapi/asm/kvm.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/coproc.h
 create mode 100644 arch/arm/kvm/coproc_a15.c
 create mode 100644 arch/arm/kvm/decode.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/interrupts_head.S
 create mode 100644 arch/arm/kvm/mmio.c
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

-- 

* [PATCH v5 01/14] ARM: Add page table and page defines needed by KVM
From: Christoffer Dall @ 2013-01-08 18:38 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Marcelo Tosatti

KVM uses the stage-2 page tables and the Hyp page table format,
so we define the fields and page protection flags needed by KVM.

The nomenclature is this:
 - page_hyp:        PL2 code/data mappings
 - page_hyp_device: PL2 device mappings (vgic access)
 - page_s2:         Stage-2 code/data page mappings
 - page_s2_device:  Stage-2 device mappings (vgic access)
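
As a rough standalone illustration of how the stage-2 attribute bits
compose (a sketch only, not kernel code; pteval_t and the constants
simply mirror the L_PTE_S2_* definitions added in the diff below):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t pteval_t;

/* Mirrors the diff: MemAttr[3:0] in PTE bits [5:2], HAP[2:1] in bits [7:6] */
#define L_PTE_S2_MT_UNCACHED	((pteval_t)0x5 << 2)	/* MemAttr[3:0] */
#define L_PTE_S2_MT_WRITEBACK	((pteval_t)0xf << 2)	/* MemAttr[3:0] */
#define L_PTE_S2_RDONLY		((pteval_t)1 << 6)	/* HAP[1]   */
#define L_PTE_S2_RDWR		((pteval_t)2 << 6)	/* HAP[2:1] */

int main(void)
{
	/* Normal guest RAM: writeback cacheable and initially read-only
	 * (PAGE_S2 in the diff is built from L_PTE_S2_RDONLY). */
	pteval_t ram_ro = L_PTE_S2_MT_WRITEBACK | L_PTE_S2_RDONLY;

	/* A hypothetical writable variant, using the HAP[2:1] encoding
	 * from L_PTE_S2_RDWR; uncached is what a device mapping would
	 * typically use for the memory attribute. */
	pteval_t dev_rw = L_PTE_S2_MT_UNCACHED | L_PTE_S2_RDWR;

	printf("stage-2 RAM (RO) attrs:    0x%llx\n", (unsigned long long)ram_ro);
	printf("stage-2 device (RW) attrs: 0x%llx\n", (unsigned long long)dev_rw);
	return 0;
}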

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/pgtable-3level.h |   18 ++++++++++++++++++
 arch/arm/include/asm/pgtable.h        |    7 +++++++
 arch/arm/mm/mmu.c                     |   22 ++++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index a3f3792..6ef8afd 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -104,11 +104,29 @@
  */
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
 
+/*
+ * 2nd stage PTE definitions for LPAE.
+ */
+#define L_PTE_S2_MT_UNCACHED	 (_AT(pteval_t, 0x5) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_MT_WRITEBACK	 (_AT(pteval_t, 0xf) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_RDONLY		 (_AT(pteval_t, 1) << 6)   /* HAP[1]   */
+#define L_PTE_S2_RDWR		 (_AT(pteval_t, 2) << 6)   /* HAP[2:1] */
+
+/*
+ * Hyp-mode PL2 PTE definitions for LPAE.
+ */
+#define L_PTE_HYP		L_PTE_USER
+
 #ifndef __ASSEMBLY__
 
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & 2))
 #define pud_present(pud)	(pud_val(pud))
+#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_TABLE)
+#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+						 PMD_TYPE_SECT)
 
 #define pud_clear(pudp)			\
 	do {				\
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 9c82f98..f30ac3b 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -70,6 +70,9 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 
 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_hyp_device;
+extern pgprot_t		pgprot_s2;
+extern pgprot_t		pgprot_s2_device;
 
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
@@ -82,6 +85,10 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
+#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_HYP)
+#define PAGE_HYP_DEVICE		_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
+#define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
+#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_USER | L_PTE_S2_RDONLY)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 9f06102..1f51d71 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -57,6 +57,9 @@ static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
+pgprot_t pgprot_hyp_device;
+pgprot_t pgprot_s2;
+pgprot_t pgprot_s2_device;
 
 EXPORT_SYMBOL(pgprot_user);
 EXPORT_SYMBOL(pgprot_kernel);
@@ -66,34 +69,46 @@ struct cachepolicy {
 	unsigned int	cr_mask;
 	pmdval_t	pmd;
 	pteval_t	pte;
+	pteval_t	pte_s2;
 };
 
+#ifdef CONFIG_ARM_LPAE
+#define s2_policy(policy)	policy
+#else
+#define s2_policy(policy)	0
+#endif
+
 static struct cachepolicy cache_policies[] __initdata = {
 	{
 		.policy		= "uncached",
 		.cr_mask	= CR_W|CR_C,
 		.pmd		= PMD_SECT_UNCACHED,
 		.pte		= L_PTE_MT_UNCACHED,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_UNCACHED),
 	}, {
 		.policy		= "buffered",
 		.cr_mask	= CR_C,
 		.pmd		= PMD_SECT_BUFFERED,
 		.pte		= L_PTE_MT_BUFFERABLE,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_UNCACHED),
 	}, {
 		.policy		= "writethrough",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WT,
 		.pte		= L_PTE_MT_WRITETHROUGH,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_WRITETHROUGH),
 	}, {
 		.policy		= "writeback",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WB,
 		.pte		= L_PTE_MT_WRITEBACK,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_WRITEBACK),
 	}, {
 		.policy		= "writealloc",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WBWA,
 		.pte		= L_PTE_MT_WRITEALLOC,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_WRITEBACK),
 	}
 };
 
@@ -310,6 +325,7 @@ static void __init build_mem_type_table(void)
 	struct cachepolicy *cp;
 	unsigned int cr = get_cr();
 	pteval_t user_pgprot, kern_pgprot, vecs_pgprot;
+	pteval_t hyp_device_pgprot, s2_pgprot, s2_device_pgprot;
 	int cpu_arch = cpu_architecture();
 	int i;
 
@@ -421,6 +437,8 @@ static void __init build_mem_type_table(void)
 	 */
 	cp = &cache_policies[cachepolicy];
 	vecs_pgprot = kern_pgprot = user_pgprot = cp->pte;
+	s2_pgprot = cp->pte_s2;
+	hyp_device_pgprot = s2_device_pgprot = mem_types[MT_DEVICE].prot_pte;
 
 	/*
 	 * ARMv6 and above have extended page tables.
@@ -444,6 +462,7 @@ static void __init build_mem_type_table(void)
 			user_pgprot |= L_PTE_SHARED;
 			kern_pgprot |= L_PTE_SHARED;
 			vecs_pgprot |= L_PTE_SHARED;
+			s2_pgprot |= L_PTE_SHARED;
 			mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
 			mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
@@ -498,6 +517,9 @@ static void __init build_mem_type_table(void)
 	pgprot_user   = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | user_pgprot);
 	pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG |
 				 L_PTE_DIRTY | kern_pgprot);
+	pgprot_s2  = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | s2_pgprot);
+	pgprot_s2_device  = __pgprot(s2_device_pgprot);
+	pgprot_hyp_device  = __pgprot(hyp_device_pgprot);
 
 	mem_types[MT_LOW_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;


* [PATCH v5 02/14] ARM: Section based HYP idmap
From: Christoffer Dall @ 2013-01-08 18:38 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Marc Zyngier, Marcelo Tosatti, Will Deacon

Add a method (hyp_idmap_setup) to populate a hyp pgd with an
identity mapping of the code contained in the .hyp.idmap.text
section.

Offer a method to drop this identity mapping through
hyp_idmap_teardown.

Make all the above depend on CONFIG_ARM_VIRT_EXT and CONFIG_ARM_LPAE.
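
To make the identity-mapping property concrete, here is a toy standalone
model (the table size, section size, and flag value are illustrative
only, not taken from this patch): an entry is installed for each 2 MB
section covering the code, so translating an address inside the mapped
range yields the same address, which is what lets the Hyp code keep
running once its MMU is enabled.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SECTION_SHIFT	21			/* 2 MB sections */
#define SECTION_SIZE	(1UL << SECTION_SHIFT)
#define NUM_ENTRIES	16			/* toy table, not PTRS_PER_PGD */

static uint64_t toy_pgd[NUM_ENTRIES];		/* entry = section base | flags */

static void toy_identity_map(uint64_t start, uint64_t end, uint64_t prot)
{
	for (uint64_t addr = start & ~(SECTION_SIZE - 1); addr < end;
	     addr += SECTION_SIZE)
		toy_pgd[addr >> SECTION_SHIFT] = addr | prot;
}

static uint64_t toy_translate(uint64_t addr)
{
	uint64_t entry = toy_pgd[addr >> SECTION_SHIFT];
	return (entry & ~(SECTION_SIZE - 1)) | (addr & (SECTION_SIZE - 1));
}

int main(void)
{
	uint64_t code_start = 0x00800000, code_end = 0x00801000;

	toy_identity_map(code_start, code_end, 0x1 /* pretend "valid" flag */);

	/* Identity property: the output address equals the input address. */
	assert(toy_translate(code_start) == code_start);
	printf("0x%llx -> 0x%llx\n",
	       (unsigned long long)code_start,
	       (unsigned long long)toy_translate(code_start));
	return 0;
}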

Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/idmap.h                |    1 +
 arch/arm/include/asm/pgtable-3level-hwdef.h |    1 +
 arch/arm/kernel/vmlinux.lds.S               |    6 +++
 arch/arm/mm/idmap.c                         |   54 ++++++++++++++++++++++-----
 4 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
index bf863ed..1a66f907 100644
--- a/arch/arm/include/asm/idmap.h
+++ b/arch/arm/include/asm/idmap.h
@@ -8,6 +8,7 @@
 #define __idmap __section(.idmap.text) noinline notrace
 
 extern pgd_t *idmap_pgd;
+extern pgd_t *hyp_pgd;
 
 void setup_mm_for_reboot(void);
 
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..a2d404e 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -44,6 +44,7 @@
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
+#define PMD_SECT_AP1		(_AT(pmdval_t, 1) << 6)
 #define PMD_SECT_TEX(x)		(_AT(pmdval_t, 0))
 
 /*
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 11c1785..b571484 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -19,7 +19,11 @@
 	ALIGN_FUNCTION();						\
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
 	*(.idmap.text)							\
-	VMLINUX_SYMBOL(__idmap_text_end) = .;
+	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
+	ALIGN_FUNCTION();						\
+	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
+	*(.hyp.idmap.text)						\
+	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 99db769..d9213a5 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,4 +1,6 @@
+#include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/slab.h>
 
 #include <asm/cputype.h>
 #include <asm/idmap.h>
@@ -6,6 +8,7 @@
 #include <asm/pgtable.h>
 #include <asm/sections.h>
 #include <asm/system_info.h>
+#include <asm/virt.h>
 
 pgd_t *idmap_pgd;
 
@@ -59,11 +62,20 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 	} while (pud++, addr = next, addr != end);
 }
 
-static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+static void identity_mapping_add(pgd_t *pgd, const char *text_start,
+				 const char *text_end, unsigned long prot)
 {
-	unsigned long prot, next;
+	unsigned long addr, end;
+	unsigned long next;
+
+	addr = virt_to_phys(text_start);
+	end = virt_to_phys(text_end);
+
+	pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
+		prot ? "HYP " : "",
+		(long long)addr, (long long)end);
+	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 
-	prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 	if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
 		prot |= PMD_BIT4;
 
@@ -74,28 +86,48 @@ static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long e
 	} while (pgd++, addr = next, addr != end);
 }
 
+#ifdef CONFIG_ARM_VIRT_EXT
+pgd_t *hyp_pgd;
+
+extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
+
+static int __init init_static_idmap_hyp(void)
+{
+	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	if (!hyp_pgd)
+		return -ENOMEM;
+
+	identity_mapping_add(hyp_pgd, __hyp_idmap_text_start,
+			     __hyp_idmap_text_end, PMD_SECT_AP1);
+
+	return 0;
+}
+#else
+static int __init init_static_idmap_hyp(void)
+{
+	return 0;
+}
+#endif
+
 extern char  __idmap_text_start[], __idmap_text_end[];
 
 static int __init init_static_idmap(void)
 {
-	phys_addr_t idmap_start, idmap_end;
+	int ret;
 
 	idmap_pgd = pgd_alloc(&init_mm);
 	if (!idmap_pgd)
 		return -ENOMEM;
 
-	/* Add an identity mapping for the physical address of the section. */
-	idmap_start = virt_to_phys((void *)__idmap_text_start);
-	idmap_end = virt_to_phys((void *)__idmap_text_end);
+	identity_mapping_add(idmap_pgd, __idmap_text_start,
+			     __idmap_text_end, 0);
 
-	pr_info("Setting up static identity map for 0x%llx - 0x%llx\n",
-		(long long)idmap_start, (long long)idmap_end);
-	identity_mapping_add(idmap_pgd, idmap_start, idmap_end);
+	ret = init_static_idmap_hyp();
 
 	/* Flush L1 for the hardware to see this page table content */
 	flush_cache_louis();
 
-	return 0;
+	return ret;
 }
 early_initcall(init_static_idmap);
 


* [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
From: Christoffer Dall @ 2013-01-08 18:38 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm
  Cc: Marc Zyngier, Marcelo Tosatti, Rusty Russell

Targets KVM support for Cortex-A15 processors.

Contains all the framework components, makefiles, header files, some
tracing functionality, and basic user space API.

Only supported core is Cortex-A15 for now.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt  |   57 +++++-
 arch/arm/Kconfig                   |    2 
 arch/arm/Makefile                  |    1 
 arch/arm/include/asm/kvm_arm.h     |   24 ++
 arch/arm/include/asm/kvm_asm.h     |   58 ++++++
 arch/arm/include/asm/kvm_coproc.h  |   24 ++
 arch/arm/include/asm/kvm_emulate.h |   50 +++++
 arch/arm/include/asm/kvm_host.h    |  114 ++++++++++++
 arch/arm/include/uapi/asm/kvm.h    |  106 +++++++++++
 arch/arm/kvm/Kconfig               |   55 ++++++
 arch/arm/kvm/Makefile              |   21 ++
 arch/arm/kvm/arm.c                 |  355 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/coproc.c              |   23 ++
 arch/arm/kvm/emulate.c             |  155 ++++++++++++++++
 arch/arm/kvm/guest.c               |  221 ++++++++++++++++++++++
 arch/arm/kvm/init.S                |   19 ++
 arch/arm/kvm/interrupts.S          |   19 ++
 arch/arm/kvm/mmu.c                 |   17 ++
 arch/arm/kvm/reset.c               |   74 ++++++++
 arch/arm/kvm/trace.h               |   52 +++++
 include/uapi/linux/kvm.h           |    7 +
 21 files changed, 1450 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/uapi/asm/kvm.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a4df553..4237c27 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -293,7 +293,7 @@ kvm_run' (see below).
 4.11 KVM_GET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (out)
 Returns: 0 on success, -1 on error
@@ -314,7 +314,7 @@ struct kvm_regs {
 4.12 KVM_SET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (in)
 Returns: 0 on success, -1 on error
@@ -600,7 +600,7 @@ struct kvm_fpu {
 4.24 KVM_CREATE_IRQCHIP
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, ARM
 Type: vm ioctl
 Parameters: none
 Returns: 0 on success, -1 on error
@@ -608,7 +608,8 @@ Returns: 0 on success, -1 on error
 Creates an interrupt controller model in the kernel.  On x86, creates a virtual
 ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
 local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
-only go to the IOAPIC.  On ia64, a IOSAPIC is created.
+only go to the IOAPIC.  On ia64, a IOSAPIC is created. On ARM, a GIC is
+created.
 
 
 4.25 KVM_IRQ_LINE
@@ -1775,6 +1776,14 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_VPA_DTL   | 128
   PPC   | KVM_REG_PPC_EPCR	| 32
 
+ARM registers are mapped using the lower 32 bits.  The upper 16 of that
+is the register group type, or coprocessor number:
+
+ARM core registers have the following id bit patterns:
+  0x4002 0000 0010 <index into the kvm_regs struct:16>
+
+
+
 4.69 KVM_GET_ONE_REG
 
 Capability: KVM_CAP_ONE_REG
@@ -2127,6 +2136,46 @@ written, then `n_invalid' invalid entries, invalidating any previously
 valid entries found.
 
 
+4.77 KVM_ARM_VCPU_INIT
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_vcpu_init (in)
+Returns: 0 on success; -1 on error
+Errors:
+  EINVAL:    the target is unknown, or the combination of features is invalid.
+  ENOENT:    a features bit specified is unknown.
+
+This tells KVM what type of CPU to present to the guest, and what
+optional features it should have.  This will cause a reset of the cpu
+registers to their initial values.  If this is not called, KVM_RUN will
+return ENOEXEC for that vcpu.
+
+Note that because some registers reflect machine topology, all vcpus
+should be created before this ioctl is invoked.
+
+
+4.78 KVM_GET_REG_LIST
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_reg_list (in/out)
+Returns: 0 on success; -1 on error
+Errors:
+  E2BIG:     the reg index list is too big to fit in the array specified by
+             the user (the number required will be written into n).
+
+struct kvm_reg_list {
+	__u64 n; /* number of registers in reg[] */
+	__u64 reg[0];
+};
+
+This ioctl returns the guest registers that are supported for the
+KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
+
+
 5. The kvm_run structure
 ------------------------
 
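For illustration only (this is a sketch, not part of the patch itself): a
minimal user space sequence driving the new KVM_ARM_VCPU_INIT ioctl together
with KVM_GET_ONE_REG, using the core register id encoding documented above,
could look roughly as follows.  It assumes a vcpu fd already obtained via
KVM_CREATE_VCPU and the usual KVM_REG_ARM/KVM_REG_SIZE_U32 definitions from
<linux/kvm.h>; error handling is minimal.

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int init_vcpu_and_read_pc(int vcpu_fd, __u32 *pc)
	{
		struct kvm_vcpu_init init = {
			.target = KVM_ARM_TARGET_CORTEX_A15,
		};
		struct kvm_one_reg reg = {
			/* group 16 (core regs), 32-bit access, index of ARM_pc */
			.id   = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE |
				KVM_REG_ARM_CORE_REG(usr_regs.ARM_pc),
			.addr = (__u64)(unsigned long)pc,
		};

		if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) < 0)
			return -1;

		/* on success, *pc holds the vcpu's current (reset) PC */
		return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	}
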
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f95ba14..887a0e6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2317,3 +2317,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/arm/kvm/Kconfig"
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 30c443c..4bcd2d6 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -252,6 +252,7 @@ core-$(CONFIG_FPE_NWFPE)	+= arch/arm/nwfpe/
 core-$(CONFIG_FPE_FASTFPE)	+= $(FASTFPE_OBJ)
 core-$(CONFIG_VFP)		+= arch/arm/vfp/
 core-$(CONFIG_XEN)		+= arch/arm/xen/
+core-$(CONFIG_KVM_ARM_HOST) 	+= arch/arm/kvm/
 
 # If we have a machine-specific directory, then include it in the build.
 core-y				+= arch/arm/kernel/ arch/arm/mm/ arch/arm/common/
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
new file mode 100644
index 0000000..c196a22
--- /dev/null
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ARM_H__
+#define __ARM_KVM_ARM_H__
+
+#include <asm/types.h>
+
+#endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
new file mode 100644
index 0000000..f9993e5
--- /dev/null
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -0,0 +1,58 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ASM_H__
+#define __ARM_KVM_ASM_H__
+
+/* 0 is reserved as an invalid value. */
+#define c0_MPIDR	1	/* MultiProcessor ID Register */
+#define c0_CSSELR	2	/* Cache Size Selection Register */
+#define c1_SCTLR	3	/* System Control Register */
+#define c1_ACTLR	4	/* Auxiliary Control Register */
+#define c1_CPACR	5	/* Coprocessor Access Control */
+#define c2_TTBR0	6	/* Translation Table Base Register 0 */
+#define c2_TTBR0_high	7	/* TTBR0 top 32 bits */
+#define c2_TTBR1	8	/* Translation Table Base Register 1 */
+#define c2_TTBR1_high	9	/* TTBR1 top 32 bits */
+#define c2_TTBCR	10	/* Translation Table Base Control R. */
+#define c3_DACR		11	/* Domain Access Control Register */
+#define c5_DFSR		12	/* Data Fault Status Register */
+#define c5_IFSR		13	/* Instruction Fault Status Register */
+#define c5_ADFSR	14	/* Auxiliary Data Fault Status R */
+#define c5_AIFSR	15	/* Auxiliary Instruction Fault Status R */
+#define c6_DFAR		16	/* Data Fault Address Register */
+#define c6_IFAR		17	/* Instruction Fault Address Register */
+#define c9_L2CTLR	18	/* Cortex A15 L2 Control Register */
+#define c10_PRRR	19	/* Primary Region Remap Register */
+#define c10_NMRR	20	/* Normal Memory Remap Register */
+#define c12_VBAR	21	/* Vector Base Address Register */
+#define c13_CID		22	/* Context ID Register */
+#define c13_TID_URW	23	/* Thread ID, User R/W */
+#define c13_TID_URO	24	/* Thread ID, User R/O */
+#define c13_TID_PRIV	25	/* Thread ID, Privileged */
+#define NR_CP15_REGS	26	/* Number of regs (incl. invalid) */
+
+#define ARM_EXCEPTION_RESET	  0
+#define ARM_EXCEPTION_UNDEFINED   1
+#define ARM_EXCEPTION_SOFTWARE    2
+#define ARM_EXCEPTION_PREF_ABORT  3
+#define ARM_EXCEPTION_DATA_ABORT  4
+#define ARM_EXCEPTION_IRQ	  5
+#define ARM_EXCEPTION_FIQ	  6
+
+#endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
new file mode 100644
index 0000000..b6d023d
--- /dev/null
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2012 Rusty Russell IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_COPROC_H__
+#define __ARM_KVM_COPROC_H__
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+#endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
new file mode 100644
index 0000000..17dad67
--- /dev/null
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_EMULATE_H__
+#define __ARM_KVM_EMULATE_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+
+u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
+u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
+
+static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
+{
+	return (u32 *)&vcpu->arch.regs.usr_regs.ARM_pc;
+}
+
+static inline u32 *vcpu_cpsr(struct kvm_vcpu *vcpu)
+{
+	return (u32 *)&vcpu->arch.regs.usr_regs.ARM_cpsr;
+}
+
+static inline bool mode_has_spsr(struct kvm_vcpu *vcpu)
+{
+	unsigned long cpsr_mode = vcpu->arch.regs.usr_regs.ARM_cpsr & MODE_MASK;
+	return (cpsr_mode > USR_MODE && cpsr_mode < SYSTEM_MODE);
+}
+
+static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
+{
+	unsigned long cpsr_mode = vcpu->arch.regs.usr_regs.ARM_cpsr & MODE_MASK;
+	return cpsr_mode > USR_MODE;
+}
+
+#endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
new file mode 100644
index 0000000..92e89f3
--- /dev/null
+++ b/arch/arm/include/asm/kvm_host.h
@@ -0,0 +1,114 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_HOST_H__
+#define __ARM_KVM_HOST_H__
+
+#include <asm/kvm.h>
+#include <asm/kvm_asm.h>
+
+#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
+#define KVM_USER_MEM_SLOTS 32
+#define KVM_PRIVATE_MEM_SLOTS 4
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+#define KVM_VCPU_MAX_FEATURES 0
+
+/* We don't currently support large pages. */
+#define KVM_HPAGE_GFN_SHIFT(x)	0
+#define KVM_NR_PAGE_SIZES	1
+#define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
+
+struct kvm_vcpu;
+u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
+int kvm_target_cpu(void);
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+struct kvm_arch {
+	/* VTTBR value associated with below pgd and vmid */
+	u64    vttbr;
+
+	/*
+	 * Anything that is not used directly from assembly code goes
+	 * here.
+	 */
+
+	/* The VMID generation used for the virt. memory system */
+	u64    vmid_gen;
+	u32    vmid;
+
+	/* Stage-2 page table */
+	pgd_t *pgd;
+};
+
+#define KVM_NR_MEM_OBJS     40
+
+/*
+ * We don't want allocation failures within the mmu code, so we preallocate
+ * enough memory for a single page fault in a cache.
+ */
+struct kvm_mmu_memory_cache {
+	int nobjs;
+	void *objects[KVM_NR_MEM_OBJS];
+};
+
+struct kvm_vcpu_arch {
+	struct kvm_regs regs;
+
+	int target; /* Processor target */
+	DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES);
+
+	/* System control coprocessor (cp15) */
+	u32 cp15[NR_CP15_REGS];
+
+	/* The CPU type we expose to the VM */
+	u32 midr;
+
+	/* Exception Information */
+	u32 hsr;		/* Hyp Syndrome Register */
+	u32 hxfar;		/* Hyp Data/Inst Fault Address Register */
+	u32 hpfar;		/* Hyp IPA Fault Address Register */
+
+	/* Interrupt related fields */
+	u32 irq_lines;		/* IRQ and FIQ levels */
+
+	/* Hyp exception information */
+	u32 hyp_pc;		/* PC when exception was taken from Hyp mode */
+
+	/* Cache some mmu pages needed inside spinlock regions */
+	struct kvm_mmu_memory_cache mmu_page_cache;
+};
+
+struct kvm_vm_stat {
+	u32 remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+	u32 halt_wakeup;
+};
+
+struct kvm_vcpu_init;
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init);
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
+struct kvm_one_reg;
+int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
new file mode 100644
index 0000000..c6298b1
--- /dev/null
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -0,0 +1,106 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include <asm/types.h>
+#include <asm/ptrace.h>
+
+#define __KVM_HAVE_GUEST_DEBUG
+
+#define KVM_REG_SIZE(id)						\
+	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
+
+/* Valid for svc_regs, abt_regs, und_regs, irq_regs in struct kvm_regs */
+#define KVM_ARM_SVC_sp		svc_regs[0]
+#define KVM_ARM_SVC_lr		svc_regs[1]
+#define KVM_ARM_SVC_spsr	svc_regs[2]
+#define KVM_ARM_ABT_sp		abt_regs[0]
+#define KVM_ARM_ABT_lr		abt_regs[1]
+#define KVM_ARM_ABT_spsr	abt_regs[2]
+#define KVM_ARM_UND_sp		und_regs[0]
+#define KVM_ARM_UND_lr		und_regs[1]
+#define KVM_ARM_UND_spsr	und_regs[2]
+#define KVM_ARM_IRQ_sp		irq_regs[0]
+#define KVM_ARM_IRQ_lr		irq_regs[1]
+#define KVM_ARM_IRQ_spsr	irq_regs[2]
+
+/* Valid only for fiq_regs in struct kvm_regs */
+#define KVM_ARM_FIQ_r8		fiq_regs[0]
+#define KVM_ARM_FIQ_r9		fiq_regs[1]
+#define KVM_ARM_FIQ_r10		fiq_regs[2]
+#define KVM_ARM_FIQ_fp		fiq_regs[3]
+#define KVM_ARM_FIQ_ip		fiq_regs[4]
+#define KVM_ARM_FIQ_sp		fiq_regs[5]
+#define KVM_ARM_FIQ_lr		fiq_regs[6]
+#define KVM_ARM_FIQ_spsr	fiq_regs[7]
+
+struct kvm_regs {
+	struct pt_regs usr_regs;/* R0_usr - R14_usr, PC, CPSR */
+	__u32 svc_regs[3];	/* SP_svc, LR_svc, SPSR_svc */
+	__u32 abt_regs[3];	/* SP_abt, LR_abt, SPSR_abt */
+	__u32 und_regs[3];	/* SP_und, LR_und, SPSR_und */
+	__u32 irq_regs[3];	/* SP_irq, LR_irq, SPSR_irq */
+	__u32 fiq_regs[8];	/* R8_fiq - R14_fiq, SPSR_fiq */
+};
+
+/* Supported Processor Types */
+#define KVM_ARM_TARGET_CORTEX_A15	0
+#define KVM_ARM_NUM_TARGETS		1
+
+struct kvm_vcpu_init {
+	__u32 target;
+	__u32 features[7];
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+};
+
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+struct kvm_sync_regs {
+};
+
+struct kvm_arch_memory_slot {
+};
+
+/* If you need to interpret the index values, here is the key: */
+#define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
+#define KVM_REG_ARM_COPROC_SHIFT	16
+#define KVM_REG_ARM_32_OPC2_MASK	0x0000000000000007
+#define KVM_REG_ARM_32_OPC2_SHIFT	0
+#define KVM_REG_ARM_OPC1_MASK		0x0000000000000078
+#define KVM_REG_ARM_OPC1_SHIFT		3
+#define KVM_REG_ARM_CRM_MASK		0x0000000000000780
+#define KVM_REG_ARM_CRM_SHIFT		7
+#define KVM_REG_ARM_32_CRN_MASK		0x0000000000007800
+#define KVM_REG_ARM_32_CRN_SHIFT	11
+
+/* Normal registers are mapped as coprocessor 16. */
+#define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
+
+#endif /* __ARM_KVM_H__ */
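
As a side note (not part of the patch), the mask/shift key above is enough
for user space to pick apart a coprocessor register id; a rough sketch,
assuming <asm/kvm.h> provides the definitions added here:

	#include <stdio.h>
	#include <asm/kvm.h>

	static void decode_cp15_id(__u64 id)
	{
		unsigned int cp   = (id & KVM_REG_ARM_COPROC_MASK) >> KVM_REG_ARM_COPROC_SHIFT;
		unsigned int crn  = (id & KVM_REG_ARM_32_CRN_MASK) >> KVM_REG_ARM_32_CRN_SHIFT;
		unsigned int opc1 = (id & KVM_REG_ARM_OPC1_MASK) >> KVM_REG_ARM_OPC1_SHIFT;
		unsigned int crm  = (id & KVM_REG_ARM_CRM_MASK) >> KVM_REG_ARM_CRM_SHIFT;
		unsigned int opc2 = (id & KVM_REG_ARM_32_OPC2_MASK) >> KVM_REG_ARM_32_OPC2_SHIFT;

		/* group 16 is the core register block, anything else is a coprocessor */
		printf("cp%u: CRn=%u opc1=%u CRm=%u opc2=%u\n", cp, crn, opc1, crm, opc2);
	}
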
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
new file mode 100644
index 0000000..4a01b6f
--- /dev/null
+++ b/arch/arm/kvm/Kconfig
@@ -0,0 +1,55 @@
+#
+# KVM configuration
+#
+
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+	bool "Virtualization"
+	---help---
+	  Say Y here to get to see options for using your Linux host to run
+	  other operating systems inside virtual machines (guests).
+	  This option alone does not add any kernel code.
+
+	  If you say N, all options in this submenu will be skipped and
+	  disabled.
+
+if VIRTUALIZATION
+
+config KVM
+	bool "Kernel-based Virtual Machine (KVM) support"
+	select PREEMPT_NOTIFIERS
+	select ANON_INODES
+	select KVM_MMIO
+	select KVM_ARM_HOST
+	depends on ARM_VIRT_EXT && ARM_LPAE
+	---help---
+	  Support hosting virtualized guest machines. You will also
+	  need to select one or more of the processor modules below.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  If unsure, say N.
+
+config KVM_ARM_HOST
+	bool "KVM host support for ARM cpus."
+	depends on KVM
+	depends on MMU
+	---help---
+	  Provides host support for ARM processors.
+
+config KVM_ARM_MAX_VCPUS
+	int "Maximum number of supported virtual CPUs per VM"
+	depends on KVM_ARM_HOST
+	default 4
+	help
+	  Static number of max supported virtual CPUs per VM.
+
+	  If you choose a high number, the vcpu structures will be quite
+	  large, so only choose a reasonable number that you expect to
+	  actually use.
+
+source drivers/virtio/Kconfig
+
+endif # VIRTUALIZATION
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
new file mode 100644
index 0000000..dfc293f
--- /dev/null
+++ b/arch/arm/kvm/Makefile
@@ -0,0 +1,21 @@
+#
+# Makefile for Kernel-based Virtual Machine module
+#
+
+plus_virt := $(call as-instr,.arch_extension virt,+virt)
+ifeq ($(plus_virt),+virt)
+	plus_virt_def := -DREQUIRES_VIRT=1
+endif
+
+ccflags-y += -Ivirt/kvm -Iarch/arm/kvm
+CFLAGS_arm.o := -I. $(plus_virt_def)
+CFLAGS_mmu.o := -I.
+
+AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
+AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
+
+kvm-arm-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+obj-y += kvm-arm.o init.o interrupts.o
+obj-y += arm.o guest.o mmu.o emulate.o reset.o
+obj-y += coproc.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
new file mode 100644
index 0000000..82cb338
--- /dev/null
+++ b/arch/arm/kvm/arm.c
@@ -0,0 +1,355 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <trace/events/kvm.h>
+
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
+#include <asm/unified.h>
+#include <asm/uaccess.h>
+#include <asm/ptrace.h>
+#include <asm/mman.h>
+#include <asm/cputype.h>
+
+#ifdef REQUIRES_VIRT
+__asm__(".arch_extension	virt");
+#endif
+
+int kvm_arch_hardware_enable(void *garbage)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+{
+	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
+}
+
+void kvm_arch_hardware_disable(void *garbage)
+{
+}
+
+int kvm_arch_hardware_setup(void)
+{
+	return 0;
+}
+
+void kvm_arch_hardware_unsetup(void)
+{
+}
+
+void kvm_arch_check_processor_compat(void *rtn)
+{
+	*(int *)rtn = 0;
+}
+
+void kvm_arch_sync_events(struct kvm *kvm)
+{
+}
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+	if (type)
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
+void kvm_arch_free_memslot(struct kvm_memory_slot *free,
+			   struct kvm_memory_slot *dont)
+{
+}
+
+int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
+{
+	return 0;
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm_arch_vcpu_free(kvm->vcpus[i]);
+			kvm->vcpus[i] = NULL;
+		}
+	}
+}
+
+int kvm_dev_ioctl_check_extension(long ext)
+{
+	int r;
+	switch (ext) {
+	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_SYNC_MMU:
+	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+	case KVM_CAP_ONE_REG:
+		r = 1;
+		break;
+	case KVM_CAP_COALESCED_MMIO:
+		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
+		break;
+	case KVM_CAP_NR_VCPUS:
+		r = num_online_cpus();
+		break;
+	case KVM_CAP_MAX_VCPUS:
+		r = KVM_MAX_VCPUS;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+	return r;
+}
+
+long kvm_arch_dev_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_set_memory_region(struct kvm *kvm,
+			       struct kvm_userspace_memory_region *mem,
+			       struct kvm_memory_slot old,
+			       int user_alloc)
+{
+	return 0;
+}
+
+int kvm_arch_prepare_memory_region(struct kvm *kvm,
+				   struct kvm_memory_slot *memslot,
+				   struct kvm_memory_slot old,
+				   struct kvm_userspace_memory_region *mem,
+				   bool user_alloc)
+{
+	return 0;
+}
+
+void kvm_arch_commit_memory_region(struct kvm *kvm,
+				   struct kvm_userspace_memory_region *mem,
+				   struct kvm_memory_slot old,
+				   bool user_alloc)
+{
+}
+
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+				   struct kvm_memory_slot *slot)
+{
+}
+
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	int err;
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	if (!vcpu) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	return vcpu;
+free_vcpu:
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+	return ERR_PTR(err);
+}
+
+int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvm_arch_vcpu_free(vcpu);
+}
+
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int __attribute_const__ kvm_target_cpu(void)
+{
+	unsigned long implementor = read_cpuid_implementor();
+	unsigned long part_number = read_cpuid_part_number();
+
+	if (implementor != ARM_CPU_IMP_ARM)
+		return -EINVAL;
+
+	switch (part_number) {
+	case ARM_CPU_PART_CORTEX_A15:
+		return KVM_ARM_TARGET_CORTEX_A15;
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
+void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	return -EINVAL;
+}
+
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vcpu_ioctl(struct file *filp,
+			 unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+	case KVM_ARM_VCPU_INIT: {
+		struct kvm_vcpu_init init;
+
+		if (copy_from_user(&init, argp, sizeof(init)))
+			return -EFAULT;
+
+		return kvm_vcpu_set_target(vcpu, &init);
+
+	}
+	case KVM_SET_ONE_REG:
+	case KVM_GET_ONE_REG: {
+		struct kvm_one_reg reg;
+		if (copy_from_user(&reg, argp, sizeof(reg)))
+			return -EFAULT;
+		if (ioctl == KVM_SET_ONE_REG)
+			return kvm_arm_set_reg(vcpu, &reg);
+		else
+			return kvm_arm_get_reg(vcpu, &reg);
+	}
+	case KVM_GET_REG_LIST: {
+		struct kvm_reg_list __user *user_list = argp;
+		struct kvm_reg_list reg_list;
+		unsigned n;
+
+		if (copy_from_user(&reg_list, user_list, sizeof(reg_list)))
+			return -EFAULT;
+		n = reg_list.n;
+		reg_list.n = kvm_arm_num_regs(vcpu);
+		if (copy_to_user(user_list, &reg_list, sizeof(reg_list)))
+			return -EFAULT;
+		if (n < reg_list.n)
+			return -E2BIG;
+		return kvm_arm_copy_reg_indices(vcpu, user_list->reg);
+	}
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+		       unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_init(void *opaque)
+{
+	return 0;
+}
+
+/* NOP: Compiling as a module not supported */
+void kvm_arch_exit(void)
+{
+}
+
+static int arm_init(void)
+{
+	int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+	return rc;
+}
+
+module_init(arm_init);
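
For completeness (illustration only, not part of the patch), the
KVM_GET_REG_LIST handler above expects user space to use a probe-then-retry
pattern: a first call with a too-small n fails with E2BIG but reports the
required count, and a second call with a large enough array retrieves the
indices.  A minimal sketch:

	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static struct kvm_reg_list *get_reg_list(int vcpu_fd)
	{
		struct kvm_reg_list probe = { .n = 0 };
		struct kvm_reg_list *list;

		/* fails with E2BIG, but fills in the required count */
		ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe);

		list = malloc(sizeof(*list) + probe.n * sizeof(__u64));
		if (!list)
			return NULL;
		list->n = probe.n;
		if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
			free(list);
			return NULL;
		}
		return list;	/* reg[0..n-1] holds the supported indices */
	}
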
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
new file mode 100644
index 0000000..0c43355
--- /dev/null
+++ b/arch/arm/kvm/coproc.c
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Authors: Rusty Russell <rusty@rustcorp.com.au>
+ *          Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
new file mode 100644
index 0000000..3eadc25
--- /dev/null
+++ b/arch/arm/kvm/emulate.c
@@ -0,0 +1,155 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <asm/kvm_emulate.h>
+
+#define VCPU_NR_MODES		6
+#define VCPU_REG_OFFSET_USR	0
+#define VCPU_REG_OFFSET_FIQ	1
+#define VCPU_REG_OFFSET_IRQ	2
+#define VCPU_REG_OFFSET_SVC	3
+#define VCPU_REG_OFFSET_ABT	4
+#define VCPU_REG_OFFSET_UND	5
+#define REG_OFFSET(_reg) \
+	(offsetof(struct kvm_regs, _reg) / sizeof(u32))
+
+#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs.uregs[_num])
+
+static const unsigned long vcpu_reg_offsets[VCPU_NR_MODES][15] = {
+	/* USR/SYS Registers */
+	[VCPU_REG_OFFSET_USR] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12), USR_REG_OFFSET(13),	USR_REG_OFFSET(14),
+	},
+
+	/* FIQ Registers */
+	[VCPU_REG_OFFSET_FIQ] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7),
+		REG_OFFSET(fiq_regs[0]), /* r8 */
+		REG_OFFSET(fiq_regs[1]), /* r9 */
+		REG_OFFSET(fiq_regs[2]), /* r10 */
+		REG_OFFSET(fiq_regs[3]), /* r11 */
+		REG_OFFSET(fiq_regs[4]), /* r12 */
+		REG_OFFSET(fiq_regs[5]), /* r13 */
+		REG_OFFSET(fiq_regs[6]), /* r14 */
+	},
+
+	/* IRQ Registers */
+	[VCPU_REG_OFFSET_IRQ] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(irq_regs[0]), /* r13 */
+		REG_OFFSET(irq_regs[1]), /* r14 */
+	},
+
+	/* SVC Registers */
+	[VCPU_REG_OFFSET_SVC] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(svc_regs[0]), /* r13 */
+		REG_OFFSET(svc_regs[1]), /* r14 */
+	},
+
+	/* ABT Registers */
+	[VCPU_REG_OFFSET_ABT] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(abt_regs[0]), /* r13 */
+		REG_OFFSET(abt_regs[1]), /* r14 */
+	},
+
+	/* UND Registers */
+	[VCPU_REG_OFFSET_UND] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(und_regs[0]), /* r13 */
+		REG_OFFSET(und_regs[1]), /* r14 */
+	},
+};
+
+/*
+ * Return a pointer to the storage backing register reg_num, taking the
+ * register banking of the current mode of the virtual CPU into account.
+ */
+u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num)
+{
+	u32 *reg_array = (u32 *)&vcpu->arch.regs;
+	u32 mode = *vcpu_cpsr(vcpu) & MODE_MASK;
+
+	switch (mode) {
+	case USR_MODE...SVC_MODE:
+		mode &= ~MODE32_BIT; /* 0 ... 3 */
+		break;
+
+	case ABT_MODE:
+		mode = VCPU_REG_OFFSET_ABT;
+		break;
+
+	case UND_MODE:
+		mode = VCPU_REG_OFFSET_UND;
+		break;
+
+	case SYSTEM_MODE:
+		mode = VCPU_REG_OFFSET_USR;
+		break;
+
+	default:
+		BUG();
+	}
+
+	return reg_array + vcpu_reg_offsets[mode][reg_num];
+}
+
+/*
+ * Return the SPSR for the current mode of the virtual CPU.
+ */
+u32 *vcpu_spsr(struct kvm_vcpu *vcpu)
+{
+	u32 mode = *vcpu_cpsr(vcpu) & MODE_MASK;
+	switch (mode) {
+	case SVC_MODE:
+		return &vcpu->arch.regs.KVM_ARM_SVC_spsr;
+	case ABT_MODE:
+		return &vcpu->arch.regs.KVM_ARM_ABT_spsr;
+	case UND_MODE:
+		return &vcpu->arch.regs.KVM_ARM_UND_spsr;
+	case IRQ_MODE:
+		return &vcpu->arch.regs.KVM_ARM_IRQ_spsr;
+	case FIQ_MODE:
+		return &vcpu->arch.regs.KVM_ARM_FIQ_spsr;
+	default:
+		BUG();
+	}
+}
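
To make the banking table above a little more concrete (illustration only,
nothing here is added by the patch): given a struct kvm_vcpu *vcpu whose
CPSR mode bits are set to SVC_MODE, r13 resolves to the banked SVC stack
pointer while r0 still comes from the shared user mode array:

	*vcpu_cpsr(vcpu) &= ~MODE_MASK;
	*vcpu_cpsr(vcpu) |= SVC_MODE;

	/* r13 is banked: resolves to SP_svc (svc_regs[0]) */
	BUG_ON(vcpu_reg(vcpu, 13) != (u32 *)&vcpu->arch.regs.KVM_ARM_SVC_sp);
	/* r0 is shared with user mode */
	BUG_ON(vcpu_reg(vcpu, 0) != (u32 *)&vcpu->arch.regs.usr_regs.uregs[0]);
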
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
new file mode 100644
index 0000000..8db3811
--- /dev/null
+++ b/arch/arm/kvm/guest.c
@@ -0,0 +1,221 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <asm/uaccess.h>
+#include <asm/kvm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+
+#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
+#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ NULL }
+};
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+static u64 core_reg_offset_from_id(u64 id)
+{
+	return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
+}
+
+static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	u32 __user *uaddr = (u32 __user *)(long)reg->addr;
+	struct kvm_regs *regs = &vcpu->arch.regs;
+	u64 off;
+
+	if (KVM_REG_SIZE(reg->id) != 4)
+		return -ENOENT;
+
+	/* Our ID is an index into the kvm_regs struct. */
+	off = core_reg_offset_from_id(reg->id);
+	if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
+		return -ENOENT;
+
+	return put_user(((u32 *)regs)[off], uaddr);
+}
+
+static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	u32 __user *uaddr = (u32 __user *)(long)reg->addr;
+	struct kvm_regs *regs = &vcpu->arch.regs;
+	u64 off, val;
+
+	if (KVM_REG_SIZE(reg->id) != 4)
+		return -ENOENT;
+
+	/* Our ID is an index into the kvm_regs struct. */
+	off = core_reg_offset_from_id(reg->id);
+	if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
+		return -ENOENT;
+
+	if (get_user(val, uaddr) != 0)
+		return -EFAULT;
+
+	if (off == KVM_REG_ARM_CORE_REG(usr_regs.ARM_cpsr)) {
+		unsigned long mode = val & MODE_MASK;
+		switch (mode) {
+		case USR_MODE:
+		case FIQ_MODE:
+		case IRQ_MODE:
+		case SVC_MODE:
+		case ABT_MODE:
+		case UND_MODE:
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	((u32 *)regs)[off] = val;
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+static unsigned long num_core_regs(void)
+{
+	return sizeof(struct kvm_regs) / sizeof(u32);
+}
+
+/**
+ * kvm_arm_num_regs - how many registers do we present via KVM_GET_ONE_REG
+ *
+ * This is for all registers.
+ */
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
+{
+	return num_core_regs();
+}
+
+/**
+ * kvm_arm_copy_reg_indices - get indices of all registers.
+ *
+ * We do core registers right here, then we append coproc regs.
+ */
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
+{
+	unsigned int i;
+	const u64 core_reg = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE;
+
+	for (i = 0; i < sizeof(struct kvm_regs)/sizeof(u32); i++) {
+		if (put_user(core_reg | i, uindices))
+			return -EFAULT;
+		uindices++;
+	}
+
+	return 0;
+}
+
+int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	/* We currently use nothing arch-specific in upper 32 bits */
+	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM >> 32)
+		return -EINVAL;
+
+	/* Register group 16 means we want a core register. */
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
+		return get_core_reg(vcpu, reg);
+
+	return -EINVAL;
+}
+
+int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	/* We currently use nothing arch-specific in upper 32 bits */
+	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM >> 32)
+		return -EINVAL;
+
+	/* Register group 16 means we set a core register. */
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
+		return set_core_reg(vcpu, reg);
+
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init)
+{
+	unsigned int i;
+
+	/* We can only do a cortex A15 for now. */
+	if (init->target != kvm_target_cpu())
+		return -EINVAL;
+
+	vcpu->arch.target = init->target;
+	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
+
+	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
+	for (i = 0; i < sizeof(init->features)*8; i++) {
+		if (init->features[i / 32] & (1 << (i % 32))) {
+			if (i >= KVM_VCPU_MAX_FEATURES)
+				return -ENOENT;
+			set_bit(i, vcpu->arch.features);
+		}
+	}
+
+	/* Now we know what it is, we can reset it. */
+	return kvm_reset_vcpu(vcpu);
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+				  struct kvm_translation *tr)
+{
+	return -EINVAL;
+}
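
As an aside (sketch only, not part of the patch), the CPSR mode check in
set_core_reg() means user space can only put the guest into one of the
ordinary modes; writing any other mode value fails with EINVAL:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int try_set_cpsr(int vcpu_fd, __u32 cpsr)
	{
		struct kvm_one_reg reg = {
			.id   = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE |
				KVM_REG_ARM_CORE_REG(usr_regs.ARM_cpsr),
			.addr = (__u64)(unsigned long)&cpsr,
		};

		/* only USR/FIQ/IRQ/SVC/ABT/UND mode values are accepted */
		return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
	}
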
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/init.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/interrupts.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
new file mode 100644
index 0000000..10ed464
--- /dev/null
+++ b/arch/arm/kvm/mmu.c
@@ -0,0 +1,17 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
new file mode 100644
index 0000000..b80256b
--- /dev/null
+++ b/arch/arm/kvm/reset.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+
+#include <asm/unified.h>
+#include <asm/ptrace.h>
+#include <asm/cputype.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_coproc.h>
+
+/******************************************************************************
+ * Cortex-A15 Reset Values
+ */
+
+static const int a15_max_cpu_idx = 3;
+
+static struct kvm_regs a15_regs_reset = {
+	.usr_regs.ARM_cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
+};
+
+
+/*******************************************************************************
+ * Exported reset function
+ */
+
+/**
+ * kvm_reset_vcpu - sets core registers and cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architecturally defined reset values.
+ */
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_regs *cpu_reset;
+
+	switch (vcpu->arch.target) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		if (vcpu->vcpu_id > a15_max_cpu_idx)
+			return -EINVAL;
+		cpu_reset = &a15_regs_reset;
+		vcpu->arch.midr = read_cpuid_id();
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	/* Reset core registers */
+	memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
+
+	/* Reset CP15 registers */
+	kvm_reset_coprocs(vcpu);
+
+	return 0;
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
new file mode 100644
index 0000000..f8869c1
--- /dev/null
+++ b/arch/arm/kvm/trace.h
@@ -0,0 +1,52 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+/*
+ * Tracepoints for entry/exit to guest
+ */
+TRACE_EVENT(kvm_entry,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+TRACE_EVENT(kvm_exit,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+
+
+#endif /* _TRACE_KVM_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH arch/arm/kvm
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
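
Nothing calls these tracepoints yet; the world-switch code later in the
series is expected to use them roughly like this (sketch only):

	trace_kvm_entry(*vcpu_pc(vcpu));
	/* ... switch to the guest and back ... */
	trace_kvm_exit(*vcpu_pc(vcpu));
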
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e6e5d4b..24978d5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -764,6 +764,11 @@ struct kvm_dirty_tlb {
 #define KVM_REG_SIZE_U512	0x0060000000000000ULL
 #define KVM_REG_SIZE_U1024	0x0070000000000000ULL
 
+struct kvm_reg_list {
+	__u64 n; /* number of regs */
+	__u64 reg[0];
+};
+
 struct kvm_one_reg {
 	__u64 id;
 	__u64 addr;
@@ -932,6 +937,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_ONE_REG		  _IOW(KVMIO,  0xac, struct kvm_one_reg)
 /* VM is being stopped by host */
 #define KVM_KVMCLOCK_CTRL	  _IO(KVMIO,   0xad)
+#define KVM_ARM_VCPU_INIT	  _IOW(KVMIO,  0xae, struct kvm_vcpu_init)
+#define KVM_GET_REG_LIST	  _IOWR(KVMIO, 0xb0, struct kvm_reg_list)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)



* [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
@ 2013-01-08 18:38   ` Christoffer Dall
  0 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:38 UTC (permalink / raw)
  To: linux-arm-kernel

Targets KVM support for Cortex-A15 processors.

Contains all the framework components, make files, header files, some
tracing functionality, and basic user space API.

Only supported core is Cortex-A15 for now.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt  |   57 +++++-
 arch/arm/Kconfig                   |    2 
 arch/arm/Makefile                  |    1 
 arch/arm/include/asm/kvm_arm.h     |   24 ++
 arch/arm/include/asm/kvm_asm.h     |   58 ++++++
 arch/arm/include/asm/kvm_coproc.h  |   24 ++
 arch/arm/include/asm/kvm_emulate.h |   50 +++++
 arch/arm/include/asm/kvm_host.h    |  114 ++++++++++++
 arch/arm/include/uapi/asm/kvm.h    |  106 +++++++++++
 arch/arm/kvm/Kconfig               |   55 ++++++
 arch/arm/kvm/Makefile              |   21 ++
 arch/arm/kvm/arm.c                 |  355 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/coproc.c              |   23 ++
 arch/arm/kvm/emulate.c             |  155 ++++++++++++++++
 arch/arm/kvm/guest.c               |  221 ++++++++++++++++++++++
 arch/arm/kvm/init.S                |   19 ++
 arch/arm/kvm/interrupts.S          |   19 ++
 arch/arm/kvm/mmu.c                 |   17 ++
 arch/arm/kvm/reset.c               |   74 ++++++++
 arch/arm/kvm/trace.h               |   52 +++++
 include/uapi/linux/kvm.h           |    7 +
 21 files changed, 1450 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/uapi/asm/kvm.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a4df553..4237c27 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -293,7 +293,7 @@ kvm_run' (see below).
 4.11 KVM_GET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (out)
 Returns: 0 on success, -1 on error
@@ -314,7 +314,7 @@ struct kvm_regs {
 4.12 KVM_SET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (in)
 Returns: 0 on success, -1 on error
@@ -600,7 +600,7 @@ struct kvm_fpu {
 4.24 KVM_CREATE_IRQCHIP
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, ARM
 Type: vm ioctl
 Parameters: none
 Returns: 0 on success, -1 on error
@@ -608,7 +608,8 @@ Returns: 0 on success, -1 on error
 Creates an interrupt controller model in the kernel.  On x86, creates a virtual
 ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
 local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
-only go to the IOAPIC.  On ia64, a IOSAPIC is created.
+only go to the IOAPIC.  On ia64, a IOSAPIC is created. On ARM, a GIC is
+created.
 
 
 4.25 KVM_IRQ_LINE
@@ -1775,6 +1776,14 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_VPA_DTL   | 128
   PPC   | KVM_REG_PPC_EPCR	| 32
 
+ARM registers are mapped using the lower 32 bits.  The upper 16 bits of
+that hold the register group type, or coprocessor number:
+
+ARM core registers have the following id bit patterns:
+  0x4020 0000 0010 <index into the kvm_regs struct:16>
+
+
+
 4.69 KVM_GET_ONE_REG
 
 Capability: KVM_CAP_ONE_REG
@@ -2127,6 +2136,46 @@ written, then `n_invalid' invalid entries, invalidating any previously
 valid entries found.
 
 
+4.77 KVM_ARM_VCPU_INIT
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_vcpu_init (in)
+Returns: 0 on success; -1 on error
+Errors:
+  EINVAL:    the target is unknown, or the combination of features is invalid.
+  ENOENT:    a features bit specified is unknown.
+
+This tells KVM what type of CPU to present to the guest, and what
+optional features it should have.  This will cause a reset of the cpu
+registers to their initial values.  If this is not called, KVM_RUN will
+return ENOEXEC for that vcpu.
+
+Note that because some registers reflect machine topology, all vcpus
+should be created before this ioctl is invoked.
+
+
+4.78 KVM_GET_REG_LIST
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_reg_list (in/out)
+Returns: 0 on success; -1 on error
+Errors:
+  E2BIG:     the reg index list is too big to fit in the array specified by
+             the user (the number required will be written into n).
+
+struct kvm_reg_list {
+	__u64 n; /* number of registers in reg[] */
+	__u64 reg[0];
+};
+
+This ioctl returns the guest registers that are supported for the
+KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
+
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f95ba14..887a0e6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2317,3 +2317,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/arm/kvm/Kconfig"
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 30c443c..4bcd2d6 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -252,6 +252,7 @@ core-$(CONFIG_FPE_NWFPE)	+= arch/arm/nwfpe/
 core-$(CONFIG_FPE_FASTFPE)	+= $(FASTFPE_OBJ)
 core-$(CONFIG_VFP)		+= arch/arm/vfp/
 core-$(CONFIG_XEN)		+= arch/arm/xen/
+core-$(CONFIG_KVM_ARM_HOST) 	+= arch/arm/kvm/
 
 # If we have a machine-specific directory, then include it in the build.
 core-y				+= arch/arm/kernel/ arch/arm/mm/ arch/arm/common/
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
new file mode 100644
index 0000000..c196a22
--- /dev/null
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ARM_H__
+#define __ARM_KVM_ARM_H__
+
+#include <asm/types.h>
+
+#endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
new file mode 100644
index 0000000..f9993e5
--- /dev/null
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -0,0 +1,58 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_ASM_H__
+#define __ARM_KVM_ASM_H__
+
+/* 0 is reserved as an invalid value. */
+#define c0_MPIDR	1	/* MultiProcessor ID Register */
+#define c0_CSSELR	2	/* Cache Size Selection Register */
+#define c1_SCTLR	3	/* System Control Register */
+#define c1_ACTLR	4	/* Auxiliary Control Register */
+#define c1_CPACR	5	/* Coprocessor Access Control */
+#define c2_TTBR0	6	/* Translation Table Base Register 0 */
+#define c2_TTBR0_high	7	/* TTBR0 top 32 bits */
+#define c2_TTBR1	8	/* Translation Table Base Register 1 */
+#define c2_TTBR1_high	9	/* TTBR1 top 32 bits */
+#define c2_TTBCR	10	/* Translation Table Base Control R. */
+#define c3_DACR		11	/* Domain Access Control Register */
+#define c5_DFSR		12	/* Data Fault Status Register */
+#define c5_IFSR		13	/* Instruction Fault Status Register */
+#define c5_ADFSR	14	/* Auxiliary Data Fault Status R */
+#define c5_AIFSR	15	/* Auxiliary Instruction Fault Status R */
+#define c6_DFAR		16	/* Data Fault Address Register */
+#define c6_IFAR		17	/* Instruction Fault Address Register */
+#define c9_L2CTLR	18	/* Cortex A15 L2 Control Register */
+#define c10_PRRR	19	/* Primary Region Remap Register */
+#define c10_NMRR	20	/* Normal Memory Remap Register */
+#define c12_VBAR	21	/* Vector Base Address Register */
+#define c13_CID		22	/* Context ID Register */
+#define c13_TID_URW	23	/* Thread ID, User R/W */
+#define c13_TID_URO	24	/* Thread ID, User R/O */
+#define c13_TID_PRIV	25	/* Thread ID, Privileged */
+#define NR_CP15_REGS	26	/* Number of regs (incl. invalid) */
+
+#define ARM_EXCEPTION_RESET	  0
+#define ARM_EXCEPTION_UNDEFINED   1
+#define ARM_EXCEPTION_SOFTWARE    2
+#define ARM_EXCEPTION_PREF_ABORT  3
+#define ARM_EXCEPTION_DATA_ABORT  4
+#define ARM_EXCEPTION_IRQ	  5
+#define ARM_EXCEPTION_FIQ	  6
+
+#endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
new file mode 100644
index 0000000..b6d023d
--- /dev/null
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2012 Rusty Russell IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_COPROC_H__
+#define __ARM_KVM_COPROC_H__
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+#endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
new file mode 100644
index 0000000..17dad67
--- /dev/null
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_EMULATE_H__
+#define __ARM_KVM_EMULATE_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+
+u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
+u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
+
+static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
+{
+	return (u32 *)&vcpu->arch.regs.usr_regs.ARM_pc;
+}
+
+static inline u32 *vcpu_cpsr(struct kvm_vcpu *vcpu)
+{
+	return (u32 *)&vcpu->arch.regs.usr_regs.ARM_cpsr;
+}
+
+static inline bool mode_has_spsr(struct kvm_vcpu *vcpu)
+{
+	unsigned long cpsr_mode = vcpu->arch.regs.usr_regs.ARM_cpsr & MODE_MASK;
+	return (cpsr_mode > USR_MODE && cpsr_mode < SYSTEM_MODE);
+}
+
+static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
+{
+	unsigned long cpsr_mode = vcpu->arch.regs.usr_regs.ARM_cpsr & MODE_MASK;
+	return cpsr_mode > USR_MODE;
+}
+
+#endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
new file mode 100644
index 0000000..92e89f3
--- /dev/null
+++ b/arch/arm/include/asm/kvm_host.h
@@ -0,0 +1,114 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_HOST_H__
+#define __ARM_KVM_HOST_H__
+
+#include <asm/kvm.h>
+#include <asm/kvm_asm.h>
+
+#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
+#define KVM_USER_MEM_SLOTS 32
+#define KVM_PRIVATE_MEM_SLOTS 4
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+#define KVM_VCPU_MAX_FEATURES 0
+
+/* We don't currently support large pages. */
+#define KVM_HPAGE_GFN_SHIFT(x)	0
+#define KVM_NR_PAGE_SIZES	1
+#define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
+
+struct kvm_vcpu;
+u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
+int kvm_target_cpu(void);
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
+
+struct kvm_arch {
+	/* VTTBR value associated with below pgd and vmid */
+	u64    vttbr;
+
+	/*
+	 * Anything that is not used directly from assembly code goes
+	 * here.
+	 */
+
+	/* The VMID generation used for the virt. memory system */
+	u64    vmid_gen;
+	u32    vmid;
+
+	/* Stage-2 page table */
+	pgd_t *pgd;
+};
+
+#define KVM_NR_MEM_OBJS     40
+
+/*
+ * We don't want allocation failures within the mmu code, so we preallocate
+ * enough memory for a single page fault in a cache.
+ */
+struct kvm_mmu_memory_cache {
+	int nobjs;
+	void *objects[KVM_NR_MEM_OBJS];
+};
+
+struct kvm_vcpu_arch {
+	struct kvm_regs regs;
+
+	int target; /* Processor target */
+	DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES);
+
+	/* System control coprocessor (cp15) */
+	u32 cp15[NR_CP15_REGS];
+
+	/* The CPU type we expose to the VM */
+	u32 midr;
+
+	/* Exception Information */
+	u32 hsr;		/* Hyp Syndrome Register */
+	u32 hxfar;		/* Hyp Data/Inst Fault Address Register */
+	u32 hpfar;		/* Hyp IPA Fault Address Register */
+
+	/* Interrupt related fields */
+	u32 irq_lines;		/* IRQ and FIQ levels */
+
+	/* Hyp exception information */
+	u32 hyp_pc;		/* PC when exception was taken from Hyp mode */
+
+	/* Cache some mmu pages needed inside spinlock regions */
+	struct kvm_mmu_memory_cache mmu_page_cache;
+};
+
+struct kvm_vm_stat {
+	u32 remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+	u32 halt_wakeup;
+};
+
+struct kvm_vcpu_init;
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init);
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
+struct kvm_one_reg;
+int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
new file mode 100644
index 0000000..c6298b1
--- /dev/null
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -0,0 +1,106 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include <asm/types.h>
+#include <asm/ptrace.h>
+
+#define __KVM_HAVE_GUEST_DEBUG
+
+#define KVM_REG_SIZE(id)						\
+	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
+
+/* Valid for svc_regs, abt_regs, und_regs, irq_regs in struct kvm_regs */
+#define KVM_ARM_SVC_sp		svc_regs[0]
+#define KVM_ARM_SVC_lr		svc_regs[1]
+#define KVM_ARM_SVC_spsr	svc_regs[2]
+#define KVM_ARM_ABT_sp		abt_regs[0]
+#define KVM_ARM_ABT_lr		abt_regs[1]
+#define KVM_ARM_ABT_spsr	abt_regs[2]
+#define KVM_ARM_UND_sp		und_regs[0]
+#define KVM_ARM_UND_lr		und_regs[1]
+#define KVM_ARM_UND_spsr	und_regs[2]
+#define KVM_ARM_IRQ_sp		irq_regs[0]
+#define KVM_ARM_IRQ_lr		irq_regs[1]
+#define KVM_ARM_IRQ_spsr	irq_regs[2]
+
+/* Valid only for fiq_regs in struct kvm_regs */
+#define KVM_ARM_FIQ_r8		fiq_regs[0]
+#define KVM_ARM_FIQ_r9		fiq_regs[1]
+#define KVM_ARM_FIQ_r10		fiq_regs[2]
+#define KVM_ARM_FIQ_fp		fiq_regs[3]
+#define KVM_ARM_FIQ_ip		fiq_regs[4]
+#define KVM_ARM_FIQ_sp		fiq_regs[5]
+#define KVM_ARM_FIQ_lr		fiq_regs[6]
+#define KVM_ARM_FIQ_spsr	fiq_regs[7]
+
+struct kvm_regs {
+	struct pt_regs usr_regs;/* R0_usr - R14_usr, PC, CPSR */
+	__u32 svc_regs[3];	/* SP_svc, LR_svc, SPSR_svc */
+	__u32 abt_regs[3];	/* SP_abt, LR_abt, SPSR_abt */
+	__u32 und_regs[3];	/* SP_und, LR_und, SPSR_und */
+	__u32 irq_regs[3];	/* SP_irq, LR_irq, SPSR_irq */
+	__u32 fiq_regs[8];	/* R8_fiq - R14_fiq, SPSR_fiq */
+};
+
+/* Supported Processor Types */
+#define KVM_ARM_TARGET_CORTEX_A15	0
+#define KVM_ARM_NUM_TARGETS		1
+
+struct kvm_vcpu_init {
+	__u32 target;
+	__u32 features[7];
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+};
+
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+struct kvm_sync_regs {
+};
+
+struct kvm_arch_memory_slot {
+};
+
+/* If you need to interpret the index values, here is the key: */
+#define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
+#define KVM_REG_ARM_COPROC_SHIFT	16
+#define KVM_REG_ARM_32_OPC2_MASK	0x0000000000000007
+#define KVM_REG_ARM_32_OPC2_SHIFT	0
+#define KVM_REG_ARM_OPC1_MASK		0x0000000000000078
+#define KVM_REG_ARM_OPC1_SHIFT		3
+#define KVM_REG_ARM_CRM_MASK		0x0000000000000780
+#define KVM_REG_ARM_CRM_SHIFT		7
+#define KVM_REG_ARM_32_CRN_MASK		0x0000000000007800
+#define KVM_REG_ARM_32_CRN_SHIFT	11
+
+/* Normal registers are mapped as coprocessor 16. */
+#define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
+
+#endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
new file mode 100644
index 0000000..4a01b6f
--- /dev/null
+++ b/arch/arm/kvm/Kconfig
@@ -0,0 +1,55 @@
+#
+# KVM configuration
+#
+
+source "virt/kvm/Kconfig"
+
+menuconfig VIRTUALIZATION
+	bool "Virtualization"
+	---help---
+	  Say Y here to get to see options for using your Linux host to run
+	  other operating systems inside virtual machines (guests).
+	  This option alone does not add any kernel code.
+
+	  If you say N, all options in this submenu will be skipped and
+	  disabled.
+
+if VIRTUALIZATION
+
+config KVM
+	bool "Kernel-based Virtual Machine (KVM) support"
+	select PREEMPT_NOTIFIERS
+	select ANON_INODES
+	select KVM_MMIO
+	select KVM_ARM_HOST
+	depends on ARM_VIRT_EXT && ARM_LPAE
+	---help---
+	  Support hosting virtualized guest machines. You will also
+	  need to select one or more of the processor modules below.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  If unsure, say N.
+
+config KVM_ARM_HOST
+	bool "KVM host support for ARM cpus."
+	depends on KVM
+	depends on MMU
+	---help---
+	  Provides host support for ARM processors.
+
+config KVM_ARM_MAX_VCPUS
+	int "Maximum number of supported virtual CPUs per VM"
+	depends on KVM_ARM_HOST
+	default 4
+	help
+	  Static number of max supported virtual CPUs per VM.
+
+	  If you choose a high number, the vcpu structures will be quite
+	  large, so only choose a reasonable number that you expect to
+	  actually use.
+
+source drivers/virtio/Kconfig
+
+endif # VIRTUALIZATION
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
new file mode 100644
index 0000000..dfc293f
--- /dev/null
+++ b/arch/arm/kvm/Makefile
@@ -0,0 +1,21 @@
+#
+# Makefile for Kernel-based Virtual Machine module
+#
+
+plus_virt := $(call as-instr,.arch_extension virt,+virt)
+ifeq ($(plus_virt),+virt)
+	plus_virt_def := -DREQUIRES_VIRT=1
+endif
+
+ccflags-y += -Ivirt/kvm -Iarch/arm/kvm
+CFLAGS_arm.o := -I. $(plus_virt_def)
+CFLAGS_mmu.o := -I.
+
+AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
+AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
+
+kvm-arm-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+
+obj-y += kvm-arm.o init.o interrupts.o
+obj-y += arm.o guest.o mmu.o emulate.o reset.o
+obj-y += coproc.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
new file mode 100644
index 0000000..82cb338
--- /dev/null
+++ b/arch/arm/kvm/arm.c
@@ -0,0 +1,355 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <trace/events/kvm.h>
+
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
+#include <asm/unified.h>
+#include <asm/uaccess.h>
+#include <asm/ptrace.h>
+#include <asm/mman.h>
+#include <asm/cputype.h>
+
+#ifdef REQUIRES_VIRT
+__asm__(".arch_extension	virt");
+#endif
+
+int kvm_arch_hardware_enable(void *garbage)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+{
+	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
+}
+
+void kvm_arch_hardware_disable(void *garbage)
+{
+}
+
+int kvm_arch_hardware_setup(void)
+{
+	return 0;
+}
+
+void kvm_arch_hardware_unsetup(void)
+{
+}
+
+void kvm_arch_check_processor_compat(void *rtn)
+{
+	*(int *)rtn = 0;
+}
+
+void kvm_arch_sync_events(struct kvm *kvm)
+{
+}
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+	if (type)
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
+void kvm_arch_free_memslot(struct kvm_memory_slot *free,
+			   struct kvm_memory_slot *dont)
+{
+}
+
+int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
+{
+	return 0;
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm_arch_vcpu_free(kvm->vcpus[i]);
+			kvm->vcpus[i] = NULL;
+		}
+	}
+}
+
+int kvm_dev_ioctl_check_extension(long ext)
+{
+	int r;
+	switch (ext) {
+	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_SYNC_MMU:
+	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+	case KVM_CAP_ONE_REG:
+		r = 1;
+		break;
+	case KVM_CAP_COALESCED_MMIO:
+		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
+		break;
+	case KVM_CAP_NR_VCPUS:
+		r = num_online_cpus();
+		break;
+	case KVM_CAP_MAX_VCPUS:
+		r = KVM_MAX_VCPUS;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+	return r;
+}
+
+long kvm_arch_dev_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_set_memory_region(struct kvm *kvm,
+			       struct kvm_userspace_memory_region *mem,
+			       struct kvm_memory_slot old,
+			       int user_alloc)
+{
+	return 0;
+}
+
+int kvm_arch_prepare_memory_region(struct kvm *kvm,
+				   struct kvm_memory_slot *memslot,
+				   struct kvm_memory_slot old,
+				   struct kvm_userspace_memory_region *mem,
+				   bool user_alloc)
+{
+	return 0;
+}
+
+void kvm_arch_commit_memory_region(struct kvm *kvm,
+				   struct kvm_userspace_memory_region *mem,
+				   struct kvm_memory_slot old,
+				   bool user_alloc)
+{
+}
+
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+				   struct kvm_memory_slot *slot)
+{
+}
+
+struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	int err;
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+	if (!vcpu) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	return vcpu;
+free_vcpu:
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
+out:
+	return ERR_PTR(err);
+}
+
+int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvm_arch_vcpu_free(vcpu);
+}
+
+int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int __attribute_const__ kvm_target_cpu(void)
+{
+	unsigned long implementor = read_cpuid_implementor();
+	unsigned long part_number = read_cpuid_part_number();
+
+	if (implementor != ARM_CPU_IMP_ARM)
+		return -EINVAL;
+
+	switch (part_number) {
+	case ARM_CPU_PART_CORTEX_A15:
+		return KVM_ARM_TARGET_CORTEX_A15;
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
+void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
+{
+}
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	return -EINVAL;
+}
+
+
+int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
+				    struct kvm_mp_state *mp_state)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vcpu_ioctl(struct file *filp,
+			 unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+	case KVM_ARM_VCPU_INIT: {
+		struct kvm_vcpu_init init;
+
+		if (copy_from_user(&init, argp, sizeof(init)))
+			return -EFAULT;
+
+		return kvm_vcpu_set_target(vcpu, &init);
+
+	}
+	case KVM_SET_ONE_REG:
+	case KVM_GET_ONE_REG: {
+		struct kvm_one_reg reg;
+		if (copy_from_user(&reg, argp, sizeof(reg)))
+			return -EFAULT;
+		if (ioctl == KVM_SET_ONE_REG)
+			return kvm_arm_set_reg(vcpu, &reg);
+		else
+			return kvm_arm_get_reg(vcpu, &reg);
+	}
+	case KVM_GET_REG_LIST: {
+		struct kvm_reg_list __user *user_list = argp;
+		struct kvm_reg_list reg_list;
+		unsigned n;
+
+		if (copy_from_user(&reg_list, user_list, sizeof(reg_list)))
+			return -EFAULT;
+		n = reg_list.n;
+		reg_list.n = kvm_arm_num_regs(vcpu);
+		if (copy_to_user(user_list, &reg_list, sizeof(reg_list)))
+			return -EFAULT;
+		if (n < reg_list.n)
+			return -E2BIG;
+		return kvm_arm_copy_reg_indices(vcpu, user_list->reg);
+	}
+	default:
+		return -EINVAL;
+	}
+}
+
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return -EINVAL;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+		       unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_init(void *opaque)
+{
+	return 0;
+}
+
+/* NOP: Compiling as a module not supported */
+void kvm_arch_exit(void)
+{
+}
+
+static int arm_init(void)
+{
+	int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+	return rc;
+}
+
+module_init(arm_init);
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
new file mode 100644
index 0000000..0c43355
--- /dev/null
+++ b/arch/arm/kvm/coproc.c
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Authors: Rusty Russell <rusty@rustcorp.com.au>
+ *          Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/kvm_host.h>
+
+void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
new file mode 100644
index 0000000..3eadc25
--- /dev/null
+++ b/arch/arm/kvm/emulate.c
@@ -0,0 +1,155 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <asm/kvm_emulate.h>
+
+#define VCPU_NR_MODES		6
+#define VCPU_REG_OFFSET_USR	0
+#define VCPU_REG_OFFSET_FIQ	1
+#define VCPU_REG_OFFSET_IRQ	2
+#define VCPU_REG_OFFSET_SVC	3
+#define VCPU_REG_OFFSET_ABT	4
+#define VCPU_REG_OFFSET_UND	5
+#define REG_OFFSET(_reg) \
+	(offsetof(struct kvm_regs, _reg) / sizeof(u32))
+
+#define USR_REG_OFFSET(_num) REG_OFFSET(usr_regs.uregs[_num])
+
+static const unsigned long vcpu_reg_offsets[VCPU_NR_MODES][15] = {
+	/* USR/SYS Registers */
+	[VCPU_REG_OFFSET_USR] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12), USR_REG_OFFSET(13),	USR_REG_OFFSET(14),
+	},
+
+	/* FIQ Registers */
+	[VCPU_REG_OFFSET_FIQ] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7),
+		REG_OFFSET(fiq_regs[0]), /* r8 */
+		REG_OFFSET(fiq_regs[1]), /* r9 */
+		REG_OFFSET(fiq_regs[2]), /* r10 */
+		REG_OFFSET(fiq_regs[3]), /* r11 */
+		REG_OFFSET(fiq_regs[4]), /* r12 */
+		REG_OFFSET(fiq_regs[5]), /* r13 */
+		REG_OFFSET(fiq_regs[6]), /* r14 */
+	},
+
+	/* IRQ Registers */
+	[VCPU_REG_OFFSET_IRQ] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(irq_regs[0]), /* r13 */
+		REG_OFFSET(irq_regs[1]), /* r14 */
+	},
+
+	/* SVC Registers */
+	[VCPU_REG_OFFSET_SVC] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(svc_regs[0]), /* r13 */
+		REG_OFFSET(svc_regs[1]), /* r14 */
+	},
+
+	/* ABT Registers */
+	[VCPU_REG_OFFSET_ABT] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(abt_regs[0]), /* r13 */
+		REG_OFFSET(abt_regs[1]), /* r14 */
+	},
+
+	/* UND Registers */
+	[VCPU_REG_OFFSET_UND] = {
+		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
+		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
+		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
+		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
+		USR_REG_OFFSET(12),
+		REG_OFFSET(und_regs[0]), /* r13 */
+		REG_OFFSET(und_regs[1]), /* r14 */
+	},
+};
+
+/*
+ * Return a pointer to the register number valid in the current mode of
+ * the virtual CPU.
+ */
+u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num)
+{
+	u32 *reg_array = (u32 *)&vcpu->arch.regs;
+	u32 mode = *vcpu_cpsr(vcpu) & MODE_MASK;
+
+	switch (mode) {
+	case USR_MODE...SVC_MODE:
+		mode &= ~MODE32_BIT; /* 0 ... 3 */
+		break;
+
+	case ABT_MODE:
+		mode = VCPU_REG_OFFSET_ABT;
+		break;
+
+	case UND_MODE:
+		mode = VCPU_REG_OFFSET_UND;
+		break;
+
+	case SYSTEM_MODE:
+		mode = VCPU_REG_OFFSET_USR;
+		break;
+
+	default:
+		BUG();
+	}
+
+	return reg_array + vcpu_reg_offsets[mode][reg_num];
+}
+
+/*
+ * Return the SPSR for the current mode of the virtual CPU.
+ */
+u32 *vcpu_spsr(struct kvm_vcpu *vcpu)
+{
+	u32 mode = *vcpu_cpsr(vcpu) & MODE_MASK;
+	switch (mode) {
+	case SVC_MODE:
+		return &vcpu->arch.regs.KVM_ARM_SVC_spsr;
+	case ABT_MODE:
+		return &vcpu->arch.regs.KVM_ARM_ABT_spsr;
+	case UND_MODE:
+		return &vcpu->arch.regs.KVM_ARM_UND_spsr;
+	case IRQ_MODE:
+		return &vcpu->arch.regs.KVM_ARM_IRQ_spsr;
+	case FIQ_MODE:
+		return &vcpu->arch.regs.KVM_ARM_FIQ_spsr;
+	default:
+		BUG();
+	}
+}
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
new file mode 100644
index 0000000..8db3811
--- /dev/null
+++ b/arch/arm/kvm/guest.c
@@ -0,0 +1,221 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/fs.h>
+#include <asm/uaccess.h>
+#include <asm/kvm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+
+#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
+#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ NULL }
+};
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+static u64 core_reg_offset_from_id(u64 id)
+{
+	return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
+}
+
+static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	u32 __user *uaddr = (u32 __user *)(long)reg->addr;
+	struct kvm_regs *regs = &vcpu->arch.regs;
+	u64 off;
+
+	if (KVM_REG_SIZE(reg->id) != 4)
+		return -ENOENT;
+
+	/* Our ID is an index into the kvm_regs struct. */
+	off = core_reg_offset_from_id(reg->id);
+	if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
+		return -ENOENT;
+
+	return put_user(((u32 *)regs)[off], uaddr);
+}
+
+static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	u32 __user *uaddr = (u32 __user *)(long)reg->addr;
+	struct kvm_regs *regs = &vcpu->arch.regs;
+	u64 off, val;
+
+	if (KVM_REG_SIZE(reg->id) != 4)
+		return -ENOENT;
+
+	/* Our ID is an index into the kvm_regs struct. */
+	off = core_reg_offset_from_id(reg->id);
+	if (off >= sizeof(*regs) / KVM_REG_SIZE(reg->id))
+		return -ENOENT;
+
+	if (get_user(val, uaddr) != 0)
+		return -EFAULT;
+
+	if (off == KVM_REG_ARM_CORE_REG(usr_regs.ARM_cpsr)) {
+		unsigned long mode = val & MODE_MASK;
+		switch (mode) {
+		case USR_MODE:
+		case FIQ_MODE:
+		case IRQ_MODE:
+		case SVC_MODE:
+		case ABT_MODE:
+		case UND_MODE:
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	((u32 *)regs)[off] = val;
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	return -EINVAL;
+}
+
+static unsigned long num_core_regs(void)
+{
+	return sizeof(struct kvm_regs) / sizeof(u32);
+}
+
+/**
+ * kvm_arm_num_regs - how many registers do we present via KVM_GET_ONE_REG
+ *
+ * This is for all registers.
+ */
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
+{
+	return num_core_regs();
+}
+
+/**
+ * kvm_arm_copy_reg_indices - get indices of all registers.
+ *
+ * We do core registers right here, then we append coproc regs.
+ */
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
+{
+	unsigned int i;
+	const u64 core_reg = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE;
+
+	for (i = 0; i < sizeof(struct kvm_regs)/sizeof(u32); i++) {
+		if (put_user(core_reg | i, uindices))
+			return -EFAULT;
+		uindices++;
+	}
+
+	return 0;
+}
+
+int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	/* We currently use nothing arch-specific in upper 32 bits */
+	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM >> 32)
+		return -EINVAL;
+
+	/* Register group 16 means we want a core register. */
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
+		return get_core_reg(vcpu, reg);
+
+	return -EINVAL;
+}
+
+int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	/* We currently use nothing arch-specific in upper 32 bits */
+	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM >> 32)
+		return -EINVAL;
+
+	/* Register group 16 means we set a core register. */
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
+		return set_core_reg(vcpu, reg);
+
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+				  struct kvm_sregs *sregs)
+{
+	return -EINVAL;
+}
+
+int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
+			const struct kvm_vcpu_init *init)
+{
+	unsigned int i;
+
+	/* We can only do a Cortex-A15 for now. */
+	if (init->target != kvm_target_cpu())
+		return -EINVAL;
+
+	vcpu->arch.target = init->target;
+	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
+
+	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
+	for (i = 0; i < sizeof(init->features)*8; i++) {
+		if (init->features[i / 32] & (1 << (i % 32))) {
+			if (i >= KVM_VCPU_MAX_FEATURES)
+				return -ENOENT;
+			set_bit(i, vcpu->arch.features);
+		}
+	}
+
+	/* Now we know what it is, we can reset it. */
+	return kvm_reset_vcpu(vcpu);
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -EINVAL;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+				  struct kvm_translation *tr)
+{
+	return -EINVAL;
+}
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/init.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
new file mode 100644
index 0000000..1dc8926
--- /dev/null
+++ b/arch/arm/kvm/interrupts.S
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <asm/asm-offsets.h>
+#include <asm/kvm_asm.h>
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
new file mode 100644
index 0000000..10ed464
--- /dev/null
+++ b/arch/arm/kvm/mmu.c
@@ -0,0 +1,17 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
new file mode 100644
index 0000000..b80256b
--- /dev/null
+++ b/arch/arm/kvm/reset.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+
+#include <asm/unified.h>
+#include <asm/ptrace.h>
+#include <asm/cputype.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_coproc.h>
+
+/******************************************************************************
+ * Cortex-A15 Reset Values
+ */
+
+static const int a15_max_cpu_idx = 3;
+
+static struct kvm_regs a15_regs_reset = {
+	.usr_regs.ARM_cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
+};
+
+
+/*******************************************************************************
+ * Exported reset function
+ */
+
+/**
+ * kvm_reset_vcpu - sets core registers and cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architecturally defined reset values.
+ */
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_regs *cpu_reset;
+
+	switch (vcpu->arch.target) {
+	case KVM_ARM_TARGET_CORTEX_A15:
+		if (vcpu->vcpu_id > a15_max_cpu_idx)
+			return -EINVAL;
+		cpu_reset = &a15_regs_reset;
+		vcpu->arch.midr = read_cpuid_id();
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	/* Reset core registers */
+	memcpy(&vcpu->arch.regs, cpu_reset, sizeof(vcpu->arch.regs));
+
+	/* Reset CP15 registers */
+	kvm_reset_coprocs(vcpu);
+
+	return 0;
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
new file mode 100644
index 0000000..f8869c1
--- /dev/null
+++ b/arch/arm/kvm/trace.h
@@ -0,0 +1,52 @@
+#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+/*
+ * Tracepoints for entry/exit to guest
+ */
+TRACE_EVENT(kvm_entry,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+TRACE_EVENT(kvm_exit,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
+);
+
+
+
+#endif /* _TRACE_KVM_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH arch/arm/kvm
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e6e5d4b..24978d5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -764,6 +764,11 @@ struct kvm_dirty_tlb {
 #define KVM_REG_SIZE_U512	0x0060000000000000ULL
 #define KVM_REG_SIZE_U1024	0x0070000000000000ULL
 
+struct kvm_reg_list {
+	__u64 n; /* number of regs */
+	__u64 reg[0];
+};
+
 struct kvm_one_reg {
 	__u64 id;
 	__u64 addr;
@@ -932,6 +937,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_ONE_REG		  _IOW(KVMIO,  0xac, struct kvm_one_reg)
 /* VM is being stopped by host */
 #define KVM_KVMCLOCK_CTRL	  _IO(KVMIO,   0xad)
+#define KVM_ARM_VCPU_INIT	  _IOW(KVMIO,  0xae, struct kvm_vcpu_init)
+#define KVM_GET_REG_LIST	  _IOWR(KVMIO, 0xb0, struct kvm_reg_list)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)


* [PATCH v5 04/14] KVM: ARM: Hypervisor initialization
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:39   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Marc Zyngier, Marcelo Tosatti

Sets up KVM code to handle all exceptions taken to Hyp mode.

When the kernel is booted in Hyp mode, calling an hvc instruction with r0
pointing to the new vectors changes the HVBAR to point to those vectors.
This allows subsystems (like KVM here) to execute code in Hyp-mode with the
MMU disabled.

We initialize the other Hyp-mode registers and enable the MMU for Hyp-mode from
the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to
point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.
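
Condensed to a sketch (the real code is cpu_init_hyp_mode() in the arm.c hunk
below; casts and error handling are trimmed, and the function name here is
only illustrative), the per-CPU handshake is roughly:

  static void hyp_init_this_cpu(unsigned long init_phys_addr)
  {
  	unsigned long long pgd_ptr = kvm_mmu_get_httbr();
  	unsigned long stack_top = __get_cpu_var(kvm_arm_hyp_stack_page) + PAGE_SIZE;
  	unsigned long vector_ptr = (unsigned long)__kvm_hyp_vector;

  	/* Switch from the Hyp stub vectors to the KVM init vectors (HVBAR) */
  	__hyp_set_vectors(init_phys_addr);

  	/* Trap into __do_hyp_init: r0/r1 = Hyp pgd, r2 = stack top, r3 = vectors */
  	asm volatile("mov r0, %0\n\t"
  		     "mov r1, %1\n\t"
  		     "mov r2, %2\n\t"
  		     "mov r3, %3\n\t"
  		     "hvc #0\n\t" : :
  		     "r"((unsigned long)pgd_ptr),
  		     "r"((unsigned long)(pgd_ptr >> 32)),
  		     "r"(stack_top), "r"(vector_ptr) :
  		     "r0", "r1", "r2", "r3", "r12");
  }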

Also provides memory mapping code to map required code pages, data structures,
and I/O regions accessed in Hyp mode at the same virtual addresses as in the
host kernel, but conforming to the architectural requirements for translations
in Hyp mode. This interface is added in arch/arm/kvm/mmu.c (a usage sketch
follows the list below) and comprises:
 - create_hyp_mappings(from, to);
 - create_hyp_io_mappings(from, to, phys_addr);
 - free_hyp_pmds();
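
A minimal usage sketch (the wrapper name below is made up for illustration;
the real caller is init_hyp_mode() in the arm.c hunk, which maps the
world-switch code, the per-CPU Hyp stacks and the host VFP state this way):

  static int map_into_hyp_or_cleanup(void *start, void *end)
  {
  	int err;

  	/* Map [start, end) into Hyp mode at the same VA as in the kernel */
  	err = create_hyp_mappings(start, end);
  	if (err)
  		free_hyp_pmds();	/* tear down any partial Hyp mappings */
  	return err;
  }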

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h              |  124 ++++++++++++++
 arch/arm/include/asm/kvm_asm.h              |   20 ++
 arch/arm/include/asm/kvm_mmu.h              |   29 +++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    4 
 arch/arm/kvm/arm.c                          |  177 +++++++++++++++++++
 arch/arm/kvm/init.S                         |   95 ++++++++++
 arch/arm/kvm/interrupts.S                   |   37 ++++
 arch/arm/kvm/mmu.c                          |  248 +++++++++++++++++++++++++++
 8 files changed, 734 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index c196a22..613afe2 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -21,4 +21,128 @@
 
 #include <asm/types.h>
 
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE		(1 << 27)
+#define HCR_TVM		(1 << 26)
+#define HCR_TTLB	(1 << 25)
+#define HCR_TPU		(1 << 24)
+#define HCR_TPC		(1 << 23)
+#define HCR_TSW		(1 << 22)
+#define HCR_TAC		(1 << 21)
+#define HCR_TIDCP	(1 << 20)
+#define HCR_TSC		(1 << 19)
+#define HCR_TID3	(1 << 18)
+#define HCR_TID2	(1 << 17)
+#define HCR_TID1	(1 << 16)
+#define HCR_TID0	(1 << 15)
+#define HCR_TWE		(1 << 14)
+#define HCR_TWI		(1 << 13)
+#define HCR_DC		(1 << 12)
+#define HCR_BSU		(3 << 10)
+#define HCR_BSU_IS	(1 << 10)
+#define HCR_FB		(1 << 9)
+#define HCR_VA		(1 << 8)
+#define HCR_VI		(1 << 7)
+#define HCR_VF		(1 << 6)
+#define HCR_AMO		(1 << 5)
+#define HCR_IMO		(1 << 4)
+#define HCR_FMO		(1 << 3)
+#define HCR_PTW		(1 << 2)
+#define HCR_SWIO	(1 << 1)
+#define HCR_VM		1
+
+/*
+ * The bits we set in HCR:
+ * TAC:		Trap ACTLR
+ * TSC:		Trap SMC
+ * TSW:		Trap cache operations by set/way
+ * TWI:		Trap WFI
+ * TIDCP:	Trap L2CTLR/L2ECTLR
+ * BSU_IS:	Upgrade barriers to the inner shareable domain
+ * FB:		Force broadcast of all maintenance operations
+ * AMO:		Override CPSR.A and enable signaling with VA
+ * IMO:		Override CPSR.I and enable signaling with VI
+ * FMO:		Override CPSR.F and enable signaling with VF
+ * SWIO:	Turn set/way invalidates into set/way clean+invalidate
+ */
+#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
+			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+			HCR_SWIO | HCR_TIDCP)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE	(1 << 30)
+#define HSCTLR_EE	(1 << 25)
+#define HSCTLR_FI	(1 << 21)
+#define HSCTLR_WXN	(1 << 19)
+#define HSCTLR_I	(1 << 12)
+#define HSCTLR_C	(1 << 2)
+#define HSCTLR_A	(1 << 1)
+#define HSCTLR_M	1
+#define HSCTLR_MASK	(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+			 HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE	(1 << 31)
+#define TTBCR_IMP	(1 << 30)
+#define TTBCR_SH1	(3 << 28)
+#define TTBCR_ORGN1	(3 << 26)
+#define TTBCR_IRGN1	(3 << 24)
+#define TTBCR_EPD1	(1 << 23)
+#define TTBCR_A1	(1 << 22)
+#define TTBCR_T1SZ	(3 << 16)
+#define TTBCR_SH0	(3 << 12)
+#define TTBCR_ORGN0	(3 << 10)
+#define TTBCR_IRGN0	(3 << 8)
+#define TTBCR_EPD0	(1 << 7)
+#define TTBCR_T0SZ	3
+#define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
+
+/* Hyp Debug Configuration Register bits */
+#define HDCR_TDRA	(1 << 11)
+#define HDCR_TDOSA	(1 << 10)
+#define HDCR_TDA	(1 << 9)
+#define HDCR_TDE	(1 << 8)
+#define HDCR_HPME	(1 << 7)
+#define HDCR_TPM	(1 << 6)
+#define HDCR_TPMCR	(1 << 5)
+#define HDCR_HPMN_MASK	(0x1F)
+
+/*
+ * The architecture supports 40-bit IPA as input to the 2nd stage translations
+ * and PTRS_PER_S2_PGD becomes 1024, because each entry covers 1GB of address
+ * space.
+ */
+#define KVM_PHYS_SHIFT	(40)
+#define KVM_PHYS_SIZE	(1ULL << KVM_PHYS_SHIFT)
+#define KVM_PHYS_MASK	(KVM_PHYS_SIZE - 1ULL)
+#define PTRS_PER_S2_PGD	(1ULL << (KVM_PHYS_SHIFT - 30))
+#define S2_PGD_ORDER	get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
+#define S2_PGD_SIZE	(1 << S2_PGD_ORDER)
+
+/* Virtualization Translation Control Register (VTCR) bits */
+#define VTCR_SH0	(3 << 12)
+#define VTCR_ORGN0	(3 << 10)
+#define VTCR_IRGN0	(3 << 8)
+#define VTCR_SL0	(3 << 6)
+#define VTCR_S		(1 << 4)
+#define VTCR_T0SZ	(0xf)
+#define VTCR_MASK	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0 | VTCR_SL0 | \
+			 VTCR_S | VTCR_T0SZ)
+#define VTCR_HTCR_SH	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0)
+#define VTCR_SL_L2	(0 << 6)	/* Starting-level: 2 */
+#define VTCR_SL_L1	(1 << 6)	/* Starting-level: 1 */
+#define KVM_VTCR_SL0	VTCR_SL_L1
+/* stage-2 input address range defined as 2^(32-T0SZ) */
+#define KVM_T0SZ	(32 - KVM_PHYS_SHIFT)
+#define KVM_VTCR_T0SZ	(KVM_T0SZ & VTCR_T0SZ)
+#define KVM_VTCR_S	((KVM_VTCR_T0SZ << 1) & VTCR_S)
+
+/* Virtualization Translation Table Base Register (VTTBR) bits */
+#if KVM_VTCR_SL0 == VTCR_SL_L2	/* see ARM DDI 0406C: B4-1720 */
+#define VTTBR_X		(14 - KVM_T0SZ)
+#else
+#define VTTBR_X		(5 - KVM_T0SZ)
+#endif
+
+
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index f9993e5..81324e2 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -54,5 +54,25 @@
 #define ARM_EXCEPTION_DATA_ABORT  4
 #define ARM_EXCEPTION_IRQ	  5
 #define ARM_EXCEPTION_FIQ	  6
+#define ARM_EXCEPTION_HVC	  7
+
+#ifndef __ASSEMBLY__
+struct kvm_vcpu;
+
+extern char __kvm_hyp_init[];
+extern char __kvm_hyp_init_end[];
+
+extern char __kvm_hyp_exit[];
+extern char __kvm_hyp_exit_end[];
+
+extern char __kvm_hyp_vector[];
+
+extern char __kvm_hyp_code_start[];
+extern char __kvm_hyp_code_end[];
+
+extern void __kvm_flush_vm_context(void);
+
+extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+#endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
new file mode 100644
index 0000000..e8679b3
--- /dev/null
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_MMU_H__
+#define __ARM_KVM_MMU_H__
+
+int create_hyp_mappings(void *from, void *to);
+int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
+void free_hyp_pmds(void);
+
+phys_addr_t kvm_mmu_get_httbr(void);
+int kvm_mmu_init(void);
+void kvm_clear_hyp_idmap(void);
+#endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index a2d404e..18f5cef 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -32,6 +32,9 @@
 #define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)
 #define PMD_BIT4		(_AT(pmdval_t, 0))
 #define PMD_DOMAIN(x)		(_AT(pmdval_t, 0))
+#define PMD_APTABLE_SHIFT	(61)
+#define PMD_APTABLE		(_AT(pgdval_t, 3) << PMD_APTABLE_SHIFT)
+#define PMD_PXNTABLE		(_AT(pgdval_t, 1) << 59)
 
 /*
  *   - section
@@ -41,6 +44,7 @@
 #define PMD_SECT_S		(_AT(pmdval_t, 3) << 8)
 #define PMD_SECT_AF		(_AT(pmdval_t, 1) << 10)
 #define PMD_SECT_nG		(_AT(pmdval_t, 1) << 11)
+#define PMD_SECT_PXN		(_AT(pmdval_t, 1) << 53)
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 82cb338..2dddc58 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -34,11 +34,21 @@
 #include <asm/ptrace.h>
 #include <asm/mman.h>
 #include <asm/cputype.h>
+#include <asm/tlbflush.h>
+#include <asm/virt.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
 #endif
 
+static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
+static unsigned long hyp_default_vectors;
+
+
 int kvm_arch_hardware_enable(void *garbage)
 {
 	return 0;
@@ -336,9 +346,176 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	return -EINVAL;
 }
 
+static void cpu_init_hyp_mode(void *vector)
+{
+	unsigned long long pgd_ptr;
+	unsigned long hyp_stack_ptr;
+	unsigned long stack_page;
+	unsigned long vector_ptr;
+
+	/* Switch from the HYP stub to our own HYP init vector */
+	__hyp_set_vectors((unsigned long)vector);
+
+	pgd_ptr = (unsigned long long)kvm_mmu_get_httbr();
+	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
+	hyp_stack_ptr = stack_page + PAGE_SIZE;
+	vector_ptr = (unsigned long)__kvm_hyp_vector;
+
+	/*
+	 * Call initialization code, and switch to the full blown
+	 * HYP code. The init code corrupts r12, so set the clobber
+	 * list accordingly.
+	 */
+	asm volatile (
+		"mov	r0, %[pgd_ptr_low]\n\t"
+		"mov	r1, %[pgd_ptr_high]\n\t"
+		"mov	r2, %[hyp_stack_ptr]\n\t"
+		"mov	r3, %[vector_ptr]\n\t"
+		"hvc	#0\n\t" : :
+		[pgd_ptr_low] "r" ((unsigned long)(pgd_ptr & 0xffffffff)),
+		[pgd_ptr_high] "r" ((unsigned long)(pgd_ptr >> 32ULL)),
+		[hyp_stack_ptr] "r" (hyp_stack_ptr),
+		[vector_ptr] "r" (vector_ptr) :
+		"r0", "r1", "r2", "r3", "r12");
+}
+
+/**
+ * Inits Hyp-mode on all online CPUs
+ */
+static int init_hyp_mode(void)
+{
+	phys_addr_t init_phys_addr;
+	int cpu;
+	int err = 0;
+
+	/*
+	 * Allocate Hyp PGD and setup Hyp identity mapping
+	 */
+	err = kvm_mmu_init();
+	if (err)
+		goto out_err;
+
+	/*
+	 * It is probably enough to obtain the default on one
+	 * CPU. It's unlikely to be different on the others.
+	 */
+	hyp_default_vectors = __hyp_get_vectors();
+
+	/*
+	 * Allocate stack pages for Hypervisor-mode
+	 */
+	for_each_possible_cpu(cpu) {
+		unsigned long stack_page;
+
+		stack_page = __get_free_page(GFP_KERNEL);
+		if (!stack_page) {
+			err = -ENOMEM;
+			goto out_free_stack_pages;
+		}
+
+		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
+	}
+
+	/*
+	 * Execute the init code on each CPU.
+	 *
+	 * Note: The stack is not mapped yet, so don't do anything other than
+	 * initializing the hypervisor mode on each CPU using a local stack
+	 * space for temporary storage.
+	 */
+	init_phys_addr = virt_to_phys(__kvm_hyp_init);
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, cpu_init_hyp_mode,
+					 (void *)(long)init_phys_addr, 1);
+	}
+
+	/*
+	 * Unmap the identity mapping
+	 */
+	kvm_clear_hyp_idmap();
+
+	/*
+	 * Map the Hyp-code called directly from the host
+	 */
+	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
+	if (err) {
+		kvm_err("Cannot map world-switch code\n");
+		goto out_free_mappings;
+	}
+
+	/*
+	 * Map the Hyp stack pages
+	 */
+	for_each_possible_cpu(cpu) {
+		char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
+		err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE);
+
+		if (err) {
+			kvm_err("Cannot map hyp stack\n");
+			goto out_free_mappings;
+		}
+	}
+
+	/*
+	 * Map the host VFP structures
+	 */
+	kvm_host_vfp_state = alloc_percpu(struct vfp_hard_struct);
+	if (!kvm_host_vfp_state) {
+		err = -ENOMEM;
+		kvm_err("Cannot allocate host VFP state\n");
+		goto out_free_mappings;
+	}
+
+	for_each_possible_cpu(cpu) {
+		struct vfp_hard_struct *vfp;
+
+		vfp = per_cpu_ptr(kvm_host_vfp_state, cpu);
+		err = create_hyp_mappings(vfp, vfp + 1);
+
+		if (err) {
+			kvm_err("Cannot map host VFP state: %d\n", err);
+			goto out_free_vfp;
+		}
+	}
+
+	kvm_info("Hyp mode initialized successfully\n");
+	return 0;
+out_free_vfp:
+	free_percpu(kvm_host_vfp_state);
+out_free_mappings:
+	free_hyp_pmds();
+out_free_stack_pages:
+	for_each_possible_cpu(cpu)
+		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+out_err:
+	kvm_err("error initializing Hyp mode: %d\n", err);
+	return err;
+}
+
+/**
+ * Initialize Hyp-mode and memory mappings on all CPUs.
+ */
 int kvm_arch_init(void *opaque)
 {
+	int err;
+
+	if (!is_hyp_mode_available()) {
+		kvm_err("HYP mode not available\n");
+		return -ENODEV;
+	}
+
+	if (kvm_target_cpu() < 0) {
+		kvm_err("Target CPU not supported!\n");
+		return -ENODEV;
+	}
+
+	err = init_hyp_mode();
+	if (err)
+		goto out_err;
+
 	return 0;
+out_err:
+	return err;
 }
 
 /* NOP: Compiling as a module not supported */
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 1dc8926..f179f10 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -15,5 +15,100 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/linkage.h>
+#include <asm/unified.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+/********************************************************************
+ * Hypervisor initialization
+ *   - should be called with:
+ *       r0,r1 = Hypervisor pgd pointer
+ *       r2 = top of Hyp stack (kernel VA)
+ *       r3 = pointer to hyp vectors
+ */
+
+	.text
+	.pushsection    .hyp.idmap.text,"ax"
+	.align 5
+__kvm_hyp_init:
+	.globl __kvm_hyp_init
+
+	@ Hyp-mode exception vector
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	__do_hyp_init
+	W(b)	.
+	W(b)	.
+
+__do_hyp_init:
+	@ Set the HTTBR to point to the hypervisor PGD pointer passed
+	mcrr	p15, 4, r0, r1, c2
+
+	@ Set the HTCR and VTCR to the same shareability and cacheability
+	@ settings as the non-secure TTBCR and with T0SZ == 0.
+	mrc	p15, 4, r0, c2, c0, 2	@ HTCR
+	ldr	r12, =HTCR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c2, c0, 2	@ TTBCR
+	and	r1, r1, #(HTCR_MASK & ~TTBCR_T0SZ)
+	orr	r0, r0, r1
+	mcr	p15, 4, r0, c2, c0, 2	@ HTCR
+
+	mrc	p15, 4, r1, c2, c1, 2	@ VTCR
+	ldr	r12, =VTCR_MASK
+	bic	r1, r1, r12
+	bic	r0, r0, #(~VTCR_HTCR_SH)	@ clear non-reusable HTCR bits
+	orr	r1, r0, r1
+	orr	r1, r1, #(KVM_VTCR_SL0 | KVM_VTCR_T0SZ | KVM_VTCR_S)
+	mcr	p15, 4, r1, c2, c1, 2	@ VTCR
+
+	@ Use the same memory attributes for hyp. accesses as the kernel
+	@ (copy MAIRx to HMAIRx).
+	mrc	p15, 0, r0, c10, c2, 0
+	mcr	p15, 4, r0, c10, c2, 0
+	mrc	p15, 0, r0, c10, c2, 1
+	mcr	p15, 4, r0, c10, c2, 1
+
+	@ Set the HSCTLR to:
+	@  - ARM/THUMB exceptions: Kernel config (Thumb-2 kernel)
+	@  - Endianness: Kernel config
+	@  - Fast Interrupt Features: Kernel config
+	@  - Write permission implies XN: disabled
+	@  - Instruction cache: enabled
+	@  - Data/Unified cache: enabled
+	@  - Memory alignment checks: enabled
+	@  - MMU: enabled (this code must be run from an identity mapping)
+	mrc	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	ldr	r12, =HSCTLR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c1, c0, 0	@ SCTLR
+	ldr	r12, =(HSCTLR_EE | HSCTLR_FI)
+	and	r1, r1, r12
+ ARM(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I)			)
+ THUMB(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I | HSCTLR_TE)	)
+	orr	r1, r1, r12
+	orr	r0, r0, r1
+	isb
+	mcr	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	isb
+
+	@ Set stack pointer and return to the kernel
+	mov	sp, r2
+
+	@ Set HVBAR to point to the HYP vectors
+	mcr	p15, 4, r3, c12, c0, 0	@ HVBAR
+
+	eret
+
+	.ltorg
+
+	.globl __kvm_hyp_init_end
+__kvm_hyp_init_end:
+
+	.popsection
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 1dc8926..6bde17b 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -15,5 +15,42 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/unified.h>
+#include <asm/page.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+	.text
+
+__kvm_hyp_code_start:
+	.globl __kvm_hyp_code_start
+
+/********************************************************************
+ * Flush per-VMID TLBs
+ */
+ENTRY(__kvm_flush_vm_context)
+	bx	lr
+ENDPROC(__kvm_flush_vm_context)
+
+/********************************************************************
+ *  Hypervisor world-switch code
+ */
+ENTRY(__kvm_vcpu_run)
+	bx	lr
+
+
+/********************************************************************
+ * Hypervisor exception vector and handlers
+ */
+
+	.align 5
+__kvm_hyp_vector:
+	.globl __kvm_hyp_vector
+	nop
+
+__kvm_hyp_code_end:
+	.globl	__kvm_hyp_code_end
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 10ed464..4decdb6 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -15,3 +15,251 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/mman.h>
+#include <linux/kvm_host.h>
+#include <linux/io.h>
+#include <asm/idmap.h>
+#include <asm/pgalloc.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+#include <asm/mach/map.h>
+
+extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
+
+static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
+
+static void kvm_set_pte(pte_t *pte, pte_t new_pte)
+{
+	pte_val(*pte) = new_pte;
+	/*
+	 * flush_pmd_entry just takes a void pointer and cleans the necessary
+	 * cache entries, so we can reuse the function for ptes.
+	 */
+	flush_pmd_entry(pte);
+}
+
+static void free_ptes(pmd_t *pmd, unsigned long addr)
+{
+	pte_t *pte;
+	unsigned int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			pte_free_kernel(NULL, pte);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * free_hyp_pmds - free the Hyp-mode level-2 tables and child level-3 tables
+ *
+ * Assumes this is a page table used strictly in Hyp-mode and therefore contains
+ * only mappings in the kernel memory area, which is above PAGE_OFFSET.
+ */
+void free_hyp_pmds(void)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = PAGE_OFFSET; addr != 0; addr += PGDIR_SIZE) {
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none(*pud))
+			continue;
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+		pud_clear(pud);
+	}
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+}
+
+static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
+				    unsigned long end)
+{
+	pte_t *pte;
+	unsigned long addr;
+	struct page *page;
+
+	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
+		pte = pte_offset_kernel(pmd, addr);
+		BUG_ON(!virt_addr_valid(addr));
+		page = virt_to_page(addr);
+		kvm_set_pte(pte, mk_pte(page, PAGE_HYP));
+	}
+}
+
+static void create_hyp_io_pte_mappings(pmd_t *pmd, unsigned long start,
+				       unsigned long end,
+				       unsigned long *pfn_base)
+{
+	pte_t *pte;
+	unsigned long addr;
+
+	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
+		pte = pte_offset_kernel(pmd, addr);
+		BUG_ON(pfn_valid(*pfn_base));
+		kvm_set_pte(pte, pfn_pte(*pfn_base, PAGE_HYP_DEVICE));
+		(*pfn_base)++;
+	}
+}
+
+static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
+				   unsigned long end, unsigned long *pfn_base)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long addr, next;
+
+	for (addr = start; addr < end; addr = next) {
+		pmd = pmd_offset(pud, addr);
+
+		BUG_ON(pmd_sect(*pmd));
+
+		if (pmd_none(*pmd)) {
+			pte = pte_alloc_one_kernel(NULL, addr);
+			if (!pte) {
+				kvm_err("Cannot allocate Hyp pte\n");
+				return -ENOMEM;
+			}
+			pmd_populate_kernel(NULL, pmd, pte);
+		}
+
+		next = pmd_addr_end(addr, end);
+
+		/*
+		 * If pfn_base is NULL, we map kernel pages into HYP with the
+		 * virtual address. Otherwise, this is considered an I/O
+		 * mapping and we map the physical region starting at
+		 * *pfn_base to [start, end[.
+		 */
+		if (!pfn_base)
+			create_hyp_pte_mappings(pmd, addr, next);
+		else
+			create_hyp_io_pte_mappings(pmd, addr, next, pfn_base);
+	}
+
+	return 0;
+}
+
+static int __create_hyp_mappings(void *from, void *to, unsigned long *pfn_base)
+{
+	unsigned long start = (unsigned long)from;
+	unsigned long end = (unsigned long)to;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr, next;
+	int err = 0;
+
+	BUG_ON(start > end);
+	if (start < PAGE_OFFSET)
+		return -EINVAL;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = start; addr < end; addr = next) {
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none_or_clear_bad(pud)) {
+			pmd = pmd_alloc_one(NULL, addr);
+			if (!pmd) {
+				kvm_err("Cannot allocate Hyp pmd\n");
+				err = -ENOMEM;
+				goto out;
+			}
+			pud_populate(NULL, pud, pmd);
+		}
+
+		next = pgd_addr_end(addr, end);
+		err = create_hyp_pmd_mappings(pud, addr, next, pfn_base);
+		if (err)
+			goto out;
+	}
+out:
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+	return err;
+}
+
+/**
+ * create_hyp_mappings - map a kernel virtual address range in Hyp mode
+ * @from:	The virtual kernel start address of the range
+ * @to:		The virtual kernel end address of the range (exclusive)
+ *
+ * The same virtual address as the kernel virtual address is also used in
+ * Hyp-mode mapping to the same underlying physical pages.
+ *
+ * Note: Wrapping around zero in the "to" address is not supported.
+ */
+int create_hyp_mappings(void *from, void *to)
+{
+	return __create_hyp_mappings(from, to, NULL);
+}
+
+/**
+ * create_hyp_io_mappings - map a physical IO range in Hyp mode
+ * @from:	The virtual HYP start address of the range
+ * @to:		The virtual HYP end address of the range (exclusive)
+ * @addr:	The physical start address which gets mapped
+ */
+int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
+{
+	unsigned long pfn = __phys_to_pfn(addr);
+	return __create_hyp_mappings(from, to, &pfn);
+}
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+phys_addr_t kvm_mmu_get_httbr(void)
+{
+	VM_BUG_ON(!virt_addr_valid(hyp_pgd));
+	return virt_to_phys(hyp_pgd);
+}
+
+int kvm_mmu_init(void)
+{
+	return hyp_pgd ? 0 : -ENOMEM;
+}
+
+/**
+ * kvm_clear_idmap - remove all idmaps from the hyp pgd
+ *
+ * Free the underlying pmds for all pgds in range and clear the pgds (but
+ * don't free them) afterwards.
+ */
+void kvm_clear_hyp_idmap(void)
+{
+	unsigned long addr, end;
+	unsigned long next;
+	pgd_t *pgd = hyp_pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	addr = virt_to_phys(__hyp_idmap_text_start);
+	end = virt_to_phys(__hyp_idmap_text_end);
+
+	pgd += pgd_index(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		pud = pud_offset(pgd, addr);
+		pmd = pmd_offset(pud, addr);
+
+		pud_clear(pud);
+		clean_pmd_entry(pmd);
+		pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
+	} while (pgd++, addr = next, addr < end);
+}


* [PATCH v5 04/14] KVM: ARM: Hypervisor initialization
@ 2013-01-08 18:39   ` Christoffer Dall
  0 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: linux-arm-kernel

Sets up KVM code to handle all exceptions taken to Hyp mode.

When the kernel is booted in Hyp mode, issuing an hvc instruction with r0
pointing to the new vectors changes the HVBAR to point to those vectors.
This allows subsystems (like KVM here) to execute code in Hyp-mode with the
MMU disabled.
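
For illustration, here is a minimal sketch (hypothetical helper name) of how
KVM hands the physical address of its init vectors to the HYP stub; it mirrors
what cpu_init_hyp_mode() does later in this patch:

#include <linux/types.h>
#include <asm/memory.h>
#include <asm/virt.h>
#include <asm/kvm_asm.h>

/* Hypothetical helper: point HVBAR at __kvm_hyp_init via the HYP stub. */
static void example_install_hyp_init_vectors(void)
{
	phys_addr_t init_phys = virt_to_phys(__kvm_hyp_init);

	/*
	 * The stub's HVC handler copies r0 into HVBAR, so the next trap to
	 * Hyp mode enters the __kvm_hyp_init vector page.
	 */
	__hyp_set_vectors((unsigned long)init_phys);
}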

We initialize other Hyp-mode registers and enable the MMU for Hyp-mode from
the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to
point to the KVM Hyp vectors, which are used to catch guest faults and to
switch to Hyp mode to perform a world-switch into a KVM guest.

Also provides memory mapping code to map required code pages, data structures,
and I/O regions accessed in Hyp mode at the same virtual addresses as the host
kernel uses, while conforming to the architectural requirements for
translations in Hyp mode. This interface is added in arch/arm/kvm/mmu.c
(a short usage sketch follows the list) and comprises:
 - create_hyp_mappings(from, to);
 - create_hyp_io_mappings(from, to, phys_addr);
 - free_hyp_pmds();
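
The sketch below uses a hypothetical caller and arguments, modeled on the
init_hyp_mode() calls later in this patch; the device mapping is purely an
example:

#include <linux/mm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_mmu.h>

/*
 * Hypothetical caller: mirror the world-switch code and one stack page into
 * the Hyp page tables at their kernel VAs, then map a device region.
 */
static int example_setup_hyp_mappings(char *stack_page, void *io_va,
				      phys_addr_t io_pa)
{
	int err;

	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
	if (err)
		return err;

	err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE);
	if (err)
		goto out_free;

	/* I/O mappings get Device memory attributes (PAGE_HYP_DEVICE) */
	err = create_hyp_io_mappings(io_va, io_va + PAGE_SIZE, io_pa);
	if (err)
		goto out_free;

	return 0;

out_free:
	free_hyp_pmds();	/* tears down all Hyp-mode mappings */
	return err;
}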

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h              |  124 ++++++++++++++
 arch/arm/include/asm/kvm_asm.h              |   20 ++
 arch/arm/include/asm/kvm_mmu.h              |   29 +++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    4 
 arch/arm/kvm/arm.c                          |  177 +++++++++++++++++++
 arch/arm/kvm/init.S                         |   95 ++++++++++
 arch/arm/kvm/interrupts.S                   |   37 ++++
 arch/arm/kvm/mmu.c                          |  248 +++++++++++++++++++++++++++
 8 files changed, 734 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index c196a22..613afe2 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -21,4 +21,128 @@
 
 #include <asm/types.h>
 
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE		(1 << 27)
+#define HCR_TVM		(1 << 26)
+#define HCR_TTLB	(1 << 25)
+#define HCR_TPU		(1 << 24)
+#define HCR_TPC		(1 << 23)
+#define HCR_TSW		(1 << 22)
+#define HCR_TAC		(1 << 21)
+#define HCR_TIDCP	(1 << 20)
+#define HCR_TSC		(1 << 19)
+#define HCR_TID3	(1 << 18)
+#define HCR_TID2	(1 << 17)
+#define HCR_TID1	(1 << 16)
+#define HCR_TID0	(1 << 15)
+#define HCR_TWE		(1 << 14)
+#define HCR_TWI		(1 << 13)
+#define HCR_DC		(1 << 12)
+#define HCR_BSU		(3 << 10)
+#define HCR_BSU_IS	(1 << 10)
+#define HCR_FB		(1 << 9)
+#define HCR_VA		(1 << 8)
+#define HCR_VI		(1 << 7)
+#define HCR_VF		(1 << 6)
+#define HCR_AMO		(1 << 5)
+#define HCR_IMO		(1 << 4)
+#define HCR_FMO		(1 << 3)
+#define HCR_PTW		(1 << 2)
+#define HCR_SWIO	(1 << 1)
+#define HCR_VM		1
+
+/*
+ * The bits we set in HCR:
+ * TAC:		Trap ACTLR
+ * TSC:		Trap SMC
+ * TSW:		Trap cache operations by set/way
+ * TWI:		Trap WFI
+ * TIDCP:	Trap L2CTLR/L2ECTLR
+ * BSU_IS:	Upgrade barriers to the inner shareable domain
+ * FB:		Force broadcast of all maintenance operations
+ * AMO:		Override CPSR.A and enable signaling with VA
+ * IMO:		Override CPSR.I and enable signaling with VI
+ * FMO:		Override CPSR.F and enable signaling with VF
+ * SWIO:	Turn set/way invalidates into set/way clean+invalidate
+ */
+#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
+			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+			HCR_SWIO | HCR_TIDCP)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE	(1 << 30)
+#define HSCTLR_EE	(1 << 25)
+#define HSCTLR_FI	(1 << 21)
+#define HSCTLR_WXN	(1 << 19)
+#define HSCTLR_I	(1 << 12)
+#define HSCTLR_C	(1 << 2)
+#define HSCTLR_A	(1 << 1)
+#define HSCTLR_M	1
+#define HSCTLR_MASK	(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+			 HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE	(1 << 31)
+#define TTBCR_IMP	(1 << 30)
+#define TTBCR_SH1	(3 << 28)
+#define TTBCR_ORGN1	(3 << 26)
+#define TTBCR_IRGN1	(3 << 24)
+#define TTBCR_EPD1	(1 << 23)
+#define TTBCR_A1	(1 << 22)
+#define TTBCR_T1SZ	(3 << 16)
+#define TTBCR_SH0	(3 << 12)
+#define TTBCR_ORGN0	(3 << 10)
+#define TTBCR_IRGN0	(3 << 8)
+#define TTBCR_EPD0	(1 << 7)
+#define TTBCR_T0SZ	3
+#define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
+
+/* Hyp Debug Configuration Register bits */
+#define HDCR_TDRA	(1 << 11)
+#define HDCR_TDOSA	(1 << 10)
+#define HDCR_TDA	(1 << 9)
+#define HDCR_TDE	(1 << 8)
+#define HDCR_HPME	(1 << 7)
+#define HDCR_TPM	(1 << 6)
+#define HDCR_TPMCR	(1 << 5)
+#define HDCR_HPMN_MASK	(0x1F)
+
+/*
+ * The architecture supports 40-bit IPA as input to the 2nd stage translations
+ * and PTRS_PER_S2_PGD becomes 1024, because each entry covers 1GB of address
+ * space.
+ */
+#define KVM_PHYS_SHIFT	(40)
+#define KVM_PHYS_SIZE	(1ULL << KVM_PHYS_SHIFT)
+#define KVM_PHYS_MASK	(KVM_PHYS_SIZE - 1ULL)
+#define PTRS_PER_S2_PGD	(1ULL << (KVM_PHYS_SHIFT - 30))
+#define S2_PGD_ORDER	get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
+#define S2_PGD_SIZE	(1 << S2_PGD_ORDER)
+
+/* Virtualization Translation Control Register (VTCR) bits */
+#define VTCR_SH0	(3 << 12)
+#define VTCR_ORGN0	(3 << 10)
+#define VTCR_IRGN0	(3 << 8)
+#define VTCR_SL0	(3 << 6)
+#define VTCR_S		(1 << 4)
+#define VTCR_T0SZ	(0xf)
+#define VTCR_MASK	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0 | VTCR_SL0 | \
+			 VTCR_S | VTCR_T0SZ)
+#define VTCR_HTCR_SH	(VTCR_SH0 | VTCR_ORGN0 | VTCR_IRGN0)
+#define VTCR_SL_L2	(0 << 6)	/* Starting-level: 2 */
+#define VTCR_SL_L1	(1 << 6)	/* Starting-level: 1 */
+#define KVM_VTCR_SL0	VTCR_SL_L1
+/* stage-2 input address range defined as 2^(32-T0SZ) */
+#define KVM_T0SZ	(32 - KVM_PHYS_SHIFT)
+#define KVM_VTCR_T0SZ	(KVM_T0SZ & VTCR_T0SZ)
+#define KVM_VTCR_S	((KVM_VTCR_T0SZ << 1) & VTCR_S)
+
+/* Virtualization Translation Table Base Register (VTTBR) bits */
+#if KVM_VTCR_SL0 == VTCR_SL_L2	/* see ARM DDI 0406C: B4-1720 */
+#define VTTBR_X		(14 - KVM_T0SZ)
+#else
+#define VTTBR_X		(5 - KVM_T0SZ)
+#endif
+
+
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index f9993e5..81324e2 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -54,5 +54,25 @@
 #define ARM_EXCEPTION_DATA_ABORT  4
 #define ARM_EXCEPTION_IRQ	  5
 #define ARM_EXCEPTION_FIQ	  6
+#define ARM_EXCEPTION_HVC	  7
+
+#ifndef __ASSEMBLY__
+struct kvm_vcpu;
+
+extern char __kvm_hyp_init[];
+extern char __kvm_hyp_init_end[];
+
+extern char __kvm_hyp_exit[];
+extern char __kvm_hyp_exit_end[];
+
+extern char __kvm_hyp_vector[];
+
+extern char __kvm_hyp_code_start[];
+extern char __kvm_hyp_code_end[];
+
+extern void __kvm_flush_vm_context(void);
+
+extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+#endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
new file mode 100644
index 0000000..e8679b3
--- /dev/null
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_MMU_H__
+#define __ARM_KVM_MMU_H__
+
+int create_hyp_mappings(void *from, void *to);
+int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
+void free_hyp_pmds(void);
+
+phys_addr_t kvm_mmu_get_httbr(void);
+int kvm_mmu_init(void);
+void kvm_clear_hyp_idmap(void);
+#endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index a2d404e..18f5cef 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -32,6 +32,9 @@
 #define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)
 #define PMD_BIT4		(_AT(pmdval_t, 0))
 #define PMD_DOMAIN(x)		(_AT(pmdval_t, 0))
+#define PMD_APTABLE_SHIFT	(61)
+#define PMD_APTABLE		(_AT(pgdval_t, 3) << PMD_APTABLE_SHIFT)
+#define PMD_PXNTABLE		(_AT(pgdval_t, 1) << 59)
 
 /*
  *   - section
@@ -41,6 +44,7 @@
 #define PMD_SECT_S		(_AT(pmdval_t, 3) << 8)
 #define PMD_SECT_AF		(_AT(pmdval_t, 1) << 10)
 #define PMD_SECT_nG		(_AT(pmdval_t, 1) << 11)
+#define PMD_SECT_PXN		(_AT(pmdval_t, 1) << 53)
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 82cb338..2dddc58 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -34,11 +34,21 @@
 #include <asm/ptrace.h>
 #include <asm/mman.h>
 #include <asm/cputype.h>
+#include <asm/tlbflush.h>
+#include <asm/virt.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
 #endif
 
+static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
+static unsigned long hyp_default_vectors;
+
+
 int kvm_arch_hardware_enable(void *garbage)
 {
 	return 0;
@@ -336,9 +346,176 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	return -EINVAL;
 }
 
+static void cpu_init_hyp_mode(void *vector)
+{
+	unsigned long long pgd_ptr;
+	unsigned long hyp_stack_ptr;
+	unsigned long stack_page;
+	unsigned long vector_ptr;
+
+	/* Switch from the HYP stub to our own HYP init vector */
+	__hyp_set_vectors((unsigned long)vector);
+
+	pgd_ptr = (unsigned long long)kvm_mmu_get_httbr();
+	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
+	hyp_stack_ptr = stack_page + PAGE_SIZE;
+	vector_ptr = (unsigned long)__kvm_hyp_vector;
+
+	/*
+	 * Call initialization code, and switch to the full blown
+	 * HYP code. The init code corrupts r12, so set the clobber
+	 * list accordingly.
+	 */
+	asm volatile (
+		"mov	r0, %[pgd_ptr_low]\n\t"
+		"mov	r1, %[pgd_ptr_high]\n\t"
+		"mov	r2, %[hyp_stack_ptr]\n\t"
+		"mov	r3, %[vector_ptr]\n\t"
+		"hvc	#0\n\t" : :
+		[pgd_ptr_low] "r" ((unsigned long)(pgd_ptr & 0xffffffff)),
+		[pgd_ptr_high] "r" ((unsigned long)(pgd_ptr >> 32ULL)),
+		[hyp_stack_ptr] "r" (hyp_stack_ptr),
+		[vector_ptr] "r" (vector_ptr) :
+		"r0", "r1", "r2", "r3", "r12");
+}
+
+/**
+ * Inits Hyp-mode on all online CPUs
+ */
+static int init_hyp_mode(void)
+{
+	phys_addr_t init_phys_addr;
+	int cpu;
+	int err = 0;
+
+	/*
+	 * Allocate Hyp PGD and setup Hyp identity mapping
+	 */
+	err = kvm_mmu_init();
+	if (err)
+		goto out_err;
+
+	/*
+	 * It is probably enough to obtain the default on one
+	 * CPU. It's unlikely to be different on the others.
+	 */
+	hyp_default_vectors = __hyp_get_vectors();
+
+	/*
+	 * Allocate stack pages for Hypervisor-mode
+	 */
+	for_each_possible_cpu(cpu) {
+		unsigned long stack_page;
+
+		stack_page = __get_free_page(GFP_KERNEL);
+		if (!stack_page) {
+			err = -ENOMEM;
+			goto out_free_stack_pages;
+		}
+
+		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
+	}
+
+	/*
+	 * Execute the init code on each CPU.
+	 *
+	 * Note: The stack is not mapped yet, so don't do anything else than
+	 * initializing the hypervisor mode on each CPU using a local stack
+	 * space for temporary storage.
+	 */
+	init_phys_addr = virt_to_phys(__kvm_hyp_init);
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, cpu_init_hyp_mode,
+					 (void *)(long)init_phys_addr, 1);
+	}
+
+	/*
+	 * Unmap the identity mapping
+	 */
+	kvm_clear_hyp_idmap();
+
+	/*
+	 * Map the Hyp-code called directly from the host
+	 */
+	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
+	if (err) {
+		kvm_err("Cannot map world-switch code\n");
+		goto out_free_mappings;
+	}
+
+	/*
+	 * Map the Hyp stack pages
+	 */
+	for_each_possible_cpu(cpu) {
+		char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
+		err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE);
+
+		if (err) {
+			kvm_err("Cannot map hyp stack\n");
+			goto out_free_mappings;
+		}
+	}
+
+	/*
+	 * Map the host VFP structures
+	 */
+	kvm_host_vfp_state = alloc_percpu(struct vfp_hard_struct);
+	if (!kvm_host_vfp_state) {
+		err = -ENOMEM;
+		kvm_err("Cannot allocate host VFP state\n");
+		goto out_free_mappings;
+	}
+
+	for_each_possible_cpu(cpu) {
+		struct vfp_hard_struct *vfp;
+
+		vfp = per_cpu_ptr(kvm_host_vfp_state, cpu);
+		err = create_hyp_mappings(vfp, vfp + 1);
+
+		if (err) {
+			kvm_err("Cannot map host VFP state: %d\n", err);
+			goto out_free_vfp;
+		}
+	}
+
+	kvm_info("Hyp mode initialized successfully\n");
+	return 0;
+out_free_vfp:
+	free_percpu(kvm_host_vfp_state);
+out_free_mappings:
+	free_hyp_pmds();
+out_free_stack_pages:
+	for_each_possible_cpu(cpu)
+		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+out_err:
+	kvm_err("error initializing Hyp mode: %d\n", err);
+	return err;
+}
+
+/**
+ * Initialize Hyp-mode and memory mappings on all CPUs.
+ */
 int kvm_arch_init(void *opaque)
 {
+	int err;
+
+	if (!is_hyp_mode_available()) {
+		kvm_err("HYP mode not available\n");
+		return -ENODEV;
+	}
+
+	if (kvm_target_cpu() < 0) {
+		kvm_err("Target CPU not supported!\n");
+		return -ENODEV;
+	}
+
+	err = init_hyp_mode();
+	if (err)
+		goto out_err;
+
 	return 0;
+out_err:
+	return err;
 }
 
 /* NOP: Compiling as a module not supported */
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 1dc8926..f179f10 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -15,5 +15,100 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/linkage.h>
+#include <asm/unified.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+/********************************************************************
+ * Hypervisor initialization
+ *   - should be called with:
+ *       r0,r1 = Hypervisor pgd pointer
+ *       r2 = top of Hyp stack (kernel VA)
+ *       r3 = pointer to hyp vectors
+ */
+
+	.text
+	.pushsection    .hyp.idmap.text,"ax"
+	.align 5
+__kvm_hyp_init:
+	.globl __kvm_hyp_init
+
+	@ Hyp-mode exception vector
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	.
+	W(b)	__do_hyp_init
+	W(b)	.
+	W(b)	.
+
+__do_hyp_init:
+	@ Set the HTTBR to point to the hypervisor PGD pointer passed
+	mcrr	p15, 4, r0, r1, c2
+
+	@ Set the HTCR and VTCR to the same shareability and cacheability
+	@ settings as the non-secure TTBCR and with T0SZ == 0.
+	mrc	p15, 4, r0, c2, c0, 2	@ HTCR
+	ldr	r12, =HTCR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c2, c0, 2	@ TTBCR
+	and	r1, r1, #(HTCR_MASK & ~TTBCR_T0SZ)
+	orr	r0, r0, r1
+	mcr	p15, 4, r0, c2, c0, 2	@ HTCR
+
+	mrc	p15, 4, r1, c2, c1, 2	@ VTCR
+	ldr	r12, =VTCR_MASK
+	bic	r1, r1, r12
+	bic	r0, r0, #(~VTCR_HTCR_SH)	@ clear non-reusable HTCR bits
+	orr	r1, r0, r1
+	orr	r1, r1, #(KVM_VTCR_SL0 | KVM_VTCR_T0SZ | KVM_VTCR_S)
+	mcr	p15, 4, r1, c2, c1, 2	@ VTCR
+
+	@ Use the same memory attributes for hyp. accesses as the kernel
+	@ (copy MAIRx to HMAIRx).
+	mrc	p15, 0, r0, c10, c2, 0
+	mcr	p15, 4, r0, c10, c2, 0
+	mrc	p15, 0, r0, c10, c2, 1
+	mcr	p15, 4, r0, c10, c2, 1
+
+	@ Set the HSCTLR to:
+	@  - ARM/THUMB exceptions: Kernel config (Thumb-2 kernel)
+	@  - Endianness: Kernel config
+	@  - Fast Interrupt Features: Kernel config
+	@  - Write permission implies XN: disabled
+	@  - Instruction cache: enabled
+	@  - Data/Unified cache: enabled
+	@  - Memory alignment checks: enabled
+	@  - MMU: enabled (this code must be run from an identity mapping)
+	mrc	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	ldr	r12, =HSCTLR_MASK
+	bic	r0, r0, r12
+	mrc	p15, 0, r1, c1, c0, 0	@ SCTLR
+	ldr	r12, =(HSCTLR_EE | HSCTLR_FI)
+	and	r1, r1, r12
+ ARM(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I)			)
+ THUMB(	ldr	r12, =(HSCTLR_M | HSCTLR_A | HSCTLR_I | HSCTLR_TE)	)
+	orr	r1, r1, r12
+	orr	r0, r0, r1
+	isb
+	mcr	p15, 4, r0, c1, c0, 0	@ HSCTLR
+	isb
+
+	@ Set stack pointer and return to the kernel
+	mov	sp, r2
+
+	@ Set HVBAR to point to the HYP vectors
+	mcr	p15, 4, r3, c12, c0, 0	@ HVBAR
+
+	eret
+
+	.ltorg
+
+	.globl __kvm_hyp_init_end
+__kvm_hyp_init_end:
+
+	.popsection
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 1dc8926..6bde17b 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -15,5 +15,42 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/unified.h>
+#include <asm/page.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+	.text
+
+__kvm_hyp_code_start:
+	.globl __kvm_hyp_code_start
+
+/********************************************************************
+ * Flush per-VMID TLBs
+ */
+ENTRY(__kvm_flush_vm_context)
+	bx	lr
+ENDPROC(__kvm_flush_vm_context)
+
+/********************************************************************
+ *  Hypervisor world-switch code
+ */
+ENTRY(__kvm_vcpu_run)
+	bx	lr
+
+
+/********************************************************************
+ * Hypervisor exception vector and handlers
+ */
+
+	.align 5
+__kvm_hyp_vector:
+	.globl __kvm_hyp_vector
+	nop
+
+__kvm_hyp_code_end:
+	.globl	__kvm_hyp_code_end
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 10ed464..4decdb6 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -15,3 +15,251 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+
+#include <linux/mman.h>
+#include <linux/kvm_host.h>
+#include <linux/io.h>
+#include <asm/idmap.h>
+#include <asm/pgalloc.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+#include <asm/mach/map.h>
+
+extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
+
+static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
+
+static void kvm_set_pte(pte_t *pte, pte_t new_pte)
+{
+	pte_val(*pte) = new_pte;
+	/*
+	 * flush_pmd_entry just takes a void pointer and cleans the necessary
+	 * cache entries, so we can reuse the function for ptes.
+	 */
+	flush_pmd_entry(pte);
+}
+
+static void free_ptes(pmd_t *pmd, unsigned long addr)
+{
+	pte_t *pte;
+	unsigned int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
+		if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+			pte = pte_offset_kernel(pmd, addr);
+			pte_free_kernel(NULL, pte);
+		}
+		pmd++;
+	}
+}
+
+/**
+ * free_hyp_pmds - free the Hyp-mode level-2 tables and their child level-3 tables
+ *
+ * Assumes this is a page table used strictly in Hyp-mode and therefore contains
+ * only mappings in the kernel memory area, which is above PAGE_OFFSET.
+ */
+void free_hyp_pmds(void)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = PAGE_OFFSET; addr != 0; addr += PGDIR_SIZE) {
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none(*pud))
+			continue;
+		BUG_ON(pud_bad(*pud));
+
+		pmd = pmd_offset(pud, addr);
+		free_ptes(pmd, addr);
+		pmd_free(NULL, pmd);
+		pud_clear(pud);
+	}
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+}
+
+static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
+				    unsigned long end)
+{
+	pte_t *pte;
+	unsigned long addr;
+	struct page *page;
+
+	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
+		pte = pte_offset_kernel(pmd, addr);
+		BUG_ON(!virt_addr_valid(addr));
+		page = virt_to_page(addr);
+		kvm_set_pte(pte, mk_pte(page, PAGE_HYP));
+	}
+}
+
+static void create_hyp_io_pte_mappings(pmd_t *pmd, unsigned long start,
+				       unsigned long end,
+				       unsigned long *pfn_base)
+{
+	pte_t *pte;
+	unsigned long addr;
+
+	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
+		pte = pte_offset_kernel(pmd, addr);
+		BUG_ON(pfn_valid(*pfn_base));
+		kvm_set_pte(pte, pfn_pte(*pfn_base, PAGE_HYP_DEVICE));
+		(*pfn_base)++;
+	}
+}
+
+static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
+				   unsigned long end, unsigned long *pfn_base)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long addr, next;
+
+	for (addr = start; addr < end; addr = next) {
+		pmd = pmd_offset(pud, addr);
+
+		BUG_ON(pmd_sect(*pmd));
+
+		if (pmd_none(*pmd)) {
+			pte = pte_alloc_one_kernel(NULL, addr);
+			if (!pte) {
+				kvm_err("Cannot allocate Hyp pte\n");
+				return -ENOMEM;
+			}
+			pmd_populate_kernel(NULL, pmd, pte);
+		}
+
+		next = pmd_addr_end(addr, end);
+
+		/*
+		 * If pfn_base is NULL, we map kernel pages into HYP with the
+		 * virtual address. Otherwise, this is considered an I/O
+		 * mapping and we map the physical region starting at
+		 * *pfn_base to [start, end[.
+		 */
+		if (!pfn_base)
+			create_hyp_pte_mappings(pmd, addr, next);
+		else
+			create_hyp_io_pte_mappings(pmd, addr, next, pfn_base);
+	}
+
+	return 0;
+}
+
+static int __create_hyp_mappings(void *from, void *to, unsigned long *pfn_base)
+{
+	unsigned long start = (unsigned long)from;
+	unsigned long end = (unsigned long)to;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr, next;
+	int err = 0;
+
+	BUG_ON(start > end);
+	if (start < PAGE_OFFSET)
+		return -EINVAL;
+
+	mutex_lock(&kvm_hyp_pgd_mutex);
+	for (addr = start; addr < end; addr = next) {
+		pgd = hyp_pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+
+		if (pud_none_or_clear_bad(pud)) {
+			pmd = pmd_alloc_one(NULL, addr);
+			if (!pmd) {
+				kvm_err("Cannot allocate Hyp pmd\n");
+				err = -ENOMEM;
+				goto out;
+			}
+			pud_populate(NULL, pud, pmd);
+		}
+
+		next = pgd_addr_end(addr, end);
+		err = create_hyp_pmd_mappings(pud, addr, next, pfn_base);
+		if (err)
+			goto out;
+	}
+out:
+	mutex_unlock(&kvm_hyp_pgd_mutex);
+	return err;
+}
+
+/**
+ * create_hyp_mappings - map a kernel virtual address range in Hyp mode
+ * @from:	The virtual kernel start address of the range
+ * @to:		The virtual kernel end address of the range (exclusive)
+ *
+ * The same virtual address as the kernel virtual address is also used in
+ * Hyp-mode mapping to the same underlying physical pages.
+ *
+ * Note: Wrapping around zero in the "to" address is not supported.
+ */
+int create_hyp_mappings(void *from, void *to)
+{
+	return __create_hyp_mappings(from, to, NULL);
+}
+
+/**
+ * create_hyp_io_mappings - map a physical IO range in Hyp mode
+ * @from:	The virtual HYP start address of the range
+ * @to:		The virtual HYP end address of the range (exclusive)
+ * @addr:	The physical start address which gets mapped
+ */
+int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
+{
+	unsigned long pfn = __phys_to_pfn(addr);
+	return __create_hyp_mappings(from, to, &pfn);
+}
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return -EINVAL;
+}
+
+phys_addr_t kvm_mmu_get_httbr(void)
+{
+	VM_BUG_ON(!virt_addr_valid(hyp_pgd));
+	return virt_to_phys(hyp_pgd);
+}
+
+int kvm_mmu_init(void)
+{
+	return hyp_pgd ? 0 : -ENOMEM;
+}
+
+/**
+ * kvm_clear_hyp_idmap - remove all idmaps from the hyp pgd
+ *
+ * Free the underlying pmds for all pgds in range and clear the pgds (but
+ * don't free them) afterwards.
+ */
+void kvm_clear_hyp_idmap(void)
+{
+	unsigned long addr, end;
+	unsigned long next;
+	pgd_t *pgd = hyp_pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	addr = virt_to_phys(__hyp_idmap_text_start);
+	end = virt_to_phys(__hyp_idmap_text_end);
+
+	pgd += pgd_index(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		pud = pud_offset(pgd, addr);
+		pmd = pmd_offset(pud, addr);
+
+		pud_clear(pud);
+		clean_pmd_entry(pmd);
+		pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
+	} while (pgd++, addr = next, addr < end);
+}


* [PATCH v5 05/14] KVM: ARM: Memory virtualization setup
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:39   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Marc Zyngier, Marcelo Tosatti

This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/mmu.c (a usage sketch follows the list):
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);
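
The sketch below uses hypothetical wrapper names; the real call sites are
kvm_arch_init_vm() and kvm_arch_destroy_vm() in this patch:

#include <linux/kvm_host.h>
#include <asm/kvm_mmu.h>

static int example_vm_init(struct kvm *kvm)
{
	int ret;

	ret = kvm_alloc_stage2_pgd(kvm);	/* level-1 stage-2 table */
	if (ret)
		return ret;

	/* Make struct kvm itself visible to Hyp mode, as kvm_arch_init_vm does */
	ret = create_hyp_mappings(kvm, kvm + 1);
	if (ret)
		kvm_free_stage2_pgd(kvm);

	return ret;
}

static void example_vm_destroy(struct kvm *kvm)
{
	/* Unmaps the whole guest IPA space and frees the level-1 table */
	kvm_free_stage2_pgd(kvm);
}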

Each entry in the TLBs and caches is tagged with a VMID identifier in
addition to ASIDs. The VMIDs are assigned consecutively to VMs in the
order that VMs are executed, and caches and TLBs are invalidated when
the VMID space is exhausted (the VMID field is 8 bits wide), which
allows for more than 255 simultaneously running guests.

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.

We pre-allocate page table memory so that we can synchronize using a
spinlock and be called under rcu_read_lock from the MMU notifiers. We
steal the mmu_memory_cache implementation from x86 and adapt it for our
specific usage.
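
The sketch below illustrates that pattern; it assumes it lives in
arch/arm/kvm/mmu.c next to the static helpers it calls and mirrors
kvm_phys_addr_ioremap() in this patch: top up the cache while sleeping is
allowed, then consume it with the non-sleepable mmu_lock held.

static int example_install_stage2_pte(struct kvm *kvm, phys_addr_t ipa,
				      pte_t pte)
{
	struct kvm_mmu_memory_cache cache = { 0, };
	int ret;

	ret = mmu_topup_memory_cache(&cache, 2, 2);	/* may allocate and sleep */
	if (ret)
		return ret;

	spin_lock(&kvm->mmu_lock);
	ret = stage2_set_pte(kvm, &cache, ipa, &pte, false);
	spin_unlock(&kvm->mmu_lock);

	mmu_free_memory_cache(&cache);		/* return any unused pages */
	return ret;
}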

We support MMU notifiers (thanks to Marc Zyngier) through
kvm_unmap_hva and kvm_set_spte_hva.

Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
which is used by VGIC support to map the virtual CPU interface registers
to the guest. This support is added by Marc Zyngier.
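
For illustration, a minimal sketch of mapping one page of a device into a
guest (hypothetical function name and made-up addresses):

#include <linux/mm.h>
#include <linux/kvm_host.h>
#include <asm/kvm_mmu.h>

static int example_map_device_to_guest(struct kvm *kvm)
{
	const phys_addr_t guest_ipa = 0x2c002000;	/* assumed guest IPA */
	const phys_addr_t host_pa   = 0x2c006000;	/* assumed host PA */

	/* Installs stage-2 Device mappings for one page of the region */
	return kvm_phys_addr_ioremap(kvm, guest_ipa, host_pa, PAGE_SIZE);
}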

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_asm.h  |    2 
 arch/arm/include/asm/kvm_host.h |   19 ++
 arch/arm/include/asm/kvm_mmu.h  |    9 +
 arch/arm/kvm/Kconfig            |    1 
 arch/arm/kvm/arm.c              |   37 ++++
 arch/arm/kvm/interrupts.S       |   10 +
 arch/arm/kvm/mmu.c              |  370 +++++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h            |   46 +++++
 8 files changed, 492 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 81324e2..f6652f6 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -57,6 +57,7 @@
 #define ARM_EXCEPTION_HVC	  7
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -71,6 +72,7 @@ extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 92e89f3..1de6f0d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -111,4 +111,23 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 struct kvm_one_reg;
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+u64 kvm_call_hyp(void *hypfn, ...);
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index e8679b3..499e7b0 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -23,6 +23,15 @@ int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+
 phys_addr_t kvm_mmu_get_httbr(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 4a01b6f..05227cb 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM_ARM_HOST
 	bool "KVM host support for ARM cpus."
 	depends on KVM
 	depends on MMU
+	select	MMU_NOTIFIER
 	---help---
 	  Provides host support for ARM processors.
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 2dddc58..ab82039 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -81,12 +81,33 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	int ret = 0;
+
 	if (type)
 		return -EINVAL;
 
-	return 0;
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+
+	ret = create_hyp_mappings(kvm, kvm + 1);
+	if (ret)
+		goto out_free_stage2_pgd;
+
+	/* Mark the initial VMID generation invalid */
+	kvm->arch.vmid_gen = 0;
+
+	return ret;
+out_free_stage2_pgd:
+	kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+	return ret;
 }
 
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -104,10 +125,16 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 	return 0;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -196,7 +223,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	err = create_hyp_mappings(vcpu, vcpu + 1);
+	if (err)
+		goto vcpu_uninit;
+
 	return vcpu;
+vcpu_uninit:
+	kvm_vcpu_uninit(vcpu);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 out:
@@ -210,6 +243,8 @@ int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	kvm_mmu_free_memory_caches(vcpu);
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 6bde17b..a923590 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -32,6 +32,13 @@ __kvm_hyp_code_start:
 /********************************************************************
  * Flush per-VMID TLBs
  */
+ENTRY(__kvm_tlb_flush_vmid)
+	bx	lr
+ENDPROC(__kvm_tlb_flush_vmid)
+
+/********************************************************************
+ * Flush TLBs and instruction caches of current CPU for all VMIDs
+ */
 ENTRY(__kvm_flush_vm_context)
 	bx	lr
 ENDPROC(__kvm_flush_vm_context)
@@ -42,6 +49,9 @@ ENDPROC(__kvm_flush_vm_context)
 ENTRY(__kvm_vcpu_run)
 	bx	lr
 
+ENTRY(kvm_call_hyp)
+	bx	lr
+
 
 /********************************************************************
  * Hypervisor exception vector and handlers
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 4decdb6..4347d68 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -23,12 +23,21 @@
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_asm.h>
 #include <asm/mach/map.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
 
 extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
+static void kvm_tlb_flush_vmid(struct kvm *kvm)
+{
+	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+}
+
 static void kvm_set_pte(pte_t *pte, pte_t new_pte)
 {
 	pte_val(*pte) = new_pte;
@@ -39,6 +48,38 @@ static void kvm_set_pte(pte_t *pte, pte_t new_pte)
 	flush_pmd_entry(pte);
 }
 
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  int min, int max)
+{
+	void *page;
+
+	BUG_ON(max > KVM_NR_MEM_OBJS);
+	if (cache->nobjs >= min)
+		return 0;
+	while (cache->nobjs < max) {
+		page = (void *)__get_free_page(PGALLOC_GFP);
+		if (!page)
+			return -ENOMEM;
+		cache->objects[cache->nobjs++] = page;
+	}
+	return 0;
+}
+
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+	while (mc->nobjs)
+		free_page((unsigned long)mc->objects[--mc->nobjs]);
+}
+
+static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+{
+	void *p;
+
+	BUG_ON(!mc || !mc->nobjs);
+	p = mc->objects[--mc->nobjs];
+	return p;
+}
+
 static void free_ptes(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
@@ -217,11 +258,333 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
 	return __create_hyp_mappings(from, to, &pfn);
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates the 1st level table only of size defined by S2_PGD_ORDER (can
+ * support either full 40-bit input addresses or limited to 32-bit input
+ * addresses). Clears the allocated pages.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * created, which can only be done once.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+
+	if (kvm->arch.pgd != NULL) {
+		kvm_err("kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, S2_PGD_ORDER);
+	if (!pgd)
+		return -ENOMEM;
+
+	/* stage-2 pgd must be aligned to its size */
+	VM_BUG_ON((unsigned long)pgd & (S2_PGD_SIZE - 1));
+
+	memset(pgd, 0, PTRS_PER_S2_PGD * sizeof(pgd_t));
+	clean_dcache_area(pgd, PTRS_PER_S2_PGD * sizeof(pgd_t));
+	kvm->arch.pgd = pgd;
+
+	return 0;
+}
+
+static void clear_pud_entry(pud_t *pud)
+{
+	pmd_t *pmd_table = pmd_offset(pud, 0);
+	pud_clear(pud);
+	pmd_free(NULL, pmd_table);
+	put_page(virt_to_page(pud));
+}
+
+static void clear_pmd_entry(pmd_t *pmd)
+{
+	pte_t *pte_table = pte_offset_kernel(pmd, 0);
+	pmd_clear(pmd);
+	pte_free_kernel(NULL, pte_table);
+	put_page(virt_to_page(pmd));
+}
+
+static bool pmd_empty(pmd_t *pmd)
+{
+	struct page *pmd_page = virt_to_page(pmd);
+	return page_count(pmd_page) == 1;
+}
+
+static void clear_pte_entry(pte_t *pte)
+{
+	if (pte_present(*pte)) {
+		kvm_set_pte(pte, __pte(0));
+		put_page(virt_to_page(pte));
+	}
+}
+
+static bool pte_empty(pte_t *pte)
+{
+	struct page *pte_page = virt_to_page(pte);
+	return page_count(pte_page) == 1;
+}
+
+/**
+ * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
+ * @kvm:   The VM pointer
+ * @start: The intermediate physical base address of the range to unmap
+ * @size:  The size of the area to unmap
+ *
+ * Clear a range of stage-2 mappings, lowering the various ref-counts.  Must
+ * be called while holding mmu_lock (except when freeing the stage2 pgd before
+ * destroying the VM), otherwise another faulting VCPU may come in and mess
+ * with things behind our backs.
+ */
+static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	phys_addr_t addr = start, end = start + size;
+	u64 range;
+
+	while (addr < end) {
+		pgd = kvm->arch.pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+		if (pud_none(*pud)) {
+			addr += PUD_SIZE;
+			continue;
+		}
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd)) {
+			addr += PMD_SIZE;
+			continue;
+		}
+
+		pte = pte_offset_kernel(pmd, addr);
+		clear_pte_entry(pte);
+		range = PAGE_SIZE;
+
+		/* If we emptied the pte, walk back up the ladder */
+		if (pte_empty(pte)) {
+			clear_pmd_entry(pmd);
+			range = PMD_SIZE;
+			if (pmd_empty(pmd)) {
+				clear_pud_entry(pud);
+				range = PUD_SIZE;
+			}
+		}
+
+		addr += range;
+	}
+}
+
+/**
+ * kvm_free_stage2_pgd - free all stage-2 tables
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
+ * underlying level-2 and level-3 tables before freeing the actual level-1 table
+ * and setting the struct pointer to NULL.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * destroyed, which can only be done once.
+ */
+void kvm_free_stage2_pgd(struct kvm *kvm)
+{
+	if (kvm->arch.pgd == NULL)
+		return;
+
+	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+	free_pages((unsigned long)kvm->arch.pgd, S2_PGD_ORDER);
+	kvm->arch.pgd = NULL;
+}
+
+
+static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte, old_pte;
+
+	/* Create 2nd stage page table mapping - Level 1 */
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		if (!cache)
+			return 0; /* ignore calls from kvm_set_spte_hva */
+		pmd = mmu_memory_cache_alloc(cache);
+		pud_populate(NULL, pud, pmd);
+		pmd += pmd_index(addr);
+		get_page(virt_to_page(pud));
+	} else
+		pmd = pmd_offset(pud, addr);
+
+	/* Create 2nd stage page table mapping - Level 2 */
+	if (pmd_none(*pmd)) {
+		if (!cache)
+			return 0; /* ignore calls from kvm_set_spte_hva */
+		pte = mmu_memory_cache_alloc(cache);
+		clean_pte_table(pte);
+		pmd_populate_kernel(NULL, pmd, pte);
+		pte += pte_index(addr);
+		get_page(virt_to_page(pmd));
+	} else
+		pte = pte_offset_kernel(pmd, addr);
+
+	if (iomap && pte_present(*pte))
+		return -EFAULT;
+
+	/* Create 2nd stage page table mapping - Level 3 */
+	old_pte = *pte;
+	kvm_set_pte(pte, *new_pte);
+	if (pte_present(old_pte))
+		kvm_tlb_flush_vmid(kvm);
+	else
+		get_page(virt_to_page(pte));
+
+	return 0;
+}
+
+/**
+ * kvm_phys_addr_ioremap - map a device range to guest IPA
+ *
+ * @kvm:	The KVM pointer
+ * @guest_ipa:	The IPA at which to insert the mapping
+ * @pa:		The physical address of the device
+ * @size:	The size of the mapping
+ */
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size)
+{
+	phys_addr_t addr, end;
+	int ret = 0;
+	unsigned long pfn;
+	struct kvm_mmu_memory_cache cache = { 0, };
+
+	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
+	pfn = __phys_to_pfn(pa);
+
+	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
+		pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE | L_PTE_S2_RDWR);
+
+		ret = mmu_topup_memory_cache(&cache, 2, 2);
+		if (ret)
+			goto out;
+		spin_lock(&kvm->mmu_lock);
+		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
+		spin_unlock(&kvm->mmu_lock);
+		if (ret)
+			goto out;
+
+		pfn++;
+	}
+
+out:
+	mmu_free_memory_cache(&cache);
+	return ret;
+}
+
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return -EINVAL;
 }
 
+static void handle_hva_to_gpa(struct kvm *kvm,
+			      unsigned long start,
+			      unsigned long end,
+			      void (*handler)(struct kvm *kvm,
+					      gpa_t gpa, void *data),
+			      void *data)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+
+	slots = kvm_memslots(kvm);
+
+	/* we only care about the pages that the guest sees */
+	kvm_for_each_memslot(memslot, slots) {
+		unsigned long hva_start, hva_end;
+		gfn_t gfn, gfn_end;
+
+		hva_start = max(start, memslot->userspace_addr);
+		hva_end = min(end, memslot->userspace_addr +
+					(memslot->npages << PAGE_SHIFT));
+		if (hva_start >= hva_end)
+			continue;
+
+		/*
+		 * {gfn(page) | page intersects with [hva_start, hva_end)} =
+		 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+		 */
+		gfn = hva_to_gfn_memslot(hva_start, memslot);
+		gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
+
+		for (; gfn < gfn_end; ++gfn) {
+			gpa_t gpa = gfn << PAGE_SHIFT;
+			handler(kvm, gpa, data);
+		}
+	}
+}
+
+static void kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
+{
+	unmap_stage2_range(kvm, gpa, PAGE_SIZE);
+	kvm_tlb_flush_vmid(kvm);
+}
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	unsigned long end = hva + PAGE_SIZE;
+
+	if (!kvm->arch.pgd)
+		return 0;
+
+	trace_kvm_unmap_hva(hva);
+	handle_hva_to_gpa(kvm, hva, end, &kvm_unmap_hva_handler, NULL);
+	return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end)
+{
+	if (!kvm->arch.pgd)
+		return 0;
+
+	trace_kvm_unmap_hva_range(start, end);
+	handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
+	return 0;
+}
+
+static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
+{
+	pte_t *pte = (pte_t *)data;
+
+	stage2_set_pte(kvm, NULL, gpa, pte, false);
+}
+
+
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+	unsigned long end = hva + PAGE_SIZE;
+	pte_t stage2_pte;
+
+	if (!kvm->arch.pgd)
+		return;
+
+	trace_kvm_set_spte_hva(hva);
+	stage2_pte = pfn_pte(pte_pfn(pte), PAGE_S2);
+	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
+}
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
+{
+	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+}
+
 phys_addr_t kvm_mmu_get_httbr(void)
 {
 	VM_BUG_ON(!virt_addr_valid(hyp_pgd));
@@ -230,7 +593,12 @@ phys_addr_t kvm_mmu_get_httbr(void)
 
 int kvm_mmu_init(void)
 {
-	return hyp_pgd ? 0 : -ENOMEM;
+	if (!hyp_pgd) {
+		kvm_err("Hyp mode PGD not allocated\n");
+		return -ENOMEM;
+	}
+
+	return 0;
 }
 
 /**
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index f8869c1..862b2cc 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,7 +39,53 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_unmap_hva,
+	TP_PROTO(unsigned long hva),
+	TP_ARGS(hva),
 
+	TP_STRUCT__entry(
+		__field(	unsigned long,	hva		)
+	),
+
+	TP_fast_assign(
+		__entry->hva		= hva;
+	),
+
+	TP_printk("mmu notifier unmap hva: %#08lx", __entry->hva)
+);
+
+TRACE_EVENT(kvm_unmap_hva_range,
+	TP_PROTO(unsigned long start, unsigned long end),
+	TP_ARGS(start, end),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	start		)
+		__field(	unsigned long,	end		)
+	),
+
+	TP_fast_assign(
+		__entry->start		= start;
+		__entry->end		= end;
+	),
+
+	TP_printk("mmu notifier unmap range: %#08lx -- %#08lx",
+		  __entry->start, __entry->end)
+);
+
+TRACE_EVENT(kvm_set_spte_hva,
+	TP_PROTO(unsigned long hva),
+	TP_ARGS(hva),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	hva		)
+	),
+
+	TP_fast_assign(
+		__entry->hva		= hva;
+	),
+
+	TP_printk("mmu notifier set pte hva: %#08lx", __entry->hva)
+);
 
 #endif /* _TRACE_KVM_H */
 



* [PATCH v5 05/14] KVM: ARM: Memory virtualization setup
@ 2013-01-08 18:39   ` Christoffer Dall
  0 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: linux-arm-kernel

This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Each entry in the TLBs and caches is tagged with a VMID identifier in
addition to ASIDs. The VMIDs are assigned consecutively to VMs in the
order that VMs are executed, and caches and TLBs are invalidated when
the VMID space is exhausted (the VMID field is 8 bits wide), which
allows for more than 255 simultaneously running guests.

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.

We pre-allocate page table memory so that we can synchronize using a
spinlock and be called under rcu_read_lock from the MMU notifiers. We
steal the mmu_memory_cache implementation from x86 and adapt it for our
specific usage.

We support MMU notifiers (thanks to Marc Zyngier) through
kvm_unmap_hva and kvm_set_spte_hva.

Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
which is used by VGIC support to map the virtual CPU interface registers
to the guest. This support is added by Marc Zyngier.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_asm.h  |    2 
 arch/arm/include/asm/kvm_host.h |   19 ++
 arch/arm/include/asm/kvm_mmu.h  |    9 +
 arch/arm/kvm/Kconfig            |    1 
 arch/arm/kvm/arm.c              |   37 ++++
 arch/arm/kvm/interrupts.S       |   10 +
 arch/arm/kvm/mmu.c              |  370 +++++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h            |   46 +++++
 8 files changed, 492 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 81324e2..f6652f6 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -57,6 +57,7 @@
 #define ARM_EXCEPTION_HVC	  7
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -71,6 +72,7 @@ extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 92e89f3..1de6f0d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -111,4 +111,23 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 struct kvm_one_reg;
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+u64 kvm_call_hyp(void *hypfn, ...);
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	return 0;
+}
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index e8679b3..499e7b0 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -23,6 +23,15 @@ int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+
 phys_addr_t kvm_mmu_get_httbr(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 4a01b6f..05227cb 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM_ARM_HOST
 	bool "KVM host support for ARM cpus."
 	depends on KVM
 	depends on MMU
+	select	MMU_NOTIFIER
 	---help---
 	  Provides host support for ARM processors.
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 2dddc58..ab82039 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -81,12 +81,33 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	int ret = 0;
+
 	if (type)
 		return -EINVAL;
 
-	return 0;
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+
+	ret = create_hyp_mappings(kvm, kvm + 1);
+	if (ret)
+		goto out_free_stage2_pgd;
+
+	/* Mark the initial VMID generation invalid */
+	kvm->arch.vmid_gen = 0;
+
+	return ret;
+out_free_stage2_pgd:
+	kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+	return ret;
 }
 
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -104,10 +125,16 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 	return 0;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:	pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
+	kvm_free_stage2_pgd(kvm);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -196,7 +223,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err)
 		goto free_vcpu;
 
+	err = create_hyp_mappings(vcpu, vcpu + 1);
+	if (err)
+		goto vcpu_uninit;
+
 	return vcpu;
+vcpu_uninit:
+	kvm_vcpu_uninit(vcpu);
 free_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 out:
@@ -210,6 +243,8 @@ int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
+	kvm_mmu_free_memory_caches(vcpu);
+	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 6bde17b..a923590 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -32,6 +32,13 @@ __kvm_hyp_code_start:
 /********************************************************************
  * Flush per-VMID TLBs
  */
+ENTRY(__kvm_tlb_flush_vmid)
+	bx	lr
+ENDPROC(__kvm_tlb_flush_vmid)
+
+/********************************************************************
+ * Flush TLBs and instruction caches of current CPU for all VMIDs
+ */
 ENTRY(__kvm_flush_vm_context)
 	bx	lr
 ENDPROC(__kvm_flush_vm_context)
@@ -42,6 +49,9 @@ ENDPROC(__kvm_flush_vm_context)
 ENTRY(__kvm_vcpu_run)
 	bx	lr
 
+ENTRY(kvm_call_hyp)
+	bx	lr
+
 
 /********************************************************************
  * Hypervisor exception vector and handlers
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 4decdb6..4347d68 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -23,12 +23,21 @@
 #include <asm/pgalloc.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_asm.h>
 #include <asm/mach/map.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
 
 extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
+static void kvm_tlb_flush_vmid(struct kvm *kvm)
+{
+	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+}
+
 static void kvm_set_pte(pte_t *pte, pte_t new_pte)
 {
 	pte_val(*pte) = new_pte;
@@ -39,6 +48,38 @@ static void kvm_set_pte(pte_t *pte, pte_t new_pte)
 	flush_pmd_entry(pte);
 }
 
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  int min, int max)
+{
+	void *page;
+
+	BUG_ON(max > KVM_NR_MEM_OBJS);
+	if (cache->nobjs >= min)
+		return 0;
+	while (cache->nobjs < max) {
+		page = (void *)__get_free_page(PGALLOC_GFP);
+		if (!page)
+			return -ENOMEM;
+		cache->objects[cache->nobjs++] = page;
+	}
+	return 0;
+}
+
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+	while (mc->nobjs)
+		free_page((unsigned long)mc->objects[--mc->nobjs]);
+}
+
+static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+{
+	void *p;
+
+	BUG_ON(!mc || !mc->nobjs);
+	p = mc->objects[--mc->nobjs];
+	return p;
+}
+
 static void free_ptes(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
@@ -217,11 +258,333 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t addr)
 	return __create_hyp_mappings(from, to, &pfn);
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates only the 1st level table, of the size defined by S2_PGD_ORDER
+ * (which can support either full 40-bit input addresses or be limited to
+ * 32-bit input addresses). Clears the allocated pages.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * created, which can only be done once.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	pgd_t *pgd;
+
+	if (kvm->arch.pgd != NULL) {
+		kvm_err("kvm_arch already initialized?\n");
+		return -EINVAL;
+	}
+
+	pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, S2_PGD_ORDER);
+	if (!pgd)
+		return -ENOMEM;
+
+	/* stage-2 pgd must be aligned to its size */
+	VM_BUG_ON((unsigned long)pgd & (S2_PGD_SIZE - 1));
+
+	memset(pgd, 0, PTRS_PER_S2_PGD * sizeof(pgd_t));
+	clean_dcache_area(pgd, PTRS_PER_S2_PGD * sizeof(pgd_t));
+	kvm->arch.pgd = pgd;
+
+	return 0;
+}
+
+static void clear_pud_entry(pud_t *pud)
+{
+	pmd_t *pmd_table = pmd_offset(pud, 0);
+	pud_clear(pud);
+	pmd_free(NULL, pmd_table);
+	put_page(virt_to_page(pud));
+}
+
+static void clear_pmd_entry(pmd_t *pmd)
+{
+	pte_t *pte_table = pte_offset_kernel(pmd, 0);
+	pmd_clear(pmd);
+	pte_free_kernel(NULL, pte_table);
+	put_page(virt_to_page(pmd));
+}
+
+static bool pmd_empty(pmd_t *pmd)
+{
+	struct page *pmd_page = virt_to_page(pmd);
+	return page_count(pmd_page) == 1;
+}
+
+static void clear_pte_entry(pte_t *pte)
+{
+	if (pte_present(*pte)) {
+		kvm_set_pte(pte, __pte(0));
+		put_page(virt_to_page(pte));
+	}
+}
+
+static bool pte_empty(pte_t *pte)
+{
+	struct page *pte_page = virt_to_page(pte);
+	return page_count(pte_page) == 1;
+}
+
+/**
+ * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
+ * @kvm:   The VM pointer
+ * @start: The intermediate physical base address of the range to unmap
+ * @size:  The size of the area to unmap
+ *
+ * Clear a range of stage-2 mappings, lowering the various ref-counts.  Must
+ * be called while holding mmu_lock (unless for freeing the stage2 pgd before
+ * destroying the VM), otherwise another faulting VCPU may come in and mess
+ * with things behind our backs.
+ */
+static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	phys_addr_t addr = start, end = start + size;
+	u64 range;
+
+	while (addr < end) {
+		pgd = kvm->arch.pgd + pgd_index(addr);
+		pud = pud_offset(pgd, addr);
+		if (pud_none(*pud)) {
+			addr += PUD_SIZE;
+			continue;
+		}
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd)) {
+			addr += PMD_SIZE;
+			continue;
+		}
+
+		pte = pte_offset_kernel(pmd, addr);
+		clear_pte_entry(pte);
+		range = PAGE_SIZE;
+
+		/* If we emptied the pte, walk back up the ladder */
+		if (pte_empty(pte)) {
+			clear_pmd_entry(pmd);
+			range = PMD_SIZE;
+			if (pmd_empty(pmd)) {
+				clear_pud_entry(pud);
+				range = PUD_SIZE;
+			}
+		}
+
+		addr += range;
+	}
+}
+
+/**
+ * kvm_free_stage2_pgd - free all stage-2 tables
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
+ * underlying level-2 and level-3 tables before freeing the actual level-1 table
+ * and setting the struct pointer to NULL.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * destroyed, which can only be done once.
+ */
+void kvm_free_stage2_pgd(struct kvm *kvm)
+{
+	if (kvm->arch.pgd == NULL)
+		return;
+
+	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+	free_pages((unsigned long)kvm->arch.pgd, S2_PGD_ORDER);
+	kvm->arch.pgd = NULL;
+}
+
+
+static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte, old_pte;
+
+	/* Create 2nd stage page table mapping - Level 1 */
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		if (!cache)
+			return 0; /* ignore calls from kvm_set_spte_hva */
+		pmd = mmu_memory_cache_alloc(cache);
+		pud_populate(NULL, pud, pmd);
+		pmd += pmd_index(addr);
+		get_page(virt_to_page(pud));
+	} else
+		pmd = pmd_offset(pud, addr);
+
+	/* Create 2nd stage page table mapping - Level 2 */
+	if (pmd_none(*pmd)) {
+		if (!cache)
+			return 0; /* ignore calls from kvm_set_spte_hva */
+		pte = mmu_memory_cache_alloc(cache);
+		clean_pte_table(pte);
+		pmd_populate_kernel(NULL, pmd, pte);
+		pte += pte_index(addr);
+		get_page(virt_to_page(pmd));
+	} else
+		pte = pte_offset_kernel(pmd, addr);
+
+	if (iomap && pte_present(*pte))
+		return -EFAULT;
+
+	/* Create 2nd stage page table mapping - Level 3 */
+	old_pte = *pte;
+	kvm_set_pte(pte, *new_pte);
+	if (pte_present(old_pte))
+		kvm_tlb_flush_vmid(kvm);
+	else
+		get_page(virt_to_page(pte));
+
+	return 0;
+}
+
+/**
+ * kvm_phys_addr_ioremap - map a device range to guest IPA
+ *
+ * @kvm:	The KVM pointer
+ * @guest_ipa:	The IPA at which to insert the mapping
+ * @pa:		The physical address of the device
+ * @size:	The size of the mapping
+ */
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size)
+{
+	phys_addr_t addr, end;
+	int ret = 0;
+	unsigned long pfn;
+	struct kvm_mmu_memory_cache cache = { 0, };
+
+	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
+	pfn = __phys_to_pfn(pa);
+
+	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
+		pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE | L_PTE_S2_RDWR);
+
+		ret = mmu_topup_memory_cache(&cache, 2, 2);
+		if (ret)
+			goto out;
+		spin_lock(&kvm->mmu_lock);
+		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
+		spin_unlock(&kvm->mmu_lock);
+		if (ret)
+			goto out;
+
+		pfn++;
+	}
+
+out:
+	mmu_free_memory_cache(&cache);
+	return ret;
+}
+
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	return -EINVAL;
 }
 
+static void handle_hva_to_gpa(struct kvm *kvm,
+			      unsigned long start,
+			      unsigned long end,
+			      void (*handler)(struct kvm *kvm,
+					      gpa_t gpa, void *data),
+			      void *data)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+
+	slots = kvm_memslots(kvm);
+
+	/* we only care about the pages that the guest sees */
+	kvm_for_each_memslot(memslot, slots) {
+		unsigned long hva_start, hva_end;
+		gfn_t gfn, gfn_end;
+
+		hva_start = max(start, memslot->userspace_addr);
+		hva_end = min(end, memslot->userspace_addr +
+					(memslot->npages << PAGE_SHIFT));
+		if (hva_start >= hva_end)
+			continue;
+
+		/*
+		 * {gfn(page) | page intersects with [hva_start, hva_end)} =
+		 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+		 */
+		gfn = hva_to_gfn_memslot(hva_start, memslot);
+		gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
+
+		for (; gfn < gfn_end; ++gfn) {
+			gpa_t gpa = gfn << PAGE_SHIFT;
+			handler(kvm, gpa, data);
+		}
+	}
+}
+
+static void kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
+{
+	unmap_stage2_range(kvm, gpa, PAGE_SIZE);
+	kvm_tlb_flush_vmid(kvm);
+}
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	unsigned long end = hva + PAGE_SIZE;
+
+	if (!kvm->arch.pgd)
+		return 0;
+
+	trace_kvm_unmap_hva(hva);
+	handle_hva_to_gpa(kvm, hva, end, &kvm_unmap_hva_handler, NULL);
+	return 0;
+}
+
+int kvm_unmap_hva_range(struct kvm *kvm,
+			unsigned long start, unsigned long end)
+{
+	if (!kvm->arch.pgd)
+		return 0;
+
+	trace_kvm_unmap_hva_range(start, end);
+	handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
+	return 0;
+}
+
+static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
+{
+	pte_t *pte = (pte_t *)data;
+
+	stage2_set_pte(kvm, NULL, gpa, pte, false);
+}
+
+
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+	unsigned long end = hva + PAGE_SIZE;
+	pte_t stage2_pte;
+
+	if (!kvm->arch.pgd)
+		return;
+
+	trace_kvm_set_spte_hva(hva);
+	stage2_pte = pfn_pte(pte_pfn(pte), PAGE_S2);
+	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
+}
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
+{
+	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+}
+
 phys_addr_t kvm_mmu_get_httbr(void)
 {
 	VM_BUG_ON(!virt_addr_valid(hyp_pgd));
@@ -230,7 +593,12 @@ phys_addr_t kvm_mmu_get_httbr(void)
 
 int kvm_mmu_init(void)
 {
-	return hyp_pgd ? 0 : -ENOMEM;
+	if (!hyp_pgd) {
+		kvm_err("Hyp mode PGD not allocated\n");
+		return -ENOMEM;
+	}
+
+	return 0;
 }
 
 /**
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index f8869c1..862b2cc 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,7 +39,53 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_unmap_hva,
+	TP_PROTO(unsigned long hva),
+	TP_ARGS(hva),
 
+	TP_STRUCT__entry(
+		__field(	unsigned long,	hva		)
+	),
+
+	TP_fast_assign(
+		__entry->hva		= hva;
+	),
+
+	TP_printk("mmu notifier unmap hva: %#08lx", __entry->hva)
+);
+
+TRACE_EVENT(kvm_unmap_hva_range,
+	TP_PROTO(unsigned long start, unsigned long end),
+	TP_ARGS(start, end),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	start		)
+		__field(	unsigned long,	end		)
+	),
+
+	TP_fast_assign(
+		__entry->start		= start;
+		__entry->end		= end;
+	),
+
+	TP_printk("mmu notifier unmap range: %#08lx -- %#08lx",
+		  __entry->start, __entry->end)
+);
+
+TRACE_EVENT(kvm_set_spte_hva,
+	TP_PROTO(unsigned long hva),
+	TP_ARGS(hva),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	hva		)
+	),
+
+	TP_fast_assign(
+		__entry->hva		= hva;
+	),
+
+	TP_printk("mmu notifier set pte hva: %#08lx", __entry->hva)
+);
 
 #endif /* _TRACE_KVM_H */
 

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:39   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Marcelo Tosatti

From: Christoffer Dall <cdall@cs.columbia.edu>

All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE.  This
works semantically well for the GIC as we in fact raise/lower a line on
a machine component (the gic).  The IOCTL uses the following struct.

struct kvm_irq_level {
	union {
		__u32 irq;     /* GSI */
		__s32 status;  /* not used for KVM_IRQ_LEVEL */
	};
	__u32 level;           /* 0 or 1 */
};

ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
specific cpus.  The irq field is interpreted like this:

  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |   irq_number   |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_number 0 is IRQ, irq_number 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_number between 32 and 1019 (incl.)
               (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_number between 16 and 31 (incl.)

The irq_number thus corresponds to the IRQ ID as defined in the GICv2 specs.

This is documented in Documentation/kvm/api.txt.
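
As an illustration only (not part of this patch), a userspace caller that has
obtained a VM file descriptor vm_fd from KVM_CREATE_VM (and includes
<linux/kvm.h> and <sys/ioctl.h>) could pulse the IRQ line of VCPU 0 roughly
like this, using the defines added to arch/arm/include/uapi/asm/kvm.h below:

	struct kvm_irq_level irq;

	/* irq_type = CPU-level injection, vcpu_index = 0, irq_number = IRQ */
	irq.irq = (KVM_ARM_IRQ_TYPE_CPU << KVM_ARM_IRQ_TYPE_SHIFT) |
		  (0 << KVM_ARM_IRQ_VCPU_SHIFT) |
		  (KVM_ARM_IRQ_CPU_IRQ << KVM_ARM_IRQ_NUM_SHIFT);

	irq.level = 1;				/* raise the line */
	ioctl(vm_fd, KVM_IRQ_LINE, &irq);
	irq.level = 0;				/* lower it again */
	ioctl(vm_fd, KVM_IRQ_LINE, &irq);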

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |   25 ++++++++++++--
 arch/arm/include/asm/kvm_arm.h    |    1 +
 arch/arm/include/uapi/asm/kvm.h   |   21 ++++++++++++
 arch/arm/kvm/arm.c                |   65 +++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h              |   25 ++++++++++++++
 include/uapi/linux/kvm.h          |    1 +
 6 files changed, 134 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 4237c27..5050492 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -615,15 +615,32 @@ created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
 
 Sets the level of a GSI input to the interrupt controller model in the kernel.
-Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
+On some architectures it is required that an interrupt controller model has
+been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
+interrupts require the level to be set to 1 and then back to 0.
+
+ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
+(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
+specific cpus.  The irq field is interpreted like this:
+
+  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
+  field: | irq_type  | vcpu_index |     irq_id     |
+
+The irq_type field has the following values:
+- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
+- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
+               (the vcpu_index field is ignored)
+- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
+
+(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
+
+In both cases, level is used to raise/lower the line.
 
 struct kvm_irq_level {
 	union {
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 613afe2..fb22ee8 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -68,6 +68,7 @@
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
 			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
 			HCR_SWIO | HCR_TIDCP)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index c6298b1..4cf6d8f 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -23,6 +23,7 @@
 #include <asm/ptrace.h>
 
 #define __KVM_HAVE_GUEST_DEBUG
+#define __KVM_HAVE_IRQ_LINE
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
@@ -103,4 +104,24 @@ struct kvm_arch_memory_slot {
 #define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
 #define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
 
+/* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_TYPE_SHIFT		24
+#define KVM_ARM_IRQ_TYPE_MASK		0xff
+#define KVM_ARM_IRQ_VCPU_SHIFT		16
+#define KVM_ARM_IRQ_VCPU_MASK		0xff
+#define KVM_ARM_IRQ_NUM_SHIFT		0
+#define KVM_ARM_IRQ_NUM_MASK		0xffff
+
+/* irq_type field */
+#define KVM_ARM_IRQ_TYPE_CPU		0
+#define KVM_ARM_IRQ_TYPE_SPI		1
+#define KVM_ARM_IRQ_TYPE_PPI		2
+
+/* out-of-kernel GIC cpu interrupt injection irq_number field */
+#define KVM_ARM_IRQ_CPU_IRQ		0
+#define KVM_ARM_IRQ_CPU_FIQ		1
+
+/* Highest supported SPI, from VGIC_NR_IRQS */
+#define KVM_ARM_IRQ_GIC_MAX		127
+
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ab82039..9b4566e 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -24,6 +24,7 @@
 #include <linux/fs.h>
 #include <linux/mman.h>
 #include <linux/sched.h>
+#include <linux/kvm.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -284,6 +285,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	vcpu->cpu = cpu;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -324,6 +326,69 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return -EINVAL;
 }
 
+static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
+{
+	int bit_index;
+	bool set;
+	unsigned long *ptr;
+
+	if (number == KVM_ARM_IRQ_CPU_IRQ)
+		bit_index = __ffs(HCR_VI);
+	else /* KVM_ARM_IRQ_CPU_FIQ */
+		bit_index = __ffs(HCR_VF);
+
+	ptr = (unsigned long *)&vcpu->arch.irq_lines;
+	if (level)
+		set = test_and_set_bit(bit_index, ptr);
+	else
+		set = test_and_clear_bit(bit_index, ptr);
+
+	/*
+	 * If we didn't change anything, no need to wake up or kick other CPUs
+	 */
+	if (set == level)
+		return 0;
+
+	/*
+	 * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
+	 * trigger a world-switch round on the running physical CPU to set the
+	 * virtual IRQ/FIQ fields in the HCR appropriately.
+	 */
+	kvm_vcpu_kick(vcpu);
+
+	return 0;
+}
+
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
+{
+	u32 irq = irq_level->irq;
+	unsigned int irq_type, vcpu_idx, irq_num;
+	int nrcpus = atomic_read(&kvm->online_vcpus);
+	struct kvm_vcpu *vcpu = NULL;
+	bool level = irq_level->level;
+
+	irq_type = (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK;
+	vcpu_idx = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
+	irq_num = (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK;
+
+	trace_kvm_irq_line(irq_type, vcpu_idx, irq_num, irq_level->level);
+
+	if (irq_type != KVM_ARM_IRQ_TYPE_CPU)
+		return -EINVAL;
+
+	if (vcpu_idx >= nrcpus)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+	if (!vcpu)
+		return -EINVAL;
+
+	if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
+		return -EINVAL;
+
+	return vcpu_interrupt_line(vcpu, irq_num, level);
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 862b2cc..105d1f7 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,6 +39,31 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_irq_line,
+	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
+	TP_ARGS(type, vcpu_idx, irq_num, level),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	type		)
+		__field(	int,		vcpu_idx	)
+		__field(	int,		irq_num		)
+		__field(	int,		level		)
+	),
+
+	TP_fast_assign(
+		__entry->type		= type;
+		__entry->vcpu_idx	= vcpu_idx;
+		__entry->irq_num	= irq_num;
+		__entry->level		= level;
+	),
+
+	TP_printk("Inject %s interrupt (%d), vcpu->idx: %d, num: %d, level: %d",
+		  (__entry->type == KVM_ARM_IRQ_TYPE_CPU) ? "CPU" :
+		  (__entry->type == KVM_ARM_IRQ_TYPE_PPI) ? "VGIC PPI" :
+		  (__entry->type == KVM_ARM_IRQ_TYPE_SPI) ? "VGIC SPI" : "UNKNOWN",
+		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
+);
+
 TRACE_EVENT(kvm_unmap_hva,
 	TP_PROTO(unsigned long hva),
 	TP_ARGS(hva),
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 24978d5..dc63665 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -115,6 +115,7 @@ struct kvm_irq_level {
 	 * ACPI gsi notion of irq.
 	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
 	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
+	 * For ARM: See Documentation/virtual/kvm/api.txt
 	 */
 	union {
 		__u32 irq;


^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:39   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm
  Cc: Marc Zyngier, Antonios Motakis, Marcelo Tosatti, Rusty Russell

Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.

The following Hyp-ABI is also documented in the code:

Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
   Switching to Hyp mode is done through a simple HVC #0 instruction. The
   exception vector code will check that the HVC comes from VMID==0 and if
   so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
   - r0 contains a pointer to a HYP function
   - r1, r2, and r3 contain arguments to the above function.
   - The HYP function will be called with its arguments in r0, r1 and r2.
   On HYP function return, we return directly to SVC.

A call to a function executing in Hyp mode is performed like the following:

        <svc code>
        ldr     r0, =BSYM(my_hyp_fn)
        ldr     r1, =my_param
        hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
        <svc code>

Otherwise, the world-switch is pretty straightforward.  All state that
can be modified by the guest is first backed up on the Hyp stack, and the
VCPU values are loaded onto the hardware.  State which is not loaded, but
is theoretically modifiable by the guest, is protected through the
virtualization features to generate a trap and cause software emulation.
Upon guest return, all guest state is saved from the hardware into the VCPU
struct, and the original host state is restored from the Hyp stack onto the
hardware.

SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.

Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
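
For reference, the per-VM VMID ends up in the VTTBR next to the stage-2 pgd
base address; a rough C sketch of what update_vttbr() below does (using the
VTTBR_* defines added to kvm_arm.h in this patch) is:

	u64 vttbr, vmid_field;

	vmid_field = ((u64)kvm->arch.vmid << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
	vttbr = (virt_to_phys(kvm->arch.pgd) & VTTBR_BADDR_MASK) | vmid_field;
	kvm->arch.vttbr = vttbr;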

To support VFP/NEON we trap those instructions using the HCPTR.  When
we trap, we switch the FPU.  After a guest exit, the VFP state is
returned to the host.  When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state.  We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel; if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON.  VFP/NEON support was developed
by Antonios Motakis and Rusty Russell.

Aborts that are permission faults, and not part of a stage-1 page table
walk, do not report the faulting address in the HPFAR.  We have to resolve
the IPA ourselves, and store it just like the HPFAR register on the VCPU
struct. If
the IPA cannot be resolved, it means another CPU is playing with the
page tables, and we simply restart the guest.  This quirk was fixed by
Marc Zyngier.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h  |   51 ++++
 arch/arm/include/asm/kvm_host.h |   10 +
 arch/arm/kernel/asm-offsets.c   |   25 ++
 arch/arm/kvm/arm.c              |  187 ++++++++++++++++
 arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
 arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 1108 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/kvm/interrupts_head.S

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index fb22ee8..a3262a2 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -98,6 +98,18 @@
 #define TTBCR_T0SZ	3
 #define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
 
+/* Hyp System Trap Register */
+#define HSTR_T(x)	(1 << x)
+#define HSTR_TTEE	(1 << 16)
+#define HSTR_TJDBX	(1 << 17)
+
+/* Hyp Coprocessor Trap Register */
+#define HCPTR_TCP(x)	(1 << x)
+#define HCPTR_TCP_MASK	(0x3fff)
+#define HCPTR_TASE	(1 << 15)
+#define HCPTR_TTA	(1 << 20)
+#define HCPTR_TCPAC	(1 << 31)
+
 /* Hyp Debug Configuration Register bits */
 #define HDCR_TDRA	(1 << 11)
 #define HDCR_TDOSA	(1 << 10)
@@ -144,6 +156,45 @@
 #else
 #define VTTBR_X		(5 - KVM_T0SZ)
 #endif
+#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
+#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
+#define VTTBR_VMID_SHIFT  (48LLU)
+#define VTTBR_VMID_MASK	  (0xffLLU << VTTBR_VMID_SHIFT)
+
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT	(26)
+#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
+#define HSR_IL		(1U << 25)
+#define HSR_ISS		(HSR_IL - 1)
+#define HSR_ISV_SHIFT	(24)
+#define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_FSC		(0x3f)
+#define HSR_FSC_TYPE	(0x3c)
+#define HSR_WNR		(1 << 6)
+
+#define FSC_FAULT	(0x04)
+#define FSC_PERM	(0x0c)
+
+/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
+#define HPFAR_MASK	(~0xf)
 
+#define HSR_EC_UNKNOWN	(0x00)
+#define HSR_EC_WFI	(0x01)
+#define HSR_EC_CP15_32	(0x03)
+#define HSR_EC_CP15_64	(0x04)
+#define HSR_EC_CP14_MR	(0x05)
+#define HSR_EC_CP14_LS	(0x06)
+#define HSR_EC_CP_0_13	(0x07)
+#define HSR_EC_CP10_ID	(0x08)
+#define HSR_EC_JAZELLE	(0x09)
+#define HSR_EC_BXJ	(0x0A)
+#define HSR_EC_CP14_64	(0x0C)
+#define HSR_EC_SVC_HYP	(0x11)
+#define HSR_EC_HVC	(0x12)
+#define HSR_EC_SMC	(0x13)
+#define HSR_EC_IABT	(0x20)
+#define HSR_EC_IABT_HYP	(0x21)
+#define HSR_EC_DABT	(0x24)
+#define HSR_EC_DABT_HYP	(0x25)
 
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 1de6f0d..ddb09da 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -21,6 +21,7 @@
 
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
+#include <asm/fpstate.h>
 
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
 #define KVM_USER_MEM_SLOTS 32
@@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
 	u32 hxfar;		/* Hyp Data/Inst Fault Address Register */
 	u32 hpfar;		/* Hyp IPA Fault Address Register */
 
+	/* Floating point registers (VFP and Advanced SIMD/NEON) */
+	struct vfp_hard_struct vfp_guest;
+	struct vfp_hard_struct *vfp_host;
+
+	/*
+	 * Anything that is not used directly from assembly code goes
+	 * here.
+	 */
 	/* Interrupt related fields */
 	u32 irq_lines;		/* IRQ and FIQ levels */
 
@@ -112,6 +121,7 @@ struct kvm_one_reg;
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 u64 kvm_call_hyp(void *hypfn, ...);
+void force_vm_exit(const cpumask_t *mask);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 struct kvm;
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index c985b48..c8b3272 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -13,6 +13,9 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/dma-mapping.h>
+#ifdef CONFIG_KVM_ARM_HOST
+#include <linux/kvm_host.h>
+#endif
 #include <asm/cacheflush.h>
 #include <asm/glue-df.h>
 #include <asm/glue-pf.h>
@@ -146,5 +149,27 @@ int main(void)
   DEFINE(DMA_BIDIRECTIONAL,	DMA_BIDIRECTIONAL);
   DEFINE(DMA_TO_DEVICE,		DMA_TO_DEVICE);
   DEFINE(DMA_FROM_DEVICE,	DMA_FROM_DEVICE);
+#ifdef CONFIG_KVM_ARM_HOST
+  DEFINE(VCPU_KVM,		offsetof(struct kvm_vcpu, kvm));
+  DEFINE(VCPU_MIDR,		offsetof(struct kvm_vcpu, arch.midr));
+  DEFINE(VCPU_CP15,		offsetof(struct kvm_vcpu, arch.cp15));
+  DEFINE(VCPU_VFP_GUEST,	offsetof(struct kvm_vcpu, arch.vfp_guest));
+  DEFINE(VCPU_VFP_HOST,		offsetof(struct kvm_vcpu, arch.vfp_host));
+  DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
+  DEFINE(VCPU_USR_REGS,		offsetof(struct kvm_vcpu, arch.regs.usr_regs));
+  DEFINE(VCPU_SVC_REGS,		offsetof(struct kvm_vcpu, arch.regs.svc_regs));
+  DEFINE(VCPU_ABT_REGS,		offsetof(struct kvm_vcpu, arch.regs.abt_regs));
+  DEFINE(VCPU_UND_REGS,		offsetof(struct kvm_vcpu, arch.regs.und_regs));
+  DEFINE(VCPU_IRQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.irq_regs));
+  DEFINE(VCPU_FIQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
+  DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
+  DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
+  DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.hsr));
+  DEFINE(VCPU_HxFAR,		offsetof(struct kvm_vcpu, arch.hxfar));
+  DEFINE(VCPU_HPFAR,		offsetof(struct kvm_vcpu, arch.hpfar));
+  DEFINE(VCPU_HYP_PC,		offsetof(struct kvm_vcpu, arch.hyp_pc));
+  DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
+#endif
   return 0; 
 }
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 9b4566e..c94d278 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -40,6 +40,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_emulate.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
@@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
 static unsigned long hyp_default_vectors;
 
+/* The VMID used in the VTTBR */
+static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
+static u8 kvm_next_vmid;
+static DEFINE_SPINLOCK(kvm_vmid_lock);
 
 int kvm_arch_hardware_enable(void *garbage)
 {
@@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
 
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
+	/* Force users to call KVM_ARM_VCPU_INIT */
+	vcpu->arch.target = -1;
 	return 0;
 }
 
@@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
+	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
 {
+	return v->mode == IN_GUEST_MODE;
+}
+
+/* Just ensure a guest exit from a particular CPU */
+static void exit_vm_noop(void *info)
+{
+}
+
+void force_vm_exit(const cpumask_t *mask)
+{
+	smp_call_function_many(mask, exit_vm_noop, NULL, true);
+}
+
+/**
+ * need_new_vmid_gen - check that the VMID is still valid
+ * @kvm: The VM's VMID to check
+ *
+ * return true if there is a new generation of VMIDs being used
+ *
+ * The hardware supports only 256 values with the value zero reserved for the
+ * host, so we check if an assigned value belongs to a previous generation,
+ * which requires us to assign a new value. If we're the first to use a
+ * VMID for the new generation, we must flush necessary caches and TLBs on all
+ * CPUs.
+ */
+static bool need_new_vmid_gen(struct kvm *kvm)
+{
+	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+}
+
+/**
+ * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
+ * @kvm:	The guest that we are about to run
+ *
+ * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
+ * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
+ * caches and TLBs.
+ */
+static void update_vttbr(struct kvm *kvm)
+{
+	phys_addr_t pgd_phys;
+	u64 vmid;
+
+	if (!need_new_vmid_gen(kvm))
+		return;
+
+	spin_lock(&kvm_vmid_lock);
+
+	/*
+	 * We need to re-check the vmid_gen here to ensure that if another vcpu
+	 * already allocated a valid vmid for this vm, then this vcpu should
+	 * use the same vmid.
+	 */
+	if (!need_new_vmid_gen(kvm)) {
+		spin_unlock(&kvm_vmid_lock);
+		return;
+	}
+
+	/* First user of a new VMID generation? */
+	if (unlikely(kvm_next_vmid == 0)) {
+		atomic64_inc(&kvm_vmid_gen);
+		kvm_next_vmid = 1;
+
+		/*
+		 * On SMP we know no other CPUs can use this CPU's or each
+		 * other's VMID after force_vm_exit returns since the
+		 * kvm_vmid_lock blocks them from reentry to the guest.
+		 */
+		force_vm_exit(cpu_all_mask);
+		/*
+		 * Now broadcast TLB + ICACHE invalidation over the inner
+		 * shareable domain to make sure all data structures are
+		 * clean.
+		 */
+		kvm_call_hyp(__kvm_flush_vm_context);
+	}
+
+	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
+	kvm->arch.vmid = kvm_next_vmid;
+	kvm_next_vmid++;
+
+	/* update vttbr to be used with the new vmid */
+	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
+	kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
+	kvm->arch.vttbr |= vmid;
+
+	spin_unlock(&kvm_vmid_lock);
+}
+
+/*
+ * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
+ * proper exit to QEMU.
+ */
+static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		       int exception_index)
+{
+	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 	return 0;
 }
 
+/**
+ * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
+ * @vcpu:	The VCPU pointer
+ * @run:	The kvm_run structure pointer used for userspace state exchange
+ *
+ * This function is called through the VCPU_RUN ioctl called from user space. It
+ * will execute VM code in a loop until the time slice for the process is used
+ * or some emulation is needed from user space in which case the function will
+ * return with return value 0 and with the kvm_run structure filled in with the
+ * required data for the requested emulation.
+ */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	int ret;
+	sigset_t sigsaved;
+
+	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
+	if (unlikely(vcpu->arch.target < 0))
+		return -ENOEXEC;
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
+
+	ret = 1;
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	while (ret > 0) {
+		/*
+		 * Check conditions before entering the guest
+		 */
+		cond_resched();
+
+		update_vttbr(vcpu->kvm);
+
+		local_irq_disable();
+
+		/*
+		 * Re-check atomic conditions
+		 */
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			run->exit_reason = KVM_EXIT_INTR;
+		}
+
+		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+			local_irq_enable();
+			continue;
+		}
+
+		/**************************************************************
+		 * Enter the guest
+		 */
+		trace_kvm_entry(*vcpu_pc(vcpu));
+		kvm_guest_enter();
+		vcpu->mode = IN_GUEST_MODE;
+
+		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
+
+		vcpu->mode = OUTSIDE_GUEST_MODE;
+		kvm_guest_exit();
+		trace_kvm_exit(*vcpu_pc(vcpu));
+		/*
+		 * We may have taken a host interrupt in HYP mode (ie
+		 * while executing the guest). This interrupt is still
+		 * pending, as we haven't serviced it yet!
+		 *
+		 * We're now back in SVC mode, with interrupts
+		 * disabled.  Enabling the interrupts now will have
+		 * the effect of taking the interrupt again, in SVC
+		 * mode this time.
+		 */
+		local_irq_enable();
+
+		/*
+		 * Back from guest
+		 *************************************************************/
+
+		ret = handle_exit(vcpu, run, ret);
+	}
+
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
+	return ret;
 }
 
 static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index a923590..08adcd5 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -20,9 +20,12 @@
 #include <linux/const.h>
 #include <asm/unified.h>
 #include <asm/page.h>
+#include <asm/ptrace.h>
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
+#include <asm/vfpmacros.h>
+#include "interrupts_head.S"
 
 	.text
 
@@ -31,36 +34,423 @@ __kvm_hyp_code_start:
 
 /********************************************************************
  * Flush per-VMID TLBs
+ *
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm);
+ *
+ * We rely on the hardware to broadcast the TLB invalidation to all CPUs
+ * inside the inner-shareable domain (which is the case for all v7
+ * implementations).  If we come across a non-IS SMP implementation, we'll
+ * have to use an IPI based mechanism. Until then, we stick to the simple
+ * hardware assisted version.
  */
 ENTRY(__kvm_tlb_flush_vmid)
+	push	{r2, r3}
+
+	add	r0, r0, #KVM_VTTBR
+	ldrd	r2, r3, [r0]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+	isb
+	mcr     p15, 0, r0, c8, c3, 0	@ TLBIALLIS (rt ignored)
+	dsb
+	isb
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Back to VMID #0
+	isb				@ Not necessary if followed by eret
+
+	pop	{r2, r3}
 	bx	lr
 ENDPROC(__kvm_tlb_flush_vmid)
 
 /********************************************************************
- * Flush TLBs and instruction caches of current CPU for all VMIDs
+ * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
+ * domain, for all VMIDs
+ *
+ * void __kvm_flush_vm_context(void);
  */
 ENTRY(__kvm_flush_vm_context)
+	mov	r0, #0			@ rn parameter for c15 flushes is SBZ
+
+	/* Invalidate NS Non-Hyp TLB Inner Shareable (TLBIALLNSNHIS) */
+	mcr     p15, 4, r0, c8, c3, 4
+	/* Invalidate instruction caches Inner Shareable (ICIALLUIS) */
+	mcr     p15, 0, r0, c7, c1, 0
+	dsb
+	isb				@ Not necessary if followed by eret
+
 	bx	lr
 ENDPROC(__kvm_flush_vm_context)
 
+
 /********************************************************************
  *  Hypervisor world-switch code
+ *
+ *
+ * int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
  */
 ENTRY(__kvm_vcpu_run)
-	bx	lr
+	@ Save the vcpu pointer
+	mcr	p15, 4, vcpu, c13, c0, 2	@ HTPIDR
+
+	save_host_regs
+
+	@ Store hardware CP15 state and load guest state
+	read_cp15_state store_to_vcpu = 0
+	write_cp15_state read_from_vcpu = 1
+
+	@ If the host kernel has not been configured with VFPv3 support,
+	@ then it is safer if we deny guests from using it as well.
+#ifdef CONFIG_VFPv3
+	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
+	VFPFMRX r2, FPEXC		@ VMRS
+	push	{r2}
+	orr	r2, r2, #FPEXC_EN
+	VFPFMXR FPEXC, r2		@ VMSR
+#endif
+
+	@ Configure Hyp-role
+	configure_hyp_role vmentry
+
+	@ Trap coprocessor CRx accesses
+	set_hstr vmentry
+	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+	set_hdcr vmentry
+
+	@ Write configured ID register into MIDR alias
+	ldr	r1, [vcpu, #VCPU_MIDR]
+	mcr	p15, 4, r1, c0, c0, 0
+
+	@ Write guest view of MPIDR into VMPIDR
+	ldr	r1, [vcpu, #CP15_OFFSET(c0_MPIDR)]
+	mcr	p15, 4, r1, c0, c0, 5
+
+	@ Set up guest memory translation
+	ldr	r1, [vcpu, #VCPU_KVM]
+	add	r1, r1, #KVM_VTTBR
+	ldrd	r2, r3, [r1]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ We're all done, just restore the GPRs and go to the guest
+	restore_guest_regs
+	clrex				@ Clear exclusive monitor
+	eret
+
+__kvm_vcpu_return:
+	/*
+	 * return convention:
+	 * guest r0, r1, r2 saved on the stack
+	 * r0: vcpu pointer
+	 * r1: exception code
+	 */
+	save_guest_regs
+
+	@ Set VMID == 0
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+
+	@ Don't trap coprocessor accesses for host kernel
+	set_hstr vmexit
+	set_hdcr vmexit
+	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+
+#ifdef CONFIG_VFPv3
+	@ Save floating point registers if we let the guest use them.
+	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
+	bne	after_vfp_restore
+
+	@ Switch VFP/NEON hardware state to the host's
+	add	r7, vcpu, #VCPU_VFP_GUEST
+	store_vfp_state r7
+	add	r7, vcpu, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	restore_vfp_state r7
+
+after_vfp_restore:
+	@ Restore FPEXC_EN which we clobbered on entry
+	pop	{r2}
+	VFPFMXR FPEXC, r2
+#endif
+
+	@ Reset Hyp-role
+	configure_hyp_role vmexit
+
+	@ Let host read hardware MIDR
+	mrc	p15, 0, r2, c0, c0, 0
+	mcr	p15, 4, r2, c0, c0, 0
+
+	@ Back to hardware MPIDR
+	mrc	p15, 0, r2, c0, c0, 5
+	mcr	p15, 4, r2, c0, c0, 5
+
+	@ Store guest CP15 state and restore host state
+	read_cp15_state store_to_vcpu = 1
+	write_cp15_state read_from_vcpu = 0
+
+	restore_host_regs
+	clrex				@ Clear exclusive monitor
+	mov	r0, r1			@ Return the return code
+	bx	lr			@ return to IOCTL
 
 ENTRY(kvm_call_hyp)
+	hvc	#0
 	bx	lr
 
 
 /********************************************************************
  * Hypervisor exception vector and handlers
+ *
+ *
+ * The KVM/ARM Hypervisor ABI is defined as follows:
+ *
+ * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
+ * instruction is issued since all traps are disabled when running the host
+ * kernel as per the Hyp-mode initialization at boot time.
+ *
+ * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
+ * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
+ * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
+ * instructions are called from within Hyp-mode.
+ *
+ * Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
+ *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
+ *    exception vector code will check that the HVC comes from VMID==0 and if
+ *    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
+ *    - r0 contains a pointer to a HYP function
+ *    - r1, r2, and r3 contain arguments to the above function.
+ *    - The HYP function will be called with its arguments in r0, r1 and r2.
+ *    On HYP function return, we return directly to SVC.
+ *
+ * Note that the above is used to execute code in Hyp-mode from a host-kernel
+ * point of view, and is a different concept from performing a world-switch and
+ * executing guest code SVC mode (with a VMID != 0).
  */
 
+/* Handle undef, svc, pabt, or dabt by crashing with a user notice */
+.macro bad_exception exception_code, panic_str
+	push	{r0-r2}
+	mrrc	p15, 6, r0, r1, c2	@ Read VTTBR
+	lsr	r1, r1, #16
+	ands	r1, r1, #0xff
+	beq	99f
+
+	load_vcpu			@ Load VCPU pointer
+	.if \exception_code == ARM_EXCEPTION_DATA_ABORT
+	mrc	p15, 4, r2, c5, c2, 0	@ HSR
+	mrc	p15, 4, r1, c6, c0, 0	@ HDFAR
+	str	r2, [vcpu, #VCPU_HSR]
+	str	r1, [vcpu, #VCPU_HxFAR]
+	.endif
+	.if \exception_code == ARM_EXCEPTION_PREF_ABORT
+	mrc	p15, 4, r2, c5, c2, 0	@ HSR
+	mrc	p15, 4, r1, c6, c0, 2	@ HIFAR
+	str	r2, [vcpu, #VCPU_HSR]
+	str	r1, [vcpu, #VCPU_HxFAR]
+	.endif
+	mov	r1, #\exception_code
+	b	__kvm_vcpu_return
+
+	@ We were in the host already. Let's craft a panic-ing return to SVC.
+99:	mrs	r2, cpsr
+	bic	r2, r2, #MODE_MASK
+	orr	r2, r2, #SVC_MODE
+THUMB(	orr	r2, r2, #PSR_T_BIT	)
+	msr	spsr_cxsf, r2
+	mrs	r1, ELR_hyp
+	ldr	r2, =BSYM(panic)
+	msr	ELR_hyp, r2
+	ldr	r0, =\panic_str
+	eret
+.endm
+
+	.text
+
 	.align 5
 __kvm_hyp_vector:
 	.globl __kvm_hyp_vector
-	nop
+
+	@ Hyp-mode exception vector
+	W(b)	hyp_reset
+	W(b)	hyp_undef
+	W(b)	hyp_svc
+	W(b)	hyp_pabt
+	W(b)	hyp_dabt
+	W(b)	hyp_hvc
+	W(b)	hyp_irq
+	W(b)	hyp_fiq
+
+	.align
+hyp_reset:
+	b	hyp_reset
+
+	.align
+hyp_undef:
+	bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
+
+	.align
+hyp_svc:
+	bad_exception ARM_EXCEPTION_HVC, svc_die_str
+
+	.align
+hyp_pabt:
+	bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
+
+	.align
+hyp_dabt:
+	bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
+
+	.align
+hyp_hvc:
+	/*
+	 * Getting here is either because of a trap from a guest or from calling
+	 * HVC from the host kernel, which means "switch to Hyp mode".
+	 */
+	push	{r0, r1, r2}
+
+	@ Check syndrome register
+	mrc	p15, 4, r1, c5, c2, 0	@ HSR
+	lsr	r0, r1, #HSR_EC_SHIFT
+#ifdef CONFIG_VFPv3
+	cmp	r0, #HSR_EC_CP_0_13
+	beq	switch_to_guest_vfp
+#endif
+	cmp	r0, #HSR_EC_HVC
+	bne	guest_trap		@ Not HVC instr.
+
+	/*
+	 * Let's check if the HVC came from VMID 0 and allow simple
+	 * switch to Hyp mode
+	 */
+	mrrc    p15, 6, r0, r2, c2
+	lsr     r2, r2, #16
+	and     r2, r2, #0xff
+	cmp     r2, #0
+	bne	guest_trap		@ Guest called HVC
+
+host_switch_to_hyp:
+	pop	{r0, r1, r2}
+
+	push	{lr}
+	mrs	lr, SPSR
+	push	{lr}
+
+	mov	lr, r0
+	mov	r0, r1
+	mov	r1, r2
+	mov	r2, r3
+
+THUMB(	orr	lr, #1)
+	blx	lr			@ Call the HYP function
+
+	pop	{lr}
+	msr	SPSR_csxf, lr
+	pop	{lr}
+	eret
+
+guest_trap:
+	load_vcpu			@ Load VCPU pointer to r0
+	str	r1, [vcpu, #VCPU_HSR]
+
+	@ Check if we need the fault information
+	lsr	r1, r1, #HSR_EC_SHIFT
+	cmp	r1, #HSR_EC_IABT
+	mrceq	p15, 4, r2, c6, c0, 2	@ HIFAR
+	beq	2f
+	cmp	r1, #HSR_EC_DABT
+	bne	1f
+	mrc	p15, 4, r2, c6, c0, 0	@ HDFAR
+
+2:	str	r2, [vcpu, #VCPU_HxFAR]
+
+	/*
+	 * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
+	 *
+	 * Abort on the stage 2 translation for a memory access from a
+	 * Non-secure PL1 or PL0 mode:
+	 *
+	 * For any Access flag fault or Translation fault, and also for any
+	 * Permission fault on the stage 2 translation of a memory access
+	 * made as part of a translation table walk for a stage 1 translation,
+	 * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
+	 * is UNKNOWN.
+	 */
+
+	/* Check for permission fault, and S1PTW */
+	mrc	p15, 4, r1, c5, c2, 0	@ HSR
+	and	r0, r1, #HSR_FSC_TYPE
+	cmp	r0, #FSC_PERM
+	tsteq	r1, #(1 << 7)		@ S1PTW
+	mrcne	p15, 4, r2, c6, c0, 4	@ HPFAR
+	bne	3f
+
+	/* Resolve IPA using the xFAR */
+	mcr	p15, 0, r2, c7, c8, 0	@ ATS1CPR
+	isb
+	mrrc	p15, 0, r0, r1, c7	@ PAR
+	tst	r0, #1
+	bne	4f			@ Failed translation
+	ubfx	r2, r0, #12, #20
+	lsl	r2, r2, #4
+	orr	r2, r2, r1, lsl #24
+
+3:	load_vcpu			@ Load VCPU pointer to r0
+	str	r2, [r0, #VCPU_HPFAR]
+
+1:	mov	r1, #ARM_EXCEPTION_HVC
+	b	__kvm_vcpu_return
+
+4:	pop	{r0, r1, r2}		@ Failed translation, return to guest
+	eret
+
+/*
+ * If VFPv3 support is not available, then we will not switch the VFP
+ * registers; however cp10 and cp11 accesses will still trap and fallback
+ * to the regular coprocessor emulation code, which currently will
+ * inject an undefined exception to the guest.
+ */
+#ifdef CONFIG_VFPv3
+switch_to_guest_vfp:
+	load_vcpu			@ Load VCPU pointer to r0
+	push	{r3-r7}
+
+	@ NEON/VFP used.  Turn on VFP access.
+	set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
+
+	@ Switch VFP/NEON hardware state to the guest's
+	add	r7, r0, #VCPU_VFP_HOST
+	ldr	r7, [r7]
+	store_vfp_state r7
+	add	r7, r0, #VCPU_VFP_GUEST
+	restore_vfp_state r7
+
+	pop	{r3-r7}
+	pop	{r0-r2}
+	eret
+#endif
+
+	.align
+hyp_irq:
+	push	{r0, r1, r2}
+	mov	r1, #ARM_EXCEPTION_IRQ
+	load_vcpu			@ Load VCPU pointer to r0
+	b	__kvm_vcpu_return
+
+	.align
+hyp_fiq:
+	b	hyp_fiq
+
+	.ltorg
 
 __kvm_hyp_code_end:
 	.globl	__kvm_hyp_code_end
+
+	.section ".rodata"
+
+und_die_str:
+	.ascii	"unexpected undefined exception in Hyp mode at: %#08x"
+pabt_die_str:
+	.ascii	"unexpected prefetch abort in Hyp mode at: %#08x"
+dabt_die_str:
+	.ascii	"unexpected data abort in Hyp mode at: %#08x"
+svc_die_str:
+	.ascii	"unexpected HVC/SVC trap in Hyp mode at: %#08x"
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
new file mode 100644
index 0000000..f59a580
--- /dev/null
+++ b/arch/arm/kvm/interrupts_head.S
@@ -0,0 +1,443 @@
+#define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
+#define VCPU_USR_SP		(VCPU_USR_REG(13))
+#define VCPU_USR_LR		(VCPU_USR_REG(14))
+#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))
+
+/*
+ * Many of these macros need to access the VCPU structure, which is always
+ * held in r0. These macros should never clobber r1, as it is used to hold the
+ * exception code on the return path (except of course the macro that switches
+ * all the registers before the final jump to the VM).
+ */
+vcpu	.req	r0		@ vcpu pointer always in r0
+
+/* Clobbers {r2-r6} */
+.macro store_vfp_state vfp_base
+	@ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
+	VFPFMRX	r2, FPEXC
+	@ Make sure VFP is enabled so we can touch the registers.
+	orr	r6, r2, #FPEXC_EN
+	VFPFMXR	FPEXC, r6
+
+	VFPFMRX	r3, FPSCR
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	@ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
+	@ we only need to save them if FPEXC_EX is set.
+	VFPFMRX r4, FPINST
+	tst	r2, #FPEXC_FP2V
+	VFPFMRX r5, FPINST2, ne		@ vmrsne
+	bic	r6, r2, #FPEXC_EX	@ FPEXC_EX disable
+	VFPFMXR	FPEXC, r6
+1:
+	VFPFSTMIA \vfp_base, r6		@ Save VFP registers
+	stm	\vfp_base, {r2-r5}	@ Save FPEXC, FPSCR, FPINST, FPINST2
+.endm
+
+/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
+.macro restore_vfp_state vfp_base
+	VFPFLDMIA \vfp_base, r6		@ Load VFP registers
+	ldm	\vfp_base, {r2-r5}	@ Load FPEXC, FPSCR, FPINST, FPINST2
+
+	VFPFMXR FPSCR, r3
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	VFPFMXR FPINST, r4
+	tst	r2, #FPEXC_FP2V
+	VFPFMXR FPINST2, r5, ne
+1:
+	VFPFMXR FPEXC, r2	@ FPEXC	(last, in case !EN)
+.endm
+
+/* These are simply for the macros to work - the values don't have meaning */
+.equ usr, 0
+.equ svc, 1
+.equ abt, 2
+.equ und, 3
+.equ irq, 4
+.equ fiq, 5
+
+.macro push_host_regs_mode mode
+	mrs	r2, SP_\mode
+	mrs	r3, LR_\mode
+	mrs	r4, SPSR_\mode
+	push	{r2, r3, r4}
+.endm
+
+/*
+ * Store all host persistent registers on the stack.
+ * Clobbers all registers, in all modes, except r0 and r1.
+ */
+.macro save_host_regs
+	/* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
+	mrs	r2, ELR_hyp
+	push	{r2}
+
+	/* usr regs */
+	push	{r4-r12}	@ r0-r3 are always clobbered
+	mrs	r2, SP_usr
+	mov	r3, lr
+	push	{r2, r3}
+
+	push_host_regs_mode svc
+	push_host_regs_mode abt
+	push_host_regs_mode und
+	push_host_regs_mode irq
+
+	/* fiq regs */
+	mrs	r2, r8_fiq
+	mrs	r3, r9_fiq
+	mrs	r4, r10_fiq
+	mrs	r5, r11_fiq
+	mrs	r6, r12_fiq
+	mrs	r7, SP_fiq
+	mrs	r8, LR_fiq
+	mrs	r9, SPSR_fiq
+	push	{r2-r9}
+.endm
+
+.macro pop_host_regs_mode mode
+	pop	{r2, r3, r4}
+	msr	SP_\mode, r2
+	msr	LR_\mode, r3
+	msr	SPSR_\mode, r4
+.endm
+
+/*
+ * Restore all host registers from the stack.
+ * Clobbers all registers, in all modes, except r0 and r1.
+ */
+.macro restore_host_regs
+	pop	{r2-r9}
+	msr	r8_fiq, r2
+	msr	r9_fiq, r3
+	msr	r10_fiq, r4
+	msr	r11_fiq, r5
+	msr	r12_fiq, r6
+	msr	SP_fiq, r7
+	msr	LR_fiq, r8
+	msr	SPSR_fiq, r9
+
+	pop_host_regs_mode irq
+	pop_host_regs_mode und
+	pop_host_regs_mode abt
+	pop_host_regs_mode svc
+
+	pop	{r2, r3}
+	msr	SP_usr, r2
+	mov	lr, r3
+	pop	{r4-r12}
+
+	pop	{r2}
+	msr	ELR_hyp, r2
+.endm
+
+/*
+ * Restore SP, LR and SPSR for a given mode. offset is the offset of
+ * this mode's registers from the VCPU base.
+ *
+ * Assumes vcpu pointer in vcpu reg
+ *
+ * Clobbers r1, r2, r3, r4.
+ */
+.macro restore_guest_regs_mode mode, offset
+	add	r1, vcpu, \offset
+	ldm	r1, {r2, r3, r4}
+	msr	SP_\mode, r2
+	msr	LR_\mode, r3
+	msr	SPSR_\mode, r4
+.endm
+
+/*
+ * Restore all guest registers from the vcpu struct.
+ *
+ * Assumes vcpu pointer in vcpu reg
+ *
+ * Clobbers *all* registers.
+ */
+.macro restore_guest_regs
+	restore_guest_regs_mode svc, #VCPU_SVC_REGS
+	restore_guest_regs_mode abt, #VCPU_ABT_REGS
+	restore_guest_regs_mode und, #VCPU_UND_REGS
+	restore_guest_regs_mode irq, #VCPU_IRQ_REGS
+
+	add	r1, vcpu, #VCPU_FIQ_REGS
+	ldm	r1, {r2-r9}
+	msr	r8_fiq, r2
+	msr	r9_fiq, r3
+	msr	r10_fiq, r4
+	msr	r11_fiq, r5
+	msr	r12_fiq, r6
+	msr	SP_fiq, r7
+	msr	LR_fiq, r8
+	msr	SPSR_fiq, r9
+
+	@ Load return state
+	ldr	r2, [vcpu, #VCPU_PC]
+	ldr	r3, [vcpu, #VCPU_CPSR]
+	msr	ELR_hyp, r2
+	msr	SPSR_cxsf, r3
+
+	@ Load user registers
+	ldr	r2, [vcpu, #VCPU_USR_SP]
+	ldr	r3, [vcpu, #VCPU_USR_LR]
+	msr	SP_usr, r2
+	mov	lr, r3
+	add	vcpu, vcpu, #(VCPU_USR_REGS)
+	ldm	vcpu, {r0-r12}
+.endm
+
+/*
+ * Save SP, LR and SPSR for a given mode. offset is the offset of
+ * this mode's registers from the VCPU base.
+ *
+ * Assumes vcpu pointer in vcpu reg
+ *
+ * Clobbers r2, r3, r4, r5.
+ */
+.macro save_guest_regs_mode mode, offset
+	add	r2, vcpu, \offset
+	mrs	r3, SP_\mode
+	mrs	r4, LR_\mode
+	mrs	r5, SPSR_\mode
+	stm	r2, {r3, r4, r5}
+.endm
+
+/*
+ * Save all guest registers to the vcpu struct
+ * Expects guest's r0, r1, r2 on the stack.
+ *
+ * Assumes vcpu pointer in vcpu reg
+ *
+ * Clobbers r2, r3, r4, r5.
+ */
+.macro save_guest_regs
+	@ Store usr registers
+	add	r2, vcpu, #VCPU_USR_REG(3)
+	stm	r2, {r3-r12}
+	add	r2, vcpu, #VCPU_USR_REG(0)
+	pop	{r3, r4, r5}		@ r0, r1, r2
+	stm	r2, {r3, r4, r5}
+	mrs	r2, SP_usr
+	mov	r3, lr
+	str	r2, [vcpu, #VCPU_USR_SP]
+	str	r3, [vcpu, #VCPU_USR_LR]
+
+	@ Store return state
+	mrs	r2, ELR_hyp
+	mrs	r3, spsr
+	str	r2, [vcpu, #VCPU_PC]
+	str	r3, [vcpu, #VCPU_CPSR]
+
+	@ Store other guest registers
+	save_guest_regs_mode svc, #VCPU_SVC_REGS
+	save_guest_regs_mode abt, #VCPU_ABT_REGS
+	save_guest_regs_mode und, #VCPU_UND_REGS
+	save_guest_regs_mode irq, #VCPU_IRQ_REGS
+.endm
+
+/* Reads cp15 registers from hardware and stores them in memory
+ * @store_to_vcpu: If 0, registers are written in-order to the stack,
+ * 		   otherwise to the VCPU struct pointed to by the vcpu register
+ *
+ * Assumes vcpu pointer in vcpu reg
+ *
+ * Clobbers r2 - r12
+ */
+.macro read_cp15_state store_to_vcpu
+	mrc	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mrc	p15, 0, r3, c1, c0, 2	@ CPACR
+	mrc	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mrc	p15, 0, r5, c3, c0, 0	@ DACR
+	mrrc	p15, 0, r6, r7, c2	@ TTBR 0
+	mrrc	p15, 1, r8, r9, c2	@ TTBR 1
+	mrc	p15, 0, r10, c10, c2, 0	@ PRRR
+	mrc	p15, 0, r11, c10, c2, 1	@ NMRR
+	mrc	p15, 2, r12, c0, c0, 0	@ CSSELR
+
+	.if \store_to_vcpu == 0
+	push	{r2-r12}		@ Push CP15 registers
+	.else
+	str	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
+	str	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
+	str	r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
+	str	r5, [vcpu, #CP15_OFFSET(c3_DACR)]
+	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
+	strd	r6, r7, [vcpu]
+	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
+	strd	r8, r9, [vcpu]
+	sub	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
+	str	r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
+	str	r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
+	str	r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
+	.endif
+
+	mrc	p15, 0, r2, c13, c0, 1	@ CID
+	mrc	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mrc	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mrc	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	mrc	p15, 0, r6, c5, c0, 0	@ DFSR
+	mrc	p15, 0, r7, c5, c0, 1	@ IFSR
+	mrc	p15, 0, r8, c5, c1, 0	@ ADFSR
+	mrc	p15, 0, r9, c5, c1, 1	@ AIFSR
+	mrc	p15, 0, r10, c6, c0, 0	@ DFAR
+	mrc	p15, 0, r11, c6, c0, 2	@ IFAR
+	mrc	p15, 0, r12, c12, c0, 0	@ VBAR
+
+	.if \store_to_vcpu == 0
+	push	{r2-r12}		@ Push CP15 registers
+	.else
+	str	r2, [vcpu, #CP15_OFFSET(c13_CID)]
+	str	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
+	str	r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
+	str	r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
+	str	r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
+	str	r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
+	str	r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
+	str	r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
+	str	r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
+	str	r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
+	str	r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
+	.endif
+.endm
+
+/*
+ * Reads cp15 registers from memory and writes them to hardware
+ * @read_from_vcpu: If 0, registers are read in-order from the stack,
+ *		    otherwise from the VCPU struct pointed to by the vcpu register
+ *
+ * Assumes vcpu pointer in vcpu reg
+ */
+.macro write_cp15_state read_from_vcpu
+	.if \read_from_vcpu == 0
+	pop	{r2-r12}
+	.else
+	ldr	r2, [vcpu, #CP15_OFFSET(c13_CID)]
+	ldr	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
+	ldr	r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
+	ldr	r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
+	ldr	r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
+	ldr	r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
+	ldr	r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
+	ldr	r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
+	ldr	r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
+	ldr	r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
+	ldr	r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
+	.endif
+
+	mcr	p15, 0, r2, c13, c0, 1	@ CID
+	mcr	p15, 0, r3, c13, c0, 2	@ TID_URW
+	mcr	p15, 0, r4, c13, c0, 3	@ TID_URO
+	mcr	p15, 0, r5, c13, c0, 4	@ TID_PRIV
+	mcr	p15, 0, r6, c5, c0, 0	@ DFSR
+	mcr	p15, 0, r7, c5, c0, 1	@ IFSR
+	mcr	p15, 0, r8, c5, c1, 0	@ ADFSR
+	mcr	p15, 0, r9, c5, c1, 1	@ AIFSR
+	mcr	p15, 0, r10, c6, c0, 0	@ DFAR
+	mcr	p15, 0, r11, c6, c0, 2	@ IFAR
+	mcr	p15, 0, r12, c12, c0, 0	@ VBAR
+
+	.if \read_from_vcpu == 0
+	pop	{r2-r12}
+	.else
+	ldr	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
+	ldr	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
+	ldr	r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
+	ldr	r5, [vcpu, #CP15_OFFSET(c3_DACR)]
+	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
+	ldrd	r6, r7, [vcpu]
+	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
+	ldrd	r8, r9, [vcpu]
+	sub	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
+	ldr	r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
+	ldr	r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
+	ldr	r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
+	.endif
+
+	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
+	mcr	p15, 0, r3, c1, c0, 2	@ CPACR
+	mcr	p15, 0, r4, c2, c0, 2	@ TTBCR
+	mcr	p15, 0, r5, c3, c0, 0	@ DACR
+	mcrr	p15, 0, r6, r7, c2	@ TTBR 0
+	mcrr	p15, 1, r8, r9, c2	@ TTBR 1
+	mcr	p15, 0, r10, c10, c2, 0	@ PRRR
+	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
+	mcr	p15, 2, r12, c0, c0, 0	@ CSSELR
+.endm
+
+/*
+ * Save the VGIC CPU state into memory
+ *
+ * Assumes vcpu pointer in vcpu reg
+ */
+.macro save_vgic_state
+.endm
+
+/*
+ * Restore the VGIC CPU state from memory
+ *
+ * Assumes vcpu pointer in vcpu reg
+ */
+.macro restore_vgic_state
+.endm
+
+.equ vmentry,	0
+.equ vmexit,	1
+
+/* Configures the HSTR (Hyp System Trap Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hstr operation
+	mrc	p15, 4, r2, c1, c1, 3
+	ldr	r3, =HSTR_T(15)
+	.if \operation == vmentry
+	orr	r2, r2, r3		@ Trap CR{15}
+	.else
+	bic	r2, r2, r3		@ Don't trap any CRx accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 3
+.endm
+
+/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
+ * (hardware reset value is 0). Keep previous value in r2. */
+.macro set_hcptr operation, mask
+	mrc	p15, 4, r2, c1, c1, 2
+	ldr	r3, =\mask
+	.if \operation == vmentry
+	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
+	.else
+	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
+	.endif
+	mcr	p15, 4, r3, c1, c1, 2
+.endm
+
+/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
+ * (hardware reset value is 0) */
+.macro set_hdcr operation
+	mrc	p15, 4, r2, c1, c1, 1
+	ldr	r3, =(HDCR_TPM|HDCR_TPMCR)
+	.if \operation == vmentry
+	orr	r2, r2, r3		@ Trap some perfmon accesses
+	.else
+	bic	r2, r2, r3		@ Don't trap any perfmon accesses
+	.endif
+	mcr	p15, 4, r2, c1, c1, 1
+.endm
+
+/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
+.macro configure_hyp_role operation
+	mrc	p15, 4, r2, c1, c1, 0	@ HCR
+	bic	r2, r2, #HCR_VIRT_EXCP_MASK
+	ldr	r3, =HCR_GUEST_MASK
+	.if \operation == vmentry
+	orr	r2, r2, r3
+	ldr	r3, [vcpu, #VCPU_IRQ_LINES]
+	orr	r2, r2, r3
+	.else
+	bic	r2, r2, r3
+	.endif
+	mcr	p15, 4, r2, c1, c1, 0
+.endm
+
+.macro load_vcpu
+	mrc	p15, 4, vcpu, c13, c0, 2	@ HTPIDR
+.endm
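
The host side of this Hyp call path is kvm_call_hyp(), declared earlier in
the patch. A minimal usage sketch follows; it is illustrative only,
run_guest_once() is a made-up wrapper, and it assumes the declarations
added by this patch (struct kvm_vcpu, kvm_call_hyp(), __kvm_vcpu_run()):

/*
 * Sketch only: kvm_call_hyp() issues "hvc #0" with the HYP function
 * pointer in r0 and up to three arguments in r1-r3; host_switch_to_hyp
 * then shuffles them so the HYP function sees its arguments in r0-r2.
 */
static int run_guest_once(struct kvm_vcpu *vcpu)
{
	/* The exit code is the value __kvm_vcpu_return leaves in r0. */
	return kvm_call_hyp(__kvm_vcpu_run, vcpu);
}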


^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v5 08/14] KVM: ARM: Emulation framework and CP15 emulation
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:39   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Marcelo Tosatti, Rusty Russell

Adds an important new function to the main KVM/ARM code, handle_exit(),
which is called from kvm_arch_vcpu_ioctl_run() on every return from
guest execution. This function examines the Hyp Syndrome Register
(HSR), which contains the information telling KVM what caused the exit
from the guest.

One class of exit reasons is trapped CP15 accesses, which are not
allowed directly from the guest; this commit handles these exits by
emulating the intended operation in software and then skipping past the
trapped guest instruction.
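
The dispatch itself amounts to indexing a handler table by the exception
class (EC) field of the saved HSR, roughly as in the sketch below. This is
illustrative only, dispatch_trap() is a made-up name, and the real code is
handle_exit() together with the arm_exit_handlers[] table added by this
patch:

	/* Sketch only, not the patch's exact code. */
	static int dispatch_trap(struct kvm_vcpu *vcpu, struct kvm_run *run)
	{
		unsigned int ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;

		if (ec >= ARRAY_SIZE(arm_exit_handlers) || !arm_exit_handlers[ec])
			return -EINVAL;		/* unhandled exception class */

		/* > 0: re-enter the guest, 0: return to QEMU, < 0: error */
		return arm_exit_handlers[ec](vcpu, run);
	}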

Minor notes about the coproc register reset:
1) We reserve a value of 0 as an invalid cp15 offset, to catch bugs in our
   table, at the cost of 4 bytes per vcpu.

2) Added comments on the table indicating how we handle each register, for
   simplicity of understanding.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    9 +
 arch/arm/include/asm/kvm_coproc.h  |   14 +
 arch/arm/include/asm/kvm_emulate.h |    6 +
 arch/arm/include/asm/kvm_host.h    |    4 
 arch/arm/kvm/Makefile              |    2 
 arch/arm/kvm/arm.c                 |  175 +++++++++++++++++-
 arch/arm/kvm/coproc.c              |  360 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/coproc.h              |  153 +++++++++++++++
 arch/arm/kvm/coproc_a15.c          |  162 ++++++++++++++++
 arch/arm/kvm/emulate.c             |  218 ++++++++++++++++++++++
 arch/arm/kvm/trace.h               |   45 +++++
 11 files changed, 1144 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/kvm/coproc.h
 create mode 100644 arch/arm/kvm/coproc_a15.c

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index a3262a2..3ff6f22 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -70,6 +70,11 @@
 			HCR_SWIO | HCR_TIDCP)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
+/* System Control Register (SCTLR) bits */
+#define SCTLR_TE	(1 << 30)
+#define SCTLR_EE	(1 << 25)
+#define SCTLR_V		(1 << 13)
+
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
 #define HSCTLR_EE	(1 << 25)
@@ -171,6 +176,10 @@
 #define HSR_FSC		(0x3f)
 #define HSR_FSC_TYPE	(0x3c)
 #define HSR_WNR		(1 << 6)
+#define HSR_CV_SHIFT	(24)
+#define HSR_CV		(1U << HSR_CV_SHIFT)
+#define HSR_COND_SHIFT	(20)
+#define HSR_COND	(0xfU << HSR_COND_SHIFT)
 
 #define FSC_FAULT	(0x04)
 #define FSC_PERM	(0x0c)
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index b6d023d..bd1ace0 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -21,4 +21,18 @@
 
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
+struct kvm_coproc_target_table {
+	unsigned target;
+	const struct coproc_reg *table;
+	size_t num;
+};
+void kvm_register_target_coproc_table(struct kvm_coproc_target_table *table);
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_coproc_table_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 17dad67..01a755b 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -25,6 +25,12 @@
 u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
 
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
+void kvm_inject_undefined(struct kvm_vcpu *vcpu);
+void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+
 static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
 {
 	return (u32 *)&vcpu->arch.regs.usr_regs.ARM_pc;
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ddb09da..a56a319 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -94,6 +94,10 @@ struct kvm_vcpu_arch {
 	 * Anything that is not used directly from assembly code goes
 	 * here.
 	 */
+	/* dcache set/way operation pending */
+	int last_pcpu;
+	cpumask_t require_dcache_flush;
+
 	/* Interrupt related fields */
 	u32 irq_lines;		/* IRQ and FIQ levels */
 
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index dfc293f..88edce6 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -18,4 +18,4 @@ kvm-arm-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o
+obj-y += coproc.o coproc_a15.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c94d278..0b4ffcf 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -36,11 +36,14 @@
 #include <asm/mman.h>
 #include <asm/cputype.h>
 #include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
 #include <asm/virt.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/opcodes.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
@@ -294,6 +297,17 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
 	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
+
+	/*
+	 * Check whether this vcpu requires the cache to be flushed on
+	 * this physical CPU. This is a consequence of doing dcache
+	 * operations by set/way on this vcpu. We do it here to be in
+	 * a non-preemptible section.
+	 */
+	if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
+		cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
+	}
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -319,9 +333,16 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+/**
+ * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled
+ * @v:		The VCPU pointer
+ *
+ * If the guest CPU is not waiting for interrupts or an interrupt line is
+ * asserted, the CPU is by definition runnable.
+ */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return 0;
+	return !!v->arch.irq_lines;
 }
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
@@ -416,6 +437,114 @@ static void update_vttbr(struct kvm *kvm)
 	spin_unlock(&kvm_vmid_lock);
 }
 
+static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* SVC called from Hyp mode should never get here */
+	kvm_debug("SVC called from Hyp mode shouldn't go here\n");
+	BUG();
+	return -EINVAL; /* Squash warning */
+}
+
+static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * Guest called HVC instruction:
+	 * Let it know we don't want that by injecting an undefined exception.
+	 */
+	kvm_debug("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
+		  *vcpu_pc(vcpu));
+	kvm_debug("         HSR: %8x", vcpu->arch.hsr);
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* We don't support SMC; don't do that. */
+	kvm_debug("smc: at %08x", *vcpu_pc(vcpu));
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_pabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* The hypervisor should never cause aborts */
+	kvm_err("Prefetch Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hxfar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* This is either an error in the world-switch code or an external abort */
+	kvm_err("Data Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hxfar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);
+static exit_handle_fn arm_exit_handlers[] = {
+	[HSR_EC_WFI]		= kvm_handle_wfi,
+	[HSR_EC_CP15_32]	= kvm_handle_cp15_32,
+	[HSR_EC_CP15_64]	= kvm_handle_cp15_64,
+	[HSR_EC_CP14_MR]	= kvm_handle_cp14_access,
+	[HSR_EC_CP14_LS]	= kvm_handle_cp14_load_store,
+	[HSR_EC_CP14_64]	= kvm_handle_cp14_access,
+	[HSR_EC_CP_0_13]	= kvm_handle_cp_0_13_access,
+	[HSR_EC_CP10_ID]	= kvm_handle_cp10_id,
+	[HSR_EC_SVC_HYP]	= handle_svc_hyp,
+	[HSR_EC_HVC]		= handle_hvc,
+	[HSR_EC_SMC]		= handle_smc,
+	[HSR_EC_IABT]		= kvm_handle_guest_abort,
+	[HSR_EC_IABT_HYP]	= handle_pabt_hyp,
+	[HSR_EC_DABT]		= kvm_handle_guest_abort,
+	[HSR_EC_DABT_HYP]	= handle_dabt_hyp,
+};
+
+/*
+ * A conditional instruction is allowed to trap, even though it
+ * wouldn't be executed.  So let's re-implement the hardware, in
+ * software!
+ */
+static bool kvm_condition_valid(struct kvm_vcpu *vcpu)
+{
+	unsigned long cpsr, cond, insn;
+
+	/*
+	 * Exception Code 0 can only happen if we set HCR.TGE to 1, to
+	 * catch undefined instructions, and then we won't get past
+	 * the arm_exit_handlers test anyway.
+	 */
+	BUG_ON(((vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT) == 0);
+
+	/* Top two bits non-zero?  Unconditional. */
+	if (vcpu->arch.hsr >> 30)
+		return true;
+
+	cpsr = *vcpu_cpsr(vcpu);
+
+	/* Is condition field valid? */
+	if ((vcpu->arch.hsr & HSR_CV) >> HSR_CV_SHIFT)
+		cond = (vcpu->arch.hsr & HSR_COND) >> HSR_COND_SHIFT;
+	else {
+		/* This can happen in Thumb mode: examine IT state. */
+		unsigned long it;
+
+		it = ((cpsr >> 8) & 0xFC) | ((cpsr >> 25) & 0x3);
+
+		/* it == 0 => unconditional. */
+		if (it == 0)
+			return true;
+
+		/* The cond for this insn works out as the top 4 bits. */
+		cond = (it >> 4);
+	}
+
+	/* Shift makes it look like an ARM-mode instruction */
+	insn = cond << 28;
+	return arm_check_condition(insn, cpsr) != ARM_OPCODE_CONDTEST_FAIL;
+}
+
 /*
  * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
  * proper exit to QEMU.
@@ -423,8 +552,46 @@ static void update_vttbr(struct kvm *kvm)
 static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		       int exception_index)
 {
-	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	return 0;
+	unsigned long hsr_ec;
+
+	switch (exception_index) {
+	case ARM_EXCEPTION_IRQ:
+		return 1;
+	case ARM_EXCEPTION_UNDEFINED:
+		kvm_err("Undefined exception in Hyp mode at: %#08x\n",
+			vcpu->arch.hyp_pc);
+		BUG();
+		panic("KVM: Hypervisor undefined exception!\n");
+	case ARM_EXCEPTION_DATA_ABORT:
+	case ARM_EXCEPTION_PREF_ABORT:
+	case ARM_EXCEPTION_HVC:
+		hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;
+
+		if (hsr_ec >= ARRAY_SIZE(arm_exit_handlers)
+		    || !arm_exit_handlers[hsr_ec]) {
+			kvm_err("Unknown exception class: %#08lx, "
+				"hsr: %#08x\n", hsr_ec,
+				(unsigned int)vcpu->arch.hsr);
+			BUG();
+		}
+
+		/*
+		 * See ARM ARM B1.14.1: "Hyp traps on instructions
+		 * that fail their condition code check"
+		 */
+		if (!kvm_condition_valid(vcpu)) {
+			bool is_wide = vcpu->arch.hsr & HSR_IL;
+			kvm_skip_instr(vcpu, is_wide);
+			return 1;
+		}
+
+		return arm_exit_handlers[hsr_ec](vcpu, run);
+	default:
+		kvm_pr_unimpl("Unsupported exception type: %d",
+			      exception_index);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		return 0;
+	}
 }
 
 /**
@@ -485,6 +652,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
+		vcpu->arch.last_pcpu = smp_processor_id();
 		kvm_guest_exit();
 		trace_kvm_exit(*vcpu_pc(vcpu));
 		/*
@@ -798,6 +966,7 @@ int kvm_arch_init(void *opaque)
 	if (err)
 		goto out_err;
 
+	kvm_coproc_table_init();
 	return 0;
 out_err:
 	return err;
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 0c43355..722efe3 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -16,8 +16,368 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+#include <linux/mm.h>
 #include <linux/kvm_host.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <trace/events/kvm.h>
 
+#include "trace.h"
+#include "coproc.h"
+
+
+/******************************************************************************
+ * Co-processor emulation
+ *****************************************************************************/
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * We can get here, if the host has been built without VFPv3 support,
+	 * but the guest attempted a floating point operation.
+	 */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+/* See note at ARM ARM B1.14.4 */
+static bool access_dcsw(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	u32 val;
+	int cpu;
+
+	cpu = get_cpu();
+
+	if (!p->is_write)
+		return read_from_write_only(vcpu, p);
+
+	cpumask_setall(&vcpu->arch.require_dcache_flush);
+	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+
+	/* If we were already preempted, take the long way around */
+	if (cpu != vcpu->arch.last_pcpu) {
+		flush_cache_all();
+		goto done;
+	}
+
+	val = *vcpu_reg(vcpu, p->Rt1);
+
+	switch (p->CRm) {
+	case 6:			/* Upgrade DCISW to DCCISW, as per HCR.SWIO */
+	case 14:		/* DCCISW */
+		asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
+		break;
+
+	case 10:		/* DCCSW */
+		asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
+		break;
+	}
+
+done:
+	put_cpu();
+
+	return true;
+}
+
+/*
+ * We could trap ID_DFR0 and tell the guest we don't support performance
+ * monitoring.  Unfortunately the patch to make the kernel check ID_DFR0 was
+ * NAKed, so it will read the PMCR anyway.
+ *
+ * Therefore we tell the guest we have 0 counters.  Unfortunately, we
+ * must always support PMCCNTR (the cycle counter): we just RAZ/WI for
+ * all PM registers, which doesn't crash the guest kernel at least.
+ */
+static bool pm_fake(struct kvm_vcpu *vcpu,
+		    const struct coproc_params *p,
+		    const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+	else
+		return read_zero(vcpu, p);
+}
+
+#define access_pmcr pm_fake
+#define access_pmcntenset pm_fake
+#define access_pmcntenclr pm_fake
+#define access_pmovsr pm_fake
+#define access_pmselr pm_fake
+#define access_pmceid0 pm_fake
+#define access_pmceid1 pm_fake
+#define access_pmccntr pm_fake
+#define access_pmxevtyper pm_fake
+#define access_pmxevcntr pm_fake
+#define access_pmuserenr pm_fake
+#define access_pmintenset pm_fake
+#define access_pmintenclr pm_fake
+
+/* Architected CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg cp15_regs[] = {
+	/* CSSELR: swapped by interrupt.S. */
+	{ CRn( 0), CRm( 0), Op1( 2), Op2( 0), is32,
+			NULL, reset_unknown, c0_CSSELR },
+
+	/* TTBR0/TTBR1: swapped by interrupt.S. */
+	{ CRm( 2), Op1( 0), is64, NULL, reset_unknown64, c2_TTBR0 },
+	{ CRm( 2), Op1( 1), is64, NULL, reset_unknown64, c2_TTBR1 },
+
+	/* TTBCR: swapped by interrupt.S. */
+	{ CRn( 2), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_val, c2_TTBCR, 0x00000000 },
+
+	/* DACR: swapped by interrupt.S. */
+	{ CRn( 3), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c3_DACR },
+
+	/* DFSR/IFSR/ADFSR/AIFSR: swapped by interrupt.S. */
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_DFSR },
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_IFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_ADFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_AIFSR },
+
+	/* DFAR/IFAR: swapped by interrupt.S. */
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c6_DFAR },
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c6_IFAR },
+	/*
+	 * DC{C,I,CI}SW operations:
+	 */
+	{ CRn( 7), CRm( 6), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(10), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(14), Op1( 0), Op2( 2), is32, access_dcsw},
+	/*
+	 * Dummy performance monitor implementation.
+	 */
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 0), is32, access_pmcr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 1), is32, access_pmcntenset},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 2), is32, access_pmcntenclr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 3), is32, access_pmovsr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 5), is32, access_pmselr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 6), is32, access_pmceid0},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 7), is32, access_pmceid1},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 0), is32, access_pmccntr},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 1), is32, access_pmxevtyper},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 2), is32, access_pmxevcntr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 0), is32, access_pmuserenr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 1), is32, access_pmintenset},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 2), is32, access_pmintenclr},
+
+	/* PRRR/NMRR (aka MAIR0/MAIR1): swapped by interrupt.S. */
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c10_PRRR},
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c10_NMRR},
+
+	/* VBAR: swapped by interrupt.S. */
+	{ CRn(12), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c12_VBAR, 0x00000000 },
+
+	/* CONTEXTIDR/TPIDRURW/TPIDRURO/TPIDRPRW: swapped by interrupt.S. */
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_val, c13_CID, 0x00000000 },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c13_TID_URW },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 3), is32,
+			NULL, reset_unknown, c13_TID_URO },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 4), is32,
+			NULL, reset_unknown, c13_TID_PRIV },
+};
+
+/* Target specific emulation tables */
+static struct kvm_coproc_target_table *target_tables[KVM_ARM_NUM_TARGETS];
+
+void kvm_register_target_coproc_table(struct kvm_coproc_target_table *table)
+{
+	target_tables[table->target] = table;
+}
+
+/* Get specific register table for this target. */
+static const struct coproc_reg *get_target_table(unsigned target, size_t *num)
+{
+	struct kvm_coproc_target_table *table;
+
+	table = target_tables[target];
+	*num = table->num;
+	return table->table;
+}
+
+static const struct coproc_reg *find_reg(const struct coproc_params *params,
+					 const struct coproc_reg table[],
+					 unsigned int num)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++) {
+		const struct coproc_reg *r = &table[i];
+
+		if (params->is_64bit != r->is_64)
+			continue;
+		if (params->CRn != r->CRn)
+			continue;
+		if (params->CRm != r->CRm)
+			continue;
+		if (params->Op1 != r->Op1)
+			continue;
+		if (params->Op2 != r->Op2)
+			continue;
+
+		return r;
+	}
+	return NULL;
+}
+
+static int emulate_cp15(struct kvm_vcpu *vcpu,
+			const struct coproc_params *params)
+{
+	size_t num;
+	const struct coproc_reg *table, *r;
+
+	trace_kvm_emulate_cp15_imp(params->Op1, params->Rt1, params->CRn,
+				   params->CRm, params->Op2, params->is_write);
+
+	table = get_target_table(vcpu->arch.target, &num);
+
+	/* Search target-specific then generic table. */
+	r = find_reg(params, table, num);
+	if (!r)
+		r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	if (likely(r)) {
+		/* If we don't have an accessor, we should never get here! */
+		BUG_ON(!r->access);
+
+		if (likely(r->access(vcpu, params, r))) {
+			/* Skip instruction, since it was emulated */
+			kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
+			return 1;
+		}
+		/* If access function fails, it should complain. */
+	} else {
+		kvm_err("Unsupported guest CP15 access at: %08x\n",
+			*vcpu_pc(vcpu));
+		print_cp_instr(params);
+	}
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+/**
+ * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = true;
+
+	params.Op1 = (vcpu->arch.hsr >> 16) & 0xf;
+	params.Op2 = 0;
+	params.Rt2 = (vcpu->arch.hsr >> 10) & 0xf;
+	params.CRn = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+static void reset_coproc_regs(struct kvm_vcpu *vcpu,
+			      const struct coproc_reg *table, size_t num)
+{
+	unsigned long i;
+
+	for (i = 0; i < num; i++)
+		if (table[i].reset)
+			table[i].reset(vcpu, &table[i]);
+}
+
+/**
+ * kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = false;
+
+	params.CRn = (vcpu->arch.hsr >> 10) & 0xf;
+	params.Op1 = (vcpu->arch.hsr >> 14) & 0x7;
+	params.Op2 = (vcpu->arch.hsr >> 17) & 0x7;
+	params.Rt2 = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+void kvm_coproc_table_init(void)
+{
+	unsigned int i;
+
+	/* Make sure tables are unique and in order. */
+	for (i = 1; i < ARRAY_SIZE(cp15_regs); i++)
+		BUG_ON(cmp_reg(&cp15_regs[i-1], &cp15_regs[i]) >= 0);
+}
+
+/**
+ * kvm_reset_coprocs - sets cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architecturally defined reset values.
+ */
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
 {
+	size_t num;
+	const struct coproc_reg *table;
+
+	/* Catch someone adding a register without putting in reset entry. */
+	memset(vcpu->arch.cp15, 0x42, sizeof(vcpu->arch.cp15));
+
+	/* Generic chip reset first (so target could override). */
+	reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	table = get_target_table(vcpu->arch.target, &num);
+	reset_coproc_regs(vcpu, table, num);
+
+	for (num = 1; num < NR_CP15_REGS; num++)
+		if (vcpu->arch.cp15[num] == 0x42424242)
+			panic("Didn't reset vcpu->arch.cp15[%zi]", num);
 }
diff --git a/arch/arm/kvm/coproc.h b/arch/arm/kvm/coproc.h
new file mode 100644
index 0000000..992adfa
--- /dev/null
+++ b/arch/arm/kvm/coproc.h
@@ -0,0 +1,153 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Authors: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_COPROC_LOCAL_H__
+#define __ARM_KVM_COPROC_LOCAL_H__
+
+struct coproc_params {
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+	unsigned long Rt1;
+	unsigned long Rt2;
+	bool is_64bit;
+	bool is_write;
+};
+
+struct coproc_reg {
+	/* MRC/MCR/MRRC/MCRR instruction which accesses it. */
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+
+	bool is_64;
+
+	/* Trapped access from guest, if non-NULL. */
+	bool (*access)(struct kvm_vcpu *,
+		       const struct coproc_params *,
+		       const struct coproc_reg *);
+
+	/* Initialization for vcpu. */
+	void (*reset)(struct kvm_vcpu *, const struct coproc_reg *);
+
+	/* Index into vcpu->arch.cp15[], or 0 if we don't need to save it. */
+	unsigned long reg;
+
+	/* Value (usually reset value) */
+	u64 val;
+};
+
+static inline void print_cp_instr(const struct coproc_params *p)
+{
+	/* Look, we even formatted it for you to paste into the table! */
+	if (p->is_64bit) {
+		kvm_pr_unimpl(" { CRm(%2lu), Op1(%2lu), is64, func_%s },\n",
+			      p->CRm, p->Op1, p->is_write ? "write" : "read");
+	} else {
+		kvm_pr_unimpl(" { CRn(%2lu), CRm(%2lu), Op1(%2lu), Op2(%2lu), is32,"
+			      " func_%s },\n",
+			      p->CRn, p->CRm, p->Op1, p->Op2,
+			      p->is_write ? "write" : "read");
+	}
+}
+
+static inline bool ignore_write(struct kvm_vcpu *vcpu,
+				const struct coproc_params *p)
+{
+	return true;
+}
+
+static inline bool read_zero(struct kvm_vcpu *vcpu,
+			     const struct coproc_params *p)
+{
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+static inline bool write_to_read_only(struct kvm_vcpu *vcpu,
+				      const struct coproc_params *params)
+{
+	kvm_debug("CP15 write to read-only register at: %08x\n",
+		  *vcpu_pc(vcpu));
+	print_cp_instr(params);
+	return false;
+}
+
+static inline bool read_from_write_only(struct kvm_vcpu *vcpu,
+					const struct coproc_params *params)
+{
+	kvm_debug("CP15 read from write-only register at: %08x\n",
+		  *vcpu_pc(vcpu));
+	print_cp_instr(params);
+	return false;
+}
+
+/* Reset functions */
+static inline void reset_unknown(struct kvm_vcpu *vcpu,
+				 const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+}
+
+static inline void reset_val(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = r->val;
+}
+
+static inline void reset_unknown64(struct kvm_vcpu *vcpu,
+				   const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg + 1 >= ARRAY_SIZE(vcpu->arch.cp15));
+
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+	vcpu->arch.cp15[r->reg+1] = 0xd0c0ffee;
+}
+
+static inline int cmp_reg(const struct coproc_reg *i1,
+			  const struct coproc_reg *i2)
+{
+	BUG_ON(i1 == i2);
+	if (!i1)
+		return 1;
+	else if (!i2)
+		return -1;
+	if (i1->CRn != i2->CRn)
+		return i1->CRn - i2->CRn;
+	if (i1->CRm != i2->CRm)
+		return i1->CRm - i2->CRm;
+	if (i1->Op1 != i2->Op1)
+		return i1->Op1 - i2->Op1;
+	return i1->Op2 - i2->Op2;
+}
+
+
+#define CRn(_x)		.CRn = _x
+#define CRm(_x) 	.CRm = _x
+#define Op1(_x) 	.Op1 = _x
+#define Op2(_x) 	.Op2 = _x
+#define is64		.is_64 = true
+#define is32		.is_64 = false
+
+#endif /* __ARM_KVM_COPROC_LOCAL_H__ */
diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
new file mode 100644
index 0000000..685063a
--- /dev/null
+++ b/arch/arm/kvm/coproc_a15.c
@@ -0,0 +1,162 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Authors: Rusty Russell <rusty@rustcorp.au>
+ *          Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/kvm_host.h>
+#include <asm/cputype.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <linux/init.h>
+
+static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	/*
+	 * Compute guest MPIDR:
+	 * (Even if we present only one VCPU to the guest on an SMP
+	 * host we don't set the U bit in the MPIDR, or vice versa, as
+	 * revealing the underlying hardware properties is likely to
+	 * be the best choice).
+	 */
+	vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_LEVEL_MASK)
+		| (vcpu->vcpu_id & MPIDR_LEVEL_MASK);
+}
+
+#include "coproc.h"
+
+/* A15 TRM 4.3.28: RO WI */
+static bool access_actlr(struct kvm_vcpu *vcpu,
+			 const struct coproc_params *p,
+			 const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c1_ACTLR];
+	return true;
+}
+
+/* A15 TRM 4.3.60: R/O. */
+static bool access_cbar(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p);
+	return read_zero(vcpu, p);
+}
+
+/* A15 TRM 4.3.48: R/O WI. */
+static bool access_l2ctlr(struct kvm_vcpu *vcpu,
+			  const struct coproc_params *p,
+			  const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c9_L2CTLR];
+	return true;
+}
+
+static void reset_l2ctlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 l2ctlr, ncores;
+
+	asm volatile("mrc p15, 1, %0, c9, c0, 2\n" : "=r" (l2ctlr));
+	l2ctlr &= ~(3 << 24);
+	ncores = atomic_read(&vcpu->kvm->online_vcpus) - 1;
+	l2ctlr |= (ncores & 3) << 24;
+
+	vcpu->arch.cp15[c9_L2CTLR] = l2ctlr;
+}
+
+static void reset_actlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 actlr;
+
+	/* ACTLR contains SMP bit: make sure you create all cpus first! */
+	asm volatile("mrc p15, 0, %0, c1, c0, 1\n" : "=r" (actlr));
+	/* Make the SMP bit consistent with the guest configuration */
+	if (atomic_read(&vcpu->kvm->online_vcpus) > 1)
+		actlr |= 1U << 6;
+	else
+		actlr &= ~(1U << 6);
+
+	vcpu->arch.cp15[c1_ACTLR] = actlr;
+}
+
+/* A15 TRM 4.3.49: R/O WI (even if NSACR.NS_L2ERR, a write of 1 is ignored). */
+static bool access_l2ectlr(struct kvm_vcpu *vcpu,
+			   const struct coproc_params *p,
+			   const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+/*
+ * A15-specific CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg a15_regs[] = {
+	/* MPIDR: we use VMPIDR for guest access. */
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 5), is32,
+			NULL, reset_mpidr, c0_MPIDR },
+
+	/* SCTLR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c1_SCTLR, 0x00C50078 },
+	/* ACTLR: trapped by HCR.TAC bit. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 1), is32,
+			access_actlr, reset_actlr, c1_ACTLR },
+	/* CPACR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_val, c1_CPACR, 0x00000000 },
+
+	/*
+	 * L2CTLR access (guest wants to know #CPUs).
+	 */
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 2), is32,
+			access_l2ctlr, reset_l2ctlr, c9_L2CTLR },
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 3), is32, access_l2ectlr},
+
+	/* The Configuration Base Address Register. */
+	{ CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
+};
+
+static struct kvm_coproc_target_table a15_target_table = {
+	.target = KVM_ARM_TARGET_CORTEX_A15,
+	.table = a15_regs,
+	.num = ARRAY_SIZE(a15_regs),
+};
+
+static int __init coproc_a15_init(void)
+{
+	unsigned int i;
+
+	for (i = 1; i < ARRAY_SIZE(a15_regs); i++)
+		BUG_ON(cmp_reg(&a15_regs[i-1],
+			       &a15_regs[i]) >= 0);
+
+	kvm_register_target_coproc_table(&a15_target_table);
+	return 0;
+}
+late_initcall(coproc_a15_init);
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 3eadc25..d61450a 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -16,7 +16,13 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
+#include <linux/mm.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
 
 #define VCPU_NR_MODES		6
 #define VCPU_REG_OFFSET_USR	0
@@ -153,3 +159,215 @@ u32 *vcpu_spsr(struct kvm_vcpu *vcpu)
 		BUG();
 	}
 }
+
+/**
+ * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * @vcpu:	the vcpu pointer
+ * @run:	the kvm_run structure pointer
+ *
+ * Blocks the VCPU on its wait queue (via kvm_vcpu_block), which halts
+ * execution of world-switches and schedules other host processes until
+ * there is an incoming IRQ or FIQ to the VM.
+ */
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	trace_kvm_wfi(*vcpu_pc(vcpu));
+	kvm_vcpu_block(vcpu);
+	return 1;
+}
+
+/**
+ * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
+ * @vcpu:	The VCPU pointer
+ *
+ * When exceptions occur while instructions are executed in Thumb IF-THEN
+ * blocks, the ITSTATE field of the CPSR is not advanced (updated), so we have
+ * to do this little bit of work manually. The fields map like this:
+ *
+ * IT[7:0] -> CPSR[26:25],CPSR[15:10]
+ */
+static void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
+{
+	unsigned long itbits, cond;
+	unsigned long cpsr = *vcpu_cpsr(vcpu);
+	bool is_arm = !(cpsr & PSR_T_BIT);
+
+	BUG_ON(is_arm && (cpsr & PSR_IT_MASK));
+
+	if (!(cpsr & PSR_IT_MASK))
+		return;
+
+	cond = (cpsr & 0xe000) >> 13;
+	itbits = (cpsr & 0x1c00) >> (10 - 2);
+	itbits |= (cpsr & (0x3 << 25)) >> 25;
+
+	/* Perform ITAdvance (see page A-52 in ARM DDI 0406C) */
+	if ((itbits & 0x7) == 0)
+		itbits = cond = 0;
+	else
+		itbits = (itbits << 1) & 0x1f;
+
+	cpsr &= ~PSR_IT_MASK;
+	cpsr |= cond << 13;
+	cpsr |= (itbits & 0x1c) << (10 - 2);
+	cpsr |= (itbits & 0x3) << 25;
+	*vcpu_cpsr(vcpu) = cpsr;
+}
+
+/**
+ * kvm_skip_instr - skip a trapped instruction and proceed to the next
+ * @vcpu: The vcpu pointer
+ */
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
+{
+	bool is_thumb;
+
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	if (is_thumb && !is_wide_instr)
+		*vcpu_pc(vcpu) += 2;
+	else
+		*vcpu_pc(vcpu) += 4;
+	kvm_adjust_itstate(vcpu);
+}
+
+
+/******************************************************************************
+ * Inject exceptions into the guest
+ */
+
+static u32 exc_vector_base(struct kvm_vcpu *vcpu)
+{
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	u32 vbar = vcpu->arch.cp15[c12_VBAR];
+
+	if (sctlr & SCTLR_V)
+		return 0xffff0000;
+	else /* always have security exceptions */
+		return vbar;
+}
+
+/**
+ * kvm_inject_undefined - inject an undefined exception into the guest
+ * @vcpu: The VCPU to receive the undefined exception
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ *
+ * Modelled after TakeUndefInstrException() pseudocode.
+ */
+void kvm_inject_undefined(struct kvm_vcpu *vcpu)
+{
+	u32 new_lr_value;
+	u32 new_spsr_value;
+	u32 cpsr = *vcpu_cpsr(vcpu);
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	bool is_thumb = (cpsr & PSR_T_BIT);
+	u32 vect_offset = 4;
+	u32 return_offset = (is_thumb) ? 2 : 4;
+
+	new_spsr_value = cpsr;
+	new_lr_value = *vcpu_pc(vcpu) - return_offset;
+
+	*vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | UND_MODE;
+	*vcpu_cpsr(vcpu) |= PSR_I_BIT;
+	*vcpu_cpsr(vcpu) &= ~(PSR_IT_MASK | PSR_J_BIT | PSR_E_BIT | PSR_T_BIT);
+
+	if (sctlr & SCTLR_TE)
+		*vcpu_cpsr(vcpu) |= PSR_T_BIT;
+	if (sctlr & SCTLR_EE)
+		*vcpu_cpsr(vcpu) |= PSR_E_BIT;
+
+	/* Note: These now point to UND banked copies */
+	*vcpu_spsr(vcpu) = cpsr;
+	*vcpu_reg(vcpu, 14) = new_lr_value;
+
+	/* Branch to exception vector */
+	*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
+}
+
+/*
+ * Modelled after TakeDataAbortException() and TakePrefetchAbortException()
+ * pseudocode.
+ */
+static void inject_abt(struct kvm_vcpu *vcpu, bool is_pabt, unsigned long addr)
+{
+	u32 new_lr_value;
+	u32 new_spsr_value;
+	u32 cpsr = *vcpu_cpsr(vcpu);
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	bool is_thumb = (cpsr & PSR_T_BIT);
+	u32 vect_offset;
+	u32 return_offset = (is_thumb) ? 4 : 0;
+	bool is_lpae;
+
+	new_spsr_value = cpsr;
+	new_lr_value = *vcpu_pc(vcpu) + return_offset;
+
+	*vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | ABT_MODE;
+	*vcpu_cpsr(vcpu) |= PSR_I_BIT | PSR_A_BIT;
+	*vcpu_cpsr(vcpu) &= ~(PSR_IT_MASK | PSR_J_BIT | PSR_E_BIT | PSR_T_BIT);
+
+	if (sctlr & SCTLR_TE)
+		*vcpu_cpsr(vcpu) |= PSR_T_BIT;
+	if (sctlr & SCTLR_EE)
+		*vcpu_cpsr(vcpu) |= PSR_E_BIT;
+
+	/* Note: These now point to ABT banked copies */
+	*vcpu_spsr(vcpu) = cpsr;
+	*vcpu_reg(vcpu, 14) = new_lr_value;
+
+	if (is_pabt)
+		vect_offset = 12;
+	else
+		vect_offset = 16;
+
+	/* Branch to exception vector */
+	*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
+
+	if (is_pabt) {
+		/* Set IFAR and IFSR */
+		vcpu->arch.cp15[c6_IFAR] = addr;
+		is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31);
+		/* Always give debug fault for now - should give guest a clue */
+		if (is_lpae)
+			vcpu->arch.cp15[c5_IFSR] = 1 << 9 | 0x22;
+		else
+			vcpu->arch.cp15[c5_IFSR] = 2;
+	} else { /* !iabt */
+		/* Set DFAR and DFSR */
+		vcpu->arch.cp15[c6_DFAR] = addr;
+		is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31);
+		/* Always give debug fault for now - should give guest a clue */
+		if (is_lpae)
+			vcpu->arch.cp15[c5_DFSR] = 1 << 9 | 0x22;
+		else
+			vcpu->arch.cp15[c5_DFSR] = 2;
+	}
+
+}
+
+/**
+ * kvm_inject_dabt - inject a data abort into the guest
+ * @vcpu: The VCPU to receive the data abort
+ * @addr: The address to report in the DFAR
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
+{
+	inject_abt(vcpu, false, addr);
+}
+
+/**
+ * kvm_inject_pabt - inject a prefetch abort into the guest
+ * @vcpu: The VCPU to receive the prefetch abort
+ * @addr: The address to report in the IFAR
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
+{
+	inject_abt(vcpu, true, addr);
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 105d1f7..c86a513 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -64,6 +64,51 @@ TRACE_EVENT(kvm_irq_line,
 		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
 );
 
+/* Architecturally implementation defined CP15 register access */
+TRACE_EVENT(kvm_emulate_cp15_imp,
+	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
+		 unsigned long CRm, unsigned long Op2, bool is_write),
+	TP_ARGS(Op1, Rt1, CRn, CRm, Op2, is_write),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	Op1		)
+		__field(	unsigned int,	Rt1		)
+		__field(	unsigned int,	CRn		)
+		__field(	unsigned int,	CRm		)
+		__field(	unsigned int,	Op2		)
+		__field(	bool,		is_write	)
+	),
+
+	TP_fast_assign(
+		__entry->is_write		= is_write;
+		__entry->Op1			= Op1;
+		__entry->Rt1			= Rt1;
+		__entry->CRn			= CRn;
+		__entry->CRm			= CRm;
+		__entry->Op2			= Op2;
+	),
+
+	TP_printk("Implementation defined CP15: %s\tp15, %u, r%u, c%u, c%u, %u",
+			(__entry->is_write) ? "mcr" : "mrc",
+			__entry->Op1, __entry->Rt1, __entry->CRn,
+			__entry->CRm, __entry->Op2)
+);
+
+TRACE_EVENT(kvm_wfi,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("guest executed wfi at: 0x%08lx", __entry->vcpu_pc)
+);
+
 TRACE_EVENT(kvm_unmap_hva,
 	TP_PROTO(unsigned long hva),
 	TP_ARGS(hva),



* [PATCH v5 08/14] KVM: ARM: Emulation framework and CP15 emulation
@ 2013-01-08 18:39   ` Christoffer Dall
  0 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: linux-arm-kernel

Adds an important new function to the main KVM/ARM code, handle_exit(),
which is called from kvm_arch_vcpu_ioctl_run() on every return from
guest execution. This function examines the Hyp Syndrome Register (HSR),
which contains the information telling KVM what caused the exit from the
guest.
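
As background for reviewers, the dispatch boils down to reading the
exception class (EC) field out of the HSR and indexing a table of
handlers. The standalone sketch below is illustrative only and not part
of the patch; the EC values and handler names are simplified stand-ins
for the real ones:

/*
 * Illustrative sketch: decode the exception class from a (fake) HSR value
 * and dispatch through a handler table.  The EC field sits in HSR[31:26];
 * the handlers here are placeholders, not the functions added by the patch.
 */
#include <stdio.h>

#define HSR_EC_SHIFT	26
#define HSR_EC_MASK	(0x3fU << HSR_EC_SHIFT)

typedef int (*exit_handler_fn)(unsigned int hsr);

static int example_handle_wfi(unsigned int hsr)
{
	(void)hsr;
	return printf("WFI/WFE trap\n") > 0;
}

static int example_handle_cp15(unsigned int hsr)
{
	(void)hsr;
	return printf("CP15 MCR/MRC trap\n") > 0;
}

int main(void)
{
	static const exit_handler_fn handlers[0x40] = {
		[0x01] = example_handle_wfi,	/* WFI/WFE */
		[0x03] = example_handle_cp15,	/* 32-bit CP15 access */
	};
	unsigned int hsr = 0x03U << HSR_EC_SHIFT;	/* pretend-trapped MCR */
	unsigned int ec = (hsr & HSR_EC_MASK) >> HSR_EC_SHIFT;

	if (handlers[ec])
		return handlers[ec](hsr) ? 0 : 1;
	return 1;
}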

One class of exit reasons is trapped CP15 accesses, which are not
allowed directly from the guest; this commit handles such exits by
emulating the intended operation in software and then skipping the
trapped guest instruction.
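
To make the trap decoding concrete, the standalone sketch below
(illustrative only, not part of the patch) mirrors the bit positions
that kvm_handle_cp15_32() uses to turn the HSR into coprocessor access
parameters; the sample HSR value is made up:

#include <stdbool.h>
#include <stdio.h>

struct cp15_access {
	unsigned int CRn, CRm, Op1, Op2, Rt;
	bool is_write;
};

/* Decode a trapped 32-bit MCR/MRC access, same bit layout as the patch. */
static struct cp15_access decode_cp15_32(unsigned int hsr)
{
	struct cp15_access a = {
		.CRm      = (hsr >> 1)  & 0xf,
		.Rt       = (hsr >> 5)  & 0xf,
		.CRn      = (hsr >> 10) & 0xf,
		.Op1      = (hsr >> 14) & 0x7,
		.Op2      = (hsr >> 17) & 0x7,
		.is_write = (hsr & 1) == 0,	/* bit 0 clear means MCR (write) */
	};
	return a;
}

int main(void)
{
	struct cp15_access a = decode_cp15_32(0x035c);	/* arbitrary example value */

	printf("%s p15, %u, r%u, c%u, c%u, %u\n",
	       a.is_write ? "mcr" : "mrc",
	       a.Op1, a.Rt, a.CRn, a.CRm, a.Op2);
	return 0;
}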

Minor notes about the coproc register reset:
1) We reserve a value of 0 as an invalid cp15 offset, to catch bugs in our
   table, at the cost of 4 bytes per vcpu (see the reset sketch after these
   notes).

2) Added comments to the table indicating how we handle each register, for
   ease of understanding.
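
As referenced in note 1, the reset sanity check poisons the register
file with a sentinel, runs the reset hooks, and then verifies that every
slot was actually written. The standalone sketch below shows the pattern
only; the array size, sentinel handling and helper are made-up stand-ins,
not the patch code:

#include <stdio.h>
#include <string.h>

#define NR_REGS 16

static unsigned int regs[NR_REGS];

/* Stand-in for the per-register reset hooks; here it just zeroes everything. */
static void reset_all(void)
{
	int i;

	for (i = 0; i < NR_REGS; i++)
		regs[i] = 0;
}

int main(void)
{
	int i;

	memset(regs, 0x42, sizeof(regs));	/* poison every slot */
	reset_all();				/* run the reset hooks */

	for (i = 1; i < NR_REGS; i++)		/* offset 0 is reserved as invalid */
		if (regs[i] == 0x42424242)
			fprintf(stderr, "register %d was never reset\n", i);
	return 0;
}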

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    9 +
 arch/arm/include/asm/kvm_coproc.h  |   14 +
 arch/arm/include/asm/kvm_emulate.h |    6 +
 arch/arm/include/asm/kvm_host.h    |    4 
 arch/arm/kvm/Makefile              |    2 
 arch/arm/kvm/arm.c                 |  175 +++++++++++++++++-
 arch/arm/kvm/coproc.c              |  360 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/coproc.h              |  153 +++++++++++++++
 arch/arm/kvm/coproc_a15.c          |  162 ++++++++++++++++
 arch/arm/kvm/emulate.c             |  218 ++++++++++++++++++++++
 arch/arm/kvm/trace.h               |   45 +++++
 11 files changed, 1144 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/kvm/coproc.h
 create mode 100644 arch/arm/kvm/coproc_a15.c

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index a3262a2..3ff6f22 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -70,6 +70,11 @@
 			HCR_SWIO | HCR_TIDCP)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
+/* System Control Register (SCTLR) bits */
+#define SCTLR_TE	(1 << 30)
+#define SCTLR_EE	(1 << 25)
+#define SCTLR_V		(1 << 13)
+
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE	(1 << 30)
 #define HSCTLR_EE	(1 << 25)
@@ -171,6 +176,10 @@
 #define HSR_FSC		(0x3f)
 #define HSR_FSC_TYPE	(0x3c)
 #define HSR_WNR		(1 << 6)
+#define HSR_CV_SHIFT	(24)
+#define HSR_CV		(1U << HSR_CV_SHIFT)
+#define HSR_COND_SHIFT	(20)
+#define HSR_COND	(0xfU << HSR_COND_SHIFT)
 
 #define FSC_FAULT	(0x04)
 #define FSC_PERM	(0x0c)
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index b6d023d..bd1ace0 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -21,4 +21,18 @@
 
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
+struct kvm_coproc_target_table {
+	unsigned target;
+	const struct coproc_reg *table;
+	size_t num;
+};
+void kvm_register_target_coproc_table(struct kvm_coproc_target_table *table);
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_coproc_table_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 17dad67..01a755b 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -25,6 +25,12 @@
 u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
 
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
+void kvm_inject_undefined(struct kvm_vcpu *vcpu);
+void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+
 static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
 {
 	return (u32 *)&vcpu->arch.regs.usr_regs.ARM_pc;
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ddb09da..a56a319 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -94,6 +94,10 @@ struct kvm_vcpu_arch {
 	 * Anything that is not used directly from assembly code goes
 	 * here.
 	 */
+	/* dcache set/way operation pending */
+	int last_pcpu;
+	cpumask_t require_dcache_flush;
+
 	/* Interrupt related fields */
 	u32 irq_lines;		/* IRQ and FIQ levels */
 
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index dfc293f..88edce6 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -18,4 +18,4 @@ kvm-arm-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o
+obj-y += coproc.o coproc_a15.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c94d278..0b4ffcf 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -36,11 +36,14 @@
 #include <asm/mman.h>
 #include <asm/cputype.h>
 #include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
 #include <asm/virt.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/opcodes.h>
 
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
@@ -294,6 +297,17 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
 	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
+
+	/*
+	 * Check whether this vcpu requires the cache to be flushed on
+	 * this physical CPU. This is a consequence of doing dcache
+	 * operations by set/way on this vcpu. We do it here to be in
+	 * a non-preemptible section.
+	 */
+	if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
+		cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
+	}
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -319,9 +333,16 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+/**
+ * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled
+ * @v:		The VCPU pointer
+ *
+ * If the guest CPU is not waiting for interrupts or an interrupt line is
+ * asserted, the CPU is by definition runnable.
+ */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return 0;
+	return !!v->arch.irq_lines;
 }
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
@@ -416,6 +437,114 @@ static void update_vttbr(struct kvm *kvm)
 	spin_unlock(&kvm_vmid_lock);
 }
 
+static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* SVC called from Hyp mode should never get here */
+	kvm_debug("SVC called from Hyp mode shouldn't go here\n");
+	BUG();
+	return -EINVAL; /* Squash warning */
+}
+
+static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * Guest called HVC instruction:
+	 * Let it know we don't want that by injecting an undefined exception.
+	 */
+	kvm_debug("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
+		  *vcpu_pc(vcpu));
+	kvm_debug("         HSR: %8x", vcpu->arch.hsr);
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* We don't support SMC; don't do that. */
+	kvm_debug("smc: at %08x", *vcpu_pc(vcpu));
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int handle_pabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* The hypervisor should never cause aborts */
+	kvm_err("Prefetch Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hxfar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/* This is either an error in the world-switch code or an external abort */
+	kvm_err("Data Abort taken from Hyp mode at %#08x (HSR: %#08x)\n",
+		vcpu->arch.hxfar, vcpu->arch.hsr);
+	return -EFAULT;
+}
+
+typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);
+static exit_handle_fn arm_exit_handlers[] = {
+	[HSR_EC_WFI]		= kvm_handle_wfi,
+	[HSR_EC_CP15_32]	= kvm_handle_cp15_32,
+	[HSR_EC_CP15_64]	= kvm_handle_cp15_64,
+	[HSR_EC_CP14_MR]	= kvm_handle_cp14_access,
+	[HSR_EC_CP14_LS]	= kvm_handle_cp14_load_store,
+	[HSR_EC_CP14_64]	= kvm_handle_cp14_access,
+	[HSR_EC_CP_0_13]	= kvm_handle_cp_0_13_access,
+	[HSR_EC_CP10_ID]	= kvm_handle_cp10_id,
+	[HSR_EC_SVC_HYP]	= handle_svc_hyp,
+	[HSR_EC_HVC]		= handle_hvc,
+	[HSR_EC_SMC]		= handle_smc,
+	[HSR_EC_IABT]		= kvm_handle_guest_abort,
+	[HSR_EC_IABT_HYP]	= handle_pabt_hyp,
+	[HSR_EC_DABT]		= kvm_handle_guest_abort,
+	[HSR_EC_DABT_HYP]	= handle_dabt_hyp,
+};
+
+/*
+ * A conditional instruction is allowed to trap, even though it
+ * wouldn't be executed.  So let's re-implement the hardware, in
+ * software!
+ */
+static bool kvm_condition_valid(struct kvm_vcpu *vcpu)
+{
+	unsigned long cpsr, cond, insn;
+
+	/*
+	 * Exception Code 0 can only happen if we set HCR.TGE to 1, to
+	 * catch undefined instructions, and then we won't get past
+	 * the arm_exit_handlers test anyway.
+	 */
+	BUG_ON(((vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT) == 0);
+
+	/* Top two bits non-zero?  Unconditional. */
+	if (vcpu->arch.hsr >> 30)
+		return true;
+
+	cpsr = *vcpu_cpsr(vcpu);
+
+	/* Is condition field valid? */
+	if ((vcpu->arch.hsr & HSR_CV) >> HSR_CV_SHIFT)
+		cond = (vcpu->arch.hsr & HSR_COND) >> HSR_COND_SHIFT;
+	else {
+		/* This can happen in Thumb mode: examine IT state. */
+		unsigned long it;
+
+		it = ((cpsr >> 8) & 0xFC) | ((cpsr >> 25) & 0x3);
+
+		/* it == 0 => unconditional. */
+		if (it == 0)
+			return true;
+
+		/* The cond for this insn works out as the top 4 bits. */
+		cond = (it >> 4);
+	}
+
+	/* Shift makes it look like an ARM-mode instruction */
+	insn = cond << 28;
+	return arm_check_condition(insn, cpsr) != ARM_OPCODE_CONDTEST_FAIL;
+}
+
 /*
  * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
  * proper exit to QEMU.
@@ -423,8 +552,46 @@ static void update_vttbr(struct kvm *kvm)
 static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		       int exception_index)
 {
-	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	return 0;
+	unsigned long hsr_ec;
+
+	switch (exception_index) {
+	case ARM_EXCEPTION_IRQ:
+		return 1;
+	case ARM_EXCEPTION_UNDEFINED:
+		kvm_err("Undefined exception in Hyp mode at: %#08x\n",
+			vcpu->arch.hyp_pc);
+		BUG();
+		panic("KVM: Hypervisor undefined exception!\n");
+	case ARM_EXCEPTION_DATA_ABORT:
+	case ARM_EXCEPTION_PREF_ABORT:
+	case ARM_EXCEPTION_HVC:
+		hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;
+
+		if (hsr_ec >= ARRAY_SIZE(arm_exit_handlers)
+		    || !arm_exit_handlers[hsr_ec]) {
+			kvm_err("Unknown exception class: %#08lx, "
+				"hsr: %#08x\n", hsr_ec,
+				(unsigned int)vcpu->arch.hsr);
+			BUG();
+		}
+
+		/*
+		 * See ARM ARM B1.14.1: "Hyp traps on instructions
+		 * that fail their condition code check"
+		 */
+		if (!kvm_condition_valid(vcpu)) {
+			bool is_wide = vcpu->arch.hsr & HSR_IL;
+			kvm_skip_instr(vcpu, is_wide);
+			return 1;
+		}
+
+		return arm_exit_handlers[hsr_ec](vcpu, run);
+	default:
+		kvm_pr_unimpl("Unsupported exception type: %d",
+			      exception_index);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		return 0;
+	}
 }
 
 /**
@@ -485,6 +652,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
+		vcpu->arch.last_pcpu = smp_processor_id();
 		kvm_guest_exit();
 		trace_kvm_exit(*vcpu_pc(vcpu));
 		/*
@@ -798,6 +966,7 @@ int kvm_arch_init(void *opaque)
 	if (err)
 		goto out_err;
 
+	kvm_coproc_table_init();
 	return 0;
 out_err:
 	return err;
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 0c43355..722efe3 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -16,8 +16,368 @@
  * along with this program; if not, write to the Free Software
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+#include <linux/mm.h>
 #include <linux/kvm_host.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <trace/events/kvm.h>
 
+#include "trace.h"
+#include "coproc.h"
+
+
+/******************************************************************************
+ * Co-processor emulation
+ *****************************************************************************/
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	/*
+	 * We can get here, if the host has been built without VFPv3 support,
+	 * but the guest attempted a floating point operation.
+	 */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+/* See note at ARM ARM B1.14.4 */
+static bool access_dcsw(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	u32 val;
+	int cpu;
+
+	cpu = get_cpu();
+
+	if (!p->is_write)
+		return read_from_write_only(vcpu, p);
+
+	cpumask_setall(&vcpu->arch.require_dcache_flush);
+	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+
+	/* If we were already preempted, take the long way around */
+	if (cpu != vcpu->arch.last_pcpu) {
+		flush_cache_all();
+		goto done;
+	}
+
+	val = *vcpu_reg(vcpu, p->Rt1);
+
+	switch (p->CRm) {
+	case 6:			/* Upgrade DCISW to DCCISW, as per HCR.SWIO */
+	case 14:		/* DCCISW */
+		asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
+		break;
+
+	case 10:		/* DCCSW */
+		asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
+		break;
+	}
+
+done:
+	put_cpu();
+
+	return true;
+}
+
+/*
+ * We could trap ID_DFR0 and tell the guest we don't support performance
+ * monitoring.  Unfortunately the patch to make the kernel check ID_DFR0 was
+ * NAKed, so it will read the PMCR anyway.
+ *
+ * Therefore we tell the guest we have 0 counters.  Unfortunately, we
+ * must always support PMCCNTR (the cycle counter): we just RAZ/WI for
+ * all PM registers, which doesn't crash the guest kernel at least.
+ */
+static bool pm_fake(struct kvm_vcpu *vcpu,
+		    const struct coproc_params *p,
+		    const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+	else
+		return read_zero(vcpu, p);
+}
+
+#define access_pmcr pm_fake
+#define access_pmcntenset pm_fake
+#define access_pmcntenclr pm_fake
+#define access_pmovsr pm_fake
+#define access_pmselr pm_fake
+#define access_pmceid0 pm_fake
+#define access_pmceid1 pm_fake
+#define access_pmccntr pm_fake
+#define access_pmxevtyper pm_fake
+#define access_pmxevcntr pm_fake
+#define access_pmuserenr pm_fake
+#define access_pmintenset pm_fake
+#define access_pmintenclr pm_fake
+
+/* Architected CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg cp15_regs[] = {
+	/* CSSELR: swapped by interrupt.S. */
+	{ CRn( 0), CRm( 0), Op1( 2), Op2( 0), is32,
+			NULL, reset_unknown, c0_CSSELR },
+
+	/* TTBR0/TTBR1: swapped by interrupt.S. */
+	{ CRm( 2), Op1( 0), is64, NULL, reset_unknown64, c2_TTBR0 },
+	{ CRm( 2), Op1( 1), is64, NULL, reset_unknown64, c2_TTBR1 },
+
+	/* TTBCR: swapped by interrupt.S. */
+	{ CRn( 2), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_val, c2_TTBCR, 0x00000000 },
+
+	/* DACR: swapped by interrupt.S. */
+	{ CRn( 3), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c3_DACR },
+
+	/* DFSR/IFSR/ADFSR/AIFSR: swapped by interrupt.S. */
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_DFSR },
+	{ CRn( 5), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_IFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c5_ADFSR },
+	{ CRn( 5), CRm( 1), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c5_AIFSR },
+
+	/* DFAR/IFAR: swapped by interrupt.S. */
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c6_DFAR },
+	{ CRn( 6), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c6_IFAR },
+	/*
+	 * DC{C,I,CI}SW operations:
+	 */
+	{ CRn( 7), CRm( 6), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(10), Op1( 0), Op2( 2), is32, access_dcsw},
+	{ CRn( 7), CRm(14), Op1( 0), Op2( 2), is32, access_dcsw},
+	/*
+	 * Dummy performance monitor implementation.
+	 */
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 0), is32, access_pmcr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 1), is32, access_pmcntenset},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 2), is32, access_pmcntenclr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 3), is32, access_pmovsr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 5), is32, access_pmselr},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 6), is32, access_pmceid0},
+	{ CRn( 9), CRm(12), Op1( 0), Op2( 7), is32, access_pmceid1},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 0), is32, access_pmccntr},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 1), is32, access_pmxevtyper},
+	{ CRn( 9), CRm(13), Op1( 0), Op2( 2), is32, access_pmxevcntr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 0), is32, access_pmuserenr},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 1), is32, access_pmintenset},
+	{ CRn( 9), CRm(14), Op1( 0), Op2( 2), is32, access_pmintenclr},
+
+	/* PRRR/NMRR (aka MAIR0/MAIR1): swapped by interrupt.S. */
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 0), is32,
+			NULL, reset_unknown, c10_PRRR},
+	{ CRn(10), CRm( 2), Op1( 0), Op2( 1), is32,
+			NULL, reset_unknown, c10_NMRR},
+
+	/* VBAR: swapped by interrupt.S. */
+	{ CRn(12), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c12_VBAR, 0x00000000 },
+
+	/* CONTEXTIDR/TPIDRURW/TPIDRURO/TPIDRPRW: swapped by interrupt.S. */
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 1), is32,
+			NULL, reset_val, c13_CID, 0x00000000 },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_unknown, c13_TID_URW },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 3), is32,
+			NULL, reset_unknown, c13_TID_URO },
+	{ CRn(13), CRm( 0), Op1( 0), Op2( 4), is32,
+			NULL, reset_unknown, c13_TID_PRIV },
+};
+
+/* Target specific emulation tables */
+static struct kvm_coproc_target_table *target_tables[KVM_ARM_NUM_TARGETS];
+
+void kvm_register_target_coproc_table(struct kvm_coproc_target_table *table)
+{
+	target_tables[table->target] = table;
+}
+
+/* Get specific register table for this target. */
+static const struct coproc_reg *get_target_table(unsigned target, size_t *num)
+{
+	struct kvm_coproc_target_table *table;
+
+	table = target_tables[target];
+	*num = table->num;
+	return table->table;
+}
+
+static const struct coproc_reg *find_reg(const struct coproc_params *params,
+					 const struct coproc_reg table[],
+					 unsigned int num)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++) {
+		const struct coproc_reg *r = &table[i];
+
+		if (params->is_64bit != r->is_64)
+			continue;
+		if (params->CRn != r->CRn)
+			continue;
+		if (params->CRm != r->CRm)
+			continue;
+		if (params->Op1 != r->Op1)
+			continue;
+		if (params->Op2 != r->Op2)
+			continue;
+
+		return r;
+	}
+	return NULL;
+}
+
+static int emulate_cp15(struct kvm_vcpu *vcpu,
+			const struct coproc_params *params)
+{
+	size_t num;
+	const struct coproc_reg *table, *r;
+
+	trace_kvm_emulate_cp15_imp(params->Op1, params->Rt1, params->CRn,
+				   params->CRm, params->Op2, params->is_write);
+
+	table = get_target_table(vcpu->arch.target, &num);
+
+	/* Search target-specific then generic table. */
+	r = find_reg(params, table, num);
+	if (!r)
+		r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	if (likely(r)) {
+		/* If we don't have an accessor, we should never get here! */
+		BUG_ON(!r->access);
+
+		if (likely(r->access(vcpu, params, r))) {
+			/* Skip instruction, since it was emulated */
+			kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
+			return 1;
+		}
+		/* If access function fails, it should complain. */
+	} else {
+		kvm_err("Unsupported guest CP15 access at: %08x\n",
+			*vcpu_pc(vcpu));
+		print_cp_instr(params);
+	}
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+/**
+ * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = true;
+
+	params.Op1 = (vcpu->arch.hsr >> 16) & 0xf;
+	params.Op2 = 0;
+	params.Rt2 = (vcpu->arch.hsr >> 10) & 0xf;
+	params.CRn = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+static void reset_coproc_regs(struct kvm_vcpu *vcpu,
+			      const struct coproc_reg *table, size_t num)
+{
+	unsigned long i;
+
+	for (i = 0; i < num; i++)
+		if (table[i].reset)
+			table[i].reset(vcpu, &table[i]);
+}
+
+/**
+ * kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	struct coproc_params params;
+
+	params.CRm = (vcpu->arch.hsr >> 1) & 0xf;
+	params.Rt1 = (vcpu->arch.hsr >> 5) & 0xf;
+	params.is_write = ((vcpu->arch.hsr & 1) == 0);
+	params.is_64bit = false;
+
+	params.CRn = (vcpu->arch.hsr >> 10) & 0xf;
+	params.Op1 = (vcpu->arch.hsr >> 14) & 0x7;
+	params.Op2 = (vcpu->arch.hsr >> 17) & 0x7;
+	params.Rt2 = 0;
+
+	return emulate_cp15(vcpu, &params);
+}
+
+void kvm_coproc_table_init(void)
+{
+	unsigned int i;
+
+	/* Make sure tables are unique and in order. */
+	for (i = 1; i < ARRAY_SIZE(cp15_regs); i++)
+		BUG_ON(cmp_reg(&cp15_regs[i-1], &cp15_regs[i]) >= 0);
+}
+
+/**
+ * kvm_reset_coprocs - sets cp15 registers to reset value
+ * @vcpu: The VCPU pointer
+ *
+ * This function finds the right table above and sets the registers on the
+ * virtual CPU struct to their architecturally defined reset values.
+ */
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
 {
+	size_t num;
+	const struct coproc_reg *table;
+
+	/* Catch someone adding a register without putting in reset entry. */
+	memset(vcpu->arch.cp15, 0x42, sizeof(vcpu->arch.cp15));
+
+	/* Generic chip reset first (so target could override). */
+	reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	table = get_target_table(vcpu->arch.target, &num);
+	reset_coproc_regs(vcpu, table, num);
+
+	for (num = 1; num < NR_CP15_REGS; num++)
+		if (vcpu->arch.cp15[num] == 0x42424242)
+			panic("Didn't reset vcpu->arch.cp15[%zi]", num);
 }
diff --git a/arch/arm/kvm/coproc.h b/arch/arm/kvm/coproc.h
new file mode 100644
index 0000000..992adfa
--- /dev/null
+++ b/arch/arm/kvm/coproc.h
@@ -0,0 +1,153 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Authors: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_COPROC_LOCAL_H__
+#define __ARM_KVM_COPROC_LOCAL_H__
+
+struct coproc_params {
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+	unsigned long Rt1;
+	unsigned long Rt2;
+	bool is_64bit;
+	bool is_write;
+};
+
+struct coproc_reg {
+	/* MRC/MCR/MRRC/MCRR instruction which accesses it. */
+	unsigned long CRn;
+	unsigned long CRm;
+	unsigned long Op1;
+	unsigned long Op2;
+
+	bool is_64;
+
+	/* Trapped access from guest, if non-NULL. */
+	bool (*access)(struct kvm_vcpu *,
+		       const struct coproc_params *,
+		       const struct coproc_reg *);
+
+	/* Initialization for vcpu. */
+	void (*reset)(struct kvm_vcpu *, const struct coproc_reg *);
+
+	/* Index into vcpu->arch.cp15[], or 0 if we don't need to save it. */
+	unsigned long reg;
+
+	/* Value (usually reset value) */
+	u64 val;
+};
+
+static inline void print_cp_instr(const struct coproc_params *p)
+{
+	/* Look, we even formatted it for you to paste into the table! */
+	if (p->is_64bit) {
+		kvm_pr_unimpl(" { CRm(%2lu), Op1(%2lu), is64, func_%s },\n",
+			      p->CRm, p->Op1, p->is_write ? "write" : "read");
+	} else {
+		kvm_pr_unimpl(" { CRn(%2lu), CRm(%2lu), Op1(%2lu), Op2(%2lu), is32,"
+			      " func_%s },\n",
+			      p->CRn, p->CRm, p->Op1, p->Op2,
+			      p->is_write ? "write" : "read");
+	}
+}
+
+static inline bool ignore_write(struct kvm_vcpu *vcpu,
+				const struct coproc_params *p)
+{
+	return true;
+}
+
+static inline bool read_zero(struct kvm_vcpu *vcpu,
+			     const struct coproc_params *p)
+{
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+static inline bool write_to_read_only(struct kvm_vcpu *vcpu,
+				      const struct coproc_params *params)
+{
+	kvm_debug("CP15 write to read-only register at: %08x\n",
+		  *vcpu_pc(vcpu));
+	print_cp_instr(params);
+	return false;
+}
+
+static inline bool read_from_write_only(struct kvm_vcpu *vcpu,
+					const struct coproc_params *params)
+{
+	kvm_debug("CP15 read from write-only register at: %08x\n",
+		  *vcpu_pc(vcpu));
+	print_cp_instr(params);
+	return false;
+}
+
+/* Reset functions */
+static inline void reset_unknown(struct kvm_vcpu *vcpu,
+				 const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+}
+
+static inline void reset_val(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15));
+	vcpu->arch.cp15[r->reg] = r->val;
+}
+
+static inline void reset_unknown64(struct kvm_vcpu *vcpu,
+				   const struct coproc_reg *r)
+{
+	BUG_ON(!r->reg);
+	BUG_ON(r->reg + 1 >= ARRAY_SIZE(vcpu->arch.cp15));
+
+	vcpu->arch.cp15[r->reg] = 0xdecafbad;
+	vcpu->arch.cp15[r->reg+1] = 0xd0c0ffee;
+}
+
+static inline int cmp_reg(const struct coproc_reg *i1,
+			  const struct coproc_reg *i2)
+{
+	BUG_ON(i1 == i2);
+	if (!i1)
+		return 1;
+	else if (!i2)
+		return -1;
+	if (i1->CRn != i2->CRn)
+		return i1->CRn - i2->CRn;
+	if (i1->CRm != i2->CRm)
+		return i1->CRm - i2->CRm;
+	if (i1->Op1 != i2->Op1)
+		return i1->Op1 - i2->Op1;
+	return i1->Op2 - i2->Op2;
+}
+
+
+#define CRn(_x)		.CRn = _x
+#define CRm(_x) 	.CRm = _x
+#define Op1(_x) 	.Op1 = _x
+#define Op2(_x) 	.Op2 = _x
+#define is64		.is_64 = true
+#define is32		.is_64 = false
+
+#endif /* __ARM_KVM_COPROC_LOCAL_H__ */
diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
new file mode 100644
index 0000000..685063a
--- /dev/null
+++ b/arch/arm/kvm/coproc_a15.c
@@ -0,0 +1,162 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Authors: Rusty Russell <rusty@rustcorp.au>
+ *          Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/kvm_host.h>
+#include <asm/cputype.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
+#include <linux/init.h>
+
+static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	/*
+	 * Compute guest MPIDR:
+	 * (Even if we present only one VCPU to the guest on an SMP
+	 * host we don't set the U bit in the MPIDR, or vice versa, as
+	 * revealing the underlying hardware properties is likely to
+	 * be the best choice).
+	 */
+	vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_LEVEL_MASK)
+		| (vcpu->vcpu_id & MPIDR_LEVEL_MASK);
+}
+
+#include "coproc.h"
+
+/* A15 TRM 4.3.28: RO WI */
+static bool access_actlr(struct kvm_vcpu *vcpu,
+			 const struct coproc_params *p,
+			 const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c1_ACTLR];
+	return true;
+}
+
+/* A15 TRM 4.3.60: R/O. */
+static bool access_cbar(struct kvm_vcpu *vcpu,
+			const struct coproc_params *p,
+			const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p);
+	return read_zero(vcpu, p);
+}
+
+/* A15 TRM 4.3.48: R/O WI. */
+static bool access_l2ctlr(struct kvm_vcpu *vcpu,
+			  const struct coproc_params *p,
+			  const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c9_L2CTLR];
+	return true;
+}
+
+static void reset_l2ctlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 l2ctlr, ncores;
+
+	asm volatile("mrc p15, 1, %0, c9, c0, 2\n" : "=r" (l2ctlr));
+	l2ctlr &= ~(3 << 24);
+	ncores = atomic_read(&vcpu->kvm->online_vcpus) - 1;
+	l2ctlr |= (ncores & 3) << 24;
+
+	vcpu->arch.cp15[c9_L2CTLR] = l2ctlr;
+}
+
+static void reset_actlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
+{
+	u32 actlr;
+
+	/* ACTLR contains SMP bit: make sure you create all cpus first! */
+	asm volatile("mrc p15, 0, %0, c1, c0, 1\n" : "=r" (actlr));
+	/* Make the SMP bit consistent with the guest configuration */
+	if (atomic_read(&vcpu->kvm->online_vcpus) > 1)
+		actlr |= 1U << 6;
+	else
+		actlr &= ~(1U << 6);
+
+	vcpu->arch.cp15[c1_ACTLR] = actlr;
+}
+
+/* A15 TRM 4.3.49: R/O WI (even if NSACR.NS_L2ERR, a write of 1 is ignored). */
+static bool access_l2ectlr(struct kvm_vcpu *vcpu,
+			   const struct coproc_params *p,
+			   const struct coproc_reg *r)
+{
+	if (p->is_write)
+		return ignore_write(vcpu, p);
+
+	*vcpu_reg(vcpu, p->Rt1) = 0;
+	return true;
+}
+
+/*
+ * A15-specific CP15 registers.
+ * Important: Must be sorted ascending by CRn, CRm, Op1, Op2
+ */
+static const struct coproc_reg a15_regs[] = {
+	/* MPIDR: we use VMPIDR for guest access. */
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 5), is32,
+			NULL, reset_mpidr, c0_MPIDR },
+
+	/* SCTLR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
+			NULL, reset_val, c1_SCTLR, 0x00C50078 },
+	/* ACTLR: trapped by HCR.TAC bit. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 1), is32,
+			access_actlr, reset_actlr, c1_ACTLR },
+	/* CPACR: swapped by interrupt.S. */
+	{ CRn( 1), CRm( 0), Op1( 0), Op2( 2), is32,
+			NULL, reset_val, c1_CPACR, 0x00000000 },
+
+	/*
+	 * L2CTLR access (guest wants to know #CPUs).
+	 */
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 2), is32,
+			access_l2ctlr, reset_l2ctlr, c9_L2CTLR },
+	{ CRn( 9), CRm( 0), Op1( 1), Op2( 3), is32, access_l2ectlr},
+
+	/* The Configuration Base Address Register. */
+	{ CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
+};
+
+static struct kvm_coproc_target_table a15_target_table = {
+	.target = KVM_ARM_TARGET_CORTEX_A15,
+	.table = a15_regs,
+	.num = ARRAY_SIZE(a15_regs),
+};
+
+static int __init coproc_a15_init(void)
+{
+	unsigned int i;
+
+	for (i = 1; i < ARRAY_SIZE(a15_regs); i++)
+		BUG_ON(cmp_reg(&a15_regs[i-1],
+			       &a15_regs[i]) >= 0);
+
+	kvm_register_target_coproc_table(&a15_target_table);
+	return 0;
+}
+late_initcall(coproc_a15_init);
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 3eadc25..d61450a 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -16,7 +16,13 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
+#include <linux/mm.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
 
 #define VCPU_NR_MODES		6
 #define VCPU_REG_OFFSET_USR	0
@@ -153,3 +159,215 @@ u32 *vcpu_spsr(struct kvm_vcpu *vcpu)
 		BUG();
 	}
 }
+
+/**
+ * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * @vcpu:	the vcpu pointer
+ * @run:	the kvm_run structure pointer
+ *
+ * Blocks the VCPU (via kvm_vcpu_block()), halting execution of world-switches
+ * and scheduling other host processes to run until there is an incoming IRQ or
+ * FIQ for the VM.
+ */
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	trace_kvm_wfi(*vcpu_pc(vcpu));
+	kvm_vcpu_block(vcpu);
+	return 1;
+}
+
+/**
+ * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
+ * @vcpu:	The VCPU pointer
+ *
+ * When exceptions occur while instructions are executed in Thumb IF-THEN
+ * blocks, the ITSTATE field of the CPSR is not advanced (updated), so we have
+ * to do this little bit of work manually. The fields map like this:
+ *
+ * IT[7:0] -> CPSR[26:25],CPSR[15:10]
+ */
+static void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
+{
+	unsigned long itbits, cond;
+	unsigned long cpsr = *vcpu_cpsr(vcpu);
+	bool is_arm = !(cpsr & PSR_T_BIT);
+
+	BUG_ON(is_arm && (cpsr & PSR_IT_MASK));
+
+	if (!(cpsr & PSR_IT_MASK))
+		return;
+
+	cond = (cpsr & 0xe000) >> 13;
+	itbits = (cpsr & 0x1c00) >> (10 - 2);
+	itbits |= (cpsr & (0x3 << 25)) >> 25;
+
+	/* Perform ITAdvance (see page A-52 in ARM DDI 0406C) */
+	if ((itbits & 0x7) == 0)
+		itbits = cond = 0;
+	else
+		itbits = (itbits << 1) & 0x1f;
+
+	cpsr &= ~PSR_IT_MASK;
+	cpsr |= cond << 13;
+	cpsr |= (itbits & 0x1c) << (10 - 2);
+	cpsr |= (itbits & 0x3) << 25;
+	*vcpu_cpsr(vcpu) = cpsr;
+}
+
+/**
+ * kvm_skip_instr - skip a trapped instruction and proceed to the next
+ * @vcpu: The vcpu pointer
+ */
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
+{
+	bool is_thumb;
+
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	if (is_thumb && !is_wide_instr)
+		*vcpu_pc(vcpu) += 2;
+	else
+		*vcpu_pc(vcpu) += 4;
+	kvm_adjust_itstate(vcpu);
+}
+
+
+/******************************************************************************
+ * Inject exceptions into the guest
+ */
+
+static u32 exc_vector_base(struct kvm_vcpu *vcpu)
+{
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	u32 vbar = vcpu->arch.cp15[c12_VBAR];
+
+	if (sctlr & SCTLR_V)
+		return 0xffff0000;
+	else /* always have security exceptions */
+		return vbar;
+}
+
+/**
+ * kvm_inject_undefined - inject an undefined exception into the guest
+ * @vcpu: The VCPU to receive the undefined exception
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ *
+ * Modelled after TakeUndefInstrException() pseudocode.
+ */
+void kvm_inject_undefined(struct kvm_vcpu *vcpu)
+{
+	u32 new_lr_value;
+	u32 new_spsr_value;
+	u32 cpsr = *vcpu_cpsr(vcpu);
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	bool is_thumb = (cpsr & PSR_T_BIT);
+	u32 vect_offset = 4;
+	u32 return_offset = (is_thumb) ? 2 : 4;
+
+	new_spsr_value = cpsr;
+	new_lr_value = *vcpu_pc(vcpu) - return_offset;
+
+	*vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | UND_MODE;
+	*vcpu_cpsr(vcpu) |= PSR_I_BIT;
+	*vcpu_cpsr(vcpu) &= ~(PSR_IT_MASK | PSR_J_BIT | PSR_E_BIT | PSR_T_BIT);
+
+	if (sctlr & SCTLR_TE)
+		*vcpu_cpsr(vcpu) |= PSR_T_BIT;
+	if (sctlr & SCTLR_EE)
+		*vcpu_cpsr(vcpu) |= PSR_E_BIT;
+
+	/* Note: These now point to UND banked copies */
+	*vcpu_spsr(vcpu) = cpsr;
+	*vcpu_reg(vcpu, 14) = new_lr_value;
+
+	/* Branch to exception vector */
+	*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
+}
+
+/*
+ * Modelled after TakeDataAbortException() and TakePrefetchAbortException
+ * pseudocode.
+ */
+static void inject_abt(struct kvm_vcpu *vcpu, bool is_pabt, unsigned long addr)
+{
+	u32 new_lr_value;
+	u32 new_spsr_value;
+	u32 cpsr = *vcpu_cpsr(vcpu);
+	u32 sctlr = vcpu->arch.cp15[c1_SCTLR];
+	bool is_thumb = (cpsr & PSR_T_BIT);
+	u32 vect_offset;
+	u32 return_offset = (is_thumb) ? 4 : 0;
+	bool is_lpae;
+
+	new_spsr_value = cpsr;
+	new_lr_value = *vcpu_pc(vcpu) + return_offset;
+
+	*vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | ABT_MODE;
+	*vcpu_cpsr(vcpu) |= PSR_I_BIT | PSR_A_BIT;
+	*vcpu_cpsr(vcpu) &= ~(PSR_IT_MASK | PSR_J_BIT | PSR_E_BIT | PSR_T_BIT);
+
+	if (sctlr & SCTLR_TE)
+		*vcpu_cpsr(vcpu) |= PSR_T_BIT;
+	if (sctlr & SCTLR_EE)
+		*vcpu_cpsr(vcpu) |= PSR_E_BIT;
+
+	/* Note: These now point to ABT banked copies */
+	*vcpu_spsr(vcpu) = cpsr;
+	*vcpu_reg(vcpu, 14) = new_lr_value;
+
+	if (is_pabt)
+		vect_offset = 12;
+	else
+		vect_offset = 16;
+
+	/* Branch to exception vector */
+	*vcpu_pc(vcpu) = exc_vector_base(vcpu) + vect_offset;
+
+	if (is_pabt) {
+		/* Set IFAR and IFSR */
+		vcpu->arch.cp15[c6_IFAR] = addr;
+		is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31);
+		/* Always give debug fault for now - should give guest a clue */
+		if (is_lpae)
+			vcpu->arch.cp15[c5_IFSR] = 1 << 9 | 0x22;
+		else
+			vcpu->arch.cp15[c5_IFSR] = 2;
+	} else { /* !iabt */
+		/* Set DFAR and DFSR */
+		vcpu->arch.cp15[c6_DFAR] = addr;
+		is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31);
+		/* Always give debug fault for now - should give guest a clue */
+		if (is_lpae)
+			vcpu->arch.cp15[c5_DFSR] = 1 << 9 | 0x22;
+		else
+			vcpu->arch.cp15[c5_DFSR] = 2;
+	}
+
+}
+
+/**
+ * kvm_inject_dabt - inject a data abort into the guest
+ * @vcpu: The VCPU to receive the data abort
+ * @addr: The address to report in the DFAR
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
+{
+	inject_abt(vcpu, false, addr);
+}
+
+/**
+ * kvm_inject_pabt - inject a prefetch abort into the guest
+ * @vcpu: The VCPU to receive the prefetch abort
+ * @addr: The address to report in the IFAR
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
+{
+	inject_abt(vcpu, true, addr);
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 105d1f7..c86a513 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -64,6 +64,51 @@ TRACE_EVENT(kvm_irq_line,
 		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
 );
 
+/* Architecturally implementation defined CP15 register access */
+TRACE_EVENT(kvm_emulate_cp15_imp,
+	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
+		 unsigned long CRm, unsigned long Op2, bool is_write),
+	TP_ARGS(Op1, Rt1, CRn, CRm, Op2, is_write),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	Op1		)
+		__field(	unsigned int,	Rt1		)
+		__field(	unsigned int,	CRn		)
+		__field(	unsigned int,	CRm		)
+		__field(	unsigned int,	Op2		)
+		__field(	bool,		is_write	)
+	),
+
+	TP_fast_assign(
+		__entry->is_write		= is_write;
+		__entry->Op1			= Op1;
+		__entry->Rt1			= Rt1;
+		__entry->CRn			= CRn;
+		__entry->CRm			= CRm;
+		__entry->Op2			= Op2;
+	),
+
+	TP_printk("Implementation defined CP15: %s\tp15, %u, r%u, c%u, c%u, %u",
+			(__entry->is_write) ? "mcr" : "mrc",
+			__entry->Op1, __entry->Rt1, __entry->CRn,
+			__entry->CRm, __entry->Op2)
+);
+
+TRACE_EVENT(kvm_wfi,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("guest executed wfi at: 0x%08lx", __entry->vcpu_pc)
+);
+
 TRACE_EVENT(kvm_unmap_hva,
 	TP_PROTO(unsigned long hva),
 	TP_ARGS(hva),

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v5 09/14] KVM: ARM: User space API for getting/setting co-proc registers
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:39   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Marcelo Tosatti, Rusty Russell

The following three ioctls are implemented:
 -  KVM_GET_REG_LIST
 -  KVM_GET_ONE_REG
 -  KVM_SET_ONE_REG

Now that we have a table for all the cp15 registers, we can drive a
generic API.

The register IDs carry the following encoding:

ARM registers are mapped using the lower 32 bits.  The upper 16 of that
is the register group type, or coprocessor number:

ARM 32-bit CP15 registers have the following id bit patterns:
  0x4002 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>

ARM 64-bit CP15 registers have the following id bit patterns:
  0x4003 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>

For future-proofing, we need to tell QEMU about the CP15 registers the
host lets the guest access.

It will need this information to restore a current guest on a future
CPU, or perhaps a future KVM which allows some of these to be changed.

We use a separate table for these, as they're only for the userspace API.
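
As an illustration (not part of this patch), a userspace caller could build
one of these indices and fetch a register roughly as sketched below.  The
KVM_REG_ARM_* macros are the ones used by this series; "vcpu_fd" and the
helper name are made up for the example, and SCTLR (p15, 0, c1, c0, 0) is
just a convenient 32-bit register to read:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>   /* struct kvm_one_reg, KVM_GET_ONE_REG */

  /* Hypothetical helper: read the guest's SCTLR via KVM_GET_ONE_REG. */
  static int get_guest_sctlr(int vcpu_fd, uint32_t *val)
  {
          struct kvm_one_reg reg;

          reg.id = KVM_REG_ARM | KVM_REG_SIZE_U32
                  | (15 << KVM_REG_ARM_COPROC_SHIFT)   /* coprocessor 15 */
                  | (1 << KVM_REG_ARM_32_CRN_SHIFT)    /* CRn  = c1 */
                  | (0 << KVM_REG_ARM_CRM_SHIFT)       /* CRm  = c0 */
                  | (0 << KVM_REG_ARM_OPC1_SHIFT)      /* opc1 = 0  */
                  | (0 << KVM_REG_ARM_32_OPC2_SHIFT);  /* opc2 = 0  */
          reg.addr = (uint64_t)(unsigned long)val;

          return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
  }

KVM_SET_ONE_REG works the same way with the same id, and KVM_GET_REG_LIST
returns exactly these ids (plus the core register ids), so userspace never
has to construct them by hand.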

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |    5 +
 arch/arm/include/asm/kvm_coproc.h |    9 +
 arch/arm/include/asm/kvm_host.h   |    4 
 arch/arm/kvm/coproc.c             |  327 +++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/guest.c              |    9 +
 5 files changed, 350 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 5050492..0e22874 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1799,6 +1799,11 @@ is the register group type, or coprocessor number:
 ARM core registers have the following id bit patterns:
   0x4002 0000 0010 <index into the kvm_regs struct:16>
 
+ARM 32-bit CP15 registers have the following id bit patterns:
+  0x4002 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
+
+ARM 64-bit CP15 registers have the following id bit patterns:
+  0x4003 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
 
 
 4.69 KVM_GET_ONE_REG
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index bd1ace0..4917c2f 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -34,5 +34,14 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+unsigned long kvm_arm_num_guest_msrs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_msrindices(struct kvm_vcpu *vcpu, u64 __user *uindices);
 void kvm_coproc_table_init(void);
+
+struct kvm_one_reg;
+int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
+int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
+int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
+unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index a56a319..6cc8933 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #define KVM_USER_MEM_SLOTS 32
 #define KVM_PRIVATE_MEM_SLOTS 4
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_HAVE_ONE_REG
 
 #define KVM_VCPU_MAX_FEATURES 0
 
@@ -134,6 +135,9 @@ int kvm_unmap_hva_range(struct kvm *kvm,
 			unsigned long start, unsigned long end);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
+unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
+
 /* We do not have shadow page tables, hence the empty hooks */
 static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
 {
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 722efe3..95a0f5e 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -18,6 +18,7 @@
  */
 #include <linux/mm.h>
 #include <linux/kvm_host.h>
+#include <linux/uaccess.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
@@ -347,6 +348,328 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return emulate_cp15(vcpu, &params);
 }
 
+/******************************************************************************
+ * Userspace API
+ *****************************************************************************/
+
+static bool index_to_params(u64 id, struct coproc_params *params)
+{
+	switch (id & KVM_REG_SIZE_MASK) {
+	case KVM_REG_SIZE_U32:
+		/* Any unused index bits means it's not valid. */
+		if (id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK
+			   | KVM_REG_ARM_COPROC_MASK
+			   | KVM_REG_ARM_32_CRN_MASK
+			   | KVM_REG_ARM_CRM_MASK
+			   | KVM_REG_ARM_OPC1_MASK
+			   | KVM_REG_ARM_32_OPC2_MASK))
+			return false;
+
+		params->is_64bit = false;
+		params->CRn = ((id & KVM_REG_ARM_32_CRN_MASK)
+			       >> KVM_REG_ARM_32_CRN_SHIFT);
+		params->CRm = ((id & KVM_REG_ARM_CRM_MASK)
+			       >> KVM_REG_ARM_CRM_SHIFT);
+		params->Op1 = ((id & KVM_REG_ARM_OPC1_MASK)
+			       >> KVM_REG_ARM_OPC1_SHIFT);
+		params->Op2 = ((id & KVM_REG_ARM_32_OPC2_MASK)
+			       >> KVM_REG_ARM_32_OPC2_SHIFT);
+		return true;
+	case KVM_REG_SIZE_U64:
+		/* Any unused index bits means it's not valid. */
+		if (id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK
+			      | KVM_REG_ARM_COPROC_MASK
+			      | KVM_REG_ARM_CRM_MASK
+			      | KVM_REG_ARM_OPC1_MASK))
+			return false;
+		params->is_64bit = true;
+		params->CRm = ((id & KVM_REG_ARM_CRM_MASK)
+			       >> KVM_REG_ARM_CRM_SHIFT);
+		params->Op1 = ((id & KVM_REG_ARM_OPC1_MASK)
+			       >> KVM_REG_ARM_OPC1_SHIFT);
+		params->Op2 = 0;
+		params->CRn = 0;
+		return true;
+	default:
+		return false;
+	}
+}
+
+/* Decode an index value, and find the cp15 coproc_reg entry. */
+static const struct coproc_reg *index_to_coproc_reg(struct kvm_vcpu *vcpu,
+						    u64 id)
+{
+	size_t num;
+	const struct coproc_reg *table, *r;
+	struct coproc_params params;
+
+	/* We only do cp15 for now. */
+	if ((id & KVM_REG_ARM_COPROC_MASK) >> KVM_REG_ARM_COPROC_SHIFT != 15)
+		return NULL;
+
+	if (!index_to_params(id, &params))
+		return NULL;
+
+	table = get_target_table(vcpu->arch.target, &num);
+	r = find_reg(&params, table, num);
+	if (!r)
+		r = find_reg(&params, cp15_regs, ARRAY_SIZE(cp15_regs));
+
+	/* Not saved in the cp15 array? */
+	if (r && !r->reg)
+		r = NULL;
+
+	return r;
+}
+
+/*
+ * These are the invariant cp15 registers: we let the guest see the host
+ * versions of these, so they're part of the guest state.
+ *
+ * A future CPU may provide a mechanism to present different values to
+ * the guest, or a future kvm may trap them.
+ */
+/* Unfortunately, there's no register-argument for mrc, so generate. */
+#define FUNCTION_FOR32(crn, crm, op1, op2, name)			\
+	static void get_##name(struct kvm_vcpu *v,			\
+			       const struct coproc_reg *r)		\
+	{								\
+		u32 val;						\
+									\
+		asm volatile("mrc p15, " __stringify(op1)		\
+			     ", %0, c" __stringify(crn)			\
+			     ", c" __stringify(crm)			\
+			     ", " __stringify(op2) "\n" : "=r" (val));	\
+		((struct coproc_reg *)r)->val = val;			\
+	}
+
+FUNCTION_FOR32(0, 0, 0, 0, MIDR)
+FUNCTION_FOR32(0, 0, 0, 1, CTR)
+FUNCTION_FOR32(0, 0, 0, 2, TCMTR)
+FUNCTION_FOR32(0, 0, 0, 3, TLBTR)
+FUNCTION_FOR32(0, 0, 0, 6, REVIDR)
+FUNCTION_FOR32(0, 1, 0, 0, ID_PFR0)
+FUNCTION_FOR32(0, 1, 0, 1, ID_PFR1)
+FUNCTION_FOR32(0, 1, 0, 2, ID_DFR0)
+FUNCTION_FOR32(0, 1, 0, 3, ID_AFR0)
+FUNCTION_FOR32(0, 1, 0, 4, ID_MMFR0)
+FUNCTION_FOR32(0, 1, 0, 5, ID_MMFR1)
+FUNCTION_FOR32(0, 1, 0, 6, ID_MMFR2)
+FUNCTION_FOR32(0, 1, 0, 7, ID_MMFR3)
+FUNCTION_FOR32(0, 2, 0, 0, ID_ISAR0)
+FUNCTION_FOR32(0, 2, 0, 1, ID_ISAR1)
+FUNCTION_FOR32(0, 2, 0, 2, ID_ISAR2)
+FUNCTION_FOR32(0, 2, 0, 3, ID_ISAR3)
+FUNCTION_FOR32(0, 2, 0, 4, ID_ISAR4)
+FUNCTION_FOR32(0, 2, 0, 5, ID_ISAR5)
+FUNCTION_FOR32(0, 0, 1, 1, CLIDR)
+FUNCTION_FOR32(0, 0, 1, 7, AIDR)
+
+/* ->val is filled in by kvm_coproc_table_init() */
+static struct coproc_reg invariant_cp15[] = {
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 0), is32, NULL, get_MIDR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 1), is32, NULL, get_CTR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 2), is32, NULL, get_TCMTR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 3), is32, NULL, get_TLBTR },
+	{ CRn( 0), CRm( 0), Op1( 0), Op2( 6), is32, NULL, get_REVIDR },
+
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 0), is32, NULL, get_ID_PFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 1), is32, NULL, get_ID_PFR1 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 2), is32, NULL, get_ID_DFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 3), is32, NULL, get_ID_AFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 4), is32, NULL, get_ID_MMFR0 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 5), is32, NULL, get_ID_MMFR1 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 6), is32, NULL, get_ID_MMFR2 },
+	{ CRn( 0), CRm( 1), Op1( 0), Op2( 7), is32, NULL, get_ID_MMFR3 },
+
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 0), is32, NULL, get_ID_ISAR0 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 1), is32, NULL, get_ID_ISAR1 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 2), is32, NULL, get_ID_ISAR2 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 3), is32, NULL, get_ID_ISAR3 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 4), is32, NULL, get_ID_ISAR4 },
+	{ CRn( 0), CRm( 2), Op1( 0), Op2( 5), is32, NULL, get_ID_ISAR5 },
+
+	{ CRn( 0), CRm( 0), Op1( 1), Op2( 1), is32, NULL, get_CLIDR },
+	{ CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
+};
+
+static int reg_from_user(void *val, const void __user *uaddr, u64 id)
+{
+	/* This Just Works because we are little endian. */
+	if (copy_from_user(val, uaddr, KVM_REG_SIZE(id)) != 0)
+		return -EFAULT;
+	return 0;
+}
+
+static int reg_to_user(void __user *uaddr, const void *val, u64 id)
+{
+	/* This Just Works because we are little endian. */
+	if (copy_to_user(uaddr, val, KVM_REG_SIZE(id)) != 0)
+		return -EFAULT;
+	return 0;
+}
+
+static int get_invariant_cp15(u64 id, void __user *uaddr)
+{
+	struct coproc_params params;
+	const struct coproc_reg *r;
+
+	if (!index_to_params(id, &params))
+		return -ENOENT;
+
+	r = find_reg(&params, invariant_cp15, ARRAY_SIZE(invariant_cp15));
+	if (!r)
+		return -ENOENT;
+
+	return reg_to_user(uaddr, &r->val, id);
+}
+
+static int set_invariant_cp15(u64 id, void __user *uaddr)
+{
+	struct coproc_params params;
+	const struct coproc_reg *r;
+	int err;
+	u64 val = 0; /* Make sure high bits are 0 for 32-bit regs */
+
+	if (!index_to_params(id, &params))
+		return -ENOENT;
+	r = find_reg(&params, invariant_cp15, ARRAY_SIZE(invariant_cp15));
+	if (!r)
+		return -ENOENT;
+
+	err = reg_from_user(&val, uaddr, id);
+	if (err)
+		return err;
+
+	/* This is what we mean by invariant: you can't change it. */
+	if (r->val != val)
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	const struct coproc_reg *r;
+	void __user *uaddr = (void __user *)(long)reg->addr;
+
+	r = index_to_coproc_reg(vcpu, reg->id);
+	if (!r)
+		return get_invariant_cp15(reg->id, uaddr);
+
+	/* Note: copies two regs if size is 64 bit. */
+	return reg_to_user(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
+}
+
+int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	const struct coproc_reg *r;
+	void __user *uaddr = (void __user *)(long)reg->addr;
+
+	r = index_to_coproc_reg(vcpu, reg->id);
+	if (!r)
+		return set_invariant_cp15(reg->id, uaddr);
+
+	/* Note: copies two regs if size is 64 bit */
+	return reg_from_user(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
+}
+
+static u64 cp15_to_index(const struct coproc_reg *reg)
+{
+	u64 val = KVM_REG_ARM | (15 << KVM_REG_ARM_COPROC_SHIFT);
+	if (reg->is_64) {
+		val |= KVM_REG_SIZE_U64;
+		val |= (reg->Op1 << KVM_REG_ARM_OPC1_SHIFT);
+		val |= (reg->CRm << KVM_REG_ARM_CRM_SHIFT);
+	} else {
+		val |= KVM_REG_SIZE_U32;
+		val |= (reg->Op1 << KVM_REG_ARM_OPC1_SHIFT);
+		val |= (reg->Op2 << KVM_REG_ARM_32_OPC2_SHIFT);
+		val |= (reg->CRm << KVM_REG_ARM_CRM_SHIFT);
+		val |= (reg->CRn << KVM_REG_ARM_32_CRN_SHIFT);
+	}
+	return val;
+}
+
+static bool copy_reg_to_user(const struct coproc_reg *reg, u64 __user **uind)
+{
+	if (!*uind)
+		return true;
+
+	if (put_user(cp15_to_index(reg), *uind))
+		return false;
+
+	(*uind)++;
+	return true;
+}
+
+/* Assumed ordered tables, see kvm_coproc_table_init. */
+static int walk_cp15(struct kvm_vcpu *vcpu, u64 __user *uind)
+{
+	const struct coproc_reg *i1, *i2, *end1, *end2;
+	unsigned int total = 0;
+	size_t num;
+
+	/* We check for duplicates here, to allow arch-specific overrides. */
+	i1 = get_target_table(vcpu->arch.target, &num);
+	end1 = i1 + num;
+	i2 = cp15_regs;
+	end2 = cp15_regs + ARRAY_SIZE(cp15_regs);
+
+	BUG_ON(i1 == end1 || i2 == end2);
+
+	/* Walk carefully, as both tables may refer to the same register. */
+	while (i1 || i2) {
+		int cmp = cmp_reg(i1, i2);
+		/* target-specific overrides generic entry. */
+		if (cmp <= 0) {
+			/* Ignore registers we trap but don't save. */
+			if (i1->reg) {
+				if (!copy_reg_to_user(i1, &uind))
+					return -EFAULT;
+				total++;
+			}
+		} else {
+			/* Ignore registers we trap but don't save. */
+			if (i2->reg) {
+				if (!copy_reg_to_user(i2, &uind))
+					return -EFAULT;
+				total++;
+			}
+		}
+
+		if (cmp <= 0 && ++i1 == end1)
+			i1 = NULL;
+		if (cmp >= 0 && ++i2 == end2)
+			i2 = NULL;
+	}
+	return total;
+}
+
+unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu)
+{
+	return ARRAY_SIZE(invariant_cp15)
+		+ walk_cp15(vcpu, (u64 __user *)NULL);
+}
+
+int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
+{
+	unsigned int i;
+	int err;
+
+	/* Then give them all the invariant registers' indices. */
+	for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++) {
+		if (put_user(cp15_to_index(&invariant_cp15[i]), uindices))
+			return -EFAULT;
+		uindices++;
+	}
+
+	err = walk_cp15(vcpu, uindices);
+	if (err > 0)
+		err = 0;
+	return err;
+}
+
 void kvm_coproc_table_init(void)
 {
 	unsigned int i;
@@ -354,6 +677,10 @@ void kvm_coproc_table_init(void)
 	/* Make sure tables are unique and in order. */
 	for (i = 1; i < ARRAY_SIZE(cp15_regs); i++)
 		BUG_ON(cmp_reg(&cp15_regs[i-1], &cp15_regs[i]) >= 0);
+
+	/* We abuse the reset function to overwrite the table itself. */
+	for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++)
+		invariant_cp15[i].reset(NULL, &invariant_cp15[i]);
 }
 
 /**
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 8db3811..65ae563 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -26,6 +26,7 @@
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_coproc.h>
 
 #define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
 #define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
@@ -119,7 +120,7 @@ static unsigned long num_core_regs(void)
  */
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
 {
-	return num_core_regs();
+	return num_core_regs() + kvm_arm_num_coproc_regs(vcpu);
 }
 
 /**
@@ -138,7 +139,7 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 		uindices++;
 	}
 
-	return 0;
+	return kvm_arm_copy_coproc_indices(vcpu, uindices);
 }
 
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@@ -151,7 +152,7 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
 		return get_core_reg(vcpu, reg);
 
-	return -EINVAL;
+	return kvm_arm_coproc_get_reg(vcpu, reg);
 }
 
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@@ -164,7 +165,7 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
 		return set_core_reg(vcpu, reg);
 
-	return -EINVAL;
+	return kvm_arm_coproc_set_reg(vcpu, reg);
 }
 
 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,


^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v5 10/14] KVM: ARM: Demux CCSIDR in the userspace API
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:39   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Marcelo Tosatti, Rusty Russell

The Cache Size Selection Register (CSSELR) selects the current Cache
Size ID Register (CCSIDR).  You write which cache you are interested
in to CSSELR, and read the information out of CCSIDR.

Which cache numbers are valid is determined by reading the Cache Level
ID Register (CLIDR).

To export this state to userspace, we add a KVM_REG_ARM_DEMUX
numberspace (17), which uses 8 bits to identify which register is
being demultiplexed (0 for CCSIDR) and the lower 8 bits to carry the
demultiplexing value (in our case the CSSELR value, which is 4 bits).
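
As an illustration (not part of the patch), userspace could read the CCSIDR
selected by a given CSSELR value through KVM_GET_ONE_REG roughly as sketched
below, using the KVM_REG_ARM_DEMUX_* constants added here.  "vcpu_fd" and the
helper name are invented for the example, and the usual <stdint.h>,
<sys/ioctl.h> and <linux/kvm.h> includes are assumed:

  /* Hypothetical helper: fetch the CCSIDR selected by 'csselr'
   * (csselr == 0 selects the level 1 data/unified cache). */
  static int get_guest_ccsidr(int vcpu_fd, unsigned int csselr, uint32_t *ccsidr)
  {
          struct kvm_one_reg reg = {
                  .id   = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_DEMUX
                          | KVM_REG_ARM_DEMUX_ID_CCSIDR
                          | (csselr << KVM_REG_ARM_DEMUX_VAL_SHIFT),
                  .addr = (uint64_t)(unsigned long)ccsidr,
          };

          return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
  }

KVM_GET_REG_LIST only reports the CSSELR values that correspond to caches
actually present in CLIDR, so a caller that walks the returned list never
needs to guess which values are valid.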

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
---
 Documentation/virtual/kvm/api.txt |    2 
 arch/arm/include/uapi/asm/kvm.h   |    9 ++
 arch/arm/kvm/coproc.c             |  164 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 172 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 0e22874..94f17a3 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1805,6 +1805,8 @@ ARM 32-bit CP15 registers have the following id bit patterns:
 ARM 64-bit CP15 registers have the following id bit patterns:
   0x4003 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
 
+ARM CCSIDR registers are demultiplexed by CSSELR value:
+  0x4002 0000 0011 00 <csselr:8>
 
 4.69 KVM_GET_ONE_REG
 
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 4cf6d8f..aa2684c 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -104,6 +104,15 @@ struct kvm_arch_memory_slot {
 #define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
 #define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
 
+/* Some registers need more space to represent values. */
+#define KVM_REG_ARM_DEMUX		(0x0011 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_DEMUX_ID_MASK	0x000000000000FF00
+#define KVM_REG_ARM_DEMUX_ID_SHIFT	8
+#define KVM_REG_ARM_DEMUX_ID_CCSIDR	(0x00 << KVM_REG_ARM_DEMUX_ID_SHIFT)
+#define KVM_REG_ARM_DEMUX_VAL_MASK	0x00000000000000FF
+#define KVM_REG_ARM_DEMUX_VAL_SHIFT	0
+
+
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT		24
 #define KVM_ARM_IRQ_TYPE_MASK		0xff
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 95a0f5e..1827b64 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -35,6 +35,12 @@
  * Co-processor emulation
  *****************************************************************************/
 
+/* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
+static u32 cache_levels;
+
+/* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
+#define CSSELR_MAX 12
+
 int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	kvm_inject_undefined(vcpu);
@@ -548,11 +554,113 @@ static int set_invariant_cp15(u64 id, void __user *uaddr)
 	return 0;
 }
 
+static bool is_valid_cache(u32 val)
+{
+	u32 level, ctype;
+
+	if (val >= CSSELR_MAX)
+		return false;
+
+	/* Bottom bit is Instruction or Data bit.  Next 3 bits are level. */
+	level = (val >> 1);
+	ctype = (cache_levels >> (level * 3)) & 7;
+
+	switch (ctype) {
+	case 0: /* No cache */
+		return false;
+	case 1: /* Instruction cache only */
+		return (val & 1);
+	case 2: /* Data cache only */
+	case 4: /* Unified cache */
+		return !(val & 1);
+	case 3: /* Separate instruction and data caches */
+		return true;
+	default: /* Reserved: we can't know instruction or data. */
+		return false;
+	}
+}
+
+/* Which cache CCSIDR represents depends on CSSELR value. */
+static u32 get_ccsidr(u32 csselr)
+{
+	u32 ccsidr;
+
+	/* Make sure no one else changes CSSELR during this! */
+	local_irq_disable();
+	/* Put value into CSSELR */
+	asm volatile("mcr p15, 2, %0, c0, c0, 0" : : "r" (csselr));
+	isb();
+	/* Read result out of CCSIDR */
+	asm volatile("mrc p15, 1, %0, c0, c0, 0" : "=r" (ccsidr));
+	local_irq_enable();
+
+	return ccsidr;
+}
+
+static int demux_c15_get(u64 id, void __user *uaddr)
+{
+	u32 val;
+	u32 __user *uval = uaddr;
+
+	/* Fail if we have unknown bits set. */
+	if (id & ~(KVM_REG_ARCH_MASK|KVM_REG_SIZE_MASK|KVM_REG_ARM_COPROC_MASK
+		   | ((1 << KVM_REG_ARM_COPROC_SHIFT)-1)))
+		return -ENOENT;
+
+	switch (id & KVM_REG_ARM_DEMUX_ID_MASK) {
+	case KVM_REG_ARM_DEMUX_ID_CCSIDR:
+		if (KVM_REG_SIZE(id) != 4)
+			return -ENOENT;
+		val = (id & KVM_REG_ARM_DEMUX_VAL_MASK)
+			>> KVM_REG_ARM_DEMUX_VAL_SHIFT;
+		if (!is_valid_cache(val))
+			return -ENOENT;
+
+		return put_user(get_ccsidr(val), uval);
+	default:
+		return -ENOENT;
+	}
+}
+
+static int demux_c15_set(u64 id, void __user *uaddr)
+{
+	u32 val, newval;
+	u32 __user *uval = uaddr;
+
+	/* Fail if we have unknown bits set. */
+	if (id & ~(KVM_REG_ARCH_MASK|KVM_REG_SIZE_MASK|KVM_REG_ARM_COPROC_MASK
+		   | ((1 << KVM_REG_ARM_COPROC_SHIFT)-1)))
+		return -ENOENT;
+
+	switch (id & KVM_REG_ARM_DEMUX_ID_MASK) {
+	case KVM_REG_ARM_DEMUX_ID_CCSIDR:
+		if (KVM_REG_SIZE(id) != 4)
+			return -ENOENT;
+		val = (id & KVM_REG_ARM_DEMUX_VAL_MASK)
+			>> KVM_REG_ARM_DEMUX_VAL_SHIFT;
+		if (!is_valid_cache(val))
+			return -ENOENT;
+
+		if (get_user(newval, uval))
+			return -EFAULT;
+
+		/* This is also invariant: you can't change it. */
+		if (newval != get_ccsidr(val))
+			return -EINVAL;
+		return 0;
+	default:
+		return -ENOENT;
+	}
+}
+
 int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 {
 	const struct coproc_reg *r;
 	void __user *uaddr = (void __user *)(long)reg->addr;
 
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_DEMUX)
+		return demux_c15_get(reg->id, uaddr);
+
 	r = index_to_coproc_reg(vcpu, reg->id);
 	if (!r)
 		return get_invariant_cp15(reg->id, uaddr);
@@ -566,6 +674,9 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	const struct coproc_reg *r;
 	void __user *uaddr = (void __user *)(long)reg->addr;
 
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_DEMUX)
+		return demux_c15_set(reg->id, uaddr);
+
 	r = index_to_coproc_reg(vcpu, reg->id);
 	if (!r)
 		return set_invariant_cp15(reg->id, uaddr);
@@ -574,6 +685,33 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	return reg_from_user(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
 }
 
+static unsigned int num_demux_regs(void)
+{
+	unsigned int i, count = 0;
+
+	for (i = 0; i < CSSELR_MAX; i++)
+		if (is_valid_cache(i))
+			count++;
+
+	return count;
+}
+
+static int write_demux_regids(u64 __user *uindices)
+{
+	u64 val = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_DEMUX;
+	unsigned int i;
+
+	val |= KVM_REG_ARM_DEMUX_ID_CCSIDR;
+	for (i = 0; i < CSSELR_MAX; i++) {
+		if (!is_valid_cache(i))
+			continue;
+		if (put_user(val | i, uindices))
+			return -EFAULT;
+		uindices++;
+	}
+	return 0;
+}
+
 static u64 cp15_to_index(const struct coproc_reg *reg)
 {
 	u64 val = KVM_REG_ARM | (15 << KVM_REG_ARM_COPROC_SHIFT);
@@ -649,6 +787,7 @@ static int walk_cp15(struct kvm_vcpu *vcpu, u64 __user *uind)
 unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu)
 {
 	return ARRAY_SIZE(invariant_cp15)
+		+ num_demux_regs()
 		+ walk_cp15(vcpu, (u64 __user *)NULL);
 }
 
@@ -665,9 +804,11 @@ int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 	}
 
 	err = walk_cp15(vcpu, uindices);
-	if (err > 0)
-		err = 0;
-	return err;
+	if (err < 0)
+		return err;
+	uindices += err;
+
+	return write_demux_regids(uindices);
 }
 
 void kvm_coproc_table_init(void)
@@ -681,6 +822,23 @@ void kvm_coproc_table_init(void)
 	/* We abuse the reset function to overwrite the table itself. */
 	for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++)
 		invariant_cp15[i].reset(NULL, &invariant_cp15[i]);
+
+	/*
+	 * CLIDR format is awkward, so clean it up.  See ARM B4.1.20:
+	 *
+	 *   If software reads the Cache Type fields from Ctype1
+	 *   upwards, once it has seen a value of 0b000, no caches
+	 *   exist at further-out levels of the hierarchy. So, for
+	 *   example, if Ctype3 is the first Cache Type field with a
+	 *   value of 0b000, the values of Ctype4 to Ctype7 must be
+	 *   ignored.
+	 */
+	asm volatile("mrc p15, 1, %0, c0, c0, 1" : "=r" (cache_levels));
+	for (i = 0; i < 7; i++)
+		if (((cache_levels >> (i*3)) & 7) == 0)
+			break;
+	/* Clear all higher bits. */
+	cache_levels &= (1 << (i*3))-1;
 }
 
 /**

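For illustration only (not part of the patch): a minimal userspace sketch
of reading one of the demuxed CCSIDR values through KVM_GET_ONE_REG.  The
vcpu file descriptor and error handling are assumed, and the helper name
is made up for this example:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Read the CCSIDR selected by csselr (0..11) from a vcpu fd. */
	static uint32_t read_ccsidr(int vcpu_fd, unsigned int csselr)
	{
		uint32_t val = 0;
		struct kvm_one_reg reg = {
			/* 32-bit demuxed register: CCSIDR[csselr] */
			.id = KVM_REG_ARM | KVM_REG_SIZE_U32 |
			      KVM_REG_ARM_DEMUX | KVM_REG_ARM_DEMUX_ID_CCSIDR |
			      (csselr << KVM_REG_ARM_DEMUX_VAL_SHIFT),
			.addr = (uintptr_t)&val,
		};

		ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);	/* error handling omitted */
		return val;
	}

A CSSELR value naming a cache that does not exist fails with ENOENT, and
KVM_SET_ONE_REG only accepts the value the host reports, since the
register is invariant.  For a hypothetical CLIDR of 0x0a200023 (separate
L1 I and D caches plus a unified L2), cache_levels keeps only Ctype1 and
Ctype2, so CSSELR values 0, 1 and 2 are the valid ones.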


* [PATCH v5 11/14] KVM: ARM: VFP userspace interface
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:39   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Rusty Russell, Marcelo Tosatti

From: Rusty Russell <rusty.russell@linaro.org>

We use space #18 for floating point regs.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |    6 +
 arch/arm/include/uapi/asm/kvm.h   |   12 ++
 arch/arm/kvm/coproc.c             |  178 +++++++++++++++++++++++++++++++++++++
 3 files changed, 196 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 94f17a3..38066a7a 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1808,6 +1808,12 @@ ARM 64-bit CP15 registers have the following id bit patterns:
 ARM CCSIDR registers are demultiplexed by CSSELR value:
   0x4002 0000 0011 00 <csselr:8>
 
+ARM 32-bit VFP control registers have the following id bit patterns:
+  0x4002 0000 0012 1 <regno:12>
+
+ARM 64-bit FP registers have the following id bit patterns:
+  0x4002 0000 0012 0 <regno:12>
+
 4.69 KVM_GET_ONE_REG
 
 Capability: KVM_CAP_ONE_REG
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index aa2684c..73b9615 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -112,6 +112,18 @@ struct kvm_arch_memory_slot {
 #define KVM_REG_ARM_DEMUX_VAL_MASK	0x00000000000000FF
 #define KVM_REG_ARM_DEMUX_VAL_SHIFT	0
 
+/* VFP registers: we could overload CP10 like ARM does, but that's ugly. */
+#define KVM_REG_ARM_VFP			(0x0012 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_VFP_MASK		0x000000000000FFFF
+#define KVM_REG_ARM_VFP_BASE_REG	0x0
+#define KVM_REG_ARM_VFP_FPSID		0x1000
+#define KVM_REG_ARM_VFP_FPSCR		0x1001
+#define KVM_REG_ARM_VFP_MVFR1		0x1006
+#define KVM_REG_ARM_VFP_MVFR0		0x1007
+#define KVM_REG_ARM_VFP_FPEXC		0x1008
+#define KVM_REG_ARM_VFP_FPINST		0x1009
+#define KVM_REG_ARM_VFP_FPINST2		0x100A
+
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT		24
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 1827b64..d782638 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -26,6 +26,8 @@
 #include <asm/cacheflush.h>
 #include <asm/cputype.h>
 #include <trace/events/kvm.h>
+#include <asm/vfp.h>
+#include "../vfp/vfpinstr.h"
 
 #include "trace.h"
 #include "coproc.h"
@@ -653,6 +655,170 @@ static int demux_c15_set(u64 id, void __user *uaddr)
 	}
 }
 
+#ifdef CONFIG_VFPv3
+static const int vfp_sysregs[] = { KVM_REG_ARM_VFP_FPEXC,
+				   KVM_REG_ARM_VFP_FPSCR,
+				   KVM_REG_ARM_VFP_FPINST,
+				   KVM_REG_ARM_VFP_FPINST2,
+				   KVM_REG_ARM_VFP_MVFR0,
+				   KVM_REG_ARM_VFP_MVFR1,
+				   KVM_REG_ARM_VFP_FPSID };
+
+static unsigned int num_fp_regs(void)
+{
+	if (((fmrx(MVFR0) & MVFR0_A_SIMD_MASK) >> MVFR0_A_SIMD_BIT) == 2)
+		return 32;
+	else
+		return 16;
+}
+
+static unsigned int num_vfp_regs(void)
+{
+	/* Normal FP regs + control regs. */
+	return num_fp_regs() + ARRAY_SIZE(vfp_sysregs);
+}
+
+static int copy_vfp_regids(u64 __user *uindices)
+{
+	unsigned int i;
+	const u64 u32reg = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_VFP;
+	const u64 u64reg = KVM_REG_ARM | KVM_REG_SIZE_U64 | KVM_REG_ARM_VFP;
+
+	for (i = 0; i < num_fp_regs(); i++) {
+		if (put_user((u64reg | KVM_REG_ARM_VFP_BASE_REG) + i,
+			     uindices))
+			return -EFAULT;
+		uindices++;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(vfp_sysregs); i++) {
+		if (put_user(u32reg | vfp_sysregs[i], uindices))
+			return -EFAULT;
+		uindices++;
+	}
+
+	return num_vfp_regs();
+}
+
+static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
+{
+	u32 vfpid = (id & KVM_REG_ARM_VFP_MASK);
+	u32 val;
+
+	/* Fail if we have unknown bits set. */
+	if (id & ~(KVM_REG_ARCH_MASK|KVM_REG_SIZE_MASK|KVM_REG_ARM_COPROC_MASK
+		   | ((1 << KVM_REG_ARM_COPROC_SHIFT)-1)))
+		return -ENOENT;
+
+	if (vfpid < num_fp_regs()) {
+		if (KVM_REG_SIZE(id) != 8)
+			return -ENOENT;
+		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
+				   id);
+	}
+
+	/* FP control registers are all 32 bit. */
+	if (KVM_REG_SIZE(id) != 4)
+		return -ENOENT;
+
+	switch (vfpid) {
+	case KVM_REG_ARM_VFP_FPEXC:
+		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
+	case KVM_REG_ARM_VFP_FPSCR:
+		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
+	case KVM_REG_ARM_VFP_FPINST:
+		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst, id);
+	case KVM_REG_ARM_VFP_FPINST2:
+		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst2, id);
+	case KVM_REG_ARM_VFP_MVFR0:
+		val = fmrx(MVFR0);
+		return reg_to_user(uaddr, &val, id);
+	case KVM_REG_ARM_VFP_MVFR1:
+		val = fmrx(MVFR1);
+		return reg_to_user(uaddr, &val, id);
+	case KVM_REG_ARM_VFP_FPSID:
+		val = fmrx(FPSID);
+		return reg_to_user(uaddr, &val, id);
+	default:
+		return -ENOENT;
+	}
+}
+
+static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)
+{
+	u32 vfpid = (id & KVM_REG_ARM_VFP_MASK);
+	u32 val;
+
+	/* Fail if we have unknown bits set. */
+	if (id & ~(KVM_REG_ARCH_MASK|KVM_REG_SIZE_MASK|KVM_REG_ARM_COPROC_MASK
+		   | ((1 << KVM_REG_ARM_COPROC_SHIFT)-1)))
+		return -ENOENT;
+
+	if (vfpid < num_fp_regs()) {
+		if (KVM_REG_SIZE(id) != 8)
+			return -ENOENT;
+		return reg_from_user(&vcpu->arch.vfp_guest.fpregs[vfpid],
+				     uaddr, id);
+	}
+
+	/* FP control registers are all 32 bit. */
+	if (KVM_REG_SIZE(id) != 4)
+		return -ENOENT;
+
+	switch (vfpid) {
+	case KVM_REG_ARM_VFP_FPEXC:
+		return reg_from_user(&vcpu->arch.vfp_guest.fpexc, uaddr, id);
+	case KVM_REG_ARM_VFP_FPSCR:
+		return reg_from_user(&vcpu->arch.vfp_guest.fpscr, uaddr, id);
+	case KVM_REG_ARM_VFP_FPINST:
+		return reg_from_user(&vcpu->arch.vfp_guest.fpinst, uaddr, id);
+	case KVM_REG_ARM_VFP_FPINST2:
+		return reg_from_user(&vcpu->arch.vfp_guest.fpinst2, uaddr, id);
+	/* These are invariant. */
+	case KVM_REG_ARM_VFP_MVFR0:
+		if (reg_from_user(&val, uaddr, id))
+			return -EFAULT;
+		if (val != fmrx(MVFR0))
+			return -EINVAL;
+		return 0;
+	case KVM_REG_ARM_VFP_MVFR1:
+		if (reg_from_user(&val, uaddr, id))
+			return -EFAULT;
+		if (val != fmrx(MVFR1))
+			return -EINVAL;
+		return 0;
+	case KVM_REG_ARM_VFP_FPSID:
+		if (reg_from_user(&val, uaddr, id))
+			return -EFAULT;
+		if (val != fmrx(FPSID))
+			return -EINVAL;
+		return 0;
+	default:
+		return -ENOENT;
+	}
+}
+#else /* !CONFIG_VFPv3 */
+static unsigned int num_vfp_regs(void)
+{
+	return 0;
+}
+
+static int copy_vfp_regids(u64 __user *uindices)
+{
+	return 0;
+}
+
+static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
+{
+	return -ENOENT;
+}
+
+static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)
+{
+	return -ENOENT;
+}
+#endif /* !CONFIG_VFPv3 */
+
 int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 {
 	const struct coproc_reg *r;
@@ -661,6 +827,9 @@ int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_DEMUX)
 		return demux_c15_get(reg->id, uaddr);
 
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_VFP)
+		return vfp_get_reg(vcpu, reg->id, uaddr);
+
 	r = index_to_coproc_reg(vcpu, reg->id);
 	if (!r)
 		return get_invariant_cp15(reg->id, uaddr);
@@ -677,6 +846,9 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_DEMUX)
 		return demux_c15_set(reg->id, uaddr);
 
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_VFP)
+		return vfp_set_reg(vcpu, reg->id, uaddr);
+
 	r = index_to_coproc_reg(vcpu, reg->id);
 	if (!r)
 		return set_invariant_cp15(reg->id, uaddr);
@@ -788,6 +960,7 @@ unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu)
 {
 	return ARRAY_SIZE(invariant_cp15)
 		+ num_demux_regs()
+		+ num_vfp_regs()
 		+ walk_cp15(vcpu, (u64 __user *)NULL);
 }
 
@@ -808,6 +981,11 @@ int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 		return err;
 	uindices += err;
 
+	err = copy_vfp_regids(uindices);
+	if (err < 0)
+		return err;
+	uindices += err;
+
 	return write_demux_regids(uindices);
 }
 


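For illustration only (not part of the patch): a sketch of how userspace
might access the new VFP space with KVM_GET_ONE_REG, reading d0 as a
64-bit register and FPSCR as a 32-bit control register.  The vcpu file
descriptor and error handling are assumed, and the helper names are
local to this example:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Read double-precision register d0 (index 0 in the VFP space). */
	static int get_vfp_d0(int vcpu_fd, uint64_t *d0)
	{
		struct kvm_one_reg reg = {
			.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | KVM_REG_ARM_VFP |
			      KVM_REG_ARM_VFP_BASE_REG,
			.addr = (uintptr_t)d0,
		};
		return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	}

	/* Read the 32-bit FPSCR control register. */
	static int get_vfp_fpscr(int vcpu_fd, uint32_t *fpscr)
	{
		struct kvm_one_reg reg = {
			.id = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_VFP |
			      KVM_REG_ARM_VFP_FPSCR,
			.addr = (uintptr_t)fpscr,
		};
		return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	}

MVFR0, MVFR1 and FPSID are invariant: setting them only succeeds when
the supplied value matches what the host reports.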


* [PATCH v5 12/14] KVM: ARM: Handle guest faults in KVM
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:39   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:39 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Marc Zyngier, Marcelo Tosatti

Handles the guest faults in KVM by mapping in corresponding user pages
in the 2nd stage page tables.

We invalidate the instruction cache by MVA whenever we map a page to the
guest (no, we cannot only do it when we have an iabt, because the guest
may happily read/write a page before hitting the icache) if the hardware
uses a VIPT or PIPT icache.  In the PIPT case, we can invalidate just
that physical page.  In the VIPT case, all bets are off and we simply
must invalidate the whole icache.  Note that VIVT icaches are tagged
with VMIDs, so we are out of the woods on that one.  Alexander Graf was
nice enough to remind us of this massive pain.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_asm.h |    2 +
 arch/arm/include/asm/kvm_mmu.h |   12 +++
 arch/arm/kvm/mmu.c             |  143 ++++++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h           |   26 +++++++
 4 files changed, 182 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index f6652f6..5e06e81 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -71,6 +71,8 @@ extern char __kvm_hyp_vector[];
 extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 499e7b0..421a20b 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -35,4 +35,16 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 phys_addr_t kvm_mmu_get_httbr(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
+
+static inline bool kvm_is_write_fault(unsigned long hsr)
+{
+	unsigned long hsr_ec = hsr >> HSR_EC_SHIFT;
+	if (hsr_ec == HSR_EC_IABT)
+		return false;
+	else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR))
+		return false;
+	else
+		return true;
+}
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 4347d68..0ce0e77 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -21,9 +21,11 @@
 #include <linux/io.h>
 #include <asm/idmap.h>
 #include <asm/pgalloc.h>
+#include <asm/cacheflush.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
 #include <asm/mach/map.h>
 #include <trace/events/kvm.h>
 
@@ -488,9 +490,148 @@ out:
 	return ret;
 }
 
+static void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
+{
+	/*
+	 * If we are going to insert an instruction page and the icache is
+	 * either VIPT or PIPT, there is a potential problem where the host
+	 * (or another VM) may have used the same page as this guest, and we
+	 * read incorrect data from the icache.  If we're using a PIPT cache,
+	 * we can invalidate just that page, but if we are using a VIPT cache
+	 * we need to invalidate the entire icache - damn shame - as written
+	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
+	 *
+	 * VIVT caches are tagged using both the ASID and the VMID and do not
+	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
+	 */
+	if (icache_is_pipt()) {
+		unsigned long hva = gfn_to_hva(kvm, gfn);
+		__cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
+	} else if (!icache_is_vivt_asid_tagged()) {
+		/* any kind of VIPT cache */
+		__flush_icache_all();
+	}
+}
+
+static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			  gfn_t gfn, struct kvm_memory_slot *memslot,
+			  unsigned long fault_status)
+{
+	pte_t new_pte;
+	pfn_t pfn;
+	int ret;
+	bool write_fault, writable;
+	unsigned long mmu_seq;
+	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+
+	write_fault = kvm_is_write_fault(vcpu->arch.hsr);
+	if (fault_status == FSC_PERM && !write_fault) {
+		kvm_err("Unexpected L2 read permission error\n");
+		return -EFAULT;
+	}
+
+	/* We need minimum second+third level pages */
+	ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
+	if (ret)
+		return ret;
+
+	mmu_seq = vcpu->kvm->mmu_notifier_seq;
+	/*
+	 * Ensure the read of mmu_notifier_seq happens before we call
+	 * gfn_to_pfn_prot (which calls get_user_pages), so that we don't risk
+	 * the page we just got a reference to gets unmapped before we have a
+	 * chance to grab the mmu_lock, which ensure that if the page gets
+	 * unmapped afterwards, the call to kvm_unmap_hva will take it away
+	 * from us again properly. This smp_rmb() interacts with the smp_wmb()
+	 * in kvm_mmu_notifier_invalidate_<page|range_end>.
+	 */
+	smp_rmb();
+
+	pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
+	if (is_error_pfn(pfn))
+		return -EFAULT;
+
+	new_pte = pfn_pte(pfn, PAGE_S2);
+	coherent_icache_guest_page(vcpu->kvm, gfn);
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
+		goto out_unlock;
+	if (writable) {
+		pte_val(new_pte) |= L_PTE_S2_RDWR;
+		kvm_set_pfn_dirty(pfn);
+	}
+	stage2_set_pte(vcpu->kvm, memcache, fault_ipa, &new_pte, false);
+
+out_unlock:
+	spin_unlock(&vcpu->kvm->mmu_lock);
+	kvm_release_pfn_clean(pfn);
+	return 0;
+}
+
+/**
+ * kvm_handle_guest_abort - handles all 2nd stage aborts
+ * @vcpu:	the VCPU pointer
+ * @run:	the kvm_run structure
+ *
+ * Any abort that gets to the host is almost guaranteed to be caused by a
+ * missing second stage translation table entry, which can mean that either the
+ * guest simply needs more memory and we must allocate an appropriate page or it
+ * can mean that the guest tried to access I/O memory, which is emulated by user
+ * space. The distinction is based on the IPA causing the fault and whether this
+ * memory region has been registered as standard RAM by user space.
+ */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	unsigned long hsr_ec;
+	unsigned long fault_status;
+	phys_addr_t fault_ipa;
+	struct kvm_memory_slot *memslot = NULL;
+	bool is_iabt;
+	gfn_t gfn;
+	int ret;
+
+	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
+	is_iabt = (hsr_ec == HSR_EC_IABT);
+	fault_ipa = ((phys_addr_t)vcpu->arch.hpfar & HPFAR_MASK) << 8;
+
+	trace_kvm_guest_fault(*vcpu_pc(vcpu), vcpu->arch.hsr,
+			      vcpu->arch.hxfar, fault_ipa);
+
+	/* Check the stage-2 fault is trans. fault or write fault */
+	fault_status = (vcpu->arch.hsr & HSR_FSC_TYPE);
+	if (fault_status != FSC_FAULT && fault_status != FSC_PERM) {
+		kvm_err("Unsupported fault status: EC=%#lx DFCS=%#lx\n",
+			hsr_ec, fault_status);
+		return -EFAULT;
+	}
+
+	gfn = fault_ipa >> PAGE_SHIFT;
+	if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
+		if (is_iabt) {
+			/* Prefetch Abort on I/O address */
+			kvm_inject_pabt(vcpu, vcpu->arch.hxfar);
+			return 1;
+		}
+
+		if (fault_status != FSC_FAULT) {
+			kvm_err("Unsupported fault status on io memory: %#lx\n",
+				fault_status);
+			return -EFAULT;
+		}
+
+		kvm_pr_unimpl("I/O address abort...");
+		return 0;
+	}
+
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot->user_alloc) {
+		kvm_err("non user-alloc memslots not supported\n");
+		return -EINVAL;
+	}
+
+	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
+	return ret ? ret : 1;
 }
 
 static void handle_hva_to_gpa(struct kvm *kvm,
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index c86a513..5d65751 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,6 +39,32 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_guest_fault,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long hsr,
+		 unsigned long hxfar,
+		 unsigned long long ipa),
+	TP_ARGS(vcpu_pc, hsr, hxfar, ipa),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+		__field(	unsigned long,	hsr		)
+		__field(	unsigned long,	hxfar		)
+		__field(   unsigned long long,	ipa		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+		__entry->hsr			= hsr;
+		__entry->hxfar			= hxfar;
+		__entry->ipa			= ipa;
+	),
+
+	TP_printk("guest fault at PC %#08lx (hxfar %#08lx, "
+		  "ipa %#16llx, hsr %#08lx)",
+		  __entry->vcpu_pc, __entry->hxfar,
+		  __entry->ipa, __entry->hsr)
+);
+
 TRACE_EVENT(kvm_irq_line,
 	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
 	TP_ARGS(type, vcpu_idx, irq_num, level),


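For illustration only (not part of the patch): a self-contained sketch of
the abort classification and IPA recovery performed above.  The bit
positions (EC in HSR[31:26], ISV at bit 24, WnR at bit 6, the 0x20
exception class for instruction aborts, and HPFAR carrying IPA[39:12] in
bits [31:4]) are restated from the ARM ARM as assumptions of this
example, not taken from the kernel headers; the HPFAR value is made up:

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	/*
	 * Mirrors kvm_is_write_fault(): instruction aborts, and data aborts
	 * with a valid syndrome and WnR clear, are reads; everything else is
	 * treated as a write.
	 */
	static bool example_is_write_fault(uint32_t hsr)
	{
		if ((hsr >> 26) == 0x20)			/* HSR_EC_IABT */
			return false;
		if ((hsr & (1u << 24)) && !(hsr & (1u << 6)))	/* ISV && !WnR */
			return false;
		return true;
	}

	int main(void)
	{
		uint32_t hpfar = 0x00089300;	/* made-up example value */
		uint64_t fault_ipa = (uint64_t)(hpfar & ~0xfu) << 8;

		/* Prints: fault_ipa 0x8930000, gfn 0x8930 (with 4K pages). */
		printf("fault_ipa 0x%llx, gfn 0x%llx\n",
		       (unsigned long long)fault_ipa,
		       (unsigned long long)(fault_ipa >> 12));
		return 0;
	}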


* [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:40   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:40 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm
  Cc: Marc Zyngier, Marcelo Tosatti, Rusty Russell

When the guest accesses I/O memory, this will create data abort
exceptions and they are handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding reads
and writes to QEMU which performs the device emulation.

Certain classes of load/store operations do not support the syndrome
information provided in the HSR and we therefore must be able to fetch
the offending instruction from guest memory and decode it manually.

We only support instruction decoding for valid, reasonable MMIO operations
where trapping them does not provide sufficient information in the HSR (no
16-bit Thumb instructions provide register writeback that we care about).

The following instruction types are NOT supported for MMIO operations
despite the HSR not containing decode info:
 - any load/store multiple
 - any load/store exclusive
 - any load/store dual
 - anything with the PC as the dest register

This requires changing the general flow somewhat since new calls to run
the VCPU must check if there's a pending MMIO load and perform the write
after userspace has made the data available.

Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
(1) A complicated guest mmio instruction traps.
(2) The hardware doesn't tell us enough, so we need to read the actual
    instruction which was being executed.
(3) KVM maps the instruction virtual address to a physical address.
(4) The guest (SMP) swaps out that page, and fills it with something else.
(5) We read the physical address, but now that's the wrong thing.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    3 
 arch/arm/include/asm/kvm_asm.h     |    2 
 arch/arm/include/asm/kvm_decode.h  |   47 ++++
 arch/arm/include/asm/kvm_emulate.h |    8 +
 arch/arm/include/asm/kvm_host.h    |    7 +
 arch/arm/include/asm/kvm_mmio.h    |   51 ++++
 arch/arm/kvm/Makefile              |    2 
 arch/arm/kvm/arm.c                 |   14 +
 arch/arm/kvm/decode.c              |  462 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/emulate.c             |  169 +++++++++++++
 arch/arm/kvm/interrupts.S          |   38 +++
 arch/arm/kvm/mmio.c                |  154 ++++++++++++
 arch/arm/kvm/mmu.c                 |    7 -
 arch/arm/kvm/trace.h               |   21 ++
 14 files changed, 981 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_decode.h
 create mode 100644 arch/arm/include/asm/kvm_mmio.h
 create mode 100644 arch/arm/kvm/decode.c
 create mode 100644 arch/arm/kvm/mmio.c

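As background for the diff below, and for illustration only (not part of
the patch): a sketch of what the HSR hands us for an MMIO data abort when
ISV is set, using the SRT/SSE/WNR/ISV bit positions the patch defines
plus the access-size (SAS) field at HSR[23:22] from the ARM ARM.  The
struct and function names are local to this example:

	#include <stdbool.h>
	#include <stdint.h>

	struct example_mmio_info {
		unsigned int len;	/* access size in bytes, from HSR.SAS */
		bool sign_extend;	/* HSR.SSE: sign-extend loads */
		unsigned int rt;	/* HSR.SRT: register transferred */
		bool is_write;		/* HSR.WnR */
	};

	/*
	 * Returns false when ISV is clear, i.e. when the syndrome is not
	 * valid and the faulting instruction has to be fetched from guest
	 * memory and decoded by hand instead.
	 */
	static bool example_decode_hsr(uint32_t hsr, struct example_mmio_info *m)
	{
		if (!(hsr & (1u << 24)))		/* HSR_ISV */
			return false;

		m->len = 1u << ((hsr >> 22) & 3);	/* SAS: 0=byte, 1=half, 2=word */
		m->sign_extend = hsr & (1u << 21);	/* HSR_SSE */
		m->rt = (hsr >> 16) & 0xf;		/* HSR_SRT */
		m->is_write = hsr & (1u << 6);		/* HSR_WNR */
		return true;
	}
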
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 3ff6f22..151c4ce 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -173,8 +173,11 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_SRT_SHIFT	(16)
+#define HSR_SRT_MASK	(0xf << HSR_SRT_SHIFT)
 #define HSR_FSC		(0x3f)
 #define HSR_FSC_TYPE	(0x3c)
+#define HSR_SSE		(1 << 21)
 #define HSR_WNR		(1 << 6)
 #define HSR_CV_SHIFT	(24)
 #define HSR_CV		(1U << HSR_CV_SHIFT)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 5e06e81..58d787b 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -77,6 +77,8 @@ extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+
+extern u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv);
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_decode.h b/arch/arm/include/asm/kvm_decode.h
new file mode 100644
index 0000000..3c37cb9
--- /dev/null
+++ b/arch/arm/include/asm/kvm_decode.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_DECODE_H__
+#define __ARM_KVM_DECODE_H__
+
+#include <linux/types.h>
+
+struct kvm_vcpu;
+struct kvm_exit_mmio;
+
+struct kvm_decode {
+	struct pt_regs *regs;
+	unsigned long fault_addr;
+	unsigned long rt;
+	bool sign_extend;
+};
+
+int kvm_decode_load_store(struct kvm_decode *decode, unsigned long instr,
+			  struct kvm_exit_mmio *mmio);
+
+static inline unsigned long *kvm_decode_reg(struct kvm_decode *decode, int reg)
+{
+	return &decode->regs->uregs[reg];
+}
+
+static inline unsigned long *kvm_decode_cpsr(struct kvm_decode *decode)
+{
+	return &decode->regs->ARM_cpsr;
+}
+
+#endif /* __ARM_KVM_DECODE_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 01a755b..375795b 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -21,11 +21,14 @@
 
 #include <linux/kvm_host.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_mmio.h>
 
 u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
 
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			struct kvm_exit_mmio *mmio);
 void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
@@ -53,4 +56,9 @@ static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
 	return cpsr_mode > USR_MODE;;
 }
 
+static inline bool kvm_vcpu_reg_is_pc(struct kvm_vcpu *vcpu, int reg)
+{
+	return reg == 15;
+}
+
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6cc8933..ca40795 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -22,6 +22,7 @@
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/fpstate.h>
+#include <asm/kvm_decode.h>
 
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
 #define KVM_USER_MEM_SLOTS 32
@@ -99,6 +100,12 @@ struct kvm_vcpu_arch {
 	int last_pcpu;
 	cpumask_t require_dcache_flush;
 
+	/* Don't run the guest: see copy_current_insn() */
+	bool pause;
+
+	/* IO related fields */
+	struct kvm_decode mmio_decode;
+
 	/* Interrupt related fields */
 	u32 irq_lines;		/* IRQ and FIQ levels */
 
diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
new file mode 100644
index 0000000..31ab9f5
--- /dev/null
+++ b/arch/arm/include/asm/kvm_mmio.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_MMIO_H__
+#define __ARM_KVM_MMIO_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+/*
+ * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
+ * which is an anonymous type. Use our own type instead.
+ */
+struct kvm_exit_mmio {
+	phys_addr_t	phys_addr;
+	u8		data[8];
+	u32		len;
+	bool		is_write;
+};
+
+static inline void kvm_prepare_mmio(struct kvm_run *run,
+				    struct kvm_exit_mmio *mmio)
+{
+	run->mmio.phys_addr	= mmio->phys_addr;
+	run->mmio.len		= mmio->len;
+	run->mmio.is_write	= mmio->is_write;
+	memcpy(run->mmio.data, mmio->data, mmio->len);
+	run->exit_reason	= KVM_EXIT_MMIO;
+}
+
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot);
+
+#endif	/* __ARM_KVM_MMIO_H__ */
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 88edce6..44a5f4b 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -18,4 +18,4 @@ kvm-arm-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o
+obj-y += coproc.o coproc_a15.o mmio.o decode.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0b4ffcf..f42d828 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -614,6 +614,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (unlikely(vcpu->arch.target < 0))
 		return -ENOEXEC;
 
+	if (run->exit_reason == KVM_EXIT_MMIO) {
+		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+		if (ret)
+			return ret;
+	}
+
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 
@@ -649,7 +655,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_guest_enter();
 		vcpu->mode = IN_GUEST_MODE;
 
-		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
+		smp_mb(); /* set mode before reading vcpu->arch.pause */
+		if (unlikely(vcpu->arch.pause)) {
+			/* This means ignore, try again. */
+			ret = ARM_EXCEPTION_IRQ;
+		} else {
+			ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
+		}
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->arch.last_pcpu = smp_processor_id();
diff --git a/arch/arm/kvm/decode.c b/arch/arm/kvm/decode.c
new file mode 100644
index 0000000..469cf14
--- /dev/null
+++ b/arch/arm/kvm/decode.c
@@ -0,0 +1,462 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/kvm_host.h>
+#include <asm/kvm_mmio.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_decode.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
+
+struct arm_instr {
+	/* Instruction decoding */
+	u32 opc;
+	u32 opc_mask;
+
+	/* Decoding for the register write back */
+	bool register_form;
+	u32 imm;
+	u8 Rm;
+	u8 type;
+	u8 shift_n;
+
+	/* Common decoding */
+	u8 len;
+	bool sign_extend;
+	bool w;
+
+	bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
+		       unsigned long instr, struct arm_instr *ai);
+};
+
+enum SRType {
+	SRType_LSL,
+	SRType_LSR,
+	SRType_ASR,
+	SRType_ROR,
+	SRType_RRX
+};
+
+/* Modelled after DecodeImmShift() in the ARM ARM */
+static enum SRType decode_imm_shift(u8 type, u8 imm5, u8 *amount)
+{
+	switch (type) {
+	case 0x0:
+		*amount = imm5;
+		return SRType_LSL;
+	case 0x1:
+		*amount = (imm5 == 0) ? 32 : imm5;
+		return SRType_LSR;
+	case 0x2:
+		*amount = (imm5 == 0) ? 32 : imm5;
+		return SRType_ASR;
+	case 0x3:
+		if (imm5 == 0) {
+			*amount = 1;
+			return SRType_RRX;
+		} else {
+			*amount = imm5;
+			return SRType_ROR;
+		}
+	}
+
+	return SRType_LSL;
+}
+
+/* Modelled after Shift() in the ARM ARM */
+static u32 shift(u32 value, u8 N, enum SRType type, u8 amount, bool carry_in)
+{
+	u32 mask = (1 << N) - 1;
+	s32 svalue = (s32)value;
+
+	BUG_ON(N > 32);
+	BUG_ON(type == SRType_RRX && amount != 1);
+	BUG_ON(amount > N);
+
+	if (amount == 0)
+		return value;
+
+	switch (type) {
+	case SRType_LSL:
+		value <<= amount;
+		break;
+	case SRType_LSR:
+		 value >>= amount;
+		break;
+	case SRType_ASR:
+		if (value & (1 << (N - 1)))
+			svalue |= ((-1UL) << N);
+		value = svalue >> amount;
+		break;
+	case SRType_ROR:
+		value = (value >> amount) | (value << (N - amount));
+		break;
+	case SRType_RRX: {
+		u32 C = (carry_in) ? 1 : 0;
+		value = (value >> 1) | (C << (N - 1));
+		break;
+	}
+	}
+
+	return value & mask;
+}
+
+static bool decode_arm_wb(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
+			  unsigned long instr, const struct arm_instr *ai)
+{
+	u8 Rt = (instr >> 12) & 0xf;
+	u8 Rn = (instr >> 16) & 0xf;
+	u8 W = (instr >> 21) & 1;
+	u8 U = (instr >> 23) & 1;
+	u8 P = (instr >> 24) & 1;
+	u32 base_addr = *kvm_decode_reg(decode, Rn);
+	u32 offset_addr, offset;
+
+	/*
+	 * Technically this is allowed in certain circumstances,
+	 * but we don't support it.
+	 */
+	if (Rt == 15 || Rn == 15)
+		return false;
+
+	if (P && !W) {
+		kvm_err("Decoding operation with valid ISV?\n");
+		return false;
+	}
+
+	decode->rt = Rt;
+
+	if (ai->register_form) {
+		/* Register operation */
+		enum SRType s_type;
+		u8 shift_n = 0;
+		bool c_bit = *kvm_decode_cpsr(decode) & PSR_C_BIT;
+		u32 s_reg = *kvm_decode_reg(decode, ai->Rm);
+
+		s_type = decode_imm_shift(ai->type, ai->shift_n, &shift_n);
+		offset = shift(s_reg, 5, s_type, shift_n, c_bit);
+	} else {
+		/* Immediate operation */
+		offset = ai->imm;
+	}
+
+	/* Handle Writeback */
+	if (U)
+		offset_addr = base_addr + offset;
+	else
+		offset_addr = base_addr - offset;
+	*kvm_decode_reg(decode, Rn) = offset_addr;
+	return true;
+}
+
+static bool decode_arm_ls(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
+			  unsigned long instr, struct arm_instr *ai)
+{
+	u8 A = (instr >> 25) & 1;
+
+	mmio->is_write = ai->w;
+	mmio->len = ai->len;
+	decode->sign_extend = false;
+
+	ai->register_form = A;
+	ai->imm = instr & 0xfff;
+	ai->Rm = instr & 0xf;
+	ai->type = (instr >> 5) & 0x3;
+	ai->shift_n = (instr >> 7) & 0x1f;
+
+	return decode_arm_wb(decode, mmio, instr, ai);
+}
+
+static bool decode_arm_extra(struct kvm_decode *decode,
+			     struct kvm_exit_mmio *mmio,
+			     unsigned long instr, struct arm_instr *ai)
+{
+	mmio->is_write = ai->w;
+	mmio->len = ai->len;
+	decode->sign_extend = ai->sign_extend;
+
+	ai->register_form = !((instr >> 22) & 1);
+	ai->imm = ((instr >> 4) & 0xf0) | (instr & 0xf);
+	ai->Rm = instr & 0xf;
+	ai->type = 0; /* SRType_LSL */
+	ai->shift_n = 0;
+
+	return decode_arm_wb(decode, mmio, instr, ai);
+}
+
+/*
+ * The encodings in this table assume that a fault was generated where the
+ * ISV field in the HSR was clear, and the decoding information was invalid,
+ * which means that a register write-back occurred, the PC was used as the
+ * destination or a load/store multiple operation was used. Since the latter
+ * two cases are crazy for MMIO on the guest side, we simply inject a fault
+ * when this happens and support the common case.
+ *
+ * We treat unprivileged loads and stores of words and bytes like all other
+ * loads and stores as their encodings mandate the W bit set and the P bit
+ * clear.
+ */
+static const struct arm_instr arm_instr[] = {
+	/**************** Load/Store Word and Byte **********************/
+	/* Store word with writeback */
+	{ .opc = 0x04000000, .opc_mask = 0x0c500000, .len = 4, .w = true,
+		.sign_extend = false, .decode = decode_arm_ls },
+	/* Store byte with writeback */
+	{ .opc = 0x04400000, .opc_mask = 0x0c500000, .len = 1, .w = true,
+		.sign_extend = false, .decode = decode_arm_ls },
+	/* Load word with writeback */
+	{ .opc = 0x04100000, .opc_mask = 0x0c500000, .len = 4, .w = false,
+		.sign_extend = false, .decode = decode_arm_ls },
+	/* Load byte with writeback */
+	{ .opc = 0x04500000, .opc_mask = 0x0c500000, .len = 1, .w = false,
+		.sign_extend = false, .decode = decode_arm_ls },
+
+	/*************** Extra load/store instructions ******************/
+
+	/* Store halfword with writeback */
+	{ .opc = 0x000000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = true,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load halfword with writeback */
+	{ .opc = 0x001000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
+		.sign_extend = false, .decode = decode_arm_extra },
+
+	/* Load dual with writeback */
+	{ .opc = 0x000000d0, .opc_mask = 0x0c1000f0, .len = 8, .w = false,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load signed byte with writeback */
+	{ .opc = 0x001000d0, .opc_mask = 0x0c1000f0, .len = 1, .w = false,
+		.sign_extend = true,  .decode = decode_arm_extra },
+
+	/* Store dual with writeback */
+	{ .opc = 0x000000f0, .opc_mask = 0x0c1000f0, .len = 8, .w = true,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load signed halfword with writeback */
+	{ .opc = 0x001000f0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
+		.sign_extend = true,  .decode = decode_arm_extra },
+
+	/* Store halfword unprivileged */
+	{ .opc = 0x002000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = true,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load halfword unprivileged */
+	{ .opc = 0x003000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load signed byte unprivileged */
+	{ .opc = 0x003000d0, .opc_mask = 0x0f3000f0, .len = 1, .w = false,
+		.sign_extend = true , .decode = decode_arm_extra },
+	/* Load signed halfword unprivileged */
+	{ .opc = 0x003000f0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
+		.sign_extend = true , .decode = decode_arm_extra },
+};
+
+static bool kvm_decode_arm_ls(struct kvm_decode *decode, unsigned long instr,
+			      struct kvm_exit_mmio *mmio)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(arm_instr); i++) {
+		const struct arm_instr *ai = &arm_instr[i];
+		if ((instr & ai->opc_mask) == ai->opc) {
+			struct arm_instr ai_copy = *ai;
+			return ai->decode(decode, mmio, instr, &ai_copy);
+		}
+	}
+	return false;
+}
+
+struct thumb_instr {
+	bool is32;
+
+	u8 opcode;
+	u8 opcode_mask;
+	u8 op2;
+	u8 op2_mask;
+
+	bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
+		       unsigned long instr, const struct thumb_instr *ti);
+};
+
+static bool decode_thumb_wb(struct kvm_decode *decode,
+			    struct kvm_exit_mmio *mmio,
+			    unsigned long instr)
+{
+	bool P = (instr >> 10) & 1;
+	bool U = (instr >> 9) & 1;
+	u8 imm8 = instr & 0xff;
+	u32 offset_addr = decode->fault_addr;
+	u8 Rn = (instr >> 16) & 0xf;
+
+	decode->rt = (instr >> 12) & 0xf;
+
+	if (Rn == 15)
+		return false;
+
+	/* Handle Writeback */
+	if (!P && U)
+		*kvm_decode_reg(decode, Rn) = offset_addr + imm8;
+	else if (!P && !U)
+		*kvm_decode_reg(decode, Rn) = offset_addr - imm8;
+	return true;
+}
+
+static bool decode_thumb_str(struct kvm_decode *decode,
+			     struct kvm_exit_mmio *mmio,
+			     unsigned long instr, const struct thumb_instr *ti)
+{
+	u8 op1 = (instr >> (16 + 5)) & 0x7;
+	u8 op2 = (instr >> 6) & 0x3f;
+
+	mmio->is_write = true;
+	decode->sign_extend = false;
+
+	switch (op1) {
+	case 0x0: mmio->len = 1; break;
+	case 0x1: mmio->len = 2; break;
+	case 0x2: mmio->len = 4; break;
+	default:
+		  return false; /* Only register write-back versions! */
+	}
+
+	if ((op2 & 0x24) == 0x24) {
+		/* STR/STRH/STRB (immediate, thumb, W=1) */
+		return decode_thumb_wb(decode, mmio, instr);
+	}
+
+	return false;
+}
+
+static bool decode_thumb_ldr(struct kvm_decode *decode,
+			     struct kvm_exit_mmio *mmio,
+			     unsigned long instr, const struct thumb_instr *ti)
+{
+	u8 op1 = (instr >> (16 + 7)) & 0x3;
+	u8 op2 = (instr >> 6) & 0x3f;
+
+	mmio->is_write = false;
+
+	switch (ti->op2 & 0x7) {
+	case 0x1: mmio->len = 1; break;
+	case 0x3: mmio->len = 2; break;
+	case 0x5: mmio->len = 4; break;
+	}
+
+	if (op1 == 0x0)
+		decode->sign_extend = false;
+	else if (op1 == 0x2 && (ti->op2 & 0x7) != 0x5)
+		decode->sign_extend = true;
+	else
+		return false; /* Only register write-back versions! */
+
+	if ((op2 & 0x24) == 0x24) {
+		/* LDR{S}X (immediate, thumb, W=1) */
+		return decode_thumb_wb(decode, mmio, instr);
+	}
+
+	return false;
+}
+
+/*
+ * We only support instruction decoding for valid reasonable MMIO operations
+ * where trapping them does not provide sufficient information in the HSR (no
+ * 16-bit Thumb instructions provide register writeback that we care about).
+ *
+ * The following instruction types are NOT supported for MMIO operations
+ * despite the HSR not containing decode info:
+ *  - any Load/Store multiple
+ *  - any load/store exclusive
+ *  - any load/store dual
+ *  - anything with the PC as the dest register
+ */
+static const struct thumb_instr thumb_instr[] = {
+	/**************** 32-bit Thumb instructions **********************/
+	/* Store single data item:	Op1 == 11, Op2 == 000xxx0 */
+	{ .is32 = true,  .opcode = 3, .op2 = 0x00, .op2_mask = 0x71,
+						decode_thumb_str	},
+
+	/* Load byte:			Op1 == 11, Op2 == 00xx001 */
+	{ .is32 = true,  .opcode = 3, .op2 = 0x01, .op2_mask = 0x67,
+						decode_thumb_ldr	},
+
+	/* Load halfword:		Op1 == 11, Op2 == 00xx011 */
+	{ .is32 = true,  .opcode = 3, .op2 = 0x03, .op2_mask = 0x67,
+						decode_thumb_ldr	},
+
+	/* Load word:			Op1 == 11, Op2 == 00xx101 */
+	{ .is32 = true,  .opcode = 3, .op2 = 0x05, .op2_mask = 0x67,
+						decode_thumb_ldr	},
+};
+
+
+
+static bool kvm_decode_thumb_ls(struct kvm_decode *decode, unsigned long instr,
+				struct kvm_exit_mmio *mmio)
+{
+	bool is32 = is_wide_instruction(instr);
+	bool is16 = !is32;
+	struct thumb_instr tinstr; /* re-use to pass on already decoded info */
+	int i;
+
+	if (is16) {
+		tinstr.opcode = (instr >> 10) & 0x3f;
+	} else {
+		tinstr.opcode = (instr >> (16 + 11)) & 0x3;
+		tinstr.op2 = (instr >> (16 + 4)) & 0x7f;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(thumb_instr); i++) {
+		const struct thumb_instr *ti = &thumb_instr[i];
+		if (ti->is32 != is32)
+			continue;
+
+		if (is16) {
+			if ((tinstr.opcode & ti->opcode_mask) != ti->opcode)
+				continue;
+		} else {
+			if (ti->opcode != tinstr.opcode)
+				continue;
+			if ((ti->op2_mask & tinstr.op2) != ti->op2)
+				continue;
+		}
+
+		return ti->decode(decode, mmio, instr, &tinstr);
+	}
+
+	return false;
+}
+
+/**
+ * kvm_decode_load_store - decodes load/store instructions
+ * @decode: reads regs and fault_addr, writes rt and sign_extend
+ * @instr:  instruction to decode
+ * @mmio:   fills in len and is_write
+ *
+ * Decode load/store instructions with HSR ISV clear. The code assumes that
+ * this was indeed a KVM fault and therefore assumes registers write back for
+ * single load/store operations and does not support using the PC as the
+ * destination register.
+ */
+int kvm_decode_load_store(struct kvm_decode *decode, unsigned long instr,
+			  struct kvm_exit_mmio *mmio)
+{
+	bool is_thumb;
+
+	is_thumb = !!(*kvm_decode_cpsr(decode) & PSR_T_BIT);
+	if (!is_thumb)
+		return kvm_decode_arm_ls(decode, instr, mmio) ? 0 : 1;
+	else
+		return kvm_decode_thumb_ls(decode, instr, mmio) ? 0 : 1;
+}
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index d61450a..ad743b7 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -20,6 +20,7 @@
 #include <linux/kvm_host.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_decode.h>
 #include <trace/events/kvm.h>
 
 #include "trace.h"
@@ -176,6 +177,174 @@ int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return 1;
 }
 
+static u64 kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
+{
+	return kvm_call_hyp(__kvm_va_to_pa, vcpu, va, priv);
+}
+
+/**
+ * copy_from_guest_va - copy memory from guest (very slow!)
+ * @vcpu:	vcpu pointer
+ * @dest:	memory to copy into
+ * @gva:	virtual address in guest to copy from
+ * @len:	length to copy
+ * @priv:	use guest PL1 (ie. kernel) mappings
+ *              otherwise use guest PL0 mappings.
+ *
+ * Returns true on success, false on failure (unlikely, but retry).
+ */
+static bool copy_from_guest_va(struct kvm_vcpu *vcpu,
+			       void *dest, unsigned long gva, size_t len,
+			       bool priv)
+{
+	u64 par;
+	phys_addr_t pc_ipa;
+	int err;
+
+	BUG_ON((gva & PAGE_MASK) != ((gva + len) & PAGE_MASK));
+	par = kvm_va_to_pa(vcpu, gva & PAGE_MASK, priv);
+	if (par & 1) {
+		kvm_err("IO abort from invalid instruction address"
+			" %#lx!\n", gva);
+		return false;
+	}
+
+	BUG_ON(!(par & (1U << 11)));
+	pc_ipa = par & PAGE_MASK & ((1ULL << 32) - 1);
+	pc_ipa += gva & ~PAGE_MASK;
+
+
+	err = kvm_read_guest(vcpu->kvm, pc_ipa, dest, len);
+	if (unlikely(err))
+		return false;
+
+	return true;
+}
+
+/*
+ * We have to be very careful copying memory from a running (ie. SMP) guest.
+ * Another CPU may remap the page (eg. swap out a userspace text page) as we
+ * read the instruction.  Unlike normal hardware operation, to emulate an
+ * instruction we map the virtual to physical address then read that memory
+ * as separate steps, thus not atomic.
+ *
+ * Fortunately this is so rare (we don't usually need the instruction), we
+ * can go very slowly and no one will mind.
+ */
+static bool copy_current_insn(struct kvm_vcpu *vcpu, unsigned long *instr)
+{
+	int i;
+	bool ret;
+	struct kvm_vcpu *v;
+	bool is_thumb;
+	size_t instr_len;
+
+	/* Don't cross with IPIs in kvm_main.c */
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/* Tell them all to pause, so no more will enter guest. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		v->arch.pause = true;
+
+	/* Set ->pause before we read ->mode */
+	smp_mb();
+
+	/* Kick out any which are still running. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm) {
+		/* Guest could exit now, making cpu wrong. That's OK. */
+		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
+			force_vm_exit(get_cpu_mask(v->cpu));
+		}
+	}
+
+
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	instr_len = (is_thumb) ? 2 : 4;
+
+	BUG_ON(!is_thumb && *vcpu_pc(vcpu) & 0x3);
+
+	/* Now guest isn't running, we can va->pa map and copy atomically. */
+	ret = copy_from_guest_va(vcpu, instr, *vcpu_pc(vcpu), instr_len,
+				 vcpu_mode_priv(vcpu));
+	if (!ret)
+		goto out;
+
+	/* A 32-bit thumb2 instruction can actually go over a page boundary! */
+	if (is_thumb && is_wide_instruction(*instr)) {
+		*instr = *instr << 16;
+		ret = copy_from_guest_va(vcpu, instr, *vcpu_pc(vcpu) + 2, 2,
+					 vcpu_mode_priv(vcpu));
+	}
+
+out:
+	/* Release them all. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		v->arch.pause = false;
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return ret;
+}
+
+/**
+ * kvm_emulate_mmio_ls - emulates load/store instructions made to I/O memory
+ * @vcpu:	The vcpu pointer
+ * @fault_ipa:	The IPA that caused the 2nd stage fault
+ * @mmio:      Pointer to struct to hold decode information
+ *
+ * Some load/store instructions cannot be emulated using the information
+ * presented in the HSR, for instance, register write-back instructions are not
+ * supported. We therefore need to fetch the instruction, decode it, and then
+ * emulate its behavior.
+ *
+ * Handles emulation of load/store instructions which cannot be emulated through
+ * information found in the HSR on faults. It is necessary in this case to
+ * simply decode the offending instruction in software and determine the
+ * required operands.
+ */
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			struct kvm_exit_mmio *mmio)
+{
+	unsigned long instr = 0;
+	struct pt_regs current_regs;
+	struct kvm_decode *decode = &vcpu->arch.mmio_decode;
+	int ret;
+
+	trace_kvm_mmio_emulate(*vcpu_pc(vcpu), instr, *vcpu_cpsr(vcpu));
+
+	/* If it fails (SMP race?), we reenter guest for it to retry. */
+	if (!copy_current_insn(vcpu, &instr))
+		return 1;
+
+	mmio->phys_addr = fault_ipa;
+
+	memcpy(&current_regs, &vcpu->arch.regs.usr_regs, sizeof(current_regs));
+	current_regs.ARM_sp = *vcpu_reg(vcpu, 13);
+	current_regs.ARM_lr = *vcpu_reg(vcpu, 14);
+
+	decode->regs = &current_regs;
+	decode->fault_addr = vcpu->arch.hxfar;
+	ret = kvm_decode_load_store(decode, instr, mmio);
+	if (ret) {
+		kvm_debug("Insrn. decode error: %#08lx (cpsr: %#08x"
+			  "pc: %#08x)\n",
+			  instr, *vcpu_cpsr(vcpu), *vcpu_pc(vcpu));
+		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
+		return ret;
+	}
+
+	memcpy(&vcpu->arch.regs.usr_regs, &current_regs, sizeof(current_regs));
+	*vcpu_reg(vcpu, 13) = current_regs.ARM_sp;
+	*vcpu_reg(vcpu, 14) = current_regs.ARM_lr;
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	kvm_skip_instr(vcpu, is_wide_instruction(instr));
+	return 0;
+}
+
 /**
  * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
  * @vcpu:	The VCPU pointer
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 08adcd5..45570b8 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -192,6 +192,44 @@ after_vfp_restore:
 	mov	r0, r1			@ Return the return code
 	bx	lr			@ return to IOCTL
 
+
+/********************************************************************
+ * Translate VA to PA
+ *
+ * u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
+ *
+ * Arguments:
+ *  r0: pointer to vcpu struct
+ *  r1: virtual address to map (rounded to page)
+ *  r2: 1 = PL1 (privileged) read mapping, 0 = PL0 (user) read mapping.
+ * Returns 64 bit PAR value.
+ */
+ENTRY(__kvm_va_to_pa)
+	push	{r4-r12}
+
+	@ Fold flag into r1, easier than using stack.
+	cmp	r2, #0
+	movne	r2, #1
+	orr	r1, r1, r2
+
+	@ This swaps too many registers, but we're in the slow path anyway.
+	read_cp15_state store_to_vcpu = 0
+	write_cp15_state read_from_vcpu = 1
+
+	ands	r2, r1, #1
+	bic	r1, r1, r2
+	mcrne	p15, 0, r1, c7, c8, 0	@ VA to PA, ATS1CPR
+	mcreq	p15, 0, r1, c7, c8, 2	@ VA to PA, ATS1CUR
+	isb
+
+	@ Restore host state.
+	read_cp15_state store_to_vcpu = 1
+	write_cp15_state read_from_vcpu = 0
+
+	mrrc	p15, 0, r0, r1, c7	@ PAR
+	pop	{r4-r12}
+	bx	lr
+
 ENTRY(kvm_call_hyp)
 	hvc	#0
 	bx	lr
diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
new file mode 100644
index 0000000..d6a4ca0
--- /dev/null
+++ b/arch/arm/kvm/mmio.c
@@ -0,0 +1,154 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <asm/kvm_mmio.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_decode.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
+
+/**
+ * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
+ * @vcpu: The VCPU pointer
+ * @run:  The VCPU run struct containing the mmio data
+ *
+ * This should only be called after returning from userspace for MMIO load
+ * emulation.
+ */
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	__u32 *dest;
+	unsigned int len;
+	int mask;
+
+	if (!run->mmio.is_write) {
+		dest = vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt);
+		memset(dest, 0, sizeof(int));
+
+		len = run->mmio.len;
+		if (len > 4)
+			return -EINVAL;
+
+		memcpy(dest, run->mmio.data, len);
+
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
+				*((u64 *)run->mmio.data));
+
+		if (vcpu->arch.mmio_decode.sign_extend && len < 4) {
+			mask = 1U << ((len * 8) - 1);
+			*dest = (*dest ^ mask) - mask;
+		}
+	}
+
+	return 0;
+}
+
+static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+		      struct kvm_exit_mmio *mmio)
+{
+	unsigned long rt, len;
+	bool is_write, sign_extend;
+
+	if ((vcpu->arch.hsr >> 8) & 1) {
+		/* cache operation on I/O addr, tell guest unsupported */
+		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
+		return 1;
+	}
+
+	if ((vcpu->arch.hsr >> 7) & 1) {
+		/* page table accesses IO mem: tell guest to fix its TTBR */
+		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
+		return 1;
+	}
+
+	switch ((vcpu->arch.hsr >> 22) & 0x3) {
+	case 0:
+		len = 1;
+		break;
+	case 1:
+		len = 2;
+		break;
+	case 2:
+		len = 4;
+		break;
+	default:
+		kvm_err("Hardware is weird: SAS 0b11 is reserved\n");
+		return -EFAULT;
+	}
+
+	is_write = vcpu->arch.hsr & HSR_WNR;
+	sign_extend = vcpu->arch.hsr & HSR_SSE;
+	rt = (vcpu->arch.hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
+
+	if (kvm_vcpu_reg_is_pc(vcpu, rt)) {
+		/* IO memory trying to read/write pc */
+		kvm_inject_pabt(vcpu, vcpu->arch.hxfar);
+		return 1;
+	}
+
+	mmio->is_write = is_write;
+	mmio->phys_addr = fault_ipa;
+	mmio->len = len;
+	vcpu->arch.mmio_decode.sign_extend = sign_extend;
+	vcpu->arch.mmio_decode.rt = rt;
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
+	return 0;
+}
+
+int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
+{
+	struct kvm_exit_mmio mmio;
+	unsigned long rt;
+	int ret;
+
+	/*
+	 * Prepare MMIO operation. First stash it in a private
+	 * structure that we can use for in-kernel emulation. If the
+	 * kernel can't handle it, copy it into run->mmio and let user
+	 * space do its magic.
+	 */
+
+	if (vcpu->arch.hsr & HSR_ISV) {
+		ret = decode_hsr(vcpu, fault_ipa, &mmio);
+		if (ret)
+			return ret;
+	} else {
+		ret = kvm_emulate_mmio_ls(vcpu, fault_ipa, &mmio);
+		if (ret)
+			return ret;
+	}
+
+	rt = vcpu->arch.mmio_decode.rt;
+	trace_kvm_mmio((mmio.is_write) ? KVM_TRACE_MMIO_WRITE :
+					 KVM_TRACE_MMIO_READ_UNSATISFIED,
+			mmio.len, fault_ipa,
+			(mmio.is_write) ? *vcpu_reg(vcpu, rt) : 0);
+
+	if (mmio.is_write)
+		memcpy(mmio.data, vcpu_reg(vcpu, rt), mmio.len);
+
+	kvm_prepare_mmio(run, &mmio);
+	return 0;
+}
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 0ce0e77..2a83ac9 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -19,11 +19,13 @@
 #include <linux/mman.h>
 #include <linux/kvm_host.h>
 #include <linux/io.h>
+#include <trace/events/kvm.h>
 #include <asm/idmap.h>
 #include <asm/pgalloc.h>
 #include <asm/cacheflush.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_mmio.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/mach/map.h>
@@ -620,8 +622,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			return -EFAULT;
 		}
 
-		kvm_pr_unimpl("I/O address abort...");
-		return 0;
+		/* Adjust page offset */
+		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
+		return io_mem_abort(vcpu, run, fault_ipa, memslot);
 	}
 
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 5d65751..cd52640 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -90,6 +90,27 @@ TRACE_EVENT(kvm_irq_line,
 		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
 );
 
+TRACE_EVENT(kvm_mmio_emulate,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
+		 unsigned long cpsr),
+	TP_ARGS(vcpu_pc, instr, cpsr),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+		__field(	unsigned long,	instr		)
+		__field(	unsigned long,	cpsr		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+		__entry->instr			= instr;
+		__entry->cpsr			= cpsr;
+	),
+
+	TP_printk("Emulate MMIO at: 0x%08lx (instr: %08lx, cpsr: %08lx)",
+		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
+);
+
 /* Architecturally implementation defined CP15 register access */
 TRACE_EVENT(kvm_emulate_cp15_imp,
 	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,



* [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
@ 2013-01-08 18:40   ` Christoffer Dall
  0 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:40 UTC (permalink / raw)
  To: linux-arm-kernel

When the guest accesses I/O memory, this creates data abort exceptions,
which are handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding reads
and writes to QEMU which performs the device emulation.
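
A minimal sketch of the syndrome fields this relies on, built from the HSR
defines added by this patch (decode_hsr() in mmio.c reads the SAS size field
at bits 23:22 directly); struct kvm_exit_mmio and struct kvm_decode are also
introduced below:

    /* Illustrative only: data-abort syndrome fields used for MMIO decoding. */
    static void sketch_hsr_fields(u32 hsr, struct kvm_exit_mmio *mmio,
    			      struct kvm_decode *d)
    {
    	if (!(hsr & HSR_ISV))			/* bit 24: syndrome valid?      */
    		return;				/* no: fetch and decode instead */
    	mmio->is_write	= hsr & HSR_WNR;	/* bit 6: write, not read       */
    	mmio->len	= 1 << ((hsr >> 22) & 0x3); /* SAS: 1/2/4, 0b11 reserved */
    	d->sign_extend	= hsr & HSR_SSE;	/* bit 21: sign-extend loads    */
    	d->rt		= (hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT; /* transfer reg */
    }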

Certain classes of load/store operations do not support the syndrome
information provided in the HSR and we therefore must be able to fetch
the offending instruction from guest memory and decode it manually.

We only support instruction decoding for valid reasonable MMIO operations
where trapping them does not provide sufficient information in the HSR (no
16-bit Thumb instructions provide register writeback that we care about).

The following instruction types are NOT supported for MMIO operations
despite the HSR not containing decode info:
 - any Load/Store multiple
 - any load/store exclusive
 - any load/store dual
 - anything with the PC as the dest register
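
For orientation, the top-level dispatch in io_mem_abort() below boils down to
the following sketch (tracing and the write-data copy omitted):

    if (vcpu->arch.hsr & HSR_ISV)
    	ret = decode_hsr(vcpu, fault_ipa, &mmio);	   /* syndrome is enough */
    else
    	ret = kvm_emulate_mmio_ls(vcpu, fault_ipa, &mmio); /* fetch + decode    */
    if (ret)
    	return ret;		/* fault injected into the guest, or error */
    kvm_prepare_mmio(run, &mmio);	/* hand the access off to user space */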

This requires changing the general flow somewhat since new calls to run
the VCPU must check if there's a pending MMIO load and perform the write
after userspace has made the data available.
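
A minimal user space sketch of that flow (device_read()/device_write() are
placeholder helpers, not part of this series; run is the vcpu's mmap'ed
struct kvm_run):

    for (;;) {
    	if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
    		break;
    	if (run->exit_reason != KVM_EXIT_MMIO)
    		continue;		/* other exit reasons elided */
    	if (run->mmio.is_write)
    		device_write(run->mmio.phys_addr,
    			     run->mmio.data, run->mmio.len);
    	else				/* loaded value is consumed by
    					 * kvm_handle_mmio_return() on the
    					 * next KVM_RUN */
    		device_read(run->mmio.phys_addr,
    			    run->mmio.data, run->mmio.len);
    }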

Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
(1) The guest executes a complicated mmio instruction, which traps.
(2) The hardware doesn't tell us enough, so we need to read the actual
    instruction which was being executed.
(3) KVM maps the instruction virtual address to a physical address.
(4) The guest (SMP) swaps out that page, and fills it with something else.
(5) We read the physical address, but now that's the wrong thing.
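
The fix, roughly as implemented by copy_current_insn() in this patch, is to
stop every VCPU before the translation so the guest cannot remap the page
between steps (3) and (5):

    spin_lock(&vcpu->kvm->mmu_lock);	/* don't cross with IPIs in kvm_main.c */
    kvm_for_each_vcpu(i, v, vcpu->kvm)
    	v->arch.pause = true;		/* no VCPU re-enters the guest */
    smp_mb();				/* set ->pause before reading ->mode */
    /* force_vm_exit() any VCPU still IN_GUEST_MODE, then translate the
     * faulting PC with ATS1C{P,U}R and kvm_read_guest() the instruction. */
    kvm_for_each_vcpu(i, v, vcpu->kvm)
    	v->arch.pause = false;
    spin_unlock(&vcpu->kvm->mmu_lock);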

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_arm.h     |    3 
 arch/arm/include/asm/kvm_asm.h     |    2 
 arch/arm/include/asm/kvm_decode.h  |   47 ++++
 arch/arm/include/asm/kvm_emulate.h |    8 +
 arch/arm/include/asm/kvm_host.h    |    7 +
 arch/arm/include/asm/kvm_mmio.h    |   51 ++++
 arch/arm/kvm/Makefile              |    2 
 arch/arm/kvm/arm.c                 |   14 +
 arch/arm/kvm/decode.c              |  462 ++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/emulate.c             |  169 +++++++++++++
 arch/arm/kvm/interrupts.S          |   38 +++
 arch/arm/kvm/mmio.c                |  154 ++++++++++++
 arch/arm/kvm/mmu.c                 |    7 -
 arch/arm/kvm/trace.h               |   21 ++
 14 files changed, 981 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_decode.h
 create mode 100644 arch/arm/include/asm/kvm_mmio.h
 create mode 100644 arch/arm/kvm/decode.c
 create mode 100644 arch/arm/kvm/mmio.c

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 3ff6f22..151c4ce 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -173,8 +173,11 @@
 #define HSR_ISS		(HSR_IL - 1)
 #define HSR_ISV_SHIFT	(24)
 #define HSR_ISV		(1U << HSR_ISV_SHIFT)
+#define HSR_SRT_SHIFT	(16)
+#define HSR_SRT_MASK	(0xf << HSR_SRT_SHIFT)
 #define HSR_FSC		(0x3f)
 #define HSR_FSC_TYPE	(0x3c)
+#define HSR_SSE		(1 << 21)
 #define HSR_WNR		(1 << 6)
 #define HSR_CV_SHIFT	(24)
 #define HSR_CV		(1U << HSR_CV_SHIFT)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 5e06e81..58d787b 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -77,6 +77,8 @@ extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+
+extern u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv);
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_decode.h b/arch/arm/include/asm/kvm_decode.h
new file mode 100644
index 0000000..3c37cb9
--- /dev/null
+++ b/arch/arm/include/asm/kvm_decode.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_DECODE_H__
+#define __ARM_KVM_DECODE_H__
+
+#include <linux/types.h>
+
+struct kvm_vcpu;
+struct kvm_exit_mmio;
+
+struct kvm_decode {
+	struct pt_regs *regs;
+	unsigned long fault_addr;
+	unsigned long rt;
+	bool sign_extend;
+};
+
+int kvm_decode_load_store(struct kvm_decode *decode, unsigned long instr,
+			  struct kvm_exit_mmio *mmio);
+
+static inline unsigned long *kvm_decode_reg(struct kvm_decode *decode, int reg)
+{
+	return &decode->regs->uregs[reg];
+}
+
+static inline unsigned long *kvm_decode_cpsr(struct kvm_decode *decode)
+{
+	return &decode->regs->ARM_cpsr;
+}
+
+#endif /* __ARM_KVM_DECODE_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 01a755b..375795b 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -21,11 +21,14 @@
 
 #include <linux/kvm_host.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_mmio.h>
 
 u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
 
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			struct kvm_exit_mmio *mmio);
 void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
@@ -53,4 +56,9 @@ static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
 	return cpsr_mode > USR_MODE;;
 }
 
+static inline bool kvm_vcpu_reg_is_pc(struct kvm_vcpu *vcpu, int reg)
+{
+	return reg == 15;
+}
+
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6cc8933..ca40795 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -22,6 +22,7 @@
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/fpstate.h>
+#include <asm/kvm_decode.h>
 
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
 #define KVM_USER_MEM_SLOTS 32
@@ -99,6 +100,12 @@ struct kvm_vcpu_arch {
 	int last_pcpu;
 	cpumask_t require_dcache_flush;
 
+	/* Don't run the guest: see copy_current_insn() */
+	bool pause;
+
+	/* IO related fields */
+	struct kvm_decode mmio_decode;
+
 	/* Interrupt related fields */
 	u32 irq_lines;		/* IRQ and FIQ levels */
 
diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
new file mode 100644
index 0000000..31ab9f5
--- /dev/null
+++ b/arch/arm/include/asm/kvm_mmio.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_MMIO_H__
+#define __ARM_KVM_MMIO_H__
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_arm.h>
+
+/*
+ * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
+ * which is an anonymous type. Use our own type instead.
+ */
+struct kvm_exit_mmio {
+	phys_addr_t	phys_addr;
+	u8		data[8];
+	u32		len;
+	bool		is_write;
+};
+
+static inline void kvm_prepare_mmio(struct kvm_run *run,
+				    struct kvm_exit_mmio *mmio)
+{
+	run->mmio.phys_addr	= mmio->phys_addr;
+	run->mmio.len		= mmio->len;
+	run->mmio.is_write	= mmio->is_write;
+	memcpy(run->mmio.data, mmio->data, mmio->len);
+	run->exit_reason	= KVM_EXIT_MMIO;
+}
+
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot);
+
+#endif	/* __ARM_KVM_MMIO_H__ */
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 88edce6..44a5f4b 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -18,4 +18,4 @@ kvm-arm-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o
+obj-y += coproc.o coproc_a15.o mmio.o decode.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0b4ffcf..f42d828 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -614,6 +614,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (unlikely(vcpu->arch.target < 0))
 		return -ENOEXEC;
 
+	if (run->exit_reason == KVM_EXIT_MMIO) {
+		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+		if (ret)
+			return ret;
+	}
+
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 
@@ -649,7 +655,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_guest_enter();
 		vcpu->mode = IN_GUEST_MODE;
 
-		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
+		smp_mb(); /* set mode before reading vcpu->arch.pause */
+		if (unlikely(vcpu->arch.pause)) {
+			/* This means ignore, try again. */
+			ret = ARM_EXCEPTION_IRQ;
+		} else {
+			ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
+		}
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->arch.last_pcpu = smp_processor_id();
diff --git a/arch/arm/kvm/decode.c b/arch/arm/kvm/decode.c
new file mode 100644
index 0000000..469cf14
--- /dev/null
+++ b/arch/arm/kvm/decode.c
@@ -0,0 +1,462 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <linux/kvm_host.h>
+#include <asm/kvm_mmio.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_decode.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
+
+struct arm_instr {
+	/* Instruction decoding */
+	u32 opc;
+	u32 opc_mask;
+
+	/* Decoding for the register write back */
+	bool register_form;
+	u32 imm;
+	u8 Rm;
+	u8 type;
+	u8 shift_n;
+
+	/* Common decoding */
+	u8 len;
+	bool sign_extend;
+	bool w;
+
+	bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
+		       unsigned long instr, struct arm_instr *ai);
+};
+
+enum SRType {
+	SRType_LSL,
+	SRType_LSR,
+	SRType_ASR,
+	SRType_ROR,
+	SRType_RRX
+};
+
+/* Modelled after DecodeImmShift() in the ARM ARM */
+static enum SRType decode_imm_shift(u8 type, u8 imm5, u8 *amount)
+{
+	switch (type) {
+	case 0x0:
+		*amount = imm5;
+		return SRType_LSL;
+	case 0x1:
+		*amount = (imm5 == 0) ? 32 : imm5;
+		return SRType_LSR;
+	case 0x2:
+		*amount = (imm5 == 0) ? 32 : imm5;
+		return SRType_ASR;
+	case 0x3:
+		if (imm5 == 0) {
+			*amount = 1;
+			return SRType_RRX;
+		} else {
+			*amount = imm5;
+			return SRType_ROR;
+		}
+	}
+
+	return SRType_LSL;
+}
+
+/* Modelled after Shift() in the ARM ARM */
+static u32 shift(u32 value, u8 N, enum SRType type, u8 amount, bool carry_in)
+{
+	u32 mask = (1 << N) - 1;
+	s32 svalue = (s32)value;
+
+	BUG_ON(N > 32);
+	BUG_ON(type == SRType_RRX && amount != 1);
+	BUG_ON(amount > N);
+
+	if (amount == 0)
+		return value;
+
+	switch (type) {
+	case SRType_LSL:
+		value <<= amount;
+		break;
+	case SRType_LSR:
+		 value >>= amount;
+		break;
+	case SRType_ASR:
+		if (value & (1 << (N - 1)))
+			svalue |= ((-1UL) << N);
+		value = svalue >> amount;
+		break;
+	case SRType_ROR:
+		value = (value >> amount) | (value << (N - amount));
+		break;
+	case SRType_RRX: {
+		u32 C = (carry_in) ? 1 : 0;
+		value = (value >> 1) | (C << (N - 1));
+		break;
+	}
+	}
+
+	return value & mask;
+}
+
+static bool decode_arm_wb(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
+			  unsigned long instr, const struct arm_instr *ai)
+{
+	u8 Rt = (instr >> 12) & 0xf;
+	u8 Rn = (instr >> 16) & 0xf;
+	u8 W = (instr >> 21) & 1;
+	u8 U = (instr >> 23) & 1;
+	u8 P = (instr >> 24) & 1;
+	u32 base_addr = *kvm_decode_reg(decode, Rn);
+	u32 offset_addr, offset;
+
+	/*
+	 * Technically this is allowed in certain circumstances,
+	 * but we don't support it.
+	 */
+	if (Rt == 15 || Rn == 15)
+		return false;
+
+	if (P && !W) {
+		kvm_err("Decoding operation with valid ISV?\n");
+		return false;
+	}
+
+	decode->rt = Rt;
+
+	if (ai->register_form) {
+		/* Register operation */
+		enum SRType s_type;
+		u8 shift_n = 0;
+		bool c_bit = *kvm_decode_cpsr(decode) & PSR_C_BIT;
+		u32 s_reg = *kvm_decode_reg(decode, ai->Rm);
+
+		s_type = decode_imm_shift(ai->type, ai->shift_n, &shift_n);
+		offset = shift(s_reg, 5, s_type, shift_n, c_bit);
+	} else {
+		/* Immediate operation */
+		offset = ai->imm;
+	}
+
+	/* Handle Writeback */
+	if (U)
+		offset_addr = base_addr + offset;
+	else
+		offset_addr = base_addr - offset;
+	*kvm_decode_reg(decode, Rn) = offset_addr;
+	return true;
+}
+
+static bool decode_arm_ls(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
+			  unsigned long instr, struct arm_instr *ai)
+{
+	u8 A = (instr >> 25) & 1;
+
+	mmio->is_write = ai->w;
+	mmio->len = ai->len;
+	decode->sign_extend = false;
+
+	ai->register_form = A;
+	ai->imm = instr & 0xfff;
+	ai->Rm = instr & 0xf;
+	ai->type = (instr >> 5) & 0x3;
+	ai->shift_n = (instr >> 7) & 0x1f;
+
+	return decode_arm_wb(decode, mmio, instr, ai);
+}
+
+static bool decode_arm_extra(struct kvm_decode *decode,
+			     struct kvm_exit_mmio *mmio,
+			     unsigned long instr, struct arm_instr *ai)
+{
+	mmio->is_write = ai->w;
+	mmio->len = ai->len;
+	decode->sign_extend = ai->sign_extend;
+
+	ai->register_form = !((instr >> 22) & 1);
+	ai->imm = ((instr >> 4) & 0xf0) | (instr & 0xf);
+	ai->Rm = instr & 0xf;
+	ai->type = 0; /* SRType_LSL */
+	ai->shift_n = 0;
+
+	return decode_arm_wb(decode, mmio, instr, ai);
+}
+
+/*
+ * The encodings in this table assume that a fault was generated where the
+ * ISV field in the HSR was clear, and the decoding information was invalid,
+ * which means that a register write-back occurred, the PC was used as the
+ * destination or a load/store multiple operation was used. Since the latter
+ * two cases are crazy for MMIO on the guest side, we simply inject a fault
+ * when this happens and support the common case.
+ *
+ * We treat unprivileged loads and stores of words and bytes like all other
+ * loads and stores as their encodings mandate the W bit set and the P bit
+ * clear.
+ */
+static const struct arm_instr arm_instr[] = {
+	/**************** Load/Store Word and Byte **********************/
+	/* Store word with writeback */
+	{ .opc = 0x04000000, .opc_mask = 0x0c500000, .len = 4, .w = true,
+		.sign_extend = false, .decode = decode_arm_ls },
+	/* Store byte with writeback */
+	{ .opc = 0x04400000, .opc_mask = 0x0c500000, .len = 1, .w = true,
+		.sign_extend = false, .decode = decode_arm_ls },
+	/* Load word with writeback */
+	{ .opc = 0x04100000, .opc_mask = 0x0c500000, .len = 4, .w = false,
+		.sign_extend = false, .decode = decode_arm_ls },
+	/* Load byte with writeback */
+	{ .opc = 0x04500000, .opc_mask = 0x0c500000, .len = 1, .w = false,
+		.sign_extend = false, .decode = decode_arm_ls },
+
+	/*************** Extra load/store instructions ******************/
+
+	/* Store halfword with writeback */
+	{ .opc = 0x000000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = true,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load halfword with writeback */
+	{ .opc = 0x001000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
+		.sign_extend = false, .decode = decode_arm_extra },
+
+	/* Load dual with writeback */
+	{ .opc = 0x000000d0, .opc_mask = 0x0c1000f0, .len = 8, .w = false,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load signed byte with writeback */
+	{ .opc = 0x001000d0, .opc_mask = 0x0c1000f0, .len = 1, .w = false,
+		.sign_extend = true,  .decode = decode_arm_extra },
+
+	/* Store dual with writeback */
+	{ .opc = 0x000000f0, .opc_mask = 0x0c1000f0, .len = 8, .w = true,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load signed halfword with writeback */
+	{ .opc = 0x001000f0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
+		.sign_extend = true,  .decode = decode_arm_extra },
+
+	/* Store halfword unprivileged */
+	{ .opc = 0x002000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = true,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load halfword unprivileged */
+	{ .opc = 0x003000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
+		.sign_extend = false, .decode = decode_arm_extra },
+	/* Load signed byte unprivileged */
+	{ .opc = 0x003000d0, .opc_mask = 0x0f3000f0, .len = 1, .w = false,
+		.sign_extend = true , .decode = decode_arm_extra },
+	/* Load signed halfword unprivileged */
+	{ .opc = 0x003000f0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
+		.sign_extend = true , .decode = decode_arm_extra },
+};
+
+static bool kvm_decode_arm_ls(struct kvm_decode *decode, unsigned long instr,
+			      struct kvm_exit_mmio *mmio)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(arm_instr); i++) {
+		const struct arm_instr *ai = &arm_instr[i];
+		if ((instr & ai->opc_mask) == ai->opc) {
+			struct arm_instr ai_copy = *ai;
+			return ai->decode(decode, mmio, instr, &ai_copy);
+		}
+	}
+	return false;
+}
+
+struct thumb_instr {
+	bool is32;
+
+	u8 opcode;
+	u8 opcode_mask;
+	u8 op2;
+	u8 op2_mask;
+
+	bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
+		       unsigned long instr, const struct thumb_instr *ti);
+};
+
+static bool decode_thumb_wb(struct kvm_decode *decode,
+			    struct kvm_exit_mmio *mmio,
+			    unsigned long instr)
+{
+	bool P = (instr >> 10) & 1;
+	bool U = (instr >> 9) & 1;
+	u8 imm8 = instr & 0xff;
+	u32 offset_addr = decode->fault_addr;
+	u8 Rn = (instr >> 16) & 0xf;
+
+	decode->rt = (instr >> 12) & 0xf;
+
+	if (Rn == 15)
+		return false;
+
+	/* Handle Writeback */
+	if (!P && U)
+		*kvm_decode_reg(decode, Rn) = offset_addr + imm8;
+	else if (!P && !U)
+		*kvm_decode_reg(decode, Rn) = offset_addr - imm8;
+	return true;
+}
+
+static bool decode_thumb_str(struct kvm_decode *decode,
+			     struct kvm_exit_mmio *mmio,
+			     unsigned long instr, const struct thumb_instr *ti)
+{
+	u8 op1 = (instr >> (16 + 5)) & 0x7;
+	u8 op2 = (instr >> 6) & 0x3f;
+
+	mmio->is_write = true;
+	decode->sign_extend = false;
+
+	switch (op1) {
+	case 0x0: mmio->len = 1; break;
+	case 0x1: mmio->len = 2; break;
+	case 0x2: mmio->len = 4; break;
+	default:
+		  return false; /* Only register write-back versions! */
+	}
+
+	if ((op2 & 0x24) == 0x24) {
+		/* STR/STRH/STRB (immediate, thumb, W=1) */
+		return decode_thumb_wb(decode, mmio, instr);
+	}
+
+	return false;
+}
+
+static bool decode_thumb_ldr(struct kvm_decode *decode,
+			     struct kvm_exit_mmio *mmio,
+			     unsigned long instr, const struct thumb_instr *ti)
+{
+	u8 op1 = (instr >> (16 + 7)) & 0x3;
+	u8 op2 = (instr >> 6) & 0x3f;
+
+	mmio->is_write = false;
+
+	switch (ti->op2 & 0x7) {
+	case 0x1: mmio->len = 1; break;
+	case 0x3: mmio->len = 2; break;
+	case 0x5: mmio->len = 4; break;
+	}
+
+	if (op1 == 0x0)
+		decode->sign_extend = false;
+	else if (op1 == 0x2 && (ti->op2 & 0x7) != 0x5)
+		decode->sign_extend = true;
+	else
+		return false; /* Only register write-back versions! */
+
+	if ((op2 & 0x24) == 0x24) {
+		/* LDR{S}X (immediate, thumb, W=1) */
+		return decode_thumb_wb(decode, mmio, instr);
+	}
+
+	return false;
+}
+
+/*
+ * We only support instruction decoding for valid reasonable MMIO operations
+ * where trapping them does not provide sufficient information in the HSR (no
+ * 16-bit Thumb instructions provide register writeback that we care about).
+ *
+ * The following instruction types are NOT supported for MMIO operations
+ * despite the HSR not containing decode info:
+ *  - any Load/Store multiple
+ *  - any load/store exclusive
+ *  - any load/store dual
+ *  - anything with the PC as the dest register
+ */
+static const struct thumb_instr thumb_instr[] = {
+	/**************** 32-bit Thumb instructions **********************/
+	/* Store single data item:	Op1 == 11, Op2 == 000xxx0 */
+	{ .is32 = true,  .opcode = 3, .op2 = 0x00, .op2_mask = 0x71,
+						decode_thumb_str	},
+
+	/* Load byte:			Op1 == 11, Op2 == 00xx001 */
+	{ .is32 = true,  .opcode = 3, .op2 = 0x01, .op2_mask = 0x67,
+						decode_thumb_ldr	},
+
+	/* Load halfword:		Op1 == 11, Op2 == 00xx011 */
+	{ .is32 = true,  .opcode = 3, .op2 = 0x03, .op2_mask = 0x67,
+						decode_thumb_ldr	},
+
+	/* Load word:			Op1 == 11, Op2 == 00xx101 */
+	{ .is32 = true,  .opcode = 3, .op2 = 0x05, .op2_mask = 0x67,
+						decode_thumb_ldr	},
+};
+
+
+
+static bool kvm_decode_thumb_ls(struct kvm_decode *decode, unsigned long instr,
+				struct kvm_exit_mmio *mmio)
+{
+	bool is32 = is_wide_instruction(instr);
+	bool is16 = !is32;
+	struct thumb_instr tinstr; /* re-use to pass on already decoded info */
+	int i;
+
+	if (is16) {
+		tinstr.opcode = (instr >> 10) & 0x3f;
+	} else {
+		tinstr.opcode = (instr >> (16 + 11)) & 0x3;
+		tinstr.op2 = (instr >> (16 + 4)) & 0x7f;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(thumb_instr); i++) {
+		const struct thumb_instr *ti = &thumb_instr[i];
+		if (ti->is32 != is32)
+			continue;
+
+		if (is16) {
+			if ((tinstr.opcode & ti->opcode_mask) != ti->opcode)
+				continue;
+		} else {
+			if (ti->opcode != tinstr.opcode)
+				continue;
+			if ((ti->op2_mask & tinstr.op2) != ti->op2)
+				continue;
+		}
+
+		return ti->decode(decode, mmio, instr, &tinstr);
+	}
+
+	return false;
+}
+
+/**
+ * kvm_decode_load_store - decodes load/store instructions
+ * @decode: reads regs and fault_addr, writes rt and sign_extend
+ * @instr:  instruction to decode
+ * @mmio:   fills in len and is_write
+ *
+ * Decode load/store instructions with HSR ISV clear. The code assumes that
+ * this was indeed a KVM fault and therefore assumes registers write back for
+ * single load/store operations and does not support using the PC as the
+ * destination register.
+ */
+int kvm_decode_load_store(struct kvm_decode *decode, unsigned long instr,
+			  struct kvm_exit_mmio *mmio)
+{
+	bool is_thumb;
+
+	is_thumb = !!(*kvm_decode_cpsr(decode) & PSR_T_BIT);
+	if (!is_thumb)
+		return kvm_decode_arm_ls(decode, instr, mmio) ? 0 : 1;
+	else
+		return kvm_decode_thumb_ls(decode, instr, mmio) ? 0 : 1;
+}
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index d61450a..ad743b7 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -20,6 +20,7 @@
 #include <linux/kvm_host.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_decode.h>
 #include <trace/events/kvm.h>
 
 #include "trace.h"
@@ -176,6 +177,174 @@ int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return 1;
 }
 
+static u64 kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
+{
+	return kvm_call_hyp(__kvm_va_to_pa, vcpu, va, priv);
+}
+
+/**
+ * copy_from_guest_va - copy memory from guest (very slow!)
+ * @vcpu:	vcpu pointer
+ * @dest:	memory to copy into
+ * @gva:	virtual address in guest to copy from
+ * @len:	length to copy
+ * @priv:	use guest PL1 (ie. kernel) mappings
+ *              otherwise use guest PL0 mappings.
+ *
+ * Returns true on success, false on failure (unlikely, but retry).
+ */
+static bool copy_from_guest_va(struct kvm_vcpu *vcpu,
+			       void *dest, unsigned long gva, size_t len,
+			       bool priv)
+{
+	u64 par;
+	phys_addr_t pc_ipa;
+	int err;
+
+	BUG_ON((gva & PAGE_MASK) != ((gva + len) & PAGE_MASK));
+	par = kvm_va_to_pa(vcpu, gva & PAGE_MASK, priv);
+	if (par & 1) {
+		kvm_err("IO abort from invalid instruction address"
+			" %#lx!\n", gva);
+		return false;
+	}
+
+	BUG_ON(!(par & (1U << 11)));
+	pc_ipa = par & PAGE_MASK & ((1ULL << 32) - 1);
+	pc_ipa += gva & ~PAGE_MASK;
+
+
+	err = kvm_read_guest(vcpu->kvm, pc_ipa, dest, len);
+	if (unlikely(err))
+		return false;
+
+	return true;
+}
+
+/*
+ * We have to be very careful copying memory from a running (ie. SMP) guest.
+ * Another CPU may remap the page (eg. swap out a userspace text page) as we
+ * read the instruction.  Unlike normal hardware operation, to emulate an
+ * instruction we map the virtual to physical address then read that memory
+ * as separate steps, thus not atomic.
+ *
+ * Fortunately this is so rare (we don't usually need the instruction), we
+ * can go very slowly and no one will mind.
+ */
+static bool copy_current_insn(struct kvm_vcpu *vcpu, unsigned long *instr)
+{
+	int i;
+	bool ret;
+	struct kvm_vcpu *v;
+	bool is_thumb;
+	size_t instr_len;
+
+	/* Don't cross with IPIs in kvm_main.c */
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/* Tell them all to pause, so no more will enter guest. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		v->arch.pause = true;
+
+	/* Set ->pause before we read ->mode */
+	smp_mb();
+
+	/* Kick out any which are still running. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm) {
+		/* Guest could exit now, making cpu wrong. That's OK. */
+		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
+			force_vm_exit(get_cpu_mask(v->cpu));
+		}
+	}
+
+
+	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
+	instr_len = (is_thumb) ? 2 : 4;
+
+	BUG_ON(!is_thumb && *vcpu_pc(vcpu) & 0x3);
+
+	/* Now guest isn't running, we can va->pa map and copy atomically. */
+	ret = copy_from_guest_va(vcpu, instr, *vcpu_pc(vcpu), instr_len,
+				 vcpu_mode_priv(vcpu));
+	if (!ret)
+		goto out;
+
+	/* A 32-bit thumb2 instruction can actually go over a page boundary! */
+	if (is_thumb && is_wide_instruction(*instr)) {
+		*instr = *instr << 16;
+		ret = copy_from_guest_va(vcpu, instr, *vcpu_pc(vcpu) + 2, 2,
+					 vcpu_mode_priv(vcpu));
+	}
+
+out:
+	/* Release them all. */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		v->arch.pause = false;
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return ret;
+}
+
+/**
+ * kvm_emulate_mmio_ls - emulates load/store instructions made to I/O memory
+ * @vcpu:	The vcpu pointer
+ * @fault_ipa:	The IPA that caused the 2nd stage fault
+ * @mmio:      Pointer to struct to hold decode information
+ *
+ * Some load/store instructions cannot be emulated using the information
+ * presented in the HSR, for instance, register write-back instructions are not
+ * supported. We therefore need to fetch the instruction, decode it, and then
+ * emulate its behavior.
+ *
+ * Handles emulation of load/store instructions which cannot be emulated through
+ * information found in the HSR on faults. It is necessary in this case to
+ * simply decode the offending instruction in software and determine the
+ * required operands.
+ */
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			struct kvm_exit_mmio *mmio)
+{
+	unsigned long instr = 0;
+	struct pt_regs current_regs;
+	struct kvm_decode *decode = &vcpu->arch.mmio_decode;
+	int ret;
+
+	trace_kvm_mmio_emulate(*vcpu_pc(vcpu), instr, *vcpu_cpsr(vcpu));
+
+	/* If it fails (SMP race?), we reenter guest for it to retry. */
+	if (!copy_current_insn(vcpu, &instr))
+		return 1;
+
+	mmio->phys_addr = fault_ipa;
+
+	memcpy(&current_regs, &vcpu->arch.regs.usr_regs, sizeof(current_regs));
+	current_regs.ARM_sp = *vcpu_reg(vcpu, 13);
+	current_regs.ARM_lr = *vcpu_reg(vcpu, 14);
+
+	decode->regs = &current_regs;
+	decode->fault_addr = vcpu->arch.hxfar;
+	ret = kvm_decode_load_store(decode, instr, mmio);
+	if (ret) {
+		kvm_debug("Insrn. decode error: %#08lx (cpsr: %#08x"
+			  "pc: %#08x)\n",
+			  instr, *vcpu_cpsr(vcpu), *vcpu_pc(vcpu));
+		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
+		return ret;
+	}
+
+	memcpy(&vcpu->arch.regs.usr_regs, &current_regs, sizeof(current_regs));
+	*vcpu_reg(vcpu, 13) = current_regs.ARM_sp;
+	*vcpu_reg(vcpu, 14) = current_regs.ARM_lr;
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	kvm_skip_instr(vcpu, is_wide_instruction(instr));
+	return 0;
+}
+
 /**
  * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
  * @vcpu:	The VCPU pointer
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 08adcd5..45570b8 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -192,6 +192,44 @@ after_vfp_restore:
 	mov	r0, r1			@ Return the return code
 	bx	lr			@ return to IOCTL
 
+
+/********************************************************************
+ * Translate VA to PA
+ *
+ * u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
+ *
+ * Arguments:
+ *  r0: pointer to vcpu struct
+ *  r1: virtual address to map (rounded to page)
+ *  r2: 1 = PL1 (privileged) read mapping, 0 = PL0 (user) read mapping.
+ * Returns 64 bit PAR value.
+ */
+ENTRY(__kvm_va_to_pa)
+	push	{r4-r12}
+
+	@ Fold flag into r1, easier than using stack.
+	cmp	r2, #0
+	movne	r2, #1
+	orr	r1, r1, r2
+
+	@ This swaps too many registers, but we're in the slow path anyway.
+	read_cp15_state store_to_vcpu = 0
+	write_cp15_state read_from_vcpu = 1
+
+	ands	r2, r1, #1
+	bic	r1, r1, r2
+	mcrne	p15, 0, r1, c7, c8, 0	@ VA to PA, ATS1CPR
+	mcreq	p15, 0, r1, c7, c8, 2	@ VA to PA, ATS1CUR
+	isb
+
+	@ Restore host state.
+	read_cp15_state store_to_vcpu = 1
+	write_cp15_state read_from_vcpu = 0
+
+	mrrc	p15, 0, r0, r1, c7	@ PAR
+	pop	{r4-r12}
+	bx	lr
+
 ENTRY(kvm_call_hyp)
 	hvc	#0
 	bx	lr
diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
new file mode 100644
index 0000000..d6a4ca0
--- /dev/null
+++ b/arch/arm/kvm/mmio.c
@@ -0,0 +1,154 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <asm/kvm_mmio.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_decode.h>
+#include <trace/events/kvm.h>
+
+#include "trace.h"
+
+/**
+ * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
+ * @vcpu: The VCPU pointer
+ * @run:  The VCPU run struct containing the mmio data
+ *
+ * This should only be called after returning from userspace for MMIO load
+ * emulation.
+ */
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	__u32 *dest;
+	unsigned int len;
+	int mask;
+
+	if (!run->mmio.is_write) {
+		dest = vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt);
+		memset(dest, 0, sizeof(int));
+
+		len = run->mmio.len;
+		if (len > 4)
+			return -EINVAL;
+
+		memcpy(dest, run->mmio.data, len);
+
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
+				*((u64 *)run->mmio.data));
+
+		if (vcpu->arch.mmio_decode.sign_extend && len < 4) {
+			mask = 1U << ((len * 8) - 1);
+			*dest = (*dest ^ mask) - mask;
+		}
+	}
+
+	return 0;
+}
+
+static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+		      struct kvm_exit_mmio *mmio)
+{
+	unsigned long rt, len;
+	bool is_write, sign_extend;
+
+	if ((vcpu->arch.hsr >> 8) & 1) {
+		/* cache operation on I/O addr, tell guest unsupported */
+		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
+		return 1;
+	}
+
+	if ((vcpu->arch.hsr >> 7) & 1) {
+		/* page table accesses IO mem: tell guest to fix its TTBR */
+		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
+		return 1;
+	}
+
+	switch ((vcpu->arch.hsr >> 22) & 0x3) {
+	case 0:
+		len = 1;
+		break;
+	case 1:
+		len = 2;
+		break;
+	case 2:
+		len = 4;
+		break;
+	default:
+		kvm_err("Hardware is weird: SAS 0b11 is reserved\n");
+		return -EFAULT;
+	}
+
+	is_write = vcpu->arch.hsr & HSR_WNR;
+	sign_extend = vcpu->arch.hsr & HSR_SSE;
+	rt = (vcpu->arch.hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
+
+	if (kvm_vcpu_reg_is_pc(vcpu, rt)) {
+		/* IO memory trying to read/write pc */
+		kvm_inject_pabt(vcpu, vcpu->arch.hxfar);
+		return 1;
+	}
+
+	mmio->is_write = is_write;
+	mmio->phys_addr = fault_ipa;
+	mmio->len = len;
+	vcpu->arch.mmio_decode.sign_extend = sign_extend;
+	vcpu->arch.mmio_decode.rt = rt;
+
+	/*
+	 * The MMIO instruction is emulated and should not be re-executed
+	 * in the guest.
+	 */
+	kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
+	return 0;
+}
+
+int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
+{
+	struct kvm_exit_mmio mmio;
+	unsigned long rt;
+	int ret;
+
+	/*
+	 * Prepare MMIO operation. First stash it in a private
+	 * structure that we can use for in-kernel emulation. If the
+	 * kernel can't handle it, copy it into run->mmio and let user
+	 * space do its magic.
+	 */
+
+	if (vcpu->arch.hsr & HSR_ISV) {
+		ret = decode_hsr(vcpu, fault_ipa, &mmio);
+		if (ret)
+			return ret;
+	} else {
+		ret = kvm_emulate_mmio_ls(vcpu, fault_ipa, &mmio);
+		if (ret)
+			return ret;
+	}
+
+	rt = vcpu->arch.mmio_decode.rt;
+	trace_kvm_mmio((mmio.is_write) ? KVM_TRACE_MMIO_WRITE :
+					 KVM_TRACE_MMIO_READ_UNSATISFIED,
+			mmio.len, fault_ipa,
+			(mmio.is_write) ? *vcpu_reg(vcpu, rt) : 0);
+
+	if (mmio.is_write)
+		memcpy(mmio.data, vcpu_reg(vcpu, rt), mmio.len);
+
+	kvm_prepare_mmio(run, &mmio);
+	return 0;
+}
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 0ce0e77..2a83ac9 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -19,11 +19,13 @@
 #include <linux/mman.h>
 #include <linux/kvm_host.h>
 #include <linux/io.h>
+#include <trace/events/kvm.h>
 #include <asm/idmap.h>
 #include <asm/pgalloc.h>
 #include <asm/cacheflush.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_mmio.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/mach/map.h>
@@ -620,8 +622,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			return -EFAULT;
 		}
 
-		kvm_pr_unimpl("I/O address abort...");
-		return 0;
+		/* Adjust page offset */
+		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
+		return io_mem_abort(vcpu, run, fault_ipa, memslot);
 	}
 
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 5d65751..cd52640 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -90,6 +90,27 @@ TRACE_EVENT(kvm_irq_line,
 		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
 );
 
+TRACE_EVENT(kvm_mmio_emulate,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
+		 unsigned long cpsr),
+	TP_ARGS(vcpu_pc, instr, cpsr),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+		__field(	unsigned long,	instr		)
+		__field(	unsigned long,	cpsr		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+		__entry->instr			= instr;
+		__entry->cpsr			= cpsr;
+	),
+
+	TP_printk("Emulate MMIO at: 0x%08lx (instr: %08lx, cpsr: %08lx)",
+		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
+);
+
 /* Architecturally implementation defined CP15 register access */
 TRACE_EVENT(kvm_emulate_cp15_imp,
 	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,

^ permalink raw reply related	[flat|nested] 160+ messages in thread
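
The "(*dest ^ mask) - mask" line in kvm_handle_mmio_return() above is the
usual branch-free sign-extension idiom.  A stand-alone sketch (plain
userspace C; the 0x80 value is just a made-up one-byte MMIO read) shows the
effect:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            unsigned int len = 1;               /* pretend 1-byte MMIO read */
            uint32_t dest = 0x80;               /* raw data, top bit set */
            int mask = 1U << ((len * 8) - 1);   /* sign bit for this width */

            dest = (dest ^ mask) - mask;        /* flip the sign bit, then subtract it */
            printf("%d (0x%08x)\n", (int)dest, dest);   /* prints -128 (0xffffff80) */
            return 0;
    }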

* [PATCH v5 14/14] KVM: ARM: Add maintainer entry for KVM/ARM
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-08 18:40   ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-08 18:40 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm; +Cc: Russell King

Add an entry in the MAINTAINERS file for KVM/ARM.

Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 MAINTAINERS |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index fa309ab..8349bac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4445,6 +4445,14 @@ F:	arch/s390/include/asm/kvm*
 F:	arch/s390/kvm/
 F:	drivers/s390/kvm/
 
+KERNEL VIRTUAL MACHINE (KVM) FOR ARM
+M:	Christoffer Dall <christofferdall@gmail.com>
+L:	kvmarm@lists.cs.columbia.edu
W:	http://systems.cs.columbia.edu/projects/kvm-arm
+S:	Supported
+F:	arch/arm/include/asm/kvm*
+F:	arch/arm/kvm/
+
 KEXEC
 M:	Eric Biederman <ebiederm@xmission.com>
 W:	http://kernel.org/pub/linux/utils/kernel/kexec/


^ permalink raw reply related	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 02/14] ARM: Section based HYP idmap
  2013-01-08 18:38   ` Christoffer Dall
@ 2013-01-14 10:27     ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-14 10:27 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Will Deacon

On Tue, Jan 08, 2013 at 01:38:48PM -0500, Christoffer Dall wrote:
> Add a method (hyp_idmap_setup) to populate a hyp pgd with an
> identity mapping of the code contained in the .hyp.idmap.text
> section.
> 
> Offer a method to drop this identity mapping through
> hyp_idmap_teardown.
> 
> Make all the above depend on CONFIG_ARM_VIRT_EXT and CONFIG_ARM_LPAE.
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/idmap.h                |    1 +
>  arch/arm/include/asm/pgtable-3level-hwdef.h |    1 +
>  arch/arm/kernel/vmlinux.lds.S               |    6 +++
>  arch/arm/mm/idmap.c                         |   54 ++++++++++++++++++++++-----
>  4 files changed, 50 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
> index bf863ed..1a66f907 100644
> --- a/arch/arm/include/asm/idmap.h
> +++ b/arch/arm/include/asm/idmap.h
> @@ -8,6 +8,7 @@
>  #define __idmap __section(.idmap.text) noinline notrace
>  
>  extern pgd_t *idmap_pgd;
> +extern pgd_t *hyp_pgd;
>  
>  void setup_mm_for_reboot(void);
>  
> diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
> index d795282..a2d404e 100644
> --- a/arch/arm/include/asm/pgtable-3level-hwdef.h
> +++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
> @@ -44,6 +44,7 @@
>  #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
>  #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
>  #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
> +#define PMD_SECT_AP1		(_AT(pmdval_t, 1) << 6)
>  #define PMD_SECT_TEX(x)		(_AT(pmdval_t, 0))
>  
>  /*
> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> index 11c1785..b571484 100644
> --- a/arch/arm/kernel/vmlinux.lds.S
> +++ b/arch/arm/kernel/vmlinux.lds.S
> @@ -19,7 +19,11 @@
>  	ALIGN_FUNCTION();						\
>  	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
>  	*(.idmap.text)							\
> -	VMLINUX_SYMBOL(__idmap_text_end) = .;
> +	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
> +	ALIGN_FUNCTION();						\
> +	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
> +	*(.hyp.idmap.text)						\
> +	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
>  
>  #ifdef CONFIG_HOTPLUG_CPU
>  #define ARM_CPU_DISCARD(x)
> diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
> index 99db769..d9213a5 100644
> --- a/arch/arm/mm/idmap.c
> +++ b/arch/arm/mm/idmap.c
> @@ -1,4 +1,6 @@
> +#include <linux/module.h>
>  #include <linux/kernel.h>
> +#include <linux/slab.h>
>  
>  #include <asm/cputype.h>
>  #include <asm/idmap.h>
> @@ -6,6 +8,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/sections.h>
>  #include <asm/system_info.h>
> +#include <asm/virt.h>
>  
>  pgd_t *idmap_pgd;
>  
> @@ -59,11 +62,20 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>  	} while (pud++, addr = next, addr != end);
>  }
>  
> -static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
> +static void identity_mapping_add(pgd_t *pgd, const char *text_start,
> +				 const char *text_end, unsigned long prot)
>  {
> -	unsigned long prot, next;
> +	unsigned long addr, end;
> +	unsigned long next;
> +
> +	addr = virt_to_phys(text_start);
> +	end = virt_to_phys(text_end);
How does this work with phys addresses greater than 32bit (with
LPAE)? This was the same before the patch too, but I am still
curious. Since __virt_to_phys() returns unsigned long kernel cannot be
put in high memory, right?


> +
> +	pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
> +		prot ? "HYP " : "",
> +		(long long)addr, (long long)end);
> +	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
>  
> -	prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
>  	if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
>  		prot |= PMD_BIT4;
>  
> @@ -74,28 +86,48 @@ static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long e
>  	} while (pgd++, addr = next, addr != end);
>  }
>  
> +#ifdef CONFIG_ARM_VIRT_EXT
> +pgd_t *hyp_pgd;
> +
> +extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
> +
> +static int __init init_static_idmap_hyp(void)
> +{
> +	hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
> +	if (!hyp_pgd)
> +		return -ENOMEM;
> +
> +	identity_mapping_add(hyp_pgd, __hyp_idmap_text_start,
> +			     __hyp_idmap_text_end, PMD_SECT_AP1);
> +
> +	return 0;
> +}
> +#else
> +static int __init init_static_idmap_hyp(void)
> +{
> +	return 0;
> +}
> +#endif
> +
>  extern char  __idmap_text_start[], __idmap_text_end[];
>  
>  static int __init init_static_idmap(void)
>  {
> -	phys_addr_t idmap_start, idmap_end;
> +	int ret;
>  
>  	idmap_pgd = pgd_alloc(&init_mm);
>  	if (!idmap_pgd)
>  		return -ENOMEM;
>  
> -	/* Add an identity mapping for the physical address of the section. */
> -	idmap_start = virt_to_phys((void *)__idmap_text_start);
> -	idmap_end = virt_to_phys((void *)__idmap_text_end);
> +	identity_mapping_add(idmap_pgd, __idmap_text_start,
> +			     __idmap_text_end, 0);
>  
> -	pr_info("Setting up static identity map for 0x%llx - 0x%llx\n",
> -		(long long)idmap_start, (long long)idmap_end);
> -	identity_mapping_add(idmap_pgd, idmap_start, idmap_end);
> +	ret = init_static_idmap_hyp();
>  
>  	/* Flush L1 for the hardware to see this page table content */
>  	flush_cache_louis();
>  
> -	return 0;
> +	return ret;
>  }
>  early_initcall(init_static_idmap);
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 02/14] ARM: Section based HYP idmap
  2013-01-14 10:27     ` Gleb Natapov
@ 2013-01-14 10:49       ` Will Deacon
  -1 siblings, 0 replies; 160+ messages in thread
From: Will Deacon @ 2013-01-14 10:49 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christoffer Dall, kvm, Marc Zyngier, Marcelo Tosatti, kvmarm,
	linux-arm-kernel

On Mon, Jan 14, 2013 at 10:27:21AM +0000, Gleb Natapov wrote:
> On Tue, Jan 08, 2013 at 01:38:48PM -0500, Christoffer Dall wrote:
> > Add a method (hyp_idmap_setup) to populate a hyp pgd with an
> > identity mapping of the code contained in the .hyp.idmap.text
> > section.
> > 
> > Offer a method to drop this identity mapping through
> > hyp_idmap_teardown.
> > 
> > Make all the above depend on CONFIG_ARM_VIRT_EXT and CONFIG_ARM_LPAE.
> > 
> > Cc: Will Deacon <will.deacon@arm.com>
> > Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> > ---
> >  arch/arm/include/asm/idmap.h                |    1 +
> >  arch/arm/include/asm/pgtable-3level-hwdef.h |    1 +
> >  arch/arm/kernel/vmlinux.lds.S               |    6 +++
> >  arch/arm/mm/idmap.c                         |   54 ++++++++++++++++++++++-----
> >  4 files changed, 50 insertions(+), 12 deletions(-)

[...]

> > -static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
> > +static void identity_mapping_add(pgd_t *pgd, const char *text_start,
> > +				 const char *text_end, unsigned long prot)
> >  {
> > -	unsigned long prot, next;
> > +	unsigned long addr, end;
> > +	unsigned long next;
> > +
> > +	addr = virt_to_phys(text_start);
> > +	end = virt_to_phys(text_end);
> How does this work with phys addresses greater than 32bit (with
> LPAE)? This was the same before the patch too, but I am still
> curious. Since __virt_to_phys() returns unsigned long kernel cannot be
> put in high memory, right?

Well, AArch32 (arch/arm/) only supports 32-bit virtual addresses by virtue
of the fact that our registers are only 32 bits wide, so we can't
identity-map physical addresses above the 4GB boundary.

You may want to look at the keystone patches from TI for insight about
kernels at high (>32-bit) addresses, although I've not seen any activity
around that for some time now (which is a pity, because the code-patching
stuff was in a good shape).

Will

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 02/14] ARM: Section based HYP idmap
  2013-01-14 10:49       ` Will Deacon
@ 2013-01-14 11:07         ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-14 11:07 UTC (permalink / raw)
  To: Will Deacon
  Cc: Christoffer Dall, kvm, Marc Zyngier, Marcelo Tosatti, kvmarm,
	linux-arm-kernel

On Mon, Jan 14, 2013 at 10:49:53AM +0000, Will Deacon wrote:
> On Mon, Jan 14, 2013 at 10:27:21AM +0000, Gleb Natapov wrote:
> > On Tue, Jan 08, 2013 at 01:38:48PM -0500, Christoffer Dall wrote:
> > > Add a method (hyp_idmap_setup) to populate a hyp pgd with an
> > > identity mapping of the code contained in the .hyp.idmap.text
> > > section.
> > > 
> > > Offer a method to drop this identity mapping through
> > > hyp_idmap_teardown.
> > > 
> > > Make all the above depend on CONFIG_ARM_VIRT_EXT and CONFIG_ARM_LPAE.
> > > 
> > > Cc: Will Deacon <will.deacon@arm.com>
> > > Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> > > ---
> > >  arch/arm/include/asm/idmap.h                |    1 +
> > >  arch/arm/include/asm/pgtable-3level-hwdef.h |    1 +
> > >  arch/arm/kernel/vmlinux.lds.S               |    6 +++
> > >  arch/arm/mm/idmap.c                         |   54 ++++++++++++++++++++++-----
> > >  4 files changed, 50 insertions(+), 12 deletions(-)
> 
> [...]
> 
> > > -static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
> > > +static void identity_mapping_add(pgd_t *pgd, const char *text_start,
> > > +				 const char *text_end, unsigned long prot)
> > >  {
> > > -	unsigned long prot, next;
> > > +	unsigned long addr, end;
> > > +	unsigned long next;
> > > +
> > > +	addr = virt_to_phys(text_start);
> > > +	end = virt_to_phys(text_end);
> > How does this work with phys addresses greater than 32bit (with
> > LPAE)? This was the same before the patch too, but I am still
> > curious. Since __virt_to_phys() returns unsigned long kernel cannot be
> > put in high memory, right?
> 
> Well, AArch32 (arch/arm/) only supports 32-bit virtual addresses by virtue
> of the fact that our registers are only 32 bits wide, so we can't
> identity-map physical addresses above the 4GB boundary.
> 
Ah, of course. This is the ident map, so by definition it cannot map phys
addresses above 4G. And since __virt_to_phys() is supposed to work only on
the ident map, it's OK for it to return unsigned long.


> You may want to look at the keystone patches from TI for insight about
> kernels at high (>32-bit) addresses, although I've not seen any activity
> around that for some time now (which is a pity, because the code-patching
> stuff was in a good shape).
> 
> Will

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 02/14] ARM: Section based HYP idmap
  2013-01-14 11:07         ` Gleb Natapov
@ 2013-01-14 13:07           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 160+ messages in thread
From: Russell King - ARM Linux @ 2013-01-14 13:07 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Will Deacon, Christoffer Dall, kvm, Marc Zyngier,
	Marcelo Tosatti, kvmarm, linux-arm-kernel

On Mon, Jan 14, 2013 at 01:07:56PM +0200, Gleb Natapov wrote:
> Ah, of course. This is the ident map, so by definition it cannot map phys
> addresses above 4G. And since __virt_to_phys() is supposed to work only on
> the ident map, it's OK for it to return unsigned long.

Let's get this right... the definitions assumed in these comments need
correcting.

Firstly, __virt_to_phys() is an ARM-arch private function.  Only
ARM-arch private code should be using it - it must not be used outside
that context.

Secondly, its "public" counterpart, virt_to_phys(), and __virt_to_phys()
itself are valid _only_ for what we call the "kernel direct map": the
addresses corresponding to the lowmem pages mapped from PAGE_OFFSET
up to the _start_ of vmalloc space.  No other mapping is valid for this.

That means that addresses in the identity mapping, if the identity
mapping is outside of the range {PAGE_OFFSET..vmalloc start}, are _not_
valid for virt_to_phys().

The same is true of their counterparts, __phys_to_virt() and
phys_to_virt().  These are _only_ valid for physical addresses
corresponding to the pages mapped in as "lowmem" and they will return
addresses for that mapping of the pages.

Both these functions are invalid when used on highmem pages.
*virt_to_phys() is invalid when used on pointers returned from
ioremap(), vmalloc(), vmap(), dma_alloc_coherent(), and any other
interface which remaps memory.
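
To make the rule concrete, a minimal sketch (the function name and the
kmalloc()/vmalloc() pairing are only illustrative stand-ins for "direct-map
memory" versus "remapped memory"):

    #include <linux/kernel.h>
    #include <linux/slab.h>
    #include <linux/vmalloc.h>
    #include <linux/io.h>               /* virt_to_phys()/phys_to_virt() */

    static void virt_phys_example(void)
    {
            void *lin = kmalloc(64, GFP_KERNEL);    /* lowmem: kernel direct map */
            void *rem = vmalloc(PAGE_SIZE);         /* remapped: not the direct map */
            phys_addr_t pa;

            if (lin) {
                    pa = virt_to_phys(lin);         /* valid: direct-map address */
                    WARN_ON(phys_to_virt(pa) != lin);
            }

            /*
             * Invalid: virt_to_phys(rem) is meaningless, because rem came from
             * a remapping interface; vmalloc_to_page(rem) is how you find the
             * underlying pages.
             */

            kfree(lin);
            vfree(rem);
    }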

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-08 18:38   ` Christoffer Dall
@ 2013-01-14 15:09     ` Will Deacon
  -1 siblings, 0 replies; 160+ messages in thread
From: Will Deacon @ 2013-01-14 15:09 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Tue, Jan 08, 2013 at 06:38:55PM +0000, Christoffer Dall wrote:
> Targets KVM support for Cortex A-15 processors.
> 
> Contains all the framework components, make files, header files, some
> tracing functionality, and basic user space API.
> 
> Only supported core is Cortex-A15 for now.
> 
> Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
> 
> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  Documentation/virtual/kvm/api.txt  |   57 +++++-
>  arch/arm/Kconfig                   |    2
>  arch/arm/Makefile                  |    1
>  arch/arm/include/asm/kvm_arm.h     |   24 ++
>  arch/arm/include/asm/kvm_asm.h     |   58 ++++++
>  arch/arm/include/asm/kvm_coproc.h  |   24 ++
>  arch/arm/include/asm/kvm_emulate.h |   50 +++++
>  arch/arm/include/asm/kvm_host.h    |  114 ++++++++++++
>  arch/arm/include/uapi/asm/kvm.h    |  106 +++++++++++

[...]

> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> new file mode 100644
> index 0000000..c6298b1
> --- /dev/null
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -0,0 +1,106 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_H__
> +#define __ARM_KVM_H__
> +
> +#include <asm/types.h>
> +#include <asm/ptrace.h>

I think you want linux/types.h, as asm/types.h isn't exported from what I
can tell. make headers_check screams about it too:

/home/will/sources/linux/linux/usr/include/asm/kvm.h:22: include of <linux/types.h> is preferred over <asm/types.h>
/home/will/sources/linux/linux/usr/include/asm/kvm.h:57: found __[us]{8,16,32,64} type without #include <linux/types.h>

Will

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 04/14] KVM: ARM: Hypervisor initialization
  2013-01-08 18:39   ` Christoffer Dall
@ 2013-01-14 15:11     ` Will Deacon
  -1 siblings, 0 replies; 160+ messages in thread
From: Will Deacon @ 2013-01-14 15:11 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti

On Tue, Jan 08, 2013 at 06:39:03PM +0000, Christoffer Dall wrote:
> Sets up KVM code to handle all exceptions taken to Hyp mode.
> 
> When the kernel is booted in Hyp mode, calling an hvc instruction with r0
> pointing to the new vectors, the HVBAR is changed to the the vector pointers.
> This allows subsystems (like KVM here) to execute code in Hyp-mode with the
> MMU disabled.
> 
> We initialize other Hyp-mode registers and enables the MMU for Hyp-mode from
> the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to
> point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode
> to perform a world-switch into a KVM guest.
> 
> Also provides memory mapping code to map required code pages, data structures,
> and I/O regions  accessed in Hyp mode at the same virtual address as the host
> kernel virtual addresses, but which conforms to the architectural requirements
> for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c
> and comprises:
>  - create_hyp_mappings(from, to);
>  - create_hyp_io_mappings(from, to, phys_addr);
>  - free_hyp_pmds();

[...]

> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 82cb338..2dddc58 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -34,11 +34,21 @@
>  #include <asm/ptrace.h>
>  #include <asm/mman.h>
>  #include <asm/cputype.h>
> +#include <asm/tlbflush.h>
> +#include <asm/virt.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_mmu.h>
> 
>  #ifdef REQUIRES_VIRT
>  __asm__(".arch_extension       virt");
>  #endif
> 
> +static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> +static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
> +static unsigned long hyp_default_vectors;
> +
> +
>  int kvm_arch_hardware_enable(void *garbage)
>  {
>         return 0;
> @@ -336,9 +346,176 @@ long kvm_arch_vm_ioctl(struct file *filp,
>         return -EINVAL;
>  }
> 
> +static void cpu_init_hyp_mode(void *vector)
> +{
> +       unsigned long long pgd_ptr;
> +       unsigned long hyp_stack_ptr;
> +       unsigned long stack_page;
> +       unsigned long vector_ptr;
> +
> +       /* Switch from the HYP stub to our own HYP init vector */
> +       __hyp_set_vectors((unsigned long)vector);
> +
> +       pgd_ptr = (unsigned long long)kvm_mmu_get_httbr();
> +       stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
> +       hyp_stack_ptr = stack_page + PAGE_SIZE;
> +       vector_ptr = (unsigned long)__kvm_hyp_vector;
> +
> +       /*
> +        * Call initialization code, and switch to the full blown
> +        * HYP code. The init code corrupts r12, so set the clobber
> +        * list accordingly.
> +        */
> +       asm volatile (
> +               "mov    r0, %[pgd_ptr_low]\n\t"
> +               "mov    r1, %[pgd_ptr_high]\n\t"
> +               "mov    r2, %[hyp_stack_ptr]\n\t"
> +               "mov    r3, %[vector_ptr]\n\t"
> +               "hvc    #0\n\t" : :
> +               [pgd_ptr_low] "r" ((unsigned long)(pgd_ptr & 0xffffffff)),
> +               [pgd_ptr_high] "r" ((unsigned long)(pgd_ptr >> 32ULL)),
> +               [hyp_stack_ptr] "r" (hyp_stack_ptr),
> +               [vector_ptr] "r" (vector_ptr) :
> +               "r0", "r1", "r2", "r3", "r12");
> +}

Use kvm_call_hyp here instead.
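
For illustration, one possible shape of that change, assuming the variadic
kvm_call_hyp() prototype from this series simply hands its first four
arguments to HYP in r0-r3 before the hvc (the pgd_low/pgd_high locals are
hypothetical names):

    unsigned long pgd_low  = (unsigned long)(pgd_ptr & 0xffffffff);
    unsigned long pgd_high = (unsigned long)(pgd_ptr >> 32ULL);

    /* HYP init expects r0/r1 = HTTBR low/high, r2 = stack top, r3 = vectors */
    kvm_call_hyp((void *)pgd_low, pgd_high, hyp_stack_ptr, vector_ptr);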

Will

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-14 15:09     ` Will Deacon
@ 2013-01-14 15:40       ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 15:40 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Mon, Jan 14, 2013 at 10:09 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Jan 08, 2013 at 06:38:55PM +0000, Christoffer Dall wrote:
>> Targets KVM support for Cortex A-15 processors.
>>
>> Contains all the framework components, make files, header files, some
>> tracing functionality, and basic user space API.
>>
>> Only supported core is Cortex-A15 for now.
>>
>> Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
>>
>> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
>> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  Documentation/virtual/kvm/api.txt  |   57 +++++-
>>  arch/arm/Kconfig                   |    2
>>  arch/arm/Makefile                  |    1
>>  arch/arm/include/asm/kvm_arm.h     |   24 ++
>>  arch/arm/include/asm/kvm_asm.h     |   58 ++++++
>>  arch/arm/include/asm/kvm_coproc.h  |   24 ++
>>  arch/arm/include/asm/kvm_emulate.h |   50 +++++
>>  arch/arm/include/asm/kvm_host.h    |  114 ++++++++++++
>>  arch/arm/include/uapi/asm/kvm.h    |  106 +++++++++++
>
> [...]
>
>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
>> new file mode 100644
>> index 0000000..c6298b1
>> --- /dev/null
>> +++ b/arch/arm/include/uapi/asm/kvm.h
>> @@ -0,0 +1,106 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +
>> +#ifndef __ARM_KVM_H__
>> +#define __ARM_KVM_H__
>> +
>> +#include <asm/types.h>
>> +#include <asm/ptrace.h>
>
> I think you want linux/types.h, as asm/types.h isn't exported from what I
> can tell. make headers_check screams about it too:
>
> /home/will/sources/linux/linux/usr/include/asm/kvm.h:22: include of <linux/types.h> is preferred over <asm/types.h>
> /home/will/sources/linux/linux/usr/include/asm/kvm.h:57: found __[us]{8,16,32,64} type without #include <linux/types.h>
>
right, fixed:

commit 4f880a3224b26a854736f19b21de9d457829940e
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Mon Jan 14 10:39:03 2013 -0500

    KVM: ARM: Include linux/types.h instead of asm/types.h

    Include the right header file.

    Cc: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 972b90d..236f528 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -19,7 +19,7 @@
 #ifndef __ARM_KVM_H__
 #define __ARM_KVM_H__

-#include <asm/types.h>
+#include <linux/types.h>
 #include <asm/ptrace.h>

 #define __KVM_HAVE_GUEST_DEBUG

--

Thanks,
-Christoffer

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 00/14] KVM/ARM Implementation
  2013-01-08 18:38 ` Christoffer Dall
@ 2013-01-14 16:00   ` Will Deacon
  -1 siblings, 0 replies; 160+ messages in thread
From: Will Deacon @ 2013-01-14 16:00 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, marc.zyngier, robherring2,
	mark.rutland, arnd

Hi Christoffer,

On Tue, Jan 08, 2013 at 06:38:34PM +0000, Christoffer Dall wrote:
> The following series implements KVM support for ARM processors,
> specifically on the Cortex-A15 platform.

[...]

This is looking pretty good to me now and I feel that the longer it stays
out-of-tree, the more issues will creep in (without continual effort from
yourself and others). I've sent some minor comments (mainly vgic-related)
so, if you fix those, then you can add:

  Reviewed-by: Will Deacon <will.deacon@arm.com>

for the series.

Now, there's a lot of code here and merging isn't completely
straightforward. I propose:

  * The first series should go via Russell's tree. It depends on my
    perf branch for the CPU type stuff, but that should go in for 3.9
    anyway (also via Russell).

  * The vGIC patches need rebasing on top of Rob Herring's work, which
    he sent a pull for over the weekend:

      http://lists.infradead.org/pipermail/linux-arm-kernel/2013-January/141488.html

    In light of that, this stuff will need to go via arm-soc.

  * The hyp arch-timers are in a similar situation to the vGIC: Mark Rutland
    is moving those into drivers:

      http://lists.infradead.org/pipermail/linux-arm-kernel/2013-January/140560.html

    so the kvm bits will need rebasing appropriately and also sent to
    arm-soc (Mark -- I assume you intend to send a PULL for 3.9 for those
    patches?)

Obviously this is all open for discussion, but that seems like the easiest
option to me.

Cheers,

Will

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 02/14] ARM: Section based HYP idmap
  2013-01-08 18:38   ` Christoffer Dall
@ 2013-01-14 16:13     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 160+ messages in thread
From: Russell King - ARM Linux @ 2013-01-14 16:13 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Will Deacon

On Tue, Jan 08, 2013 at 01:38:48PM -0500, Christoffer Dall wrote:
> +	pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
> +		prot ? "HYP " : "",
> +		(long long)addr, (long long)end);

There's no point using 0x%llx and casting to 64-bit longs if the arguments
are always going to be 32-bit.
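
For example, if addr and end are plain unsigned long here, something like
this (untested sketch, not the actual fix) would avoid the casts entirely:

	pr_info("Setting up static %sidentity map for 0x%lx - 0x%lx\n",
		prot ? "HYP " : "", addr, end);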

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-08 18:38   ` Christoffer Dall
@ 2013-01-14 16:24     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 160+ messages in thread
From: Russell King - ARM Linux @ 2013-01-14 16:24 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
> +	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
> +	for (i = 0; i < sizeof(init->features)*8; i++) {
> +		if (init->features[i / 32] & (1 << (i % 32))) {

Isn't this an open-coded version of test_bit() ?
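
For example, something like this (untested sketch; note that init->features
is an array of __u32, so the pointer cast needs care) would use test_bit():

	for (i = 0; i < sizeof(init->features) * 8; i++) {
		if (test_bit(i, (unsigned long *)init->features)) {
			if (i >= KVM_VCPU_MAX_FEATURES)
				return -ENOENT;
			set_bit(i, vcpu->arch.features);
		}
	}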

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 04/14] KVM: ARM: Hypervisor initialization
  2013-01-14 15:11     ` Will Deacon
@ 2013-01-14 16:35       ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 16:35 UTC (permalink / raw)
  To: Will Deacon; +Cc: Marc Zyngier, Marcelo Tosatti, linux-arm-kernel, kvm, kvmarm

On Mon, Jan 14, 2013 at 10:11 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Jan 08, 2013 at 06:39:03PM +0000, Christoffer Dall wrote:
>> Sets up KVM code to handle all exceptions taken to Hyp mode.
>>
>> When the kernel is booted in Hyp mode, calling an hvc instruction with r0
>> pointing to the new vectors, the HVBAR is changed to the vector pointers.
>> This allows subsystems (like KVM here) to execute code in Hyp-mode with the
>> MMU disabled.
>>
>> We initialize other Hyp-mode registers and enable the MMU for Hyp-mode from
>> the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to
>> point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode
>> to perform a world-switch into a KVM guest.
>>
>> Also provides memory mapping code to map required code pages, data structures,
>> and I/O regions  accessed in Hyp mode at the same virtual address as the host
>> kernel virtual addresses, but which conforms to the architectural requirements
>> for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c
>> and comprises:
>>  - create_hyp_mappings(from, to);
>>  - create_hyp_io_mappings(from, to, phys_addr);
>>  - free_hyp_pmds();
>
> [...]
>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 82cb338..2dddc58 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -34,11 +34,21 @@
>>  #include <asm/ptrace.h>
>>  #include <asm/mman.h>
>>  #include <asm/cputype.h>
>> +#include <asm/tlbflush.h>
>> +#include <asm/virt.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/kvm_asm.h>
>> +#include <asm/kvm_mmu.h>
>>
>>  #ifdef REQUIRES_VIRT
>>  __asm__(".arch_extension       virt");
>>  #endif
>>
>> +static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>> +static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>> +static unsigned long hyp_default_vectors;
>> +
>> +
>>  int kvm_arch_hardware_enable(void *garbage)
>>  {
>>         return 0;
>> @@ -336,9 +346,176 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>         return -EINVAL;
>>  }
>>
>> +static void cpu_init_hyp_mode(void *vector)
>> +{
>> +       unsigned long long pgd_ptr;
>> +       unsigned long hyp_stack_ptr;
>> +       unsigned long stack_page;
>> +       unsigned long vector_ptr;
>> +
>> +       /* Switch from the HYP stub to our own HYP init vector */
>> +       __hyp_set_vectors((unsigned long)vector);
>> +
>> +       pgd_ptr = (unsigned long long)kvm_mmu_get_httbr();
>> +       stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
>> +       hyp_stack_ptr = stack_page + PAGE_SIZE;
>> +       vector_ptr = (unsigned long)__kvm_hyp_vector;
>> +
>> +       /*
>> +        * Call initialization code, and switch to the full blown
>> +        * HYP code. The init code corrupts r12, so set the clobber
>> +        * list accordingly.
>> +        */
>> +       asm volatile (
>> +               "mov    r0, %[pgd_ptr_low]\n\t"
>> +               "mov    r1, %[pgd_ptr_high]\n\t"
>> +               "mov    r2, %[hyp_stack_ptr]\n\t"
>> +               "mov    r3, %[vector_ptr]\n\t"
>> +               "hvc    #0\n\t" : :
>> +               [pgd_ptr_low] "r" ((unsigned long)(pgd_ptr & 0xffffffff)),
>> +               [pgd_ptr_high] "r" ((unsigned long)(pgd_ptr >> 32ULL)),
>> +               [hyp_stack_ptr] "r" (hyp_stack_ptr),
>> +               [vector_ptr] "r" (vector_ptr) :
>> +               "r0", "r1", "r2", "r3", "r12");
>> +}
>
> Use kvm_call_hyp here instead.
>
good idea:

commit 00e22196205800ce9caa561e7c806023f4915138
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Mon Jan 14 11:32:36 2013 -0500

    KVM: ARM: Reuse kvm_call_hyp in cpu_init_hyp_mode

    Instead of directly and manually issuing the hypercall into the KVM init
    code, use the kvm_call_hyp function, which only requires a small abuse
    of the prototype in exchange for much nicer C code.

    Cc: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6997326..b5c6ab1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -971,6 +971,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 static void cpu_init_hyp_mode(void *vector)
 {
 	unsigned long long pgd_ptr;
+	unsigned long pgd_low, pgd_high;
 	unsigned long hyp_stack_ptr;
 	unsigned long stack_page;
 	unsigned long vector_ptr;
@@ -979,26 +980,20 @@ static void cpu_init_hyp_mode(void *vector)
 	__hyp_set_vectors((unsigned long)vector);

 	pgd_ptr = (unsigned long long)kvm_mmu_get_httbr();
+	pgd_low = (pgd_ptr & ((1ULL << 32) - 1));
+	pgd_high = (pgd_ptr >> 32ULL);
 	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
 	hyp_stack_ptr = stack_page + PAGE_SIZE;
 	vector_ptr = (unsigned long)__kvm_hyp_vector;

 	/*
 	 * Call initialization code, and switch to the full blown
-	 * HYP code. The init code corrupts r12, so set the clobber
-	 * list accordingly.
+	 * HYP code. The init code doesn't need to preserve these registers as
+	 * r1-r3 and r12 are caller-saved according to the AAPCS.
+	 * Note that we slightly misuse the prototype by casting pgd_low to
+	 * a void *.
 	 */
-	asm volatile (
-		"mov	r0, %[pgd_ptr_low]\n\t"
-		"mov	r1, %[pgd_ptr_high]\n\t"
-		"mov	r2, %[hyp_stack_ptr]\n\t"
-		"mov	r3, %[vector_ptr]\n\t"
-		"hvc	#0\n\t" : :
-		[pgd_ptr_low] "r" ((unsigned long)(pgd_ptr & 0xffffffff)),
-		[pgd_ptr_high] "r" ((unsigned long)(pgd_ptr >> 32ULL)),
-		[hyp_stack_ptr] "r" (hyp_stack_ptr),
-		[vector_ptr] "r" (vector_ptr) :
-		"r0", "r1", "r2", "r3", "r12");
+	kvm_call_hyp((void *)pgd_low, pgd_high, hyp_stack_ptr, vector_ptr);
 }

 /**
--

Thanks,
-Christoffer

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 08/14] KVM: ARM: Emulation framework and CP15 emulation
  2013-01-08 18:39   ` Christoffer Dall
@ 2013-01-14 16:36     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 160+ messages in thread
From: Russell King - ARM Linux @ 2013-01-14 16:36 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti, Rusty Russell

On Tue, Jan 08, 2013 at 01:39:31PM -0500, Christoffer Dall wrote:
> +	/*
> +	 * Check whether this vcpu requires the cache to be flushed on
> +	 * this physical CPU. This is a consequence of doing dcache
> +	 * operations by set/way on this vcpu. We do it here to be in
> +	 * a non-preemptible section.
> +	 */
> +	if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
> +		cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);

There is cpumask_test_and_clear_cpu() which may be better for this.
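
For example, an untested sketch of that check using it:

	if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */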

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-08 18:40   ` Christoffer Dall
@ 2013-01-14 16:43     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 160+ messages in thread
From: Russell King - ARM Linux @ 2013-01-14 16:43 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
> diff --git a/arch/arm/kvm/decode.c b/arch/arm/kvm/decode.c
> new file mode 100644
> index 0000000..469cf14
> --- /dev/null
> +++ b/arch/arm/kvm/decode.c
> @@ -0,0 +1,462 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_mmio.h>
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_decode.h>
> +#include <trace/events/kvm.h>
> +
> +#include "trace.h"
> +
> +struct arm_instr {
> +	/* Instruction decoding */
> +	u32 opc;
> +	u32 opc_mask;
> +
> +	/* Decoding for the register write back */
> +	bool register_form;
> +	u32 imm;
> +	u8 Rm;
> +	u8 type;
> +	u8 shift_n;
> +
> +	/* Common decoding */
> +	u8 len;
> +	bool sign_extend;
> +	bool w;
> +
> +	bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +		       unsigned long instr, struct arm_instr *ai);
> +};
> +
> +enum SRType {
> +	SRType_LSL,
> +	SRType_LSR,
> +	SRType_ASR,
> +	SRType_ROR,
> +	SRType_RRX
> +};
> +
> +/* Modelled after DecodeImmShift() in the ARM ARM */
> +static enum SRType decode_imm_shift(u8 type, u8 imm5, u8 *amount)
> +{
> +	switch (type) {
> +	case 0x0:
> +		*amount = imm5;
> +		return SRType_LSL;
> +	case 0x1:
> +		*amount = (imm5 == 0) ? 32 : imm5;
> +		return SRType_LSR;
> +	case 0x2:
> +		*amount = (imm5 == 0) ? 32 : imm5;
> +		return SRType_ASR;
> +	case 0x3:
> +		if (imm5 == 0) {
> +			*amount = 1;
> +			return SRType_RRX;
> +		} else {
> +			*amount = imm5;
> +			return SRType_ROR;
> +		}
> +	}
> +
> +	return SRType_LSL;
> +}
> +
> +/* Modelled after Shift() in the ARM ARM */
> +static u32 shift(u32 value, u8 N, enum SRType type, u8 amount, bool carry_in)
> +{
> +	u32 mask = (1 << N) - 1;
> +	s32 svalue = (s32)value;
> +
> +	BUG_ON(N > 32);
> +	BUG_ON(type == SRType_RRX && amount != 1);
> +	BUG_ON(amount > N);
> +
> +	if (amount == 0)
> +		return value;
> +
> +	switch (type) {
> +	case SRType_LSL:
> +		value <<= amount;
> +		break;
> +	case SRType_LSR:
> +		 value >>= amount;
> +		break;
> +	case SRType_ASR:
> +		if (value & (1 << (N - 1)))
> +			svalue |= ((-1UL) << N);
> +		value = svalue >> amount;
> +		break;
> +	case SRType_ROR:
> +		value = (value >> amount) | (value << (N - amount));
> +		break;
> +	case SRType_RRX: {
> +		u32 C = (carry_in) ? 1 : 0;
> +		value = (value >> 1) | (C << (N - 1));
> +		break;
> +	}
> +	}
> +
> +	return value & mask;
> +}
> +
> +static bool decode_arm_wb(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +			  unsigned long instr, const struct arm_instr *ai)
> +{
> +	u8 Rt = (instr >> 12) & 0xf;
> +	u8 Rn = (instr >> 16) & 0xf;
> +	u8 W = (instr >> 21) & 1;
> +	u8 U = (instr >> 23) & 1;
> +	u8 P = (instr >> 24) & 1;
> +	u32 base_addr = *kvm_decode_reg(decode, Rn);
> +	u32 offset_addr, offset;
> +
> +	/*
> +	 * Technically this is allowed in certain circumstances,
> +	 * but we don't support it.
> +	 */
> +	if (Rt == 15 || Rn == 15)
> +		return false;
> +
> +	if (P && !W) {
> +		kvm_err("Decoding operation with valid ISV?\n");
> +		return false;
> +	}
> +
> +	decode->rt = Rt;
> +
> +	if (ai->register_form) {
> +		/* Register operation */
> +		enum SRType s_type;
> +		u8 shift_n = 0;
> +		bool c_bit = *kvm_decode_cpsr(decode) & PSR_C_BIT;
> +		u32 s_reg = *kvm_decode_reg(decode, ai->Rm);
> +
> +		s_type = decode_imm_shift(ai->type, ai->shift_n, &shift_n);
> +		offset = shift(s_reg, 5, s_type, shift_n, c_bit);
> +	} else {
> +		/* Immediate operation */
> +		offset = ai->imm;
> +	}
> +
> +	/* Handle Writeback */
> +	if (U)
> +		offset_addr = base_addr + offset;
> +	else
> +		offset_addr = base_addr - offset;
> +	*kvm_decode_reg(decode, Rn) = offset_addr;
> +	return true;
> +}
> +
> +static bool decode_arm_ls(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +			  unsigned long instr, struct arm_instr *ai)
> +{
> +	u8 A = (instr >> 25) & 1;
> +
> +	mmio->is_write = ai->w;
> +	mmio->len = ai->len;
> +	decode->sign_extend = false;
> +
> +	ai->register_form = A;
> +	ai->imm = instr & 0xfff;
> +	ai->Rm = instr & 0xf;
> +	ai->type = (instr >> 5) & 0x3;
> +	ai->shift_n = (instr >> 7) & 0x1f;
> +
> +	return decode_arm_wb(decode, mmio, instr, ai);
> +}
> +
> +static bool decode_arm_extra(struct kvm_decode *decode,
> +			     struct kvm_exit_mmio *mmio,
> +			     unsigned long instr, struct arm_instr *ai)
> +{
> +	mmio->is_write = ai->w;
> +	mmio->len = ai->len;
> +	decode->sign_extend = ai->sign_extend;
> +
> +	ai->register_form = !((instr >> 22) & 1);
> +	ai->imm = ((instr >> 4) & 0xf0) | (instr & 0xf);
> +	ai->Rm = instr & 0xf;
> +	ai->type = 0; /* SRType_LSL */
> +	ai->shift_n = 0;
> +
> +	return decode_arm_wb(decode, mmio, instr, ai);
> +}
> +
> +/*
> + * The encodings in this table assume that a fault was generated where the
> + * ISV field in the HSR was clear, and the decoding information was invalid,
> + * which means that a register write-back occurred, the PC was used as the
> + * destination or a load/store multiple operation was used. Since the latter
> + * two cases are crazy for MMIO on the guest side, we simply inject a fault
> + * when this happens and support the common case.
> + *
> + * We treat unprivileged loads and stores of words and bytes like all other
> + * loads and stores as their encodings mandate the W bit set and the P bit
> + * clear.
> + */
> +static const struct arm_instr arm_instr[] = {
> +	/**************** Load/Store Word and Byte **********************/
> +	/* Store word with writeback */
> +	{ .opc = 0x04000000, .opc_mask = 0x0c500000, .len = 4, .w = true,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +	/* Store byte with writeback */
> +	{ .opc = 0x04400000, .opc_mask = 0x0c500000, .len = 1, .w = true,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +	/* Load word with writeback */
> +	{ .opc = 0x04100000, .opc_mask = 0x0c500000, .len = 4, .w = false,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +	/* Load byte with writeback */
> +	{ .opc = 0x04500000, .opc_mask = 0x0c500000, .len = 1, .w = false,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +
> +	/*************** Extra load/store instructions ******************/
> +
> +	/* Store halfword with writeback */
> +	{ .opc = 0x000000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = true,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load halfword with writeback */
> +	{ .opc = 0x001000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +
> +	/* Load dual with writeback */
> +	{ .opc = 0x000000d0, .opc_mask = 0x0c1000f0, .len = 8, .w = false,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load signed byte with writeback */
> +	{ .opc = 0x001000d0, .opc_mask = 0x0c1000f0, .len = 1, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },
> +
> +	/* Store dual with writeback */
> +	{ .opc = 0x000000f0, .opc_mask = 0x0c1000f0, .len = 8, .w = true,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load signed halfword with writeback */
> +	{ .opc = 0x001000f0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },
> +
> +	/* Store halfword unprivileged */
> +	{ .opc = 0x002000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = true,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load halfword unprivileged */
> +	{ .opc = 0x003000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load signed byte unprivileged */
> +	{ .opc = 0x003000d0, .opc_mask = 0x0f3000f0, .len = 1, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },
> +	/* Load signed halfword unprivileged */
> +	{ .opc = 0x003000f0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },

So here, yet again, we end up with more code decoding the ARM load/store
instructions so that we can do something with them.  How many places do
we now have in the ARM kernel doing this exact same thing?  Do we really
need to keep rewriting this functionality each time a feature that needs
it gets implemented, or is _someone_ going to sort this out once and for
all?

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 02/14] ARM: Section based HYP idmap
  2013-01-14 16:13     ` Russell King - ARM Linux
@ 2013-01-14 17:09       ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 17:09 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Will Deacon

On Mon, Jan 14, 2013 at 11:13 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Jan 08, 2013 at 01:38:48PM -0500, Christoffer Dall wrote:
>> +     pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
>> +             prot ? "HYP " : "",
>> +             (long long)addr, (long long)end);
>
> There's no point using 0x%llx and casting to 64-bit longs if the arguments
> are always going to be 32-bit.

true, that's silly. This should improve the code, as suggested by Will:

commit 1baa03f3d70f082e4522fd32db09e4f5542ff48d
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Mon Jan 14 12:06:26 2013 -0500

    ARM: idmap: cleanup pr_info

    It's cleaner to simply print the info messages in the callers and
    there's no reason to convert 32-bit values to 64-bit values.

    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index d9213a5..b9ae344 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -71,9 +71,6 @@ static void identity_mapping_add(pgd_t *pgd, const char *text_start,
 	addr = virt_to_phys(text_start);
 	end = virt_to_phys(text_end);

-	pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
-		prot ? "HYP " : "",
-		(long long)addr, (long long)end);
 	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;

 	if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
@@ -97,6 +94,8 @@ static int __init init_static_idmap_hyp(void)
 	if (!hyp_pgd)
 		return -ENOMEM;

+	pr_info("Setting up static HYP identity map for 0x%p - 0x%p\n",
+		__hyp_idmap_text_start, __hyp_idmap_text_end);
 	identity_mapping_add(hyp_pgd, __hyp_idmap_text_start,
 			     __hyp_idmap_text_end, PMD_SECT_AP1);

@@ -119,6 +118,8 @@ static int __init init_static_idmap(void)
 	if (!idmap_pgd)
 		return -ENOMEM;

+	pr_info("Setting up static identity map for 0x%p - 0x%p\n",
+		__idmap_text_start, __idmap_text_end);
 	identity_mapping_add(idmap_pgd, __idmap_text_start,
 			     __idmap_text_end, 0);

--

-Christoffer

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-14 16:24     ` Russell King - ARM Linux
@ 2013-01-14 17:33       ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 17:33 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Mon, Jan 14, 2013 at 11:24 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
>> +     /* -ENOENT for unknown features, -EINVAL for invalid combinations. */
>> +     for (i = 0; i < sizeof(init->features)*8; i++) {
>> +             if (init->features[i / 32] & (1 << (i % 32))) {
>
> Isn't this an open-coded version of test_bit() ?

indeed, nicely spotted:

commit 608588674144e403ad0ea3c93066f3175bd5cf88
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Mon Jan 14 12:30:07 2013 -0500

    KVM: ARM: Use test-bit instead of open-coded version

    Makes the code more readable; it also adds spaces around the asterisk.

    Cc: Russell King <linux@arm.linux.org.uk>
    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 65ae563..2339d96 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -193,8 +193,8 @@ int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);

 	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
-	for (i = 0; i < sizeof(init->features)*8; i++) {
-		if (init->features[i / 32] & (1 << (i % 32))) {
+	for (i = 0; i < sizeof(init->features) * 8; i++) {
+		if (test_bit(i, (void *)init->features)) {
 			if (i >= KVM_VCPU_MAX_FEATURES)
 				return -ENOENT;
 			set_bit(i, vcpu->arch.features);
--

Thanks,
-Christoffer

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 08/14] KVM: ARM: Emulation framework and CP15 emulation
  2013-01-14 16:36     ` Russell King - ARM Linux
@ 2013-01-14 17:38       ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 17:38 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti, Rusty Russell

On Mon, Jan 14, 2013 at 11:36 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Jan 08, 2013 at 01:39:31PM -0500, Christoffer Dall wrote:
>> +     /*
>> +      * Check whether this vcpu requires the cache to be flushed on
>> +      * this physical CPU. This is a consequence of doing dcache
>> +      * operations by set/way on this vcpu. We do it here to be in
>> +      * a non-preemptible section.
>> +      */
>> +     if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
>> +             cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
>
> There is cpumask_test_and_clear_cpu() which may be better for this.

nice:

commit d31686fadb74ad564f6a5acabdebe411de86d77d
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Mon Jan 14 12:36:53 2013 -0500

    KVM: ARM: Use cpumask_test_and_clear_cpu

    Nicer shorter cleaner code. Ahhhh.

    Cc: Russell King <linux@arm.linux.org.uk>
    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index b5c6ab1..fdd4a7c 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -352,10 +352,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	 * operations by set/way on this vcpu. We do it here to be in
 	 * a non-preemptible section.
 	 */
-	if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
-		cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
+	if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
 		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
-	}

 	kvm_arm_set_running_vcpu(vcpu);
 }

--

Thanks,
-Christoffer

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 16:43     ` Russell King - ARM Linux
@ 2013-01-14 18:25       ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 18:25 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Mon, Jan 14, 2013 at 11:43 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
>> diff --git a/arch/arm/kvm/decode.c b/arch/arm/kvm/decode.c
>> new file mode 100644
>> index 0000000..469cf14
>> --- /dev/null
>> +++ b/arch/arm/kvm/decode.c
>> @@ -0,0 +1,462 @@
>> +/*
>> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
>> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>> + */
>> +#include <linux/kvm_host.h>
>> +#include <asm/kvm_mmio.h>
>> +#include <asm/kvm_emulate.h>
>> +#include <asm/kvm_decode.h>
>> +#include <trace/events/kvm.h>
>> +
>> +#include "trace.h"
>> +
>> +struct arm_instr {
>> +     /* Instruction decoding */
>> +     u32 opc;
>> +     u32 opc_mask;
>> +
>> +     /* Decoding for the register write back */
>> +     bool register_form;
>> +     u32 imm;
>> +     u8 Rm;
>> +     u8 type;
>> +     u8 shift_n;
>> +
>> +     /* Common decoding */
>> +     u8 len;
>> +     bool sign_extend;
>> +     bool w;
>> +
>> +     bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
>> +                    unsigned long instr, struct arm_instr *ai);
>> +};
>> +
>> +enum SRType {
>> +     SRType_LSL,
>> +     SRType_LSR,
>> +     SRType_ASR,
>> +     SRType_ROR,
>> +     SRType_RRX
>> +};
>> +
>> +/* Modelled after DecodeImmShift() in the ARM ARM */
>> +static enum SRType decode_imm_shift(u8 type, u8 imm5, u8 *amount)
>> +{
>> +     switch (type) {
>> +     case 0x0:
>> +             *amount = imm5;
>> +             return SRType_LSL;
>> +     case 0x1:
>> +             *amount = (imm5 == 0) ? 32 : imm5;
>> +             return SRType_LSR;
>> +     case 0x2:
>> +             *amount = (imm5 == 0) ? 32 : imm5;
>> +             return SRType_ASR;
>> +     case 0x3:
>> +             if (imm5 == 0) {
>> +                     *amount = 1;
>> +                     return SRType_RRX;
>> +             } else {
>> +                     *amount = imm5;
>> +                     return SRType_ROR;
>> +             }
>> +     }
>> +
>> +     return SRType_LSL;
>> +}
>> +
>> +/* Modelled after Shift() in the ARM ARM */
>> +static u32 shift(u32 value, u8 N, enum SRType type, u8 amount, bool carry_in)
>> +{
>> +     u32 mask = (1 << N) - 1;
>> +     s32 svalue = (s32)value;
>> +
>> +     BUG_ON(N > 32);
>> +     BUG_ON(type == SRType_RRX && amount != 1);
>> +     BUG_ON(amount > N);
>> +
>> +     if (amount == 0)
>> +             return value;
>> +
>> +     switch (type) {
>> +     case SRType_LSL:
>> +             value <<= amount;
>> +             break;
>> +     case SRType_LSR:
>> +              value >>= amount;
>> +             break;
>> +     case SRType_ASR:
>> +             if (value & (1 << (N - 1)))
>> +                     svalue |= ((-1UL) << N);
>> +             value = svalue >> amount;
>> +             break;
>> +     case SRType_ROR:
>> +             value = (value >> amount) | (value << (N - amount));
>> +             break;
>> +     case SRType_RRX: {
>> +             u32 C = (carry_in) ? 1 : 0;
>> +             value = (value >> 1) | (C << (N - 1));
>> +             break;
>> +     }
>> +     }
>> +
>> +     return value & mask;
>> +}
>> +
>> +static bool decode_arm_wb(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
>> +                       unsigned long instr, const struct arm_instr *ai)
>> +{
>> +     u8 Rt = (instr >> 12) & 0xf;
>> +     u8 Rn = (instr >> 16) & 0xf;
>> +     u8 W = (instr >> 21) & 1;
>> +     u8 U = (instr >> 23) & 1;
>> +     u8 P = (instr >> 24) & 1;
>> +     u32 base_addr = *kvm_decode_reg(decode, Rn);
>> +     u32 offset_addr, offset;
>> +
>> +     /*
>> +      * Technically this is allowed in certain circumstances,
>> +      * but we don't support it.
>> +      */
>> +     if (Rt == 15 || Rn == 15)
>> +             return false;
>> +
>> +     if (P && !W) {
>> +             kvm_err("Decoding operation with valid ISV?\n");
>> +             return false;
>> +     }
>> +
>> +     decode->rt = Rt;
>> +
>> +     if (ai->register_form) {
>> +             /* Register operation */
>> +             enum SRType s_type;
>> +             u8 shift_n = 0;
>> +             bool c_bit = *kvm_decode_cpsr(decode) & PSR_C_BIT;
>> +             u32 s_reg = *kvm_decode_reg(decode, ai->Rm);
>> +
>> +             s_type = decode_imm_shift(ai->type, ai->shift_n, &shift_n);
>> +             offset = shift(s_reg, 5, s_type, shift_n, c_bit);
>> +     } else {
>> +             /* Immediate operation */
>> +             offset = ai->imm;
>> +     }
>> +
>> +     /* Handle Writeback */
>> +     if (U)
>> +             offset_addr = base_addr + offset;
>> +     else
>> +             offset_addr = base_addr - offset;
>> +     *kvm_decode_reg(decode, Rn) = offset_addr;
>> +     return true;
>> +}
>> +
>> +static bool decode_arm_ls(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
>> +                       unsigned long instr, struct arm_instr *ai)
>> +{
>> +     u8 A = (instr >> 25) & 1;
>> +
>> +     mmio->is_write = ai->w;
>> +     mmio->len = ai->len;
>> +     decode->sign_extend = false;
>> +
>> +     ai->register_form = A;
>> +     ai->imm = instr & 0xfff;
>> +     ai->Rm = instr & 0xf;
>> +     ai->type = (instr >> 5) & 0x3;
>> +     ai->shift_n = (instr >> 7) & 0x1f;
>> +
>> +     return decode_arm_wb(decode, mmio, instr, ai);
>> +}
>> +
>> +static bool decode_arm_extra(struct kvm_decode *decode,
>> +                          struct kvm_exit_mmio *mmio,
>> +                          unsigned long instr, struct arm_instr *ai)
>> +{
>> +     mmio->is_write = ai->w;
>> +     mmio->len = ai->len;
>> +     decode->sign_extend = ai->sign_extend;
>> +
>> +     ai->register_form = !((instr >> 22) & 1);
>> +     ai->imm = ((instr >> 4) & 0xf0) | (instr & 0xf);
>> +     ai->Rm = instr & 0xf;
>> +     ai->type = 0; /* SRType_LSL */
>> +     ai->shift_n = 0;
>> +
>> +     return decode_arm_wb(decode, mmio, instr, ai);
>> +}
>> +
>> +/*
>> + * The encodings in this table assume that a fault was generated where the
>> + * ISV field in the HSR was clear, and the decoding information was invalid,
>> + * which means that a register write-back occurred, the PC was used as the
>> + * destination or a load/store multiple operation was used. Since the latter
>> + * two cases are crazy for MMIO on the guest side, we simply inject a fault
>> + * when this happens and support the common case.
>> + *
>> + * We treat unprivileged loads and stores of words and bytes like all other
>> + * loads and stores as their encodings mandate the W bit set and the P bit
>> + * clear.
>> + */
>> +static const struct arm_instr arm_instr[] = {
>> +     /**************** Load/Store Word and Byte **********************/
>> +     /* Store word with writeback */
>> +     { .opc = 0x04000000, .opc_mask = 0x0c500000, .len = 4, .w = true,
>> +             .sign_extend = false, .decode = decode_arm_ls },
>> +     /* Store byte with writeback */
>> +     { .opc = 0x04400000, .opc_mask = 0x0c500000, .len = 1, .w = true,
>> +             .sign_extend = false, .decode = decode_arm_ls },
>> +     /* Load word with writeback */
>> +     { .opc = 0x04100000, .opc_mask = 0x0c500000, .len = 4, .w = false,
>> +             .sign_extend = false, .decode = decode_arm_ls },
>> +     /* Load byte with writeback */
>> +     { .opc = 0x04500000, .opc_mask = 0x0c500000, .len = 1, .w = false,
>> +             .sign_extend = false, .decode = decode_arm_ls },
>> +
>> +     /*************** Extra load/store instructions ******************/
>> +
>> +     /* Store halfword with writeback */
>> +     { .opc = 0x000000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = true,
>> +             .sign_extend = false, .decode = decode_arm_extra },
>> +     /* Load halfword with writeback */
>> +     { .opc = 0x001000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
>> +             .sign_extend = false, .decode = decode_arm_extra },
>> +
>> +     /* Load dual with writeback */
>> +     { .opc = 0x000000d0, .opc_mask = 0x0c1000f0, .len = 8, .w = false,
>> +             .sign_extend = false, .decode = decode_arm_extra },
>> +     /* Load signed byte with writeback */
>> +     { .opc = 0x001000d0, .opc_mask = 0x0c1000f0, .len = 1, .w = false,
>> +             .sign_extend = true,  .decode = decode_arm_extra },
>> +
>> +     /* Store dual with writeback */
>> +     { .opc = 0x000000f0, .opc_mask = 0x0c1000f0, .len = 8, .w = true,
>> +             .sign_extend = false, .decode = decode_arm_extra },
>> +     /* Load signed halfword with writeback */
>> +     { .opc = 0x001000f0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
>> +             .sign_extend = true,  .decode = decode_arm_extra },
>> +
>> +     /* Store halfword unprivileged */
>> +     { .opc = 0x002000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = true,
>> +             .sign_extend = false, .decode = decode_arm_extra },
>> +     /* Load halfword unprivileged */
>> +     { .opc = 0x003000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
>> +             .sign_extend = false, .decode = decode_arm_extra },
>> +     /* Load signed byte unprivileged */
>> +     { .opc = 0x003000d0, .opc_mask = 0x0f3000f0, .len = 1, .w = false,
>> +             .sign_extend = true , .decode = decode_arm_extra },
>> +     /* Load signed halfword unprivileged */
>> +     { .opc = 0x003000d0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
>> +             .sign_extend = true , .decode = decode_arm_extra },
>
> So here, yet again, we end up with more code decoding the ARM load/store
> instructions so that we can do something with them.  How many places do
> we now have in the ARM kernel doing this exact same thing?  Do we really
> need to keep rewriting this functionality each time a feature that needs
> it gets implemented, or is _someone_ going to sort this out once and for
> all?


Hi Russell,

This was indeed discussed a couple of times already, and I hear your concern.

However, unifying all instruction decoding within arch/arm is quite a
heavy task: it requires agreeing on some canonical API that people can
live with, and it will likely take a long time.  I also seem to recall
there were arguments against unifying the kprobes code with other
instruction decoding, as the kprobes code was written to be highly
optimized under certain assumptions, if I understood previous comments
correctly.

Therefore I tried to write the decoding in a way that is not too
KVM-specific, but without adding a lot of code paths for instructions
that KVM is never going to decode, thus avoiding untested code in the
kernel.

I really hope that this will not hold up the KVM patches right now, as
unifying the decoding later would not break any external APIs.
Maintaining these patches out-of-tree places an increasingly large
burden on me and especially on Marc Zyngier, and more and more people
are requesting the KVM functionality.

That being said, I do like the thought of having a complete solution
for instruction decoding in the kernel, and if I can find time down
the road, I'd be happy to take part in writing that code, especially
with help from people who know the other subsystems that could benefit
from it.

So I would go as far as to beg you to accept this code as part of the
KVM/ARM implementation, with the promise that I *will* help out with,
or take charge of, a later unifying effort if at all possible for me.

Best,
-Christoffer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 08/14] KVM: ARM: Emulation framework and CP15 emulation
  2013-01-14 17:38       ` Christoffer Dall
@ 2013-01-14 18:33         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 160+ messages in thread
From: Russell King - ARM Linux @ 2013-01-14 18:33 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti, Rusty Russell

On Mon, Jan 14, 2013 at 12:38:17PM -0500, Christoffer Dall wrote:
> On Mon, Jan 14, 2013 at 11:36 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Tue, Jan 08, 2013 at 01:39:31PM -0500, Christoffer Dall wrote:
> >> +     /*
> >> +      * Check whether this vcpu requires the cache to be flushed on
> >> +      * this physical CPU. This is a consequence of doing dcache
> >> +      * operations by set/way on this vcpu. We do it here to be in
> >> +      * a non-preemptible section.
> >> +      */
> >> +     if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
> >> +             cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
> >
> > There is cpumask_test_and_clear_cpu() which may be better for this.
> 
> nice:
> 
> commit d31686fadb74ad564f6a5acabdebe411de86d77d
> Author: Christoffer Dall <c.dall@virtualopensystems.com>
> Date:   Mon Jan 14 12:36:53 2013 -0500
> 
>     KVM: ARM: Use cpumask_test_and_clear_cpu
> 
>     Nicer shorter cleaner code. Ahhhh.
> 
>     Cc: Russell King <linux@arm.linux.org.uk>

Great, thanks.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>

>     Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index b5c6ab1..fdd4a7c 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -352,10 +352,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	 * operations by set/way on this vcpu. We do it here to be in
>  	 * a non-preemptible section.
>  	 */
> -	if (cpumask_test_cpu(cpu, &vcpu->arch.require_dcache_flush)) {
> -		cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
> +	if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
>  		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
> -	}
> 
>  	kvm_arm_set_running_vcpu(vcpu);
>  }
> 
> --
> 
> Thanks,
> -Christoffer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 18:25       ` Christoffer Dall
@ 2013-01-14 18:43         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 160+ messages in thread
From: Russell King - ARM Linux @ 2013-01-14 18:43 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Mon, Jan 14, 2013 at 01:25:39PM -0500, Christoffer Dall wrote:
> However, unifying all instruction decoding within arch/arm is quite
> the heavy task, and requires agreeing on some canonical API that
> people can live with and it will likely take a long time.  I seem to
> recall there were also arguments against unifying kprobe code with
> other instruction decoding, as the kprobe code was also written to
> work highly optimized under certain assumptions, if I understood
> previous comments correctly.

Yes, I know Rusty had a go.

What I think may make sense is to unify this and the alignment code.
They're really after the same things, which are:

- Given an instruction, and register set, calculate the address of the
  access, size, number of accesses, and the source/destination registers.
- Update the register set as though the instruction had been executed
  by the CPU.

However, I've changed tack slightly from the above in the last 10 minutes
or so.  I'm thinking a little more that we might be able to take what we
already have in alignment.c and provide it with a set of accessors
according to size etc.
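
To make that concrete, a minimal sketch of such a shared helper
(hypothetical names, not existing code, just to illustrate the kind of
interface meant here):

    /* Result of decoding one load/store instruction. */
    struct insn_mem_access {
            unsigned long addr;     /* effective address of the access */
            int len;                /* bytes per single access */
            int count;              /* number of accesses (e.g. 2 for ldrd) */
            int rt;                 /* first source/destination register */
            bool is_write;
            bool sign_extend;
    };

    /*
     * Decode 'instr' against 'regs', fill in 'acc', and update the
     * register set (write-back, PC) as if the CPU had executed the
     * instruction.  Returns 0 on success, -EINVAL if unhandled.
     */
    int arm_decode_mem_access(u32 instr, struct pt_regs *regs,
                              struct insn_mem_access *acc);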

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-08 18:38   ` Christoffer Dall
@ 2013-01-14 18:49     ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-14 18:49 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

A couple of general questions about the ABI. If they were already answered,
just refer me to the previous discussion.

On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index a4df553..4237c27 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -293,7 +293,7 @@ kvm_run' (see below).
>  4.11 KVM_GET_REGS
>  
>  Capability: basic
> -Architectures: all
> +Architectures: all except ARM
>  Type: vcpu ioctl
>  Parameters: struct kvm_regs (out)
>  Returns: 0 on success, -1 on error
> @@ -314,7 +314,7 @@ struct kvm_regs {
>  4.12 KVM_SET_REGS
>  
>  Capability: basic
> -Architectures: all
> +Architectures: all except ARM
>  Type: vcpu ioctl
>  Parameters: struct kvm_regs (in)
>  Returns: 0 on success, -1 on error
> @@ -600,7 +600,7 @@ struct kvm_fpu {
>  4.24 KVM_CREATE_IRQCHIP
Why are KVM_GET_REGS/KVM_SET_REGS not usable for arm?

>  
>  Capability: KVM_CAP_IRQCHIP
> -Architectures: x86, ia64
> +Architectures: x86, ia64, ARM
>  Type: vm ioctl
>  Parameters: none
>  Returns: 0 on success, -1 on error
> @@ -608,7 +608,8 @@ Returns: 0 on success, -1 on error
>  Creates an interrupt controller model in the kernel.  On x86, creates a virtual
>  ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
>  local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
> -only go to the IOAPIC.  On ia64, a IOSAPIC is created.
> +only go to the IOAPIC.  On ia64, a IOSAPIC is created. On ARM, a GIC is
> +created.
>  
>  
>  4.25 KVM_IRQ_LINE
> @@ -1775,6 +1776,14 @@ registers, find a list below:
>    PPC   | KVM_REG_PPC_VPA_DTL   | 128
>    PPC   | KVM_REG_PPC_EPCR	| 32
>  
> +ARM registers are mapped using the lower 32 bits.  The upper 16 of that
> +is the register group type, or coprocessor number:
> +
> +ARM core registers have the following id bit patterns:
> +  0x4002 0000 0010 <index into the kvm_regs struct:16>
> +
> +
> +
>  4.69 KVM_GET_ONE_REG
>  
>  Capability: KVM_CAP_ONE_REG
> @@ -2127,6 +2136,46 @@ written, then `n_invalid' invalid entries, invalidating any previously
>  valid entries found.
>  
>  
> +4.77 KVM_ARM_VCPU_INIT
> +
> +Capability: basic
> +Architectures: arm
> +Type: vcpu ioctl
> +Parameters: struct kvm_vcpu_init (in)
> +Returns: 0 on success; -1 on error
> +Errors:
> +  EINVAL:    the target is unknown, or the combination of features is invalid.
> +  ENOENT:    a features bit specified is unknown.
> +
> +This tells KVM what type of CPU to present to the guest, and what
> +optional features it should have.  This will cause a reset of the cpu
> +registers to their initial values.  If this is not called, KVM_RUN will
> +return ENOEXEC for that vcpu.
> +
Can different vcpus of the same VM be of different type?

> +Note that because some registers reflect machine topology, all vcpus
> +should be created before this ioctl is invoked.
How is cpu hotplug supposed to work?

> +
> +
> +4.78 KVM_GET_REG_LIST
> +
> +Capability: basic
> +Architectures: arm
> +Type: vcpu ioctl
> +Parameters: struct kvm_reg_list (in/out)
> +Returns: 0 on success; -1 on error
> +Errors:
> +  E2BIG:     the reg index list is too big to fit in the array specified by
> +             the user (the number required will be written into n).
> +
> +struct kvm_reg_list {
> +	__u64 n; /* number of registers in reg[] */
> +	__u64 reg[0];
> +};
> +
> +This ioctl returns the guest registers that are supported for the
> +KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
> +
> +
Doesn't userspace know what registers are supported by each CPU type?

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 18:43         ` Russell King - ARM Linux
@ 2013-01-14 18:50           ` Will Deacon
  -1 siblings, 0 replies; 160+ messages in thread
From: Will Deacon @ 2013-01-14 18:50 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Christoffer Dall, Marcelo Tosatti, kvm, Marc Zyngier,
	Rusty Russell, kvmarm, linux-arm-kernel

On Mon, Jan 14, 2013 at 06:43:19PM +0000, Russell King - ARM Linux wrote:
> On Mon, Jan 14, 2013 at 01:25:39PM -0500, Christoffer Dall wrote:
> > However, unifying all instruction decoding within arch/arm is quite
> > the heavy task, and requires agreeing on some canonical API that
> > people can live with and it will likely take a long time.  I seem to
> > recall there were also arguments against unifying kprobe code with
> > other instruction decoding, as the kprobe code was also written to
> > work highly optimized under certain assumptions, if I understood
> > previous comments correctly.
> 
> Yes, I know Rusty had a go.
> 
> What I think may make sense is to unify this and the alignment code.
> They're really after the same things, which are:
> 
> - Given an instruction, and register set, calculate the address of the
>   access, size, number of accesses, and the source/destination registers.
> - Update the register set as though the instruction had been executed
>   by the CPU.
> 
> However, I've changed tack slightly from the above in the last 10 minutes
> or so.  I'm thinking a little more that we might be able to take what we
> already have in alignment.c and provide it with a set of accessors
> according to size etc.

FWIW, KVM only needs this code for handling complex MMIO instructions, which
aren't even generated by recent guest kernels. I'm inclined to suggest removing
this emulation code from KVM entirely given that it's likely to bitrot as
it is executed less and less often.
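
For reference, the kind of access that needs this path is a write-back
form, e.g. (hypothetical guest code; whether the compiler emits the
post-indexed store depends on GCC version and flags):

    /* Hypothetical guest code: fill a memory-mapped device window. */
    static void fill_window(u32 __iomem *dst, const u32 *src, int count)
    {
            int i;

            for (i = 0; i < count; i++)
                    *dst++ = src[i];   /* may become 'str rX, [rY], #4' */
    }

    /*
     * A post-indexed store with base register write-back faults with
     * ISV = 0, so the HSR syndrome alone is not enough and the
     * instruction has to be decoded in software.  A plain
     * 'str rX, [rY]' to the same address traps with ISV = 1 and needs
     * no decoding at all.
     */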

This doesn't solve the problem of having multiple people doing the same
thing, but at least we don't have one extra set of decoding logic for
arch/arm/ (even though the code itself is pretty clean).

Will

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 18:50           ` Will Deacon
@ 2013-01-14 18:53             ` Alexander Graf
  -1 siblings, 0 replies; 160+ messages in thread
From: Alexander Graf @ 2013-01-14 18:53 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King - ARM Linux, kvm, Marc Zyngier, Marcelo Tosatti,
	kvmarm, linux-arm-kernel

On 01/14/2013 07:50 PM, Will Deacon wrote:
> On Mon, Jan 14, 2013 at 06:43:19PM +0000, Russell King - ARM Linux wrote:
>> On Mon, Jan 14, 2013 at 01:25:39PM -0500, Christoffer Dall wrote:
>>> However, unifying all instruction decoding within arch/arm is quite
>>> the heavy task, and requires agreeing on some canonical API that
>>> people can live with and it will likely take a long time.  I seem to
>>> recall there were also arguments against unifying kprobe code with
>>> other instruction decoding, as the kprobe code was also written to
>>> work highly optimized under certain assumptions, if I understood
>>> previous comments correctly.
>> Yes, I know Rusty had a go.
>>
>> What I think may make sense is to unify this and the alignment code.
>> They're really after the same things, which are:
>>
>> - Given an instruction, and register set, calculate the address of the
>>    access, size, number of accesses, and the source/destination registers.
>> - Update the register set as though the instruction had been executed
>>    by the CPU.
>>
>> However, I've changed tack slightly from the above in the last 10 minutes
>> or so.  I'm thinking a little more that we might be able to take what we
>> already have in alignment.c and provide it with a set of accessors
>> according to size etc.
> FWIW, KVM only needs this code for handling complex MMIO instructions, which
> aren't even generated by recent guest kernels. I'm inclined to suggest removing
> this emulation code from KVM entirely given that it's likely to bitrot as
> it is executed less and less often.

That'd mean that you heavily limit what type of guests you're executing, 
which I don't think is a good idea.


Alex


^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 18:53             ` Alexander Graf
@ 2013-01-14 18:56               ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 18:56 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Will Deacon, Russell King - ARM Linux, kvm, Marc Zyngier,
	Marcelo Tosatti, kvmarm, linux-arm-kernel

On Mon, Jan 14, 2013 at 1:53 PM, Alexander Graf <agraf@suse.de> wrote:
> On 01/14/2013 07:50 PM, Will Deacon wrote:
>>
>> On Mon, Jan 14, 2013 at 06:43:19PM +0000, Russell King - ARM Linux wrote:
>>>
>>> On Mon, Jan 14, 2013 at 01:25:39PM -0500, Christoffer Dall wrote:
>>>>
>>>> However, unifying all instruction decoding within arch/arm is quite
>>>> the heavy task, and requires agreeing on some canonical API that
>>>> people can live with and it will likely take a long time.  I seem to
>>>> recall there were also arguments against unifying kprobe code with
>>>> other instruction decoding, as the kprobe code was also written to
>>>> work highly optimized under certain assumptions, if I understood
>>>> previous comments correctly.
>>>
>>> Yes, I know Rusty had a go.
>>>
>>> What I think may make sense is to unify this and the alignment code.
>>> They're really after the same things, which are:
>>>
>>> - Given an instruction, and register set, calculate the address of the
>>>    access, size, number of accesses, and the source/destination
>>> registers.
>>> - Update the register set as though the instruction had been executed
>>>    by the CPU.
>>>
>>> However, I've changed tack slightly from the above in the last 10 minutes
>>> or so.  I'm thinking a little more that we might be able to take what we
>>> already have in alignment.c and provide it with a set of accessors
>>> according to size etc.
>>
>> FWIW, KVM only needs this code for handling complex MMIO instructions,
>> which
>> aren't even generated by recent guest kernels. I'm inclined to suggest
>> removing
>> this emulation code from KVM entirely given that it's likely to bitrot as
>> it is executed less and less often.
>
>
> That'd mean that you heavily limit what type of guests you're executing,
> which I don't think is a good idea.
>
It would at least affect legacy Linux kernels, but I think getting
KVM/ARM code in mainline is the highest priority, so if merging the
current code is unacceptable, I'm willing to drop the mmio emulation
for now and queue the task of unifying the code for later.

A bit of a shame (think about someone wanting to run some proprietary
custom OS in a VM), but this code has been out-of-tree for too long
already, and I'm afraid unifying the decoding pre-merge is going to
hold things up.

-Christoffer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 18:53             ` Alexander Graf
@ 2013-01-14 19:00               ` Will Deacon
  -1 siblings, 0 replies; 160+ messages in thread
From: Will Deacon @ 2013-01-14 19:00 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Russell King - ARM Linux, kvm, Marc Zyngier, Marcelo Tosatti,
	kvmarm, linux-arm-kernel

On Mon, Jan 14, 2013 at 06:53:14PM +0000, Alexander Graf wrote:
> On 01/14/2013 07:50 PM, Will Deacon wrote:
> > FWIW, KVM only needs this code for handling complex MMIO instructions, which
> > aren't even generated by recent guest kernels. I'm inclined to suggest removing
> > this emulation code from KVM entirely given that it's likely to bitrot as
> > it is executed less and less often.
> 
> That'd mean that you heavily limit what type of guests you're executing, 
> which I don't think is a good idea.

To be honest, I don't think we know whether that's true or not. How many
guests out there do writeback accesses to MMIO devices? Even on older
Linux guests, it was dependent on how GCC felt.

I see where you're coming from, I just don't think we can quantify it either
way outside of Linux.

Will

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 19:00               ` Will Deacon
@ 2013-01-14 19:12                 ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 19:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alexander Graf, Russell King - ARM Linux, kvm, Marc Zyngier,
	Marcelo Tosatti, kvmarm, linux-arm-kernel

On Mon, Jan 14, 2013 at 2:00 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Jan 14, 2013 at 06:53:14PM +0000, Alexander Graf wrote:
>> On 01/14/2013 07:50 PM, Will Deacon wrote:
>> > FWIW, KVM only needs this code for handling complex MMIO instructions, which
>> > aren't even generated by recent guest kernels. I'm inclined to suggest removing
>> > this emulation code from KVM entirely given that it's likely to bitrot as
>> > it is executed less and less often.
>>
>> That'd mean that you heavily limit what type of guests you're executing,
>> which I don't think is a good idea.
>
> To be honest, I don't think we know whether that's true or not. How many
> guests out there do writeback accesses to MMIO devices? Even on older
> Linux guests, it was dependent on how GCC felt.

I don't think bitrotting is a valid argument: the code doesn't depend
on any other implementation state that is likely to change and break
it (the instruction encodings are not exactly going to change).  And
we should simply finish the selftest code to test this (which needs to
be finished whether the code is unified or not, and is on my todo
list).
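
As a hedged sketch of the kind of selftest case I mean (the entry point
name below is made up; only kvm_decode_reg() and the two structs exist
in the patch, and the encoding is "ldr r3, [r0], #4", a post-indexed
load that would fault with ISV = 0):

    struct kvm_decode d;            /* guest registers set up beforehand */
    struct kvm_exit_mmio mmio;
    const u32 insn = 0xe4903004;    /* ldr r3, [r0], #4 */
    u32 base = *kvm_decode_reg(&d, 0);

    BUG_ON(!kvm_decode_ls_insn(&d, &mmio, insn));   /* made-up name */
    BUG_ON(d.rt != 3 || mmio.len != 4 || mmio.is_write);
    BUG_ON(*kvm_decode_reg(&d, 0) != base + 4);     /* write-back applied */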

>
> I see where you're coming from, I just don't think we can quantify it either
> way outside of Linux.
>
FWIW, I know of at least a couple of companies wanting to use KVM for
running non-Linux guests as well.

But, though it is a shame, I can more easily maintain this single patch
out-of-tree, so I'm willing to drop this logic for now if it gets
things moving.

-Christoffer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-14 18:49     ` Gleb Natapov
@ 2013-01-14 22:17       ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 22:17 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Mon, Jan 14, 2013 at 1:49 PM, Gleb Natapov <gleb@redhat.com> wrote:
> A couple of general question about ABI. If they were already answered
> just refer me to the previous discussion.
>
> On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index a4df553..4237c27 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -293,7 +293,7 @@ kvm_run' (see below).
>>  4.11 KVM_GET_REGS
>>
>>  Capability: basic
>> -Architectures: all
>> +Architectures: all except ARM
>>  Type: vcpu ioctl
>>  Parameters: struct kvm_regs (out)
>>  Returns: 0 on success, -1 on error
>> @@ -314,7 +314,7 @@ struct kvm_regs {
>>  4.12 KVM_SET_REGS
>>
>>  Capability: basic
>> -Architectures: all
>> +Architectures: all except ARM
>>  Type: vcpu ioctl
>>  Parameters: struct kvm_regs (in)
>>  Returns: 0 on success, -1 on error
>> @@ -600,7 +600,7 @@ struct kvm_fpu {
>>  4.24 KVM_CREATE_IRQCHIP
> Why KVM_GET_REGS/KVM_SET_REGS are not usable for arm?
>

We use the ONE_REG API instead and we don't want to support two
separate APIs to user space.
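
As an illustration (a sketch only; vcpu_fd and idx are placeholders,
headers and error handling are trimmed, and the id constant is the
"0x4002 0000 0010 <index:16>" core register pattern from the api.txt
text quoted further down):

    __u64 val;
    struct kvm_one_reg reg = {
            .id   = 0x4002000000100000ULL | idx, /* 16-bit index into kvm_regs */
            .addr = (__u64)(unsigned long)&val,
    };

    if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
            err(1, "KVM_GET_ONE_REG");
    /* KVM_SET_ONE_REG works the same way, with .addr pointing at the
     * value to be written. */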

>>
>>  Capability: KVM_CAP_IRQCHIP
>> -Architectures: x86, ia64
>> +Architectures: x86, ia64, ARM
>>  Type: vm ioctl
>>  Parameters: none
>>  Returns: 0 on success, -1 on error
>> @@ -608,7 +608,8 @@ Returns: 0 on success, -1 on error
>>  Creates an interrupt controller model in the kernel.  On x86, creates a virtual
>>  ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
>>  local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
>> -only go to the IOAPIC.  On ia64, a IOSAPIC is created.
>> +only go to the IOAPIC.  On ia64, a IOSAPIC is created. On ARM, a GIC is
>> +created.
>>
>>
>>  4.25 KVM_IRQ_LINE
>> @@ -1775,6 +1776,14 @@ registers, find a list below:
>>    PPC   | KVM_REG_PPC_VPA_DTL   | 128
>>    PPC   | KVM_REG_PPC_EPCR   | 32
>>
>> +ARM registers are mapped using the lower 32 bits.  The upper 16 of that
>> +is the register group type, or coprocessor number:
>> +
>> +ARM core registers have the following id bit patterns:
>> +  0x4002 0000 0010 <index into the kvm_regs struct:16>
>> +
>> +
>> +
>>  4.69 KVM_GET_ONE_REG
>>
>>  Capability: KVM_CAP_ONE_REG
>> @@ -2127,6 +2136,46 @@ written, then `n_invalid' invalid entries, invalidating any previously
>>  valid entries found.
>>
>>
>> +4.77 KVM_ARM_VCPU_INIT
>> +
>> +Capability: basic
>> +Architectures: arm
>> +Type: vcpu ioctl
>> +Parameters: struct kvm_vcpu_init (in)
>> +Returns: 0 on success; -1 on error
>> +Errors:
>> +  EINVAL:    the target is unknown, or the combination of features is invalid.
>> +  ENOENT:    a features bit specified is unknown.
>> +
>> +This tells KVM what type of CPU to present to the guest, and what
>> +optional features it should have.  This will cause a reset of the cpu
>> +registers to their initial values.  If this is not called, KVM_RUN will
>> +return ENOEXEC for that vcpu.
>> +
> Can different vcpus of the same VM be of different types?
>

In the future, yes. For example, if we ever want to virtualize a
big.LITTLE system.

>> +Note that because some registers reflect machine topology, all vcpus
>> +should be created before this ioctl is invoked.
> How is CPU hotplug supposed to work?
>

Those CPUs would be added from the beginning, but not powered on, and
would be powered on later on, I suppose.  See
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004617.html.
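
To make the intended flow concrete, here is a minimal sketch (error
handling elided, helper name made up; KVM_ARM_TARGET_CORTEX_A15 is
assumed to be the target constant exposed by this series): create every
vcpu up front, then initialize each one with the same target.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: create all vcpus first, then init each as a Cortex-A15 with
 * no optional features requested.
 */
static void create_and_init_vcpus(int vm_fd, int nr_vcpus, int *vcpu_fds)
{
	struct kvm_vcpu_init init;
	int i;

	for (i = 0; i < nr_vcpus; i++)
		vcpu_fds[i] = ioctl(vm_fd, KVM_CREATE_VCPU, i);

	memset(&init, 0, sizeof(init));
	init.target = KVM_ARM_TARGET_CORTEX_A15; /* assumed constant */
	/* init.features[] left zeroed: no optional features */

	for (i = 0; i < nr_vcpus; i++)
		ioctl(vcpu_fds[i], KVM_ARM_VCPU_INIT, &init);
}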


>> +
>> +
>> +4.78 KVM_GET_REG_LIST
>> +
>> +Capability: basic
>> +Architectures: arm
>> +Type: vcpu ioctl
>> +Parameters: struct kvm_reg_list (in/out)
>> +Returns: 0 on success; -1 on error
>> +Errors:
>> +  E2BIG:     the reg index list is too big to fit in the array specified by
>> +             the user (the number required will be written into n).
>> +
>> +struct kvm_reg_list {
>> +     __u64 n; /* number of registers in reg[] */
>> +     __u64 reg[0];
>> +};
>> +
>> +This ioctl returns the guest registers that are supported for the
>> +KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
>> +
>> +
> Doesn't userspace know what registers are supported by each CPU type?
>
It would know about core registers, but there is a huge space of
co-processors, and we don't emulate all of them or support
getting/setting all of them yet. Surely this is something that will
change over time and we want user space to be able to discover the
available registers for backwards compatibility, migration, etc.
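
Concretely, discovery is the usual two-call pattern: a first
KVM_GET_REG_LIST call with n == 0 fails with E2BIG and fills in the
required count, and a second call fetches the ids. A rough userspace
sketch (hypothetical helper, minimal error handling):

#include <stdint.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: return a malloc'ed list of all ONE_REG ids for a vcpu. */
static struct kvm_reg_list *get_reg_list(int vcpu_fd)
{
	struct kvm_reg_list probe = { .n = 0 };
	struct kvm_reg_list *list;

	/* First call: expected to fail with E2BIG and set probe.n */
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe) == 0 ||
	    errno != E2BIG)
		return NULL;

	list = malloc(sizeof(*list) + probe.n * sizeof(uint64_t));
	if (!list)
		return NULL;

	list->n = probe.n;
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;	/* list->reg[0..list->n-1] are valid ids */
}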

-Christoffer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 00/14] KVM/ARM Implementation
  2013-01-14 16:00   ` Will Deacon
@ 2013-01-14 22:31     ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 22:31 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm, marc.zyngier, robherring2,
	mark.rutland, arnd

On Mon, Jan 14, 2013 at 11:00 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Christoffer,
>
> On Tue, Jan 08, 2013 at 06:38:34PM +0000, Christoffer Dall wrote:
>> The following series implements KVM support for ARM processors,
>> specifically on the Cortex-A15 platform.
>
> [...]
>
> This is looking pretty good to me now and I feel that the longer it stays
> out-of-tree, the more issues will creep in (without continual effort from
> yourself and others). I've sent some minor comments (mainly vgic-related)
> so, if you fix those, then you can add:
>
>   Reviewed-by: Will Deacon <will.deacon@arm.com>
>
> for the series.

A big thanks!

>
> Now, there's a lot of code here and merging isn't completely
> straightforward. I propose:
>
>   * The first series should go via Russell's tree. It depends on my
>     perf branch for the CPU type stuff, but that should go in for 3.9
>     anyway (also via Russell).
>

Great.

>   * The vGIC patches need rebasing on top of Rob Herring's work, which
>     he sent a pull for over the weekend:
>
>       http://lists.infradead.org/pipermail/linux-arm-kernel/2013-January/141488.html
>
>     In light of that, this stuff will need to go via arm-soc.
>
>   * The hyp arch-timers are in a similar situation to the vGIC: Mark Rutland
>     is moving those into drivers:
>
>       http://lists.infradead.org/pipermail/linux-arm-kernel/2013-January/140560.html
>
>     so the kvm bits will need rebasing appropriately and also sent to
>     arm-soc (Mark -- I assume you intend to send a PULL for 3.9 for those
>     patches?)
>
> Obviously this is all open for discussion, but that seems like the easiest
> option to me.
>
Makes good sense. Given that there are no other major disputes over
the code, several of us are strongly hoping that this can happen
for 3.9.

In any case, I'll do whatever makes the process the easiest for you guys.

Best,
-Christoffer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 19:12                 ` Christoffer Dall
@ 2013-01-14 22:36                   ` Will Deacon
  -1 siblings, 0 replies; 160+ messages in thread
From: Will Deacon @ 2013-01-14 22:36 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Russell King - ARM Linux, kvm, Marc Zyngier, Marcelo Tosatti,
	Alexander Graf, kvmarm, linux-arm-kernel

On Mon, Jan 14, 2013 at 07:12:49PM +0000, Christoffer Dall wrote:
> On Mon, Jan 14, 2013 at 2:00 PM, Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Jan 14, 2013 at 06:53:14PM +0000, Alexander Graf wrote:
> >> On 01/14/2013 07:50 PM, Will Deacon wrote:
> >> > FWIW, KVM only needs this code for handling complex MMIO instructions, which
> >> > aren't even generated by recent guest kernels. I'm inclined to suggest removing
> >> > this emulation code from KVM entirely given that it's likely to bitrot as
> >> > it is executed less and less often.
> >>
> >> That'd mean that you heavily limit what type of guests you're executing,
> >> which I don't think is a good idea.
> >
> > To be honest, I don't think we know whether that's true or not. How many
> > guests out there do writeback accesses to MMIO devices? Even on older
> > Linux guests, it was dependent on how GCC felt.
> 
> I don't think bitrotting is a valid argument: the code doesn't depend
> on any other implementation state that's likely to change and break
> it (the instruction encoding is not exactly going to change).
> And we should simply finish the selftest code to test this stuff
> (which needs finishing whether or not the code is unified, and is on
> my todo list).

Maybe `bitrot' is the wrong word. The scenario I envisage is the addition
of new instructions to the architecture which aren't handled by the current
code, then we end up with emulation code that works for some percentage of
the instruction set only. If the code is rarely used, it will likely go
untouched until it crashes somebody's VM.

> > I see where you're coming from, I just don't think we can quantify it either
> > way outside of Linux.
> >
> FWIW, I know of at least a couple of companies wanting to use KVM for
> running non-Linux guests as well.

Oh, I don't doubt that. The point is, do we have any idea how they behave
under KVM? Do they generate complex MMIO accesses? Do they expect firmware
shims, possibly sitting above hyp? Do they require a signed boot sequence?
Do they run on Cortex-A15 (the only target CPU we have at the moment)?

> But, though it's a shame, I can more easily maintain this single patch
> out-of-tree, so I'm willing to drop this logic for now if it gets
> things moving.

I would hope that, if this code is actually required, you would consider
merging it with what we have rather than maintaining it out-of-tree.

Will

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 22:36                   ` Will Deacon
@ 2013-01-14 22:51                     ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-14 22:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alexander Graf, Russell King - ARM Linux, kvm, Marc Zyngier,
	Marcelo Tosatti, kvmarm, linux-arm-kernel

On Mon, Jan 14, 2013 at 5:36 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Jan 14, 2013 at 07:12:49PM +0000, Christoffer Dall wrote:
>> On Mon, Jan 14, 2013 at 2:00 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Mon, Jan 14, 2013 at 06:53:14PM +0000, Alexander Graf wrote:
>> >> On 01/14/2013 07:50 PM, Will Deacon wrote:
>> >> > FWIW, KVM only needs this code for handling complex MMIO instructions, which
>> >> > aren't even generated by recent guest kernels. I'm inclined to suggest removing
>> >> > this emulation code from KVM entirely given that it's likely to bitrot as
>> >> > it is executed less and less often.
>> >>
>> >> That'd mean that you heavily limit what type of guests you're executing,
>> >> which I don't think is a good idea.
>> >
>> > To be honest, I don't think we know whether that's true or not. How many
>> > guests out there do writeback accesses to MMIO devices? Even on older
>> > Linux guests, it was dependent on how GCC felt.
>>
>> I don't think bitrotting is a valid argument: the code doesn't depend
>> on any other implementation state that's likely to change and break
>> it (the instruction encoding is not exactly going to change).
>> And we should simply finish the selftest code to test this stuff
>> (which needs finishing whether or not the code is unified, and is on
>> my todo list).
>
> Maybe `bitrot' is the wrong word. The scenario I envisage is the addition
> of new instructions to the architecture which aren't handled by the current
> code, then we end up with emulation code that works for some percentage of
> the instruction set only. If the code is rarely used, it will likely go
> untouched until it crashes somebody's VM.
>

How is that worse than KVM crashing all VMs that use any of these
instructions for I/O?

At least the code we have now has been tested with a number of old
kernels, and we know that it works. As for correctness, that concern
applies to any implementation, and this type of code absolutely
requires a test suite.


>> > I see where you're coming from, I just don't think we can quantify it either
>> > way outside of Linux.
>> >
>> FWIW, I know of at least a couple of companies wanting to use KVM for
>> running non-Linux guests as well.
>
> Oh, I don't doubt that. The point is, do we have any idea how they behave
> under KVM? Do they generate complex MMIO accesses? Do they expect firmware
> shims, possibly sitting above hyp? Do they require a signed boot sequence?
> Do they run on Cortex-A15 (the only target CPU we have at the moment)?
>

No, we don't know. But there's a fair chance that they do use complex
MMIO instructions, seeing as older kernels did without anything
explicit being done to cause it.

>> But, though it's a shame, I can more easily maintain this single patch
>> out-of-tree, so I'm willing to drop this logic for now if it gets
>> things moving.
>
> I would hope that, if this code is actually required, you would consider
> merging it with what we have rather than maintaining it out-of-tree.
>
Of course I would, and I would also make an effort to unify the code
if it were merged now; I just don't have the cycles to do the
unification work right now, since it is without doubt a lengthy process.

So from that point of view, I don't quite see how it's better to leave
the code out at this point, but that is not up to me.

-Christoffer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-14 22:36                   ` Will Deacon
@ 2013-01-15  7:00                     ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15  7:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: Christoffer Dall, Russell King - ARM Linux, kvm, Marc Zyngier,
	Marcelo Tosatti, Alexander Graf, kvmarm, linux-arm-kernel

On Mon, Jan 14, 2013 at 10:36:38PM +0000, Will Deacon wrote:
> On Mon, Jan 14, 2013 at 07:12:49PM +0000, Christoffer Dall wrote:
> > On Mon, Jan 14, 2013 at 2:00 PM, Will Deacon <will.deacon@arm.com> wrote:
> > > On Mon, Jan 14, 2013 at 06:53:14PM +0000, Alexander Graf wrote:
> > >> On 01/14/2013 07:50 PM, Will Deacon wrote:
> > >> > FWIW, KVM only needs this code for handling complex MMIO instructions, which
> > >> > aren't even generated by recent guest kernels. I'm inclined to suggest removing
> > >> > this emulation code from KVM entirely given that it's likely to bitrot as
> > >> > it is executed less and less often.
> > >>
> > >> That'd mean that you heavily limit what type of guests you're executing,
> > >> which I don't think is a good idea.
> > >
> > > To be honest, I don't think we know whether that's true or not. How many
> > > guests out there do writeback accesses to MMIO devices? Even on older
> > > Linux guests, it was dependent on how GCC felt.
> > 
> > I don't think bitrotting is a valid argument: the code doesn't depend
> > on any other implementation state that's likely to change and break
> > it (the instruction encoding is not exactly going to change).
> > And we should simply finish the selftest code to test this stuff
> > (which needs finishing whether or not the code is unified, and is on
> > my todo list).
> 
> Maybe `bitrot' is the wrong word. The scenario I envisage is the addition
> of new instructions to the architecture which aren't handled by the current
> code, then we end up with emulation code that works for some percentage of
> the instruction set only. If the code is rarely used, it will likely go
> untouched until it crashes somebody's VM.
> 
This is precisely the situation on x86 too. x86 has too many instructions
that can potentially access MMIO memory, but luckily not all of them
are used for that. When a guest appears that uses an instruction x86 KVM
does not yet emulate, we add emulation for the required instruction. If
this is the only concern about the code, it should stay, IMO.


> > > I see where you're coming from, I just don't think we can quantify it either
> > > way outside of Linux.
> > >
> > FWIW, I know of at least a couple of companies wanting to use KVM for
> > running non-Linux guests as well.
> 
> Oh, I don't doubt that. The point is, do we have any idea how they behave
> under KVM? Do they generate complex MMIO accesses? Do they expect firmware
> shims, possibly sitting above hyp? Do they require a signed boot sequence?
> Do they run on Cortex-A15 (the only target CPU we have at the moment)?
> 
> > But, though it's a shame, I can more easily maintain this single patch
> > out-of-tree, so I'm willing to drop this logic for now if it gets
> > things moving.
> 
> I would hope that, if this code is actually required, you would consider
> merging it with what we have rather than maintaining it out-of-tree.
> 
> Will
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-08 18:39   ` Christoffer Dall
@ 2013-01-15  9:43     ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15  9:43 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell

On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
> Provides complete world-switch implementation to switch to other guests
> running in non-secure modes. Includes Hyp exception handlers that
> capture necessary exception information and store the information on
> the VCPU and KVM structures.
> 
> The following Hyp-ABI is also documented in the code:
> 
> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
>    Switching to Hyp mode is done through a simple HVC #0 instruction. The
>    exception vector code will check that the HVC comes from VMID==0 and if
>    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
>    - r0 contains a pointer to a HYP function
>    - r1, r2, and r3 contain arguments to the above function.
>    - The HYP function will be called with its arguments in r0, r1 and r2.
>    On HYP function return, we return directly to SVC.
> 
> A call to a function executing in Hyp mode is performed like the following:
> 
>         <svc code>
>         ldr     r0, =BSYM(my_hyp_fn)
>         ldr     r1, =my_param
>         hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
>         <svc code>
> 
> Otherwise, the world-switch is pretty straightforward. All state that
> can be modified by the guest is first backed up on the Hyp stack and the
> VCPU values are loaded onto the hardware. State which is not loaded, but
> is theoretically modifiable by the guest, is protected through the
> virtualization features to generate a trap and cause software emulation.
> Upon guest return, all state is restored from hardware onto the VCPU
> struct and the original state is restored from the Hyp stack onto the
> hardware.
> 
> SMP support using the VMPIDR calculated on the basis of the host MPIDR
> and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
> 
> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
> a separate patch into the appropriate patches introducing the
> functionality. Note that the VMIDs are stored per VM as required by the ARM
> architecture reference manual.
> 
> To support VFP/NEON we trap those instructions using the HCPTR. When
> we trap, we switch the FPU.  After a guest exit, the VFP state is
> returned to the host.  When disabling access to floating point
> instructions, we also mask FPEXC_EN in order to avoid the guest
> receiving Undefined instruction exceptions before we have a chance to
> switch back the floating point state.  We are reusing vfp_hard_struct,
> so we depend on VFPv3 being enabled in the host kernel, if not, we still
> trap cp10 and cp11 in order to inject an undefined instruction exception
> whenever the guest tries to use VFP/NEON. VFP/NEON developed by
> Antonios Motakis and Rusty Russell.
> 
> Aborts that are permission faults, and not stage-1 page table walk, do
> not report the faulting address in the HPFAR.  We have to resolve the
> IPA, and store it just like the HPFAR register on the VCPU struct. If
> the IPA cannot be resolved, it means another CPU is playing with the
> page tables, and we simply restart the guest.  This quirk was fixed by
> Marc Zyngier.
> 
> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/kvm_arm.h  |   51 ++++
>  arch/arm/include/asm/kvm_host.h |   10 +
>  arch/arm/kernel/asm-offsets.c   |   25 ++
>  arch/arm/kvm/arm.c              |  187 ++++++++++++++++
>  arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
>  arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
>  6 files changed, 1108 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm/kvm/interrupts_head.S
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index fb22ee8..a3262a2 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -98,6 +98,18 @@
>  #define TTBCR_T0SZ	3
>  #define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
>  
> +/* Hyp System Trap Register */
> +#define HSTR_T(x)	(1 << x)
> +#define HSTR_TTEE	(1 << 16)
> +#define HSTR_TJDBX	(1 << 17)
> +
> +/* Hyp Coprocessor Trap Register */
> +#define HCPTR_TCP(x)	(1 << x)
> +#define HCPTR_TCP_MASK	(0x3fff)
> +#define HCPTR_TASE	(1 << 15)
> +#define HCPTR_TTA	(1 << 20)
> +#define HCPTR_TCPAC	(1 << 31)
> +
>  /* Hyp Debug Configuration Register bits */
>  #define HDCR_TDRA	(1 << 11)
>  #define HDCR_TDOSA	(1 << 10)
> @@ -144,6 +156,45 @@
>  #else
>  #define VTTBR_X		(5 - KVM_T0SZ)
>  #endif
> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
> +#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
> +#define VTTBR_VMID_SHIFT  (48LLU)
> +#define VTTBR_VMID_MASK	  (0xffLLU << VTTBR_VMID_SHIFT)
> +
> +/* Hyp Syndrome Register (HSR) bits */
> +#define HSR_EC_SHIFT	(26)
> +#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
> +#define HSR_IL		(1U << 25)
> +#define HSR_ISS		(HSR_IL - 1)
> +#define HSR_ISV_SHIFT	(24)
> +#define HSR_ISV		(1U << HSR_ISV_SHIFT)
> +#define HSR_FSC		(0x3f)
> +#define HSR_FSC_TYPE	(0x3c)
> +#define HSR_WNR		(1 << 6)
> +
> +#define FSC_FAULT	(0x04)
> +#define FSC_PERM	(0x0c)
> +
> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> +#define HPFAR_MASK	(~0xf)
>  
> +#define HSR_EC_UNKNOWN	(0x00)
> +#define HSR_EC_WFI	(0x01)
> +#define HSR_EC_CP15_32	(0x03)
> +#define HSR_EC_CP15_64	(0x04)
> +#define HSR_EC_CP14_MR	(0x05)
> +#define HSR_EC_CP14_LS	(0x06)
> +#define HSR_EC_CP_0_13	(0x07)
> +#define HSR_EC_CP10_ID	(0x08)
> +#define HSR_EC_JAZELLE	(0x09)
> +#define HSR_EC_BXJ	(0x0A)
> +#define HSR_EC_CP14_64	(0x0C)
> +#define HSR_EC_SVC_HYP	(0x11)
> +#define HSR_EC_HVC	(0x12)
> +#define HSR_EC_SMC	(0x13)
> +#define HSR_EC_IABT	(0x20)
> +#define HSR_EC_IABT_HYP	(0x21)
> +#define HSR_EC_DABT	(0x24)
> +#define HSR_EC_DABT_HYP	(0x25)
>  
>  #endif /* __ARM_KVM_ARM_H__ */
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 1de6f0d..ddb09da 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -21,6 +21,7 @@
>  
>  #include <asm/kvm.h>
>  #include <asm/kvm_asm.h>
> +#include <asm/fpstate.h>
>  
>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>  #define KVM_USER_MEM_SLOTS 32
> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
>  	u32 hxfar;		/* Hyp Data/Inst Fault Address Register */
>  	u32 hpfar;		/* Hyp IPA Fault Address Register */
>  
> +	/* Floating point registers (VFP and Advanced SIMD/NEON) */
> +	struct vfp_hard_struct vfp_guest;
> +	struct vfp_hard_struct *vfp_host;
> +
> +	/*
> +	 * Anything that is not used directly from assembly code goes
> +	 * here.
> +	 */
>  	/* Interrupt related fields */
>  	u32 irq_lines;		/* IRQ and FIQ levels */
>  
> @@ -112,6 +121,7 @@ struct kvm_one_reg;
>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>  u64 kvm_call_hyp(void *hypfn, ...);
> +void force_vm_exit(const cpumask_t *mask);
>  
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>  struct kvm;
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index c985b48..c8b3272 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -13,6 +13,9 @@
>  #include <linux/sched.h>
>  #include <linux/mm.h>
>  #include <linux/dma-mapping.h>
> +#ifdef CONFIG_KVM_ARM_HOST
> +#include <linux/kvm_host.h>
> +#endif
>  #include <asm/cacheflush.h>
>  #include <asm/glue-df.h>
>  #include <asm/glue-pf.h>
> @@ -146,5 +149,27 @@ int main(void)
>    DEFINE(DMA_BIDIRECTIONAL,	DMA_BIDIRECTIONAL);
>    DEFINE(DMA_TO_DEVICE,		DMA_TO_DEVICE);
>    DEFINE(DMA_FROM_DEVICE,	DMA_FROM_DEVICE);
> +#ifdef CONFIG_KVM_ARM_HOST
> +  DEFINE(VCPU_KVM,		offsetof(struct kvm_vcpu, kvm));
> +  DEFINE(VCPU_MIDR,		offsetof(struct kvm_vcpu, arch.midr));
> +  DEFINE(VCPU_CP15,		offsetof(struct kvm_vcpu, arch.cp15));
> +  DEFINE(VCPU_VFP_GUEST,	offsetof(struct kvm_vcpu, arch.vfp_guest));
> +  DEFINE(VCPU_VFP_HOST,		offsetof(struct kvm_vcpu, arch.vfp_host));
> +  DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
> +  DEFINE(VCPU_USR_REGS,		offsetof(struct kvm_vcpu, arch.regs.usr_regs));
> +  DEFINE(VCPU_SVC_REGS,		offsetof(struct kvm_vcpu, arch.regs.svc_regs));
> +  DEFINE(VCPU_ABT_REGS,		offsetof(struct kvm_vcpu, arch.regs.abt_regs));
> +  DEFINE(VCPU_UND_REGS,		offsetof(struct kvm_vcpu, arch.regs.und_regs));
> +  DEFINE(VCPU_IRQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.irq_regs));
> +  DEFINE(VCPU_FIQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
> +  DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
> +  DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
> +  DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
> +  DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.hsr));
> +  DEFINE(VCPU_HxFAR,		offsetof(struct kvm_vcpu, arch.hxfar));
> +  DEFINE(VCPU_HPFAR,		offsetof(struct kvm_vcpu, arch.hpfar));
> +  DEFINE(VCPU_HYP_PC,		offsetof(struct kvm_vcpu, arch.hyp_pc));
> +  DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
> +#endif
>    return 0; 
>  }
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 9b4566e..c94d278 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -40,6 +40,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_emulate.h>
>  
>  #ifdef REQUIRES_VIRT
>  __asm__(".arch_extension	virt");
> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>  static unsigned long hyp_default_vectors;
>  
> +/* The VMID used in the VTTBR */
> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> +static u8 kvm_next_vmid;
> +static DEFINE_SPINLOCK(kvm_vmid_lock);
>  
>  int kvm_arch_hardware_enable(void *garbage)
>  {
> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
>  
>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  {
> +	/* Force users to call KVM_ARM_VCPU_INIT */
> +	vcpu->arch.target = -1;
>  	return 0;
>  }
>  
> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	vcpu->cpu = cpu;
> +	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  
>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
As far as I can see, this function is unused.

>  {
> +	return v->mode == IN_GUEST_MODE;
> +}
> +
> +/* Just ensure a guest exit from a particular CPU */
> +static void exit_vm_noop(void *info)
> +{
> +}
> +
> +void force_vm_exit(const cpumask_t *mask)
> +{
> +	smp_call_function_many(mask, exit_vm_noop, NULL, true);
> +}
There is make_all_cpus_request() for that. It actually sends IPIs only
to cpus that are running vcpus.

> +
> +/**
> + * need_new_vmid_gen - check that the VMID is still valid
> + * @kvm: The VM's VMID to checkt
> + *
> + * return true if there is a new generation of VMIDs being used
> + *
> + * The hardware supports only 256 values with the value zero reserved for the
> + * host, so we check if an assigned value belongs to a previous generation,
> + * which which requires us to assign a new value. If we're the first to use a
> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> + * CPUs.
> + */
> +static bool need_new_vmid_gen(struct kvm *kvm)
> +{
> +	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> +}
> +
> +/**
> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> + * @kvm	The guest that we are about to run
> + *
> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> + * caches and TLBs.
> + */
> +static void update_vttbr(struct kvm *kvm)
> +{
> +	phys_addr_t pgd_phys;
> +	u64 vmid;
> +
> +	if (!need_new_vmid_gen(kvm))
> +		return;
> +
> +	spin_lock(&kvm_vmid_lock);
> +
> +	/*
> +	 * We need to re-check the vmid_gen here to ensure that if another vcpu
> +	 * already allocated a valid vmid for this vm, then this vcpu should
> +	 * use the same vmid.
> +	 */
> +	if (!need_new_vmid_gen(kvm)) {
> +		spin_unlock(&kvm_vmid_lock);
> +		return;
> +	}
> +
> +	/* First user of a new VMID generation? */
> +	if (unlikely(kvm_next_vmid == 0)) {
> +		atomic64_inc(&kvm_vmid_gen);
> +		kvm_next_vmid = 1;
> +
> +		/*
> +		 * On SMP we know no other CPUs can use this CPU's or each
> +		 * other's VMID after force_vm_exit returns since the
> +		 * kvm_vmid_lock blocks them from reentry to the guest.
> +		 */
> +		force_vm_exit(cpu_all_mask);
> +		/*
> +		 * Now broadcast TLB + ICACHE invalidation over the inner
> +		 * shareable domain to make sure all data structures are
> +		 * clean.
> +		 */
> +		kvm_call_hyp(__kvm_flush_vm_context);
> +	}
> +
> +	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> +	kvm->arch.vmid = kvm_next_vmid;
> +	kvm_next_vmid++;
> +
> +	/* update vttbr to be used with the new vmid */
> +	pgd_phys = virt_to_phys(kvm->arch.pgd);
> +	vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
> +	kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
> +	kvm->arch.vttbr |= vmid;
> +
> +	spin_unlock(&kvm_vmid_lock);
> +}
> +
> +/*
> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
> + * proper exit to QEMU.
> + */
> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +		       int exception_index)
> +{
> +	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>  	return 0;
>  }
>  
> +/**
> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> + * @vcpu:	The VCPU pointer
> + * @run:	The kvm_run structure pointer used for userspace state exchange
> + *
> + * This function is called through the VCPU_RUN ioctl called from user space. It
> + * will execute VM code in a loop until the time slice for the process is used
> + * or some emulation is needed from user space in which case the function will
> + * return with return value 0 and with the kvm_run structure filled in with the
> + * required data for the requested emulation.
> + */
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	return -EINVAL;
> +	int ret;
> +	sigset_t sigsaved;
> +
> +	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
> +	if (unlikely(vcpu->arch.target < 0))
> +		return -ENOEXEC;
> +
> +	if (vcpu->sigset_active)
> +		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> +
> +	ret = 1;
> +	run->exit_reason = KVM_EXIT_UNKNOWN;
> +	while (ret > 0) {
> +		/*
> +		 * Check conditions before entering the guest
> +		 */
> +		cond_resched();
> +
> +		update_vttbr(vcpu->kvm);
> +
> +		local_irq_disable();
> +
> +		/*
> +		 * Re-check atomic conditions
> +		 */
> +		if (signal_pending(current)) {
> +			ret = -EINTR;
> +			run->exit_reason = KVM_EXIT_INTR;
> +		}
> +
> +		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> +			local_irq_enable();
> +			continue;
> +		}
> +
> +		/**************************************************************
> +		 * Enter the guest
> +		 */
> +		trace_kvm_entry(*vcpu_pc(vcpu));
> +		kvm_guest_enter();
> +		vcpu->mode = IN_GUEST_MODE;
You need to set mode to IN_GUEST_MODE before disabling interrupts and
check that mode != EXITING_GUEST_MODE after disabling interrupts but
before entering the guest. This way you will catch kicks that were sent
between setting the mode and disabling the interrupts. Also you need
to check vcpu->requests and bail out if it is not empty. I see that you
do not use vcpu->requests at all, but you should, since common KVM code
assumes that it is used; make_all_cpus_request() uses it, for instance.
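
In other words, something along these lines (a sketch of the convention
the common code and x86 follow, not a drop-in patch; the helper name is
made up):

#include <linux/kvm_host.h>
#include <linux/sched.h>

/*
 * Sketch: publish IN_GUEST_MODE before the final re-checks so that a
 * kick sent from another CPU in that window either flips the mode to
 * EXITING_GUEST_MODE or sets a request bit that we notice below.
 * Returns true if it is safe to enter the guest (irqs left disabled).
 */
static bool kvm_arm_prepare_enter_guest(struct kvm_vcpu *vcpu)
{
	vcpu->mode = IN_GUEST_MODE;
	/* Pairs with the barriers on the kick/request side */
	smp_mb();

	local_irq_disable();

	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests ||
	    signal_pending(current)) {
		vcpu->mode = OUTSIDE_GUEST_MODE;
		local_irq_enable();
		return false;	/* go around the run loop again */
	}
	return true;
}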

> +
> +		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
You do not take kvm->srcu lock before entering the guest. It looks
wrong.
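
For reference, the read-side pattern used elsewhere is roughly the
following; handle_exit_locked is a hypothetical wrapper purely to show
where an srcu section would sit around code that may dereference the
memslots (MMIO/abort handling, for instance):

#include <linux/kvm_host.h>
#include <linux/srcu.h>

/* Sketch: run exit handling inside an srcu read-side section. */
static int handle_exit_locked(struct kvm_vcpu *vcpu, struct kvm_run *run,
			      int exception_index)
{
	int idx, ret;

	idx = srcu_read_lock(&vcpu->kvm->srcu);
	ret = handle_exit(vcpu, run, exception_index);
	srcu_read_unlock(&vcpu->kvm->srcu, idx);

	return ret;
}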

> +
> +		vcpu->mode = OUTSIDE_GUEST_MODE;
> +		kvm_guest_exit();
> +		trace_kvm_exit(*vcpu_pc(vcpu));
> +		/*
> +		 * We may have taken a host interrupt in HYP mode (ie
> +		 * while executing the guest). This interrupt is still
> +		 * pending, as we haven't serviced it yet!
> +		 *
> +		 * We're now back in SVC mode, with interrupts
> +		 * disabled.  Enabling the interrupts now will have
> +		 * the effect of taking the interrupt again, in SVC
> +		 * mode this time.
> +		 */
> +		local_irq_enable();
> +
> +		/*
> +		 * Back from guest
> +		 *************************************************************/
> +
> +		ret = handle_exit(vcpu, run, ret);
> +	}
> +
> +	if (vcpu->sigset_active)
> +		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
> +	return ret;
>  }
>  
>  static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index a923590..08adcd5 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -20,9 +20,12 @@
>  #include <linux/const.h>
>  #include <asm/unified.h>
>  #include <asm/page.h>
> +#include <asm/ptrace.h>
>  #include <asm/asm-offsets.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_arm.h>
> +#include <asm/vfpmacros.h>
> +#include "interrupts_head.S"
>  
>  	.text
>  
> @@ -31,36 +34,423 @@ __kvm_hyp_code_start:
>  
>  /********************************************************************
>   * Flush per-VMID TLBs
> + *
> + * void __kvm_tlb_flush_vmid(struct kvm *kvm);
> + *
> + * We rely on the hardware to broadcast the TLB invalidation to all CPUs
> + * inside the inner-shareable domain (which is the case for all v7
> + * implementations).  If we come across a non-IS SMP implementation, we'll
> + * have to use an IPI based mechanism. Until then, we stick to the simple
> + * hardware assisted version.
>   */
>  ENTRY(__kvm_tlb_flush_vmid)
> +	push	{r2, r3}
> +
> +	add	r0, r0, #KVM_VTTBR
> +	ldrd	r2, r3, [r0]
> +	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
> +	isb
> +	mcr     p15, 0, r0, c8, c3, 0	@ TLBIALLIS (rt ignored)
> +	dsb
> +	isb
> +	mov	r2, #0
> +	mov	r3, #0
> +	mcrr	p15, 6, r2, r3, c2	@ Back to VMID #0
> +	isb				@ Not necessary if followed by eret
> +
> +	pop	{r2, r3}
>  	bx	lr
>  ENDPROC(__kvm_tlb_flush_vmid)
>  
>  /********************************************************************
> - * Flush TLBs and instruction caches of current CPU for all VMIDs
> + * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
> + * domain, for all VMIDs
> + *
> + * void __kvm_flush_vm_context(void);
>   */
>  ENTRY(__kvm_flush_vm_context)
> +	mov	r0, #0			@ rn parameter for c15 flushes is SBZ
> +
> +	/* Invalidate NS Non-Hyp TLB Inner Shareable (TLBIALLNSNHIS) */
> +	mcr     p15, 4, r0, c8, c3, 4
> +	/* Invalidate instruction caches Inner Shareable (ICIALLUIS) */
> +	mcr     p15, 0, r0, c7, c1, 0
> +	dsb
> +	isb				@ Not necessary if followed by eret
> +
>  	bx	lr
>  ENDPROC(__kvm_flush_vm_context)
>  
> +
>  /********************************************************************
>   *  Hypervisor world-switch code
> + *
> + *
> + * int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>   */
>  ENTRY(__kvm_vcpu_run)
> -	bx	lr
> +	@ Save the vcpu pointer
> +	mcr	p15, 4, vcpu, c13, c0, 2	@ HTPIDR
> +
> +	save_host_regs
> +
> +	@ Store hardware CP15 state and load guest state
> +	read_cp15_state store_to_vcpu = 0
> +	write_cp15_state read_from_vcpu = 1
> +
> +	@ If the host kernel has not been configured with VFPv3 support,
> +	@ then it is safer if we deny guests from using it as well.
> +#ifdef CONFIG_VFPv3
> +	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
> +	VFPFMRX r2, FPEXC		@ VMRS
> +	push	{r2}
> +	orr	r2, r2, #FPEXC_EN
> +	VFPFMXR FPEXC, r2		@ VMSR
> +#endif
> +
> +	@ Configure Hyp-role
> +	configure_hyp_role vmentry
> +
> +	@ Trap coprocessor CRx accesses
> +	set_hstr vmentry
> +	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> +	set_hdcr vmentry
> +
> +	@ Write configured ID register into MIDR alias
> +	ldr	r1, [vcpu, #VCPU_MIDR]
> +	mcr	p15, 4, r1, c0, c0, 0
> +
> +	@ Write guest view of MPIDR into VMPIDR
> +	ldr	r1, [vcpu, #CP15_OFFSET(c0_MPIDR)]
> +	mcr	p15, 4, r1, c0, c0, 5
> +
> +	@ Set up guest memory translation
> +	ldr	r1, [vcpu, #VCPU_KVM]
> +	add	r1, r1, #KVM_VTTBR
> +	ldrd	r2, r3, [r1]
> +	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
> +
> +	@ We're all done, just restore the GPRs and go to the guest
> +	restore_guest_regs
> +	clrex				@ Clear exclusive monitor
> +	eret
> +
> +__kvm_vcpu_return:
> +	/*
> +	 * return convention:
> +	 * guest r0, r1, r2 saved on the stack
> +	 * r0: vcpu pointer
> +	 * r1: exception code
> +	 */
> +	save_guest_regs
> +
> +	@ Set VMID == 0
> +	mov	r2, #0
> +	mov	r3, #0
> +	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
> +
> +	@ Don't trap coprocessor accesses for host kernel
> +	set_hstr vmexit
> +	set_hdcr vmexit
> +	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> +
> +#ifdef CONFIG_VFPv3
> +	@ Save floating point registers we if let guest use them.
> +	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> +	bne	after_vfp_restore
> +
> +	@ Switch VFP/NEON hardware state to the host's
> +	add	r7, vcpu, #VCPU_VFP_GUEST
> +	store_vfp_state r7
> +	add	r7, vcpu, #VCPU_VFP_HOST
> +	ldr	r7, [r7]
> +	restore_vfp_state r7
> +
> +after_vfp_restore:
> +	@ Restore FPEXC_EN which we clobbered on entry
> +	pop	{r2}
> +	VFPFMXR FPEXC, r2
> +#endif
> +
> +	@ Reset Hyp-role
> +	configure_hyp_role vmexit
> +
> +	@ Let host read hardware MIDR
> +	mrc	p15, 0, r2, c0, c0, 0
> +	mcr	p15, 4, r2, c0, c0, 0
> +
> +	@ Back to hardware MPIDR
> +	mrc	p15, 0, r2, c0, c0, 5
> +	mcr	p15, 4, r2, c0, c0, 5
> +
> +	@ Store guest CP15 state and restore host state
> +	read_cp15_state store_to_vcpu = 1
> +	write_cp15_state read_from_vcpu = 0
> +
> +	restore_host_regs
> +	clrex				@ Clear exclusive monitor
> +	mov	r0, r1			@ Return the return code
> +	bx	lr			@ return to IOCTL
>  
>  ENTRY(kvm_call_hyp)
> +	hvc	#0
>  	bx	lr
>  
>  
>  /********************************************************************
>   * Hypervisor exception vector and handlers
> + *
> + *
> + * The KVM/ARM Hypervisor ABI is defined as follows:
> + *
> + * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
> + * instruction is issued since all traps are disabled when running the host
> + * kernel as per the Hyp-mode initialization at boot time.
> + *
> + * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
> + * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
> + * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
> + * instructions are called from within Hyp-mode.
> + *
> + * Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
> + *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
> + *    exception vector code will check that the HVC comes from VMID==0 and if
> + *    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
> + *    - r0 contains a pointer to a HYP function
> + *    - r1, r2, and r3 contain arguments to the above function.
> + *    - The HYP function will be called with its arguments in r0, r1 and r2.
> + *    On HYP function return, we return directly to SVC.
> + *
> + * Note that the above is used to execute code in Hyp-mode from a host-kernel
> + * point of view, and is a different concept from performing a world-switch and
> + * executing guest code SVC mode (with a VMID != 0).
>   */
>  
> +/* Handle undef, svc, pabt, or dabt by crashing with a user notice */
> +.macro bad_exception exception_code, panic_str
> +	push	{r0-r2}
> +	mrrc	p15, 6, r0, r1, c2	@ Read VTTBR
> +	lsr	r1, r1, #16
> +	ands	r1, r1, #0xff
> +	beq	99f
> +
> +	load_vcpu			@ Load VCPU pointer
> +	.if \exception_code == ARM_EXCEPTION_DATA_ABORT
> +	mrc	p15, 4, r2, c5, c2, 0	@ HSR
> +	mrc	p15, 4, r1, c6, c0, 0	@ HDFAR
> +	str	r2, [vcpu, #VCPU_HSR]
> +	str	r1, [vcpu, #VCPU_HxFAR]
> +	.endif
> +	.if \exception_code == ARM_EXCEPTION_PREF_ABORT
> +	mrc	p15, 4, r2, c5, c2, 0	@ HSR
> +	mrc	p15, 4, r1, c6, c0, 2	@ HIFAR
> +	str	r2, [vcpu, #VCPU_HSR]
> +	str	r1, [vcpu, #VCPU_HxFAR]
> +	.endif
> +	mov	r1, #\exception_code
> +	b	__kvm_vcpu_return
> +
> +	@ We were in the host already. Let's craft a panic-ing return to SVC.
> +99:	mrs	r2, cpsr
> +	bic	r2, r2, #MODE_MASK
> +	orr	r2, r2, #SVC_MODE
> +THUMB(	orr	r2, r2, #PSR_T_BIT	)
> +	msr	spsr_cxsf, r2
> +	mrs	r1, ELR_hyp
> +	ldr	r2, =BSYM(panic)
> +	msr	ELR_hyp, r2
> +	ldr	r0, =\panic_str
> +	eret
> +.endm
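In other words: if the VTTBR shows we were already running as the host
(VMID 0), the 99: path above fabricates an ordinary exception return into SVC
mode that lands in panic() with the Hyp fault address as argument. For the
data abort vector it behaves roughly as if the host had executed the following
(sketch only; "elr" stands for the saved ELR_hyp value):

	panic("unexpected data abort in Hyp mode at: %#08x", elr);
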
> +
> +	.text
> +
>  	.align 5
>  __kvm_hyp_vector:
>  	.globl __kvm_hyp_vector
> -	nop
> +
> +	@ Hyp-mode exception vector
> +	W(b)	hyp_reset
> +	W(b)	hyp_undef
> +	W(b)	hyp_svc
> +	W(b)	hyp_pabt
> +	W(b)	hyp_dabt
> +	W(b)	hyp_hvc
> +	W(b)	hyp_irq
> +	W(b)	hyp_fiq
> +
> +	.align
> +hyp_reset:
> +	b	hyp_reset
> +
> +	.align
> +hyp_undef:
> +	bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
> +
> +	.align
> +hyp_svc:
> +	bad_exception ARM_EXCEPTION_HVC, svc_die_str
> +
> +	.align
> +hyp_pabt:
> +	bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
> +
> +	.align
> +hyp_dabt:
> +	bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
> +
> +	.align
> +hyp_hvc:
> +	/*
> +	 * Getting here is either because of a trap from a guest, or because
> +	 * the host kernel called HVC, which means "switch to Hyp mode".
> +	 */
> +	push	{r0, r1, r2}
> +
> +	@ Check syndrome register
> +	mrc	p15, 4, r1, c5, c2, 0	@ HSR
> +	lsr	r0, r1, #HSR_EC_SHIFT
> +#ifdef CONFIG_VFPv3
> +	cmp	r0, #HSR_EC_CP_0_13
> +	beq	switch_to_guest_vfp
> +#endif
> +	cmp	r0, #HSR_EC_HVC
> +	bne	guest_trap		@ Not HVC instr.
> +
> +	/*
> +	 * Let's check if the HVC came from VMID 0 and allow simple
> +	 * switch to Hyp mode
> +	 */
> +	mrrc    p15, 6, r0, r2, c2
> +	lsr     r2, r2, #16
> +	and     r2, r2, #0xff
> +	cmp     r2, #0
> +	bne	guest_trap		@ Guest called HVC
> +
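(The check above is just pulling the VMID out of bits [55:48] of the VTTBR;
expressed in C with the masks this series adds to kvm_arm.h, it amounts to a
sketch like

	/* Illustrative helper: recover the current VMID from a VTTBR value. */
	static inline u8 vttbr_to_vmid(u64 vttbr)
	{
		return (vttbr & VTTBR_VMID_MASK) >> VTTBR_VMID_SHIFT;
	}

with VMID 0 meaning the HVC was issued by the host kernel rather than a guest.)
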
> +host_switch_to_hyp:
> +	pop	{r0, r1, r2}
> +
> +	push	{lr}
> +	mrs	lr, SPSR
> +	push	{lr}
> +
> +	mov	lr, r0
> +	mov	r0, r1
> +	mov	r1, r2
> +	mov	r2, r3
> +
> +THUMB(	orr	lr, #1)
> +	blx	lr			@ Call the HYP function
> +
> +	pop	{lr}
> +	msr	SPSR_csxf, lr
> +	pop	{lr}
> +	eret
> +
> +guest_trap:
> +	load_vcpu			@ Load VCPU pointer to r0
> +	str	r1, [vcpu, #VCPU_HSR]
> +
> +	@ Check if we need the fault information
> +	lsr	r1, r1, #HSR_EC_SHIFT
> +	cmp	r1, #HSR_EC_IABT
> +	mrceq	p15, 4, r2, c6, c0, 2	@ HIFAR
> +	beq	2f
> +	cmp	r1, #HSR_EC_DABT
> +	bne	1f
> +	mrc	p15, 4, r2, c6, c0, 0	@ HDFAR
> +
> +2:	str	r2, [vcpu, #VCPU_HxFAR]
> +
> +	/*
> +	 * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
> +	 *
> +	 * Abort on the stage 2 translation for a memory access from a
> +	 * Non-secure PL1 or PL0 mode:
> +	 *
> +	 * For any Access flag fault or Translation fault, and also for any
> +	 * Permission fault on the stage 2 translation of a memory access
> +	 * made as part of a translation table walk for a stage 1 translation,
> +	 * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
> +	 * is UNKNOWN.
> +	 */
> +
> +	/* Check for permission fault, and S1PTW */
> +	mrc	p15, 4, r1, c5, c2, 0	@ HSR
> +	and	r0, r1, #HSR_FSC_TYPE
> +	cmp	r0, #FSC_PERM
> +	tsteq	r1, #(1 << 7)		@ S1PTW
> +	mrcne	p15, 4, r2, c6, c0, 4	@ HPFAR
> +	bne	3f
> +
> +	/* Resolve IPA using the xFAR */
> +	mcr	p15, 0, r2, c7, c8, 0	@ ATS1CPR
> +	isb
> +	mrrc	p15, 0, r0, r1, c7	@ PAR
> +	tst	r0, #1
> +	bne	4f			@ Failed translation
> +	ubfx	r2, r0, #12, #20
> +	lsl	r2, r2, #4
> +	orr	r2, r2, r1, lsl #24
> +
> +3:	load_vcpu			@ Load VCPU pointer to r0
> +	str	r2, [r0, #VCPU_HPFAR]
> +
> +1:	mov	r1, #ARM_EXCEPTION_HVC
> +	b	__kvm_vcpu_return
> +
> +4:	pop	{r0, r1, r2}		@ Failed translation, return to guest
> +	eret
> +
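The ATS1CPR/PAR sequence above reconstructs the HPFAR contents by hand when
the hardware leaves them UNKNOWN. In C the bit shuffling is roughly the
following (a sketch only; par_lo/par_hi stand for the two words read from the
64-bit PAR):

	/*
	 * HPFAR[31:4] holds IPA[39:12]; the PAR provides PA[31:12] in its low
	 * word and PA[39:32] in the low byte of its high word.
	 */
	static u32 par_to_hpfar(u32 par_lo, u32 par_hi)
	{
		u32 hpfar;

		hpfar  = ((par_lo >> 12) & 0xfffff) << 4;	/* IPA[31:12] */
		hpfar |= par_hi << 24;				/* IPA[39:32] */
		return hpfar;
	}
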
> +/*
> + * If VFPv3 support is not available, then we will not switch the VFP
> + * registers; however, cp10 and cp11 accesses will still trap and fall back
> + * to the regular coprocessor emulation code, which currently will
> + * inject an undefined instruction exception into the guest.
> + */
> +#ifdef CONFIG_VFPv3
> +switch_to_guest_vfp:
> +	load_vcpu			@ Load VCPU pointer to r0
> +	push	{r3-r7}
> +
> +	@ NEON/VFP used.  Turn on VFP access.
> +	set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
> +
> +	@ Switch VFP/NEON hardware state to the guest's
> +	add	r7, r0, #VCPU_VFP_HOST
> +	ldr	r7, [r7]
> +	store_vfp_state r7
> +	add	r7, r0, #VCPU_VFP_GUEST
> +	restore_vfp_state r7
> +
> +	pop	{r3-r7}
> +	pop	{r0-r2}
> +	eret
> +#endif
> +
> +	.align
> +hyp_irq:
> +	push	{r0, r1, r2}
> +	mov	r1, #ARM_EXCEPTION_IRQ
> +	load_vcpu			@ Load VCPU pointer to r0
> +	b	__kvm_vcpu_return
> +
> +	.align
> +hyp_fiq:
> +	b	hyp_fiq
> +
> +	.ltorg
>  
>  __kvm_hyp_code_end:
>  	.globl	__kvm_hyp_code_end
> +
> +	.section ".rodata"
> +
> +und_die_str:
> +	.ascii	"unexpected undefined exception in Hyp mode at: %#08x"
> +pabt_die_str:
> +	.ascii	"unexpected prefetch abort in Hyp mode at: %#08x"
> +dabt_die_str:
> +	.ascii	"unexpected data abort in Hyp mode at: %#08x"
> +svc_die_str:
> +	.ascii	"unexpected HVC/SVC trap in Hyp mode at: %#08x"
> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> new file mode 100644
> index 0000000..f59a580
> --- /dev/null
> +++ b/arch/arm/kvm/interrupts_head.S
> @@ -0,0 +1,443 @@
> +#define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
> +#define VCPU_USR_SP		(VCPU_USR_REG(13))
> +#define VCPU_USR_LR		(VCPU_USR_REG(14))
> +#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))
> +
> +/*
> + * Many of these macros need to access the VCPU structure, which is always
> + * held in r0. These macros should never clobber r1, as it is used to hold the
> + * exception code on the return path (except of course the macro that switches
> + * all the registers before the final jump to the VM).
> + */
> +vcpu	.req	r0		@ vcpu pointer always in r0
> +
> +/* Clobbers {r2-r6} */
> +.macro store_vfp_state vfp_base
> +	@ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
> +	VFPFMRX	r2, FPEXC
> +	@ Make sure VFP is enabled so we can touch the registers.
> +	orr	r6, r2, #FPEXC_EN
> +	VFPFMXR	FPEXC, r6
> +
> +	VFPFMRX	r3, FPSCR
> +	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
> +	beq	1f
> +	@ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
> +	@ we only need to save them if FPEXC_EX is set.
> +	VFPFMRX r4, FPINST
> +	tst	r2, #FPEXC_FP2V
> +	VFPFMRX r5, FPINST2, ne		@ vmrsne
> +	bic	r6, r2, #FPEXC_EX	@ FPEXC_EX disable
> +	VFPFMXR	FPEXC, r6
> +1:
> +	VFPFSTMIA \vfp_base, r6		@ Save VFP registers
> +	stm	\vfp_base, {r2-r5}	@ Save FPEXC, FPSCR, FPINST, FPINST2
> +.endm
> +
> +/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
> +.macro restore_vfp_state vfp_base
> +	VFPFLDMIA \vfp_base, r6		@ Load VFP registers
> +	ldm	\vfp_base, {r2-r5}	@ Load FPEXC, FPSCR, FPINST, FPINST2
> +
> +	VFPFMXR FPSCR, r3
> +	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
> +	beq	1f
> +	VFPFMXR FPINST, r4
> +	tst	r2, #FPEXC_FP2V
> +	VFPFMXR FPINST2, r5, ne
> +1:
> +	VFPFMXR FPEXC, r2	@ FPEXC	(last, in case !EN)
> +.endm
> +
> +/* These are simply for the macros to work - the values don't have meaning */
> +.equ usr, 0
> +.equ svc, 1
> +.equ abt, 2
> +.equ und, 3
> +.equ irq, 4
> +.equ fiq, 5
> +
> +.macro push_host_regs_mode mode
> +	mrs	r2, SP_\mode
> +	mrs	r3, LR_\mode
> +	mrs	r4, SPSR_\mode
> +	push	{r2, r3, r4}
> +.endm
> +
> +/*
> + * Store all host persistent registers on the stack.
> + * Clobbers all registers, in all modes, except r0 and r1.
> + */
> +.macro save_host_regs
> +	/* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
> +	mrs	r2, ELR_hyp
> +	push	{r2}
> +
> +	/* usr regs */
> +	push	{r4-r12}	@ r0-r3 are always clobbered
> +	mrs	r2, SP_usr
> +	mov	r3, lr
> +	push	{r2, r3}
> +
> +	push_host_regs_mode svc
> +	push_host_regs_mode abt
> +	push_host_regs_mode und
> +	push_host_regs_mode irq
> +
> +	/* fiq regs */
> +	mrs	r2, r8_fiq
> +	mrs	r3, r9_fiq
> +	mrs	r4, r10_fiq
> +	mrs	r5, r11_fiq
> +	mrs	r6, r12_fiq
> +	mrs	r7, SP_fiq
> +	mrs	r8, LR_fiq
> +	mrs	r9, SPSR_fiq
> +	push	{r2-r9}
> +.endm
> +
> +.macro pop_host_regs_mode mode
> +	pop	{r2, r3, r4}
> +	msr	SP_\mode, r2
> +	msr	LR_\mode, r3
> +	msr	SPSR_\mode, r4
> +.endm
> +
> +/*
> + * Restore all host registers from the stack.
> + * Clobbers all registers, in all modes, except r0 and r1.
> + */
> +.macro restore_host_regs
> +	pop	{r2-r9}
> +	msr	r8_fiq, r2
> +	msr	r9_fiq, r3
> +	msr	r10_fiq, r4
> +	msr	r11_fiq, r5
> +	msr	r12_fiq, r6
> +	msr	SP_fiq, r7
> +	msr	LR_fiq, r8
> +	msr	SPSR_fiq, r9
> +
> +	pop_host_regs_mode irq
> +	pop_host_regs_mode und
> +	pop_host_regs_mode abt
> +	pop_host_regs_mode svc
> +
> +	pop	{r2, r3}
> +	msr	SP_usr, r2
> +	mov	lr, r3
> +	pop	{r4-r12}
> +
> +	pop	{r2}
> +	msr	ELR_hyp, r2
> +.endm
> +
> +/*
> + * Restore SP, LR and SPSR for a given mode. offset is the offset of
> + * this mode's registers from the VCPU base.
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers r1, r2, r3, r4.
> + */
> +.macro restore_guest_regs_mode mode, offset
> +	add	r1, vcpu, \offset
> +	ldm	r1, {r2, r3, r4}
> +	msr	SP_\mode, r2
> +	msr	LR_\mode, r3
> +	msr	SPSR_\mode, r4
> +.endm
> +
> +/*
> + * Restore all guest registers from the vcpu struct.
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers *all* registers.
> + */
> +.macro restore_guest_regs
> +	restore_guest_regs_mode svc, #VCPU_SVC_REGS
> +	restore_guest_regs_mode abt, #VCPU_ABT_REGS
> +	restore_guest_regs_mode und, #VCPU_UND_REGS
> +	restore_guest_regs_mode irq, #VCPU_IRQ_REGS
> +
> +	add	r1, vcpu, #VCPU_FIQ_REGS
> +	ldm	r1, {r2-r9}
> +	msr	r8_fiq, r2
> +	msr	r9_fiq, r3
> +	msr	r10_fiq, r4
> +	msr	r11_fiq, r5
> +	msr	r12_fiq, r6
> +	msr	SP_fiq, r7
> +	msr	LR_fiq, r8
> +	msr	SPSR_fiq, r9
> +
> +	@ Load return state
> +	ldr	r2, [vcpu, #VCPU_PC]
> +	ldr	r3, [vcpu, #VCPU_CPSR]
> +	msr	ELR_hyp, r2
> +	msr	SPSR_cxsf, r3
> +
> +	@ Load user registers
> +	ldr	r2, [vcpu, #VCPU_USR_SP]
> +	ldr	r3, [vcpu, #VCPU_USR_LR]
> +	msr	SP_usr, r2
> +	mov	lr, r3
> +	add	vcpu, vcpu, #(VCPU_USR_REGS)
> +	ldm	vcpu, {r0-r12}
> +.endm
> +
> +/*
> + * Save SP, LR and SPSR for a given mode. offset is the offset of
> + * this mode's registers from the VCPU base.
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers r2, r3, r4, r5.
> + */
> +.macro save_guest_regs_mode mode, offset
> +	add	r2, vcpu, \offset
> +	mrs	r3, SP_\mode
> +	mrs	r4, LR_\mode
> +	mrs	r5, SPSR_\mode
> +	stm	r2, {r3, r4, r5}
> +.endm
> +
> +/*
> + * Save all guest registers to the vcpu struct
> + * Expects guest's r0, r1, r2 on the stack.
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers r2, r3, r4, r5.
> + */
> +.macro save_guest_regs
> +	@ Store usr registers
> +	add	r2, vcpu, #VCPU_USR_REG(3)
> +	stm	r2, {r3-r12}
> +	add	r2, vcpu, #VCPU_USR_REG(0)
> +	pop	{r3, r4, r5}		@ r0, r1, r2
> +	stm	r2, {r3, r4, r5}
> +	mrs	r2, SP_usr
> +	mov	r3, lr
> +	str	r2, [vcpu, #VCPU_USR_SP]
> +	str	r3, [vcpu, #VCPU_USR_LR]
> +
> +	@ Store return state
> +	mrs	r2, ELR_hyp
> +	mrs	r3, spsr
> +	str	r2, [vcpu, #VCPU_PC]
> +	str	r3, [vcpu, #VCPU_CPSR]
> +
> +	@ Store other guest registers
> +	save_guest_regs_mode svc, #VCPU_SVC_REGS
> +	save_guest_regs_mode abt, #VCPU_ABT_REGS
> +	save_guest_regs_mode und, #VCPU_UND_REGS
> +	save_guest_regs_mode irq, #VCPU_IRQ_REGS
> +.endm
> +
> +/* Reads cp15 registers from hardware and stores them in memory
> + * @store_to_vcpu: If 0, registers are written in-order to the stack,
> + * 		   otherwise to the VCPU struct pointed to by vcpup
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers r2 - r12
> + */
> +.macro read_cp15_state store_to_vcpu
> +	mrc	p15, 0, r2, c1, c0, 0	@ SCTLR
> +	mrc	p15, 0, r3, c1, c0, 2	@ CPACR
> +	mrc	p15, 0, r4, c2, c0, 2	@ TTBCR
> +	mrc	p15, 0, r5, c3, c0, 0	@ DACR
> +	mrrc	p15, 0, r6, r7, c2	@ TTBR 0
> +	mrrc	p15, 1, r8, r9, c2	@ TTBR 1
> +	mrc	p15, 0, r10, c10, c2, 0	@ PRRR
> +	mrc	p15, 0, r11, c10, c2, 1	@ NMRR
> +	mrc	p15, 2, r12, c0, c0, 0	@ CSSELR
> +
> +	.if \store_to_vcpu == 0
> +	push	{r2-r12}		@ Push CP15 registers
> +	.else
> +	str	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> +	str	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> +	str	r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
> +	str	r5, [vcpu, #CP15_OFFSET(c3_DACR)]
> +	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
> +	strd	r6, r7, [vcpu]
> +	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
> +	strd	r8, r9, [vcpu]
> +	sub	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
> +	str	r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
> +	str	r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
> +	str	r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
> +	.endif
> +
> +	mrc	p15, 0, r2, c13, c0, 1	@ CID
> +	mrc	p15, 0, r3, c13, c0, 2	@ TID_URW
> +	mrc	p15, 0, r4, c13, c0, 3	@ TID_URO
> +	mrc	p15, 0, r5, c13, c0, 4	@ TID_PRIV
> +	mrc	p15, 0, r6, c5, c0, 0	@ DFSR
> +	mrc	p15, 0, r7, c5, c0, 1	@ IFSR
> +	mrc	p15, 0, r8, c5, c1, 0	@ ADFSR
> +	mrc	p15, 0, r9, c5, c1, 1	@ AIFSR
> +	mrc	p15, 0, r10, c6, c0, 0	@ DFAR
> +	mrc	p15, 0, r11, c6, c0, 2	@ IFAR
> +	mrc	p15, 0, r12, c12, c0, 0	@ VBAR
> +
> +	.if \store_to_vcpu == 0
> +	push	{r2-r12}		@ Push CP15 registers
> +	.else
> +	str	r2, [vcpu, #CP15_OFFSET(c13_CID)]
> +	str	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> +	str	r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
> +	str	r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
> +	str	r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
> +	str	r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
> +	str	r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
> +	str	r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
> +	str	r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
> +	str	r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
> +	str	r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
> +	.endif
> +.endm
> +
> +/*
> + * Reads cp15 registers from memory and writes them to hardware
> + * @read_from_vcpu: If 0, registers are read in-order from the stack,
> + *		    otherwise from the VCPU struct pointed to by vcpup
> + *
> + * Assumes vcpu pointer in vcpu reg
> + */
> +.macro write_cp15_state read_from_vcpu
> +	.if \read_from_vcpu == 0
> +	pop	{r2-r12}
> +	.else
> +	ldr	r2, [vcpu, #CP15_OFFSET(c13_CID)]
> +	ldr	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> +	ldr	r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
> +	ldr	r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
> +	ldr	r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
> +	ldr	r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
> +	ldr	r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
> +	ldr	r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
> +	ldr	r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
> +	ldr	r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
> +	ldr	r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
> +	.endif
> +
> +	mcr	p15, 0, r2, c13, c0, 1	@ CID
> +	mcr	p15, 0, r3, c13, c0, 2	@ TID_URW
> +	mcr	p15, 0, r4, c13, c0, 3	@ TID_URO
> +	mcr	p15, 0, r5, c13, c0, 4	@ TID_PRIV
> +	mcr	p15, 0, r6, c5, c0, 0	@ DFSR
> +	mcr	p15, 0, r7, c5, c0, 1	@ IFSR
> +	mcr	p15, 0, r8, c5, c1, 0	@ ADFSR
> +	mcr	p15, 0, r9, c5, c1, 1	@ AIFSR
> +	mcr	p15, 0, r10, c6, c0, 0	@ DFAR
> +	mcr	p15, 0, r11, c6, c0, 2	@ IFAR
> +	mcr	p15, 0, r12, c12, c0, 0	@ VBAR
> +
> +	.if \read_from_vcpu == 0
> +	pop	{r2-r12}
> +	.else
> +	ldr	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> +	ldr	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> +	ldr	r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
> +	ldr	r5, [vcpu, #CP15_OFFSET(c3_DACR)]
> +	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
> +	ldrd	r6, r7, [vcpu]
> +	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
> +	ldrd	r8, r9, [vcpu]
> +	sub	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
> +	ldr	r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
> +	ldr	r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
> +	ldr	r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
> +	.endif
> +
> +	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
> +	mcr	p15, 0, r3, c1, c0, 2	@ CPACR
> +	mcr	p15, 0, r4, c2, c0, 2	@ TTBCR
> +	mcr	p15, 0, r5, c3, c0, 0	@ DACR
> +	mcrr	p15, 0, r6, r7, c2	@ TTBR 0
> +	mcrr	p15, 1, r8, r9, c2	@ TTBR 1
> +	mcr	p15, 0, r10, c10, c2, 0	@ PRRR
> +	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
> +	mcr	p15, 2, r12, c0, c0, 0	@ CSSELR
> +.endm
> +
> +/*
> + * Save the VGIC CPU state into memory
> + *
> + * Assumes vcpu pointer in vcpu reg
> + */
> +.macro save_vgic_state
> +.endm
> +
> +/*
> + * Restore the VGIC CPU state from memory
> + *
> + * Assumes vcpu pointer in vcpu reg
> + */
> +.macro restore_vgic_state
> +.endm
> +
> +.equ vmentry,	0
> +.equ vmexit,	1
> +
> +/* Configures the HSTR (Hyp System Trap Register) on entry/return
> + * (hardware reset value is 0) */
> +.macro set_hstr operation
> +	mrc	p15, 4, r2, c1, c1, 3
> +	ldr	r3, =HSTR_T(15)
> +	.if \operation == vmentry
> +	orr	r2, r2, r3		@ Trap CR{15}
> +	.else
> +	bic	r2, r2, r3		@ Don't trap any CRx accesses
> +	.endif
> +	mcr	p15, 4, r2, c1, c1, 3
> +.endm
> +
> +/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
> + * (hardware reset value is 0). Keep previous value in r2. */
> +.macro set_hcptr operation, mask
> +	mrc	p15, 4, r2, c1, c1, 2
> +	ldr	r3, =\mask
> +	.if \operation == vmentry
> +	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
> +	.else
> +	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
> +	.endif
> +	mcr	p15, 4, r3, c1, c1, 2
> +.endm
> +
> +/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
> + * (hardware reset value is 0) */
> +.macro set_hdcr operation
> +	mrc	p15, 4, r2, c1, c1, 1
> +	ldr	r3, =(HDCR_TPM|HDCR_TPMCR)
> +	.if \operation == vmentry
> +	orr	r2, r2, r3		@ Trap some perfmon accesses
> +	.else
> +	bic	r2, r2, r3		@ Don't trap any perfmon accesses
> +	.endif
> +	mcr	p15, 4, r2, c1, c1, 1
> +.endm
> +
> +/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
> +.macro configure_hyp_role operation
> +	mrc	p15, 4, r2, c1, c1, 0	@ HCR
> +	bic	r2, r2, #HCR_VIRT_EXCP_MASK
> +	ldr	r3, =HCR_GUEST_MASK
> +	.if \operation == vmentry
> +	orr	r2, r2, r3
> +	ldr	r3, [vcpu, #VCPU_IRQ_LINES]
irq_lines is accessed atomically from vcpu_interrupt_line(), but there
are no memory barriers or atomic operations here. Looks suspicious, though
I am not familiar with the ARM memory model. As far as I understand,
different translation regimes are used to access this memory, so who
knows what this does to access ordering.
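
(Purely to illustrate the ordering concern, not a claim about what the patch
must do: one explicit way to pair the C-side update with the VCPU_IRQ_LINES
load done here would be something along these lines, where "lines" is whatever
new value the caller computed:

	vcpu->arch.irq_lines = lines;	/* the store vcpu_interrupt_line() makes */
	smp_wmb();			/* publish before kicking the VCPU */
	kvm_vcpu_kick(vcpu);		/* force an exit; the next entry reloads it */

kvm_vcpu_kick() is the generic KVM helper that IPIs the CPU currently running
the VCPU.)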


> +	orr	r2, r2, r3
> +	.else
> +	bic	r2, r2, r3
> +	.endif
> +	mcr	p15, 4, r2, c1, c1, 0
> +.endm
> +
> +.macro load_vcpu
> +	mrc	p15, 4, vcpu, c13, c0, 2	@ HTPIDR
> +.endm
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v5 07/14] KVM: ARM: World-switch implementation
@ 2013-01-15  9:43     ` Gleb Natapov
  0 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15  9:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
> Provides complete world-switch implementation to switch to other guests
> running in non-secure modes. Includes Hyp exception handlers that
> capture necessary exception information and stores the information on
> the VCPU and KVM structures.
> 
> The following Hyp-ABI is also documented in the code:
> 
> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
>    Switching to Hyp mode is done through a simple HVC #0 instruction. The
>    exception vector code will check that the HVC comes from VMID==0 and if
>    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
>    - r0 contains a pointer to a HYP function
>    - r1, r2, and r3 contain arguments to the above function.
>    - The HYP function will be called with its arguments in r0, r1 and r2.
>    On HYP function return, we return directly to SVC.
> 
> A call to a function executing in Hyp mode is performed like the following:
> 
>         <svc code>
>         ldr     r0, =BSYM(my_hyp_fn)
>         ldr     r1, =my_param
>         hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
>         <svc code>
> 
> Otherwise, the world-switch is pretty straight-forward. All state that
> can be modified by the guest is first backed up on the Hyp stack and the
> VCPU values is loaded onto the hardware. State, which is not loaded, but
> theoretically modifiable by the guest is protected through the
> virtualiation features to generate a trap and cause software emulation.
> Upon guest returns, all state is restored from hardware onto the VCPU
> struct and the original state is restored from the Hyp-stack onto the
> hardware.
> 
> SMP support using the VMPIDR calculated on the basis of the host MPIDR
> and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
> 
> Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
> a separate patch into the appropriate patches introducing the
> functionality. Note that the VMIDs are stored per VM as required by the ARM
> architecture reference manual.
> 
> To support VFP/NEON we trap those instructions using the HPCTR. When
> we trap, we switch the FPU.  After a guest exit, the VFP state is
> returned to the host.  When disabling access to floating point
> instructions, we also mask FPEXC_EN in order to avoid the guest
> receiving Undefined instruction exceptions before we have a chance to
> switch back the floating point state.  We are reusing vfp_hard_struct,
> so we depend on VFPv3 being enabled in the host kernel, if not, we still
> trap cp10 and cp11 in order to inject an undefined instruction exception
> whenever the guest tries to use VFP/NEON. VFP/NEON developed by
> Antionios Motakis and Rusty Russell.
> 
> Aborts that are permission faults, and not stage-1 page table walk, do
> not report the faulting address in the HPFAR.  We have to resolve the
> IPA, and store it just like the HPFAR register on the VCPU struct. If
> the IPA cannot be resolved, it means another CPU is playing with the
> page tables, and we simply restart the guest.  This quirk was fixed by
> Marc Zyngier.
> 
> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/kvm_arm.h  |   51 ++++
>  arch/arm/include/asm/kvm_host.h |   10 +
>  arch/arm/kernel/asm-offsets.c   |   25 ++
>  arch/arm/kvm/arm.c              |  187 ++++++++++++++++
>  arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
>  arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
>  6 files changed, 1108 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm/kvm/interrupts_head.S
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index fb22ee8..a3262a2 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -98,6 +98,18 @@
>  #define TTBCR_T0SZ	3
>  #define HTCR_MASK	(TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
>  
> +/* Hyp System Trap Register */
> +#define HSTR_T(x)	(1 << x)
> +#define HSTR_TTEE	(1 << 16)
> +#define HSTR_TJDBX	(1 << 17)
> +
> +/* Hyp Coprocessor Trap Register */
> +#define HCPTR_TCP(x)	(1 << x)
> +#define HCPTR_TCP_MASK	(0x3fff)
> +#define HCPTR_TASE	(1 << 15)
> +#define HCPTR_TTA	(1 << 20)
> +#define HCPTR_TCPAC	(1 << 31)
> +
>  /* Hyp Debug Configuration Register bits */
>  #define HDCR_TDRA	(1 << 11)
>  #define HDCR_TDOSA	(1 << 10)
> @@ -144,6 +156,45 @@
>  #else
>  #define VTTBR_X		(5 - KVM_T0SZ)
>  #endif
> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
> +#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
> +#define VTTBR_VMID_SHIFT  (48LLU)
> +#define VTTBR_VMID_MASK	  (0xffLLU << VTTBR_VMID_SHIFT)
> +
> +/* Hyp Syndrome Register (HSR) bits */
> +#define HSR_EC_SHIFT	(26)
> +#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
> +#define HSR_IL		(1U << 25)
> +#define HSR_ISS		(HSR_IL - 1)
> +#define HSR_ISV_SHIFT	(24)
> +#define HSR_ISV		(1U << HSR_ISV_SHIFT)
> +#define HSR_FSC		(0x3f)
> +#define HSR_FSC_TYPE	(0x3c)
> +#define HSR_WNR		(1 << 6)
> +
> +#define FSC_FAULT	(0x04)
> +#define FSC_PERM	(0x0c)
> +
> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> +#define HPFAR_MASK	(~0xf)
>  
> +#define HSR_EC_UNKNOWN	(0x00)
> +#define HSR_EC_WFI	(0x01)
> +#define HSR_EC_CP15_32	(0x03)
> +#define HSR_EC_CP15_64	(0x04)
> +#define HSR_EC_CP14_MR	(0x05)
> +#define HSR_EC_CP14_LS	(0x06)
> +#define HSR_EC_CP_0_13	(0x07)
> +#define HSR_EC_CP10_ID	(0x08)
> +#define HSR_EC_JAZELLE	(0x09)
> +#define HSR_EC_BXJ	(0x0A)
> +#define HSR_EC_CP14_64	(0x0C)
> +#define HSR_EC_SVC_HYP	(0x11)
> +#define HSR_EC_HVC	(0x12)
> +#define HSR_EC_SMC	(0x13)
> +#define HSR_EC_IABT	(0x20)
> +#define HSR_EC_IABT_HYP	(0x21)
> +#define HSR_EC_DABT	(0x24)
> +#define HSR_EC_DABT_HYP	(0x25)
>  
>  #endif /* __ARM_KVM_ARM_H__ */
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 1de6f0d..ddb09da 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -21,6 +21,7 @@
>  
>  #include <asm/kvm.h>
>  #include <asm/kvm_asm.h>
> +#include <asm/fpstate.h>
>  
>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>  #define KVM_USER_MEM_SLOTS 32
> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
>  	u32 hxfar;		/* Hyp Data/Inst Fault Address Register */
>  	u32 hpfar;		/* Hyp IPA Fault Address Register */
>  
> +	/* Floating point registers (VFP and Advanced SIMD/NEON) */
> +	struct vfp_hard_struct vfp_guest;
> +	struct vfp_hard_struct *vfp_host;
> +
> +	/*
> +	 * Anything that is not used directly from assembly code goes
> +	 * here.
> +	 */
>  	/* Interrupt related fields */
>  	u32 irq_lines;		/* IRQ and FIQ levels */
>  
> @@ -112,6 +121,7 @@ struct kvm_one_reg;
>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>  u64 kvm_call_hyp(void *hypfn, ...);
> +void force_vm_exit(const cpumask_t *mask);
>  
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>  struct kvm;
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index c985b48..c8b3272 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -13,6 +13,9 @@
>  #include <linux/sched.h>
>  #include <linux/mm.h>
>  #include <linux/dma-mapping.h>
> +#ifdef CONFIG_KVM_ARM_HOST
> +#include <linux/kvm_host.h>
> +#endif
>  #include <asm/cacheflush.h>
>  #include <asm/glue-df.h>
>  #include <asm/glue-pf.h>
> @@ -146,5 +149,27 @@ int main(void)
>    DEFINE(DMA_BIDIRECTIONAL,	DMA_BIDIRECTIONAL);
>    DEFINE(DMA_TO_DEVICE,		DMA_TO_DEVICE);
>    DEFINE(DMA_FROM_DEVICE,	DMA_FROM_DEVICE);
> +#ifdef CONFIG_KVM_ARM_HOST
> +  DEFINE(VCPU_KVM,		offsetof(struct kvm_vcpu, kvm));
> +  DEFINE(VCPU_MIDR,		offsetof(struct kvm_vcpu, arch.midr));
> +  DEFINE(VCPU_CP15,		offsetof(struct kvm_vcpu, arch.cp15));
> +  DEFINE(VCPU_VFP_GUEST,	offsetof(struct kvm_vcpu, arch.vfp_guest));
> +  DEFINE(VCPU_VFP_HOST,		offsetof(struct kvm_vcpu, arch.vfp_host));
> +  DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
> +  DEFINE(VCPU_USR_REGS,		offsetof(struct kvm_vcpu, arch.regs.usr_regs));
> +  DEFINE(VCPU_SVC_REGS,		offsetof(struct kvm_vcpu, arch.regs.svc_regs));
> +  DEFINE(VCPU_ABT_REGS,		offsetof(struct kvm_vcpu, arch.regs.abt_regs));
> +  DEFINE(VCPU_UND_REGS,		offsetof(struct kvm_vcpu, arch.regs.und_regs));
> +  DEFINE(VCPU_IRQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.irq_regs));
> +  DEFINE(VCPU_FIQ_REGS,		offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
> +  DEFINE(VCPU_PC,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
> +  DEFINE(VCPU_CPSR,		offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
> +  DEFINE(VCPU_IRQ_LINES,	offsetof(struct kvm_vcpu, arch.irq_lines));
> +  DEFINE(VCPU_HSR,		offsetof(struct kvm_vcpu, arch.hsr));
> +  DEFINE(VCPU_HxFAR,		offsetof(struct kvm_vcpu, arch.hxfar));
> +  DEFINE(VCPU_HPFAR,		offsetof(struct kvm_vcpu, arch.hpfar));
> +  DEFINE(VCPU_HYP_PC,		offsetof(struct kvm_vcpu, arch.hyp_pc));
> +  DEFINE(KVM_VTTBR,		offsetof(struct kvm, arch.vttbr));
> +#endif
>    return 0; 
>  }
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 9b4566e..c94d278 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -40,6 +40,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_emulate.h>
>  
>  #ifdef REQUIRES_VIRT
>  __asm__(".arch_extension	virt");
> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>  static unsigned long hyp_default_vectors;
>  
> +/* The VMID used in the VTTBR */
> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> +static u8 kvm_next_vmid;
> +static DEFINE_SPINLOCK(kvm_vmid_lock);
>  
>  int kvm_arch_hardware_enable(void *garbage)
>  {
> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
>  
>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  {
> +	/* Force users to call KVM_ARM_VCPU_INIT */
> +	vcpu->arch.target = -1;
>  	return 0;
>  }
>  
> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	vcpu->cpu = cpu;
> +	vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  
>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
As far as I see the function is unused.

>  {
> +	return v->mode == IN_GUEST_MODE;
> +}
> +
> +/* Just ensure a guest exit from a particular CPU */
> +static void exit_vm_noop(void *info)
> +{
> +}
> +
> +void force_vm_exit(const cpumask_t *mask)
> +{
> +	smp_call_function_many(mask, exit_vm_noop, NULL, true);
> +}
There is make_all_cpus_request() for that. It actually sends IPIs only
to cpus that are running vcpus.

> +
> +/**
> + * need_new_vmid_gen - check that the VMID is still valid
> + * @kvm: The VM's VMID to checkt
> + *
> + * return true if there is a new generation of VMIDs being used
> + *
> + * The hardware supports only 256 values with the value zero reserved for the
> + * host, so we check if an assigned value belongs to a previous generation,
> + * which which requires us to assign a new value. If we're the first to use a
> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> + * CPUs.
> + */
> +static bool need_new_vmid_gen(struct kvm *kvm)
> +{
> +	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> +}
> +
> +/**
> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> + * @kvm	The guest that we are about to run
> + *
> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> + * caches and TLBs.
> + */
> +static void update_vttbr(struct kvm *kvm)
> +{
> +	phys_addr_t pgd_phys;
> +	u64 vmid;
> +
> +	if (!need_new_vmid_gen(kvm))
> +		return;
> +
> +	spin_lock(&kvm_vmid_lock);
> +
> +	/*
> +	 * We need to re-check the vmid_gen here to ensure that if another vcpu
> +	 * already allocated a valid vmid for this vm, then this vcpu should
> +	 * use the same vmid.
> +	 */
> +	if (!need_new_vmid_gen(kvm)) {
> +		spin_unlock(&kvm_vmid_lock);
> +		return;
> +	}
> +
> +	/* First user of a new VMID generation? */
> +	if (unlikely(kvm_next_vmid == 0)) {
> +		atomic64_inc(&kvm_vmid_gen);
> +		kvm_next_vmid = 1;
> +
> +		/*
> +		 * On SMP we know no other CPUs can use this CPU's or each
> +		 * other's VMID after force_vm_exit returns since the
> +		 * kvm_vmid_lock blocks them from reentry to the guest.
> +		 */
> +		force_vm_exit(cpu_all_mask);
> +		/*
> +		 * Now broadcast TLB + ICACHE invalidation over the inner
> +		 * shareable domain to make sure all data structures are
> +		 * clean.
> +		 */
> +		kvm_call_hyp(__kvm_flush_vm_context);
> +	}
> +
> +	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> +	kvm->arch.vmid = kvm_next_vmid;
> +	kvm_next_vmid++;
> +
> +	/* update vttbr to be used with the new vmid */
> +	pgd_phys = virt_to_phys(kvm->arch.pgd);
> +	vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
> +	kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
> +	kvm->arch.vttbr |= vmid;
> +
> +	spin_unlock(&kvm_vmid_lock);
> +}
> +
> +/*
> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
> + * proper exit to QEMU.
> + */
> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +		       int exception_index)
> +{
> +	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>  	return 0;
>  }
>  
> +/**
> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> + * @vcpu:	The VCPU pointer
> + * @run:	The kvm_run structure pointer used for userspace state exchange
> + *
> + * This function is called through the VCPU_RUN ioctl called from user space. It
> + * will execute VM code in a loop until the time slice for the process is used
> + * or some emulation is needed from user space in which case the function will
> + * return with return value 0 and with the kvm_run structure filled in with the
> + * required data for the requested emulation.
> + */
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	return -EINVAL;
> +	int ret;
> +	sigset_t sigsaved;
> +
> +	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
> +	if (unlikely(vcpu->arch.target < 0))
> +		return -ENOEXEC;
> +
> +	if (vcpu->sigset_active)
> +		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> +
> +	ret = 1;
> +	run->exit_reason = KVM_EXIT_UNKNOWN;
> +	while (ret > 0) {
> +		/*
> +		 * Check conditions before entering the guest
> +		 */
> +		cond_resched();
> +
> +		update_vttbr(vcpu->kvm);
> +
> +		local_irq_disable();
> +
> +		/*
> +		 * Re-check atomic conditions
> +		 */
> +		if (signal_pending(current)) {
> +			ret = -EINTR;
> +			run->exit_reason = KVM_EXIT_INTR;
> +		}
> +
> +		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> +			local_irq_enable();
> +			continue;
> +		}
> +
> +		/**************************************************************
> +		 * Enter the guest
> +		 */
> +		trace_kvm_entry(*vcpu_pc(vcpu));
> +		kvm_guest_enter();
> +		vcpu->mode = IN_GUEST_MODE;
You need to set mode to IN_GUEST_MODE before disabling interrupt and
check that mode != EXITING_GUEST_MODE after disabling interrupt but
before entering the guest. This way you will catch kicks that were sent
between setting of the mode and disabling the interrupts. Also you need
to check vcpu->requests and exit if it is not empty. I see that you do
not use vcpu->requests at all, but you should since common kvm code
assumes that it is used. make_all_cpus_request() uses it for instance.

> +
> +		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
You do not take kvm->srcu lock before entering the guest. It looks
wrong.

> +
> +		vcpu->mode = OUTSIDE_GUEST_MODE;
> +		kvm_guest_exit();
> +		trace_kvm_exit(*vcpu_pc(vcpu));
> +		/*
> +		 * We may have taken a host interrupt in HYP mode (ie
> +		 * while executing the guest). This interrupt is still
> +		 * pending, as we haven't serviced it yet!
> +		 *
> +		 * We're now back in SVC mode, with interrupts
> +		 * disabled.  Enabling the interrupts now will have
> +		 * the effect of taking the interrupt again, in SVC
> +		 * mode this time.
> +		 */
> +		local_irq_enable();
> +
> +		/*
> +		 * Back from guest
> +		 *************************************************************/
> +
> +		ret = handle_exit(vcpu, run, ret);
> +	}
> +
> +	if (vcpu->sigset_active)
> +		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
> +	return ret;
>  }
>  
>  static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index a923590..08adcd5 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -20,9 +20,12 @@
>  #include <linux/const.h>
>  #include <asm/unified.h>
>  #include <asm/page.h>
> +#include <asm/ptrace.h>
>  #include <asm/asm-offsets.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_arm.h>
> +#include <asm/vfpmacros.h>
> +#include "interrupts_head.S"
>  
>  	.text
>  
> @@ -31,36 +34,423 @@ __kvm_hyp_code_start:
>  
>  /********************************************************************
>   * Flush per-VMID TLBs
> + *
> + * void __kvm_tlb_flush_vmid(struct kvm *kvm);
> + *
> + * We rely on the hardware to broadcast the TLB invalidation to all CPUs
> + * inside the inner-shareable domain (which is the case for all v7
> + * implementations).  If we come across a non-IS SMP implementation, we'll
> + * have to use an IPI based mechanism. Until then, we stick to the simple
> + * hardware assisted version.
>   */
>  ENTRY(__kvm_tlb_flush_vmid)
> +	push	{r2, r3}
> +
> +	add	r0, r0, #KVM_VTTBR
> +	ldrd	r2, r3, [r0]
> +	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
> +	isb
> +	mcr     p15, 0, r0, c8, c3, 0	@ TLBIALLIS (rt ignored)
> +	dsb
> +	isb
> +	mov	r2, #0
> +	mov	r3, #0
> +	mcrr	p15, 6, r2, r3, c2	@ Back to VMID #0
> +	isb				@ Not necessary if followed by eret
> +
> +	pop	{r2, r3}
>  	bx	lr
>  ENDPROC(__kvm_tlb_flush_vmid)
>  
>  /********************************************************************
> - * Flush TLBs and instruction caches of current CPU for all VMIDs
> + * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
> + * domain, for all VMIDs
> + *
> + * void __kvm_flush_vm_context(void);
>   */
>  ENTRY(__kvm_flush_vm_context)
> +	mov	r0, #0			@ rn parameter for c15 flushes is SBZ
> +
> +	/* Invalidate NS Non-Hyp TLB Inner Shareable (TLBIALLNSNHIS) */
> +	mcr     p15, 4, r0, c8, c3, 4
> +	/* Invalidate instruction caches Inner Shareable (ICIALLUIS) */
> +	mcr     p15, 0, r0, c7, c1, 0
> +	dsb
> +	isb				@ Not necessary if followed by eret
> +
>  	bx	lr
>  ENDPROC(__kvm_flush_vm_context)
>  
> +
>  /********************************************************************
>   *  Hypervisor world-switch code
> + *
> + *
> + * int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>   */
>  ENTRY(__kvm_vcpu_run)
> -	bx	lr
> +	@ Save the vcpu pointer
> +	mcr	p15, 4, vcpu, c13, c0, 2	@ HTPIDR
> +
> +	save_host_regs
> +
> +	@ Store hardware CP15 state and load guest state
> +	read_cp15_state store_to_vcpu = 0
> +	write_cp15_state read_from_vcpu = 1
> +
> +	@ If the host kernel has not been configured with VFPv3 support,
> +	@ then it is safer if we deny guests from using it as well.
> +#ifdef CONFIG_VFPv3
> +	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
> +	VFPFMRX r2, FPEXC		@ VMRS
> +	push	{r2}
> +	orr	r2, r2, #FPEXC_EN
> +	VFPFMXR FPEXC, r2		@ VMSR
> +#endif
> +
> +	@ Configure Hyp-role
> +	configure_hyp_role vmentry
> +
> +	@ Trap coprocessor CRx accesses
> +	set_hstr vmentry
> +	set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> +	set_hdcr vmentry
> +
> +	@ Write configured ID register into MIDR alias
> +	ldr	r1, [vcpu, #VCPU_MIDR]
> +	mcr	p15, 4, r1, c0, c0, 0
> +
> +	@ Write guest view of MPIDR into VMPIDR
> +	ldr	r1, [vcpu, #CP15_OFFSET(c0_MPIDR)]
> +	mcr	p15, 4, r1, c0, c0, 5
> +
> +	@ Set up guest memory translation
> +	ldr	r1, [vcpu, #VCPU_KVM]
> +	add	r1, r1, #KVM_VTTBR
> +	ldrd	r2, r3, [r1]
> +	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
> +
> +	@ We're all done, just restore the GPRs and go to the guest
> +	restore_guest_regs
> +	clrex				@ Clear exclusive monitor
> +	eret
> +
> +__kvm_vcpu_return:
> +	/*
> +	 * return convention:
> +	 * guest r0, r1, r2 saved on the stack
> +	 * r0: vcpu pointer
> +	 * r1: exception code
> +	 */
> +	save_guest_regs
> +
> +	@ Set VMID == 0
> +	mov	r2, #0
> +	mov	r3, #0
> +	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
> +
> +	@ Don't trap coprocessor accesses for host kernel
> +	set_hstr vmexit
> +	set_hdcr vmexit
> +	set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> +
> +#ifdef CONFIG_VFPv3
> +	@ Save floating point registers we if let guest use them.
> +	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> +	bne	after_vfp_restore
> +
> +	@ Switch VFP/NEON hardware state to the host's
> +	add	r7, vcpu, #VCPU_VFP_GUEST
> +	store_vfp_state r7
> +	add	r7, vcpu, #VCPU_VFP_HOST
> +	ldr	r7, [r7]
> +	restore_vfp_state r7
> +
> +after_vfp_restore:
> +	@ Restore FPEXC_EN which we clobbered on entry
> +	pop	{r2}
> +	VFPFMXR FPEXC, r2
> +#endif
> +
> +	@ Reset Hyp-role
> +	configure_hyp_role vmexit
> +
> +	@ Let host read hardware MIDR
> +	mrc	p15, 0, r2, c0, c0, 0
> +	mcr	p15, 4, r2, c0, c0, 0
> +
> +	@ Back to hardware MPIDR
> +	mrc	p15, 0, r2, c0, c0, 5
> +	mcr	p15, 4, r2, c0, c0, 5
> +
> +	@ Store guest CP15 state and restore host state
> +	read_cp15_state store_to_vcpu = 1
> +	write_cp15_state read_from_vcpu = 0
> +
> +	restore_host_regs
> +	clrex				@ Clear exclusive monitor
> +	mov	r0, r1			@ Return the return code
> +	bx	lr			@ return to IOCTL
>  
>  ENTRY(kvm_call_hyp)
> +	hvc	#0
>  	bx	lr
>  
>  
>  /********************************************************************
>   * Hypervisor exception vector and handlers
> + *
> + *
> + * The KVM/ARM Hypervisor ABI is defined as follows:
> + *
> + * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
> + * instruction is issued since all traps are disabled when running the host
> + * kernel as per the Hyp-mode initialization at boot time.
> + *
> + * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
> + * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
> + * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
> + * instructions are called from within Hyp-mode.
> + *
> + * Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
> + *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
> + *    exception vector code will check that the HVC comes from VMID==0 and if
> + *    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
> + *    - r0 contains a pointer to a HYP function
> + *    - r1, r2, and r3 contain arguments to the above function.
> + *    - The HYP function will be called with its arguments in r0, r1 and r2.
> + *    On HYP function return, we return directly to SVC.
> + *
> + * Note that the above is used to execute code in Hyp-mode from a host-kernel
> + * point of view, and is a different concept from performing a world-switch and
> + * executing guest code SVC mode (with a VMID != 0).
>   */
>  
> +/* Handle undef, svc, pabt, or dabt by crashing with a user notice */
> +.macro bad_exception exception_code, panic_str
> +	push	{r0-r2}
> +	mrrc	p15, 6, r0, r1, c2	@ Read VTTBR
> +	lsr	r1, r1, #16
> +	ands	r1, r1, #0xff
> +	beq	99f
> +
> +	load_vcpu			@ Load VCPU pointer
> +	.if \exception_code == ARM_EXCEPTION_DATA_ABORT
> +	mrc	p15, 4, r2, c5, c2, 0	@ HSR
> +	mrc	p15, 4, r1, c6, c0, 0	@ HDFAR
> +	str	r2, [vcpu, #VCPU_HSR]
> +	str	r1, [vcpu, #VCPU_HxFAR]
> +	.endif
> +	.if \exception_code == ARM_EXCEPTION_PREF_ABORT
> +	mrc	p15, 4, r2, c5, c2, 0	@ HSR
> +	mrc	p15, 4, r1, c6, c0, 2	@ HIFAR
> +	str	r2, [vcpu, #VCPU_HSR]
> +	str	r1, [vcpu, #VCPU_HxFAR]
> +	.endif
> +	mov	r1, #\exception_code
> +	b	__kvm_vcpu_return
> +
> +	@ We were in the host already. Let's craft a panic-ing return to SVC.
> +99:	mrs	r2, cpsr
> +	bic	r2, r2, #MODE_MASK
> +	orr	r2, r2, #SVC_MODE
> +THUMB(	orr	r2, r2, #PSR_T_BIT	)
> +	msr	spsr_cxsf, r2
> +	mrs	r1, ELR_hyp
> +	ldr	r2, =BSYM(panic)
> +	msr	ELR_hyp, r2
> +	ldr	r0, =\panic_str
> +	eret
> +.endm
> +
> +	.text
> +
>  	.align 5
>  __kvm_hyp_vector:
>  	.globl __kvm_hyp_vector
> -	nop
> +
> +	@ Hyp-mode exception vector
> +	W(b)	hyp_reset
> +	W(b)	hyp_undef
> +	W(b)	hyp_svc
> +	W(b)	hyp_pabt
> +	W(b)	hyp_dabt
> +	W(b)	hyp_hvc
> +	W(b)	hyp_irq
> +	W(b)	hyp_fiq
> +
> +	.align
> +hyp_reset:
> +	b	hyp_reset
> +
> +	.align
> +hyp_undef:
> +	bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
> +
> +	.align
> +hyp_svc:
> +	bad_exception ARM_EXCEPTION_HVC, svc_die_str
> +
> +	.align
> +hyp_pabt:
> +	bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
> +
> +	.align
> +hyp_dabt:
> +	bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
> +
> +	.align
> +hyp_hvc:
> +	/*
> +	 * Getting here is either becuase of a trap from a guest or from calling
> +	 * HVC from the host kernel, which means "switch to Hyp mode".
> +	 */
> +	push	{r0, r1, r2}
> +
> +	@ Check syndrome register
> +	mrc	p15, 4, r1, c5, c2, 0	@ HSR
> +	lsr	r0, r1, #HSR_EC_SHIFT
> +#ifdef CONFIG_VFPv3
> +	cmp	r0, #HSR_EC_CP_0_13
> +	beq	switch_to_guest_vfp
> +#endif
> +	cmp	r0, #HSR_EC_HVC
> +	bne	guest_trap		@ Not HVC instr.
> +
> +	/*
> +	 * Let's check if the HVC came from VMID 0 and allow simple
> +	 * switch to Hyp mode
> +	 */
> +	mrrc    p15, 6, r0, r2, c2
> +	lsr     r2, r2, #16
> +	and     r2, r2, #0xff
> +	cmp     r2, #0
> +	bne	guest_trap		@ Guest called HVC
> +
> +host_switch_to_hyp:
> +	pop	{r0, r1, r2}
> +
> +	push	{lr}
> +	mrs	lr, SPSR
> +	push	{lr}
> +
> +	mov	lr, r0
> +	mov	r0, r1
> +	mov	r1, r2
> +	mov	r2, r3
> +
> +THUMB(	orr	lr, #1)
> +	blx	lr			@ Call the HYP function
> +
> +	pop	{lr}
> +	msr	SPSR_csxf, lr
> +	pop	{lr}
> +	eret
> +
> +guest_trap:
> +	load_vcpu			@ Load VCPU pointer to r0
> +	str	r1, [vcpu, #VCPU_HSR]
> +
> +	@ Check if we need the fault information
> +	lsr	r1, r1, #HSR_EC_SHIFT
> +	cmp	r1, #HSR_EC_IABT
> +	mrceq	p15, 4, r2, c6, c0, 2	@ HIFAR
> +	beq	2f
> +	cmp	r1, #HSR_EC_DABT
> +	bne	1f
> +	mrc	p15, 4, r2, c6, c0, 0	@ HDFAR
> +
> +2:	str	r2, [vcpu, #VCPU_HxFAR]
> +
> +	/*
> +	 * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
> +	 *
> +	 * Abort on the stage 2 translation for a memory access from a
> +	 * Non-secure PL1 or PL0 mode:
> +	 *
> +	 * For any Access flag fault or Translation fault, and also for any
> +	 * Permission fault on the stage 2 translation of a memory access
> +	 * made as part of a translation table walk for a stage 1 translation,
> +	 * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
> +	 * is UNKNOWN.
> +	 */
> +
> +	/* Check for permission fault, and S1PTW */
> +	mrc	p15, 4, r1, c5, c2, 0	@ HSR
> +	and	r0, r1, #HSR_FSC_TYPE
> +	cmp	r0, #FSC_PERM
> +	tsteq	r1, #(1 << 7)		@ S1PTW
> +	mrcne	p15, 4, r2, c6, c0, 4	@ HPFAR
> +	bne	3f
> +
> +	/* Resolve IPA using the xFAR */
> +	mcr	p15, 0, r2, c7, c8, 0	@ ATS1CPR
> +	isb
> +	mrrc	p15, 0, r0, r1, c7	@ PAR
> +	tst	r0, #1
> +	bne	4f			@ Failed translation
> +	ubfx	r2, r0, #12, #20
> +	lsl	r2, r2, #4
> +	orr	r2, r2, r1, lsl #24
> +
> +3:	load_vcpu			@ Load VCPU pointer to r0
> +	str	r2, [r0, #VCPU_HPFAR]
> +
> +1:	mov	r1, #ARM_EXCEPTION_HVC
> +	b	__kvm_vcpu_return
> +
> +4:	pop	{r0, r1, r2}		@ Failed translation, return to guest
> +	eret
> +
> +/*
> + * If VFPv3 support is not available, then we will not switch the VFP
> + * registers; however cp10 and cp11 accesses will still trap and fallback
> + * to the regular coprocessor emulation code, which currently will
> + * inject an undefined exception to the guest.
> + */
> +#ifdef CONFIG_VFPv3
> +switch_to_guest_vfp:
> +	load_vcpu			@ Load VCPU pointer to r0
> +	push	{r3-r7}
> +
> +	@ NEON/VFP used.  Turn on VFP access.
> +	set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
> +
> +	@ Switch VFP/NEON hardware state to the guest's
> +	add	r7, r0, #VCPU_VFP_HOST
> +	ldr	r7, [r7]
> +	store_vfp_state r7
> +	add	r7, r0, #VCPU_VFP_GUEST
> +	restore_vfp_state r7
> +
> +	pop	{r3-r7}
> +	pop	{r0-r2}
> +	eret
> +#endif
> +
> +	.align
> +hyp_irq:
> +	push	{r0, r1, r2}
> +	mov	r1, #ARM_EXCEPTION_IRQ
> +	load_vcpu			@ Load VCPU pointer to r0
> +	b	__kvm_vcpu_return
> +
> +	.align
> +hyp_fiq:
> +	b	hyp_fiq
> +
> +	.ltorg
>  
>  __kvm_hyp_code_end:
>  	.globl	__kvm_hyp_code_end
> +
> +	.section ".rodata"
> +
> +und_die_str:
> +	.ascii	"unexpected undefined exception in Hyp mode at: %#08x"
> +pabt_die_str:
> +	.ascii	"unexpected prefetch abort in Hyp mode at: %#08x"
> +dabt_die_str:
> +	.ascii	"unexpected data abort in Hyp mode at: %#08x"
> +svc_die_str:
> +	.ascii	"unexpected HVC/SVC trap in Hyp mode at: %#08x"
> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> new file mode 100644
> index 0000000..f59a580
> --- /dev/null
> +++ b/arch/arm/kvm/interrupts_head.S
> @@ -0,0 +1,443 @@
> +#define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
> +#define VCPU_USR_SP		(VCPU_USR_REG(13))
> +#define VCPU_USR_LR		(VCPU_USR_REG(14))
> +#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))
> +
> +/*
> + * Many of these macros need to access the VCPU structure, which is always
> + * held in r0. These macros should never clobber r1, as it is used to hold the
> + * exception code on the return path (except of course the macro that switches
> + * all the registers before the final jump to the VM).
> + */
> +vcpu	.req	r0		@ vcpu pointer always in r0
> +
> +/* Clobbers {r2-r6} */
> +.macro store_vfp_state vfp_base
> +	@ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
> +	VFPFMRX	r2, FPEXC
> +	@ Make sure VFP is enabled so we can touch the registers.
> +	orr	r6, r2, #FPEXC_EN
> +	VFPFMXR	FPEXC, r6
> +
> +	VFPFMRX	r3, FPSCR
> +	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
> +	beq	1f
> +	@ If FPEXC_EX is 0, then FPINST/FPINST2 reads are upredictable, so
> +	@ we only need to save them if FPEXC_EX is set.
> +	VFPFMRX r4, FPINST
> +	tst	r2, #FPEXC_FP2V
> +	VFPFMRX r5, FPINST2, ne		@ vmrsne
> +	bic	r6, r2, #FPEXC_EX	@ FPEXC_EX disable
> +	VFPFMXR	FPEXC, r6
> +1:
> +	VFPFSTMIA \vfp_base, r6		@ Save VFP registers
> +	stm	\vfp_base, {r2-r5}	@ Save FPEXC, FPSCR, FPINST, FPINST2
> +.endm
> +
> +/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
> +.macro restore_vfp_state vfp_base
> +	VFPFLDMIA \vfp_base, r6		@ Load VFP registers
> +	ldm	\vfp_base, {r2-r5}	@ Load FPEXC, FPSCR, FPINST, FPINST2
> +
> +	VFPFMXR FPSCR, r3
> +	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
> +	beq	1f
> +	VFPFMXR FPINST, r4
> +	tst	r2, #FPEXC_FP2V
> +	VFPFMXR FPINST2, r5, ne
> +1:
> +	VFPFMXR FPEXC, r2	@ FPEXC	(last, in case !EN)
> +.endm
> +
> +/* These are simply for the macros to work - value don't have meaning */
> +.equ usr, 0
> +.equ svc, 1
> +.equ abt, 2
> +.equ und, 3
> +.equ irq, 4
> +.equ fiq, 5
> +
> +.macro push_host_regs_mode mode
> +	mrs	r2, SP_\mode
> +	mrs	r3, LR_\mode
> +	mrs	r4, SPSR_\mode
> +	push	{r2, r3, r4}
> +.endm
> +
> +/*
> + * Store all host persistent registers on the stack.
> + * Clobbers all registers, in all modes, except r0 and r1.
> + */
> +.macro save_host_regs
> +	/* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
> +	mrs	r2, ELR_hyp
> +	push	{r2}
> +
> +	/* usr regs */
> +	push	{r4-r12}	@ r0-r3 are always clobbered
> +	mrs	r2, SP_usr
> +	mov	r3, lr
> +	push	{r2, r3}
> +
> +	push_host_regs_mode svc
> +	push_host_regs_mode abt
> +	push_host_regs_mode und
> +	push_host_regs_mode irq
> +
> +	/* fiq regs */
> +	mrs	r2, r8_fiq
> +	mrs	r3, r9_fiq
> +	mrs	r4, r10_fiq
> +	mrs	r5, r11_fiq
> +	mrs	r6, r12_fiq
> +	mrs	r7, SP_fiq
> +	mrs	r8, LR_fiq
> +	mrs	r9, SPSR_fiq
> +	push	{r2-r9}
> +.endm
> +
> +.macro pop_host_regs_mode mode
> +	pop	{r2, r3, r4}
> +	msr	SP_\mode, r2
> +	msr	LR_\mode, r3
> +	msr	SPSR_\mode, r4
> +.endm
> +
> +/*
> + * Restore all host registers from the stack.
> + * Clobbers all registers, in all modes, except r0 and r1.
> + */
> +.macro restore_host_regs
> +	pop	{r2-r9}
> +	msr	r8_fiq, r2
> +	msr	r9_fiq, r3
> +	msr	r10_fiq, r4
> +	msr	r11_fiq, r5
> +	msr	r12_fiq, r6
> +	msr	SP_fiq, r7
> +	msr	LR_fiq, r8
> +	msr	SPSR_fiq, r9
> +
> +	pop_host_regs_mode irq
> +	pop_host_regs_mode und
> +	pop_host_regs_mode abt
> +	pop_host_regs_mode svc
> +
> +	pop	{r2, r3}
> +	msr	SP_usr, r2
> +	mov	lr, r3
> +	pop	{r4-r12}
> +
> +	pop	{r2}
> +	msr	ELR_hyp, r2
> +.endm
> +
> +/*
> + * Restore SP, LR and SPSR for a given mode. offset is the offset of
> + * this mode's registers from the VCPU base.
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers r1, r2, r3, r4.
> + */
> +.macro restore_guest_regs_mode mode, offset
> +	add	r1, vcpu, \offset
> +	ldm	r1, {r2, r3, r4}
> +	msr	SP_\mode, r2
> +	msr	LR_\mode, r3
> +	msr	SPSR_\mode, r4
> +.endm
> +
> +/*
> + * Restore all guest registers from the vcpu struct.
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers *all* registers.
> + */
> +.macro restore_guest_regs
> +	restore_guest_regs_mode svc, #VCPU_SVC_REGS
> +	restore_guest_regs_mode abt, #VCPU_ABT_REGS
> +	restore_guest_regs_mode und, #VCPU_UND_REGS
> +	restore_guest_regs_mode irq, #VCPU_IRQ_REGS
> +
> +	add	r1, vcpu, #VCPU_FIQ_REGS
> +	ldm	r1, {r2-r9}
> +	msr	r8_fiq, r2
> +	msr	r9_fiq, r3
> +	msr	r10_fiq, r4
> +	msr	r11_fiq, r5
> +	msr	r12_fiq, r6
> +	msr	SP_fiq, r7
> +	msr	LR_fiq, r8
> +	msr	SPSR_fiq, r9
> +
> +	@ Load return state
> +	ldr	r2, [vcpu, #VCPU_PC]
> +	ldr	r3, [vcpu, #VCPU_CPSR]
> +	msr	ELR_hyp, r2
> +	msr	SPSR_cxsf, r3
> +
> +	@ Load user registers
> +	ldr	r2, [vcpu, #VCPU_USR_SP]
> +	ldr	r3, [vcpu, #VCPU_USR_LR]
> +	msr	SP_usr, r2
> +	mov	lr, r3
> +	add	vcpu, vcpu, #(VCPU_USR_REGS)
> +	ldm	vcpu, {r0-r12}
> +.endm
> +
> +/*
> + * Save SP, LR and SPSR for a given mode. offset is the offset of
> + * this mode's registers from the VCPU base.
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers r2, r3, r4, r5.
> + */
> +.macro save_guest_regs_mode mode, offset
> +	add	r2, vcpu, \offset
> +	mrs	r3, SP_\mode
> +	mrs	r4, LR_\mode
> +	mrs	r5, SPSR_\mode
> +	stm	r2, {r3, r4, r5}
> +.endm
> +
> +/*
> + * Save all guest registers to the vcpu struct
> + * Expects guest's r0, r1, r2 on the stack.
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers r2, r3, r4, r5.
> + */
> +.macro save_guest_regs
> +	@ Store usr registers
> +	add	r2, vcpu, #VCPU_USR_REG(3)
> +	stm	r2, {r3-r12}
> +	add	r2, vcpu, #VCPU_USR_REG(0)
> +	pop	{r3, r4, r5}		@ r0, r1, r2
> +	stm	r2, {r3, r4, r5}
> +	mrs	r2, SP_usr
> +	mov	r3, lr
> +	str	r2, [vcpu, #VCPU_USR_SP]
> +	str	r3, [vcpu, #VCPU_USR_LR]
> +
> +	@ Store return state
> +	mrs	r2, ELR_hyp
> +	mrs	r3, spsr
> +	str	r2, [vcpu, #VCPU_PC]
> +	str	r3, [vcpu, #VCPU_CPSR]
> +
> +	@ Store other guest registers
> +	save_guest_regs_mode svc, #VCPU_SVC_REGS
> +	save_guest_regs_mode abt, #VCPU_ABT_REGS
> +	save_guest_regs_mode und, #VCPU_UND_REGS
> +	save_guest_regs_mode irq, #VCPU_IRQ_REGS
> +.endm
> +
> +/* Reads cp15 registers from hardware and stores them in memory
> + * @store_to_vcpu: If 0, registers are written in-order to the stack,
> + * 		   otherwise to the VCPU struct pointed to by vcpup
> + *
> + * Assumes vcpu pointer in vcpu reg
> + *
> + * Clobbers r2 - r12
> + */
> +.macro read_cp15_state store_to_vcpu
> +	mrc	p15, 0, r2, c1, c0, 0	@ SCTLR
> +	mrc	p15, 0, r3, c1, c0, 2	@ CPACR
> +	mrc	p15, 0, r4, c2, c0, 2	@ TTBCR
> +	mrc	p15, 0, r5, c3, c0, 0	@ DACR
> +	mrrc	p15, 0, r6, r7, c2	@ TTBR 0
> +	mrrc	p15, 1, r8, r9, c2	@ TTBR 1
> +	mrc	p15, 0, r10, c10, c2, 0	@ PRRR
> +	mrc	p15, 0, r11, c10, c2, 1	@ NMRR
> +	mrc	p15, 2, r12, c0, c0, 0	@ CSSELR
> +
> +	.if \store_to_vcpu == 0
> +	push	{r2-r12}		@ Push CP15 registers
> +	.else
> +	str	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> +	str	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> +	str	r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
> +	str	r5, [vcpu, #CP15_OFFSET(c3_DACR)]
> +	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
> +	strd	r6, r7, [vcpu]
> +	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
> +	strd	r8, r9, [vcpu]
> +	sub	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
> +	str	r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
> +	str	r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
> +	str	r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
> +	.endif
> +
> +	mrc	p15, 0, r2, c13, c0, 1	@ CID
> +	mrc	p15, 0, r3, c13, c0, 2	@ TID_URW
> +	mrc	p15, 0, r4, c13, c0, 3	@ TID_URO
> +	mrc	p15, 0, r5, c13, c0, 4	@ TID_PRIV
> +	mrc	p15, 0, r6, c5, c0, 0	@ DFSR
> +	mrc	p15, 0, r7, c5, c0, 1	@ IFSR
> +	mrc	p15, 0, r8, c5, c1, 0	@ ADFSR
> +	mrc	p15, 0, r9, c5, c1, 1	@ AIFSR
> +	mrc	p15, 0, r10, c6, c0, 0	@ DFAR
> +	mrc	p15, 0, r11, c6, c0, 2	@ IFAR
> +	mrc	p15, 0, r12, c12, c0, 0	@ VBAR
> +
> +	.if \store_to_vcpu == 0
> +	push	{r2-r12}		@ Push CP15 registers
> +	.else
> +	str	r2, [vcpu, #CP15_OFFSET(c13_CID)]
> +	str	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> +	str	r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
> +	str	r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
> +	str	r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
> +	str	r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
> +	str	r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
> +	str	r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
> +	str	r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
> +	str	r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
> +	str	r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
> +	.endif
> +.endm
> +
> +/*
> + * Reads cp15 registers from memory and writes them to hardware
> + * @read_from_vcpu: If 0, registers are read in-order from the stack,
> + *		    otherwise from the VCPU struct pointed to by vcpup
> + *
> + * Assumes vcpu pointer in vcpu reg
> + */
> +.macro write_cp15_state read_from_vcpu
> +	.if \read_from_vcpu == 0
> +	pop	{r2-r12}
> +	.else
> +	ldr	r2, [vcpu, #CP15_OFFSET(c13_CID)]
> +	ldr	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> +	ldr	r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
> +	ldr	r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
> +	ldr	r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
> +	ldr	r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
> +	ldr	r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
> +	ldr	r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
> +	ldr	r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
> +	ldr	r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
> +	ldr	r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
> +	.endif
> +
> +	mcr	p15, 0, r2, c13, c0, 1	@ CID
> +	mcr	p15, 0, r3, c13, c0, 2	@ TID_URW
> +	mcr	p15, 0, r4, c13, c0, 3	@ TID_URO
> +	mcr	p15, 0, r5, c13, c0, 4	@ TID_PRIV
> +	mcr	p15, 0, r6, c5, c0, 0	@ DFSR
> +	mcr	p15, 0, r7, c5, c0, 1	@ IFSR
> +	mcr	p15, 0, r8, c5, c1, 0	@ ADFSR
> +	mcr	p15, 0, r9, c5, c1, 1	@ AIFSR
> +	mcr	p15, 0, r10, c6, c0, 0	@ DFAR
> +	mcr	p15, 0, r11, c6, c0, 2	@ IFAR
> +	mcr	p15, 0, r12, c12, c0, 0	@ VBAR
> +
> +	.if \read_from_vcpu == 0
> +	pop	{r2-r12}
> +	.else
> +	ldr	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> +	ldr	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> +	ldr	r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
> +	ldr	r5, [vcpu, #CP15_OFFSET(c3_DACR)]
> +	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
> +	ldrd	r6, r7, [vcpu]
> +	add	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
> +	ldrd	r8, r9, [vcpu]
> +	sub	vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
> +	ldr	r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
> +	ldr	r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
> +	ldr	r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
> +	.endif
> +
> +	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
> +	mcr	p15, 0, r3, c1, c0, 2	@ CPACR
> +	mcr	p15, 0, r4, c2, c0, 2	@ TTBCR
> +	mcr	p15, 0, r5, c3, c0, 0	@ DACR
> +	mcrr	p15, 0, r6, r7, c2	@ TTBR 0
> +	mcrr	p15, 1, r8, r9, c2	@ TTBR 1
> +	mcr	p15, 0, r10, c10, c2, 0	@ PRRR
> +	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
> +	mcr	p15, 2, r12, c0, c0, 0	@ CSSELR
> +.endm
> +
> +/*
> + * Save the VGIC CPU state into memory
> + *
> + * Assumes vcpu pointer in vcpu reg
> + */
> +.macro save_vgic_state
> +.endm
> +
> +/*
> + * Restore the VGIC CPU state from memory
> + *
> + * Assumes vcpu pointer in vcpu reg
> + */
> +.macro restore_vgic_state
> +.endm
> +
> +.equ vmentry,	0
> +.equ vmexit,	1
> +
> +/* Configures the HSTR (Hyp System Trap Register) on entry/return
> + * (hardware reset value is 0) */
> +.macro set_hstr operation
> +	mrc	p15, 4, r2, c1, c1, 3
> +	ldr	r3, =HSTR_T(15)
> +	.if \operation == vmentry
> +	orr	r2, r2, r3		@ Trap CR{15}
> +	.else
> +	bic	r2, r2, r3		@ Don't trap any CRx accesses
> +	.endif
> +	mcr	p15, 4, r2, c1, c1, 3
> +.endm
> +
> +/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
> + * (hardware reset value is 0). Keep previous value in r2. */
> +.macro set_hcptr operation, mask
> +	mrc	p15, 4, r2, c1, c1, 2
> +	ldr	r3, =\mask
> +	.if \operation == vmentry
> +	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
> +	.else
> +	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
> +	.endif
> +	mcr	p15, 4, r3, c1, c1, 2
> +.endm
> +
> +/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
> + * (hardware reset value is 0) */
> +.macro set_hdcr operation
> +	mrc	p15, 4, r2, c1, c1, 1
> +	ldr	r3, =(HDCR_TPM|HDCR_TPMCR)
> +	.if \operation == vmentry
> +	orr	r2, r2, r3		@ Trap some perfmon accesses
> +	.else
> +	bic	r2, r2, r3		@ Don't trap any perfmon accesses
> +	.endif
> +	mcr	p15, 4, r2, c1, c1, 1
> +.endm
> +
> +/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
> +.macro configure_hyp_role operation
> +	mrc	p15, 4, r2, c1, c1, 0	@ HCR
> +	bic	r2, r2, #HCR_VIRT_EXCP_MASK
> +	ldr	r3, =HCR_GUEST_MASK
> +	.if \operation == vmentry
> +	orr	r2, r2, r3
> +	ldr	r3, [vcpu, #VCPU_IRQ_LINES]
irq_lines is accessed atomically from vcpu_interrupt_line(), but there
are no memory barriers or atomic operations here. That looks suspicious,
though I am not familiar with the ARM memory model. As far as I
understand, different translation regimes are used to access this
memory, so it is unclear what that does to access ordering.
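
For reference, here is a minimal C sketch (not from the patch) of the two
sides of the access in question; the hyp side is shown as a plain load,
mirroring the ldr above, and whether extra ordering is needed is exactly
the open question:

#include <linux/types.h>
#include <linux/bitops.h>

/* Stand-in for the relevant part of vcpu->arch. */
struct irq_lines_sketch {
	u32 irq_lines;			/* IRQ and FIQ levels */
};

/* Writer side (vcpu_interrupt_line(), may run on any host CPU). */
static void set_line(struct irq_lines_sketch *s, int bit, bool level)
{
	unsigned long *ptr = (unsigned long *)&s->irq_lines;

	if (level)
		test_and_set_bit(bit, ptr);	/* atomic RMW */
	else
		test_and_clear_bit(bit, ptr);
	/* kvm_vcpu_kick() follows when the value actually changed */
}

/* Reader side: what the world-switch code above effectively does. */
static u32 read_lines(struct irq_lines_sketch *s)
{
	return s->irq_lines;			/* plain load, no barrier */
}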


> +	orr	r2, r2, r3
> +	.else
> +	bic	r2, r2, r3
> +	.endif
> +	mcr	p15, 4, r2, c1, c1, 0
> +.endm
> +
> +.macro load_vcpu
> +	mrc	p15, 4, vcpu, c13, c0, 2	@ HTPIDR
> +.endm
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2013-01-08 18:39   ` Christoffer Dall
@ 2013-01-15  9:56     ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15  9:56 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti

On Tue, Jan 08, 2013 at 01:39:17PM -0500, Christoffer Dall wrote:
> From: Christoffer Dall <cdall@cs.columbia.edu>
> 
> All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE.  This
> works semantically well for the GIC as we in fact raise/lower a line on
> a machine component (the gic).  The IOCTL uses the following struct.
> 
> struct kvm_irq_level {
> 	union {
> 		__u32 irq;     /* GSI */
> 		__s32 status;  /* not used for KVM_IRQ_LEVEL */
> 	};
> 	__u32 level;           /* 0 or 1 */
> };
> 
> ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
CPU-level interrupts should use KVM_INTERRUPT instead.

> (GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
> specific cpus.  The irq field is interpreted like this:
> 
I haven't read about the GIC yet. Is a PPI an interrupt that a device can
send directly to a specific CPU? Can we model that with irq routing like
we do for MSI?

>   bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
>   field: | irq_type  | vcpu_index |   irq_number   |
> 
> The irq_type field has the following values:
> - irq_type[0]: out-of-kernel GIC: irq_number 0 is IRQ, irq_number 1 is FIQ
> - irq_type[1]: in-kernel GIC: SPI, irq_number between 32 and 1019 (incl.)
>                (the vcpu_index field is ignored)
> - irq_type[2]: in-kernel GIC: PPI, irq_number between 16 and 31 (incl.)
> 
> The irq_number thus corresponds to the irq ID as in the GICv2 specs.
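
As a concrete illustration (not part of the patch), userspace could pack
the irq field with the KVM_ARM_IRQ_* macros added below in
arch/arm/include/uapi/asm/kvm.h; the helper name here is made up:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch: raise or lower the line described by (type, vcpu_idx, irq_num). */
static int kvm_arm_set_irq(int vm_fd, unsigned int irq_type,
			   unsigned int vcpu_idx, unsigned int irq_num,
			   int level)
{
	struct kvm_irq_level irq_level = {
		.irq   = (irq_type << KVM_ARM_IRQ_TYPE_SHIFT) |
			 (vcpu_idx << KVM_ARM_IRQ_VCPU_SHIFT) |
			 (irq_num  << KVM_ARM_IRQ_NUM_SHIFT),
		.level = level,
	};

	return ioctl(vm_fd, KVM_IRQ_LINE, &irq_level);
}

/* e.g. raise, then later lower, the CPU IRQ line of VCPU 0:
 *	kvm_arm_set_irq(vm_fd, KVM_ARM_IRQ_TYPE_CPU, 0, KVM_ARM_IRQ_CPU_IRQ, 1);
 *	kvm_arm_set_irq(vm_fd, KVM_ARM_IRQ_TYPE_CPU, 0, KVM_ARM_IRQ_CPU_IRQ, 0);
 */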
> 
> This is documented in Documentation/kvm/api.txt.
> 
> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  Documentation/virtual/kvm/api.txt |   25 ++++++++++++--
>  arch/arm/include/asm/kvm_arm.h    |    1 +
>  arch/arm/include/uapi/asm/kvm.h   |   21 ++++++++++++
>  arch/arm/kvm/arm.c                |   65 +++++++++++++++++++++++++++++++++++++
>  arch/arm/kvm/trace.h              |   25 ++++++++++++++
>  include/uapi/linux/kvm.h          |    1 +
>  6 files changed, 134 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 4237c27..5050492 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -615,15 +615,32 @@ created.
>  4.25 KVM_IRQ_LINE
>  
>  Capability: KVM_CAP_IRQCHIP
> -Architectures: x86, ia64
> +Architectures: x86, ia64, arm
>  Type: vm ioctl
>  Parameters: struct kvm_irq_level
>  Returns: 0 on success, -1 on error
>  
>  Sets the level of a GSI input to the interrupt controller model in the kernel.
> -Requires that an interrupt controller model has been previously created with
> -KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
> -to be set to 1 and then back to 0.
> +On some architectures it is required that an interrupt controller model has
> +been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
> +interrupts require the level to be set to 1 and then back to 0.
> +
> +ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
> +(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
> +specific cpus.  The irq field is interpreted like this:
> +
> +  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
> +  field: | irq_type  | vcpu_index |     irq_id     |
> +
> +The irq_type field has the following values:
> +- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
> +- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
> +               (the vcpu_index field is ignored)
> +- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
> +
> +(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
> +
> +In both cases, level is used to raise/lower the line.
>  
>  struct kvm_irq_level {
>  	union {
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index 613afe2..fb22ee8 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -68,6 +68,7 @@
>  #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
>  			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
>  			HCR_SWIO | HCR_TIDCP)
> +#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>  
>  /* Hyp System Control Register (HSCTLR) bits */
>  #define HSCTLR_TE	(1 << 30)
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index c6298b1..4cf6d8f 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -23,6 +23,7 @@
>  #include <asm/ptrace.h>
>  
>  #define __KVM_HAVE_GUEST_DEBUG
> +#define __KVM_HAVE_IRQ_LINE
>  
>  #define KVM_REG_SIZE(id)						\
>  	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> @@ -103,4 +104,24 @@ struct kvm_arch_memory_slot {
>  #define KVM_REG_ARM_CORE		(0x0010 << KVM_REG_ARM_COPROC_SHIFT)
>  #define KVM_REG_ARM_CORE_REG(name)	(offsetof(struct kvm_regs, name) / 4)
>  
> +/* KVM_IRQ_LINE irq field index values */
> +#define KVM_ARM_IRQ_TYPE_SHIFT		24
> +#define KVM_ARM_IRQ_TYPE_MASK		0xff
> +#define KVM_ARM_IRQ_VCPU_SHIFT		16
> +#define KVM_ARM_IRQ_VCPU_MASK		0xff
> +#define KVM_ARM_IRQ_NUM_SHIFT		0
> +#define KVM_ARM_IRQ_NUM_MASK		0xffff
> +
> +/* irq_type field */
> +#define KVM_ARM_IRQ_TYPE_CPU		0
> +#define KVM_ARM_IRQ_TYPE_SPI		1
> +#define KVM_ARM_IRQ_TYPE_PPI		2
> +
> +/* out-of-kernel GIC cpu interrupt injection irq_number field */
> +#define KVM_ARM_IRQ_CPU_IRQ		0
> +#define KVM_ARM_IRQ_CPU_FIQ		1
> +
> +/* Highest supported SPI, from VGIC_NR_IRQS */
> +#define KVM_ARM_IRQ_GIC_MAX		127
> +
>  #endif /* __ARM_KVM_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index ab82039..9b4566e 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -24,6 +24,7 @@
>  #include <linux/fs.h>
>  #include <linux/mman.h>
>  #include <linux/sched.h>
> +#include <linux/kvm.h>
>  #include <trace/events/kvm.h>
>  
>  #define CREATE_TRACE_POINTS
> @@ -284,6 +285,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
> +	vcpu->cpu = cpu;
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -324,6 +326,69 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	return -EINVAL;
>  }
>  
> +static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
> +{
> +	int bit_index;
> +	bool set;
> +	unsigned long *ptr;
> +
> +	if (number == KVM_ARM_IRQ_CPU_IRQ)
> +		bit_index = __ffs(HCR_VI);
> +	else /* KVM_ARM_IRQ_CPU_FIQ */
> +		bit_index = __ffs(HCR_VF);
> +
> +	ptr = (unsigned long *)&vcpu->arch.irq_lines;
> +	if (level)
> +		set = test_and_set_bit(bit_index, ptr);
> +	else
> +		set = test_and_clear_bit(bit_index, ptr);
> +
> +	/*
> +	 * If we didn't change anything, no need to wake up or kick other CPUs
> +	 */
> +	if (set == level)
> +		return 0;
> +
> +	/*
> +	 * The vcpu irq_lines field was updated, wake up sleeping VCPUs and
> +	 * trigger a world-switch round on the running physical CPU to set the
> +	 * virtual IRQ/FIQ fields in the HCR appropriately.
> +	 */
> +	kvm_vcpu_kick(vcpu);
> +
> +	return 0;
> +}
> +
> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
> +{
> +	u32 irq = irq_level->irq;
> +	unsigned int irq_type, vcpu_idx, irq_num;
> +	int nrcpus = atomic_read(&kvm->online_vcpus);
> +	struct kvm_vcpu *vcpu = NULL;
> +	bool level = irq_level->level;
> +
> +	irq_type = (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK;
> +	vcpu_idx = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
> +	irq_num = (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK;
> +
> +	trace_kvm_irq_line(irq_type, vcpu_idx, irq_num, irq_level->level);
> +
> +	if (irq_type != KVM_ARM_IRQ_TYPE_CPU)
> +		return -EINVAL;
> +
> +	if (vcpu_idx >= nrcpus)
> +		return -EINVAL;
> +
> +	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
> +	if (!vcpu)
> +		return -EINVAL;
> +
> +	if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
> +		return -EINVAL;
> +
> +	return vcpu_interrupt_line(vcpu, irq_num, level);
> +}
> +
>  long kvm_arch_vcpu_ioctl(struct file *filp,
>  			 unsigned int ioctl, unsigned long arg)
>  {
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index 862b2cc..105d1f7 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -39,6 +39,31 @@ TRACE_EVENT(kvm_exit,
>  	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
>  );
>  
> +TRACE_EVENT(kvm_irq_line,
> +	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
> +	TP_ARGS(type, vcpu_idx, irq_num, level),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned int,	type		)
> +		__field(	int,		vcpu_idx	)
> +		__field(	int,		irq_num		)
> +		__field(	int,		level		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->type		= type;
> +		__entry->vcpu_idx	= vcpu_idx;
> +		__entry->irq_num	= irq_num;
> +		__entry->level		= level;
> +	),
> +
> +	TP_printk("Inject %s interrupt (%d), vcpu->idx: %d, num: %d, level: %d",
> +		  (__entry->type == KVM_ARM_IRQ_TYPE_CPU) ? "CPU" :
> +		  (__entry->type == KVM_ARM_IRQ_TYPE_PPI) ? "VGIC PPI" :
> +		  (__entry->type == KVM_ARM_IRQ_TYPE_SPI) ? "VGIC SPI" : "UNKNOWN",
> +		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
> +);
> +
>  TRACE_EVENT(kvm_unmap_hva,
>  	TP_PROTO(unsigned long hva),
>  	TP_ARGS(hva),
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 24978d5..dc63665 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -115,6 +115,7 @@ struct kvm_irq_level {
>  	 * ACPI gsi notion of irq.
>  	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
>  	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
> +	 * For ARM: See Documentation/virtual/kvm/api.txt
>  	 */
>  	union {
>  		__u32 irq;
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2013-01-15  9:56     ` Gleb Natapov
@ 2013-01-15 12:15       ` Peter Maydell
  -1 siblings, 0 replies; 160+ messages in thread
From: Peter Maydell @ 2013-01-15 12:15 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christoffer Dall, Marcelo Tosatti, linux-arm-kernel, kvm, kvmarm

On 15 January 2013 09:56, Gleb Natapov <gleb@redhat.com> wrote:
> On Tue, Jan 08, 2013 at 01:39:17PM -0500, Christoffer Dall wrote:
>> From: Christoffer Dall <cdall@cs.columbia.edu>
>>
>> All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE.  This
>> works semantically well for the GIC as we in fact raise/lower a line on
>> a machine component (the gic).  The IOCTL uses the following struct.
>>
>> struct kvm_irq_level {
>>       union {
>>               __u32 irq;     /* GSI */
>>               __s32 status;  /* not used for KVM_IRQ_LEVEL */
>>       };
>>       __u32 level;           /* 0 or 1 */
>> };
>>
>> ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
> CPU level interrupt should use KVM_INTERRUPT instead.

No, that would be wrong. KVM_INTERRUPT is for interrupts which must be
delivered synchronously to the CPU. KVM_IRQ_LINE is for interrupts which
can be fed to the kernel asynchronously. It happens that on x86 "must be
delivered synchronously" and "not going to in kernel irqchip" are the same, but
this isn't true for other archs. For ARM all our interrupts can be fed to
the kernel asynchronously, and so we use KVM_IRQ_LINE in all cases.
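
To make that concrete, a sketch (not from the series) of the practical
difference: KVM_IRQ_LINE is a VM ioctl, so a device-emulation or I/O
thread can call it on the VM file descriptor at any time, while
KVM_INTERRUPT is a vcpu ioctl that the vcpu thread itself issues in
between KVM_RUN calls, x86-style:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* I/O thread, asynchronous: targets the VM fd, any time, any thread. */
static int raise_line_async(int vm_fd, __u32 packed_irq, int level)
{
	struct kvm_irq_level l = { .irq = packed_irq, .level = level };

	return ioctl(vm_fd, KVM_IRQ_LINE, &l);
}

/* vcpu thread, synchronous style (x86 without in-kernel irqchip):
 * targets the vcpu fd and is interleaved with KVM_RUN by that thread. */
static int inject_sync(int vcpu_fd, __u32 vector)
{
	struct kvm_interrupt intr = { .irq = vector };

	return ioctl(vcpu_fd, KVM_INTERRUPT, &intr);
}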

There was a big discussion thread about this on kvm and qemu-devel last
July (and we cleaned up some of the QEMU code to not smoosh together
all these different concepts under "do I have an irqchip or not?").

>> (GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
>> specific cpus.  The irq field is interpreted like this:
>>
> Haven't read about GIC yet. Is PPI an interrupt that device can send
> directly to a specific CPU? Can we model that with irq routing like we do
> for MSI?

There is no routing involved -- you are raising a specific signal
line (which happens to result in prodding a particular CPU), that's all.

-- PMM

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2013-01-15 12:15       ` Peter Maydell
@ 2013-01-15 12:52         ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15 12:52 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Christoffer Dall, Marcelo Tosatti, linux-arm-kernel, kvm, kvmarm

On Tue, Jan 15, 2013 at 12:15:01PM +0000, Peter Maydell wrote:
> On 15 January 2013 09:56, Gleb Natapov <gleb@redhat.com> wrote:
> > On Tue, Jan 08, 2013 at 01:39:17PM -0500, Christoffer Dall wrote:
> >> From: Christoffer Dall <cdall@cs.columbia.edu>
> >>
> >> All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE.  This
> >> works semantically well for the GIC as we in fact raise/lower a line on
> >> a machine component (the gic).  The IOCTL uses the following struct.
> >>
> >> struct kvm_irq_level {
> >>       union {
> >>               __u32 irq;     /* GSI */
> >>               __s32 status;  /* not used for KVM_IRQ_LEVEL */
> >>       };
> >>       __u32 level;           /* 0 or 1 */
> >> };
> >>
> >> ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
> > CPU level interrupt should use KVM_INTERRUPT instead.
> 
> No, that would be wrong. KVM_INTERRUPT is for interrupts which must be
> delivered synchronously to the CPU. KVM_IRQ_LINE is for interrupts which
> can be fed to the kernel asynchronously. It happens that on x86 "must be
> delivered synchronously" and "not going to in kernel irqchip" are the same, but
> this isn't true for other archs. For ARM all our interrupts can be fed
> to the kernel
> asynchronously, and so we use KVM_IRQ_LINE in all cases.
> 
I do not quite understand what you mean by synchronously and
asynchronously. The difference between KVM_INTERRUPT and KVM_IRQ_LINE
is that the former is used when the destination cpu is known to
userspace, while the latter is used when kernel code is involved in
figuring out the destination. The injections themselves are currently
synchronous for both of them on x86 and ARM, i.e. the vcpu is kicked
out of guest mode when an interrupt needs to be injected into a guest,
and the vcpu state is changed to inject the interrupt during the next
guest entry. In the near future x86 will be able to inject an interrupt
without kicking the vcpu out of guest mode; does ARM plan to do the
same? For GIC interrupts, for IRQ/FIQ, or for both?

> There was a big discussion thread about this on kvm and qemu-devel last
> July (and we cleaned up some of the QEMU code to not smoosh together
> all these different concepts under "do I have an irqchip or not?").
Do you have a pointer?

> 
> >> (GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
> >> specific cpus.  The irq field is interpreted like this:
> >>
> > Haven't read about GIC yet. Is PPI an interrupt that device can send
> > directly to a specific CPU? Can we model that with irq routing like we do
> > for MSI?
> 
> There is no routing involved -- you are raising a specific signal
> line (which happens to result in prodding a particular CPU), that's all.
> 
We call it "irq routing", but it is not really a router. It just a
configuration to let KVM know how specific lines are wired. We abuse it
for MSI injection. So instead of encoding destination into kvm_irq_level
you configure "irq routing" entry with this information and get back a
cookie. You provide the cookie in kvm_irq_level->irq to KVM_IRQ_LEVEL
ioctl. This way you are not limited to 8 bit of cpuid for instance. This
is not efficient if "irq routing" is very dynamic though.
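
A rough userspace sketch of that cookie mechanism, using the generic
routing uapi (this shows x86-style MSI routing purely to illustrate the
cookie idea; it is not ARM code from this series):

#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Wire GSI 40 to an MSI destination, then inject by the GSI "cookie". */
static int msi_route_and_inject(int vm_fd)
{
	struct kvm_irq_routing *r;
	struct kvm_irq_level l = { .irq = 40, .level = 1 };
	int ret;

	r = calloc(1, sizeof(*r) + sizeof(struct kvm_irq_routing_entry));
	if (!r)
		return -1;
	r->nr = 1;
	r->entries[0].gsi = 40;			/* the cookie userspace picked */
	r->entries[0].type = KVM_IRQ_ROUTING_MSI;
	r->entries[0].u.msi.address_lo = 0xfee00000;	/* destination encoded here */
	r->entries[0].u.msi.data = 0x4041;

	ret = ioctl(vm_fd, KVM_SET_GSI_ROUTING, r);
	free(r);
	if (ret < 0)
		return ret;

	/* Edge semantics: set the level to 1 and then back to 0. */
	ret = ioctl(vm_fd, KVM_IRQ_LINE, &l);
	if (ret < 0)
		return ret;
	l.level = 0;
	return ioctl(vm_fd, KVM_IRQ_LINE, &l);
}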
 
--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-08 18:40   ` Christoffer Dall
@ 2013-01-15 13:18     ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15 13:18 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
> When the guest accesses I/O memory this will create data abort
> exceptions and they are handled by decoding the HSR information
> (physical address, read/write, length, register) and forwarding reads
> and writes to QEMU which performs the device emulation.
> 
> Certain classes of load/store operations do not support the syndrome
> information provided in the HSR and we therefore must be able to fetch
> the offending instruction from guest memory and decode it manually.
> 
> We only support instruction decoding for valid reasonable MMIO operations
> where trapping them do not provide sufficient information in the HSR (no
> 16-bit Thumb instructions provide register writeback that we care about).
> 
> The following instruction types are NOT supported for MMIO operations
> despite the HSR not containing decode info:
>  - any Load/Store multiple
>  - any load/store exclusive
>  - any load/store dual
>  - anything with the PC as the dest register
> 
> This requires changing the general flow somewhat since new calls to run
> the VCPU must check if there's a pending MMIO load and perform the write
> after userspace has made the data available.
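
To illustrate the HSR-decode case described above, a sketch (not code
from the patch; only the HSR_ISV, HSR_WNR, HSR_SSE and HSR_SRT_* macros
come from the series' kvm_arm.h, and the access-size field SAS being ISS
bits [23:22] is an assumption of this sketch):

#include <asm/kvm_arm.h>	/* HSR_* macros used/added in the hunks below */
#include <asm/kvm_mmio.h>	/* struct kvm_exit_mmio, added by this patch */

/* Fill an MMIO record from the syndrome when the ISV bit says it is valid. */
static bool sketch_decode_hsr(u32 hsr, struct kvm_exit_mmio *mmio,
			      unsigned long *rt, bool *sign_extend)
{
	if (!(hsr & HSR_ISV))
		return false;	/* no syndrome info: fetch and decode the insn */

	mmio->len = 1 << ((hsr >> 22) & 0x3);	/* SAS encodes the access size */
	mmio->is_write = !!(hsr & HSR_WNR);
	*sign_extend = !!(hsr & HSR_SSE);
	*rt = (hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
	return true;
}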
> 
> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
> (1) Guest complicated mmio instruction traps.
> (2) The hardware doesn't tell us enough, so we need to read the actual
>     instruction which was being executed.
> (3) KVM maps the instruction virtual address to a physical address.
> (4) The guest (SMP) swaps out that page, and fills it with something else.
> (5) We read the physical address, but now that's the wrong thing.
How can this happen?! The guest cannot reuse a physical page before it
flushes it from all vcpus' TLB caches. For that it needs to send a
synchronous IPI to all vcpus, and the IPI will not be processed by a
vcpu while it does emulation.

> 
> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/kvm_arm.h     |    3 
>  arch/arm/include/asm/kvm_asm.h     |    2 
>  arch/arm/include/asm/kvm_decode.h  |   47 ++++
>  arch/arm/include/asm/kvm_emulate.h |    8 +
>  arch/arm/include/asm/kvm_host.h    |    7 +
>  arch/arm/include/asm/kvm_mmio.h    |   51 ++++
>  arch/arm/kvm/Makefile              |    2 
>  arch/arm/kvm/arm.c                 |   14 +
>  arch/arm/kvm/decode.c              |  462 ++++++++++++++++++++++++++++++++++++
>  arch/arm/kvm/emulate.c             |  169 +++++++++++++
>  arch/arm/kvm/interrupts.S          |   38 +++
>  arch/arm/kvm/mmio.c                |  154 ++++++++++++
>  arch/arm/kvm/mmu.c                 |    7 -
>  arch/arm/kvm/trace.h               |   21 ++
>  14 files changed, 981 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm/include/asm/kvm_decode.h
>  create mode 100644 arch/arm/include/asm/kvm_mmio.h
>  create mode 100644 arch/arm/kvm/decode.c
>  create mode 100644 arch/arm/kvm/mmio.c
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index 3ff6f22..151c4ce 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -173,8 +173,11 @@
>  #define HSR_ISS		(HSR_IL - 1)
>  #define HSR_ISV_SHIFT	(24)
>  #define HSR_ISV		(1U << HSR_ISV_SHIFT)
> +#define HSR_SRT_SHIFT	(16)
> +#define HSR_SRT_MASK	(0xf << HSR_SRT_SHIFT)
>  #define HSR_FSC		(0x3f)
>  #define HSR_FSC_TYPE	(0x3c)
> +#define HSR_SSE		(1 << 21)
>  #define HSR_WNR		(1 << 6)
>  #define HSR_CV_SHIFT	(24)
>  #define HSR_CV		(1U << HSR_CV_SHIFT)
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 5e06e81..58d787b 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -77,6 +77,8 @@ extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +
> +extern u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv);
>  #endif
>  
>  #endif /* __ARM_KVM_ASM_H__ */
> diff --git a/arch/arm/include/asm/kvm_decode.h b/arch/arm/include/asm/kvm_decode.h
> new file mode 100644
> index 0000000..3c37cb9
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_decode.h
> @@ -0,0 +1,47 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_DECODE_H__
> +#define __ARM_KVM_DECODE_H__
> +
> +#include <linux/types.h>
> +
> +struct kvm_vcpu;
> +struct kvm_exit_mmio;
> +
> +struct kvm_decode {
> +	struct pt_regs *regs;
> +	unsigned long fault_addr;
> +	unsigned long rt;
> +	bool sign_extend;
> +};
> +
> +int kvm_decode_load_store(struct kvm_decode *decode, unsigned long instr,
> +			  struct kvm_exit_mmio *mmio);
> +
> +static inline unsigned long *kvm_decode_reg(struct kvm_decode *decode, int reg)
> +{
> +	return &decode->regs->uregs[reg];
> +}
> +
> +static inline unsigned long *kvm_decode_cpsr(struct kvm_decode *decode)
> +{
> +	return &decode->regs->ARM_cpsr;
> +}
> +
> +#endif /* __ARM_KVM_DECODE_H__ */
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 01a755b..375795b 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -21,11 +21,14 @@
>  
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_asm.h>
> +#include <asm/kvm_mmio.h>
>  
>  u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
>  u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
>  
>  int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +			struct kvm_exit_mmio *mmio);
>  void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
>  void kvm_inject_undefined(struct kvm_vcpu *vcpu);
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> @@ -53,4 +56,9 @@ static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
> +	return cpsr_mode > USR_MODE;
>  }
>  
> +static inline bool kvm_vcpu_reg_is_pc(struct kvm_vcpu *vcpu, int reg)
> +{
> +	return reg == 15;
> +}
> +
>  #endif /* __ARM_KVM_EMULATE_H__ */
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 6cc8933..ca40795 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -22,6 +22,7 @@
>  #include <asm/kvm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/fpstate.h>
> +#include <asm/kvm_decode.h>
>  
>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>  #define KVM_USER_MEM_SLOTS 32
> @@ -99,6 +100,12 @@ struct kvm_vcpu_arch {
>  	int last_pcpu;
>  	cpumask_t require_dcache_flush;
>  
> +	/* Don't run the guest: see copy_current_insn() */
> +	bool pause;
> +
> +	/* IO related fields */
> +	struct kvm_decode mmio_decode;
> +
>  	/* Interrupt related fields */
>  	u32 irq_lines;		/* IRQ and FIQ levels */
>  
> diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
> new file mode 100644
> index 0000000..31ab9f5
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_mmio.h
> @@ -0,0 +1,51 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_MMIO_H__
> +#define __ARM_KVM_MMIO_H__
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_arm.h>
> +
> +/*
> + * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
> + * which is an anonymous type. Use our own type instead.
> + */
> +struct kvm_exit_mmio {
> +	phys_addr_t	phys_addr;
> +	u8		data[8];
> +	u32		len;
> +	bool		is_write;
> +};
> +
> +static inline void kvm_prepare_mmio(struct kvm_run *run,
> +				    struct kvm_exit_mmio *mmio)
> +{
> +	run->mmio.phys_addr	= mmio->phys_addr;
> +	run->mmio.len		= mmio->len;
> +	run->mmio.is_write	= mmio->is_write;
> +	memcpy(run->mmio.data, mmio->data, mmio->len);
> +	run->exit_reason	= KVM_EXIT_MMIO;
> +}
> +
> +int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot);
> +
> +#endif	/* __ARM_KVM_MMIO_H__ */
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index 88edce6..44a5f4b 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -18,4 +18,4 @@ kvm-arm-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
>  
>  obj-y += kvm-arm.o init.o interrupts.o
>  obj-y += arm.o guest.o mmu.o emulate.o reset.o
> -obj-y += coproc.o coproc_a15.o
> +obj-y += coproc.o coproc_a15.o mmio.o decode.o
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 0b4ffcf..f42d828 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -614,6 +614,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	if (unlikely(vcpu->arch.target < 0))
>  		return -ENOEXEC;
>  
> +	if (run->exit_reason == KVM_EXIT_MMIO) {
> +		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	if (vcpu->sigset_active)
>  		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>  
> @@ -649,7 +655,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_guest_enter();
>  		vcpu->mode = IN_GUEST_MODE;
>  
> -		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +		smp_mb(); /* set mode before reading vcpu->arch.pause */
> +		if (unlikely(vcpu->arch.pause)) {
> +			/* This means ignore, try again. */
> +			ret = ARM_EXCEPTION_IRQ;
> +		} else {
> +			ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +		}
>  
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		vcpu->arch.last_pcpu = smp_processor_id();
> diff --git a/arch/arm/kvm/decode.c b/arch/arm/kvm/decode.c
> new file mode 100644
> index 0000000..469cf14
> --- /dev/null
> +++ b/arch/arm/kvm/decode.c
> @@ -0,0 +1,462 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_mmio.h>
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_decode.h>
> +#include <trace/events/kvm.h>
> +
> +#include "trace.h"
> +
> +struct arm_instr {
> +	/* Instruction decoding */
> +	u32 opc;
> +	u32 opc_mask;
> +
> +	/* Decoding for the register write back */
> +	bool register_form;
> +	u32 imm;
> +	u8 Rm;
> +	u8 type;
> +	u8 shift_n;
> +
> +	/* Common decoding */
> +	u8 len;
> +	bool sign_extend;
> +	bool w;
> +
> +	bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +		       unsigned long instr, struct arm_instr *ai);
> +};
> +
> +enum SRType {
> +	SRType_LSL,
> +	SRType_LSR,
> +	SRType_ASR,
> +	SRType_ROR,
> +	SRType_RRX
> +};
> +
> +/* Modelled after DecodeImmShift() in the ARM ARM */
> +static enum SRType decode_imm_shift(u8 type, u8 imm5, u8 *amount)
> +{
> +	switch (type) {
> +	case 0x0:
> +		*amount = imm5;
> +		return SRType_LSL;
> +	case 0x1:
> +		*amount = (imm5 == 0) ? 32 : imm5;
> +		return SRType_LSR;
> +	case 0x2:
> +		*amount = (imm5 == 0) ? 32 : imm5;
> +		return SRType_ASR;
> +	case 0x3:
> +		if (imm5 == 0) {
> +			*amount = 1;
> +			return SRType_RRX;
> +		} else {
> +			*amount = imm5;
> +			return SRType_ROR;
> +		}
> +	}
> +
> +	return SRType_LSL;
> +}
> +
> +/* Modelled after Shift() in the ARM ARM */
> +static u32 shift(u32 value, u8 N, enum SRType type, u8 amount, bool carry_in)
> +{
> +	u32 mask = (1 << N) - 1;
> +	s32 svalue = (s32)value;
> +
> +	BUG_ON(N > 32);
> +	BUG_ON(type == SRType_RRX && amount != 1);
> +	BUG_ON(amount > N);
> +
> +	if (amount == 0)
> +		return value;
> +
> +	switch (type) {
> +	case SRType_LSL:
> +		value <<= amount;
> +		break;
> +	case SRType_LSR:
> +		 value >>= amount;
> +		break;
> +	case SRType_ASR:
> +		if (value & (1 << (N - 1)))
> +			svalue |= ((-1UL) << N);
> +		value = svalue >> amount;
> +		break;
> +	case SRType_ROR:
> +		value = (value >> amount) | (value << (N - amount));
> +		break;
> +	case SRType_RRX: {
> +		u32 C = (carry_in) ? 1 : 0;
> +		value = (value >> 1) | (C << (N - 1));
> +		break;
> +	}
> +	}
> +
> +	return value & mask;
> +}
> +
> +static bool decode_arm_wb(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +			  unsigned long instr, const struct arm_instr *ai)
> +{
> +	u8 Rt = (instr >> 12) & 0xf;
> +	u8 Rn = (instr >> 16) & 0xf;
> +	u8 W = (instr >> 21) & 1;
> +	u8 U = (instr >> 23) & 1;
> +	u8 P = (instr >> 24) & 1;
> +	u32 base_addr = *kvm_decode_reg(decode, Rn);
> +	u32 offset_addr, offset;
> +
> +	/*
> +	 * Technically this is allowed in certain circumstances,
> +	 * but we don't support it.
> +	 */
> +	if (Rt == 15 || Rn == 15)
> +		return false;
> +
> +	if (P && !W) {
> +		kvm_err("Decoding operation with valid ISV?\n");
> +		return false;
> +	}
> +
> +	decode->rt = Rt;
> +
> +	if (ai->register_form) {
> +		/* Register operation */
> +		enum SRType s_type;
> +		u8 shift_n = 0;
> +		bool c_bit = *kvm_decode_cpsr(decode) & PSR_C_BIT;
> +		u32 s_reg = *kvm_decode_reg(decode, ai->Rm);
> +
> +		s_type = decode_imm_shift(ai->type, ai->shift_n, &shift_n);
> +		offset = shift(s_reg, 5, s_type, shift_n, c_bit);
> +	} else {
> +		/* Immediate operation */
> +		offset = ai->imm;
> +	}
> +
> +	/* Handle Writeback */
> +	if (U)
> +		offset_addr = base_addr + offset;
> +	else
> +		offset_addr = base_addr - offset;
> +	*kvm_decode_reg(decode, Rn) = offset_addr;
> +	return true;
> +}
> +
> +static bool decode_arm_ls(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +			  unsigned long instr, struct arm_instr *ai)
> +{
> +	u8 A = (instr >> 25) & 1;
> +
> +	mmio->is_write = ai->w;
> +	mmio->len = ai->len;
> +	decode->sign_extend = false;
> +
> +	ai->register_form = A;
> +	ai->imm = instr & 0xfff;
> +	ai->Rm = instr & 0xf;
> +	ai->type = (instr >> 5) & 0x3;
> +	ai->shift_n = (instr >> 7) & 0x1f;
> +
> +	return decode_arm_wb(decode, mmio, instr, ai);
> +}
> +
> +static bool decode_arm_extra(struct kvm_decode *decode,
> +			     struct kvm_exit_mmio *mmio,
> +			     unsigned long instr, struct arm_instr *ai)
> +{
> +	mmio->is_write = ai->w;
> +	mmio->len = ai->len;
> +	decode->sign_extend = ai->sign_extend;
> +
> +	ai->register_form = !((instr >> 22) & 1);
> +	ai->imm = ((instr >> 4) & 0xf0) | (instr & 0xf);
> +	ai->Rm = instr & 0xf;
> +	ai->type = 0; /* SRType_LSL */
> +	ai->shift_n = 0;
> +
> +	return decode_arm_wb(decode, mmio, instr, ai);
> +}
> +
> +/*
> + * The encodings in this table assume that a fault was generated where the
> + * ISV field in the HSR was clear, and the decoding information was invalid,
> + * which means that a register write-back occurred, the PC was used as the
> + * destination or a load/store multiple operation was used. Since the latter
> + * two cases are crazy for MMIO on the guest side, we simply inject a fault
> + * when this happens and support the common case.
> + *
> + * We treat unprivileged loads and stores of words and bytes like all other
> + * loads and stores as their encodings mandate the W bit set and the P bit
> + * clear.
> + */
> +static const struct arm_instr arm_instr[] = {
> +	/**************** Load/Store Word and Byte **********************/
> +	/* Store word with writeback */
> +	{ .opc = 0x04000000, .opc_mask = 0x0c500000, .len = 4, .w = true,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +	/* Store byte with writeback */
> +	{ .opc = 0x04400000, .opc_mask = 0x0c500000, .len = 1, .w = true,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +	/* Load word with writeback */
> +	{ .opc = 0x04100000, .opc_mask = 0x0c500000, .len = 4, .w = false,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +	/* Load byte with writeback */
> +	{ .opc = 0x04500000, .opc_mask = 0x0c500000, .len = 1, .w = false,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +
> +	/*************** Extra load/store instructions ******************/
> +
> +	/* Store halfword with writeback */
> +	{ .opc = 0x000000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = true,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load halfword with writeback */
> +	{ .opc = 0x001000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +
> +	/* Load dual with writeback */
> +	{ .opc = 0x000000d0, .opc_mask = 0x0c1000f0, .len = 8, .w = false,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load signed byte with writeback */
> +	{ .opc = 0x001000d0, .opc_mask = 0x0c1000f0, .len = 1, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },
> +
> +	/* Store dual with writeback */
> +	{ .opc = 0x000000f0, .opc_mask = 0x0c1000f0, .len = 8, .w = true,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load signed halfword with writeback */
> +	{ .opc = 0x001000f0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },
> +
> +	/* Store halfword unprivileged */
> +	{ .opc = 0x002000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = true,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load halfword unprivileged */
> +	{ .opc = 0x003000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load signed byte unprivileged */
> +	{ .opc = 0x003000d0, .opc_mask = 0x0f3000f0, .len = 1, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },
> +	/* Load signed halfword unprivileged */
> +	{ .opc = 0x003000f0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },
> +};
> +
> +static bool kvm_decode_arm_ls(struct kvm_decode *decode, unsigned long instr,
> +			      struct kvm_exit_mmio *mmio)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(arm_instr); i++) {
> +		const struct arm_instr *ai = &arm_instr[i];
> +		if ((instr & ai->opc_mask) == ai->opc) {
> +			struct arm_instr ai_copy = *ai;
> +			return ai->decode(decode, mmio, instr, &ai_copy);
> +		}
> +	}
> +	return false;
> +}
> +
> +struct thumb_instr {
> +	bool is32;
> +
> +	u8 opcode;
> +	u8 opcode_mask;
> +	u8 op2;
> +	u8 op2_mask;
> +
> +	bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +		       unsigned long instr, const struct thumb_instr *ti);
> +};
> +
> +static bool decode_thumb_wb(struct kvm_decode *decode,
> +			    struct kvm_exit_mmio *mmio,
> +			    unsigned long instr)
> +{
> +	bool P = (instr >> 10) & 1;
> +	bool U = (instr >> 9) & 1;
> +	u8 imm8 = instr & 0xff;
> +	u32 offset_addr = decode->fault_addr;
> +	u8 Rn = (instr >> 16) & 0xf;
> +
> +	decode->rt = (instr >> 12) & 0xf;
> +
> +	if (Rn == 15)
> +		return false;
> +
> +	/* Handle Writeback */
> +	if (!P && U)
> +		*kvm_decode_reg(decode, Rn) = offset_addr + imm8;
> +	else if (!P && !U)
> +		*kvm_decode_reg(decode, Rn) = offset_addr - imm8;
> +	return true;
> +}
> +
> +static bool decode_thumb_str(struct kvm_decode *decode,
> +			     struct kvm_exit_mmio *mmio,
> +			     unsigned long instr, const struct thumb_instr *ti)
> +{
> +	u8 op1 = (instr >> (16 + 5)) & 0x7;
> +	u8 op2 = (instr >> 6) & 0x3f;
> +
> +	mmio->is_write = true;
> +	decode->sign_extend = false;
> +
> +	switch (op1) {
> +	case 0x0: mmio->len = 1; break;
> +	case 0x1: mmio->len = 2; break;
> +	case 0x2: mmio->len = 4; break;
> +	default:
> +		  return false; /* Only register write-back versions! */
> +	}
> +
> +	if ((op2 & 0x24) == 0x24) {
> +		/* STRB (immediate, thumb, W=1) */
> +		return decode_thumb_wb(decode, mmio, instr);
> +	}
> +
> +	return false;
> +}
> +
> +static bool decode_thumb_ldr(struct kvm_decode *decode,
> +			     struct kvm_exit_mmio *mmio,
> +			     unsigned long instr, const struct thumb_instr *ti)
> +{
> +	u8 op1 = (instr >> (16 + 7)) & 0x3;
> +	u8 op2 = (instr >> 6) & 0x3f;
> +
> +	mmio->is_write = false;
> +
> +	switch (ti->op2 & 0x7) {
> +	case 0x1: mmio->len = 1; break;
> +	case 0x3: mmio->len = 2; break;
> +	case 0x5: mmio->len = 4; break;
> +	}
> +
> +	if (op1 == 0x0)
> +		decode->sign_extend = false;
> +	else if (op1 == 0x2 && (ti->op2 & 0x7) != 0x5)
> +		decode->sign_extend = true;
> +	else
> +		return false; /* Only register write-back versions! */
> +
> +	if ((op2 & 0x24) == 0x24) {
> +		/* LDR{S}X (immediate, thumb, W=1) */
> +		return decode_thumb_wb(decode, mmio, instr);
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * We only support instruction decoding for valid reasonable MMIO operations
> + * where trapping them does not provide sufficient information in the HSR (no
> + * 16-bit Thumb instructions provide register writeback that we care about).
> + *
> + * The following instruction types are NOT supported for MMIO operations
> + * despite the HSR not containing decode info:
> + *  - any Load/Store multiple
> + *  - any load/store exclusive
> + *  - any load/store dual
> + *  - anything with the PC as the dest register
> + */
> +static const struct thumb_instr thumb_instr[] = {
> +	/**************** 32-bit Thumb instructions **********************/
> +	/* Store single data item:	Op1 == 11, Op2 == 000xxx0 */
> +	{ .is32 = true,  .opcode = 3, .op2 = 0x00, .op2_mask = 0x71,
> +						decode_thumb_str	},
> +
> +	/* Load byte:			Op1 == 11, Op2 == 00xx001 */
> +	{ .is32 = true,  .opcode = 3, .op2 = 0x01, .op2_mask = 0x67,
> +						decode_thumb_ldr	},
> +
> +	/* Load halfword:		Op1 == 11, Op2 == 00xx011 */
> +	{ .is32 = true,  .opcode = 3, .op2 = 0x03, .op2_mask = 0x67,
> +						decode_thumb_ldr	},
> +
> +	/* Load word:			Op1 == 11, Op2 == 00xx101 */
> +	{ .is32 = true,  .opcode = 3, .op2 = 0x05, .op2_mask = 0x67,
> +						decode_thumb_ldr	},
> +};
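
A similar hedged sketch for the Thumb side (again not part of the patch;
the halfword packing, first halfword in the upper 16 bits, mirrors what
kvm_decode_thumb_ls() below expects, and everything else is invented):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t instr = 0xf8523b04;  /* ldr r3, [r2], #4 (post-indexed) */
        unsigned opcode = (instr >> (16 + 11)) & 0x3;  /* 3 */
        unsigned op2    = (instr >> (16 + 4)) & 0x7f;  /* 0x05 */

        printf("matches \"Load word\": %d\n",
               opcode == 3 && (op2 & 0x67) == 0x05);
        return 0;
    }
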
> +
> +
> +
> +static bool kvm_decode_thumb_ls(struct kvm_decode *decode, unsigned long instr,
> +				struct kvm_exit_mmio *mmio)
> +{
> +	bool is32 = is_wide_instruction(instr);
> +	bool is16 = !is32;
> +	struct thumb_instr tinstr; /* re-use to pass on already decoded info */
> +	int i;
> +
> +	if (is16) {
> +		tinstr.opcode = (instr >> 10) & 0x3f;
> +	} else {
> +		tinstr.opcode = (instr >> (16 + 11)) & 0x3;
> +		tinstr.op2 = (instr >> (16 + 4)) & 0x7f;
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(thumb_instr); i++) {
> +		const struct thumb_instr *ti = &thumb_instr[i];
> +		if (ti->is32 != is32)
> +			continue;
> +
> +		if (is16) {
> +			if ((tinstr.opcode & ti->opcode_mask) != ti->opcode)
> +				continue;
> +		} else {
> +			if (ti->opcode != tinstr.opcode)
> +				continue;
> +			if ((ti->op2_mask & tinstr.op2) != ti->op2)
> +				continue;
> +		}
> +
> +		return ti->decode(decode, mmio, instr, &tinstr);
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * kvm_decode_load_store - decodes load/store instructions
> + * @decode: reads regs and fault_addr, writes rt and sign_extend
> + * @instr:  instruction to decode
> + * @mmio:   fills in len and is_write
> + *
> + * Decode load/store instructions with HSR ISV clear. The code assumes that
> + * this was indeed a KVM fault and therefore assumes registers write back for
> + * single load/store operations and does not support using the PC as the
> + * destination register.
> + */
> +int kvm_decode_load_store(struct kvm_decode *decode, unsigned long instr,
> +			  struct kvm_exit_mmio *mmio)
> +{
> +	bool is_thumb;
> +
> +	is_thumb = !!(*kvm_decode_cpsr(decode) & PSR_T_BIT);
> +	if (!is_thumb)
> +		return kvm_decode_arm_ls(decode, instr, mmio) ? 0 : 1;
> +	else
> +		return kvm_decode_thumb_ls(decode, instr, mmio) ? 0 : 1;
> +}
> diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
> index d61450a..ad743b7 100644
> --- a/arch/arm/kvm/emulate.c
> +++ b/arch/arm/kvm/emulate.c
> @@ -20,6 +20,7 @@
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_decode.h>
>  #include <trace/events/kvm.h>
>  
>  #include "trace.h"
> @@ -176,6 +177,174 @@ int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	return 1;
>  }
>  
> +static u64 kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
> +{
> +	return kvm_call_hyp(__kvm_va_to_pa, vcpu, va, priv);
> +}
> +
> +/**
> + * copy_from_guest_va - copy memory from guest (very slow!)
> + * @vcpu:	vcpu pointer
> + * @dest:	memory to copy into
> + * @gva:	virtual address in guest to copy from
> + * @len:	length to copy
> + * @priv:	use guest PL1 (i.e. kernel) mappings;
> + *              otherwise use guest PL0 mappings.
> + *
> + * Returns true on success, false on failure (unlikely, but retry).
> + */
> +static bool copy_from_guest_va(struct kvm_vcpu *vcpu,
> +			       void *dest, unsigned long gva, size_t len,
> +			       bool priv)
> +{
> +	u64 par;
> +	phys_addr_t pc_ipa;
> +	int err;
> +
> +	BUG_ON((gva & PAGE_MASK) != ((gva + len) & PAGE_MASK));
> +	par = kvm_va_to_pa(vcpu, gva & PAGE_MASK, priv);
> +	if (par & 1) {
> +		kvm_err("IO abort from invalid instruction address"
> +			" %#lx!\n", gva);
> +		return false;
> +	}
> +
> +	BUG_ON(!(par & (1U << 11)));
> +	pc_ipa = par & PAGE_MASK & ((1ULL << 32) - 1);
> +	pc_ipa += gva & ~PAGE_MASK;
> +
> +
> +	err = kvm_read_guest(vcpu->kvm, pc_ipa, dest, len);
> +	if (unlikely(err))
> +		return false;
> +
> +	return true;
> +}
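
A hedged, standalone sketch of the address arithmetic above, with
made-up values (a PAR of 0x8abcd800 and a faulting address of
0xc0008123); bit 0 of the PAR is clear, which the code above takes to
mean the translation succeeded:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t par = 0x8abcd800;          /* invented PAR value */
        unsigned long gva = 0xc0008123;     /* invented guest VA */
        uint64_t ipa;

        ipa = (par & ~0xfffULL & 0xffffffffULL) + (gva & 0xfff);
        printf("ipa = %#llx\n", (unsigned long long)ipa); /* 0x8abcd123 */
        return 0;
    }
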
> +
> +/*
> + * We have to be very careful copying memory from a running (i.e. SMP) guest.
> + * Another CPU may remap the page (e.g. swap out a userspace text page) while
> + * we read the instruction.  Unlike normal hardware operation, to emulate an
> + * instruction we map the virtual address to a physical address and then read
> + * that memory as two separate steps, so the sequence is not atomic.
> + *
> + * Fortunately this is so rare (we don't usually need the instruction) that we
> + * can go very slowly and no one will mind.
> + */
> +static bool copy_current_insn(struct kvm_vcpu *vcpu, unsigned long *instr)
> +{
> +	int i;
> +	bool ret;
> +	struct kvm_vcpu *v;
> +	bool is_thumb;
> +	size_t instr_len;
> +
> +	/* Don't cross with IPIs in kvm_main.c */
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	/* Tell them all to pause, so no more will enter guest. */
> +	kvm_for_each_vcpu(i, v, vcpu->kvm)
> +		v->arch.pause = true;
> +
> +	/* Set ->pause before we read ->mode */
> +	smp_mb();
> +
> +	/* Kick out any which are still running. */
> +	kvm_for_each_vcpu(i, v, vcpu->kvm) {
> +		/* Guest could exit now, making cpu wrong. That's OK. */
> +		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
> +			force_vm_exit(get_cpu_mask(v->cpu));
> +		}
> +	}
> +
> +
> +	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
> +	instr_len = (is_thumb) ? 2 : 4;
> +
> +	BUG_ON(!is_thumb && *vcpu_pc(vcpu) & 0x3);
> +
> +	/* Now guest isn't running, we can va->pa map and copy atomically. */
> +	ret = copy_from_guest_va(vcpu, instr, *vcpu_pc(vcpu), instr_len,
> +				 vcpu_mode_priv(vcpu));
> +	if (!ret)
> +		goto out;
> +
> +	/* A 32-bit thumb2 instruction can actually go over a page boundary! */
> +	if (is_thumb && is_wide_instruction(*instr)) {
> +		*instr = *instr << 16;
> +		ret = copy_from_guest_va(vcpu, instr, *vcpu_pc(vcpu) + 2, 2,
> +					 vcpu_mode_priv(vcpu));
> +	}
> +
> +out:
> +	/* Release them all. */
> +	kvm_for_each_vcpu(i, v, vcpu->kvm)
> +		v->arch.pause = false;
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return ret;
> +}
> +
> +/**
> + * kvm_emulate_mmio_ls - emulates load/store instructions made to I/O memory
> + * @vcpu:	The vcpu pointer
> + * @fault_ipa:	The IPA that caused the 2nd stage fault
> + * @mmio:      Pointer to struct to hold decode information
> + *
> + * Some load/store instructions cannot be emulated using the information
> + * presented in the HSR, for instance, register write-back instructions are not
> + * supported. We therefore need to fetch the instruction, decode it, and then
> + * emulate its behavior.
> + *
> + * Handles emulation of load/store instructions which cannot be emulated through
> + * information found in the HSR on faults. It is necessary in this case to
> + * simply decode the offending instruction in software and determine the
> + * required operands.
> + */
> +int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +			struct kvm_exit_mmio *mmio)
> +{
> +	unsigned long instr = 0;
> +	struct pt_regs current_regs;
> +	struct kvm_decode *decode = &vcpu->arch.mmio_decode;
> +	int ret;
> +
> +	trace_kvm_mmio_emulate(*vcpu_pc(vcpu), instr, *vcpu_cpsr(vcpu));
> +
> +	/* If it fails (SMP race?), we reenter guest for it to retry. */
> +	if (!copy_current_insn(vcpu, &instr))
> +		return 1;
> +
> +	mmio->phys_addr = fault_ipa;
> +
> +	memcpy(&current_regs, &vcpu->arch.regs.usr_regs, sizeof(current_regs));
> +	current_regs.ARM_sp = *vcpu_reg(vcpu, 13);
> +	current_regs.ARM_lr = *vcpu_reg(vcpu, 14);
> +
> +	decode->regs = &current_regs;
> +	decode->fault_addr = vcpu->arch.hxfar;
> +	ret = kvm_decode_load_store(decode, instr, mmio);
> +	if (ret) {
> +		kvm_debug("Instr. decode error: %#08lx (cpsr: %#08x, "
> +			  "pc: %#08x)\n",
> +			  instr, *vcpu_cpsr(vcpu), *vcpu_pc(vcpu));
> +		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
> +		return ret;
> +	}
> +
> +	memcpy(&vcpu->arch.regs.usr_regs, &current_regs, sizeof(current_regs));
> +	*vcpu_reg(vcpu, 13) = current_regs.ARM_sp;
> +	*vcpu_reg(vcpu, 14) = current_regs.ARM_lr;
> +
> +	/*
> +	 * The MMIO instruction is emulated and should not be re-executed
> +	 * in the guest.
> +	 */
> +	kvm_skip_instr(vcpu, is_wide_instruction(instr));
> +	return 0;
> +}
> +
>  /**
>   * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
>   * @vcpu:	The VCPU pointer
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 08adcd5..45570b8 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -192,6 +192,44 @@ after_vfp_restore:
>  	mov	r0, r1			@ Return the return code
>  	bx	lr			@ return to IOCTL
>  
> +
> +/********************************************************************
> + * Translate VA to PA
> + *
> + * u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
> + *
> + * Arguments:
> + *  r0: pointer to vcpu struct
> + *  r1: virtual address to map (rounded to page)
> + *  r2: 1 = PL1 (privileged) read translation, 0 = PL0 (unprivileged) read translation.
> + * Returns 64 bit PAR value.
> + */
> +ENTRY(__kvm_va_to_pa)
> +	push	{r4-r12}
> +
> +	@ Fold flag into r1, easier than using stack.
> +	cmp	r2, #0
> +	movne	r2, #1
> +	orr	r1, r1, r2
> +
> +	@ This swaps too many registers, but we're in the slow path anyway.
> +	read_cp15_state store_to_vcpu = 0
> +	write_cp15_state read_from_vcpu = 1
> +
> +	ands	r2, r1, #1
> +	bic	r1, r1, r2
> +	mcrne	p15, 0, r1, c7, c8, 0	@ VA to PA, ATS1CPR
> +	mcreq	p15, 0, r1, c7, c8, 2	@ VA to PA, ATS1CUR
> +	isb
> +
> +	@ Restore host state.
> +	read_cp15_state store_to_vcpu = 1
> +	write_cp15_state read_from_vcpu = 0
> +
> +	mrrc	p15, 0, r0, r1, c7	@ PAR
> +	pop	{r4-r12}
> +	bx	lr
> +
>  ENTRY(kvm_call_hyp)
>  	hvc	#0
>  	bx	lr
> diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
> new file mode 100644
> index 0000000..d6a4ca0
> --- /dev/null
> +++ b/arch/arm/kvm/mmio.c
> @@ -0,0 +1,154 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <asm/kvm_mmio.h>
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_decode.h>
> +#include <trace/events/kvm.h>
> +
> +#include "trace.h"
> +
> +/**
> + * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
> + * @vcpu: The VCPU pointer
> + * @run:  The VCPU run struct containing the mmio data
> + *
> + * This should only be called after returning from userspace for MMIO load
> + * emulation.
> + */
> +int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> +	__u32 *dest;
> +	unsigned int len;
> +	int mask;
> +
> +	if (!run->mmio.is_write) {
> +		dest = vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt);
> +		memset(dest, 0, sizeof(int));
> +
> +		len = run->mmio.len;
> +		if (len > 4)
> +			return -EINVAL;
> +
> +		memcpy(dest, run->mmio.data, len);
> +
> +		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
> +				*((u64 *)run->mmio.data));
> +
> +		if (vcpu->arch.mmio_decode.sign_extend && len < 4) {
> +			mask = 1U << ((len * 8) - 1);
> +			*dest = (*dest ^ mask) - mask;
> +		}
> +	}
> +
> +	return 0;
> +}
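
The xor/subtract pair above is a branch-free sign extension.  A hedged,
standalone illustration (not part of the patch; the value is invented):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t dest = 0x80;                    /* signed byte read as 0x80 */
        unsigned int len = 1;                    /* run->mmio.len */
        uint32_t mask = 1U << ((len * 8) - 1);

        dest = (dest ^ mask) - mask;             /* same trick as above */
        printf("%#x\n", dest);                   /* prints 0xffffff80 */
        return 0;
    }
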
> +
> +static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +		      struct kvm_exit_mmio *mmio)
> +{
> +	unsigned long rt, len;
> +	bool is_write, sign_extend;
> +
> +	if ((vcpu->arch.hsr >> 8) & 1) {
> +		/* cache operation on I/O addr, tell guest unsupported */
> +		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
> +		return 1;
> +	}
> +
> +	if ((vcpu->arch.hsr >> 7) & 1) {
> +		/* page table accesses IO mem: tell guest to fix its TTBR */
> +		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
> +		return 1;
> +	}
> +
> +	switch ((vcpu->arch.hsr >> 22) & 0x3) {
> +	case 0:
> +		len = 1;
> +		break;
> +	case 1:
> +		len = 2;
> +		break;
> +	case 2:
> +		len = 4;
> +		break;
> +	default:
> +		kvm_err("Hardware is weird: SAS 0b11 is reserved\n");
> +		return -EFAULT;
> +	}
> +
> +	is_write = vcpu->arch.hsr & HSR_WNR;
> +	sign_extend = vcpu->arch.hsr & HSR_SSE;
> +	rt = (vcpu->arch.hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
> +
> +	if (kvm_vcpu_reg_is_pc(vcpu, rt)) {
> +		/* IO memory trying to read/write pc */
> +		kvm_inject_pabt(vcpu, vcpu->arch.hxfar);
> +		return 1;
> +	}
> +
> +	mmio->is_write = is_write;
> +	mmio->phys_addr = fault_ipa;
> +	mmio->len = len;
> +	vcpu->arch.mmio_decode.sign_extend = sign_extend;
> +	vcpu->arch.mmio_decode.rt = rt;
> +
> +	/*
> +	 * The MMIO instruction is emulated and should not be re-executed
> +	 * in the guest.
> +	 */
> +	kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
> +	return 0;
> +}
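
For reference, a hedged, standalone illustration of the field extraction
in decode_hsr() above; the syndrome value 0x01830000 is invented
(ISV=1, SAS=0b10, SSE=0, SRT=3, WNR=0) and only the fields the function
actually uses are shown:

    #include <stdio.h>

    int main(void)
    {
        unsigned long hsr = 0x01830000;

        unsigned long len = 1UL << ((hsr >> 22) & 0x3); /* 4-byte access */
        unsigned long rt  = (hsr >> 16) & 0xf;          /* r3 */
        unsigned long wnr = (hsr >> 6) & 1;             /* 0: a load */

        printf("len=%lu rt=r%lu write=%lu\n", len, rt, wnr);
        return 0;
    }
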
> +
> +int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
> +{
> +	struct kvm_exit_mmio mmio;
> +	unsigned long rt;
> +	int ret;
> +
> +	/*
> +	 * Prepare MMIO operation. First stash it in a private
> +	 * structure that we can use for in-kernel emulation. If the
> +	 * kernel can't handle it, copy it into run->mmio and let user
> +	 * space do its magic.
> +	 */
> +
> +	if (vcpu->arch.hsr & HSR_ISV) {
> +		ret = decode_hsr(vcpu, fault_ipa, &mmio);
> +		if (ret)
> +			return ret;
> +	} else {
> +		ret = kvm_emulate_mmio_ls(vcpu, fault_ipa, &mmio);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	rt = vcpu->arch.mmio_decode.rt;
> +	trace_kvm_mmio((mmio.is_write) ? KVM_TRACE_MMIO_WRITE :
> +					 KVM_TRACE_MMIO_READ_UNSATISFIED,
> +			mmio.len, fault_ipa,
> +			(mmio.is_write) ? *vcpu_reg(vcpu, rt) : 0);
> +
> +	if (mmio.is_write)
> +		memcpy(mmio.data, vcpu_reg(vcpu, rt), mmio.len);
> +
> +	kvm_prepare_mmio(run, &mmio);
> +	return 0;
> +}
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 0ce0e77..2a83ac9 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -19,11 +19,13 @@
>  #include <linux/mman.h>
>  #include <linux/kvm_host.h>
>  #include <linux/io.h>
> +#include <trace/events/kvm.h>
>  #include <asm/idmap.h>
>  #include <asm/pgalloc.h>
>  #include <asm/cacheflush.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_mmio.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/mach/map.h>
> @@ -620,8 +622,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  			return -EFAULT;
>  		}
>  
> -		kvm_pr_unimpl("I/O address abort...");
> -		return 0;
> +		/* Adjust page offset */
> +		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
> +		return io_mem_abort(vcpu, run, fault_ipa, memslot);
>  	}
>  
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index 5d65751..cd52640 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -90,6 +90,27 @@ TRACE_EVENT(kvm_irq_line,
>  		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
>  );
>  
> +TRACE_EVENT(kvm_mmio_emulate,
> +	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
> +		 unsigned long cpsr),
> +	TP_ARGS(vcpu_pc, instr, cpsr),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	vcpu_pc		)
> +		__field(	unsigned long,	instr		)
> +		__field(	unsigned long,	cpsr		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu_pc		= vcpu_pc;
> +		__entry->instr			= instr;
> +		__entry->cpsr			= cpsr;
> +	),
> +
> +	TP_printk("Emulate MMIO at: 0x%08lx (instr: %08lx, cpsr: %08lx)",
> +		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
> +);
> +
>  /* Architecturally implementation defined CP15 register access */
>  TRACE_EVENT(kvm_emulate_cp15_imp,
>  	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
@ 2013-01-15 13:18     ` Gleb Natapov
  0 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15 13:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
> When the guest accesses I/O memory this will create data abort
> exceptions and they are handled by decoding the HSR information
> (physical address, read/write, length, register) and forwarding reads
> and writes to QEMU which performs the device emulation.
> 
> Certain classes of load/store operations do not support the syndrome
> information provided in the HSR and we therefore must be able to fetch
> the offending instruction from guest memory and decode it manually.
> 
> We only support instruction decoding for valid reasonable MMIO operations
> where trapping them does not provide sufficient information in the HSR (no
> 16-bit Thumb instructions provide register writeback that we care about).
> 
> The following instruction types are NOT supported for MMIO operations
> despite the HSR not containing decode info:
>  - any Load/Store multiple
>  - any load/store exclusive
>  - any load/store dual
>  - anything with the PC as the dest register
> 
> This requires changing the general flow somewhat since new calls to run
> the VCPU must check if there's a pending MMIO load and perform the write
> after userspace has made the data available.
> 
> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
> (1) A complicated guest mmio instruction traps.
> (2) The hardware doesn't tell us enough, so we need to read the actual
>     instruction which was being executed.
> (3) KVM maps the instruction virtual address to a physical address.
> (4) The guest (SMP) swaps out that page, and fills it with something else.
> (5) We read the physical address, but now that's the wrong thing.
How can this happen?! The guest cannot reuse a physical page before it
flushes it from all vcpus' TLBs. For that it needs to send a
synchronous IPI to all vcpus, and the IPI will not be processed by a vcpu
while it is doing emulation.

> 
> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  arch/arm/include/asm/kvm_arm.h     |    3 
>  arch/arm/include/asm/kvm_asm.h     |    2 
>  arch/arm/include/asm/kvm_decode.h  |   47 ++++
>  arch/arm/include/asm/kvm_emulate.h |    8 +
>  arch/arm/include/asm/kvm_host.h    |    7 +
>  arch/arm/include/asm/kvm_mmio.h    |   51 ++++
>  arch/arm/kvm/Makefile              |    2 
>  arch/arm/kvm/arm.c                 |   14 +
>  arch/arm/kvm/decode.c              |  462 ++++++++++++++++++++++++++++++++++++
>  arch/arm/kvm/emulate.c             |  169 +++++++++++++
>  arch/arm/kvm/interrupts.S          |   38 +++
>  arch/arm/kvm/mmio.c                |  154 ++++++++++++
>  arch/arm/kvm/mmu.c                 |    7 -
>  arch/arm/kvm/trace.h               |   21 ++
>  14 files changed, 981 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm/include/asm/kvm_decode.h
>  create mode 100644 arch/arm/include/asm/kvm_mmio.h
>  create mode 100644 arch/arm/kvm/decode.c
>  create mode 100644 arch/arm/kvm/mmio.c
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index 3ff6f22..151c4ce 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -173,8 +173,11 @@
>  #define HSR_ISS		(HSR_IL - 1)
>  #define HSR_ISV_SHIFT	(24)
>  #define HSR_ISV		(1U << HSR_ISV_SHIFT)
> +#define HSR_SRT_SHIFT	(16)
> +#define HSR_SRT_MASK	(0xf << HSR_SRT_SHIFT)
>  #define HSR_FSC		(0x3f)
>  #define HSR_FSC_TYPE	(0x3c)
> +#define HSR_SSE		(1 << 21)
>  #define HSR_WNR		(1 << 6)
>  #define HSR_CV_SHIFT	(24)
>  #define HSR_CV		(1U << HSR_CV_SHIFT)
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 5e06e81..58d787b 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -77,6 +77,8 @@ extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +
> +extern u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv);
>  #endif
>  
>  #endif /* __ARM_KVM_ASM_H__ */
> diff --git a/arch/arm/include/asm/kvm_decode.h b/arch/arm/include/asm/kvm_decode.h
> new file mode 100644
> index 0000000..3c37cb9
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_decode.h
> @@ -0,0 +1,47 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_DECODE_H__
> +#define __ARM_KVM_DECODE_H__
> +
> +#include <linux/types.h>
> +
> +struct kvm_vcpu;
> +struct kvm_exit_mmio;
> +
> +struct kvm_decode {
> +	struct pt_regs *regs;
> +	unsigned long fault_addr;
> +	unsigned long rt;
> +	bool sign_extend;
> +};
> +
> +int kvm_decode_load_store(struct kvm_decode *decode, unsigned long instr,
> +			  struct kvm_exit_mmio *mmio);
> +
> +static inline unsigned long *kvm_decode_reg(struct kvm_decode *decode, int reg)
> +{
> +	return &decode->regs->uregs[reg];
> +}
> +
> +static inline unsigned long *kvm_decode_cpsr(struct kvm_decode *decode)
> +{
> +	return &decode->regs->ARM_cpsr;
> +}
> +
> +#endif /* __ARM_KVM_DECODE_H__ */
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 01a755b..375795b 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -21,11 +21,14 @@
>  
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_asm.h>
> +#include <asm/kvm_mmio.h>
>  
>  u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
>  u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
>  
>  int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +			struct kvm_exit_mmio *mmio);
>  void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
>  void kvm_inject_undefined(struct kvm_vcpu *vcpu);
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> @@ -53,4 +56,9 @@ static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
>  	return cpsr_mode > USR_MODE;;
>  }
>  
> +static inline bool kvm_vcpu_reg_is_pc(struct kvm_vcpu *vcpu, int reg)
> +{
> +	return reg == 15;
> +}
> +
>  #endif /* __ARM_KVM_EMULATE_H__ */
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 6cc8933..ca40795 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -22,6 +22,7 @@
>  #include <asm/kvm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/fpstate.h>
> +#include <asm/kvm_decode.h>
>  
>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>  #define KVM_USER_MEM_SLOTS 32
> @@ -99,6 +100,12 @@ struct kvm_vcpu_arch {
>  	int last_pcpu;
>  	cpumask_t require_dcache_flush;
>  
> +	/* Don't run the guest: see copy_current_insn() */
> +	bool pause;
> +
> +	/* IO related fields */
> +	struct kvm_decode mmio_decode;
> +
>  	/* Interrupt related fields */
>  	u32 irq_lines;		/* IRQ and FIQ levels */
>  
> diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
> new file mode 100644
> index 0000000..31ab9f5
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_mmio.h
> @@ -0,0 +1,51 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#ifndef __ARM_KVM_MMIO_H__
> +#define __ARM_KVM_MMIO_H__
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_arm.h>
> +
> +/*
> + * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
> + * which is an anonymous type. Use our own type instead.
> + */
> +struct kvm_exit_mmio {
> +	phys_addr_t	phys_addr;
> +	u8		data[8];
> +	u32		len;
> +	bool		is_write;
> +};
> +
> +static inline void kvm_prepare_mmio(struct kvm_run *run,
> +				    struct kvm_exit_mmio *mmio)
> +{
> +	run->mmio.phys_addr	= mmio->phys_addr;
> +	run->mmio.len		= mmio->len;
> +	run->mmio.is_write	= mmio->is_write;
> +	memcpy(run->mmio.data, mmio->data, mmio->len);
> +	run->exit_reason	= KVM_EXIT_MMIO;
> +}
> +
> +int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot);
> +
> +#endif	/* __ARM_KVM_MMIO_H__ */
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index 88edce6..44a5f4b 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -18,4 +18,4 @@ kvm-arm-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
>  
>  obj-y += kvm-arm.o init.o interrupts.o
>  obj-y += arm.o guest.o mmu.o emulate.o reset.o
> -obj-y += coproc.o coproc_a15.o
> +obj-y += coproc.o coproc_a15.o mmio.o decode.o
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 0b4ffcf..f42d828 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -614,6 +614,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	if (unlikely(vcpu->arch.target < 0))
>  		return -ENOEXEC;
>  
> +	if (run->exit_reason == KVM_EXIT_MMIO) {
> +		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	if (vcpu->sigset_active)
>  		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>  
> @@ -649,7 +655,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_guest_enter();
>  		vcpu->mode = IN_GUEST_MODE;
>  
> -		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +		smp_mb(); /* set mode before reading vcpu->arch.pause */
> +		if (unlikely(vcpu->arch.pause)) {
> +			/* This means ignore, try again. */
> +			ret = ARM_EXCEPTION_IRQ;
> +		} else {
> +			ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +		}
>  
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		vcpu->arch.last_pcpu = smp_processor_id();
> diff --git a/arch/arm/kvm/decode.c b/arch/arm/kvm/decode.c
> new file mode 100644
> index 0000000..469cf14
> --- /dev/null
> +++ b/arch/arm/kvm/decode.c
> @@ -0,0 +1,462 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_mmio.h>
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_decode.h>
> +#include <trace/events/kvm.h>
> +
> +#include "trace.h"
> +
> +struct arm_instr {
> +	/* Instruction decoding */
> +	u32 opc;
> +	u32 opc_mask;
> +
> +	/* Decoding for the register write back */
> +	bool register_form;
> +	u32 imm;
> +	u8 Rm;
> +	u8 type;
> +	u8 shift_n;
> +
> +	/* Common decoding */
> +	u8 len;
> +	bool sign_extend;
> +	bool w;
> +
> +	bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +		       unsigned long instr, struct arm_instr *ai);
> +};
> +
> +enum SRType {
> +	SRType_LSL,
> +	SRType_LSR,
> +	SRType_ASR,
> +	SRType_ROR,
> +	SRType_RRX
> +};
> +
> +/* Modelled after DecodeImmShift() in the ARM ARM */
> +static enum SRType decode_imm_shift(u8 type, u8 imm5, u8 *amount)
> +{
> +	switch (type) {
> +	case 0x0:
> +		*amount = imm5;
> +		return SRType_LSL;
> +	case 0x1:
> +		*amount = (imm5 == 0) ? 32 : imm5;
> +		return SRType_LSR;
> +	case 0x2:
> +		*amount = (imm5 == 0) ? 32 : imm5;
> +		return SRType_ASR;
> +	case 0x3:
> +		if (imm5 == 0) {
> +			*amount = 1;
> +			return SRType_RRX;
> +		} else {
> +			*amount = imm5;
> +			return SRType_ROR;
> +		}
> +	}
> +
> +	return SRType_LSL;
> +}
> +
> +/* Modelled after Shift() in the ARM ARM */
> +static u32 shift(u32 value, u8 N, enum SRType type, u8 amount, bool carry_in)
> +{
> +	u32 mask = (1 << N) - 1;
> +	s32 svalue = (s32)value;
> +
> +	BUG_ON(N > 32);
> +	BUG_ON(type == SRType_RRX && amount != 1);
> +	BUG_ON(amount > N);
> +
> +	if (amount == 0)
> +		return value;
> +
> +	switch (type) {
> +	case SRType_LSL:
> +		value <<= amount;
> +		break;
> +	case SRType_LSR:
> +		 value >>= amount;
> +		break;
> +	case SRType_ASR:
> +		if (value & (1 << (N - 1)))
> +			svalue |= ((-1UL) << N);
> +		value = svalue >> amount;
> +		break;
> +	case SRType_ROR:
> +		value = (value >> amount) | (value << (N - amount));
> +		break;
> +	case SRType_RRX: {
> +		u32 C = (carry_in) ? 1 : 0;
> +		value = (value >> 1) | (C << (N - 1));
> +		break;
> +	}
> +	}
> +
> +	return value & mask;
> +}
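
A hedged, standalone check of the rotate-right arithmetic used by the
SRType_ROR branch above, here for a full 32-bit register (the value and
the shift amount are invented):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t value = 0x80000001;
        unsigned amount = 4, N = 32;

        uint32_t ror = (value >> amount) | (value << (N - amount));
        printf("%#x\n", ror);   /* prints 0x18000000 */
        return 0;
    }
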
> +
> +static bool decode_arm_wb(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +			  unsigned long instr, const struct arm_instr *ai)
> +{
> +	u8 Rt = (instr >> 12) & 0xf;
> +	u8 Rn = (instr >> 16) & 0xf;
> +	u8 W = (instr >> 21) & 1;
> +	u8 U = (instr >> 23) & 1;
> +	u8 P = (instr >> 24) & 1;
> +	u32 base_addr = *kvm_decode_reg(decode, Rn);
> +	u32 offset_addr, offset;
> +
> +	/*
> +	 * Technically this is allowed in certain circumstances,
> +	 * but we don't support it.
> +	 */
> +	if (Rt == 15 || Rn == 15)
> +		return false;
> +
> +	if (P && !W) {
> +		kvm_err("Decoding operation with valid ISV?\n");
> +		return false;
> +	}
> +
> +	decode->rt = Rt;
> +
> +	if (ai->register_form) {
> +		/* Register operation */
> +		enum SRType s_type;
> +		u8 shift_n = 0;
> +		bool c_bit = *kvm_decode_cpsr(decode) & PSR_C_BIT;
> +		u32 s_reg = *kvm_decode_reg(decode, ai->Rm);
> +
> +		s_type = decode_imm_shift(ai->type, ai->shift_n, &shift_n);
> +		offset = shift(s_reg, 5, s_type, shift_n, c_bit);
> +	} else {
> +		/* Immediate operation */
> +		offset = ai->imm;
> +	}
> +
> +	/* Handle Writeback */
> +	if (U)
> +		offset_addr = base_addr + offset;
> +	else
> +		offset_addr = base_addr - offset;
> +	*kvm_decode_reg(decode, Rn) = offset_addr;
> +	return true;
> +}
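
A hedged sketch of the write-back arithmetic above for the immediate
form, detached from the kvm structures so it can be compiled on its own
(the instruction and register value are invented):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t instr  = 0xe4923004;   /* ldr r3, [r2], #4 */
        uint32_t rn_val = 0x10000000;   /* pretend content of r2 */
        uint32_t imm    = instr & 0xfff;
        int U = (instr >> 23) & 1;

        uint32_t writeback = U ? rn_val + imm : rn_val - imm;
        printf("rt=r%u, new rn=%#x\n",
               (unsigned)((instr >> 12) & 0xf), writeback);
        /* prints: rt=r3, new rn=0x10000004 */
        return 0;
    }
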
> +
> +static bool decode_arm_ls(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +			  unsigned long instr, struct arm_instr *ai)
> +{
> +	u8 A = (instr >> 25) & 1;
> +
> +	mmio->is_write = ai->w;
> +	mmio->len = ai->len;
> +	decode->sign_extend = false;
> +
> +	ai->register_form = A;
> +	ai->imm = instr & 0xfff;
> +	ai->Rm = instr & 0xf;
> +	ai->type = (instr >> 5) & 0x3;
> +	ai->shift_n = (instr >> 7) & 0x1f;
> +
> +	return decode_arm_wb(decode, mmio, instr, ai);
> +}
> +
> +static bool decode_arm_extra(struct kvm_decode *decode,
> +			     struct kvm_exit_mmio *mmio,
> +			     unsigned long instr, struct arm_instr *ai)
> +{
> +	mmio->is_write = ai->w;
> +	mmio->len = ai->len;
> +	decode->sign_extend = ai->sign_extend;
> +
> +	ai->register_form = !((instr >> 22) & 1);
> +	ai->imm = ((instr >> 4) & 0xf0) | (instr & 0xf);
> +	ai->Rm = instr & 0xf;
> +	ai->type = 0; /* SRType_LSL */
> +	ai->shift_n = 0;
> +
> +	return decode_arm_wb(decode, mmio, instr, ai);
> +}
> +
> +/*
> + * The encodings in this table assume that a fault was generated where the
> + * ISV field in the HSR was clear, and the decoding information was invalid,
> + * which means that a register write-back occurred, the PC was used as the
> + * destination or a load/store multiple operation was used. Since the latter
> + * two cases are crazy for MMIO on the guest side, we simply inject a fault
> + * when this happens and support the common case.
> + *
> + * We treat unprivileged loads and stores of words and bytes like all other
> + * loads and stores as their encodings mandate the W bit set and the P bit
> + * clear.
> + */
> +static const struct arm_instr arm_instr[] = {
> +	/**************** Load/Store Word and Byte **********************/
> +	/* Store word with writeback */
> +	{ .opc = 0x04000000, .opc_mask = 0x0c500000, .len = 4, .w = true,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +	/* Store byte with writeback */
> +	{ .opc = 0x04400000, .opc_mask = 0x0c500000, .len = 1, .w = true,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +	/* Load word with writeback */
> +	{ .opc = 0x04100000, .opc_mask = 0x0c500000, .len = 4, .w = false,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +	/* Load byte with writeback */
> +	{ .opc = 0x04500000, .opc_mask = 0x0c500000, .len = 1, .w = false,
> +		.sign_extend = false, .decode = decode_arm_ls },
> +
> +	/*************** Extra load/store instructions ******************/
> +
> +	/* Store halfword with writeback */
> +	{ .opc = 0x000000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = true,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load halfword with writeback */
> +	{ .opc = 0x001000b0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +
> +	/* Load dual with writeback */
> +	{ .opc = 0x000000d0, .opc_mask = 0x0c1000f0, .len = 8, .w = false,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load signed byte with writeback */
> +	{ .opc = 0x001000d0, .opc_mask = 0x0c1000f0, .len = 1, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },
> +
> +	/* Store dual with writeback */
> +	{ .opc = 0x000000f0, .opc_mask = 0x0c1000f0, .len = 8, .w = true,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load signed halfword with writeback */
> +	{ .opc = 0x001000f0, .opc_mask = 0x0c1000f0, .len = 2, .w = false,
> +		.sign_extend = true,  .decode = decode_arm_extra },
> +
> +	/* Store halfword unprivileged */
> +	{ .opc = 0x002000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = true,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load halfword unprivileged */
> +	{ .opc = 0x003000b0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
> +		.sign_extend = false, .decode = decode_arm_extra },
> +	/* Load signed byte unprivileged */
> +	{ .opc = 0x003000d0, .opc_mask = 0x0f3000f0, .len = 1, .w = false,
> +		.sign_extend = true , .decode = decode_arm_extra },
> +	/* Load signed halfword unprivileged */
> +	{ .opc = 0x003000f0, .opc_mask = 0x0f3000f0, .len = 2, .w = false,
> +		.sign_extend = true , .decode = decode_arm_extra },
> +};
> +
> +static bool kvm_decode_arm_ls(struct kvm_decode *decode, unsigned long instr,
> +			      struct kvm_exit_mmio *mmio)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(arm_instr); i++) {
> +		const struct arm_instr *ai = &arm_instr[i];
> +		if ((instr & ai->opc_mask) == ai->opc) {
> +			struct arm_instr ai_copy = *ai;
> +			return ai->decode(decode, mmio, instr, &ai_copy);
> +		}
> +	}
> +	return false;
> +}
> +
> +struct thumb_instr {
> +	bool is32;
> +
> +	u8 opcode;
> +	u8 opcode_mask;
> +	u8 op2;
> +	u8 op2_mask;
> +
> +	bool (*decode)(struct kvm_decode *decode, struct kvm_exit_mmio *mmio,
> +		       unsigned long instr, const struct thumb_instr *ti);
> +};
> +
> +static bool decode_thumb_wb(struct kvm_decode *decode,
> +			    struct kvm_exit_mmio *mmio,
> +			    unsigned long instr)
> +{
> +	bool P = (instr >> 10) & 1;
> +	bool U = (instr >> 9) & 1;
> +	u8 imm8 = instr & 0xff;
> +	u32 offset_addr = decode->fault_addr;
> +	u8 Rn = (instr >> 16) & 0xf;
> +
> +	decode->rt = (instr >> 12) & 0xf;
> +
> +	if (Rn == 15)
> +		return false;
> +
> +	/* Handle Writeback */
> +	if (!P && U)
> +		*kvm_decode_reg(decode, Rn) = offset_addr + imm8;
> +	else if (!P && !U)
> +		*kvm_decode_reg(decode, Rn) = offset_addr - imm8;
> +	return true;
> +}
> +
> +static bool decode_thumb_str(struct kvm_decode *decode,
> +			     struct kvm_exit_mmio *mmio,
> +			     unsigned long instr, const struct thumb_instr *ti)
> +{
> +	u8 op1 = (instr >> (16 + 5)) & 0x7;
> +	u8 op2 = (instr >> 6) & 0x3f;
> +
> +	mmio->is_write = true;
> +	decode->sign_extend = false;
> +
> +	switch (op1) {
> +	case 0x0: mmio->len = 1; break;
> +	case 0x1: mmio->len = 2; break;
> +	case 0x2: mmio->len = 4; break;
> +	default:
> +		  return false; /* Only register write-back versions! */
> +	}
> +
> +	if ((op2 & 0x24) == 0x24) {
> +		/* STRB (immediate, thumb, W=1) */
> +		return decode_thumb_wb(decode, mmio, instr);
> +	}
> +
> +	return false;
> +}
> +
> +static bool decode_thumb_ldr(struct kvm_decode *decode,
> +			     struct kvm_exit_mmio *mmio,
> +			     unsigned long instr, const struct thumb_instr *ti)
> +{
> +	u8 op1 = (instr >> (16 + 7)) & 0x3;
> +	u8 op2 = (instr >> 6) & 0x3f;
> +
> +	mmio->is_write = false;
> +
> +	switch (ti->op2 & 0x7) {
> +	case 0x1: mmio->len = 1; break;
> +	case 0x3: mmio->len = 2; break;
> +	case 0x5: mmio->len = 4; break;
> +	}
> +
> +	if (op1 == 0x0)
> +		decode->sign_extend = false;
> +	else if (op1 == 0x2 && (ti->op2 & 0x7) != 0x5)
> +		decode->sign_extend = true;
> +	else
> +		return false; /* Only register write-back versions! */
> +
> +	if ((op2 & 0x24) == 0x24) {
> +		/* LDR{S}X (immediate, thumb, W=1) */
> +		return decode_thumb_wb(decode, mmio, instr);
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * We only support instruction decoding for valid reasonable MMIO operations
> + * where trapping them does not provide sufficient information in the HSR (no
> + * 16-bit Thumb instructions provide register writeback that we care about).
> + *
> + * The following instruction types are NOT supported for MMIO operations
> + * despite the HSR not containing decode info:
> + *  - any Load/Store multiple
> + *  - any load/store exclusive
> + *  - any load/store dual
> + *  - anything with the PC as the dest register
> + */
> +static const struct thumb_instr thumb_instr[] = {
> +	/**************** 32-bit Thumb instructions **********************/
> +	/* Store single data item:	Op1 == 11, Op2 == 000xxx0 */
> +	{ .is32 = true,  .opcode = 3, .op2 = 0x00, .op2_mask = 0x71,
> +						decode_thumb_str	},
> +
> +	/* Load byte:			Op1 == 11, Op2 == 00xx001 */
> +	{ .is32 = true,  .opcode = 3, .op2 = 0x01, .op2_mask = 0x67,
> +						decode_thumb_ldr	},
> +
> +	/* Load halfword:		Op1 == 11, Op2 == 00xx011 */
> +	{ .is32 = true,  .opcode = 3, .op2 = 0x03, .op2_mask = 0x67,
> +						decode_thumb_ldr	},
> +
> +	/* Load word:			Op1 == 11, Op2 == 00xx101 */
> +	{ .is32 = true,  .opcode = 3, .op2 = 0x05, .op2_mask = 0x67,
> +						decode_thumb_ldr	},
> +};
> +
> +
> +
> +static bool kvm_decode_thumb_ls(struct kvm_decode *decode, unsigned long instr,
> +				struct kvm_exit_mmio *mmio)
> +{
> +	bool is32 = is_wide_instruction(instr);
> +	bool is16 = !is32;
> +	struct thumb_instr tinstr; /* re-use to pass on already decoded info */
> +	int i;
> +
> +	if (is16) {
> +		tinstr.opcode = (instr >> 10) & 0x3f;
> +	} else {
> +		tinstr.opcode = (instr >> (16 + 11)) & 0x3;
> +		tinstr.op2 = (instr >> (16 + 4)) & 0x7f;
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(thumb_instr); i++) {
> +		const struct thumb_instr *ti = &thumb_instr[i];
> +		if (ti->is32 != is32)
> +			continue;
> +
> +		if (is16) {
> +			if ((tinstr.opcode & ti->opcode_mask) != ti->opcode)
> +				continue;
> +		} else {
> +			if (ti->opcode != tinstr.opcode)
> +				continue;
> +			if ((ti->op2_mask & tinstr.op2) != ti->op2)
> +				continue;
> +		}
> +
> +		return ti->decode(decode, mmio, instr, &tinstr);
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * kvm_decode_load_store - decodes load/store instructions
> + * @decode: reads regs and fault_addr, writes rt and sign_extend
> + * @instr:  instruction to decode
> + * @mmio:   fills in len and is_write
> + *
> + * Decode load/store instructions with HSR ISV clear. The code assumes that
> + * this was indeed a KVM fault and therefore assumes registers write back for
> + * single load/store operations and does not support using the PC as the
> + * destination register.
> + */
> +int kvm_decode_load_store(struct kvm_decode *decode, unsigned long instr,
> +			  struct kvm_exit_mmio *mmio)
> +{
> +	bool is_thumb;
> +
> +	is_thumb = !!(*kvm_decode_cpsr(decode) & PSR_T_BIT);
> +	if (!is_thumb)
> +		return kvm_decode_arm_ls(decode, instr, mmio) ? 0 : 1;
> +	else
> +		return kvm_decode_thumb_ls(decode, instr, mmio) ? 0 : 1;
> +}
> diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
> index d61450a..ad743b7 100644
> --- a/arch/arm/kvm/emulate.c
> +++ b/arch/arm/kvm/emulate.c
> @@ -20,6 +20,7 @@
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_decode.h>
>  #include <trace/events/kvm.h>
>  
>  #include "trace.h"
> @@ -176,6 +177,174 @@ int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	return 1;
>  }
>  
> +static u64 kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
> +{
> +	return kvm_call_hyp(__kvm_va_to_pa, vcpu, va, priv);
> +}
> +
> +/**
> + * copy_from_guest_va - copy memory from guest (very slow!)
> + * @vcpu:	vcpu pointer
> + * @dest:	memory to copy into
> + * @gva:	virtual address in guest to copy from
> + * @len:	length to copy
> + * @priv:	use guest PL1 (i.e. kernel) mappings;
> + *              otherwise use guest PL0 mappings.
> + *
> + * Returns true on success, false on failure (unlikely, but retry).
> + */
> +static bool copy_from_guest_va(struct kvm_vcpu *vcpu,
> +			       void *dest, unsigned long gva, size_t len,
> +			       bool priv)
> +{
> +	u64 par;
> +	phys_addr_t pc_ipa;
> +	int err;
> +
> +	BUG_ON((gva & PAGE_MASK) != ((gva + len) & PAGE_MASK));
> +	par = kvm_va_to_pa(vcpu, gva & PAGE_MASK, priv);
> +	if (par & 1) {
> +		kvm_err("IO abort from invalid instruction address"
> +			" %#lx!\n", gva);
> +		return false;
> +	}
> +
> +	BUG_ON(!(par & (1U << 11)));
> +	pc_ipa = par & PAGE_MASK & ((1ULL << 32) - 1);
> +	pc_ipa += gva & ~PAGE_MASK;
> +
> +
> +	err = kvm_read_guest(vcpu->kvm, pc_ipa, dest, len);
> +	if (unlikely(err))
> +		return false;
> +
> +	return true;
> +}
> +
> +/*
> + * We have to be very careful copying memory from a running (i.e. SMP) guest.
> + * Another CPU may remap the page (e.g. swap out a userspace text page) while
> + * we read the instruction.  Unlike normal hardware operation, to emulate an
> + * instruction we map the virtual address to a physical address and then read
> + * that memory as two separate steps, so the sequence is not atomic.
> + *
> + * Fortunately this is so rare (we don't usually need the instruction) that we
> + * can go very slowly and no one will mind.
> + */
> +static bool copy_current_insn(struct kvm_vcpu *vcpu, unsigned long *instr)
> +{
> +	int i;
> +	bool ret;
> +	struct kvm_vcpu *v;
> +	bool is_thumb;
> +	size_t instr_len;
> +
> +	/* Don't cross with IPIs in kvm_main.c */
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	/* Tell them all to pause, so no more will enter guest. */
> +	kvm_for_each_vcpu(i, v, vcpu->kvm)
> +		v->arch.pause = true;
> +
> +	/* Set ->pause before we read ->mode */
> +	smp_mb();
> +
> +	/* Kick out any which are still running. */
> +	kvm_for_each_vcpu(i, v, vcpu->kvm) {
> +		/* Guest could exit now, making cpu wrong. That's OK. */
> +		if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
> +			force_vm_exit(get_cpu_mask(v->cpu));
> +		}
> +	}
> +
> +
> +	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT);
> +	instr_len = (is_thumb) ? 2 : 4;
> +
> +	BUG_ON(!is_thumb && *vcpu_pc(vcpu) & 0x3);
> +
> +	/* Now guest isn't running, we can va->pa map and copy atomically. */
> +	ret = copy_from_guest_va(vcpu, instr, *vcpu_pc(vcpu), instr_len,
> +				 vcpu_mode_priv(vcpu));
> +	if (!ret)
> +		goto out;
> +
> +	/* A 32-bit thumb2 instruction can actually go over a page boundary! */
> +	if (is_thumb && is_wide_instruction(*instr)) {
> +		*instr = *instr << 16;
> +		ret = copy_from_guest_va(vcpu, instr, *vcpu_pc(vcpu) + 2, 2,
> +					 vcpu_mode_priv(vcpu));
> +	}
> +
> +out:
> +	/* Release them all. */
> +	kvm_for_each_vcpu(i, v, vcpu->kvm)
> +		v->arch.pause = false;
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return ret;
> +}
> +
> +/**
> + * kvm_emulate_mmio_ls - emulates load/store instructions made to I/O memory
> + * @vcpu:	The vcpu pointer
> + * @fault_ipa:	The IPA that caused the 2nd stage fault
> + * @mmio:      Pointer to struct to hold decode information
> + *
> + * Some load/store instructions cannot be emulated using the information
> + * presented in the HSR, for instance, register write-back instructions are not
> + * supported. We therefore need to fetch the instruction, decode it, and then
> + * emulate its behavior.
> + *
> + * Handles emulation of load/store instructions which cannot be emulated through
> + * information found in the HSR on faults. It is necessary in this case to
> + * simply decode the offending instruction in software and determine the
> + * required operands.
> + */
> +int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +			struct kvm_exit_mmio *mmio)
> +{
> +	unsigned long instr = 0;
> +	struct pt_regs current_regs;
> +	struct kvm_decode *decode = &vcpu->arch.mmio_decode;
> +	int ret;
> +
> +	trace_kvm_mmio_emulate(*vcpu_pc(vcpu), instr, *vcpu_cpsr(vcpu));
> +
> +	/* If it fails (SMP race?), we reenter guest for it to retry. */
> +	if (!copy_current_insn(vcpu, &instr))
> +		return 1;
> +
> +	mmio->phys_addr = fault_ipa;
> +
> +	memcpy(&current_regs, &vcpu->arch.regs.usr_regs, sizeof(current_regs));
> +	current_regs.ARM_sp = *vcpu_reg(vcpu, 13);
> +	current_regs.ARM_lr = *vcpu_reg(vcpu, 14);
> +
> +	decode->regs = &current_regs;
> +	decode->fault_addr = vcpu->arch.hxfar;
> +	ret = kvm_decode_load_store(decode, instr, mmio);
> +	if (ret) {
> +		kvm_debug("Instr. decode error: %#08lx (cpsr: %#08x, "
> +			  "pc: %#08x)\n",
> +			  instr, *vcpu_cpsr(vcpu), *vcpu_pc(vcpu));
> +		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
> +		return ret;
> +	}
> +
> +	memcpy(&vcpu->arch.regs.usr_regs, &current_regs, sizeof(current_regs));
> +	*vcpu_reg(vcpu, 13) = current_regs.ARM_sp;
> +	*vcpu_reg(vcpu, 14) = current_regs.ARM_lr;
> +
> +	/*
> +	 * The MMIO instruction is emulated and should not be re-executed
> +	 * in the guest.
> +	 */
> +	kvm_skip_instr(vcpu, is_wide_instruction(instr));
> +	return 0;
> +}
> +
>  /**
>   * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
>   * @vcpu:	The VCPU pointer
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 08adcd5..45570b8 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -192,6 +192,44 @@ after_vfp_restore:
>  	mov	r0, r1			@ Return the return code
>  	bx	lr			@ return to IOCTL
>  
> +
> +/********************************************************************
> + * Translate VA to PA
> + *
> + * u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv)
> + *
> + * Arguments:
> + *  r0: pointer to vcpu struct
> + *  r1: virtual address to map (rounded to page)
> + *  r2: 1 = PL1 (privileged) read translation, 0 = PL0 (unprivileged) read translation.
> + * Returns 64 bit PAR value.
> + */
> +ENTRY(__kvm_va_to_pa)
> +	push	{r4-r12}
> +
> +	@ Fold flag into r1, easier than using stack.
> +	cmp	r2, #0
> +	movne	r2, #1
> +	orr	r1, r1, r2
> +
> +	@ This swaps too many registers, but we're in the slow path anyway.
> +	read_cp15_state store_to_vcpu = 0
> +	write_cp15_state read_from_vcpu = 1
> +
> +	ands	r2, r1, #1
> +	bic	r1, r1, r2
> +	mcrne	p15, 0, r1, c7, c8, 0	@ VA to PA, ATS1CPR
> +	mcreq	p15, 0, r1, c7, c8, 2	@ VA to PA, ATS1CUR
> +	isb
> +
> +	@ Restore host state.
> +	read_cp15_state store_to_vcpu = 1
> +	write_cp15_state read_from_vcpu = 0
> +
> +	mrrc	p15, 0, r0, r1, c7	@ PAR
> +	pop	{r4-r12}
> +	bx	lr
> +
>  ENTRY(kvm_call_hyp)
>  	hvc	#0
>  	bx	lr
> diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
> new file mode 100644
> index 0000000..d6a4ca0
> --- /dev/null
> +++ b/arch/arm/kvm/mmio.c
> @@ -0,0 +1,154 @@
> +/*
> + * Copyright (C) 2012 - Virtual Open Systems and Columbia University
> + * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +
> +#include <asm/kvm_mmio.h>
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_decode.h>
> +#include <trace/events/kvm.h>
> +
> +#include "trace.h"
> +
> +/**
> + * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
> + * @vcpu: The VCPU pointer
> + * @run:  The VCPU run struct containing the mmio data
> + *
> + * This should only be called after returning from userspace for MMIO load
> + * emulation.
> + */
> +int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> +	__u32 *dest;
> +	unsigned int len;
> +	int mask;
> +
> +	if (!run->mmio.is_write) {
> +		dest = vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt);
> +		memset(dest, 0, sizeof(int));
> +
> +		len = run->mmio.len;
> +		if (len > 4)
> +			return -EINVAL;
> +
> +		memcpy(dest, run->mmio.data, len);
> +
> +		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
> +				*((u64 *)run->mmio.data));
> +
> +		if (vcpu->arch.mmio_decode.sign_extend && len < 4) {
> +			mask = 1U << ((len * 8) - 1);
> +			*dest = (*dest ^ mask) - mask;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +		      struct kvm_exit_mmio *mmio)
> +{
> +	unsigned long rt, len;
> +	bool is_write, sign_extend;
> +
> +	if ((vcpu->arch.hsr >> 8) & 1) {
> +		/* cache operation on I/O addr, tell guest unsupported */
> +		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
> +		return 1;
> +	}
> +
> +	if ((vcpu->arch.hsr >> 7) & 1) {
> +		/* page table accesses IO mem: tell guest to fix its TTBR */
> +		kvm_inject_dabt(vcpu, vcpu->arch.hxfar);
> +		return 1;
> +	}
> +
> +	switch ((vcpu->arch.hsr >> 22) & 0x3) {
> +	case 0:
> +		len = 1;
> +		break;
> +	case 1:
> +		len = 2;
> +		break;
> +	case 2:
> +		len = 4;
> +		break;
> +	default:
> +		kvm_err("Hardware is weird: SAS 0b11 is reserved\n");
> +		return -EFAULT;
> +	}
> +
> +	is_write = vcpu->arch.hsr & HSR_WNR;
> +	sign_extend = vcpu->arch.hsr & HSR_SSE;
> +	rt = (vcpu->arch.hsr & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
> +
> +	if (kvm_vcpu_reg_is_pc(vcpu, rt)) {
> +		/* IO memory trying to read/write pc */
> +		kvm_inject_pabt(vcpu, vcpu->arch.hxfar);
> +		return 1;
> +	}
> +
> +	mmio->is_write = is_write;
> +	mmio->phys_addr = fault_ipa;
> +	mmio->len = len;
> +	vcpu->arch.mmio_decode.sign_extend = sign_extend;
> +	vcpu->arch.mmio_decode.rt = rt;
> +
> +	/*
> +	 * The MMIO instruction is emulated and should not be re-executed
> +	 * in the guest.
> +	 */
> +	kvm_skip_instr(vcpu, (vcpu->arch.hsr >> 25) & 1);
> +	return 0;
> +}
> +
> +int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
> +{
> +	struct kvm_exit_mmio mmio;
> +	unsigned long rt;
> +	int ret;
> +
> +	/*
> +	 * Prepare MMIO operation. First stash it in a private
> +	 * structure that we can use for in-kernel emulation. If the
> +	 * kernel can't handle it, copy it into run->mmio and let user
> +	 * space do its magic.
> +	 */
> +
> +	if (vcpu->arch.hsr & HSR_ISV) {
> +		ret = decode_hsr(vcpu, fault_ipa, &mmio);
> +		if (ret)
> +			return ret;
> +	} else {
> +		ret = kvm_emulate_mmio_ls(vcpu, fault_ipa, &mmio);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	rt = vcpu->arch.mmio_decode.rt;
> +	trace_kvm_mmio((mmio.is_write) ? KVM_TRACE_MMIO_WRITE :
> +					 KVM_TRACE_MMIO_READ_UNSATISFIED,
> +			mmio.len, fault_ipa,
> +			(mmio.is_write) ? *vcpu_reg(vcpu, rt) : 0);
> +
> +	if (mmio.is_write)
> +		memcpy(mmio.data, vcpu_reg(vcpu, rt), mmio.len);
> +
> +	kvm_prepare_mmio(run, &mmio);
> +	return 0;
> +}
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 0ce0e77..2a83ac9 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -19,11 +19,13 @@
>  #include <linux/mman.h>
>  #include <linux/kvm_host.h>
>  #include <linux/io.h>
> +#include <trace/events/kvm.h>
>  #include <asm/idmap.h>
>  #include <asm/pgalloc.h>
>  #include <asm/cacheflush.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_mmio.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/mach/map.h>
> @@ -620,8 +622,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  			return -EFAULT;
>  		}
>  
> -		kvm_pr_unimpl("I/O address abort...");
> -		return 0;
> +		/* Adjust page offset */
> +		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
> +		return io_mem_abort(vcpu, run, fault_ipa, memslot);
>  	}
>  
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index 5d65751..cd52640 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -90,6 +90,27 @@ TRACE_EVENT(kvm_irq_line,
>  		  __entry->type, __entry->vcpu_idx, __entry->irq_num, __entry->level)
>  );
>  
> +TRACE_EVENT(kvm_mmio_emulate,
> +	TP_PROTO(unsigned long vcpu_pc, unsigned long instr,
> +		 unsigned long cpsr),
> +	TP_ARGS(vcpu_pc, instr, cpsr),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	vcpu_pc		)
> +		__field(	unsigned long,	instr		)
> +		__field(	unsigned long,	cpsr		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu_pc		= vcpu_pc;
> +		__entry->instr			= instr;
> +		__entry->cpsr			= cpsr;
> +	),
> +
> +	TP_printk("Emulate MMIO at: 0x%08lx (instr: %08lx, cpsr: %08lx)",
> +		  __entry->vcpu_pc, __entry->instr, __entry->cpsr)
> +);
> +
>  /* Architecturally implementation defined CP15 register access */
>  TRACE_EVENT(kvm_emulate_cp15_imp,
>  	TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
> 
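
As an aside, the sign-extension in kvm_handle_mmio_return() above can be
checked in isolation; here is a small host-side snippet (my own illustrative
sketch, not part of the patch):

        #include <stdio.h>
        #include <stdint.h>

        /* Mirror of the patch logic: widen a 'len'-byte MMIO value to 32 bits. */
        static uint32_t widen(uint32_t val, unsigned int len, int sign_extend)
        {
                uint32_t mask;

                if (sign_extend && len < 4) {
                        mask = 1U << ((len * 8) - 1);  /* sign bit of the narrow value */
                        val = (val ^ mask) - mask;     /* the subtraction propagates it */
                }
                return val;
        }

        int main(void)
        {
                printf("%08x\n", widen(0x80, 1, 1));   /* ffffff80: sign-extended */
                printf("%08x\n", widen(0x80, 1, 0));   /* 00000080: zero-extended */
                return 0;
        }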

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-15 13:18     ` Gleb Natapov
@ 2013-01-15 13:29       ` Marc Zyngier
  -1 siblings, 0 replies; 160+ messages in thread
From: Marc Zyngier @ 2013-01-15 13:29 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti,
	Rusty Russell

On 15/01/13 13:18, Gleb Natapov wrote:
> On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
>> When the guest accesses I/O memory this will create data abort
>> exceptions and they are handled by decoding the HSR information
>> (physical address, read/write, length, register) and forwarding reads
>> and writes to QEMU which performs the device emulation.
>>
>> Certain classes of load/store operations do not support the syndrome
>> information provided in the HSR and we therefore must be able to fetch
>> the offending instruction from guest memory and decode it manually.
>>
>> We only support instruction decoding for valid reasonable MMIO operations
>> where trapping them do not provide sufficient information in the HSR (no
>> 16-bit Thumb instructions provide register writeback that we care about).
>>
>> The following instruction types are NOT supported for MMIO operations
>> despite the HSR not containing decode info:
>>  - any Load/Store multiple
>>  - any load/store exclusive
>>  - any load/store dual
>>  - anything with the PC as the dest register
>>
>> This requires changing the general flow somewhat since new calls to run
>> the VCPU must check if there's a pending MMIO load and perform the write
>> after userspace has made the data available.
>>
>> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
>> (1) Guest complicated mmio instruction traps.
>> (2) The hardware doesn't tell us enough, so we need to read the actual
>>     instruction which was being executed.
>> (3) KVM maps the instruction virtual address to a physical address.
>> (4) The guest (SMP) swaps out that page, and fills it with something else.
>> (5) We read the physical address, but now that's the wrong thing.
> How can this happen?! The guest cannot reuse physical page before it
> flushes it from all vcpus tlb cache. For that it needs to send
> synchronous IPI to all vcpus and IPI will not be processed by a vcpu
> while it does emulation.

I don't know how this works on x86, but a KVM/ARM guest can definitely
handle an IPI.

Furthermore, TLB invalidation doesn't require an IPI on ARMv7 (unless
we're doing some set/way operation which is handled separately).

	M.
-- 
Jazz is not dead. It just smells funny...


^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-14 22:17       ` Christoffer Dall
@ 2013-01-15 13:32         ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15 13:32 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

On Mon, Jan 14, 2013 at 05:17:31PM -0500, Christoffer Dall wrote:
> On Mon, Jan 14, 2013 at 1:49 PM, Gleb Natapov <gleb@redhat.com> wrote:
> > A couple of general question about ABI. If they were already answered
> > just refer me to the previous discussion.
> >
> > On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index a4df553..4237c27 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -293,7 +293,7 @@ kvm_run' (see below).
> >>  4.11 KVM_GET_REGS
> >>
> >>  Capability: basic
> >> -Architectures: all
> >> +Architectures: all except ARM
> >>  Type: vcpu ioctl
> >>  Parameters: struct kvm_regs (out)
> >>  Returns: 0 on success, -1 on error
> >> @@ -314,7 +314,7 @@ struct kvm_regs {
> >>  4.12 KVM_SET_REGS
> >>
> >>  Capability: basic
> >> -Architectures: all
> >> +Architectures: all except ARM
> >>  Type: vcpu ioctl
> >>  Parameters: struct kvm_regs (in)
> >>  Returns: 0 on success, -1 on error
> >> @@ -600,7 +600,7 @@ struct kvm_fpu {
> >>  4.24 KVM_CREATE_IRQCHIP
> > Why KVM_GET_REGS/KVM_SET_REGS are not usable for arm?
> >
> 
> We use the ONE_REG API instead and we don't want to support two
> separate APIs to user space.
> 
I suppose fetching all registers is not anywhere on a fast path in
userspace :)

> >>
> >>  Capability: KVM_CAP_IRQCHIP
> >> -Architectures: x86, ia64
> >> +Architectures: x86, ia64, ARM
> >>  Type: vm ioctl
> >>  Parameters: none
> >>  Returns: 0 on success, -1 on error
> >> @@ -608,7 +608,8 @@ Returns: 0 on success, -1 on error
> >>  Creates an interrupt controller model in the kernel.  On x86, creates a virtual
> >>  ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
> >>  local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
> >> -only go to the IOAPIC.  On ia64, a IOSAPIC is created.
> >> +only go to the IOAPIC.  On ia64, a IOSAPIC is created. On ARM, a GIC is
> >> +created.
> >>
> >>
> >>  4.25 KVM_IRQ_LINE
> >> @@ -1775,6 +1776,14 @@ registers, find a list below:
> >>    PPC   | KVM_REG_PPC_VPA_DTL   | 128
> >>    PPC   | KVM_REG_PPC_EPCR   | 32
> >>
> >> +ARM registers are mapped using the lower 32 bits.  The upper 16 of that
> >> +is the register group type, or coprocessor number:
> >> +
> >> +ARM core registers have the following id bit patterns:
> >> +  0x4002 0000 0010 <index into the kvm_regs struct:16>
> >> +
> >> +
> >> +
> >>  4.69 KVM_GET_ONE_REG
> >>
> >>  Capability: KVM_CAP_ONE_REG
> >> @@ -2127,6 +2136,46 @@ written, then `n_invalid' invalid entries, invalidating any previously
> >>  valid entries found.
> >>
> >>
> >> +4.77 KVM_ARM_VCPU_INIT
> >> +
> >> +Capability: basic
> >> +Architectures: arm
> >> +Type: vcpu ioctl
> >> +Parameters: struct kvm_vcpu_init (in)
> >> +Returns: 0 on success; -1 on error
> >> +Errors:
> >> +  EINVAL:    the target is unknown, or the combination of features is invalid.
> >> +  ENOENT:    a features bit specified is unknown.
> >> +
> >> +This tells KVM what type of CPU to present to the guest, and what
> >> +optional features it should have.  This will cause a reset of the cpu
> >> +registers to their initial values.  If this is not called, KVM_RUN will
> >> +return ENOEXEC for that vcpu.
> >> +
> > Can different vcpus of the same VM be of different type?
> >
> 
> In the future yes. For example, if we ever want to virtualize a
> big.LITTLE system.
> 
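Getting back to KVM_ARM_VCPU_INIT itself: for illustration, the expected
userspace sequence would be roughly the sketch below (assuming the usual
<linux/kvm.h> definitions, a Cortex-A15 target, no optional features, and
with error handling omitted):

        struct kvm_vcpu_init init;

        memset(&init, 0, sizeof(init));
        init.target = KVM_ARM_TARGET_CORTEX_A15;
        /* features[] left zeroed: no optional features requested */

        if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) < 0)
                perror("KVM_ARM_VCPU_INIT");    /* EINVAL or ENOENT, see above */

        /* only after this call does KVM_RUN stop returning ENOEXEC */
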
> >> +Note that because some registers reflect machine topology, all vcpus
> >> +should be created before this ioctl is invoked.
> > How is cpu hot plug supposed to work?
> >
> 
> Those CPUs would be added from the beginning, but not powered on, and
> would be powered on later on, I suppose.  See
> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004617.html.
> 
When we suggested a similar "hot plug" scheme for x86, people started screaming:
how are they supposed to know, when they create a VM, how many vcpus they will
need in the future? In short, the people asking for hot plug (on x86
at least) do not like such a solution.

> 
> >> +
> >> +
> >> +4.78 KVM_GET_REG_LIST
> >> +
> >> +Capability: basic
> >> +Architectures: arm
> >> +Type: vcpu ioctl
> >> +Parameters: struct kvm_reg_list (in/out)
> >> +Returns: 0 on success; -1 on error
> >> +Errors:
> >> +  E2BIG:     the reg index list is too big to fit in the array specified by
> >> +             the user (the number required will be written into n).
> >> +
> >> +struct kvm_reg_list {
> >> +     __u64 n; /* number of registers in reg[] */
> >> +     __u64 reg[0];
> >> +};
> >> +
> >> +This ioctl returns the guest registers that are supported for the
> >> +KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
> >> +
> >> +
> > Doesn't userspace know what registers are supported by each CPU type?
> >
> It would know about core registers, but there is a huge space of
> co-processors, and we don't emulate all of them or support
> getting/setting all of them yet. Surely this is something that will
> change over time and we want user space to be able to discover the
> available registers for backwards compatibility, migration, etc.
> 
> -Christoffer
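
For what it's worth, the intended usage pattern from userspace looks roughly
like this (a sketch only, assuming <linux/kvm.h> and an open vcpu fd, with
error checking omitted):

        struct kvm_reg_list probe = { .n = 0 }, *list;

        /* A call with n too small fails with E2BIG and writes the required
         * count back into n, so probe first... */
        ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe);

        /* ...then allocate and fetch the real list. */
        list = malloc(sizeof(*list) + probe.n * sizeof(__u64));
        list->n = probe.n;
        ioctl(vcpu_fd, KVM_GET_REG_LIST, list);

        /* each list->reg[i] is an id usable with KVM_GET_ONE_REG/KVM_SET_ONE_REG */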

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-15 13:29       ` Marc Zyngier
@ 2013-01-15 13:34         ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15 13:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti,
	Rusty Russell

On Tue, Jan 15, 2013 at 01:29:40PM +0000, Marc Zyngier wrote:
> On 15/01/13 13:18, Gleb Natapov wrote:
> > On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
> >> When the guest accesses I/O memory this will create data abort
> >> exceptions and they are handled by decoding the HSR information
> >> (physical address, read/write, length, register) and forwarding reads
> >> and writes to QEMU which performs the device emulation.
> >>
> >> Certain classes of load/store operations do not support the syndrome
> >> information provided in the HSR and we therefore must be able to fetch
> >> the offending instruction from guest memory and decode it manually.
> >>
> >> We only support instruction decoding for valid reasonable MMIO operations
> >> where trapping them do not provide sufficient information in the HSR (no
> >> 16-bit Thumb instructions provide register writeback that we care about).
> >>
> >> The following instruction types are NOT supported for MMIO operations
> >> despite the HSR not containing decode info:
> >>  - any Load/Store multiple
> >>  - any load/store exclusive
> >>  - any load/store dual
> >>  - anything with the PC as the dest register
> >>
> >> This requires changing the general flow somewhat since new calls to run
> >> the VCPU must check if there's a pending MMIO load and perform the write
> >> after userspace has made the data available.
> >>
> >> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
> >> (1) Guest complicated mmio instruction traps.
> >> (2) The hardware doesn't tell us enough, so we need to read the actual
> >>     instruction which was being executed.
> >> (3) KVM maps the instruction virtual address to a physical address.
> >> (4) The guest (SMP) swaps out that page, and fills it with something else.
> >> (5) We read the physical address, but now that's the wrong thing.
> > How can this happen?! The guest cannot reuse physical page before it
> > flushes it from all vcpus tlb cache. For that it needs to send
> > synchronous IPI to all vcpus and IPI will not be processed by a vcpu
> > while it does emulation.
> 
> I don't know how this works on x86, but a KVM/ARM guest can definitely
> handle an IPI.
> 
How can a vcpu handle an IPI while it is not in a guest mode?

> Furthermore, TLB invalidation doesn't require an IPI on ARMv7 (unless
> we're doing some set/way operation which is handled separately).
> 
What prevents a page to be swapped out while code is fetched from it?
 
--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-15 13:32         ` Gleb Natapov
@ 2013-01-15 13:43           ` Alexander Graf
  -1 siblings, 0 replies; 160+ messages in thread
From: Alexander Graf @ 2013-01-15 13:43 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christoffer Dall, Marcelo Tosatti, kvm, kvmarm, linux-arm-kernel


On 15.01.2013, at 14:32, Gleb Natapov wrote:

> On Mon, Jan 14, 2013 at 05:17:31PM -0500, Christoffer Dall wrote:
>> On Mon, Jan 14, 2013 at 1:49 PM, Gleb Natapov <gleb@redhat.com> wrote:
>>> A couple of general question about ABI. If they were already answered
>>> just refer me to the previous discussion.
>>> 
>>> On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index a4df553..4237c27 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -293,7 +293,7 @@ kvm_run' (see below).
>>>> 4.11 KVM_GET_REGS
>>>> 
>>>> Capability: basic
>>>> -Architectures: all
>>>> +Architectures: all except ARM
>>>> Type: vcpu ioctl
>>>> Parameters: struct kvm_regs (out)
>>>> Returns: 0 on success, -1 on error
>>>> @@ -314,7 +314,7 @@ struct kvm_regs {
>>>> 4.12 KVM_SET_REGS
>>>> 
>>>> Capability: basic
>>>> -Architectures: all
>>>> +Architectures: all except ARM
>>>> Type: vcpu ioctl
>>>> Parameters: struct kvm_regs (in)
>>>> Returns: 0 on success, -1 on error
>>>> @@ -600,7 +600,7 @@ struct kvm_fpu {
>>>> 4.24 KVM_CREATE_IRQCHIP
>>> Why KVM_GET_REGS/KVM_SET_REGS are not usable for arm?
>>> 
>> 
>> We use the ONE_REG API instead and we don't want to support two
>> separate APIs to user space.
>> 
> I suppose fetching all registers is not anywhere on a fast path in
> userspace :)

If it's ever going to be in a fast path, we will add MULTI_REG which will feature an array of ONE_REG structs to fetch multiple registers at once.


Alex


^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-15 13:34         ` Gleb Natapov
@ 2013-01-15 13:46           ` Marc Zyngier
  -1 siblings, 0 replies; 160+ messages in thread
From: Marc Zyngier @ 2013-01-15 13:46 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti,
	Rusty Russell

On 15/01/13 13:34, Gleb Natapov wrote:
> On Tue, Jan 15, 2013 at 01:29:40PM +0000, Marc Zyngier wrote:
>> On 15/01/13 13:18, Gleb Natapov wrote:
>>> On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
>>>> When the guest accesses I/O memory this will create data abort
>>>> exceptions and they are handled by decoding the HSR information
>>>> (physical address, read/write, length, register) and forwarding reads
>>>> and writes to QEMU which performs the device emulation.
>>>>
>>>> Certain classes of load/store operations do not support the syndrome
>>>> information provided in the HSR and we therefore must be able to fetch
>>>> the offending instruction from guest memory and decode it manually.
>>>>
>>>> We only support instruction decoding for valid reasonable MMIO operations
>>>> where trapping them do not provide sufficient information in the HSR (no
>>>> 16-bit Thumb instructions provide register writeback that we care about).
>>>>
>>>> The following instruction types are NOT supported for MMIO operations
>>>> despite the HSR not containing decode info:
>>>>  - any Load/Store multiple
>>>>  - any load/store exclusive
>>>>  - any load/store dual
>>>>  - anything with the PC as the dest register
>>>>
>>>> This requires changing the general flow somewhat since new calls to run
>>>> the VCPU must check if there's a pending MMIO load and perform the write
>>>> after userspace has made the data available.
>>>>
>>>> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
>>>> (1) Guest complicated mmio instruction traps.
>>>> (2) The hardware doesn't tell us enough, so we need to read the actual
>>>>     instruction which was being executed.
>>>> (3) KVM maps the instruction virtual address to a physical address.
>>>> (4) The guest (SMP) swaps out that page, and fills it with something else.
>>>> (5) We read the physical address, but now that's the wrong thing.
>>> How can this happen?! The guest cannot reuse physical page before it
>>> flushes it from all vcpus tlb cache. For that it needs to send
>>> synchronous IPI to all vcpus and IPI will not be processed by a vcpu
>>> while it does emulation.
>>
>> I don't know how this works on x86, but a KVM/ARM guest can definitely
>> handle an IPI.
>>
> How can a vcpu handle an IPI while it is not in a guest mode?

I think there is some misunderstanding. A guest IPI is of course handled
while running the guest. You completely lost me here.

>> Furthermore, TLB invalidation doesn't require an IPI on ARMv7 (unless
>> we're doing some set/way operation which is handled separately).
>>
> What prevents a page to be swapped out while code is fetched from it?

Why would you prevent it? TLB invalidation is broadcast by the HW. If
you swap a page out, you flag the entry as invalid and invalidate the
corresponding TLB. If you hit it, you swap the page back in.
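
For reference, the broadcast invalidate is just another CP15 operation; a
rough sketch, ignoring the ASID/VMID details and using the Inner Shareable
encoding:

        /* TLBIMVAIS: invalidate the TLB entry for one MVA in the Inner
         * Shareable domain.  The hardware broadcasts this to the other
         * cores, so no IPI is needed for the shootdown. */
        static inline void tlb_inv_mva_is(unsigned long mva)
        {
                asm volatile("mcr p15, 0, %0, c8, c3, 1" : : "r" (mva) : "memory");
                asm volatile("dsb" : : : "memory");
                asm volatile("isb" : : : "memory");
        }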

	M.
-- 
Jazz is not dead. It just smells funny...


^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2013-01-15 12:52         ` Gleb Natapov
@ 2013-01-15 14:04           ` Peter Maydell
  -1 siblings, 0 replies; 160+ messages in thread
From: Peter Maydell @ 2013-01-15 14:04 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christoffer Dall, Marcelo Tosatti, linux-arm-kernel, kvm, kvmarm

On 15 January 2013 12:52, Gleb Natapov <gleb@redhat.com> wrote:
> On Tue, Jan 15, 2013 at 12:15:01PM +0000, Peter Maydell wrote:
>> On 15 January 2013 09:56, Gleb Natapov <gleb@redhat.com> wrote:
>> >> ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
>> > CPU level interrupt should use KVM_INTERRUPT instead.
>>
>> No, that would be wrong. KVM_INTERRUPT is for interrupts which must be
>> delivered synchronously to the CPU. KVM_IRQ_LINE is for interrupts which
>> can be fed to the kernel asynchronously. It happens that on x86 "must be
>> delivered synchronously" and "not going to in kernel irqchip" are the same, but
>> this isn't true for other archs. For ARM all our interrupts can be fed
>> to the kernel asynchronously, and so we use KVM_IRQ_LINE in all
>> cases.

> I do not quite understand what you mean by synchronously and
> asynchronously.

Synchronously: the vcpu has to be stopped and userspace then
feeds in the interrupt to be taken when the guest is resumed.
Asynchronously: any old thread can tell the kernel there's an
interrupt, and the guest vcpu then deals with it when needed
(the vcpu thread may leave the guest but doesn't come out of
the host kernel to qemu).
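
Concretely, on ARM any userspace thread can do something like the following
at any time (a sketch; the exact encoding of the irq field for ARM is defined
by the KVM_IRQ_LINE documentation in this series, so treat the value below as
a placeholder):

        struct kvm_irq_level irq = {
                .irq   = irq_id,        /* placeholder: ARM packs type/vcpu/irq in here */
                .level = 1,             /* assert the line; call again with 0 to deassert */
        };

        ioctl(vm_fd, KVM_IRQ_LINE, &irq);       /* vm ioctl, no need to stop the vcpu */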

> The difference between KVM_INTERRUPT and KVM_IRQ_LINE
> is that the former is used when the destination cpu is known to userspace;
> the latter is used when kernel code is involved in figuring out the destination.

This doesn't match up with Avi's explanation at all.

> The
> injections themselves are currently synchronous for both of them on x86
> and ARM. i.e vcpu is kicked out from guest mode when interrupt need to
> be injected into a guest and vcpu state is changed to inject interrupt
>> during the next guest entry. In the near future x86 will be able to inject
>> an interrupt without kicking the vcpu out of guest mode. Does ARM plan to
> do the same? For GIC interrupts or for IRQ/FIQ or for both?
>
>> There was a big discussion thread about this on kvm and qemu-devel last
>> July (and we cleaned up some of the QEMU code to not smoosh together
>> all these different concepts under "do I have an irqchip or not?").
> Do you have a pointer?

  http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02460.html
and there was a later longer (but less clear) thread which included
this mail from Avi:
  http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02872.html
basically explaining that the reason for the weird synchronous
KVM_INTERRUPT API is that it's emulating a weird synchronous
hardware interface which is specific to x86. ARM doesn't have
a synchronous interface in the same way, so it's much more
straightforward to use KVM_IRQ_LINE.

-- PMM

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-15 13:46           ` Marc Zyngier
@ 2013-01-15 14:27             ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15 14:27 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti,
	Rusty Russell

On Tue, Jan 15, 2013 at 01:46:04PM +0000, Marc Zyngier wrote:
> On 15/01/13 13:34, Gleb Natapov wrote:
> > On Tue, Jan 15, 2013 at 01:29:40PM +0000, Marc Zyngier wrote:
> >> On 15/01/13 13:18, Gleb Natapov wrote:
> >>> On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
> >>>> When the guest accesses I/O memory this will create data abort
> >>>> exceptions and they are handled by decoding the HSR information
> >>>> (physical address, read/write, length, register) and forwarding reads
> >>>> and writes to QEMU which performs the device emulation.
> >>>>
> >>>> Certain classes of load/store operations do not support the syndrome
> >>>> information provided in the HSR and we therefore must be able to fetch
> >>>> the offending instruction from guest memory and decode it manually.
> >>>>
> >>>> We only support instruction decoding for valid reasonable MMIO operations
> >>>> where trapping them do not provide sufficient information in the HSR (no
> >>>> 16-bit Thumb instructions provide register writeback that we care about).
> >>>>
> >>>> The following instruction types are NOT supported for MMIO operations
> >>>> despite the HSR not containing decode info:
> >>>>  - any Load/Store multiple
> >>>>  - any load/store exclusive
> >>>>  - any load/store dual
> >>>>  - anything with the PC as the dest register
> >>>>
> >>>> This requires changing the general flow somewhat since new calls to run
> >>>> the VCPU must check if there's a pending MMIO load and perform the write
> >>>> after userspace has made the data available.
> >>>>
> >>>> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
> >>>> (1) Guest complicated mmio instruction traps.
> >>>> (2) The hardware doesn't tell us enough, so we need to read the actual
> >>>>     instruction which was being executed.
> >>>> (3) KVM maps the instruction virtual address to a physical address.
> >>>> (4) The guest (SMP) swaps out that page, and fills it with something else.
> >>>> (5) We read the physical address, but now that's the wrong thing.
> >>> How can this happen?! The guest cannot reuse physical page before it
> >>> flushes it from all vcpus tlb cache. For that it needs to send
> >>> synchronous IPI to all vcpus and IPI will not be processed by a vcpu
> >>> while it does emulation.
> >>
> >> I don't know how this works on x86, but a KVM/ARM guest can definitely
> >> handle an IPI.
> >>
> > How can a vcpu handle an IPI while it is not in a guest mode?
> 
> I think there is some misunderstanding. A guest IPI is of course handled
> while running the guest. You completely lost me here.
You need IPI from one guest vcpu to another to invalidate its TLB on
x86. That prevents the race from happening there.

> 
> >> Furthermore, TLB invalidation doesn't require an IPI on ARMv7 (unless
> >> we're doing some set/way operation which is handled separately).
> >>
> > What prevents a page to be swapped out while code is fetched from it?
> 
> Why would you prevent it? TLB invalidation is broadcast by the HW. If
> you swap a page out, you flag the entry as invalid and invalidate the
> corresponding TLB. If you hit it, you swap the page back in.
> 
There is no IPI (or anything that requires response from cpu whose TLB
is invalidated) involved in invalidating remote TLB?


--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2013-01-15 14:04           ` Peter Maydell
@ 2013-01-15 14:40             ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-15 14:40 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Gleb Natapov, Marcelo Tosatti, linux-arm-kernel, kvm, kvmarm

On Tue, Jan 15, 2013 at 9:04 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 15 January 2013 12:52, Gleb Natapov <gleb@redhat.com> wrote:
>> On Tue, Jan 15, 2013 at 12:15:01PM +0000, Peter Maydell wrote:
>>> On 15 January 2013 09:56, Gleb Natapov <gleb@redhat.com> wrote:
>>> >> ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
>>> > CPU level interrupt should use KVM_INTERRUPT instead.
>>>
>>> No, that would be wrong. KVM_INTERRUPT is for interrupts which must be
>>> delivered synchronously to the CPU. KVM_IRQ_LINE is for interrupts which
>>> can be fed to the kernel asynchronously. It happens that on x86 "must be
>>> delivered synchronously" and "not going to in kernel irqchip" are the same, but
>>> this isn't true for other archs. For ARM all our interrupts can be fed
>>> to the kernel asynchronously, and so we use KVM_IRQ_LINE in all
>>> cases.
>
> >> I do not quite understand what you mean by synchronously and
>> asynchronously.
>
> Synchronously: the vcpu has to be stopped and userspace then
> feeds in the interrupt to be taken when the guest is resumed.
> Asynchronously: any old thread can tell the kernel there's an
> interrupt, and the guest vcpu then deals with it when needed
> (the vcpu thread may leave the guest but doesn't come out of
> the host kernel to qemu).
>
> >> The difference between KVM_INTERRUPT and KVM_IRQ_LINE
> >> is that the former is used when the destination cpu is known to userspace;
> >> the latter is used when kernel code is involved in figuring out the destination.
>
> This doesn't match up with Avi's explanation at all.
>
>> The
>> injections themselves are currently synchronous for both of them on x86
>> and ARM. i.e vcpu is kicked out from guest mode when interrupt need to
>> be injected into a guest and vcpu state is changed to inject interrupt
> >> during the next guest entry. In the near future x86 will be able to inject
> >> an interrupt without kicking the vcpu out of guest mode. Does ARM plan to
>> do the same? For GIC interrupts or for IRQ/FIQ or for both?
>>
>>> There was a big discussion thread about this on kvm and qemu-devel last
>>> July (and we cleaned up some of the QEMU code to not smoosh together
>>> all these different concepts under "do I have an irqchip or not?").
>> Do you have a pointer?
>
>   http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02460.html
> and there was a later longer (but less clear) thread which included
> this mail from Avi:
>   http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02872.html
> basically explaining that the reason for the weird synchronous
> KVM_INTERRUPT API is that it's emulating a weird synchronous
> hardware interface which is specific to x86. ARM doesn't have
> a synchronous interface in the same way, so it's much more
> straightforward to use KVM_IRQ_LINE.
>
Also, this code has been reviewed numerous times by the KVM community
and as Peter points out has also been discussed in detail.

Could we please not change this API at the last second?

-Christoffer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-15 14:27             ` Gleb Natapov
@ 2013-01-15 14:42               ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-15 14:42 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marc Zyngier, kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti,
	Rusty Russell

On Tue, Jan 15, 2013 at 9:27 AM, Gleb Natapov <gleb@redhat.com> wrote:
> On Tue, Jan 15, 2013 at 01:46:04PM +0000, Marc Zyngier wrote:
>> On 15/01/13 13:34, Gleb Natapov wrote:
>> > On Tue, Jan 15, 2013 at 01:29:40PM +0000, Marc Zyngier wrote:
>> >> On 15/01/13 13:18, Gleb Natapov wrote:
>> >>> On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
>> >>>> When the guest accesses I/O memory this will create data abort
>> >>>> exceptions and they are handled by decoding the HSR information
>> >>>> (physical address, read/write, length, register) and forwarding reads
>> >>>> and writes to QEMU which performs the device emulation.
>> >>>>
>> >>>> Certain classes of load/store operations do not support the syndrome
>> >>>> information provided in the HSR and we therefore must be able to fetch
>> >>>> the offending instruction from guest memory and decode it manually.
>> >>>>
>> >>>> We only support instruction decoding for valid reasonable MMIO operations
>> >>>> where trapping them do not provide sufficient information in the HSR (no
>> >>>> 16-bit Thumb instructions provide register writeback that we care about).
>> >>>>
>> >>>> The following instruction types are NOT supported for MMIO operations
>> >>>> despite the HSR not containing decode info:
>> >>>>  - any Load/Store multiple
>> >>>>  - any load/store exclusive
>> >>>>  - any load/store dual
>> >>>>  - anything with the PC as the dest register
>> >>>>
>> >>>> This requires changing the general flow somewhat since new calls to run
>> >>>> the VCPU must check if there's a pending MMIO load and perform the write
>> >>>> after userspace has made the data available.
>> >>>>
>> >>>> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
>> >>>> (1) Guest complicated mmio instruction traps.
>> >>>> (2) The hardware doesn't tell us enough, so we need to read the actual
>> >>>>     instruction which was being exectuted.
>> >>>> (3) KVM maps the instruction virtual address to a physical address.
>> >>>> (4) The guest (SMP) swaps out that page, and fills it with something else.
>> >>>> (5) We read the physical address, but now that's the wrong thing.
>> >>> How can this happen?! The guest cannot reuse physical page before it
>> >>> flushes it from all vcpus tlb cache. For that it needs to send
>> >>> synchronous IPI to all vcpus and IPI will not be processed by a vcpu
>> >>> while it does emulation.
>> >>
>> >> I don't know how this works on x86, but a KVM/ARM guest can definitely
>> >> handle an IPI.
>> >>
>> > How can a vcpu handle an IPI while it is not in a guest mode?
>>
>> I think there is some misunderstanding. A guest IPI is of course handled
>> while running the guest. You completely lost me here.
> You need IPI from one guest vcpu to another to invalidate its TLB on
> x86. That prevents the race from happening there.
>
>>
>> >> Furthermore, TLB invalidation doesn't require an IPI on ARMv7 (unless
>> >> we're doing some set/way operation which is handled separately).
>> >>
>> > What prevents a page to be swapped out while code is fetched from it?
>>
>> Why would you prevent it? TLB invalidation is broadcast by the HW. If
>> you swap a page out, you flag the entry as invalid and invalidate the
>> corresponding TLB. If you hit it, you swap the page back in.
>>
> There is no IPI (or anything that requires response from cpu whose TLB
> is invalidated) involved in invalidating remote TLB?
>
>
No, there isn't; the hardware broadcasts the TLB invalidate operation.

-Christoffer
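
As a concrete illustration of that point, a minimal sketch (mine, not from
the patches), assuming an ARMv7-A SMP host with the multiprocessing
extensions: one inner-shareable TLB invalidate plus a DSB reaches every
core, with no IPI anywhere.

/* Hedged sketch: drop the TLB entry for 'mva' on every core in the
 * inner-shareable domain.  The invalidation message is broadcast by the
 * hardware; the DSB simply waits until it has completed everywhere.
 */
static inline void tlb_invalidate_page_broadcast(unsigned long mva)
{
	/* TLBIMVAAIS: invalidate by MVA, all ASIDs, Inner Shareable */
	asm volatile("mcr p15, 0, %0, c8, c3, 3"
		     : : "r" (mva & ~0xfffUL) : "memory");
	/* Wait until every core has dropped the stale entry */
	asm volatile("dsb" : : : "memory");
	asm volatile("isb" : : : "memory");
}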

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-15 14:27             ` Gleb Natapov
@ 2013-01-15 14:48               ` Marc Zyngier
  -1 siblings, 0 replies; 160+ messages in thread
From: Marc Zyngier @ 2013-01-15 14:48 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti,
	Rusty Russell

On 15/01/13 14:27, Gleb Natapov wrote:
> On Tue, Jan 15, 2013 at 01:46:04PM +0000, Marc Zyngier wrote:
>> On 15/01/13 13:34, Gleb Natapov wrote:
>>> On Tue, Jan 15, 2013 at 01:29:40PM +0000, Marc Zyngier wrote:
>>>> On 15/01/13 13:18, Gleb Natapov wrote:
>>>>> On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
>>>>>> When the guest accesses I/O memory this will create data abort
>>>>>> exceptions and they are handled by decoding the HSR information
>>>>>> (physical address, read/write, length, register) and forwarding reads
>>>>>> and writes to QEMU which performs the device emulation.
>>>>>>
>>>>>> Certain classes of load/store operations do not support the syndrome
>>>>>> information provided in the HSR and we therefore must be able to fetch
>>>>>> the offending instruction from guest memory and decode it manually.
>>>>>>
>>>>>> We only support instruction decoding for valid reasonable MMIO operations
>>>>>> where trapping them do not provide sufficient information in the HSR (no
>>>>>> 16-bit Thumb instructions provide register writeback that we care about).
>>>>>>
>>>>>> The following instruction types are NOT supported for MMIO operations
>>>>>> despite the HSR not containing decode info:
>>>>>>  - any Load/Store multiple
>>>>>>  - any load/store exclusive
>>>>>>  - any load/store dual
>>>>>>  - anything with the PC as the dest register
>>>>>>
>>>>>> This requires changing the general flow somewhat since new calls to run
>>>>>> the VCPU must check if there's a pending MMIO load and perform the write
>>>>>> after userspace has made the data available.
>>>>>>
>>>>>> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
>>>>>> (1) Guest complicated mmio instruction traps.
>>>>>> (2) The hardware doesn't tell us enough, so we need to read the actual
>>>>>>     instruction which was being exectuted.
>>>>>> (3) KVM maps the instruction virtual address to a physical address.
>>>>>> (4) The guest (SMP) swaps out that page, and fills it with something else.
>>>>>> (5) We read the physical address, but now that's the wrong thing.
>>>>> How can this happen?! The guest cannot reuse physical page before it
>>>>> flushes it from all vcpus tlb cache. For that it needs to send
>>>>> synchronous IPI to all vcpus and IPI will not be processed by a vcpu
>>>>> while it does emulation.
>>>>
>>>> I don't know how this works on x86, but a KVM/ARM guest can definitely
>>>> handle an IPI.
>>>>
>>> How can a vcpu handle an IPI while it is not in a guest mode?
>>
>> I think there is some misunderstanding. A guest IPI is of course handled
>> while running the guest. You completely lost me here.
> You need IPI from one guest vcpu to another to invalidate its TLB on
> x86. That prevents the race from happening there.

We don't need this on ARM (starting with v7, v6 is an entirely different
story, and we do not support KVM on v6).

TLB invalidation is propagated by the HW using the following (pseudocode) sequence:
	tlb_invalidate VA
	barrier

Completing the barrier guarantees that all TLB invalidations have been
propagated.

>>
>>>> Furthermore, TLB invalidation doesn't require an IPI on ARMv7 (unless
>>>> we're doing some set/way operation which is handled separately).
>>>>
>>> What prevents a page to be swapped out while code is fetched from it?
>>
>> Why would you prevent it? TLB invalidation is broadcast by the HW. If
>> you swap a page out, you flag the entry as invalid and invalidate the
>> corresponding TLB. If you hit it, you swap the page back in.
>>
> There is no IPI (or anything that requires response from cpu whose TLB
> is invalidated) involved in invalidating remote TLB?

No. The above sequence is all you have to do.

This is why the above race is a bit hairy. A vcpu will happily
invalidate TLBs, but as the faulting vcpu already performed the
translation, we're screwed.

Thankfully, this is a case that only matters when we have to emulate an
MMIO operation that is not automatically decoded by the HW. They are
rare (the Linux kernel doesn't use them). In this case, we stop the
world (IPI).

	M.
-- 
Jazz is not dead. It just smells funny...
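
A purely illustrative sketch of the stop-the-world ordering Marc describes;
every helper below (pause_all_vcpus, resume_all_vcpus, guest_va_to_ipa,
read_guest_ipa, decode_load_store) is a hypothetical stand-in rather than
the series' real API, and Thumb's 16/32-bit split is ignored for brevity.

/* Hedged, illustrative-only sketch of fetching and decoding a faulting
 * MMIO instruction whose syndrome carries no decode info.  All helpers
 * are hypothetical stubs, not the kernel's real API.  The point is only
 * the ordering: the instruction is read while the other vcpus are held
 * off, so the guest cannot recycle the page between fault and fetch.
 */
struct mmio_decode { int is_write, len, rt; };

extern void pause_all_vcpus(void);      /* IPI the other vcpus and wait   */
extern void resume_all_vcpus(void);
extern int  guest_va_to_ipa(unsigned long va, unsigned long *ipa);
extern int  read_guest_ipa(unsigned long ipa, void *buf, int len);
extern int  decode_load_store(unsigned int insn, struct mmio_decode *out);

static int fetch_and_decode_faulting_insn(unsigned long guest_pc,
					  struct mmio_decode *out)
{
	unsigned long ipa;
	unsigned int insn;
	int ret = -1;

	pause_all_vcpus();              /* no vcpu can now unmap or reuse the page */

	if (guest_va_to_ipa(guest_pc, &ipa) == 0 &&
	    read_guest_ipa(ipa, &insn, sizeof(insn)) == 0)
		ret = decode_load_store(insn, out);

	resume_all_vcpus();
	return ret;                     /* < 0: give up and retry or inject a fault */
}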


^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2013-01-15 14:04           ` Peter Maydell
@ 2013-01-15 15:17             ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15 15:17 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Christoffer Dall, Marcelo Tosatti, linux-arm-kernel, kvm, kvmarm

On Tue, Jan 15, 2013 at 02:04:47PM +0000, Peter Maydell wrote:
> On 15 January 2013 12:52, Gleb Natapov <gleb@redhat.com> wrote:
> > On Tue, Jan 15, 2013 at 12:15:01PM +0000, Peter Maydell wrote:
> >> On 15 January 2013 09:56, Gleb Natapov <gleb@redhat.com> wrote:
> >> >> ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
> >> > CPU level interrupt should use KVM_INTERRUPT instead.
> >>
> >> No, that would be wrong. KVM_INTERRUPT is for interrupts which must be
> >> delivered synchronously to the CPU. KVM_IRQ_LINE is for interrupts which
> >> can be fed to the kernel asynchronously. It happens that on x86 "must be
> >> delivered synchronously" and "not going to in kernel irqchip" are the same, but
> >> this isn't true for other archs. For ARM all our interrupts can be fed
> >> to the kernel asynchronously, and so we use KVM_IRQ_LINE in all
> >> cases.
> 
> > I do no quite understand what you mean by synchronously and
> > asynchronously.
> 
> Synchronously: the vcpu has to be stopped and userspace then
> feeds in the interrupt to be taken when the guest is resumed.
> Asynchronously: any old thread can tell the kernel there's an
> interrupt, and the guest vcpu then deals with it when needed
> (the vcpu thread may leave the guest but doesn't come out of
> the host kernel to qemu).
> 
> > The difference between KVM_INTERRUPT and KVM_IRQ_LINE line
> > is that former is used when destination cpu is known to userspace later
> > is used when kernel code is involved in figuring out the destination.
> 
> This doesn't match up with Avi's explanation at all.
> 
> > The
> > injections themselves are currently synchronous for both of them on x86
> > and ARM. i.e vcpu is kicked out from guest mode when interrupt need to
> > be injected into a guest and vcpu state is changed to inject interrupt
> > during next guest entry. In the near feature x86 will be able to inject
> > interrupt without kicking vcpu out from the guest mode does ARM plan to
> > do the same? For GIC interrupts or for IRQ/FIQ or for both?
> >
> >> There was a big discussion thread about this on kvm and qemu-devel last
> >> July (and we cleaned up some of the QEMU code to not smoosh together
> >> all these different concepts under "do I have an irqchip or not?").
> > Do you have a pointer?
> 
>   http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02460.html
> and there was a later longer (but less clear) thread which included
> this mail from Avi:
>   http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02872.html
> basically explaining that the reason for the weird synchronous
> KVM_INTERRUPT API is that it's emulating a weird synchronous
> hardware interface which is specific to x86. ARM doesn't have
> a synchronous interface in the same way, so it's much more
> straightforward to use KVM_IRQ_LINE.
> 
OK, I see. So basically Avi saw KVM_INTERRUPT as an oddball interface
required only for APIC emulation in userspace. It is also used for the
PIC, where it is not strictly needed, but that is for historical reasons
(KVM_IRQ_LINE was introduced late and is GSI-centric on x86).

Thank you for the pointer.

--
			Gleb.
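
For contrast with KVM_IRQ_LINE, a hedged sketch of the synchronous x86 flow
discussed here: KVM_INTERRUPT is a vcpu ioctl that userspace may only issue
once the vcpu has exited and kvm_run reports it can take an interrupt;
otherwise it asks for an interrupt-window exit. The control flow is
simplified for illustration.

/* Hedged sketch of the synchronous x86 userspace-irqchip flow.
 * KVM_INTERRUPT is a vcpu ioctl and is only legal while the vcpu is out
 * of guest mode and kvm_run says it can take an interrupt right now.
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>

static int inject_if_ready(int vcpu_fd, struct kvm_run *run, __u32 vector,
			   int have_pending_irq)
{
	if (!have_pending_irq)
		return 0;

	if (!run->ready_for_interrupt_injection || !run->if_flag) {
		/* Not injectable yet: request an exit when the window opens */
		run->request_interrupt_window = 1;
		return 0;
	}

	struct kvm_interrupt intr = { .irq = vector };
	return ioctl(vcpu_fd, KVM_INTERRUPT, &intr);   /* 0 on success */
}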

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
  2013-01-15 14:48               ` Marc Zyngier
@ 2013-01-15 15:31                 ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15 15:31 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm, Marcelo Tosatti,
	Rusty Russell

On Tue, Jan 15, 2013 at 02:48:27PM +0000, Marc Zyngier wrote:
> On 15/01/13 14:27, Gleb Natapov wrote:
> > On Tue, Jan 15, 2013 at 01:46:04PM +0000, Marc Zyngier wrote:
> >> On 15/01/13 13:34, Gleb Natapov wrote:
> >>> On Tue, Jan 15, 2013 at 01:29:40PM +0000, Marc Zyngier wrote:
> >>>> On 15/01/13 13:18, Gleb Natapov wrote:
> >>>>> On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
> >>>>>> When the guest accesses I/O memory this will create data abort
> >>>>>> exceptions and they are handled by decoding the HSR information
> >>>>>> (physical address, read/write, length, register) and forwarding reads
> >>>>>> and writes to QEMU which performs the device emulation.
> >>>>>>
> >>>>>> Certain classes of load/store operations do not support the syndrome
> >>>>>> information provided in the HSR and we therefore must be able to fetch
> >>>>>> the offending instruction from guest memory and decode it manually.
> >>>>>>
> >>>>>> We only support instruction decoding for valid reasonable MMIO operations
> >>>>>> where trapping them do not provide sufficient information in the HSR (no
> >>>>>> 16-bit Thumb instructions provide register writeback that we care about).
> >>>>>>
> >>>>>> The following instruction types are NOT supported for MMIO operations
> >>>>>> despite the HSR not containing decode info:
> >>>>>>  - any Load/Store multiple
> >>>>>>  - any load/store exclusive
> >>>>>>  - any load/store dual
> >>>>>>  - anything with the PC as the dest register
> >>>>>>
> >>>>>> This requires changing the general flow somewhat since new calls to run
> >>>>>> the VCPU must check if there's a pending MMIO load and perform the write
> >>>>>> after userspace has made the data available.
> >>>>>>
> >>>>>> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
> >>>>>> (1) Guest complicated mmio instruction traps.
> >>>>>> (2) The hardware doesn't tell us enough, so we need to read the actual
> >>>>>>     instruction which was being exectuted.
> >>>>>> (3) KVM maps the instruction virtual address to a physical address.
> >>>>>> (4) The guest (SMP) swaps out that page, and fills it with something else.
> >>>>>> (5) We read the physical address, but now that's the wrong thing.
> >>>>> How can this happen?! The guest cannot reuse physical page before it
> >>>>> flushes it from all vcpus tlb cache. For that it needs to send
> >>>>> synchronous IPI to all vcpus and IPI will not be processed by a vcpu
> >>>>> while it does emulation.
> >>>>
> >>>> I don't know how this works on x86, but a KVM/ARM guest can definitely
> >>>> handle an IPI.
> >>>>
> >>> How can a vcpu handle an IPI while it is not in a guest mode?
> >>
> >> I think there is some misunderstanding. A guest IPI is of course handled
> >> while running the guest. You completely lost me here.
> > You need IPI from one guest vcpu to another to invalidate its TLB on
> > x86. That prevents the race from happening there.
> 
> We don't need this on ARM (starting with v7, v6 is an entirely different
> story, and we do not support KVM on v6).
> 
> The TLB is propagated by the HW using the following (pseudocode) sequence:
> 	tlb_invalidate VA
> 	barrier
> 
> Leaving the barrier guaranties that all TLB invalidations have been
> propagated.
> 
That explains why __get_user_pages_fast() is missing on ARM :)

> >>
> >>>> Furthermore, TLB invalidation doesn't require an IPI on ARMv7 (unless
> >>>> we're doing some set/way operation which is handled separately).
> >>>>
> >>> What prevents a page to be swapped out while code is fetched from it?
> >>
> >> Why would you prevent it? TLB invalidation is broadcast by the HW. If
> >> you swap a page out, you flag the entry as invalid and invalidate the
> >> corresponding TLB. If you hit it, you swap the page back in.
> >>
> > There is no IPI (or anything that requires response from cpu whose TLB
> > is invalidated) involved in invalidating remote TLB?
> 
> No. The above sequence is all you have to do.
> 
> This is why the above race is a bit hairy. A vcpu will happily
> invalidate TLBs, but as the faulting vcpu already performed the
> translation, we're screwed.
> 
> Thankfully, this is a case that only matters when we have to emulate an
> MMIO operation that is not automatically decoded by the HW. They are
> rare (the Linux kernel doesn't use them). In this case, we stop the
> world (IPI).
> 
Got it. Thanks.

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-15 13:43           ` Alexander Graf
@ 2013-01-15 15:35             ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-15 15:35 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-arm-kernel, Christoffer Dall, kvmarm, kvm, Marcelo Tosatti

On Tue, Jan 15, 2013 at 02:43:26PM +0100, Alexander Graf wrote:
> 
> On 15.01.2013, at 14:32, Gleb Natapov wrote:
> 
> > On Mon, Jan 14, 2013 at 05:17:31PM -0500, Christoffer Dall wrote:
> >> On Mon, Jan 14, 2013 at 1:49 PM, Gleb Natapov <gleb@redhat.com> wrote:
> >>> A couple of general question about ABI. If they were already answered
> >>> just refer me to the previous discussion.
> >>> 
> >>> On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>>> index a4df553..4237c27 100644
> >>>> --- a/Documentation/virtual/kvm/api.txt
> >>>> +++ b/Documentation/virtual/kvm/api.txt
> >>>> @@ -293,7 +293,7 @@ kvm_run' (see below).
> >>>> 4.11 KVM_GET_REGS
> >>>> 
> >>>> Capability: basic
> >>>> -Architectures: all
> >>>> +Architectures: all except ARM
> >>>> Type: vcpu ioctl
> >>>> Parameters: struct kvm_regs (out)
> >>>> Returns: 0 on success, -1 on error
> >>>> @@ -314,7 +314,7 @@ struct kvm_regs {
> >>>> 4.12 KVM_SET_REGS
> >>>> 
> >>>> Capability: basic
> >>>> -Architectures: all
> >>>> +Architectures: all except ARM
> >>>> Type: vcpu ioctl
> >>>> Parameters: struct kvm_regs (in)
> >>>> Returns: 0 on success, -1 on error
> >>>> @@ -600,7 +600,7 @@ struct kvm_fpu {
> >>>> 4.24 KVM_CREATE_IRQCHIP
> >>> Why KVM_GET_REGS/KVM_SET_REGS are not usable for arm?
> >>> 
> >> 
> >> We use the ONE_REG API instead and we don't want to support two
> >> separate APIs to user space.
> >> 
> > I suppose fetching all registers is not anywhere on a fast path in
> > userspace :)
> 
> If it's ever going to be in a fast path, we will add MULTI_REG which will feature an array of ONE_REG structs to fetch multiple registers at once.
> 
And I hope it will not be. On x86 only the broken VMware PV
interface requires reading registers for emulation.

This reminds me of a question I wanted to ask you about the ONE_REG
interface. Why is the architecture encoded in the register name? Isn't
the architecture implied by the HW the code is running on anyway?

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-15 15:35             ` Gleb Natapov
@ 2013-01-15 16:21               ` Alexander Graf
  -1 siblings, 0 replies; 160+ messages in thread
From: Alexander Graf @ 2013-01-15 16:21 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christoffer Dall, Marcelo Tosatti, kvm, kvmarm, linux-arm-kernel

On 01/15/2013 04:35 PM, Gleb Natapov wrote:
> On Tue, Jan 15, 2013 at 02:43:26PM +0100, Alexander Graf wrote:
>> On 15.01.2013, at 14:32, Gleb Natapov wrote:
>>
>>> On Mon, Jan 14, 2013 at 05:17:31PM -0500, Christoffer Dall wrote:
>>>> On Mon, Jan 14, 2013 at 1:49 PM, Gleb Natapov<gleb@redhat.com>  wrote:
>>>>> A couple of general question about ABI. If they were already answered
>>>>> just refer me to the previous discussion.
>>>>>
>>>>> On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>> index a4df553..4237c27 100644
>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>> @@ -293,7 +293,7 @@ kvm_run' (see below).
>>>>>> 4.11 KVM_GET_REGS
>>>>>>
>>>>>> Capability: basic
>>>>>> -Architectures: all
>>>>>> +Architectures: all except ARM
>>>>>> Type: vcpu ioctl
>>>>>> Parameters: struct kvm_regs (out)
>>>>>> Returns: 0 on success, -1 on error
>>>>>> @@ -314,7 +314,7 @@ struct kvm_regs {
>>>>>> 4.12 KVM_SET_REGS
>>>>>>
>>>>>> Capability: basic
>>>>>> -Architectures: all
>>>>>> +Architectures: all except ARM
>>>>>> Type: vcpu ioctl
>>>>>> Parameters: struct kvm_regs (in)
>>>>>> Returns: 0 on success, -1 on error
>>>>>> @@ -600,7 +600,7 @@ struct kvm_fpu {
>>>>>> 4.24 KVM_CREATE_IRQCHIP
>>>>> Why KVM_GET_REGS/KVM_SET_REGS are not usable for arm?
>>>>>
>>>> We use the ONE_REG API instead and we don't want to support two
>>>> separate APIs to user space.
>>>>
>>> I suppose fetching all registers is not anywhere on a fast path in
>>> userspace :)
>> If it's ever going to be in a fast path, we will add MULTI_REG which will feature an array of ONE_REG structs to fetch multiple registers at once.
>>
> And I hope it will not be. On x86 only the broken vmware PV
> interface requires reading registers for emulation.
>
> This reminds me the question I wanted to ask you about ONE_REG
> interface. Why architecture is encoded in register name? Is architecture
> implied by HW the code is running on anyway?

It provides for nice sanity checks. Also, 64 bits are quite a lot, so we 
can easily waste a few bits for redundancy.


Alex
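
As a concrete illustration of the redundancy Alex mentions, a hedged sketch
of how a ONE_REG id carries architecture and size bits that a handler can
sanity-check. KVM_REG_ARM, KVM_REG_ARCH_MASK, KVM_REG_SIZE_MASK and
KVM_REG_SIZE_U32/U64 are the generic <linux/kvm.h> encoding as I recall it;
treat the exact names as assumptions rather than a restatement of this
series.

/* Hedged sketch: a 64-bit ONE_REG id packs architecture and size bits
 * alongside the register index, so a mismatched id can be rejected
 * cheaply instead of silently aliasing another register.
 */
#include <linux/kvm.h>

/* Userspace side: build an id for a hypothetical 32-bit ARM register */
static __u64 make_arm_u32_reg_id(__u64 reg_index)
{
	return KVM_REG_ARM | KVM_REG_SIZE_U32 | reg_index;
}

/* Kernel-ish side: the sanity check the encoding makes possible */
static int looks_like_valid_arm_reg(__u64 id)
{
	if ((id & KVM_REG_ARCH_MASK) != KVM_REG_ARM)
		return 0;               /* wrong architecture field */
	if ((id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U32 &&
	    (id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
		return 0;               /* unsupported access size */
	return 1;
}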


^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2013-01-15 15:17             ` Gleb Natapov
@ 2013-01-15 16:25               ` Alexander Graf
  -1 siblings, 0 replies; 160+ messages in thread
From: Alexander Graf @ 2013-01-15 16:25 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Peter Maydell, kvmarm, kvm, linux-arm-kernel, Marcelo Tosatti

On 01/15/2013 04:17 PM, Gleb Natapov wrote:
> On Tue, Jan 15, 2013 at 02:04:47PM +0000, Peter Maydell wrote:
>> On 15 January 2013 12:52, Gleb Natapov<gleb@redhat.com>  wrote:
>>> On Tue, Jan 15, 2013 at 12:15:01PM +0000, Peter Maydell wrote:
>>>> On 15 January 2013 09:56, Gleb Natapov<gleb@redhat.com>  wrote:
>>>>>> ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
>>>>> CPU level interrupt should use KVM_INTERRUPT instead.
>>>> No, that would be wrong. KVM_INTERRUPT is for interrupts which must be
>>>> delivered synchronously to the CPU. KVM_IRQ_LINE is for interrupts which
>>>> can be fed to the kernel asynchronously. It happens that on x86 "must be
>>>> delivered synchronously" and "not going to in kernel irqchip" are the same, but
>>>> this isn't true for other archs. For ARM all our interrupts can be fed
>>>> to the kernel asynchronously, and so we use KVM_IRQ_LINE in all
>>>> cases.
>>> I do no quite understand what you mean by synchronously and
>>> asynchronously.
>> Synchronously: the vcpu has to be stopped and userspace then
>> feeds in the interrupt to be taken when the guest is resumed.
>> Asynchronously: any old thread can tell the kernel there's an
>> interrupt, and the guest vcpu then deals with it when needed
>> (the vcpu thread may leave the guest but doesn't come out of
>> the host kernel to qemu).
>>
>>> The difference between KVM_INTERRUPT and KVM_IRQ_LINE line
>>> is that former is used when destination cpu is known to userspace later
>>> is used when kernel code is involved in figuring out the destination.
>> This doesn't match up with Avi's explanation at all.
>>
>>> The
>>> injections themselves are currently synchronous for both of them on x86
>>> and ARM. i.e vcpu is kicked out from guest mode when interrupt need to
>>> be injected into a guest and vcpu state is changed to inject interrupt
>>> during next guest entry. In the near feature x86 will be able to inject
>>> interrupt without kicking vcpu out from the guest mode does ARM plan to
>>> do the same? For GIC interrupts or for IRQ/FIQ or for both?
>>>
>>>> There was a big discussion thread about this on kvm and qemu-devel last
>>>> July (and we cleaned up some of the QEMU code to not smoosh together
>>>> all these different concepts under "do I have an irqchip or not?").
>>> Do you have a pointer?
>>    http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02460.html
>> and there was a later longer (but less clear) thread which included
>> this mail from Avi:
>>    http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02872.html
>> basically explaining that the reason for the weird synchronous
>> KVM_INTERRUPT API is that it's emulating a weird synchronous
>> hardware interface which is specific to x86. ARM doesn't have
>> a synchronous interface in the same way, so it's much more
>> straightforward to use KVM_IRQ_LINE.
>>
> OK. I see. So basically Avi saw KVM_INTERRUPT as an oddball interface
> required only for APIC emulation in userspace. It is used for PIC also,
> where this is not strictly needed, but this is for historical reasons
> (KVM_IRQ_LINE was introduces late and it is GSI centric on x86).
>
> Thank you for the pointer.

Yeah, please keep in mind that KVM_INTERRUPT is not a unified interface
either. In fact, it is asynchronous on PPC :). And it's called
KVM_S390_INTERRUPT on s390, where it is also asynchronous. x86 is the
oddball here.

But I don't care whether we call the ioctl to steer CPU interrupt pins 
KVM_INTERRUPT, KVM_S390_INTERRUPT or KVM_IRQ_LINE, as long as the code 
makes it obvious what is happening.


Alex


^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-15  9:43     ` Gleb Natapov
@ 2013-01-16  2:08       ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-16  2:08 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

On Tue, Jan 15, 2013 at 4:43 AM, Gleb Natapov <gleb@redhat.com> wrote:
> On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
>> Provides complete world-switch implementation to switch to other guests
>> running in non-secure modes. Includes Hyp exception handlers that
>> capture necessary exception information and stores the information on
>> the VCPU and KVM structures.
>>
>> The following Hyp-ABI is also documented in the code:
>>
>> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
>>    Switching to Hyp mode is done through a simple HVC #0 instruction. The
>>    exception vector code will check that the HVC comes from VMID==0 and if
>>    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
>>    - r0 contains a pointer to a HYP function
>>    - r1, r2, and r3 contain arguments to the above function.
>>    - The HYP function will be called with its arguments in r0, r1 and r2.
>>    On HYP function return, we return directly to SVC.
>>
>> A call to a function executing in Hyp mode is performed like the following:
>>
>>         <svc code>
>>         ldr     r0, =BSYM(my_hyp_fn)
>>         ldr     r1, =my_param
>>         hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
>>         <svc code>
>>
>> Otherwise, the world-switch is pretty straight-forward. All state that
>> can be modified by the guest is first backed up on the Hyp stack and the
>> VCPU values is loaded onto the hardware. State, which is not loaded, but
>> theoretically modifiable by the guest is protected through the
>> virtualiation features to generate a trap and cause software emulation.
>> Upon guest returns, all state is restored from hardware onto the VCPU
>> struct and the original state is restored from the Hyp-stack onto the
>> hardware.
>>
>> SMP support using the VMPIDR calculated on the basis of the host MPIDR
>> and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
>>
>> Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
>> a separate patch into the appropriate patches introducing the
>> functionality. Note that the VMIDs are stored per VM as required by the ARM
>> architecture reference manual.
>>
>> To support VFP/NEON we trap those instructions using the HPCTR. When
>> we trap, we switch the FPU.  After a guest exit, the VFP state is
>> returned to the host.  When disabling access to floating point
>> instructions, we also mask FPEXC_EN in order to avoid the guest
>> receiving Undefined instruction exceptions before we have a chance to
>> switch back the floating point state.  We are reusing vfp_hard_struct,
>> so we depend on VFPv3 being enabled in the host kernel, if not, we still
>> trap cp10 and cp11 in order to inject an undefined instruction exception
>> whenever the guest tries to use VFP/NEON. VFP/NEON developed by
>> Antionios Motakis and Rusty Russell.
>>
>> Aborts that are permission faults, and not stage-1 page table walk, do
>> not report the faulting address in the HPFAR.  We have to resolve the
>> IPA, and store it just like the HPFAR register on the VCPU struct. If
>> the IPA cannot be resolved, it means another CPU is playing with the
>> page tables, and we simply restart the guest.  This quirk was fixed by
>> Marc Zyngier.
>>
>> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
>> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
>> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  arch/arm/include/asm/kvm_arm.h  |   51 ++++
>>  arch/arm/include/asm/kvm_host.h |   10 +
>>  arch/arm/kernel/asm-offsets.c   |   25 ++
>>  arch/arm/kvm/arm.c              |  187 ++++++++++++++++
>>  arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
>>  arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
>>  6 files changed, 1108 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/arm/kvm/interrupts_head.S
>>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index fb22ee8..a3262a2 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -98,6 +98,18 @@
>>  #define TTBCR_T0SZ   3
>>  #define HTCR_MASK    (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
>>
>> +/* Hyp System Trap Register */
>> +#define HSTR_T(x)    (1 << x)
>> +#define HSTR_TTEE    (1 << 16)
>> +#define HSTR_TJDBX   (1 << 17)
>> +
>> +/* Hyp Coprocessor Trap Register */
>> +#define HCPTR_TCP(x) (1 << x)
>> +#define HCPTR_TCP_MASK       (0x3fff)
>> +#define HCPTR_TASE   (1 << 15)
>> +#define HCPTR_TTA    (1 << 20)
>> +#define HCPTR_TCPAC  (1 << 31)
>> +
>>  /* Hyp Debug Configuration Register bits */
>>  #define HDCR_TDRA    (1 << 11)
>>  #define HDCR_TDOSA   (1 << 10)
>> @@ -144,6 +156,45 @@
>>  #else
>>  #define VTTBR_X              (5 - KVM_T0SZ)
>>  #endif
>> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
>> +#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
>> +#define VTTBR_VMID_SHIFT  (48LLU)
>> +#define VTTBR_VMID_MASK        (0xffLLU << VTTBR_VMID_SHIFT)
>> +
>> +/* Hyp Syndrome Register (HSR) bits */
>> +#define HSR_EC_SHIFT (26)
>> +#define HSR_EC               (0x3fU << HSR_EC_SHIFT)
>> +#define HSR_IL               (1U << 25)
>> +#define HSR_ISS              (HSR_IL - 1)
>> +#define HSR_ISV_SHIFT        (24)
>> +#define HSR_ISV              (1U << HSR_ISV_SHIFT)
>> +#define HSR_FSC              (0x3f)
>> +#define HSR_FSC_TYPE (0x3c)
>> +#define HSR_WNR              (1 << 6)
>> +
>> +#define FSC_FAULT    (0x04)
>> +#define FSC_PERM     (0x0c)
>> +
>> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>> +#define HPFAR_MASK   (~0xf)
>>
>> +#define HSR_EC_UNKNOWN       (0x00)
>> +#define HSR_EC_WFI   (0x01)
>> +#define HSR_EC_CP15_32       (0x03)
>> +#define HSR_EC_CP15_64       (0x04)
>> +#define HSR_EC_CP14_MR       (0x05)
>> +#define HSR_EC_CP14_LS       (0x06)
>> +#define HSR_EC_CP_0_13       (0x07)
>> +#define HSR_EC_CP10_ID       (0x08)
>> +#define HSR_EC_JAZELLE       (0x09)
>> +#define HSR_EC_BXJ   (0x0A)
>> +#define HSR_EC_CP14_64       (0x0C)
>> +#define HSR_EC_SVC_HYP       (0x11)
>> +#define HSR_EC_HVC   (0x12)
>> +#define HSR_EC_SMC   (0x13)
>> +#define HSR_EC_IABT  (0x20)
>> +#define HSR_EC_IABT_HYP      (0x21)
>> +#define HSR_EC_DABT  (0x24)
>> +#define HSR_EC_DABT_HYP      (0x25)
>>
>>  #endif /* __ARM_KVM_ARM_H__ */
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 1de6f0d..ddb09da 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -21,6 +21,7 @@
>>
>>  #include <asm/kvm.h>
>>  #include <asm/kvm_asm.h>
>> +#include <asm/fpstate.h>
>>
>>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>>  #define KVM_USER_MEM_SLOTS 32
>> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
>>       u32 hxfar;              /* Hyp Data/Inst Fault Address Register */
>>       u32 hpfar;              /* Hyp IPA Fault Address Register */
>>
>> +     /* Floating point registers (VFP and Advanced SIMD/NEON) */
>> +     struct vfp_hard_struct vfp_guest;
>> +     struct vfp_hard_struct *vfp_host;
>> +
>> +     /*
>> +      * Anything that is not used directly from assembly code goes
>> +      * here.
>> +      */
>>       /* Interrupt related fields */
>>       u32 irq_lines;          /* IRQ and FIQ levels */
>>
>> @@ -112,6 +121,7 @@ struct kvm_one_reg;
>>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>>  u64 kvm_call_hyp(void *hypfn, ...);
>> +void force_vm_exit(const cpumask_t *mask);
>>
>>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>>  struct kvm;
>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>> index c985b48..c8b3272 100644
>> --- a/arch/arm/kernel/asm-offsets.c
>> +++ b/arch/arm/kernel/asm-offsets.c
>> @@ -13,6 +13,9 @@
>>  #include <linux/sched.h>
>>  #include <linux/mm.h>
>>  #include <linux/dma-mapping.h>
>> +#ifdef CONFIG_KVM_ARM_HOST
>> +#include <linux/kvm_host.h>
>> +#endif
>>  #include <asm/cacheflush.h>
>>  #include <asm/glue-df.h>
>>  #include <asm/glue-pf.h>
>> @@ -146,5 +149,27 @@ int main(void)
>>    DEFINE(DMA_BIDIRECTIONAL,  DMA_BIDIRECTIONAL);
>>    DEFINE(DMA_TO_DEVICE,              DMA_TO_DEVICE);
>>    DEFINE(DMA_FROM_DEVICE,    DMA_FROM_DEVICE);
>> +#ifdef CONFIG_KVM_ARM_HOST
>> +  DEFINE(VCPU_KVM,           offsetof(struct kvm_vcpu, kvm));
>> +  DEFINE(VCPU_MIDR,          offsetof(struct kvm_vcpu, arch.midr));
>> +  DEFINE(VCPU_CP15,          offsetof(struct kvm_vcpu, arch.cp15));
>> +  DEFINE(VCPU_VFP_GUEST,     offsetof(struct kvm_vcpu, arch.vfp_guest));
>> +  DEFINE(VCPU_VFP_HOST,              offsetof(struct kvm_vcpu, arch.vfp_host));
>> +  DEFINE(VCPU_REGS,          offsetof(struct kvm_vcpu, arch.regs));
>> +  DEFINE(VCPU_USR_REGS,              offsetof(struct kvm_vcpu, arch.regs.usr_regs));
>> +  DEFINE(VCPU_SVC_REGS,              offsetof(struct kvm_vcpu, arch.regs.svc_regs));
>> +  DEFINE(VCPU_ABT_REGS,              offsetof(struct kvm_vcpu, arch.regs.abt_regs));
>> +  DEFINE(VCPU_UND_REGS,              offsetof(struct kvm_vcpu, arch.regs.und_regs));
>> +  DEFINE(VCPU_IRQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.irq_regs));
>> +  DEFINE(VCPU_FIQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
>> +  DEFINE(VCPU_PC,            offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
>> +  DEFINE(VCPU_CPSR,          offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
>> +  DEFINE(VCPU_IRQ_LINES,     offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_HSR,           offsetof(struct kvm_vcpu, arch.hsr));
>> +  DEFINE(VCPU_HxFAR,         offsetof(struct kvm_vcpu, arch.hxfar));
>> +  DEFINE(VCPU_HPFAR,         offsetof(struct kvm_vcpu, arch.hpfar));
>> +  DEFINE(VCPU_HYP_PC,                offsetof(struct kvm_vcpu, arch.hyp_pc));
>> +  DEFINE(KVM_VTTBR,          offsetof(struct kvm, arch.vttbr));
>> +#endif
>>    return 0;
>>  }
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 9b4566e..c94d278 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -40,6 +40,7 @@
>>  #include <asm/kvm_arm.h>
>>  #include <asm/kvm_asm.h>
>>  #include <asm/kvm_mmu.h>
>> +#include <asm/kvm_emulate.h>
>>
>>  #ifdef REQUIRES_VIRT
>>  __asm__(".arch_extension     virt");
>> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>>  static unsigned long hyp_default_vectors;
>>
>> +/* The VMID used in the VTTBR */
>> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
>> +static u8 kvm_next_vmid;
>> +static DEFINE_SPINLOCK(kvm_vmid_lock);
>>
>>  int kvm_arch_hardware_enable(void *garbage)
>>  {
>> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
>>
>>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>  {
>> +     /* Force users to call KVM_ARM_VCPU_INIT */
>> +     vcpu->arch.target = -1;
>>       return 0;
>>  }
>>
>> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  {
>>       vcpu->cpu = cpu;
>> +     vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>>  }
>>
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>>
>>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
> As far as I see the function is unused.
>
>>  {
>> +     return v->mode == IN_GUEST_MODE;
>> +}
>> +
>> +/* Just ensure a guest exit from a particular CPU */
>> +static void exit_vm_noop(void *info)
>> +{
>> +}
>> +
>> +void force_vm_exit(const cpumask_t *mask)
>> +{
>> +     smp_call_function_many(mask, exit_vm_noop, NULL, true);
>> +}
> There is make_all_cpus_request() for that. It actually sends IPIs only
> to cpus that are running vcpus.
>
>> +
>> +/**
>> + * need_new_vmid_gen - check that the VMID is still valid
>> + * @kvm: The VM's VMID to check
>> + *
>> + * return true if there is a new generation of VMIDs being used
>> + *
>> + * The hardware supports only 256 values with the value zero reserved for the
>> + * host, so we check if an assigned value belongs to a previous generation,
>> + * which requires us to assign a new value. If we're the first to use a
>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
>> + * CPUs.
>> + */
>> +static bool need_new_vmid_gen(struct kvm *kvm)
>> +{
>> +     return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>> +}
>> +
>> +/**
>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>> + * @kvm      The guest that we are about to run
>> + *
>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
>> + * caches and TLBs.
>> + */
>> +static void update_vttbr(struct kvm *kvm)
>> +{
>> +     phys_addr_t pgd_phys;
>> +     u64 vmid;
>> +
>> +     if (!need_new_vmid_gen(kvm))
>> +             return;
>> +
>> +     spin_lock(&kvm_vmid_lock);
>> +
>> +     /*
>> +      * We need to re-check the vmid_gen here to ensure that if another vcpu
>> +      * already allocated a valid vmid for this vm, then this vcpu should
>> +      * use the same vmid.
>> +      */
>> +     if (!need_new_vmid_gen(kvm)) {
>> +             spin_unlock(&kvm_vmid_lock);
>> +             return;
>> +     }
>> +
>> +     /* First user of a new VMID generation? */
>> +     if (unlikely(kvm_next_vmid == 0)) {
>> +             atomic64_inc(&kvm_vmid_gen);
>> +             kvm_next_vmid = 1;
>> +
>> +             /*
>> +              * On SMP we know no other CPUs can use this CPU's or each
>> +              * other's VMID after force_vm_exit returns since the
>> +              * kvm_vmid_lock blocks them from reentry to the guest.
>> +              */
>> +             force_vm_exit(cpu_all_mask);
>> +             /*
>> +              * Now broadcast TLB + ICACHE invalidation over the inner
>> +              * shareable domain to make sure all data structures are
>> +              * clean.
>> +              */
>> +             kvm_call_hyp(__kvm_flush_vm_context);
>> +     }
>> +
>> +     kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>> +     kvm->arch.vmid = kvm_next_vmid;
>> +     kvm_next_vmid++;
>> +
>> +     /* update vttbr to be used with the new vmid */
>> +     pgd_phys = virt_to_phys(kvm->arch.pgd);
>> +     vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
>> +     kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
>> +     kvm->arch.vttbr |= vmid;
>> +
>> +     spin_unlock(&kvm_vmid_lock);
>> +}
>> +
>> +/*
>> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>> + * proper exit to QEMU.
>> + */
>> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> +                    int exception_index)
>> +{
>> +     run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>>       return 0;
>>  }
>>
>> +/**
>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>> + * @vcpu:    The VCPU pointer
>> + * @run:     The kvm_run structure pointer used for userspace state exchange
>> + *
>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>> + * will execute VM code in a loop until the time slice for the process is used
>> + * or some emulation is needed from user space in which case the function will
>> + * return with return value 0 and with the kvm_run structure filled in with the
>> + * required data for the requested emulation.
>> + */
>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -     return -EINVAL;
>> +     int ret;
>> +     sigset_t sigsaved;
>> +
>> +     /* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
>> +     if (unlikely(vcpu->arch.target < 0))
>> +             return -ENOEXEC;
>> +
>> +     if (vcpu->sigset_active)
>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>> +
>> +     ret = 1;
>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
>> +     while (ret > 0) {
>> +             /*
>> +              * Check conditions before entering the guest
>> +              */
>> +             cond_resched();
>> +
>> +             update_vttbr(vcpu->kvm);
>> +
>> +             local_irq_disable();
>> +
>> +             /*
>> +              * Re-check atomic conditions
>> +              */
>> +             if (signal_pending(current)) {
>> +                     ret = -EINTR;
>> +                     run->exit_reason = KVM_EXIT_INTR;
>> +             }
>> +
>> +             if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>> +                     local_irq_enable();
>> +                     continue;
>> +             }
>> +
>> +             /**************************************************************
>> +              * Enter the guest
>> +              */
>> +             trace_kvm_entry(*vcpu_pc(vcpu));
>> +             kvm_guest_enter();
>> +             vcpu->mode = IN_GUEST_MODE;
> You need to set mode to IN_GUEST_MODE before disabling interrupts and
> check that mode != EXITING_GUEST_MODE after disabling interrupts but
> before entering the guest. This way you will catch kicks that were sent
> between setting the mode and disabling the interrupts. Also you need
> to check vcpu->requests and exit if it is not empty. I see that you do
> not use vcpu->requests at all, but you should, since common kvm code
> assumes that it is used. make_all_cpus_request() uses it, for instance.
>

I don't quite agree, but almost:

Why would you set IN_GUEST_MODE before disabling interrupts? The only
reason I can see for this to be a requirement is to leverage an implicit
memory barrier. Receiving the IPI in this little window does nothing
(the smp_cross_call is a noop).

Checking that mode != EXITING_GUEST_MODE is equally useless in my
opinion; as I read the requests code, the only reason for this mode is
to avoid sending an IPI twice.

Kicks sent between setting the mode and disabling the interrupts are
not the point; the point is to check the requests field (which we
don't use at all on ARM, and generic code also doesn't use on ARM)
after disabling interrupts, and after setting IN_GUEST_MODE.

The patch below fixes your issues, and while I would push back on
anything other than direct bug fixes at this point, the current code is
semantically incorrect wrt. KVM vcpu requests, so it's worth a fix,
and the patch itself is trivial.

>> +
>> +             ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> You do not take kvm->srcu lock before entering the guest. It looks
> wrong.
>

Why would I take that before entering the guest? The only thing the
read-side RCU protects is the memslots data structure, as far as I can
see, so the second patch pasted below fixes this for the code that
actually accesses this data structure.
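
In other words, the fault path just needs the srcu read side around the
memslot lookup, along the lines of this sketch (names as used in
kvm_handle_guest_abort(); the actual change is in the third patch
further down):

	int idx;

	idx = srcu_read_lock(&vcpu->kvm->srcu);
	memslot = gfn_to_memslot(vcpu->kvm, gfn);
	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
	srcu_read_unlock(&vcpu->kvm->srcu, idx);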

>> +
>> +             vcpu->mode = OUTSIDE_GUEST_MODE;
>> +             kvm_guest_exit();
>> +             trace_kvm_exit(*vcpu_pc(vcpu));
>> +             /*
>> +              * We may have taken a host interrupt in HYP mode (ie
>> +              * while executing the guest). This interrupt is still
>> +              * pending, as we haven't serviced it yet!
>> +              *
>> +              * We're now back in SVC mode, with interrupts
>> +              * disabled.  Enabling the interrupts now will have
>> +              * the effect of taking the interrupt again, in SVC
>> +              * mode this time.
>> +              */
>> +             local_irq_enable();
>> +
>> +             /*
>> +              * Back from guest
>> +              *************************************************************/
>> +
>> +             ret = handle_exit(vcpu, run, ret);
>> +     }
>> +
>> +     if (vcpu->sigset_active)
>> +             sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>> +     return ret;
>>  }
>>
>>  static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index a923590..08adcd5 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -20,9 +20,12 @@
>>  #include <linux/const.h>
>>  #include <asm/unified.h>
>>  #include <asm/page.h>
>> +#include <asm/ptrace.h>
>>  #include <asm/asm-offsets.h>
>>  #include <asm/kvm_asm.h>
>>  #include <asm/kvm_arm.h>
>> +#include <asm/vfpmacros.h>
>> +#include "interrupts_head.S"
>>
>>       .text
>>
>> @@ -31,36 +34,423 @@ __kvm_hyp_code_start:
>>
>>  /********************************************************************
>>   * Flush per-VMID TLBs
>> + *
>> + * void __kvm_tlb_flush_vmid(struct kvm *kvm);
>> + *
>> + * We rely on the hardware to broadcast the TLB invalidation to all CPUs
>> + * inside the inner-shareable domain (which is the case for all v7
>> + * implementations).  If we come across a non-IS SMP implementation, we'll
>> + * have to use an IPI based mechanism. Until then, we stick to the simple
>> + * hardware assisted version.
>>   */
>>  ENTRY(__kvm_tlb_flush_vmid)
>> +     push    {r2, r3}
>> +
>> +     add     r0, r0, #KVM_VTTBR
>> +     ldrd    r2, r3, [r0]
>> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>> +     isb
>> +     mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
>> +     dsb
>> +     isb
>> +     mov     r2, #0
>> +     mov     r3, #0
>> +     mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
>> +     isb                             @ Not necessary if followed by eret
>> +
>> +     pop     {r2, r3}
>>       bx      lr
>>  ENDPROC(__kvm_tlb_flush_vmid)
>>
>>  /********************************************************************
>> - * Flush TLBs and instruction caches of current CPU for all VMIDs
>> + * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>> + * domain, for all VMIDs
>> + *
>> + * void __kvm_flush_vm_context(void);
>>   */
>>  ENTRY(__kvm_flush_vm_context)
>> +     mov     r0, #0                  @ rn parameter for c15 flushes is SBZ
>> +
>> +     /* Invalidate NS Non-Hyp TLB Inner Shareable (TLBIALLNSNHIS) */
>> +     mcr     p15, 4, r0, c8, c3, 4
>> +     /* Invalidate instruction caches Inner Shareable (ICIALLUIS) */
>> +     mcr     p15, 0, r0, c7, c1, 0
>> +     dsb
>> +     isb                             @ Not necessary if followed by eret
>> +
>>       bx      lr
>>  ENDPROC(__kvm_flush_vm_context)
>>
>> +
>>  /********************************************************************
>>   *  Hypervisor world-switch code
>> + *
>> + *
>> + * int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>>   */
>>  ENTRY(__kvm_vcpu_run)
>> -     bx      lr
>> +     @ Save the vcpu pointer
>> +     mcr     p15, 4, vcpu, c13, c0, 2        @ HTPIDR
>> +
>> +     save_host_regs
>> +
>> +     @ Store hardware CP15 state and load guest state
>> +     read_cp15_state store_to_vcpu = 0
>> +     write_cp15_state read_from_vcpu = 1
>> +
>> +     @ If the host kernel has not been configured with VFPv3 support,
>> +     @ then it is safer if we deny guests from using it as well.
>> +#ifdef CONFIG_VFPv3
>> +     @ Set FPEXC_EN so the guest doesn't trap floating point instructions
>> +     VFPFMRX r2, FPEXC               @ VMRS
>> +     push    {r2}
>> +     orr     r2, r2, #FPEXC_EN
>> +     VFPFMXR FPEXC, r2               @ VMSR
>> +#endif
>> +
>> +     @ Configure Hyp-role
>> +     configure_hyp_role vmentry
>> +
>> +     @ Trap coprocessor CRx accesses
>> +     set_hstr vmentry
>> +     set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>> +     set_hdcr vmentry
>> +
>> +     @ Write configured ID register into MIDR alias
>> +     ldr     r1, [vcpu, #VCPU_MIDR]
>> +     mcr     p15, 4, r1, c0, c0, 0
>> +
>> +     @ Write guest view of MPIDR into VMPIDR
>> +     ldr     r1, [vcpu, #CP15_OFFSET(c0_MPIDR)]
>> +     mcr     p15, 4, r1, c0, c0, 5
>> +
>> +     @ Set up guest memory translation
>> +     ldr     r1, [vcpu, #VCPU_KVM]
>> +     add     r1, r1, #KVM_VTTBR
>> +     ldrd    r2, r3, [r1]
>> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>> +
>> +     @ We're all done, just restore the GPRs and go to the guest
>> +     restore_guest_regs
>> +     clrex                           @ Clear exclusive monitor
>> +     eret
>> +
>> +__kvm_vcpu_return:
>> +     /*
>> +      * return convention:
>> +      * guest r0, r1, r2 saved on the stack
>> +      * r0: vcpu pointer
>> +      * r1: exception code
>> +      */
>> +     save_guest_regs
>> +
>> +     @ Set VMID == 0
>> +     mov     r2, #0
>> +     mov     r3, #0
>> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>> +
>> +     @ Don't trap coprocessor accesses for host kernel
>> +     set_hstr vmexit
>> +     set_hdcr vmexit
>> +     set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>> +
>> +#ifdef CONFIG_VFPv3
>> +     @ Save floating point registers if we let the guest use them.
>> +     tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
>> +     bne     after_vfp_restore
>> +
>> +     @ Switch VFP/NEON hardware state to the host's
>> +     add     r7, vcpu, #VCPU_VFP_GUEST
>> +     store_vfp_state r7
>> +     add     r7, vcpu, #VCPU_VFP_HOST
>> +     ldr     r7, [r7]
>> +     restore_vfp_state r7
>> +
>> +after_vfp_restore:
>> +     @ Restore FPEXC_EN which we clobbered on entry
>> +     pop     {r2}
>> +     VFPFMXR FPEXC, r2
>> +#endif
>> +
>> +     @ Reset Hyp-role
>> +     configure_hyp_role vmexit
>> +
>> +     @ Let host read hardware MIDR
>> +     mrc     p15, 0, r2, c0, c0, 0
>> +     mcr     p15, 4, r2, c0, c0, 0
>> +
>> +     @ Back to hardware MPIDR
>> +     mrc     p15, 0, r2, c0, c0, 5
>> +     mcr     p15, 4, r2, c0, c0, 5
>> +
>> +     @ Store guest CP15 state and restore host state
>> +     read_cp15_state store_to_vcpu = 1
>> +     write_cp15_state read_from_vcpu = 0
>> +
>> +     restore_host_regs
>> +     clrex                           @ Clear exclusive monitor
>> +     mov     r0, r1                  @ Return the return code
>> +     bx      lr                      @ return to IOCTL
>>
>>  ENTRY(kvm_call_hyp)
>> +     hvc     #0
>>       bx      lr
>>
>>
>>  /********************************************************************
>>   * Hypervisor exception vector and handlers
>> + *
>> + *
>> + * The KVM/ARM Hypervisor ABI is defined as follows:
>> + *
>> + * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
>> + * instruction is issued since all traps are disabled when running the host
>> + * kernel as per the Hyp-mode initialization at boot time.
>> + *
>> + * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
>> + * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
>> + * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
>> + * instructions are called from within Hyp-mode.
>> + *
>> + * Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
>> + *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
>> + *    exception vector code will check that the HVC comes from VMID==0 and if
>> + *    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
>> + *    - r0 contains a pointer to a HYP function
>> + *    - r1, r2, and r3 contain arguments to the above function.
>> + *    - The HYP function will be called with its arguments in r0, r1 and r2.
>> + *    On HYP function return, we return directly to SVC.
>> + *
>> + * Note that the above is used to execute code in Hyp-mode from a host-kernel
>> + * point of view, and is a different concept from performing a world-switch and
>> + * executing guest code SVC mode (with a VMID != 0).
>>   */
>>
>> +/* Handle undef, svc, pabt, or dabt by crashing with a user notice */
>> +.macro bad_exception exception_code, panic_str
>> +     push    {r0-r2}
>> +     mrrc    p15, 6, r0, r1, c2      @ Read VTTBR
>> +     lsr     r1, r1, #16
>> +     ands    r1, r1, #0xff
>> +     beq     99f
>> +
>> +     load_vcpu                       @ Load VCPU pointer
>> +     .if \exception_code == ARM_EXCEPTION_DATA_ABORT
>> +     mrc     p15, 4, r2, c5, c2, 0   @ HSR
>> +     mrc     p15, 4, r1, c6, c0, 0   @ HDFAR
>> +     str     r2, [vcpu, #VCPU_HSR]
>> +     str     r1, [vcpu, #VCPU_HxFAR]
>> +     .endif
>> +     .if \exception_code == ARM_EXCEPTION_PREF_ABORT
>> +     mrc     p15, 4, r2, c5, c2, 0   @ HSR
>> +     mrc     p15, 4, r1, c6, c0, 2   @ HIFAR
>> +     str     r2, [vcpu, #VCPU_HSR]
>> +     str     r1, [vcpu, #VCPU_HxFAR]
>> +     .endif
>> +     mov     r1, #\exception_code
>> +     b       __kvm_vcpu_return
>> +
>> +     @ We were in the host already. Let's craft a panic-ing return to SVC.
>> +99:  mrs     r2, cpsr
>> +     bic     r2, r2, #MODE_MASK
>> +     orr     r2, r2, #SVC_MODE
>> +THUMB(       orr     r2, r2, #PSR_T_BIT      )
>> +     msr     spsr_cxsf, r2
>> +     mrs     r1, ELR_hyp
>> +     ldr     r2, =BSYM(panic)
>> +     msr     ELR_hyp, r2
>> +     ldr     r0, =\panic_str
>> +     eret
>> +.endm
>> +
>> +     .text
>> +
>>       .align 5
>>  __kvm_hyp_vector:
>>       .globl __kvm_hyp_vector
>> -     nop
>> +
>> +     @ Hyp-mode exception vector
>> +     W(b)    hyp_reset
>> +     W(b)    hyp_undef
>> +     W(b)    hyp_svc
>> +     W(b)    hyp_pabt
>> +     W(b)    hyp_dabt
>> +     W(b)    hyp_hvc
>> +     W(b)    hyp_irq
>> +     W(b)    hyp_fiq
>> +
>> +     .align
>> +hyp_reset:
>> +     b       hyp_reset
>> +
>> +     .align
>> +hyp_undef:
>> +     bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
>> +
>> +     .align
>> +hyp_svc:
>> +     bad_exception ARM_EXCEPTION_HVC, svc_die_str
>> +
>> +     .align
>> +hyp_pabt:
>> +     bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
>> +
>> +     .align
>> +hyp_dabt:
>> +     bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
>> +
>> +     .align
>> +hyp_hvc:
>> +     /*
>> +      * Getting here is either because of a trap from a guest or from calling
>> +      * HVC from the host kernel, which means "switch to Hyp mode".
>> +      */
>> +     push    {r0, r1, r2}
>> +
>> +     @ Check syndrome register
>> +     mrc     p15, 4, r1, c5, c2, 0   @ HSR
>> +     lsr     r0, r1, #HSR_EC_SHIFT
>> +#ifdef CONFIG_VFPv3
>> +     cmp     r0, #HSR_EC_CP_0_13
>> +     beq     switch_to_guest_vfp
>> +#endif
>> +     cmp     r0, #HSR_EC_HVC
>> +     bne     guest_trap              @ Not HVC instr.
>> +
>> +     /*
>> +      * Let's check if the HVC came from VMID 0 and allow simple
>> +      * switch to Hyp mode
>> +      */
>> +     mrrc    p15, 6, r0, r2, c2
>> +     lsr     r2, r2, #16
>> +     and     r2, r2, #0xff
>> +     cmp     r2, #0
>> +     bne     guest_trap              @ Guest called HVC
>> +
>> +host_switch_to_hyp:
>> +     pop     {r0, r1, r2}
>> +
>> +     push    {lr}
>> +     mrs     lr, SPSR
>> +     push    {lr}
>> +
>> +     mov     lr, r0
>> +     mov     r0, r1
>> +     mov     r1, r2
>> +     mov     r2, r3
>> +
>> +THUMB(       orr     lr, #1)
>> +     blx     lr                      @ Call the HYP function
>> +
>> +     pop     {lr}
>> +     msr     SPSR_csxf, lr
>> +     pop     {lr}
>> +     eret
>> +
>> +guest_trap:
>> +     load_vcpu                       @ Load VCPU pointer to r0
>> +     str     r1, [vcpu, #VCPU_HSR]
>> +
>> +     @ Check if we need the fault information
>> +     lsr     r1, r1, #HSR_EC_SHIFT
>> +     cmp     r1, #HSR_EC_IABT
>> +     mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
>> +     beq     2f
>> +     cmp     r1, #HSR_EC_DABT
>> +     bne     1f
>> +     mrc     p15, 4, r2, c6, c0, 0   @ HDFAR
>> +
>> +2:   str     r2, [vcpu, #VCPU_HxFAR]
>> +
>> +     /*
>> +      * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
>> +      *
>> +      * Abort on the stage 2 translation for a memory access from a
>> +      * Non-secure PL1 or PL0 mode:
>> +      *
>> +      * For any Access flag fault or Translation fault, and also for any
>> +      * Permission fault on the stage 2 translation of a memory access
>> +      * made as part of a translation table walk for a stage 1 translation,
>> +      * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
>> +      * is UNKNOWN.
>> +      */
>> +
>> +     /* Check for permission fault, and S1PTW */
>> +     mrc     p15, 4, r1, c5, c2, 0   @ HSR
>> +     and     r0, r1, #HSR_FSC_TYPE
>> +     cmp     r0, #FSC_PERM
>> +     tsteq   r1, #(1 << 7)           @ S1PTW
>> +     mrcne   p15, 4, r2, c6, c0, 4   @ HPFAR
>> +     bne     3f
>> +
>> +     /* Resolve IPA using the xFAR */
>> +     mcr     p15, 0, r2, c7, c8, 0   @ ATS1CPR
>> +     isb
>> +     mrrc    p15, 0, r0, r1, c7      @ PAR
>> +     tst     r0, #1
>> +     bne     4f                      @ Failed translation
>> +     ubfx    r2, r0, #12, #20
>> +     lsl     r2, r2, #4
>> +     orr     r2, r2, r1, lsl #24
>> +
>> +3:   load_vcpu                       @ Load VCPU pointer to r0
>> +     str     r2, [r0, #VCPU_HPFAR]
>> +
>> +1:   mov     r1, #ARM_EXCEPTION_HVC
>> +     b       __kvm_vcpu_return
>> +
>> +4:   pop     {r0, r1, r2}            @ Failed translation, return to guest
>> +     eret
>> +
>> +/*
>> + * If VFPv3 support is not available, then we will not switch the VFP
>> + * registers; however cp10 and cp11 accesses will still trap and fallback
>> + * to the regular coprocessor emulation code, which currently will
>> + * inject an undefined exception to the guest.
>> + */
>> +#ifdef CONFIG_VFPv3
>> +switch_to_guest_vfp:
>> +     load_vcpu                       @ Load VCPU pointer to r0
>> +     push    {r3-r7}
>> +
>> +     @ NEON/VFP used.  Turn on VFP access.
>> +     set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
>> +
>> +     @ Switch VFP/NEON hardware state to the guest's
>> +     add     r7, r0, #VCPU_VFP_HOST
>> +     ldr     r7, [r7]
>> +     store_vfp_state r7
>> +     add     r7, r0, #VCPU_VFP_GUEST
>> +     restore_vfp_state r7
>> +
>> +     pop     {r3-r7}
>> +     pop     {r0-r2}
>> +     eret
>> +#endif
>> +
>> +     .align
>> +hyp_irq:
>> +     push    {r0, r1, r2}
>> +     mov     r1, #ARM_EXCEPTION_IRQ
>> +     load_vcpu                       @ Load VCPU pointer to r0
>> +     b       __kvm_vcpu_return
>> +
>> +     .align
>> +hyp_fiq:
>> +     b       hyp_fiq
>> +
>> +     .ltorg
>>
>>  __kvm_hyp_code_end:
>>       .globl  __kvm_hyp_code_end
>> +
>> +     .section ".rodata"
>> +
>> +und_die_str:
>> +     .ascii  "unexpected undefined exception in Hyp mode at: %#08x"
>> +pabt_die_str:
>> +     .ascii  "unexpected prefetch abort in Hyp mode at: %#08x"
>> +dabt_die_str:
>> +     .ascii  "unexpected data abort in Hyp mode at: %#08x"
>> +svc_die_str:
>> +     .ascii  "unexpected HVC/SVC trap in Hyp mode at: %#08x"
>> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
>> new file mode 100644
>> index 0000000..f59a580
>> --- /dev/null
>> +++ b/arch/arm/kvm/interrupts_head.S
>> @@ -0,0 +1,443 @@
>> +#define VCPU_USR_REG(_reg_nr)        (VCPU_USR_REGS + (_reg_nr * 4))
>> +#define VCPU_USR_SP          (VCPU_USR_REG(13))
>> +#define VCPU_USR_LR          (VCPU_USR_REG(14))
>> +#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))
>> +
>> +/*
>> + * Many of these macros need to access the VCPU structure, which is always
>> + * held in r0. These macros should never clobber r1, as it is used to hold the
>> + * exception code on the return path (except of course the macro that switches
>> + * all the registers before the final jump to the VM).
>> + */
>> +vcpu .req    r0              @ vcpu pointer always in r0
>> +
>> +/* Clobbers {r2-r6} */
>> +.macro store_vfp_state vfp_base
>> +     @ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
>> +     VFPFMRX r2, FPEXC
>> +     @ Make sure VFP is enabled so we can touch the registers.
>> +     orr     r6, r2, #FPEXC_EN
>> +     VFPFMXR FPEXC, r6
>> +
>> +     VFPFMRX r3, FPSCR
>> +     tst     r2, #FPEXC_EX           @ Check for VFP Subarchitecture
>> +     beq     1f
>> +     @ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
>> +     @ we only need to save them if FPEXC_EX is set.
>> +     VFPFMRX r4, FPINST
>> +     tst     r2, #FPEXC_FP2V
>> +     VFPFMRX r5, FPINST2, ne         @ vmrsne
>> +     bic     r6, r2, #FPEXC_EX       @ FPEXC_EX disable
>> +     VFPFMXR FPEXC, r6
>> +1:
>> +     VFPFSTMIA \vfp_base, r6         @ Save VFP registers
>> +     stm     \vfp_base, {r2-r5}      @ Save FPEXC, FPSCR, FPINST, FPINST2
>> +.endm
>> +
>> +/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
>> +.macro restore_vfp_state vfp_base
>> +     VFPFLDMIA \vfp_base, r6         @ Load VFP registers
>> +     ldm     \vfp_base, {r2-r5}      @ Load FPEXC, FPSCR, FPINST, FPINST2
>> +
>> +     VFPFMXR FPSCR, r3
>> +     tst     r2, #FPEXC_EX           @ Check for VFP Subarchitecture
>> +     beq     1f
>> +     VFPFMXR FPINST, r4
>> +     tst     r2, #FPEXC_FP2V
>> +     VFPFMXR FPINST2, r5, ne
>> +1:
>> +     VFPFMXR FPEXC, r2       @ FPEXC (last, in case !EN)
>> +.endm
>> +
>> +/* These are simply for the macros to work - values don't have meaning */
>> +.equ usr, 0
>> +.equ svc, 1
>> +.equ abt, 2
>> +.equ und, 3
>> +.equ irq, 4
>> +.equ fiq, 5
>> +
>> +.macro push_host_regs_mode mode
>> +     mrs     r2, SP_\mode
>> +     mrs     r3, LR_\mode
>> +     mrs     r4, SPSR_\mode
>> +     push    {r2, r3, r4}
>> +.endm
>> +
>> +/*
>> + * Store all host persistent registers on the stack.
>> + * Clobbers all registers, in all modes, except r0 and r1.
>> + */
>> +.macro save_host_regs
>> +     /* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
>> +     mrs     r2, ELR_hyp
>> +     push    {r2}
>> +
>> +     /* usr regs */
>> +     push    {r4-r12}        @ r0-r3 are always clobbered
>> +     mrs     r2, SP_usr
>> +     mov     r3, lr
>> +     push    {r2, r3}
>> +
>> +     push_host_regs_mode svc
>> +     push_host_regs_mode abt
>> +     push_host_regs_mode und
>> +     push_host_regs_mode irq
>> +
>> +     /* fiq regs */
>> +     mrs     r2, r8_fiq
>> +     mrs     r3, r9_fiq
>> +     mrs     r4, r10_fiq
>> +     mrs     r5, r11_fiq
>> +     mrs     r6, r12_fiq
>> +     mrs     r7, SP_fiq
>> +     mrs     r8, LR_fiq
>> +     mrs     r9, SPSR_fiq
>> +     push    {r2-r9}
>> +.endm
>> +
>> +.macro pop_host_regs_mode mode
>> +     pop     {r2, r3, r4}
>> +     msr     SP_\mode, r2
>> +     msr     LR_\mode, r3
>> +     msr     SPSR_\mode, r4
>> +.endm
>> +
>> +/*
>> + * Restore all host registers from the stack.
>> + * Clobbers all registers, in all modes, except r0 and r1.
>> + */
>> +.macro restore_host_regs
>> +     pop     {r2-r9}
>> +     msr     r8_fiq, r2
>> +     msr     r9_fiq, r3
>> +     msr     r10_fiq, r4
>> +     msr     r11_fiq, r5
>> +     msr     r12_fiq, r6
>> +     msr     SP_fiq, r7
>> +     msr     LR_fiq, r8
>> +     msr     SPSR_fiq, r9
>> +
>> +     pop_host_regs_mode irq
>> +     pop_host_regs_mode und
>> +     pop_host_regs_mode abt
>> +     pop_host_regs_mode svc
>> +
>> +     pop     {r2, r3}
>> +     msr     SP_usr, r2
>> +     mov     lr, r3
>> +     pop     {r4-r12}
>> +
>> +     pop     {r2}
>> +     msr     ELR_hyp, r2
>> +.endm
>> +
>> +/*
>> + * Restore SP, LR and SPSR for a given mode. offset is the offset of
>> + * this mode's registers from the VCPU base.
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers r1, r2, r3, r4.
>> + */
>> +.macro restore_guest_regs_mode mode, offset
>> +     add     r1, vcpu, \offset
>> +     ldm     r1, {r2, r3, r4}
>> +     msr     SP_\mode, r2
>> +     msr     LR_\mode, r3
>> +     msr     SPSR_\mode, r4
>> +.endm
>> +
>> +/*
>> + * Restore all guest registers from the vcpu struct.
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers *all* registers.
>> + */
>> +.macro restore_guest_regs
>> +     restore_guest_regs_mode svc, #VCPU_SVC_REGS
>> +     restore_guest_regs_mode abt, #VCPU_ABT_REGS
>> +     restore_guest_regs_mode und, #VCPU_UND_REGS
>> +     restore_guest_regs_mode irq, #VCPU_IRQ_REGS
>> +
>> +     add     r1, vcpu, #VCPU_FIQ_REGS
>> +     ldm     r1, {r2-r9}
>> +     msr     r8_fiq, r2
>> +     msr     r9_fiq, r3
>> +     msr     r10_fiq, r4
>> +     msr     r11_fiq, r5
>> +     msr     r12_fiq, r6
>> +     msr     SP_fiq, r7
>> +     msr     LR_fiq, r8
>> +     msr     SPSR_fiq, r9
>> +
>> +     @ Load return state
>> +     ldr     r2, [vcpu, #VCPU_PC]
>> +     ldr     r3, [vcpu, #VCPU_CPSR]
>> +     msr     ELR_hyp, r2
>> +     msr     SPSR_cxsf, r3
>> +
>> +     @ Load user registers
>> +     ldr     r2, [vcpu, #VCPU_USR_SP]
>> +     ldr     r3, [vcpu, #VCPU_USR_LR]
>> +     msr     SP_usr, r2
>> +     mov     lr, r3
>> +     add     vcpu, vcpu, #(VCPU_USR_REGS)
>> +     ldm     vcpu, {r0-r12}
>> +.endm
>> +
>> +/*
>> + * Save SP, LR and SPSR for a given mode. offset is the offset of
>> + * this mode's registers from the VCPU base.
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers r2, r3, r4, r5.
>> + */
>> +.macro save_guest_regs_mode mode, offset
>> +     add     r2, vcpu, \offset
>> +     mrs     r3, SP_\mode
>> +     mrs     r4, LR_\mode
>> +     mrs     r5, SPSR_\mode
>> +     stm     r2, {r3, r4, r5}
>> +.endm
>> +
>> +/*
>> + * Save all guest registers to the vcpu struct
>> + * Expects guest's r0, r1, r2 on the stack.
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers r2, r3, r4, r5.
>> + */
>> +.macro save_guest_regs
>> +     @ Store usr registers
>> +     add     r2, vcpu, #VCPU_USR_REG(3)
>> +     stm     r2, {r3-r12}
>> +     add     r2, vcpu, #VCPU_USR_REG(0)
>> +     pop     {r3, r4, r5}            @ r0, r1, r2
>> +     stm     r2, {r3, r4, r5}
>> +     mrs     r2, SP_usr
>> +     mov     r3, lr
>> +     str     r2, [vcpu, #VCPU_USR_SP]
>> +     str     r3, [vcpu, #VCPU_USR_LR]
>> +
>> +     @ Store return state
>> +     mrs     r2, ELR_hyp
>> +     mrs     r3, spsr
>> +     str     r2, [vcpu, #VCPU_PC]
>> +     str     r3, [vcpu, #VCPU_CPSR]
>> +
>> +     @ Store other guest registers
>> +     save_guest_regs_mode svc, #VCPU_SVC_REGS
>> +     save_guest_regs_mode abt, #VCPU_ABT_REGS
>> +     save_guest_regs_mode und, #VCPU_UND_REGS
>> +     save_guest_regs_mode irq, #VCPU_IRQ_REGS
>> +.endm
>> +
>> +/* Reads cp15 registers from hardware and stores them in memory
>> + * @store_to_vcpu: If 0, registers are written in-order to the stack,
>> + *              otherwise to the VCPU struct pointed to by vcpup
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers r2 - r12
>> + */
>> +.macro read_cp15_state store_to_vcpu
>> +     mrc     p15, 0, r2, c1, c0, 0   @ SCTLR
>> +     mrc     p15, 0, r3, c1, c0, 2   @ CPACR
>> +     mrc     p15, 0, r4, c2, c0, 2   @ TTBCR
>> +     mrc     p15, 0, r5, c3, c0, 0   @ DACR
>> +     mrrc    p15, 0, r6, r7, c2      @ TTBR 0
>> +     mrrc    p15, 1, r8, r9, c2      @ TTBR 1
>> +     mrc     p15, 0, r10, c10, c2, 0 @ PRRR
>> +     mrc     p15, 0, r11, c10, c2, 1 @ NMRR
>> +     mrc     p15, 2, r12, c0, c0, 0  @ CSSELR
>> +
>> +     .if \store_to_vcpu == 0
>> +     push    {r2-r12}                @ Push CP15 registers
>> +     .else
>> +     str     r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
>> +     str     r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
>> +     str     r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
>> +     str     r5, [vcpu, #CP15_OFFSET(c3_DACR)]
>> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
>> +     strd    r6, r7, [vcpu]
>> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
>> +     strd    r8, r9, [vcpu]
>> +     sub     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
>> +     str     r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
>> +     str     r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
>> +     str     r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
>> +     .endif
>> +
>> +     mrc     p15, 0, r2, c13, c0, 1  @ CID
>> +     mrc     p15, 0, r3, c13, c0, 2  @ TID_URW
>> +     mrc     p15, 0, r4, c13, c0, 3  @ TID_URO
>> +     mrc     p15, 0, r5, c13, c0, 4  @ TID_PRIV
>> +     mrc     p15, 0, r6, c5, c0, 0   @ DFSR
>> +     mrc     p15, 0, r7, c5, c0, 1   @ IFSR
>> +     mrc     p15, 0, r8, c5, c1, 0   @ ADFSR
>> +     mrc     p15, 0, r9, c5, c1, 1   @ AIFSR
>> +     mrc     p15, 0, r10, c6, c0, 0  @ DFAR
>> +     mrc     p15, 0, r11, c6, c0, 2  @ IFAR
>> +     mrc     p15, 0, r12, c12, c0, 0 @ VBAR
>> +
>> +     .if \store_to_vcpu == 0
>> +     push    {r2-r12}                @ Push CP15 registers
>> +     .else
>> +     str     r2, [vcpu, #CP15_OFFSET(c13_CID)]
>> +     str     r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
>> +     str     r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
>> +     str     r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
>> +     str     r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
>> +     str     r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
>> +     str     r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
>> +     str     r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
>> +     str     r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
>> +     str     r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
>> +     str     r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
>> +     .endif
>> +.endm
>> +
>> +/*
>> + * Reads cp15 registers from memory and writes them to hardware
>> + * @read_from_vcpu: If 0, registers are read in-order from the stack,
>> + *               otherwise from the VCPU struct pointed to by vcpup
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + */
>> +.macro write_cp15_state read_from_vcpu
>> +     .if \read_from_vcpu == 0
>> +     pop     {r2-r12}
>> +     .else
>> +     ldr     r2, [vcpu, #CP15_OFFSET(c13_CID)]
>> +     ldr     r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
>> +     ldr     r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
>> +     ldr     r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
>> +     ldr     r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
>> +     ldr     r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
>> +     ldr     r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
>> +     ldr     r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
>> +     ldr     r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
>> +     ldr     r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
>> +     ldr     r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
>> +     .endif
>> +
>> +     mcr     p15, 0, r2, c13, c0, 1  @ CID
>> +     mcr     p15, 0, r3, c13, c0, 2  @ TID_URW
>> +     mcr     p15, 0, r4, c13, c0, 3  @ TID_URO
>> +     mcr     p15, 0, r5, c13, c0, 4  @ TID_PRIV
>> +     mcr     p15, 0, r6, c5, c0, 0   @ DFSR
>> +     mcr     p15, 0, r7, c5, c0, 1   @ IFSR
>> +     mcr     p15, 0, r8, c5, c1, 0   @ ADFSR
>> +     mcr     p15, 0, r9, c5, c1, 1   @ AIFSR
>> +     mcr     p15, 0, r10, c6, c0, 0  @ DFAR
>> +     mcr     p15, 0, r11, c6, c0, 2  @ IFAR
>> +     mcr     p15, 0, r12, c12, c0, 0 @ VBAR
>> +
>> +     .if \read_from_vcpu == 0
>> +     pop     {r2-r12}
>> +     .else
>> +     ldr     r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
>> +     ldr     r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
>> +     ldr     r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
>> +     ldr     r5, [vcpu, #CP15_OFFSET(c3_DACR)]
>> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
>> +     ldrd    r6, r7, [vcpu]
>> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
>> +     ldrd    r8, r9, [vcpu]
>> +     sub     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
>> +     ldr     r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
>> +     ldr     r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
>> +     ldr     r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
>> +     .endif
>> +
>> +     mcr     p15, 0, r2, c1, c0, 0   @ SCTLR
>> +     mcr     p15, 0, r3, c1, c0, 2   @ CPACR
>> +     mcr     p15, 0, r4, c2, c0, 2   @ TTBCR
>> +     mcr     p15, 0, r5, c3, c0, 0   @ DACR
>> +     mcrr    p15, 0, r6, r7, c2      @ TTBR 0
>> +     mcrr    p15, 1, r8, r9, c2      @ TTBR 1
>> +     mcr     p15, 0, r10, c10, c2, 0 @ PRRR
>> +     mcr     p15, 0, r11, c10, c2, 1 @ NMRR
>> +     mcr     p15, 2, r12, c0, c0, 0  @ CSSELR
>> +.endm
>> +
>> +/*
>> + * Save the VGIC CPU state into memory
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + */
>> +.macro save_vgic_state
>> +.endm
>> +
>> +/*
>> + * Restore the VGIC CPU state from memory
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + */
>> +.macro restore_vgic_state
>> +.endm
>> +
>> +.equ vmentry,        0
>> +.equ vmexit, 1
>> +
>> +/* Configures the HSTR (Hyp System Trap Register) on entry/return
>> + * (hardware reset value is 0) */
>> +.macro set_hstr operation
>> +     mrc     p15, 4, r2, c1, c1, 3
>> +     ldr     r3, =HSTR_T(15)
>> +     .if \operation == vmentry
>> +     orr     r2, r2, r3              @ Trap CR{15}
>> +     .else
>> +     bic     r2, r2, r3              @ Don't trap any CRx accesses
>> +     .endif
>> +     mcr     p15, 4, r2, c1, c1, 3
>> +.endm
>> +
>> +/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
>> + * (hardware reset value is 0). Keep previous value in r2. */
>> +.macro set_hcptr operation, mask
>> +     mrc     p15, 4, r2, c1, c1, 2
>> +     ldr     r3, =\mask
>> +     .if \operation == vmentry
>> +     orr     r3, r2, r3              @ Trap coproc-accesses defined in mask
>> +     .else
>> +     bic     r3, r2, r3              @ Don't trap defined coproc-accesses
>> +     .endif
>> +     mcr     p15, 4, r3, c1, c1, 2
>> +.endm
>> +
>> +/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
>> + * (hardware reset value is 0) */
>> +.macro set_hdcr operation
>> +     mrc     p15, 4, r2, c1, c1, 1
>> +     ldr     r3, =(HDCR_TPM|HDCR_TPMCR)
>> +     .if \operation == vmentry
>> +     orr     r2, r2, r3              @ Trap some perfmon accesses
>> +     .else
>> +     bic     r2, r2, r3              @ Don't trap any perfmon accesses
>> +     .endif
>> +     mcr     p15, 4, r2, c1, c1, 1
>> +.endm
>> +
>> +/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
>> +.macro configure_hyp_role operation
>> +     mrc     p15, 4, r2, c1, c1, 0   @ HCR
>> +     bic     r2, r2, #HCR_VIRT_EXCP_MASK
>> +     ldr     r3, =HCR_GUEST_MASK
>> +     .if \operation == vmentry
>> +     orr     r2, r2, r3
>> +     ldr     r3, [vcpu, #VCPU_IRQ_LINES]
> irq_lines are accessed atomically from vcpu_interrupt_line(), but there
> are no memory barriers or atomic operations here. Looks suspicious, though
> I am not familiar with the ARM memory model. As far as I understand,
> different translation regimes are used to access this memory, so who
> knows what this does to access ordering.
>
>

There's an exception taken to switch to Hyp mode, which I'm quite sure
implies a memory barrier.

>> +     orr     r2, r2, r3
>> +     .else
>> +     bic     r2, r2, r3
>> +     .endif
>> +     mcr     p15, 4, r2, c1, c1, 0
>> +.endm
>> +
>> +.macro load_vcpu
>> +     mrc     p15, 4, vcpu, c13, c0, 2        @ HTPIDR
>> +.endm
>>
>

commit e290b507f0d31c895bd515d69c0c2b50d76b20db
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Tue Jan 15 20:53:03 2013 -0500

    KVM: ARM: Honor vcpu->requests in the world-switch code

    Honor vcpu->requests by checking them accordingly and explicitly raising
    an error if unsupported requests are set (we don't support any requests
    on ARM currently).

    Also add some commenting to explain the synchronization in more detail
    here.  The commenting implied renaming a variable and changing error
    handling slightly to improve readability.

    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6ff5337..b23a709 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -620,7 +620,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
  */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	int ret;
+	int guest_ret, ret;
 	sigset_t sigsaved;

 	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
@@ -640,9 +640,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);

-	ret = 1;
 	run->exit_reason = KVM_EXIT_UNKNOWN;
-	while (ret > 0) {
+	for (;;) {
 		/*
 		 * Check conditions before entering the guest
 		 */
@@ -650,18 +649,44 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)

 		update_vttbr(vcpu->kvm);

+		/*
+		 * There is a dependency between setting IN_GUEST_MODE and
+		 * sending requests.  We need to ensure:
+		 *   1. Setting IN_GUEST_MODE before checking vcpu->requests.
+		 *   2. We need to check vcpu_request after disabling IRQs
+		 *      (see comment about signal_pending below).
+		 */
+		vcpu->mode = IN_GUEST_MODE;
+
 		local_irq_disable();

 		/*
-		 * Re-check atomic conditions
+		 * We need to be careful to check these variables after
+		 * disabling interrupts.  For example with signals:
+		 *   1. If the signal comes before the signal_pending check,
+		 *      we will return to user space and everything's good.
+		 *   2. If the signal comes after the signal_pending check,
+		 *      we rely on an IPI to exit the guest and continue the
+		 *      while loop, which checks for pending signals again.
 		 */
 		if (signal_pending(current)) {
 			ret = -EINTR;
 			run->exit_reason = KVM_EXIT_INTR;
+			local_irq_enable();
+			vcpu->mode = OUTSIDE_GUEST_MODE;
+			break;
 		}

-		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+		if (vcpu->requests) {
+			ret = -ENOSYS; /* requests not supported */
 			local_irq_enable();
+			vcpu->mode = OUTSIDE_GUEST_MODE;
+			break;
+		}
+
+		if (need_new_vmid_gen(vcpu->kvm)) {
+			local_irq_enable();
+			vcpu->mode = OUTSIDE_GUEST_MODE;
 			continue;
 		}

@@ -670,17 +695,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		trace_kvm_entry(*vcpu_pc(vcpu));
 		kvm_guest_enter();
-		vcpu->mode = IN_GUEST_MODE;

 		smp_mb(); /* set mode before reading vcpu->arch.pause */
 		if (unlikely(vcpu->arch.pause)) {
 			/* This means ignore, try again. */
-			ret = ARM_EXCEPTION_IRQ;
+			guest_ret = ARM_EXCEPTION_IRQ;
 		} else {
-			ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
+			guest_ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 		}

-		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->arch.last_pcpu = smp_processor_id();
 		kvm_guest_exit();
 		trace_kvm_exit(*vcpu_pc(vcpu));
@@ -695,12 +718,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * mode this time.
 		 */
 		local_irq_enable();
+		vcpu->mode = OUTSIDE_GUEST_MODE;

 		/*
 		 * Back from guest
 		 *************************************************************/

-		ret = handle_exit(vcpu, run, ret);
+		ret = handle_exit(vcpu, run, guest_ret);
+		if (ret <= 0)
+			break;
 	}

 	if (vcpu->sigset_active)

commit fc9a9c5e9dd4eba4acd6bea5c8c083a9a854d662
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Tue Jan 15 20:42:15 2013 -0500

    KVM: ARM: Remove unused memslot parameter

diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
index 31ab9f5..571ccf0 100644
--- a/arch/arm/include/asm/kvm_mmio.h
+++ b/arch/arm/include/asm/kvm_mmio.h
@@ -46,6 +46,6 @@ static inline void kvm_prepare_mmio(struct kvm_run *run,

 int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot);
+		 phys_addr_t fault_ipa);

 #endif	/* __ARM_KVM_MMIO_H__ */
diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index d6a4ca0..f655088 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -117,7 +117,7 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 }

 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
+		 phys_addr_t fault_ipa)
 {
 	struct kvm_exit_mmio mmio;
 	unsigned long rt;
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2a83ac9..c806080 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -588,7 +588,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	unsigned long hsr_ec;
 	unsigned long fault_status;
 	phys_addr_t fault_ipa;
-	struct kvm_memory_slot *memslot = NULL;
+	struct kvm_memory_slot *memslot;
 	bool is_iabt;
 	gfn_t gfn;
 	int ret;
@@ -624,7 +624,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)

 		/* Adjust page offset */
 		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
-		return io_mem_abort(vcpu, run, fault_ipa, memslot);
+		return io_mem_abort(vcpu, run, fault_ipa);
 	}

 	memslot = gfn_to_memslot(vcpu->kvm, gfn);

commit 70667a06e445e240fb5e6352ccdc4bc8a290866e
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Tue Jan 15 20:51:42 2013 -0500

    KVM: ARM: Grab kvm->srcu lock when handling page faults

    The memslots data structure is protected with an SRCU lock, so we should
    grab the read side lock before traversing this data structure.

    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index c806080..0b7eabf 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -591,7 +591,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	struct kvm_memory_slot *memslot;
 	bool is_iabt;
 	gfn_t gfn;
-	int ret;
+	int ret, idx;

 	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
 	is_iabt = (hsr_ec == HSR_EC_IABT);
@@ -627,13 +627,17 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		return io_mem_abort(vcpu, run, fault_ipa);
 	}

+	idx = srcu_read_lock(&vcpu->kvm->srcu);
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	if (!memslot->user_alloc) {
 		kvm_err("non user-alloc memslots not supported\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out_unlock;
 	}

 	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
+out_unlock:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	return ret ? ret : 1;
 }

--

Thanks,
-Christoffer

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v5 07/14] KVM: ARM: World-switch implementation
@ 2013-01-16  2:08       ` Christoffer Dall
  0 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-16  2:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 15, 2013 at 4:43 AM, Gleb Natapov <gleb@redhat.com> wrote:
> On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
>> Provides complete world-switch implementation to switch to other guests
>> running in non-secure modes. Includes Hyp exception handlers that
>> capture necessary exception information and stores the information on
>> the VCPU and KVM structures.
>>
>> The following Hyp-ABI is also documented in the code:
>>
>> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
>>    Switching to Hyp mode is done through a simple HVC #0 instruction. The
>>    exception vector code will check that the HVC comes from VMID==0 and if
>>    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
>>    - r0 contains a pointer to a HYP function
>>    - r1, r2, and r3 contain arguments to the above function.
>>    - The HYP function will be called with its arguments in r0, r1 and r2.
>>    On HYP function return, we return directly to SVC.
>>
>> A call to a function executing in Hyp mode is performed like the following:
>>
>>         <svc code>
>>         ldr     r0, =BSYM(my_hyp_fn)
>>         ldr     r1, =my_param
>>         hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
>>         <svc code>
>>
>> Otherwise, the world-switch is pretty straightforward. All state that
>> can be modified by the guest is first backed up on the Hyp stack and the
>> VCPU values are loaded onto the hardware. State that is not loaded, but
>> is theoretically modifiable by the guest, is protected through the
>> virtualization features to generate a trap and cause software emulation.
>> Upon guest return, all state is restored from hardware onto the VCPU
>> struct and the original state is restored from the Hyp-stack onto the
>> hardware.
>>
>> SMP support using the VMPIDR calculated on the basis of the host MPIDR
>> and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
>>
>> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
>> a separate patch into the appropriate patches introducing the
>> functionality. Note that the VMIDs are stored per VM as required by the ARM
>> architecture reference manual.
>>
>> To support VFP/NEON we trap those instructions using the HCPTR. When
>> we trap, we switch the FPU.  After a guest exit, the VFP state is
>> returned to the host.  When disabling access to floating point
>> instructions, we also mask FPEXC_EN in order to avoid the guest
>> receiving Undefined instruction exceptions before we have a chance to
>> switch back the floating point state.  We are reusing vfp_hard_struct,
>> so we depend on VFPv3 being enabled in the host kernel; if not, we still
>> trap cp10 and cp11 in order to inject an undefined instruction exception
>> whenever the guest tries to use VFP/NEON. VFP/NEON developed by
>> Antonios Motakis and Rusty Russell.
>>
>> Aborts that are permission faults, and not part of a stage-1 page table
>> walk, do not report the faulting address in the HPFAR.  We have to resolve the
>> IPA, and store it just like the HPFAR register on the VCPU struct. If
>> the IPA cannot be resolved, it means another CPU is playing with the
>> page tables, and we simply restart the guest.  This quirk was fixed by
>> Marc Zyngier.
>>
>> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
>> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
>> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  arch/arm/include/asm/kvm_arm.h  |   51 ++++
>>  arch/arm/include/asm/kvm_host.h |   10 +
>>  arch/arm/kernel/asm-offsets.c   |   25 ++
>>  arch/arm/kvm/arm.c              |  187 ++++++++++++++++
>>  arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
>>  arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
>>  6 files changed, 1108 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/arm/kvm/interrupts_head.S
>>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index fb22ee8..a3262a2 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -98,6 +98,18 @@
>>  #define TTBCR_T0SZ   3
>>  #define HTCR_MASK    (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
>>
>> +/* Hyp System Trap Register */
>> +#define HSTR_T(x)    (1 << x)
>> +#define HSTR_TTEE    (1 << 16)
>> +#define HSTR_TJDBX   (1 << 17)
>> +
>> +/* Hyp Coprocessor Trap Register */
>> +#define HCPTR_TCP(x) (1 << x)
>> +#define HCPTR_TCP_MASK       (0x3fff)
>> +#define HCPTR_TASE   (1 << 15)
>> +#define HCPTR_TTA    (1 << 20)
>> +#define HCPTR_TCPAC  (1 << 31)
>> +
>>  /* Hyp Debug Configuration Register bits */
>>  #define HDCR_TDRA    (1 << 11)
>>  #define HDCR_TDOSA   (1 << 10)
>> @@ -144,6 +156,45 @@
>>  #else
>>  #define VTTBR_X              (5 - KVM_T0SZ)
>>  #endif
>> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
>> +#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
>> +#define VTTBR_VMID_SHIFT  (48LLU)
>> +#define VTTBR_VMID_MASK        (0xffLLU << VTTBR_VMID_SHIFT)
>> +
>> +/* Hyp Syndrome Register (HSR) bits */
>> +#define HSR_EC_SHIFT (26)
>> +#define HSR_EC               (0x3fU << HSR_EC_SHIFT)
>> +#define HSR_IL               (1U << 25)
>> +#define HSR_ISS              (HSR_IL - 1)
>> +#define HSR_ISV_SHIFT        (24)
>> +#define HSR_ISV              (1U << HSR_ISV_SHIFT)
>> +#define HSR_FSC              (0x3f)
>> +#define HSR_FSC_TYPE (0x3c)
>> +#define HSR_WNR              (1 << 6)
>> +
>> +#define FSC_FAULT    (0x04)
>> +#define FSC_PERM     (0x0c)
>> +
>> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>> +#define HPFAR_MASK   (~0xf)
>>
>> +#define HSR_EC_UNKNOWN       (0x00)
>> +#define HSR_EC_WFI   (0x01)
>> +#define HSR_EC_CP15_32       (0x03)
>> +#define HSR_EC_CP15_64       (0x04)
>> +#define HSR_EC_CP14_MR       (0x05)
>> +#define HSR_EC_CP14_LS       (0x06)
>> +#define HSR_EC_CP_0_13       (0x07)
>> +#define HSR_EC_CP10_ID       (0x08)
>> +#define HSR_EC_JAZELLE       (0x09)
>> +#define HSR_EC_BXJ   (0x0A)
>> +#define HSR_EC_CP14_64       (0x0C)
>> +#define HSR_EC_SVC_HYP       (0x11)
>> +#define HSR_EC_HVC   (0x12)
>> +#define HSR_EC_SMC   (0x13)
>> +#define HSR_EC_IABT  (0x20)
>> +#define HSR_EC_IABT_HYP      (0x21)
>> +#define HSR_EC_DABT  (0x24)
>> +#define HSR_EC_DABT_HYP      (0x25)
>>
>>  #endif /* __ARM_KVM_ARM_H__ */
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 1de6f0d..ddb09da 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -21,6 +21,7 @@
>>
>>  #include <asm/kvm.h>
>>  #include <asm/kvm_asm.h>
>> +#include <asm/fpstate.h>
>>
>>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>>  #define KVM_USER_MEM_SLOTS 32
>> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
>>       u32 hxfar;              /* Hyp Data/Inst Fault Address Register */
>>       u32 hpfar;              /* Hyp IPA Fault Address Register */
>>
>> +     /* Floating point registers (VFP and Advanced SIMD/NEON) */
>> +     struct vfp_hard_struct vfp_guest;
>> +     struct vfp_hard_struct *vfp_host;
>> +
>> +     /*
>> +      * Anything that is not used directly from assembly code goes
>> +      * here.
>> +      */
>>       /* Interrupt related fields */
>>       u32 irq_lines;          /* IRQ and FIQ levels */
>>
>> @@ -112,6 +121,7 @@ struct kvm_one_reg;
>>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>>  u64 kvm_call_hyp(void *hypfn, ...);
>> +void force_vm_exit(const cpumask_t *mask);
>>
>>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>>  struct kvm;
>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>> index c985b48..c8b3272 100644
>> --- a/arch/arm/kernel/asm-offsets.c
>> +++ b/arch/arm/kernel/asm-offsets.c
>> @@ -13,6 +13,9 @@
>>  #include <linux/sched.h>
>>  #include <linux/mm.h>
>>  #include <linux/dma-mapping.h>
>> +#ifdef CONFIG_KVM_ARM_HOST
>> +#include <linux/kvm_host.h>
>> +#endif
>>  #include <asm/cacheflush.h>
>>  #include <asm/glue-df.h>
>>  #include <asm/glue-pf.h>
>> @@ -146,5 +149,27 @@ int main(void)
>>    DEFINE(DMA_BIDIRECTIONAL,  DMA_BIDIRECTIONAL);
>>    DEFINE(DMA_TO_DEVICE,              DMA_TO_DEVICE);
>>    DEFINE(DMA_FROM_DEVICE,    DMA_FROM_DEVICE);
>> +#ifdef CONFIG_KVM_ARM_HOST
>> +  DEFINE(VCPU_KVM,           offsetof(struct kvm_vcpu, kvm));
>> +  DEFINE(VCPU_MIDR,          offsetof(struct kvm_vcpu, arch.midr));
>> +  DEFINE(VCPU_CP15,          offsetof(struct kvm_vcpu, arch.cp15));
>> +  DEFINE(VCPU_VFP_GUEST,     offsetof(struct kvm_vcpu, arch.vfp_guest));
>> +  DEFINE(VCPU_VFP_HOST,              offsetof(struct kvm_vcpu, arch.vfp_host));
>> +  DEFINE(VCPU_REGS,          offsetof(struct kvm_vcpu, arch.regs));
>> +  DEFINE(VCPU_USR_REGS,              offsetof(struct kvm_vcpu, arch.regs.usr_regs));
>> +  DEFINE(VCPU_SVC_REGS,              offsetof(struct kvm_vcpu, arch.regs.svc_regs));
>> +  DEFINE(VCPU_ABT_REGS,              offsetof(struct kvm_vcpu, arch.regs.abt_regs));
>> +  DEFINE(VCPU_UND_REGS,              offsetof(struct kvm_vcpu, arch.regs.und_regs));
>> +  DEFINE(VCPU_IRQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.irq_regs));
>> +  DEFINE(VCPU_FIQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
>> +  DEFINE(VCPU_PC,            offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
>> +  DEFINE(VCPU_CPSR,          offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
>> +  DEFINE(VCPU_IRQ_LINES,     offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_HSR,           offsetof(struct kvm_vcpu, arch.hsr));
>> +  DEFINE(VCPU_HxFAR,         offsetof(struct kvm_vcpu, arch.hxfar));
>> +  DEFINE(VCPU_HPFAR,         offsetof(struct kvm_vcpu, arch.hpfar));
>> +  DEFINE(VCPU_HYP_PC,                offsetof(struct kvm_vcpu, arch.hyp_pc));
>> +  DEFINE(KVM_VTTBR,          offsetof(struct kvm, arch.vttbr));
>> +#endif
>>    return 0;
>>  }
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 9b4566e..c94d278 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -40,6 +40,7 @@
>>  #include <asm/kvm_arm.h>
>>  #include <asm/kvm_asm.h>
>>  #include <asm/kvm_mmu.h>
>> +#include <asm/kvm_emulate.h>
>>
>>  #ifdef REQUIRES_VIRT
>>  __asm__(".arch_extension     virt");
>> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>>  static unsigned long hyp_default_vectors;
>>
>> +/* The VMID used in the VTTBR */
>> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
>> +static u8 kvm_next_vmid;
>> +static DEFINE_SPINLOCK(kvm_vmid_lock);
>>
>>  int kvm_arch_hardware_enable(void *garbage)
>>  {
>> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
>>
>>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>  {
>> +     /* Force users to call KVM_ARM_VCPU_INIT */
>> +     vcpu->arch.target = -1;
>>       return 0;
>>  }
>>
>> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  {
>>       vcpu->cpu = cpu;
>> +     vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>>  }
>>
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>>
>>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
> As far as I see the function is unused.
>
>>  {
>> +     return v->mode == IN_GUEST_MODE;
>> +}
>> +
>> +/* Just ensure a guest exit from a particular CPU */
>> +static void exit_vm_noop(void *info)
>> +{
>> +}
>> +
>> +void force_vm_exit(const cpumask_t *mask)
>> +{
>> +     smp_call_function_many(mask, exit_vm_noop, NULL, true);
>> +}
> There is make_all_cpus_request() for that. It actually sends IPIs only
> to cpus that are running vcpus.
>
>> +
>> +/**
>> + * need_new_vmid_gen - check that the VMID is still valid
>> + * @kvm: The VM's VMID to check
>> + *
>> + * return true if there is a new generation of VMIDs being used
>> + *
>> + * The hardware supports only 256 values with the value zero reserved for the
>> + * host, so we check if an assigned value belongs to a previous generation,
>> + * which requires us to assign a new value. If we're the first to use a
>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
>> + * CPUs.
>> + */
>> +static bool need_new_vmid_gen(struct kvm *kvm)
>> +{
>> +     return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>> +}
>> +
>> +/**
>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>> + * @kvm      The guest that we are about to run
>> + *
>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
>> + * caches and TLBs.
>> + */
>> +static void update_vttbr(struct kvm *kvm)
>> +{
>> +     phys_addr_t pgd_phys;
>> +     u64 vmid;
>> +
>> +     if (!need_new_vmid_gen(kvm))
>> +             return;
>> +
>> +     spin_lock(&kvm_vmid_lock);
>> +
>> +     /*
>> +      * We need to re-check the vmid_gen here to ensure that if another vcpu
>> +      * already allocated a valid vmid for this vm, then this vcpu should
>> +      * use the same vmid.
>> +      */
>> +     if (!need_new_vmid_gen(kvm)) {
>> +             spin_unlock(&kvm_vmid_lock);
>> +             return;
>> +     }
>> +
>> +     /* First user of a new VMID generation? */
>> +     if (unlikely(kvm_next_vmid == 0)) {
>> +             atomic64_inc(&kvm_vmid_gen);
>> +             kvm_next_vmid = 1;
>> +
>> +             /*
>> +              * On SMP we know no other CPUs can use this CPU's or each
>> +              * other's VMID after force_vm_exit returns since the
>> +              * kvm_vmid_lock blocks them from reentry to the guest.
>> +              */
>> +             force_vm_exit(cpu_all_mask);
>> +             /*
>> +              * Now broadcast TLB + ICACHE invalidation over the inner
>> +              * shareable domain to make sure all data structures are
>> +              * clean.
>> +              */
>> +             kvm_call_hyp(__kvm_flush_vm_context);
>> +     }
>> +
>> +     kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>> +     kvm->arch.vmid = kvm_next_vmid;
>> +     kvm_next_vmid++;
>> +
>> +     /* update vttbr to be used with the new vmid */
>> +     pgd_phys = virt_to_phys(kvm->arch.pgd);
>> +     vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
>> +     kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
>> +     kvm->arch.vttbr |= vmid;
>> +
>> +     spin_unlock(&kvm_vmid_lock);
>> +}
>> +
>> +/*
>> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>> + * proper exit to QEMU.
>> + */
>> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> +                    int exception_index)
>> +{
>> +     run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>>       return 0;
>>  }
>>
>> +/**
>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>> + * @vcpu:    The VCPU pointer
>> + * @run:     The kvm_run structure pointer used for userspace state exchange
>> + *
>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>> + * will execute VM code in a loop until the time slice for the process is used
>> + * or some emulation is needed from user space in which case the function will
>> + * return with return value 0 and with the kvm_run structure filled in with the
>> + * required data for the requested emulation.
>> + */
>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -     return -EINVAL;
>> +     int ret;
>> +     sigset_t sigsaved;
>> +
>> +     /* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
>> +     if (unlikely(vcpu->arch.target < 0))
>> +             return -ENOEXEC;
>> +
>> +     if (vcpu->sigset_active)
>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>> +
>> +     ret = 1;
>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
>> +     while (ret > 0) {
>> +             /*
>> +              * Check conditions before entering the guest
>> +              */
>> +             cond_resched();
>> +
>> +             update_vttbr(vcpu->kvm);
>> +
>> +             local_irq_disable();
>> +
>> +             /*
>> +              * Re-check atomic conditions
>> +              */
>> +             if (signal_pending(current)) {
>> +                     ret = -EINTR;
>> +                     run->exit_reason = KVM_EXIT_INTR;
>> +             }
>> +
>> +             if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>> +                     local_irq_enable();
>> +                     continue;
>> +             }
>> +
>> +             /**************************************************************
>> +              * Enter the guest
>> +              */
>> +             trace_kvm_entry(*vcpu_pc(vcpu));
>> +             kvm_guest_enter();
>> +             vcpu->mode = IN_GUEST_MODE;
> You need to set mode to IN_GUEST_MODE before disabling interrupts and
> check that mode != EXITING_GUEST_MODE after disabling interrupts but
> before entering the guest. This way you will catch kicks that were sent
> between setting of the mode and disabling the interrupts. Also you need
> to check vcpu->requests and exit if it is not empty. I see that you do
> not use vcpu->requests at all, but you should since common kvm code
> assumes that it is used. make_all_cpus_request() uses it for instance.
>

I don't quite agree, but almost:

Why would you set IN_GUEST_MODE before disabling interrupts? The only
reason I can see for it to be a requirement is to leverage an implicit
memory barrier. Receiving the IPI in this little window does nothing
(the smp_cross_call is a noop).

Checking that mode != EXITING_GUEST_MODE is equally useless in my
opinion; as I read the requests code, the only reason for this mode is
to avoid sending an IPI twice.

Kicks sent between setting the mode and disabling the interrupts are
not the point; the point is to check the requests field (which we
don't use at all on ARM, and which generic code also doesn't use on ARM)
after disabling interrupts and after setting IN_GUEST_MODE.

The patch below fixes your issues, and while I would push back on
anything other than direct bug fixes at this point, the current code is
semantically incorrect with respect to KVM vcpu requests, so it's worth
a fix, and the patch itself is trivial.
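
To make the argument concrete, what I care about boils down to the
following ordering (a simplified sketch only; the real change is in the
first patch pasted at the end of this mail):

	vcpu->mode = IN_GUEST_MODE;
	local_irq_disable();

	/* only now is it safe to sample the exit conditions */
	if (signal_pending(current) || vcpu->requests) {
		vcpu->mode = OUTSIDE_GUEST_MODE;
		local_irq_enable();
		/* bail out / retry instead of entering the guest */
	} else {
		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
	}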

>> +
>> +             ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> You do not take kvm->srcu lock before entering the guest. It looks
> wrong.
>

Why would I take that before entering the guest? The only thing the
read-side RCU protects is the memslots data structure, as far as I can
see, so the second patch pasted below fixes this for the code that
actually accesses this data structure.
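
Concretely, that fix just wraps the memslot lookup in the srcu read-side
lock; a sketch of the shape of the change (the full diff is pasted
below):

	idx = srcu_read_lock(&vcpu->kvm->srcu);
	memslot = gfn_to_memslot(vcpu->kvm, gfn);
	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
	srcu_read_unlock(&vcpu->kvm->srcu, idx);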

>> +
>> +             vcpu->mode = OUTSIDE_GUEST_MODE;
>> +             kvm_guest_exit();
>> +             trace_kvm_exit(*vcpu_pc(vcpu));
>> +             /*
>> +              * We may have taken a host interrupt in HYP mode (ie
>> +              * while executing the guest). This interrupt is still
>> +              * pending, as we haven't serviced it yet!
>> +              *
>> +              * We're now back in SVC mode, with interrupts
>> +              * disabled.  Enabling the interrupts now will have
>> +              * the effect of taking the interrupt again, in SVC
>> +              * mode this time.
>> +              */
>> +             local_irq_enable();
>> +
>> +             /*
>> +              * Back from guest
>> +              *************************************************************/
>> +
>> +             ret = handle_exit(vcpu, run, ret);
>> +     }
>> +
>> +     if (vcpu->sigset_active)
>> +             sigprocmask(SIG_SETMASK, &sigsaved, NULL);
>> +     return ret;
>>  }
>>
>>  static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index a923590..08adcd5 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -20,9 +20,12 @@
>>  #include <linux/const.h>
>>  #include <asm/unified.h>
>>  #include <asm/page.h>
>> +#include <asm/ptrace.h>
>>  #include <asm/asm-offsets.h>
>>  #include <asm/kvm_asm.h>
>>  #include <asm/kvm_arm.h>
>> +#include <asm/vfpmacros.h>
>> +#include "interrupts_head.S"
>>
>>       .text
>>
>> @@ -31,36 +34,423 @@ __kvm_hyp_code_start:
>>
>>  /********************************************************************
>>   * Flush per-VMID TLBs
>> + *
>> + * void __kvm_tlb_flush_vmid(struct kvm *kvm);
>> + *
>> + * We rely on the hardware to broadcast the TLB invalidation to all CPUs
>> + * inside the inner-shareable domain (which is the case for all v7
>> + * implementations).  If we come across a non-IS SMP implementation, we'll
>> + * have to use an IPI based mechanism. Until then, we stick to the simple
>> + * hardware assisted version.
>>   */
>>  ENTRY(__kvm_tlb_flush_vmid)
>> +     push    {r2, r3}
>> +
>> +     add     r0, r0, #KVM_VTTBR
>> +     ldrd    r2, r3, [r0]
>> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>> +     isb
>> +     mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
>> +     dsb
>> +     isb
>> +     mov     r2, #0
>> +     mov     r3, #0
>> +     mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
>> +     isb                             @ Not necessary if followed by eret
>> +
>> +     pop     {r2, r3}
>>       bx      lr
>>  ENDPROC(__kvm_tlb_flush_vmid)
>>
>>  /********************************************************************
>> - * Flush TLBs and instruction caches of current CPU for all VMIDs
>> + * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>> + * domain, for all VMIDs
>> + *
>> + * void __kvm_flush_vm_context(void);
>>   */
>>  ENTRY(__kvm_flush_vm_context)
>> +     mov     r0, #0                  @ rn parameter for c15 flushes is SBZ
>> +
>> +     /* Invalidate NS Non-Hyp TLB Inner Shareable (TLBIALLNSNHIS) */
>> +     mcr     p15, 4, r0, c8, c3, 4
>> +     /* Invalidate instruction caches Inner Shareable (ICIALLUIS) */
>> +     mcr     p15, 0, r0, c7, c1, 0
>> +     dsb
>> +     isb                             @ Not necessary if followed by eret
>> +
>>       bx      lr
>>  ENDPROC(__kvm_flush_vm_context)
>>
>> +
>>  /********************************************************************
>>   *  Hypervisor world-switch code
>> + *
>> + *
>> + * int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>>   */
>>  ENTRY(__kvm_vcpu_run)
>> -     bx      lr
>> +     @ Save the vcpu pointer
>> +     mcr     p15, 4, vcpu, c13, c0, 2        @ HTPIDR
>> +
>> +     save_host_regs
>> +
>> +     @ Store hardware CP15 state and load guest state
>> +     read_cp15_state store_to_vcpu = 0
>> +     write_cp15_state read_from_vcpu = 1
>> +
>> +     @ If the host kernel has not been configured with VFPv3 support,
>> +     @ then it is safer if we deny guests from using it as well.
>> +#ifdef CONFIG_VFPv3
>> +     @ Set FPEXC_EN so the guest doesn't trap floating point instructions
>> +     VFPFMRX r2, FPEXC               @ VMRS
>> +     push    {r2}
>> +     orr     r2, r2, #FPEXC_EN
>> +     VFPFMXR FPEXC, r2               @ VMSR
>> +#endif
>> +
>> +     @ Configure Hyp-role
>> +     configure_hyp_role vmentry
>> +
>> +     @ Trap coprocessor CRx accesses
>> +     set_hstr vmentry
>> +     set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>> +     set_hdcr vmentry
>> +
>> +     @ Write configured ID register into MIDR alias
>> +     ldr     r1, [vcpu, #VCPU_MIDR]
>> +     mcr     p15, 4, r1, c0, c0, 0
>> +
>> +     @ Write guest view of MPIDR into VMPIDR
>> +     ldr     r1, [vcpu, #CP15_OFFSET(c0_MPIDR)]
>> +     mcr     p15, 4, r1, c0, c0, 5
>> +
>> +     @ Set up guest memory translation
>> +     ldr     r1, [vcpu, #VCPU_KVM]
>> +     add     r1, r1, #KVM_VTTBR
>> +     ldrd    r2, r3, [r1]
>> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>> +
>> +     @ We're all done, just restore the GPRs and go to the guest
>> +     restore_guest_regs
>> +     clrex                           @ Clear exclusive monitor
>> +     eret
>> +
>> +__kvm_vcpu_return:
>> +     /*
>> +      * return convention:
>> +      * guest r0, r1, r2 saved on the stack
>> +      * r0: vcpu pointer
>> +      * r1: exception code
>> +      */
>> +     save_guest_regs
>> +
>> +     @ Set VMID == 0
>> +     mov     r2, #0
>> +     mov     r3, #0
>> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
>> +
>> +     @ Don't trap coprocessor accesses for host kernel
>> +     set_hstr vmexit
>> +     set_hdcr vmexit
>> +     set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
>> +
>> +#ifdef CONFIG_VFPv3
>> +     @ Save floating point registers if we let the guest use them.
>> +     tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
>> +     bne     after_vfp_restore
>> +
>> +     @ Switch VFP/NEON hardware state to the host's
>> +     add     r7, vcpu, #VCPU_VFP_GUEST
>> +     store_vfp_state r7
>> +     add     r7, vcpu, #VCPU_VFP_HOST
>> +     ldr     r7, [r7]
>> +     restore_vfp_state r7
>> +
>> +after_vfp_restore:
>> +     @ Restore FPEXC_EN which we clobbered on entry
>> +     pop     {r2}
>> +     VFPFMXR FPEXC, r2
>> +#endif
>> +
>> +     @ Reset Hyp-role
>> +     configure_hyp_role vmexit
>> +
>> +     @ Let host read hardware MIDR
>> +     mrc     p15, 0, r2, c0, c0, 0
>> +     mcr     p15, 4, r2, c0, c0, 0
>> +
>> +     @ Back to hardware MPIDR
>> +     mrc     p15, 0, r2, c0, c0, 5
>> +     mcr     p15, 4, r2, c0, c0, 5
>> +
>> +     @ Store guest CP15 state and restore host state
>> +     read_cp15_state store_to_vcpu = 1
>> +     write_cp15_state read_from_vcpu = 0
>> +
>> +     restore_host_regs
>> +     clrex                           @ Clear exclusive monitor
>> +     mov     r0, r1                  @ Return the return code
>> +     bx      lr                      @ return to IOCTL
>>
>>  ENTRY(kvm_call_hyp)
>> +     hvc     #0
>>       bx      lr
>>
>>
>>  /********************************************************************
>>   * Hypervisor exception vector and handlers
>> + *
>> + *
>> + * The KVM/ARM Hypervisor ABI is defined as follows:
>> + *
>> + * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
>> + * instruction is issued since all traps are disabled when running the host
>> + * kernel as per the Hyp-mode initialization at boot time.
>> + *
>> + * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
>> + * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
>> + * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
>> + * instructions are called from within Hyp-mode.
>> + *
>> + * Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
>> + *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
>> + *    exception vector code will check that the HVC comes from VMID==0 and if
>> + *    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
>> + *    - r0 contains a pointer to a HYP function
>> + *    - r1, r2, and r3 contain arguments to the above function.
>> + *    - The HYP function will be called with its arguments in r0, r1 and r2.
>> + *    On HYP function return, we return directly to SVC.
>> + *
>> + * Note that the above is used to execute code in Hyp-mode from a host-kernel
>> + * point of view, and is a different concept from performing a world-switch and
>> + * executing guest code in SVC mode (with a VMID != 0).
>>   */
>>
>> +/* Handle undef, svc, pabt, or dabt by crashing with a user notice */
>> +.macro bad_exception exception_code, panic_str
>> +     push    {r0-r2}
>> +     mrrc    p15, 6, r0, r1, c2      @ Read VTTBR
>> +     lsr     r1, r1, #16
>> +     ands    r1, r1, #0xff
>> +     beq     99f
>> +
>> +     load_vcpu                       @ Load VCPU pointer
>> +     .if \exception_code == ARM_EXCEPTION_DATA_ABORT
>> +     mrc     p15, 4, r2, c5, c2, 0   @ HSR
>> +     mrc     p15, 4, r1, c6, c0, 0   @ HDFAR
>> +     str     r2, [vcpu, #VCPU_HSR]
>> +     str     r1, [vcpu, #VCPU_HxFAR]
>> +     .endif
>> +     .if \exception_code == ARM_EXCEPTION_PREF_ABORT
>> +     mrc     p15, 4, r2, c5, c2, 0   @ HSR
>> +     mrc     p15, 4, r1, c6, c0, 2   @ HIFAR
>> +     str     r2, [vcpu, #VCPU_HSR]
>> +     str     r1, [vcpu, #VCPU_HxFAR]
>> +     .endif
>> +     mov     r1, #\exception_code
>> +     b       __kvm_vcpu_return
>> +
>> +     @ We were in the host already. Let's craft a panic-ing return to SVC.
>> +99:  mrs     r2, cpsr
>> +     bic     r2, r2, #MODE_MASK
>> +     orr     r2, r2, #SVC_MODE
>> +THUMB(       orr     r2, r2, #PSR_T_BIT      )
>> +     msr     spsr_cxsf, r2
>> +     mrs     r1, ELR_hyp
>> +     ldr     r2, =BSYM(panic)
>> +     msr     ELR_hyp, r2
>> +     ldr     r0, =\panic_str
>> +     eret
>> +.endm
>> +
>> +     .text
>> +
>>       .align 5
>>  __kvm_hyp_vector:
>>       .globl __kvm_hyp_vector
>> -     nop
>> +
>> +     @ Hyp-mode exception vector
>> +     W(b)    hyp_reset
>> +     W(b)    hyp_undef
>> +     W(b)    hyp_svc
>> +     W(b)    hyp_pabt
>> +     W(b)    hyp_dabt
>> +     W(b)    hyp_hvc
>> +     W(b)    hyp_irq
>> +     W(b)    hyp_fiq
>> +
>> +     .align
>> +hyp_reset:
>> +     b       hyp_reset
>> +
>> +     .align
>> +hyp_undef:
>> +     bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
>> +
>> +     .align
>> +hyp_svc:
>> +     bad_exception ARM_EXCEPTION_HVC, svc_die_str
>> +
>> +     .align
>> +hyp_pabt:
>> +     bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
>> +
>> +     .align
>> +hyp_dabt:
>> +     bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
>> +
>> +     .align
>> +hyp_hvc:
>> +     /*
>> +      * Getting here is either because of a trap from a guest or from calling
>> +      * HVC from the host kernel, which means "switch to Hyp mode".
>> +      */
>> +     push    {r0, r1, r2}
>> +
>> +     @ Check syndrome register
>> +     mrc     p15, 4, r1, c5, c2, 0   @ HSR
>> +     lsr     r0, r1, #HSR_EC_SHIFT
>> +#ifdef CONFIG_VFPv3
>> +     cmp     r0, #HSR_EC_CP_0_13
>> +     beq     switch_to_guest_vfp
>> +#endif
>> +     cmp     r0, #HSR_EC_HVC
>> +     bne     guest_trap              @ Not HVC instr.
>> +
>> +     /*
>> +      * Let's check if the HVC came from VMID 0 and allow simple
>> +      * switch to Hyp mode
>> +      */
>> +     mrrc    p15, 6, r0, r2, c2
>> +     lsr     r2, r2, #16
>> +     and     r2, r2, #0xff
>> +     cmp     r2, #0
>> +     bne     guest_trap              @ Guest called HVC
>> +
>> +host_switch_to_hyp:
>> +     pop     {r0, r1, r2}
>> +
>> +     push    {lr}
>> +     mrs     lr, SPSR
>> +     push    {lr}
>> +
>> +     mov     lr, r0
>> +     mov     r0, r1
>> +     mov     r1, r2
>> +     mov     r2, r3
>> +
>> +THUMB(       orr     lr, #1)
>> +     blx     lr                      @ Call the HYP function
>> +
>> +     pop     {lr}
>> +     msr     SPSR_csxf, lr
>> +     pop     {lr}
>> +     eret
>> +
>> +guest_trap:
>> +     load_vcpu                       @ Load VCPU pointer to r0
>> +     str     r1, [vcpu, #VCPU_HSR]
>> +
>> +     @ Check if we need the fault information
>> +     lsr     r1, r1, #HSR_EC_SHIFT
>> +     cmp     r1, #HSR_EC_IABT
>> +     mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
>> +     beq     2f
>> +     cmp     r1, #HSR_EC_DABT
>> +     bne     1f
>> +     mrc     p15, 4, r2, c6, c0, 0   @ HDFAR
>> +
>> +2:   str     r2, [vcpu, #VCPU_HxFAR]
>> +
>> +     /*
>> +      * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
>> +      *
>> +      * Abort on the stage 2 translation for a memory access from a
>> +      * Non-secure PL1 or PL0 mode:
>> +      *
>> +      * For any Access flag fault or Translation fault, and also for any
>> +      * Permission fault on the stage 2 translation of a memory access
>> +      * made as part of a translation table walk for a stage 1 translation,
>> +      * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
>> +      * is UNKNOWN.
>> +      */
>> +
>> +     /* Check for permission fault, and S1PTW */
>> +     mrc     p15, 4, r1, c5, c2, 0   @ HSR
>> +     and     r0, r1, #HSR_FSC_TYPE
>> +     cmp     r0, #FSC_PERM
>> +     tsteq   r1, #(1 << 7)           @ S1PTW
>> +     mrcne   p15, 4, r2, c6, c0, 4   @ HPFAR
>> +     bne     3f
>> +
>> +     /* Resolve IPA using the xFAR */
>> +     mcr     p15, 0, r2, c7, c8, 0   @ ATS1CPR
>> +     isb
>> +     mrrc    p15, 0, r0, r1, c7      @ PAR
>> +     tst     r0, #1
>> +     bne     4f                      @ Failed translation
>> +     ubfx    r2, r0, #12, #20
>> +     lsl     r2, r2, #4
>> +     orr     r2, r2, r1, lsl #24
>> +
>> +3:   load_vcpu                       @ Load VCPU pointer to r0
>> +     str     r2, [r0, #VCPU_HPFAR]
>> +
>> +1:   mov     r1, #ARM_EXCEPTION_HVC
>> +     b       __kvm_vcpu_return
>> +
>> +4:   pop     {r0, r1, r2}            @ Failed translation, return to guest
>> +     eret
>> +
>> +/*
>> + * If VFPv3 support is not available, then we will not switch the VFP
>> + * registers; however cp10 and cp11 accesses will still trap and fallback
>> + * to the regular coprocessor emulation code, which currently will
>> + * inject an undefined exception to the guest.
>> + */
>> +#ifdef CONFIG_VFPv3
>> +switch_to_guest_vfp:
>> +     load_vcpu                       @ Load VCPU pointer to r0
>> +     push    {r3-r7}
>> +
>> +     @ NEON/VFP used.  Turn on VFP access.
>> +     set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
>> +
>> +     @ Switch VFP/NEON hardware state to the guest's
>> +     add     r7, r0, #VCPU_VFP_HOST
>> +     ldr     r7, [r7]
>> +     store_vfp_state r7
>> +     add     r7, r0, #VCPU_VFP_GUEST
>> +     restore_vfp_state r7
>> +
>> +     pop     {r3-r7}
>> +     pop     {r0-r2}
>> +     eret
>> +#endif
>> +
>> +     .align
>> +hyp_irq:
>> +     push    {r0, r1, r2}
>> +     mov     r1, #ARM_EXCEPTION_IRQ
>> +     load_vcpu                       @ Load VCPU pointer to r0
>> +     b       __kvm_vcpu_return
>> +
>> +     .align
>> +hyp_fiq:
>> +     b       hyp_fiq
>> +
>> +     .ltorg
>>
>>  __kvm_hyp_code_end:
>>       .globl  __kvm_hyp_code_end
>> +
>> +     .section ".rodata"
>> +
>> +und_die_str:
>> +     .ascii  "unexpected undefined exception in Hyp mode at: %#08x"
>> +pabt_die_str:
>> +     .ascii  "unexpected prefetch abort in Hyp mode at: %#08x"
>> +dabt_die_str:
>> +     .ascii  "unexpected data abort in Hyp mode at: %#08x"
>> +svc_die_str:
>> +     .ascii  "unexpected HVC/SVC trap in Hyp mode at: %#08x"
>> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
>> new file mode 100644
>> index 0000000..f59a580
>> --- /dev/null
>> +++ b/arch/arm/kvm/interrupts_head.S
>> @@ -0,0 +1,443 @@
>> +#define VCPU_USR_REG(_reg_nr)        (VCPU_USR_REGS + (_reg_nr * 4))
>> +#define VCPU_USR_SP          (VCPU_USR_REG(13))
>> +#define VCPU_USR_LR          (VCPU_USR_REG(14))
>> +#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))
>> +
>> +/*
>> + * Many of these macros need to access the VCPU structure, which is always
>> + * held in r0. These macros should never clobber r1, as it is used to hold the
>> + * exception code on the return path (except of course the macro that switches
>> + * all the registers before the final jump to the VM).
>> + */
>> +vcpu .req    r0              @ vcpu pointer always in r0
>> +
>> +/* Clobbers {r2-r6} */
>> +.macro store_vfp_state vfp_base
>> +     @ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
>> +     VFPFMRX r2, FPEXC
>> +     @ Make sure VFP is enabled so we can touch the registers.
>> +     orr     r6, r2, #FPEXC_EN
>> +     VFPFMXR FPEXC, r6
>> +
>> +     VFPFMRX r3, FPSCR
>> +     tst     r2, #FPEXC_EX           @ Check for VFP Subarchitecture
>> +     beq     1f
>> +     @ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
>> +     @ we only need to save them if FPEXC_EX is set.
>> +     VFPFMRX r4, FPINST
>> +     tst     r2, #FPEXC_FP2V
>> +     VFPFMRX r5, FPINST2, ne         @ vmrsne
>> +     bic     r6, r2, #FPEXC_EX       @ FPEXC_EX disable
>> +     VFPFMXR FPEXC, r6
>> +1:
>> +     VFPFSTMIA \vfp_base, r6         @ Save VFP registers
>> +     stm     \vfp_base, {r2-r5}      @ Save FPEXC, FPSCR, FPINST, FPINST2
>> +.endm
>> +
>> +/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
>> +.macro restore_vfp_state vfp_base
>> +     VFPFLDMIA \vfp_base, r6         @ Load VFP registers
>> +     ldm     \vfp_base, {r2-r5}      @ Load FPEXC, FPSCR, FPINST, FPINST2
>> +
>> +     VFPFMXR FPSCR, r3
>> +     tst     r2, #FPEXC_EX           @ Check for VFP Subarchitecture
>> +     beq     1f
>> +     VFPFMXR FPINST, r4
>> +     tst     r2, #FPEXC_FP2V
>> +     VFPFMXR FPINST2, r5, ne
>> +1:
>> +     VFPFMXR FPEXC, r2       @ FPEXC (last, in case !EN)
>> +.endm
>> +
>> +/* These are simply for the macros to work - values don't have meaning */
>> +.equ usr, 0
>> +.equ svc, 1
>> +.equ abt, 2
>> +.equ und, 3
>> +.equ irq, 4
>> +.equ fiq, 5
>> +
>> +.macro push_host_regs_mode mode
>> +     mrs     r2, SP_\mode
>> +     mrs     r3, LR_\mode
>> +     mrs     r4, SPSR_\mode
>> +     push    {r2, r3, r4}
>> +.endm
>> +
>> +/*
>> + * Store all host persistent registers on the stack.
>> + * Clobbers all registers, in all modes, except r0 and r1.
>> + */
>> +.macro save_host_regs
>> +     /* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
>> +     mrs     r2, ELR_hyp
>> +     push    {r2}
>> +
>> +     /* usr regs */
>> +     push    {r4-r12}        @ r0-r3 are always clobbered
>> +     mrs     r2, SP_usr
>> +     mov     r3, lr
>> +     push    {r2, r3}
>> +
>> +     push_host_regs_mode svc
>> +     push_host_regs_mode abt
>> +     push_host_regs_mode und
>> +     push_host_regs_mode irq
>> +
>> +     /* fiq regs */
>> +     mrs     r2, r8_fiq
>> +     mrs     r3, r9_fiq
>> +     mrs     r4, r10_fiq
>> +     mrs     r5, r11_fiq
>> +     mrs     r6, r12_fiq
>> +     mrs     r7, SP_fiq
>> +     mrs     r8, LR_fiq
>> +     mrs     r9, SPSR_fiq
>> +     push    {r2-r9}
>> +.endm
>> +
>> +.macro pop_host_regs_mode mode
>> +     pop     {r2, r3, r4}
>> +     msr     SP_\mode, r2
>> +     msr     LR_\mode, r3
>> +     msr     SPSR_\mode, r4
>> +.endm
>> +
>> +/*
>> + * Restore all host registers from the stack.
>> + * Clobbers all registers, in all modes, except r0 and r1.
>> + */
>> +.macro restore_host_regs
>> +     pop     {r2-r9}
>> +     msr     r8_fiq, r2
>> +     msr     r9_fiq, r3
>> +     msr     r10_fiq, r4
>> +     msr     r11_fiq, r5
>> +     msr     r12_fiq, r6
>> +     msr     SP_fiq, r7
>> +     msr     LR_fiq, r8
>> +     msr     SPSR_fiq, r9
>> +
>> +     pop_host_regs_mode irq
>> +     pop_host_regs_mode und
>> +     pop_host_regs_mode abt
>> +     pop_host_regs_mode svc
>> +
>> +     pop     {r2, r3}
>> +     msr     SP_usr, r2
>> +     mov     lr, r3
>> +     pop     {r4-r12}
>> +
>> +     pop     {r2}
>> +     msr     ELR_hyp, r2
>> +.endm
>> +
>> +/*
>> + * Restore SP, LR and SPSR for a given mode. offset is the offset of
>> + * this mode's registers from the VCPU base.
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers r1, r2, r3, r4.
>> + */
>> +.macro restore_guest_regs_mode mode, offset
>> +     add     r1, vcpu, \offset
>> +     ldm     r1, {r2, r3, r4}
>> +     msr     SP_\mode, r2
>> +     msr     LR_\mode, r3
>> +     msr     SPSR_\mode, r4
>> +.endm
>> +
>> +/*
>> + * Restore all guest registers from the vcpu struct.
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers *all* registers.
>> + */
>> +.macro restore_guest_regs
>> +     restore_guest_regs_mode svc, #VCPU_SVC_REGS
>> +     restore_guest_regs_mode abt, #VCPU_ABT_REGS
>> +     restore_guest_regs_mode und, #VCPU_UND_REGS
>> +     restore_guest_regs_mode irq, #VCPU_IRQ_REGS
>> +
>> +     add     r1, vcpu, #VCPU_FIQ_REGS
>> +     ldm     r1, {r2-r9}
>> +     msr     r8_fiq, r2
>> +     msr     r9_fiq, r3
>> +     msr     r10_fiq, r4
>> +     msr     r11_fiq, r5
>> +     msr     r12_fiq, r6
>> +     msr     SP_fiq, r7
>> +     msr     LR_fiq, r8
>> +     msr     SPSR_fiq, r9
>> +
>> +     @ Load return state
>> +     ldr     r2, [vcpu, #VCPU_PC]
>> +     ldr     r3, [vcpu, #VCPU_CPSR]
>> +     msr     ELR_hyp, r2
>> +     msr     SPSR_cxsf, r3
>> +
>> +     @ Load user registers
>> +     ldr     r2, [vcpu, #VCPU_USR_SP]
>> +     ldr     r3, [vcpu, #VCPU_USR_LR]
>> +     msr     SP_usr, r2
>> +     mov     lr, r3
>> +     add     vcpu, vcpu, #(VCPU_USR_REGS)
>> +     ldm     vcpu, {r0-r12}
>> +.endm
>> +
>> +/*
>> + * Save SP, LR and SPSR for a given mode. offset is the offset of
>> + * this mode's registers from the VCPU base.
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers r2, r3, r4, r5.
>> + */
>> +.macro save_guest_regs_mode mode, offset
>> +     add     r2, vcpu, \offset
>> +     mrs     r3, SP_\mode
>> +     mrs     r4, LR_\mode
>> +     mrs     r5, SPSR_\mode
>> +     stm     r2, {r3, r4, r5}
>> +.endm
>> +
>> +/*
>> + * Save all guest registers to the vcpu struct
>> + * Expects guest's r0, r1, r2 on the stack.
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers r2, r3, r4, r5.
>> + */
>> +.macro save_guest_regs
>> +     @ Store usr registers
>> +     add     r2, vcpu, #VCPU_USR_REG(3)
>> +     stm     r2, {r3-r12}
>> +     add     r2, vcpu, #VCPU_USR_REG(0)
>> +     pop     {r3, r4, r5}            @ r0, r1, r2
>> +     stm     r2, {r3, r4, r5}
>> +     mrs     r2, SP_usr
>> +     mov     r3, lr
>> +     str     r2, [vcpu, #VCPU_USR_SP]
>> +     str     r3, [vcpu, #VCPU_USR_LR]
>> +
>> +     @ Store return state
>> +     mrs     r2, ELR_hyp
>> +     mrs     r3, spsr
>> +     str     r2, [vcpu, #VCPU_PC]
>> +     str     r3, [vcpu, #VCPU_CPSR]
>> +
>> +     @ Store other guest registers
>> +     save_guest_regs_mode svc, #VCPU_SVC_REGS
>> +     save_guest_regs_mode abt, #VCPU_ABT_REGS
>> +     save_guest_regs_mode und, #VCPU_UND_REGS
>> +     save_guest_regs_mode irq, #VCPU_IRQ_REGS
>> +.endm
>> +
>> +/* Reads cp15 registers from hardware and stores them in memory
>> + * @store_to_vcpu: If 0, registers are written in-order to the stack,
>> + *              otherwise to the VCPU struct pointed to by vcpup
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + *
>> + * Clobbers r2 - r12
>> + */
>> +.macro read_cp15_state store_to_vcpu
>> +     mrc     p15, 0, r2, c1, c0, 0   @ SCTLR
>> +     mrc     p15, 0, r3, c1, c0, 2   @ CPACR
>> +     mrc     p15, 0, r4, c2, c0, 2   @ TTBCR
>> +     mrc     p15, 0, r5, c3, c0, 0   @ DACR
>> +     mrrc    p15, 0, r6, r7, c2      @ TTBR 0
>> +     mrrc    p15, 1, r8, r9, c2      @ TTBR 1
>> +     mrc     p15, 0, r10, c10, c2, 0 @ PRRR
>> +     mrc     p15, 0, r11, c10, c2, 1 @ NMRR
>> +     mrc     p15, 2, r12, c0, c0, 0  @ CSSELR
>> +
>> +     .if \store_to_vcpu == 0
>> +     push    {r2-r12}                @ Push CP15 registers
>> +     .else
>> +     str     r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
>> +     str     r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
>> +     str     r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
>> +     str     r5, [vcpu, #CP15_OFFSET(c3_DACR)]
>> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
>> +     strd    r6, r7, [vcpu]
>> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
>> +     strd    r8, r9, [vcpu]
>> +     sub     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
>> +     str     r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
>> +     str     r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
>> +     str     r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
>> +     .endif
>> +
>> +     mrc     p15, 0, r2, c13, c0, 1  @ CID
>> +     mrc     p15, 0, r3, c13, c0, 2  @ TID_URW
>> +     mrc     p15, 0, r4, c13, c0, 3  @ TID_URO
>> +     mrc     p15, 0, r5, c13, c0, 4  @ TID_PRIV
>> +     mrc     p15, 0, r6, c5, c0, 0   @ DFSR
>> +     mrc     p15, 0, r7, c5, c0, 1   @ IFSR
>> +     mrc     p15, 0, r8, c5, c1, 0   @ ADFSR
>> +     mrc     p15, 0, r9, c5, c1, 1   @ AIFSR
>> +     mrc     p15, 0, r10, c6, c0, 0  @ DFAR
>> +     mrc     p15, 0, r11, c6, c0, 2  @ IFAR
>> +     mrc     p15, 0, r12, c12, c0, 0 @ VBAR
>> +
>> +     .if \store_to_vcpu == 0
>> +     push    {r2-r12}                @ Push CP15 registers
>> +     .else
>> +     str     r2, [vcpu, #CP15_OFFSET(c13_CID)]
>> +     str     r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
>> +     str     r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
>> +     str     r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
>> +     str     r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
>> +     str     r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
>> +     str     r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
>> +     str     r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
>> +     str     r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
>> +     str     r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
>> +     str     r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
>> +     .endif
>> +.endm
>> +
>> +/*
>> + * Reads cp15 registers from memory and writes them to hardware
>> + * @read_from_vcpu: If 0, registers are read in-order from the stack,
>> + *               otherwise from the VCPU struct pointed to by vcpup
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + */
>> +.macro write_cp15_state read_from_vcpu
>> +     .if \read_from_vcpu == 0
>> +     pop     {r2-r12}
>> +     .else
>> +     ldr     r2, [vcpu, #CP15_OFFSET(c13_CID)]
>> +     ldr     r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
>> +     ldr     r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
>> +     ldr     r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
>> +     ldr     r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
>> +     ldr     r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
>> +     ldr     r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
>> +     ldr     r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
>> +     ldr     r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
>> +     ldr     r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
>> +     ldr     r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
>> +     .endif
>> +
>> +     mcr     p15, 0, r2, c13, c0, 1  @ CID
>> +     mcr     p15, 0, r3, c13, c0, 2  @ TID_URW
>> +     mcr     p15, 0, r4, c13, c0, 3  @ TID_URO
>> +     mcr     p15, 0, r5, c13, c0, 4  @ TID_PRIV
>> +     mcr     p15, 0, r6, c5, c0, 0   @ DFSR
>> +     mcr     p15, 0, r7, c5, c0, 1   @ IFSR
>> +     mcr     p15, 0, r8, c5, c1, 0   @ ADFSR
>> +     mcr     p15, 0, r9, c5, c1, 1   @ AIFSR
>> +     mcr     p15, 0, r10, c6, c0, 0  @ DFAR
>> +     mcr     p15, 0, r11, c6, c0, 2  @ IFAR
>> +     mcr     p15, 0, r12, c12, c0, 0 @ VBAR
>> +
>> +     .if \read_from_vcpu == 0
>> +     pop     {r2-r12}
>> +     .else
>> +     ldr     r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
>> +     ldr     r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
>> +     ldr     r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
>> +     ldr     r5, [vcpu, #CP15_OFFSET(c3_DACR)]
>> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
>> +     ldrd    r6, r7, [vcpu]
>> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
>> +     ldrd    r8, r9, [vcpu]
>> +     sub     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
>> +     ldr     r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
>> +     ldr     r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
>> +     ldr     r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
>> +     .endif
>> +
>> +     mcr     p15, 0, r2, c1, c0, 0   @ SCTLR
>> +     mcr     p15, 0, r3, c1, c0, 2   @ CPACR
>> +     mcr     p15, 0, r4, c2, c0, 2   @ TTBCR
>> +     mcr     p15, 0, r5, c3, c0, 0   @ DACR
>> +     mcrr    p15, 0, r6, r7, c2      @ TTBR 0
>> +     mcrr    p15, 1, r8, r9, c2      @ TTBR 1
>> +     mcr     p15, 0, r10, c10, c2, 0 @ PRRR
>> +     mcr     p15, 0, r11, c10, c2, 1 @ NMRR
>> +     mcr     p15, 2, r12, c0, c0, 0  @ CSSELR
>> +.endm
>> +
>> +/*
>> + * Save the VGIC CPU state into memory
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + */
>> +.macro save_vgic_state
>> +.endm
>> +
>> +/*
>> + * Restore the VGIC CPU state from memory
>> + *
>> + * Assumes vcpu pointer in vcpu reg
>> + */
>> +.macro restore_vgic_state
>> +.endm
>> +
>> +.equ vmentry,        0
>> +.equ vmexit, 1
>> +
>> +/* Configures the HSTR (Hyp System Trap Register) on entry/return
>> + * (hardware reset value is 0) */
>> +.macro set_hstr operation
>> +     mrc     p15, 4, r2, c1, c1, 3
>> +     ldr     r3, =HSTR_T(15)
>> +     .if \operation == vmentry
>> +     orr     r2, r2, r3              @ Trap CR{15}
>> +     .else
>> +     bic     r2, r2, r3              @ Don't trap any CRx accesses
>> +     .endif
>> +     mcr     p15, 4, r2, c1, c1, 3
>> +.endm
>> +
>> +/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
>> + * (hardware reset value is 0). Keep previous value in r2. */
>> +.macro set_hcptr operation, mask
>> +     mrc     p15, 4, r2, c1, c1, 2
>> +     ldr     r3, =\mask
>> +     .if \operation == vmentry
>> +     orr     r3, r2, r3              @ Trap coproc-accesses defined in mask
>> +     .else
>> +     bic     r3, r2, r3              @ Don't trap defined coproc-accesses
>> +     .endif
>> +     mcr     p15, 4, r3, c1, c1, 2
>> +.endm
>> +
>> +/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
>> + * (hardware reset value is 0) */
>> +.macro set_hdcr operation
>> +     mrc     p15, 4, r2, c1, c1, 1
>> +     ldr     r3, =(HDCR_TPM|HDCR_TPMCR)
>> +     .if \operation == vmentry
>> +     orr     r2, r2, r3              @ Trap some perfmon accesses
>> +     .else
>> +     bic     r2, r2, r3              @ Don't trap any perfmon accesses
>> +     .endif
>> +     mcr     p15, 4, r2, c1, c1, 1
>> +.endm
>> +
>> +/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
>> +.macro configure_hyp_role operation
>> +     mrc     p15, 4, r2, c1, c1, 0   @ HCR
>> +     bic     r2, r2, #HCR_VIRT_EXCP_MASK
>> +     ldr     r3, =HCR_GUEST_MASK
>> +     .if \operation == vmentry
>> +     orr     r2, r2, r3
>> +     ldr     r3, [vcpu, #VCPU_IRQ_LINES]
> irq_lines are accessed atomically from vcpu_interrupt_line(), but there
> are no memory barriers or atomic operations here. Looks suspicious,
> though I am not familiar with the ARM memory model. As far as I
> understand, different translation regimes are used to access this
> memory, so who knows what this does to access ordering.
>
>

There's an exception taken to switch to Hyp mode, which I'm quite sure
implies a memory barrier.

>> +     orr     r2, r2, r3
>> +     .else
>> +     bic     r2, r2, r3
>> +     .endif
>> +     mcr     p15, 4, r2, c1, c1, 0
>> +.endm
>> +
>> +.macro load_vcpu
>> +     mrc     p15, 4, vcpu, c13, c0, 2        @ HTPIDR
>> +.endm
>>
>

commit e290b507f0d31c895bd515d69c0c2b50d76b20db
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Tue Jan 15 20:53:03 2013 -0500

    KVM: ARM: Honor vcpu->requests in the world-switch code

    Honor vcpu->requests by checking them accordingly and explicitly raising an
    error if unsupported requests are set (we don't support any requests on
    ARM currently).

    Also add some commenting to explain the synchronization in more detail
    here.  The commenting implied renaming a variable and changing error
    handling slightly to improve readability.

    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6ff5337..b23a709 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -620,7 +620,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
  */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	int ret;
+	int guest_ret, ret;
 	sigset_t sigsaved;

 	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
@@ -640,9 +640,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);

-	ret = 1;
 	run->exit_reason = KVM_EXIT_UNKNOWN;
-	while (ret > 0) {
+	for (;;) {
 		/*
 		 * Check conditions before entering the guest
 		 */
@@ -650,18 +649,44 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)

 		update_vttbr(vcpu->kvm);

+		/*
+		 * There is a dependency between setting IN_GUEST_MODE and
+		 * sending requests.  We need to ensure:
+		 *   1. Setting IN_GUEST_MODE before checking vcpu->requests.
+		 *   2. We need to check vcpu->requests after disabling IRQs
+		 *      (see comment about signal_pending below).
+		 */
+		vcpu->mode = IN_GUEST_MODE;
+
 		local_irq_disable();

 		/*
-		 * Re-check atomic conditions
+		 * We need to be careful to check these variables after
+		 * disabling interrupts.  For example with signals:
+		 *   1. If the signal comes before the signal_pending check,
+		 *      we will return to user space and everything's good.
+		 *   2. If the signal comes after the signal_pending check,
+		 *      we rely on an IPI to exit the guest and continue the
+		 *      while loop, which checks for pending signals again.
 		 */
 		if (signal_pending(current)) {
 			ret = -EINTR;
 			run->exit_reason = KVM_EXIT_INTR;
+			local_irq_enable();
+			vcpu->mode = OUTSIDE_GUEST_MODE;
+			break;
 		}

-		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+		if (vcpu->requests) {
+			ret = -ENOSYS; /* requests not supported */
 			local_irq_enable();
+			vcpu->mode = OUTSIDE_GUEST_MODE;
+			break;
+		}
+
+		if (need_new_vmid_gen(vcpu->kvm)) {
+			local_irq_enable();
+			vcpu->mode = OUTSIDE_GUEST_MODE;
 			continue;
 		}

@@ -670,17 +695,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		trace_kvm_entry(*vcpu_pc(vcpu));
 		kvm_guest_enter();
-		vcpu->mode = IN_GUEST_MODE;

 		smp_mb(); /* set mode before reading vcpu->arch.pause */
 		if (unlikely(vcpu->arch.pause)) {
 			/* This means ignore, try again. */
-			ret = ARM_EXCEPTION_IRQ;
+			guest_ret = ARM_EXCEPTION_IRQ;
 		} else {
-			ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
+			guest_ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 		}

-		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->arch.last_pcpu = smp_processor_id();
 		kvm_guest_exit();
 		trace_kvm_exit(*vcpu_pc(vcpu));
@@ -695,12 +718,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * mode this time.
 		 */
 		local_irq_enable();
+		vcpu->mode = OUTSIDE_GUEST_MODE;

 		/*
 		 * Back from guest
 		 *************************************************************/

-		ret = handle_exit(vcpu, run, ret);
+		ret = handle_exit(vcpu, run, guest_ret);
+		if (ret <= 0)
+			break;
 	}

 	if (vcpu->sigset_active)

commit fc9a9c5e9dd4eba4acd6bea5c8c083a9a854d662
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Tue Jan 15 20:42:15 2013 -0500

    KVM: ARM: Remove unused memslot parameter

diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
index 31ab9f5..571ccf0 100644
--- a/arch/arm/include/asm/kvm_mmio.h
+++ b/arch/arm/include/asm/kvm_mmio.h
@@ -46,6 +46,6 @@ static inline void kvm_prepare_mmio(struct kvm_run *run,

 int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot);
+		 phys_addr_t fault_ipa);

 #endif	/* __ARM_KVM_MMIO_H__ */
diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index d6a4ca0..f655088 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -117,7 +117,7 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 }

 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
+		 phys_addr_t fault_ipa)
 {
 	struct kvm_exit_mmio mmio;
 	unsigned long rt;
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2a83ac9..c806080 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -588,7 +588,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	unsigned long hsr_ec;
 	unsigned long fault_status;
 	phys_addr_t fault_ipa;
-	struct kvm_memory_slot *memslot = NULL;
+	struct kvm_memory_slot *memslot;
 	bool is_iabt;
 	gfn_t gfn;
 	int ret;
@@ -624,7 +624,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)

 		/* Adjust page offset */
 		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
-		return io_mem_abort(vcpu, run, fault_ipa, memslot);
+		return io_mem_abort(vcpu, run, fault_ipa);
 	}

 	memslot = gfn_to_memslot(vcpu->kvm, gfn);

commit 70667a06e445e240fb5e6352ccdc4bc8a290866e
Author: Christoffer Dall <c.dall@virtualopensystems.com>
Date:   Tue Jan 15 20:51:42 2013 -0500

    KVM: ARM: Grab kvm->srcu lock when handling page faults

    The memslots data structure is protected with an SRCU lock, so we should
    grab the read side lock before traversing this data structure.

    Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index c806080..0b7eabf 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -591,7 +591,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	struct kvm_memory_slot *memslot;
 	bool is_iabt;
 	gfn_t gfn;
-	int ret;
+	int ret, idx;

 	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
 	is_iabt = (hsr_ec == HSR_EC_IABT);
@@ -627,13 +627,17 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		return io_mem_abort(vcpu, run, fault_ipa);
 	}

+	idx = srcu_read_lock(&vcpu->kvm->srcu);
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	if (!memslot->user_alloc) {
 		kvm_err("non user-alloc memslots not supported\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out_unlock;
 	}

 	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
+out_unlock:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	return ret ? ret : 1;
 }

--

Thanks,
-Christoffer


* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-14 17:33       ` Christoffer Dall
@ 2013-01-16  2:56         ` Rusty Russell
  -1 siblings, 0 replies; 160+ messages in thread
From: Rusty Russell @ 2013-01-16  2:56 UTC (permalink / raw)
  To: Christoffer Dall, Russell King - ARM Linux
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell

Christoffer Dall <c.dall@virtualopensystems.com> writes:

> On Mon, Jan 14, 2013 at 11:24 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
>>> +     /* -ENOENT for unknown features, -EINVAL for invalid combinations. */
>>> +     for (i = 0; i < sizeof(init->features)*8; i++) {
>>> +             if (init->features[i / 32] & (1 << (i % 32))) {
>>
>> Isn't this an open-coded version of test_bit() ?
>
> indeed, nicely spotted:

BTW, I wrote it that way out of excessive paranoia: it's a userspace
API, and test_bit() won't be right on 64-bit BE systems.
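
As a rough illustration of the pitfall (hypothetical values, just to
show the layout problem): userspace hands us a __u32 features[] array,
but test_bit() indexes bits within unsigned longs, so on a 64-bit
big-endian kernel the two 32-bit words swap places inside each 64-bit
word:

	__u32 features[2] = { 0x1, 0x0 };	/* userspace sets feature bit 0 */

	/* open-coded check: indexes by __u32, layout independent */
	int a = features[0 / 32] & (1 << (0 % 32));	/* non-zero */

	/*
	 * test_bit() reads bit 0 of the first unsigned long; on a
	 * 64-bit BE kernel that least significant bit lives in
	 * features[1], so this comes back 0 here.
	 */
	int b = test_bit(0, (const unsigned long *)features);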

Cheers,
Rusty.


* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16  2:08       ` Christoffer Dall
@ 2013-01-16  4:08         ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-16  4:08 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

On Tue, Jan 15, 2013 at 9:08 PM, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> On Tue, Jan 15, 2013 at 4:43 AM, Gleb Natapov <gleb@redhat.com> wrote:
>> On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
>>> Provides complete world-switch implementation to switch to other guests
>>> running in non-secure modes. Includes Hyp exception handlers that
>>> capture necessary exception information and stores the information on
>>> the VCPU and KVM structures.
>>>
>>> The following Hyp-ABI is also documented in the code:
>>>
>>> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
>>>    Switching to Hyp mode is done through a simple HVC #0 instruction. The
>>>    exception vector code will check that the HVC comes from VMID==0 and if
>>>    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
>>>    - r0 contains a pointer to a HYP function
>>>    - r1, r2, and r3 contain arguments to the above function.
>>>    - The HYP function will be called with its arguments in r0, r1 and r2.
>>>    On HYP function return, we return directly to SVC.
>>>
>>> A call to a function executing in Hyp mode is performed like the following:
>>>
>>>         <svc code>
>>>         ldr     r0, =BSYM(my_hyp_fn)
>>>         ldr     r1, =my_param
>>>         hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
>>>         <svc code>
>>>
>>> Otherwise, the world-switch is pretty straightforward. All state that
>>> can be modified by the guest is first backed up on the Hyp stack and the
>>> VCPU values are loaded onto the hardware. State which is not loaded, but
>>> is theoretically modifiable by the guest, is protected through the
>>> virtualization features to generate a trap and cause software emulation.
>>> Upon guest return, all state is restored from hardware onto the VCPU
>>> struct and the original state is restored from the Hyp-stack onto the
>>> hardware.
>>>
>>> SMP support using the VMPIDR calculated on the basis of the host MPIDR
>>> and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
>>>
>>> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
>>> a separate patch into the appropriate patches introducing the
>>> functionality. Note that the VMIDs are stored per VM as required by the ARM
>>> architecture reference manual.
>>>
>>> To support VFP/NEON we trap those instructions using the HCPTR. When
>>> we trap, we switch the FPU.  After a guest exit, the VFP state is
>>> returned to the host.  When disabling access to floating point
>>> instructions, we also mask FPEXC_EN in order to avoid the guest
>>> receiving Undefined instruction exceptions before we have a chance to
>>> switch back the floating point state.  We are reusing vfp_hard_struct,
>>> so we depend on VFPv3 being enabled in the host kernel; if not, we still
>>> trap cp10 and cp11 in order to inject an undefined instruction exception
>>> whenever the guest tries to use VFP/NEON. VFP/NEON developed by
>>> Antonios Motakis and Rusty Russell.
>>>
>>> Aborts that are permission faults, and not stage-1 page table walk, do
>>> not report the faulting address in the HPFAR.  We have to resolve the
>>> IPA, and store it just like the HPFAR register on the VCPU struct. If
>>> the IPA cannot be resolved, it means another CPU is playing with the
>>> page tables, and we simply restart the guest.  This quirk was fixed by
>>> Marc Zyngier.
>>>
>>> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
>>> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
>>> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>>> ---
>>>  arch/arm/include/asm/kvm_arm.h  |   51 ++++
>>>  arch/arm/include/asm/kvm_host.h |   10 +
>>>  arch/arm/kernel/asm-offsets.c   |   25 ++
>>>  arch/arm/kvm/arm.c              |  187 ++++++++++++++++
>>>  arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
>>>  arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
>>>  6 files changed, 1108 insertions(+), 4 deletions(-)
>>>  create mode 100644 arch/arm/kvm/interrupts_head.S
>>>
>>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>>> index fb22ee8..a3262a2 100644
>>> --- a/arch/arm/include/asm/kvm_arm.h
>>> +++ b/arch/arm/include/asm/kvm_arm.h
>>> @@ -98,6 +98,18 @@
>>>  #define TTBCR_T0SZ   3
>>>  #define HTCR_MASK    (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
>>>
>>> +/* Hyp System Trap Register */
>>> +#define HSTR_T(x)    (1 << x)
>>> +#define HSTR_TTEE    (1 << 16)
>>> +#define HSTR_TJDBX   (1 << 17)
>>> +
>>> +/* Hyp Coprocessor Trap Register */
>>> +#define HCPTR_TCP(x) (1 << x)
>>> +#define HCPTR_TCP_MASK       (0x3fff)
>>> +#define HCPTR_TASE   (1 << 15)
>>> +#define HCPTR_TTA    (1 << 20)
>>> +#define HCPTR_TCPAC  (1 << 31)
>>> +
>>>  /* Hyp Debug Configuration Register bits */
>>>  #define HDCR_TDRA    (1 << 11)
>>>  #define HDCR_TDOSA   (1 << 10)
>>> @@ -144,6 +156,45 @@
>>>  #else
>>>  #define VTTBR_X              (5 - KVM_T0SZ)
>>>  #endif
>>> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
>>> +#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
>>> +#define VTTBR_VMID_SHIFT  (48LLU)
>>> +#define VTTBR_VMID_MASK        (0xffLLU << VTTBR_VMID_SHIFT)
>>> +
>>> +/* Hyp Syndrome Register (HSR) bits */
>>> +#define HSR_EC_SHIFT (26)
>>> +#define HSR_EC               (0x3fU << HSR_EC_SHIFT)
>>> +#define HSR_IL               (1U << 25)
>>> +#define HSR_ISS              (HSR_IL - 1)
>>> +#define HSR_ISV_SHIFT        (24)
>>> +#define HSR_ISV              (1U << HSR_ISV_SHIFT)
>>> +#define HSR_FSC              (0x3f)
>>> +#define HSR_FSC_TYPE (0x3c)
>>> +#define HSR_WNR              (1 << 6)
>>> +
>>> +#define FSC_FAULT    (0x04)
>>> +#define FSC_PERM     (0x0c)
>>> +
>>> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>>> +#define HPFAR_MASK   (~0xf)
>>>
>>> +#define HSR_EC_UNKNOWN       (0x00)
>>> +#define HSR_EC_WFI   (0x01)
>>> +#define HSR_EC_CP15_32       (0x03)
>>> +#define HSR_EC_CP15_64       (0x04)
>>> +#define HSR_EC_CP14_MR       (0x05)
>>> +#define HSR_EC_CP14_LS       (0x06)
>>> +#define HSR_EC_CP_0_13       (0x07)
>>> +#define HSR_EC_CP10_ID       (0x08)
>>> +#define HSR_EC_JAZELLE       (0x09)
>>> +#define HSR_EC_BXJ   (0x0A)
>>> +#define HSR_EC_CP14_64       (0x0C)
>>> +#define HSR_EC_SVC_HYP       (0x11)
>>> +#define HSR_EC_HVC   (0x12)
>>> +#define HSR_EC_SMC   (0x13)
>>> +#define HSR_EC_IABT  (0x20)
>>> +#define HSR_EC_IABT_HYP      (0x21)
>>> +#define HSR_EC_DABT  (0x24)
>>> +#define HSR_EC_DABT_HYP      (0x25)
>>>
>>>  #endif /* __ARM_KVM_ARM_H__ */
>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>> index 1de6f0d..ddb09da 100644
>>> --- a/arch/arm/include/asm/kvm_host.h
>>> +++ b/arch/arm/include/asm/kvm_host.h
>>> @@ -21,6 +21,7 @@
>>>
>>>  #include <asm/kvm.h>
>>>  #include <asm/kvm_asm.h>
>>> +#include <asm/fpstate.h>
>>>
>>>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>>>  #define KVM_USER_MEM_SLOTS 32
>>> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
>>>       u32 hxfar;              /* Hyp Data/Inst Fault Address Register */
>>>       u32 hpfar;              /* Hyp IPA Fault Address Register */
>>>
>>> +     /* Floating point registers (VFP and Advanced SIMD/NEON) */
>>> +     struct vfp_hard_struct vfp_guest;
>>> +     struct vfp_hard_struct *vfp_host;
>>> +
>>> +     /*
>>> +      * Anything that is not used directly from assembly code goes
>>> +      * here.
>>> +      */
>>>       /* Interrupt related fields */
>>>       u32 irq_lines;          /* IRQ and FIQ levels */
>>>
>>> @@ -112,6 +121,7 @@ struct kvm_one_reg;
>>>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>>>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>>>  u64 kvm_call_hyp(void *hypfn, ...);
>>> +void force_vm_exit(const cpumask_t *mask);
>>>
>>>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>>>  struct kvm;
>>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>>> index c985b48..c8b3272 100644
>>> --- a/arch/arm/kernel/asm-offsets.c
>>> +++ b/arch/arm/kernel/asm-offsets.c
>>> @@ -13,6 +13,9 @@
>>>  #include <linux/sched.h>
>>>  #include <linux/mm.h>
>>>  #include <linux/dma-mapping.h>
>>> +#ifdef CONFIG_KVM_ARM_HOST
>>> +#include <linux/kvm_host.h>
>>> +#endif
>>>  #include <asm/cacheflush.h>
>>>  #include <asm/glue-df.h>
>>>  #include <asm/glue-pf.h>
>>> @@ -146,5 +149,27 @@ int main(void)
>>>    DEFINE(DMA_BIDIRECTIONAL,  DMA_BIDIRECTIONAL);
>>>    DEFINE(DMA_TO_DEVICE,              DMA_TO_DEVICE);
>>>    DEFINE(DMA_FROM_DEVICE,    DMA_FROM_DEVICE);
>>> +#ifdef CONFIG_KVM_ARM_HOST
>>> +  DEFINE(VCPU_KVM,           offsetof(struct kvm_vcpu, kvm));
>>> +  DEFINE(VCPU_MIDR,          offsetof(struct kvm_vcpu, arch.midr));
>>> +  DEFINE(VCPU_CP15,          offsetof(struct kvm_vcpu, arch.cp15));
>>> +  DEFINE(VCPU_VFP_GUEST,     offsetof(struct kvm_vcpu, arch.vfp_guest));
>>> +  DEFINE(VCPU_VFP_HOST,              offsetof(struct kvm_vcpu, arch.vfp_host));
>>> +  DEFINE(VCPU_REGS,          offsetof(struct kvm_vcpu, arch.regs));
>>> +  DEFINE(VCPU_USR_REGS,              offsetof(struct kvm_vcpu, arch.regs.usr_regs));
>>> +  DEFINE(VCPU_SVC_REGS,              offsetof(struct kvm_vcpu, arch.regs.svc_regs));
>>> +  DEFINE(VCPU_ABT_REGS,              offsetof(struct kvm_vcpu, arch.regs.abt_regs));
>>> +  DEFINE(VCPU_UND_REGS,              offsetof(struct kvm_vcpu, arch.regs.und_regs));
>>> +  DEFINE(VCPU_IRQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.irq_regs));
>>> +  DEFINE(VCPU_FIQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
>>> +  DEFINE(VCPU_PC,            offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
>>> +  DEFINE(VCPU_CPSR,          offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
>>> +  DEFINE(VCPU_IRQ_LINES,     offsetof(struct kvm_vcpu, arch.irq_lines));
>>> +  DEFINE(VCPU_HSR,           offsetof(struct kvm_vcpu, arch.hsr));
>>> +  DEFINE(VCPU_HxFAR,         offsetof(struct kvm_vcpu, arch.hxfar));
>>> +  DEFINE(VCPU_HPFAR,         offsetof(struct kvm_vcpu, arch.hpfar));
>>> +  DEFINE(VCPU_HYP_PC,                offsetof(struct kvm_vcpu, arch.hyp_pc));
>>> +  DEFINE(KVM_VTTBR,          offsetof(struct kvm, arch.vttbr));
>>> +#endif
>>>    return 0;
>>>  }
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index 9b4566e..c94d278 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -40,6 +40,7 @@
>>>  #include <asm/kvm_arm.h>
>>>  #include <asm/kvm_asm.h>
>>>  #include <asm/kvm_mmu.h>
>>> +#include <asm/kvm_emulate.h>
>>>
>>>  #ifdef REQUIRES_VIRT
>>>  __asm__(".arch_extension     virt");
>>> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>>>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>>>  static unsigned long hyp_default_vectors;
>>>
>>> +/* The VMID used in the VTTBR */
>>> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
>>> +static u8 kvm_next_vmid;
>>> +static DEFINE_SPINLOCK(kvm_vmid_lock);
>>>
>>>  int kvm_arch_hardware_enable(void *garbage)
>>>  {
>>> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
>>>
>>>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>>  {
>>> +     /* Force users to call KVM_ARM_VCPU_INIT */
>>> +     vcpu->arch.target = -1;
>>>       return 0;
>>>  }
>>>
>>> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>>>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>  {
>>>       vcpu->cpu = cpu;
>>> +     vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>>>  }
>>>
>>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>>>
>>>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
>> As far as I see the function is unused.
>>
>>>  {
>>> +     return v->mode == IN_GUEST_MODE;
>>> +}
>>> +
>>> +/* Just ensure a guest exit from a particular CPU */
>>> +static void exit_vm_noop(void *info)
>>> +{
>>> +}
>>> +
>>> +void force_vm_exit(const cpumask_t *mask)
>>> +{
>>> +     smp_call_function_many(mask, exit_vm_noop, NULL, true);
>>> +}
>> There is make_all_cpus_request() for that. It actually sends IPIs only
>> to cpus that are running vcpus.
>>
>>> +
>>> +/**
>>> + * need_new_vmid_gen - check that the VMID is still valid
>>> + * @kvm: The VM's VMID to check
>>> + *
>>> + * return true if there is a new generation of VMIDs being used
>>> + *
>>> + * The hardware supports only 256 values with the value zero reserved for the
>>> + * host, so we check if an assigned value belongs to a previous generation,
>>> + * which requires us to assign a new value. If we're the first to use a
>>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
>>> + * CPUs.
>>> + */
>>> +static bool need_new_vmid_gen(struct kvm *kvm)
>>> +{
>>> +     return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>>> +}
>>> +
>>> +/**
>>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>>> + * @kvm      The guest that we are about to run
>>> + *
>>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
>>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
>>> + * caches and TLBs.
>>> + */
>>> +static void update_vttbr(struct kvm *kvm)
>>> +{
>>> +     phys_addr_t pgd_phys;
>>> +     u64 vmid;
>>> +
>>> +     if (!need_new_vmid_gen(kvm))
>>> +             return;
>>> +
>>> +     spin_lock(&kvm_vmid_lock);
>>> +
>>> +     /*
>>> +      * We need to re-check the vmid_gen here to ensure that if another vcpu
>>> +      * already allocated a valid vmid for this vm, then this vcpu should
>>> +      * use the same vmid.
>>> +      */
>>> +     if (!need_new_vmid_gen(kvm)) {
>>> +             spin_unlock(&kvm_vmid_lock);
>>> +             return;
>>> +     }
>>> +
>>> +     /* First user of a new VMID generation? */
>>> +     if (unlikely(kvm_next_vmid == 0)) {
>>> +             atomic64_inc(&kvm_vmid_gen);
>>> +             kvm_next_vmid = 1;
>>> +
>>> +             /*
>>> +              * On SMP we know no other CPUs can use this CPU's or each
>>> +              * other's VMID after force_vm_exit returns since the
>>> +              * kvm_vmid_lock blocks them from reentry to the guest.
>>> +              */
>>> +             force_vm_exit(cpu_all_mask);
>>> +             /*
>>> +              * Now broadcast TLB + ICACHE invalidation over the inner
>>> +              * shareable domain to make sure all data structures are
>>> +              * clean.
>>> +              */
>>> +             kvm_call_hyp(__kvm_flush_vm_context);
>>> +     }
>>> +
>>> +     kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>>> +     kvm->arch.vmid = kvm_next_vmid;
>>> +     kvm_next_vmid++;
>>> +
>>> +     /* update vttbr to be used with the new vmid */
>>> +     pgd_phys = virt_to_phys(kvm->arch.pgd);
>>> +     vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
>>> +     kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
>>> +     kvm->arch.vttbr |= vmid;
>>> +
>>> +     spin_unlock(&kvm_vmid_lock);
>>> +}
>>> +
>>> +/*
>>> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>>> + * proper exit to QEMU.
>>> + */
>>> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>> +                    int exception_index)
>>> +{
>>> +     run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>>>       return 0;
>>>  }
>>>
>>> +/**
>>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>>> + * @vcpu:    The VCPU pointer
>>> + * @run:     The kvm_run structure pointer used for userspace state exchange
>>> + *
>>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>>> + * will execute VM code in a loop until the time slice for the process is used
>>> + * or some emulation is needed from user space in which case the function will
>>> + * return with return value 0 and with the kvm_run structure filled in with the
>>> + * required data for the requested emulation.
>>> + */
>>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  {
>>> -     return -EINVAL;
>>> +     int ret;
>>> +     sigset_t sigsaved;
>>> +
>>> +     /* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
>>> +     if (unlikely(vcpu->arch.target < 0))
>>> +             return -ENOEXEC;
>>> +
>>> +     if (vcpu->sigset_active)
>>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>>> +
>>> +     ret = 1;
>>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
>>> +     while (ret > 0) {
>>> +             /*
>>> +              * Check conditions before entering the guest
>>> +              */
>>> +             cond_resched();
>>> +
>>> +             update_vttbr(vcpu->kvm);
>>> +
>>> +             local_irq_disable();
>>> +
>>> +             /*
>>> +              * Re-check atomic conditions
>>> +              */
>>> +             if (signal_pending(current)) {
>>> +                     ret = -EINTR;
>>> +                     run->exit_reason = KVM_EXIT_INTR;
>>> +             }
>>> +
>>> +             if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>> +                     local_irq_enable();
>>> +                     continue;
>>> +             }
>>> +
>>> +             /**************************************************************
>>> +              * Enter the guest
>>> +              */
>>> +             trace_kvm_entry(*vcpu_pc(vcpu));
>>> +             kvm_guest_enter();
>>> +             vcpu->mode = IN_GUEST_MODE;
>> You need to set mode to IN_GUEST_MODE before disabling interrupts and
>> check that mode != EXITING_GUEST_MODE after disabling interrupts but
>> before entering the guest. This way you will catch kicks that were sent
>> between setting of the mode and disabling the interrupts. Also you need
>> to check vcpu->requests and exit if it is not empty. I see that you do
>> not use vcpu->requests at all, but you should since common kvm code
>> assumes that it is used. make_all_cpus_request() uses it for instance.
>>
>
> I don't quite agree, but almost:
>
> Why would you set IN_GUEST_MODE before disabling interrupts? The only
> reason I can see for this to be a requirement is to leverage an implicit
> memory barrier. Receiving the IPI in this little window does nothing
> (the smp_cross_call is a noop).
>
> Checking that mode != EXITING_GUEST_MODE is equally useless in my
> opinion; as I read the requests code, the only reason for this mode is
> to avoid sending an IPI twice.
>
> Kicks sent between setting the mode and disabling the interrupts are
> not the point; the point is to check the requests field (which we
> don't use at all on ARM, and which generic code also doesn't use on ARM)
> after disabling interrupts, and after setting IN_GUEST_MODE.
>
> The patch below fixes your issues, and while I would push back on
> anything other than direct bug fixes at this point, the current code is
> semantically incorrect w.r.t. KVM vcpu requests, so it's worth a fix,
> and the patch itself is trivial.
>
[...]

Actually, I take that back: the kvm_vcpu_block function does make a
request, which we don't need to handle, so adding code that checks for
features we don't support is useless at this point. Please ignore the
patch I sent earlier.

Later on we can change some of the code to use the vcpu->features map
if there's a real benefit, but right now the priority is to merge this
code, so anything that's not a bugfix should not go in.

The srcu lock is a real bug though, and should be fixed.

-Christoffer
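
For reference, here is a minimal sketch of the entry ordering being
debated above. This is a sketch only, not the code that was merged: it
is written as if it lived next to the quoted arm.c hunk (so
update_vttbr() and need_new_vmid_gen() are the functions from that
hunk), it uses the generic vcpu->mode/vcpu->requests fields, and it
elides signal/error handling and the rest of the run loop.

    #include <linux/kvm_host.h>
    #include <linux/sched.h>

    static int run_one_iteration(struct kvm_vcpu *vcpu)
    {
            update_vttbr(vcpu->kvm);

            local_irq_disable();

            /*
             * Publish that we are about to enter the guest, then re-check
             * anything a remote kvm_vcpu_kick()/make_all_cpus_request()
             * may have set concurrently; the barrier pairs with the one
             * in the kick path.
             */
            vcpu->mode = IN_GUEST_MODE;
            smp_mb();

            if (vcpu->requests || signal_pending(current) ||
                need_new_vmid_gen(vcpu->kvm)) {
                    vcpu->mode = OUTSIDE_GUEST_MODE;
                    local_irq_enable();
                    return 0;   /* bail out; caller decides retry vs. exit */
            }

            kvm_guest_enter();
            /* ... world-switch into the guest ... */
            kvm_guest_exit();
            vcpu->mode = OUTSIDE_GUEST_MODE;
            local_irq_enable();
            return 1;
    }

The only addition relative to the quoted loop is the smp_mb() plus the
re-check of vcpu->requests after interrupts are disabled, which is the
part both sides of the discussion above seem to agree matters.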

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-16  2:56         ` Rusty Russell
@ 2013-01-16  9:44           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 160+ messages in thread
From: Russell King - ARM Linux @ 2013-01-16  9:44 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Christoffer Dall, kvm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell, kvmarm, linux-arm-kernel

On Wed, Jan 16, 2013 at 01:26:01PM +1030, Rusty Russell wrote:
> Christoffer Dall <c.dall@virtualopensystems.com> writes:
> 
> > On Mon, Jan 14, 2013 at 11:24 AM, Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> >> On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
> >>> +     /* -ENOENT for unknown features, -EINVAL for invalid combinations. */
> >>> +     for (i = 0; i < sizeof(init->features)*8; i++) {
> >>> +             if (init->features[i / 32] & (1 << (i % 32))) {
> >>
> >> Isn't this an open-coded version of test_bit() ?
> >
> > indeed, nicely spotted:
> 
> BTW, I wrote it that way out of excessive paranoia: it's a userspace
> API, and test_bit() won't be right on 64-bit BE systems.

So why is this a concern for 32-bit systems (which are, by definition,
only in arch/arm) ?

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [kvmarm] [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace
  2013-01-15 16:25               ` Alexander Graf
@ 2013-01-16 10:40                 ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-16 10:40 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-arm-kernel, Peter Maydell, Marcelo Tosatti, kvmarm, kvm

On Tue, Jan 15, 2013 at 05:25:13PM +0100, Alexander Graf wrote:
> On 01/15/2013 04:17 PM, Gleb Natapov wrote:
> >On Tue, Jan 15, 2013 at 02:04:47PM +0000, Peter Maydell wrote:
> >>On 15 January 2013 12:52, Gleb Natapov<gleb@redhat.com>  wrote:
> >>>On Tue, Jan 15, 2013 at 12:15:01PM +0000, Peter Maydell wrote:
> >>>>On 15 January 2013 09:56, Gleb Natapov<gleb@redhat.com>  wrote:
> >>>>>>ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
> >>>>>CPU level interrupt should use KVM_INTERRUPT instead.
> >>>>No, that would be wrong. KVM_INTERRUPT is for interrupts which must be
> >>>>delivered synchronously to the CPU. KVM_IRQ_LINE is for interrupts which
> >>>>can be fed to the kernel asynchronously. It happens that on x86 "must be
> >>>>delivered synchronously" and "not going to in kernel irqchip" are the same, but
> >>>>this isn't true for other archs. For ARM all our interrupts can be fed
> >>>>to the kernel asynchronously, and so we use KVM_IRQ_LINE in all
> >>>>cases.
> >>>I do not quite understand what you mean by synchronously and
> >>>asynchronously.
> >>Synchronously: the vcpu has to be stopped and userspace then
> >>feeds in the interrupt to be taken when the guest is resumed.
> >>Asynchronously: any old thread can tell the kernel there's an
> >>interrupt, and the guest vcpu then deals with it when needed
> >>(the vcpu thread may leave the guest but doesn't come out of
> >>the host kernel to qemu).
> >>
> >>>The difference between KVM_INTERRUPT and KVM_IRQ_LINE
> >>>is that the former is used when the destination cpu is known to userspace;
> >>>the latter is used when kernel code is involved in figuring out the destination.
> >>This doesn't match up with Avi's explanation at all.
> >>
> >>>The
> >>>injections themselves are currently synchronous for both of them on x86
> >>>and ARM, i.e. the vcpu is kicked out of guest mode when an interrupt needs
> >>>to be injected into a guest and the vcpu state is changed to inject the
> >>>interrupt during the next guest entry. In the near future x86 will be able
> >>>to inject interrupts without kicking the vcpu out of guest mode; does ARM
> >>>plan to do the same? For GIC interrupts or for IRQ/FIQ or for both?
> >>>
> >>>>There was a big discussion thread about this on kvm and qemu-devel last
> >>>>July (and we cleaned up some of the QEMU code to not smoosh together
> >>>>all these different concepts under "do I have an irqchip or not?").
> >>>Do you have a pointer?
> >>   http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02460.html
> >>and there was a later longer (but less clear) thread which included
> >>this mail from Avi:
> >>   http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg02872.html
> >>basically explaining that the reason for the weird synchronous
> >>KVM_INTERRUPT API is that it's emulating a weird synchronous
> >>hardware interface which is specific to x86. ARM doesn't have
> >>a synchronous interface in the same way, so it's much more
> >>straightforward to use KVM_IRQ_LINE.
> >>
> >OK. I see. So basically Avi saw KVM_INTERRUPT as an oddball interface
> >required only for APIC emulation in userspace. It is used for PIC also,
> >where this is not strictly needed, but this is for historical reasons
> >(KVM_IRQ_LINE was introduced late and it is GSI-centric on x86).
> >
> >Thank you for the pointer.
> 
> Yeah, please keep in mind that KVM_INTERRUPT is not a unified
> interface either. In fact, it is asynchronous on PPC :). And it's
> called KVM_S390_INTERRUPT on s390 and also asynchronous. X86 is the
> oddball here.
> 
KVM_INTERRUPT needs a vcpu fd to be issued. Usually such ioctls are
issued only by the vcpu thread, which makes them synchronous, and
vcpu_load() synchronises them anyway if the rule is not met. And sure
enough, those KVM_S390_INTERRUPT/KVM_INTERRUPT calls are special-cased
in kvm_vcpu_ioctl() to not call vcpu_load(), sigh :(

There was an idea to change vcpu ioctls to kvm syscalls, which would have
made it impossible to use KVM_INTERRUPT asynchronously.


> But I don't care whether we call the ioctl to steer CPU interrupt
> pins KVM_INTERRUPT, KVM_S390_INTERRUPT or KVM_IRQ_LINE, as long as
> the code makes it obvious what is happening.
> 
Some consistency would be nice though. You do not always look at the
kernel code when you read userspace code, and an iothread calling
KVM_INTERRUPT would have made me suspicious.

--
			Gleb.
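
For completeness, a minimal userspace sketch of the asynchronous model
described in this thread (illustration only; the encoding of the irq
field is arch-specific and comes from the series' uapi headers, so it
is passed through opaquely here):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /*
     * Any thread that holds the VM fd can assert or deassert a line
     * with KVM_IRQ_LINE; the vcpu thread is not stopped and never has
     * to return to userspace for this.
     */
    static int set_irq_line(int vm_fd, uint32_t irq, int level)
    {
            struct kvm_irq_level irq_level = {
                    .irq = irq,         /* arch-specific line encoding */
                    .level = level,     /* 1 = assert, 0 = deassert */
            };

            return ioctl(vm_fd, KVM_IRQ_LINE, &irq_level);
    }

KVM_INTERRUPT, by contrast, is a vcpu-fd ioctl and on x86 is issued by
the vcpu thread between runs, which is the synchronous usage described
above.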

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16  2:08       ` Christoffer Dall
@ 2013-01-16 12:12         ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-16 12:12 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

On Tue, Jan 15, 2013 at 09:08:11PM -0500, Christoffer Dall wrote:
> On Tue, Jan 15, 2013 at 4:43 AM, Gleb Natapov <gleb@redhat.com> wrote:
> > On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
> >> Provides complete world-switch implementation to switch to other guests
> >> running in non-secure modes. Includes Hyp exception handlers that
> >> capture necessary exception information and store the information on
> >> the VCPU and KVM structures.
> >>
> >> The following Hyp-ABI is also documented in the code:
> >>
> >> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
> >>    Switching to Hyp mode is done through a simple HVC #0 instruction. The
> >>    exception vector code will check that the HVC comes from VMID==0 and if
> >>    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
> >>    - r0 contains a pointer to a HYP function
> >>    - r1, r2, and r3 contain arguments to the above function.
> >>    - The HYP function will be called with its arguments in r0, r1 and r2.
> >>    On HYP function return, we return directly to SVC.
> >>
> >> A call to a function executing in Hyp mode is performed like the following:
> >>
> >>         <svc code>
> >>         ldr     r0, =BSYM(my_hyp_fn)
> >>         ldr     r1, =my_param
> >>         hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
> >>         <svc code>
> >>
> >> Otherwise, the world-switch is pretty straightforward. All state that
> >> can be modified by the guest is first backed up on the Hyp stack and the
> >> VCPU values are loaded onto the hardware. State which is not loaded, but
> >> theoretically modifiable by the guest, is protected through the
> >> virtualization features to generate a trap and cause software emulation.
> >> Upon guest return, all state is restored from hardware onto the VCPU
> >> struct and the original state is restored from the Hyp stack onto the
> >> hardware.
> >>
> >> SMP support using the VMPIDR calculated on the basis of the host MPIDR
> >> and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
> >>
> >> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
> >> a separate patch into the appropriate patches introducing the
> >> functionality. Note that the VMIDs are stored per VM as required by the ARM
> >> architecture reference manual.
> >>
> >> To support VFP/NEON we trap those instructions using the HCPTR. When
> >> we trap, we switch the FPU.  After a guest exit, the VFP state is
> >> returned to the host.  When disabling access to floating point
> >> instructions, we also mask FPEXC_EN in order to avoid the guest
> >> receiving Undefined instruction exceptions before we have a chance to
> >> switch back the floating point state.  We are reusing vfp_hard_struct,
> >> so we depend on VFPv3 being enabled in the host kernel; if not, we still
> >> trap cp10 and cp11 in order to inject an undefined instruction exception
> >> whenever the guest tries to use VFP/NEON. VFP/NEON developed by
> >> Antonios Motakis and Rusty Russell.
> >>
> >> Aborts that are permission faults, and not stage-1 page table walk, do
> >> not report the faulting address in the HPFAR.  We have to resolve the
> >> IPA, and store it just like the HPFAR register on the VCPU struct. If
> >> the IPA cannot be resolved, it means another CPU is playing with the
> >> page tables, and we simply restart the guest.  This quirk was fixed by
> >> Marc Zyngier.
> >>
> >> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> >> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> >> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> >> ---
> >>  arch/arm/include/asm/kvm_arm.h  |   51 ++++
> >>  arch/arm/include/asm/kvm_host.h |   10 +
> >>  arch/arm/kernel/asm-offsets.c   |   25 ++
> >>  arch/arm/kvm/arm.c              |  187 ++++++++++++++++
> >>  arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
> >>  arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
> >>  6 files changed, 1108 insertions(+), 4 deletions(-)
> >>  create mode 100644 arch/arm/kvm/interrupts_head.S
> >>
> >> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> >> index fb22ee8..a3262a2 100644
> >> --- a/arch/arm/include/asm/kvm_arm.h
> >> +++ b/arch/arm/include/asm/kvm_arm.h
> >> @@ -98,6 +98,18 @@
> >>  #define TTBCR_T0SZ   3
> >>  #define HTCR_MASK    (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
> >>
> >> +/* Hyp System Trap Register */
> >> +#define HSTR_T(x)    (1 << x)
> >> +#define HSTR_TTEE    (1 << 16)
> >> +#define HSTR_TJDBX   (1 << 17)
> >> +
> >> +/* Hyp Coprocessor Trap Register */
> >> +#define HCPTR_TCP(x) (1 << x)
> >> +#define HCPTR_TCP_MASK       (0x3fff)
> >> +#define HCPTR_TASE   (1 << 15)
> >> +#define HCPTR_TTA    (1 << 20)
> >> +#define HCPTR_TCPAC  (1 << 31)
> >> +
> >>  /* Hyp Debug Configuration Register bits */
> >>  #define HDCR_TDRA    (1 << 11)
> >>  #define HDCR_TDOSA   (1 << 10)
> >> @@ -144,6 +156,45 @@
> >>  #else
> >>  #define VTTBR_X              (5 - KVM_T0SZ)
> >>  #endif
> >> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
> >> +#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
> >> +#define VTTBR_VMID_SHIFT  (48LLU)
> >> +#define VTTBR_VMID_MASK        (0xffLLU << VTTBR_VMID_SHIFT)
> >> +
> >> +/* Hyp Syndrome Register (HSR) bits */
> >> +#define HSR_EC_SHIFT (26)
> >> +#define HSR_EC               (0x3fU << HSR_EC_SHIFT)
> >> +#define HSR_IL               (1U << 25)
> >> +#define HSR_ISS              (HSR_IL - 1)
> >> +#define HSR_ISV_SHIFT        (24)
> >> +#define HSR_ISV              (1U << HSR_ISV_SHIFT)
> >> +#define HSR_FSC              (0x3f)
> >> +#define HSR_FSC_TYPE (0x3c)
> >> +#define HSR_WNR              (1 << 6)
> >> +
> >> +#define FSC_FAULT    (0x04)
> >> +#define FSC_PERM     (0x0c)
> >> +
> >> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> >> +#define HPFAR_MASK   (~0xf)
> >>
> >> +#define HSR_EC_UNKNOWN       (0x00)
> >> +#define HSR_EC_WFI   (0x01)
> >> +#define HSR_EC_CP15_32       (0x03)
> >> +#define HSR_EC_CP15_64       (0x04)
> >> +#define HSR_EC_CP14_MR       (0x05)
> >> +#define HSR_EC_CP14_LS       (0x06)
> >> +#define HSR_EC_CP_0_13       (0x07)
> >> +#define HSR_EC_CP10_ID       (0x08)
> >> +#define HSR_EC_JAZELLE       (0x09)
> >> +#define HSR_EC_BXJ   (0x0A)
> >> +#define HSR_EC_CP14_64       (0x0C)
> >> +#define HSR_EC_SVC_HYP       (0x11)
> >> +#define HSR_EC_HVC   (0x12)
> >> +#define HSR_EC_SMC   (0x13)
> >> +#define HSR_EC_IABT  (0x20)
> >> +#define HSR_EC_IABT_HYP      (0x21)
> >> +#define HSR_EC_DABT  (0x24)
> >> +#define HSR_EC_DABT_HYP      (0x25)
> >>
> >>  #endif /* __ARM_KVM_ARM_H__ */
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index 1de6f0d..ddb09da 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -21,6 +21,7 @@
> >>
> >>  #include <asm/kvm.h>
> >>  #include <asm/kvm_asm.h>
> >> +#include <asm/fpstate.h>
> >>
> >>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
> >>  #define KVM_USER_MEM_SLOTS 32
> >> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
> >>       u32 hxfar;              /* Hyp Data/Inst Fault Address Register */
> >>       u32 hpfar;              /* Hyp IPA Fault Address Register */
> >>
> >> +     /* Floating point registers (VFP and Advanced SIMD/NEON) */
> >> +     struct vfp_hard_struct vfp_guest;
> >> +     struct vfp_hard_struct *vfp_host;
> >> +
> >> +     /*
> >> +      * Anything that is not used directly from assembly code goes
> >> +      * here.
> >> +      */
> >>       /* Interrupt related fields */
> >>       u32 irq_lines;          /* IRQ and FIQ levels */
> >>
> >> @@ -112,6 +121,7 @@ struct kvm_one_reg;
> >>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> >>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> >>  u64 kvm_call_hyp(void *hypfn, ...);
> >> +void force_vm_exit(const cpumask_t *mask);
> >>
> >>  #define KVM_ARCH_WANT_MMU_NOTIFIER
> >>  struct kvm;
> >> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> >> index c985b48..c8b3272 100644
> >> --- a/arch/arm/kernel/asm-offsets.c
> >> +++ b/arch/arm/kernel/asm-offsets.c
> >> @@ -13,6 +13,9 @@
> >>  #include <linux/sched.h>
> >>  #include <linux/mm.h>
> >>  #include <linux/dma-mapping.h>
> >> +#ifdef CONFIG_KVM_ARM_HOST
> >> +#include <linux/kvm_host.h>
> >> +#endif
> >>  #include <asm/cacheflush.h>
> >>  #include <asm/glue-df.h>
> >>  #include <asm/glue-pf.h>
> >> @@ -146,5 +149,27 @@ int main(void)
> >>    DEFINE(DMA_BIDIRECTIONAL,  DMA_BIDIRECTIONAL);
> >>    DEFINE(DMA_TO_DEVICE,              DMA_TO_DEVICE);
> >>    DEFINE(DMA_FROM_DEVICE,    DMA_FROM_DEVICE);
> >> +#ifdef CONFIG_KVM_ARM_HOST
> >> +  DEFINE(VCPU_KVM,           offsetof(struct kvm_vcpu, kvm));
> >> +  DEFINE(VCPU_MIDR,          offsetof(struct kvm_vcpu, arch.midr));
> >> +  DEFINE(VCPU_CP15,          offsetof(struct kvm_vcpu, arch.cp15));
> >> +  DEFINE(VCPU_VFP_GUEST,     offsetof(struct kvm_vcpu, arch.vfp_guest));
> >> +  DEFINE(VCPU_VFP_HOST,              offsetof(struct kvm_vcpu, arch.vfp_host));
> >> +  DEFINE(VCPU_REGS,          offsetof(struct kvm_vcpu, arch.regs));
> >> +  DEFINE(VCPU_USR_REGS,              offsetof(struct kvm_vcpu, arch.regs.usr_regs));
> >> +  DEFINE(VCPU_SVC_REGS,              offsetof(struct kvm_vcpu, arch.regs.svc_regs));
> >> +  DEFINE(VCPU_ABT_REGS,              offsetof(struct kvm_vcpu, arch.regs.abt_regs));
> >> +  DEFINE(VCPU_UND_REGS,              offsetof(struct kvm_vcpu, arch.regs.und_regs));
> >> +  DEFINE(VCPU_IRQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.irq_regs));
> >> +  DEFINE(VCPU_FIQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
> >> +  DEFINE(VCPU_PC,            offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
> >> +  DEFINE(VCPU_CPSR,          offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
> >> +  DEFINE(VCPU_IRQ_LINES,     offsetof(struct kvm_vcpu, arch.irq_lines));
> >> +  DEFINE(VCPU_HSR,           offsetof(struct kvm_vcpu, arch.hsr));
> >> +  DEFINE(VCPU_HxFAR,         offsetof(struct kvm_vcpu, arch.hxfar));
> >> +  DEFINE(VCPU_HPFAR,         offsetof(struct kvm_vcpu, arch.hpfar));
> >> +  DEFINE(VCPU_HYP_PC,                offsetof(struct kvm_vcpu, arch.hyp_pc));
> >> +  DEFINE(KVM_VTTBR,          offsetof(struct kvm, arch.vttbr));
> >> +#endif
> >>    return 0;
> >>  }
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index 9b4566e..c94d278 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -40,6 +40,7 @@
> >>  #include <asm/kvm_arm.h>
> >>  #include <asm/kvm_asm.h>
> >>  #include <asm/kvm_mmu.h>
> >> +#include <asm/kvm_emulate.h>
> >>
> >>  #ifdef REQUIRES_VIRT
> >>  __asm__(".arch_extension     virt");
> >> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> >>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
> >>  static unsigned long hyp_default_vectors;
> >>
> >> +/* The VMID used in the VTTBR */
> >> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> >> +static u8 kvm_next_vmid;
> >> +static DEFINE_SPINLOCK(kvm_vmid_lock);
> >>
> >>  int kvm_arch_hardware_enable(void *garbage)
> >>  {
> >> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
> >>
> >>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >>  {
> >> +     /* Force users to call KVM_ARM_VCPU_INIT */
> >> +     vcpu->arch.target = -1;
> >>       return 0;
> >>  }
> >>
> >> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> >>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>  {
> >>       vcpu->cpu = cpu;
> >> +     vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
> >>  }
> >>
> >>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> >>
> >>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
> > As far as I can see the function is unused.
> >
> >>  {
> >> +     return v->mode == IN_GUEST_MODE;
> >> +}
> >> +
> >> +/* Just ensure a guest exit from a particular CPU */
> >> +static void exit_vm_noop(void *info)
> >> +{
> >> +}
> >> +
> >> +void force_vm_exit(const cpumask_t *mask)
> >> +{
> >> +     smp_call_function_many(mask, exit_vm_noop, NULL, true);
> >> +}
> > There is make_all_cpus_request() for that. It actually sends IPIs only
> > to cpus that are running vcpus.
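> > Roughly, simplified from memory rather than copied verbatim from
> > virt/kvm/kvm_main.c (the helper names below are made up), it does
> > something like this:
> >
> > /* assumes <linux/kvm_host.h>, <linux/cpumask.h>, <linux/smp.h> */
> > static void ack_nothing(void *info)
> > {
> > }
> >
> > static bool request_all_vcpus(struct kvm *kvm, unsigned int req)
> > {
> > 	int i, cpu, me;
> > 	cpumask_var_t cpus;
> > 	bool called = false;
> > 	struct kvm_vcpu *vcpu;
> >
> > 	if (!zalloc_cpumask_var(&cpus, GFP_ATOMIC))
> > 		return false;
> >
> > 	me = get_cpu();
> > 	kvm_for_each_vcpu(i, vcpu, kvm) {
> > 		kvm_make_request(req, vcpu);	/* set a bit in vcpu->requests */
> > 		cpu = vcpu->cpu;
> > 		smp_mb();			/* order request store vs. mode read */
> > 		/*
> > 		 * Only CPUs currently running a vcpu get an IPI; the real code
> > 		 * uses kvm_vcpu_exiting_guest_mode(), a cmpxchg, so the same
> > 		 * vcpu is not IPI'd twice.
> > 		 */
> > 		if (cpu != -1 && cpu != me && vcpu->mode == IN_GUEST_MODE)
> > 			cpumask_set_cpu(cpu, cpus);
> > 	}
> > 	if (!cpumask_empty(cpus)) {
> > 		smp_call_function_many(cpus, ack_nothing, NULL, 1);
> > 		called = true;
> > 	}
> > 	put_cpu();
> > 	free_cpumask_var(cpus);
> > 	return called;
> > }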
> >
> >> +
> >> +/**
> >> + * need_new_vmid_gen - check that the VMID is still valid
> >> + * @kvm: The VM's VMID to check
> >> + *
> >> + * return true if there is a new generation of VMIDs being used
> >> + *
> >> + * The hardware supports only 256 values with the value zero reserved for the
> >> + * host, so we check if an assigned value belongs to a previous generation,
> >> + * which requires us to assign a new value. If we're the first to use a
> >> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> >> + * CPUs.
> >> + */
> >> +static bool need_new_vmid_gen(struct kvm *kvm)
> >> +{
> >> +     return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> >> +}
> >> +
> >> +/**
> >> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> >> + * @kvm:     The guest that we are about to run
> >> + *
> >> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> >> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> >> + * caches and TLBs.
> >> + */
> >> +static void update_vttbr(struct kvm *kvm)
> >> +{
> >> +     phys_addr_t pgd_phys;
> >> +     u64 vmid;
> >> +
> >> +     if (!need_new_vmid_gen(kvm))
> >> +             return;
> >> +
> >> +     spin_lock(&kvm_vmid_lock);
> >> +
> >> +     /*
> >> +      * We need to re-check the vmid_gen here to ensure that if another vcpu
> >> +      * already allocated a valid vmid for this vm, then this vcpu should
> >> +      * use the same vmid.
> >> +      */
> >> +     if (!need_new_vmid_gen(kvm)) {
> >> +             spin_unlock(&kvm_vmid_lock);
> >> +             return;
> >> +     }
> >> +
> >> +     /* First user of a new VMID generation? */
> >> +     if (unlikely(kvm_next_vmid == 0)) {
> >> +             atomic64_inc(&kvm_vmid_gen);
> >> +             kvm_next_vmid = 1;
> >> +
> >> +             /*
> >> +              * On SMP we know no other CPUs can use this CPU's or each
> >> +              * other's VMID after force_vm_exit returns since the
> >> +              * kvm_vmid_lock blocks them from reentry to the guest.
> >> +              */
> >> +             force_vm_exit(cpu_all_mask);
> >> +             /*
> >> +              * Now broadcast TLB + ICACHE invalidation over the inner
> >> +              * shareable domain to make sure all data structures are
> >> +              * clean.
> >> +              */
> >> +             kvm_call_hyp(__kvm_flush_vm_context);
> >> +     }
> >> +
> >> +     kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> >> +     kvm->arch.vmid = kvm_next_vmid;
> >> +     kvm_next_vmid++;
> >> +
> >> +     /* update vttbr to be used with the new vmid */
> >> +     pgd_phys = virt_to_phys(kvm->arch.pgd);
> >> +     vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
> >> +     kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
> >> +     kvm->arch.vttbr |= vmid;
> >> +
> >> +     spin_unlock(&kvm_vmid_lock);
> >> +}
> >> +
> >> +/*
> >> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
> >> + * proper exit to QEMU.
> >> + */
> >> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >> +                    int exception_index)
> >> +{
> >> +     run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> >>       return 0;
> >>  }
> >>
> >> +/**
> >> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> >> + * @vcpu:    The VCPU pointer
> >> + * @run:     The kvm_run structure pointer used for userspace state exchange
> >> + *
> >> + * This function is called through the VCPU_RUN ioctl from user space. It
> >> + * will execute VM code in a loop until the time slice for the process is used
> >> + * or some emulation is needed from user space in which case the function will
> >> + * return with return value 0 and with the kvm_run structure filled in with the
> >> + * required data for the requested emulation.
> >> + */
> >>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>  {
> >> -     return -EINVAL;
> >> +     int ret;
> >> +     sigset_t sigsaved;
> >> +
> >> +     /* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
> >> +     if (unlikely(vcpu->arch.target < 0))
> >> +             return -ENOEXEC;
> >> +
> >> +     if (vcpu->sigset_active)
> >> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> >> +
> >> +     ret = 1;
> >> +     run->exit_reason = KVM_EXIT_UNKNOWN;
> >> +     while (ret > 0) {
> >> +             /*
> >> +              * Check conditions before entering the guest
> >> +              */
> >> +             cond_resched();
> >> +
> >> +             update_vttbr(vcpu->kvm);
> >> +
> >> +             local_irq_disable();
> >> +
> >> +             /*
> >> +              * Re-check atomic conditions
> >> +              */
> >> +             if (signal_pending(current)) {
> >> +                     ret = -EINTR;
> >> +                     run->exit_reason = KVM_EXIT_INTR;
> >> +             }
> >> +
> >> +             if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >> +                     local_irq_enable();
> >> +                     continue;
> >> +             }
> >> +
> >> +             /**************************************************************
> >> +              * Enter the guest
> >> +              */
> >> +             trace_kvm_entry(*vcpu_pc(vcpu));
> >> +             kvm_guest_enter();
> >> +             vcpu->mode = IN_GUEST_MODE;
> > You need to set mode to IN_GUEST_MODE before disabling interrupts and
> > check that mode != EXITING_GUEST_MODE after disabling interrupts but
> > before entering the guest. This way you will catch kicks that were sent
> > between setting of the mode and disabling the interrupts. Also you need
> > to check vcpu->requests and exit if it is not empty. I see that you do
> > not use vcpu->requests at all, but you should since common kvm code
> > assumes that it is used. make_all_cpus_request() uses it for instance.
> >
> 
> I don't quite agree, but almost:
> 
> Why would you set IN_GUEST_MODE before disabling interrupts? The only
> reason I can see for this to be a requirement is to leverage an implicit
> memory barrier. Receiving the IPI in this little window does nothing
> (the smp_cross_call is a noop).
> 
> Checking that mode != EXITING_GUEST_MODE is equally useless in my
> opinion, as I read the requests code the only reason for this mode is
> to avoid sending an IPI twice.
> 
> Kicks sent between setting the mode and disabling the interrupts are
> not the point; the point is to check the requests field (which we
> don't use at all on ARM, and generic code also doesn't use on ARM)
> after disabling interrupts, and after setting IN_GUEST_MODE.
> 
Yes, you are right. There is no race here.
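For reference, the two sides of the handshake look roughly like this (an
illustrative sketch, not the exact upstream code; the function names are
made up, the helpers are the usual ones from include/linux/kvm_host.h):

static void requester_side_sketch(struct kvm_vcpu *vcpu, unsigned int req)
{
	kvm_make_request(req, vcpu);		/* set a bit in vcpu->requests */
	smp_mb();				/* store requests, then read mode */
	if (vcpu->mode == IN_GUEST_MODE)
		smp_send_reschedule(vcpu->cpu);	/* IPI forces a guest exit */
}

static bool vcpu_loop_side_sketch(struct kvm_vcpu *vcpu)
{
	vcpu->mode = IN_GUEST_MODE;	/* publish "about to enter the guest" */
	smp_mb();			/* store mode, then read requests */
	local_irq_disable();		/* any further IPI stays pending */
	if (vcpu->requests || signal_pending(current)) {
		vcpu->mode = OUTSIDE_GUEST_MODE;
		local_irq_enable();
		return false;		/* bail out instead of entering */
	}
	return true;			/* safe to world-switch now */
}

Either the requester observes IN_GUEST_MODE and sends the IPI, or the vcpu
thread observes the request (or pending signal) before it enters the guest;
the two smp_mb() calls are what rule out both sides reading stale values.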

> The patch below fixes your issues, and while I would push back on
> anything other than direct bug fixes at this point, the current code is
> semantically incorrect wrt. KVM vcpu requests, so it's worth a fix,
> and the patch itself is trivial.
> 
> >> +
> >> +             ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> > You do not take kvm->srcu lock before entering the guest. It looks
> > wrong.
> >
> 
> why would I take that before entering the guest? The only thing the
Right. No need to take it before entering a guest of course. I should
have said "after". Anyway, the absence of srcu handling raised my suspicion
and made me check all the code for kvm->srcu handling. I expected to see
kvm->srcu handling in vcpu_run() because x86 locks srcu at the beginning
of the vcpu loop, releases it before guest entry and retakes it after exit.
This way all the code called from the vcpu run loop is protected. This is
of course not the only way to tackle it and you can do locking only around
memslot use.
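For illustration, that pattern looks roughly like this (a stripped-down
sketch, not the x86 code itself; vcpu_run_sketch is a made-up name, and
kvm_call_hyp()/__kvm_vcpu_run are the ones from your patch):

static int vcpu_run_sketch(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;
	int ret = 1, idx;

	idx = srcu_read_lock(&kvm->srcu);	/* held for the whole loop */
	while (ret > 0) {
		/*
		 * Memslot users (gfn_to_hva(), kvm_read_guest(), ...) are
		 * safe anywhere in here, except across the guest entry:
		 */
		srcu_read_unlock(&kvm->srcu, idx);	/* drop before entry */
		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
		idx = srcu_read_lock(&kvm->srcu);	/* retake after exit */

		/* exit handling then also runs under the read-side lock */
	}
	srcu_read_unlock(&kvm->srcu, idx);
	return ret;
}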

> read side RCU protects against is the memslots data structure as far
> as I can see, so the second patch pasted below fixes this for the code
> that actually accesses this data structure.
Many memory-related functions that you call access memslots under the
hood and assume that locking is done by the caller. From a quick look
I found those that you've missed:
kvm_is_visible_gfn()
kvm_read_guest()
gfn_to_hva()
gfn_to_pfn_prot()
kvm_memslots()

Maybe there are more. Can you enable RCU debugging in your kernel config
and check? This does not guarantee that it will catch all of the places,
but it is better than nothing.
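For example, the common memslots accessor is, from memory (so double-check
the exact form in include/linux/kvm_host.h), roughly this:

static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
{
	return rcu_dereference_check(kvm->memslots,
			srcu_read_lock_held(&kvm->srcu)
			|| lockdep_is_held(&kvm->slots_lock));
}

With CONFIG_PROVE_RCU enabled, rcu_dereference_check() complains at runtime
whenever neither condition holds, which is what should flag the call sites
listed above.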

> 
> >> +
> >> +             vcpu->mode = OUTSIDE_GUEST_MODE;
> >> +             kvm_guest_exit();
> >> +             trace_kvm_exit(*vcpu_pc(vcpu));
> >> +             /*
> >> +              * We may have taken a host interrupt in HYP mode (ie
> >> +              * while executing the guest). This interrupt is still
> >> +              * pending, as we haven't serviced it yet!
> >> +              *
> >> +              * We're now back in SVC mode, with interrupts
> >> +              * disabled.  Enabling the interrupts now will have
> >> +              * the effect of taking the interrupt again, in SVC
> >> +              * mode this time.
> >> +              */
> >> +             local_irq_enable();
> >> +
> >> +             /*
> >> +              * Back from guest
> >> +              *************************************************************/
> >> +
> >> +             ret = handle_exit(vcpu, run, ret);
> >> +     }
> >> +
> >> +     if (vcpu->sigset_active)
> >> +             sigprocmask(SIG_SETMASK, &sigsaved, NULL);
> >> +     return ret;
> >>  }
> >>
> >>  static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
> >> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> >> index a923590..08adcd5 100644
> >> --- a/arch/arm/kvm/interrupts.S
> >> +++ b/arch/arm/kvm/interrupts.S
> >> @@ -20,9 +20,12 @@
> >>  #include <linux/const.h>
> >>  #include <asm/unified.h>
> >>  #include <asm/page.h>
> >> +#include <asm/ptrace.h>
> >>  #include <asm/asm-offsets.h>
> >>  #include <asm/kvm_asm.h>
> >>  #include <asm/kvm_arm.h>
> >> +#include <asm/vfpmacros.h>
> >> +#include "interrupts_head.S"
> >>
> >>       .text
> >>
> >> @@ -31,36 +34,423 @@ __kvm_hyp_code_start:
> >>
> >>  /********************************************************************
> >>   * Flush per-VMID TLBs
> >> + *
> >> + * void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >> + *
> >> + * We rely on the hardware to broadcast the TLB invalidation to all CPUs
> >> + * inside the inner-shareable domain (which is the case for all v7
> >> + * implementations).  If we come across a non-IS SMP implementation, we'll
> >> + * have to use an IPI based mechanism. Until then, we stick to the simple
> >> + * hardware assisted version.
> >>   */
> >>  ENTRY(__kvm_tlb_flush_vmid)
> >> +     push    {r2, r3}
> >> +
> >> +     add     r0, r0, #KVM_VTTBR
> >> +     ldrd    r2, r3, [r0]
> >> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
> >> +     isb
> >> +     mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
> >> +     dsb
> >> +     isb
> >> +     mov     r2, #0
> >> +     mov     r3, #0
> >> +     mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
> >> +     isb                             @ Not necessary if followed by eret
> >> +
> >> +     pop     {r2, r3}
> >>       bx      lr
> >>  ENDPROC(__kvm_tlb_flush_vmid)
> >>
> >>  /********************************************************************
> >> - * Flush TLBs and instruction caches of current CPU for all VMIDs
> >> + * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
> >> + * domain, for all VMIDs
> >> + *
> >> + * void __kvm_flush_vm_context(void);
> >>   */
> >>  ENTRY(__kvm_flush_vm_context)
> >> +     mov     r0, #0                  @ rn parameter for c15 flushes is SBZ
> >> +
> >> +     /* Invalidate NS Non-Hyp TLB Inner Shareable (TLBIALLNSNHIS) */
> >> +     mcr     p15, 4, r0, c8, c3, 4
> >> +     /* Invalidate instruction caches Inner Shareable (ICIALLUIS) */
> >> +     mcr     p15, 0, r0, c7, c1, 0
> >> +     dsb
> >> +     isb                             @ Not necessary if followed by eret
> >> +
> >>       bx      lr
> >>  ENDPROC(__kvm_flush_vm_context)
> >>
> >> +
> >>  /********************************************************************
> >>   *  Hypervisor world-switch code
> >> + *
> >> + *
> >> + * int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >>   */
> >>  ENTRY(__kvm_vcpu_run)
> >> -     bx      lr
> >> +     @ Save the vcpu pointer
> >> +     mcr     p15, 4, vcpu, c13, c0, 2        @ HTPIDR
> >> +
> >> +     save_host_regs
> >> +
> >> +     @ Store hardware CP15 state and load guest state
> >> +     read_cp15_state store_to_vcpu = 0
> >> +     write_cp15_state read_from_vcpu = 1
> >> +
> >> +     @ If the host kernel has not been configured with VFPv3 support,
> >> +     @ then it is safer if we deny guests from using it as well.
> >> +#ifdef CONFIG_VFPv3
> >> +     @ Set FPEXC_EN so the guest doesn't trap floating point instructions
> >> +     VFPFMRX r2, FPEXC               @ VMRS
> >> +     push    {r2}
> >> +     orr     r2, r2, #FPEXC_EN
> >> +     VFPFMXR FPEXC, r2               @ VMSR
> >> +#endif
> >> +
> >> +     @ Configure Hyp-role
> >> +     configure_hyp_role vmentry
> >> +
> >> +     @ Trap coprocessor CRx accesses
> >> +     set_hstr vmentry
> >> +     set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +     set_hdcr vmentry
> >> +
> >> +     @ Write configured ID register into MIDR alias
> >> +     ldr     r1, [vcpu, #VCPU_MIDR]
> >> +     mcr     p15, 4, r1, c0, c0, 0
> >> +
> >> +     @ Write guest view of MPIDR into VMPIDR
> >> +     ldr     r1, [vcpu, #CP15_OFFSET(c0_MPIDR)]
> >> +     mcr     p15, 4, r1, c0, c0, 5
> >> +
> >> +     @ Set up guest memory translation
> >> +     ldr     r1, [vcpu, #VCPU_KVM]
> >> +     add     r1, r1, #KVM_VTTBR
> >> +     ldrd    r2, r3, [r1]
> >> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
> >> +
> >> +     @ We're all done, just restore the GPRs and go to the guest
> >> +     restore_guest_regs
> >> +     clrex                           @ Clear exclusive monitor
> >> +     eret
> >> +
> >> +__kvm_vcpu_return:
> >> +     /*
> >> +      * return convention:
> >> +      * guest r0, r1, r2 saved on the stack
> >> +      * r0: vcpu pointer
> >> +      * r1: exception code
> >> +      */
> >> +     save_guest_regs
> >> +
> >> +     @ Set VMID == 0
> >> +     mov     r2, #0
> >> +     mov     r3, #0
> >> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
> >> +
> >> +     @ Don't trap coprocessor accesses for host kernel
> >> +     set_hstr vmexit
> >> +     set_hdcr vmexit
> >> +     set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +
> >> +#ifdef CONFIG_VFPv3
> >> +     @ Save floating point registers if we let the guest use them.
> >> +     tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +     bne     after_vfp_restore
> >> +
> >> +     @ Switch VFP/NEON hardware state to the host's
> >> +     add     r7, vcpu, #VCPU_VFP_GUEST
> >> +     store_vfp_state r7
> >> +     add     r7, vcpu, #VCPU_VFP_HOST
> >> +     ldr     r7, [r7]
> >> +     restore_vfp_state r7
> >> +
> >> +after_vfp_restore:
> >> +     @ Restore FPEXC_EN which we clobbered on entry
> >> +     pop     {r2}
> >> +     VFPFMXR FPEXC, r2
> >> +#endif
> >> +
> >> +     @ Reset Hyp-role
> >> +     configure_hyp_role vmexit
> >> +
> >> +     @ Let host read hardware MIDR
> >> +     mrc     p15, 0, r2, c0, c0, 0
> >> +     mcr     p15, 4, r2, c0, c0, 0
> >> +
> >> +     @ Back to hardware MPIDR
> >> +     mrc     p15, 0, r2, c0, c0, 5
> >> +     mcr     p15, 4, r2, c0, c0, 5
> >> +
> >> +     @ Store guest CP15 state and restore host state
> >> +     read_cp15_state store_to_vcpu = 1
> >> +     write_cp15_state read_from_vcpu = 0
> >> +
> >> +     restore_host_regs
> >> +     clrex                           @ Clear exclusive monitor
> >> +     mov     r0, r1                  @ Return the return code
> >> +     bx      lr                      @ return to IOCTL
> >>
> >>  ENTRY(kvm_call_hyp)
> >> +     hvc     #0
> >>       bx      lr
> >>
> >>
> >>  /********************************************************************
> >>   * Hypervisor exception vector and handlers
> >> + *
> >> + *
> >> + * The KVM/ARM Hypervisor ABI is defined as follows:
> >> + *
> >> + * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
> >> + * instruction is issued since all traps are disabled when running the host
> >> + * kernel as per the Hyp-mode initialization at boot time.
> >> + *
> >> + * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
> >> + * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
> >> + * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
> >> + * instructions are called from within Hyp-mode.
> >> + *
> >> + * Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
> >> + *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
> >> + *    exception vector code will check that the HVC comes from VMID==0 and if
> >> + *    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
> >> + *    - r0 contains a pointer to a HYP function
> >> + *    - r1, r2, and r3 contain arguments to the above function.
> >> + *    - The HYP function will be called with its arguments in r0, r1 and r2.
> >> + *    On HYP function return, we return directly to SVC.
> >> + *
> >> + * Note that the above is used to execute code in Hyp-mode from a host-kernel
> >> + * point of view, and is a different concept from performing a world-switch and
> >> + * executing guest code SVC mode (with a VMID != 0).
> >>   */
> >>
> >> +/* Handle undef, svc, pabt, or dabt by crashing with a user notice */
> >> +.macro bad_exception exception_code, panic_str
> >> +     push    {r0-r2}
> >> +     mrrc    p15, 6, r0, r1, c2      @ Read VTTBR
> >> +     lsr     r1, r1, #16
> >> +     ands    r1, r1, #0xff
> >> +     beq     99f
> >> +
> >> +     load_vcpu                       @ Load VCPU pointer
> >> +     .if \exception_code == ARM_EXCEPTION_DATA_ABORT
> >> +     mrc     p15, 4, r2, c5, c2, 0   @ HSR
> >> +     mrc     p15, 4, r1, c6, c0, 0   @ HDFAR
> >> +     str     r2, [vcpu, #VCPU_HSR]
> >> +     str     r1, [vcpu, #VCPU_HxFAR]
> >> +     .endif
> >> +     .if \exception_code == ARM_EXCEPTION_PREF_ABORT
> >> +     mrc     p15, 4, r2, c5, c2, 0   @ HSR
> >> +     mrc     p15, 4, r1, c6, c0, 2   @ HIFAR
> >> +     str     r2, [vcpu, #VCPU_HSR]
> >> +     str     r1, [vcpu, #VCPU_HxFAR]
> >> +     .endif
> >> +     mov     r1, #\exception_code
> >> +     b       __kvm_vcpu_return
> >> +
> >> +     @ We were in the host already. Let's craft a panic-ing return to SVC.
> >> +99:  mrs     r2, cpsr
> >> +     bic     r2, r2, #MODE_MASK
> >> +     orr     r2, r2, #SVC_MODE
> >> +THUMB(       orr     r2, r2, #PSR_T_BIT      )
> >> +     msr     spsr_cxsf, r2
> >> +     mrs     r1, ELR_hyp
> >> +     ldr     r2, =BSYM(panic)
> >> +     msr     ELR_hyp, r2
> >> +     ldr     r0, =\panic_str
> >> +     eret
> >> +.endm
> >> +
> >> +     .text
> >> +
> >>       .align 5
> >>  __kvm_hyp_vector:
> >>       .globl __kvm_hyp_vector
> >> -     nop
> >> +
> >> +     @ Hyp-mode exception vector
> >> +     W(b)    hyp_reset
> >> +     W(b)    hyp_undef
> >> +     W(b)    hyp_svc
> >> +     W(b)    hyp_pabt
> >> +     W(b)    hyp_dabt
> >> +     W(b)    hyp_hvc
> >> +     W(b)    hyp_irq
> >> +     W(b)    hyp_fiq
> >> +
> >> +     .align
> >> +hyp_reset:
> >> +     b       hyp_reset
> >> +
> >> +     .align
> >> +hyp_undef:
> >> +     bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
> >> +
> >> +     .align
> >> +hyp_svc:
> >> +     bad_exception ARM_EXCEPTION_HVC, svc_die_str
> >> +
> >> +     .align
> >> +hyp_pabt:
> >> +     bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
> >> +
> >> +     .align
> >> +hyp_dabt:
> >> +     bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
> >> +
> >> +     .align
> >> +hyp_hvc:
> >> +     /*
> >> +      * Getting here is either because of a trap from a guest or from calling
> >> +      * HVC from the host kernel, which means "switch to Hyp mode".
> >> +      */
> >> +     push    {r0, r1, r2}
> >> +
> >> +     @ Check syndrome register
> >> +     mrc     p15, 4, r1, c5, c2, 0   @ HSR
> >> +     lsr     r0, r1, #HSR_EC_SHIFT
> >> +#ifdef CONFIG_VFPv3
> >> +     cmp     r0, #HSR_EC_CP_0_13
> >> +     beq     switch_to_guest_vfp
> >> +#endif
> >> +     cmp     r0, #HSR_EC_HVC
> >> +     bne     guest_trap              @ Not HVC instr.
> >> +
> >> +     /*
> >> +      * Let's check if the HVC came from VMID 0 and allow simple
> >> +      * switch to Hyp mode
> >> +      */
> >> +     mrrc    p15, 6, r0, r2, c2
> >> +     lsr     r2, r2, #16
> >> +     and     r2, r2, #0xff
> >> +     cmp     r2, #0
> >> +     bne     guest_trap              @ Guest called HVC
> >> +
> >> +host_switch_to_hyp:
> >> +     pop     {r0, r1, r2}
> >> +
> >> +     push    {lr}
> >> +     mrs     lr, SPSR
> >> +     push    {lr}
> >> +
> >> +     mov     lr, r0
> >> +     mov     r0, r1
> >> +     mov     r1, r2
> >> +     mov     r2, r3
> >> +
> >> +THUMB(       orr     lr, #1)
> >> +     blx     lr                      @ Call the HYP function
> >> +
> >> +     pop     {lr}
> >> +     msr     SPSR_csxf, lr
> >> +     pop     {lr}
> >> +     eret
> >> +
> >> +guest_trap:
> >> +     load_vcpu                       @ Load VCPU pointer to r0
> >> +     str     r1, [vcpu, #VCPU_HSR]
> >> +
> >> +     @ Check if we need the fault information
> >> +     lsr     r1, r1, #HSR_EC_SHIFT
> >> +     cmp     r1, #HSR_EC_IABT
> >> +     mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
> >> +     beq     2f
> >> +     cmp     r1, #HSR_EC_DABT
> >> +     bne     1f
> >> +     mrc     p15, 4, r2, c6, c0, 0   @ HDFAR
> >> +
> >> +2:   str     r2, [vcpu, #VCPU_HxFAR]
> >> +
> >> +     /*
> >> +      * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
> >> +      *
> >> +      * Abort on the stage 2 translation for a memory access from a
> >> +      * Non-secure PL1 or PL0 mode:
> >> +      *
> >> +      * For any Access flag fault or Translation fault, and also for any
> >> +      * Permission fault on the stage 2 translation of a memory access
> >> +      * made as part of a translation table walk for a stage 1 translation,
> >> +      * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
> >> +      * is UNKNOWN.
> >> +      */
> >> +
> >> +     /* Check for permission fault, and S1PTW */
> >> +     mrc     p15, 4, r1, c5, c2, 0   @ HSR
> >> +     and     r0, r1, #HSR_FSC_TYPE
> >> +     cmp     r0, #FSC_PERM
> >> +     tsteq   r1, #(1 << 7)           @ S1PTW
> >> +     mrcne   p15, 4, r2, c6, c0, 4   @ HPFAR
> >> +     bne     3f
> >> +
> >> +     /* Resolve IPA using the xFAR */
> >> +     mcr     p15, 0, r2, c7, c8, 0   @ ATS1CPR
> >> +     isb
> >> +     mrrc    p15, 0, r0, r1, c7      @ PAR
> >> +     tst     r0, #1
> >> +     bne     4f                      @ Failed translation
> >> +     ubfx    r2, r0, #12, #20
> >> +     lsl     r2, r2, #4
> >> +     orr     r2, r2, r1, lsl #24
> >> +
> >> +3:   load_vcpu                       @ Load VCPU pointer to r0
> >> +     str     r2, [r0, #VCPU_HPFAR]
> >> +
> >> +1:   mov     r1, #ARM_EXCEPTION_HVC
> >> +     b       __kvm_vcpu_return
> >> +
> >> +4:   pop     {r0, r1, r2}            @ Failed translation, return to guest
> >> +     eret
> >> +
> >> +/*
> >> + * If VFPv3 support is not available, then we will not switch the VFP
> >> + * registers; however cp10 and cp11 accesses will still trap and fallback
> >> + * to the regular coprocessor emulation code, which currently will
> >> + * inject an undefined exception to the guest.
> >> + */
> >> +#ifdef CONFIG_VFPv3
> >> +switch_to_guest_vfp:
> >> +     load_vcpu                       @ Load VCPU pointer to r0
> >> +     push    {r3-r7}
> >> +
> >> +     @ NEON/VFP used.  Turn on VFP access.
> >> +     set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +
> >> +     @ Switch VFP/NEON hardware state to the guest's
> >> +     add     r7, r0, #VCPU_VFP_HOST
> >> +     ldr     r7, [r7]
> >> +     store_vfp_state r7
> >> +     add     r7, r0, #VCPU_VFP_GUEST
> >> +     restore_vfp_state r7
> >> +
> >> +     pop     {r3-r7}
> >> +     pop     {r0-r2}
> >> +     eret
> >> +#endif
> >> +
> >> +     .align
> >> +hyp_irq:
> >> +     push    {r0, r1, r2}
> >> +     mov     r1, #ARM_EXCEPTION_IRQ
> >> +     load_vcpu                       @ Load VCPU pointer to r0
> >> +     b       __kvm_vcpu_return
> >> +
> >> +     .align
> >> +hyp_fiq:
> >> +     b       hyp_fiq
> >> +
> >> +     .ltorg
> >>
> >>  __kvm_hyp_code_end:
> >>       .globl  __kvm_hyp_code_end
> >> +
> >> +     .section ".rodata"
> >> +
> >> +und_die_str:
> >> +     .ascii  "unexpected undefined exception in Hyp mode at: %#08x"
> >> +pabt_die_str:
> >> +     .ascii  "unexpected prefetch abort in Hyp mode at: %#08x"
> >> +dabt_die_str:
> >> +     .ascii  "unexpected data abort in Hyp mode at: %#08x"
> >> +svc_die_str:
> >> +     .ascii  "unexpected HVC/SVC trap in Hyp mode at: %#08x"
> >> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> >> new file mode 100644
> >> index 0000000..f59a580
> >> --- /dev/null
> >> +++ b/arch/arm/kvm/interrupts_head.S
> >> @@ -0,0 +1,443 @@
> >> +#define VCPU_USR_REG(_reg_nr)        (VCPU_USR_REGS + (_reg_nr * 4))
> >> +#define VCPU_USR_SP          (VCPU_USR_REG(13))
> >> +#define VCPU_USR_LR          (VCPU_USR_REG(14))
> >> +#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))
> >> +
> >> +/*
> >> + * Many of these macros need to access the VCPU structure, which is always
> >> + * held in r0. These macros should never clobber r1, as it is used to hold the
> >> + * exception code on the return path (except of course the macro that switches
> >> + * all the registers before the final jump to the VM).
> >> + */
> >> +vcpu .req    r0              @ vcpu pointer always in r0
> >> +
> >> +/* Clobbers {r2-r6} */
> >> +.macro store_vfp_state vfp_base
> >> +     @ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
> >> +     VFPFMRX r2, FPEXC
> >> +     @ Make sure VFP is enabled so we can touch the registers.
> >> +     orr     r6, r2, #FPEXC_EN
> >> +     VFPFMXR FPEXC, r6
> >> +
> >> +     VFPFMRX r3, FPSCR
> >> +     tst     r2, #FPEXC_EX           @ Check for VFP Subarchitecture
> >> +     beq     1f
> >> +     @ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
> >> +     @ we only need to save them if FPEXC_EX is set.
> >> +     VFPFMRX r4, FPINST
> >> +     tst     r2, #FPEXC_FP2V
> >> +     VFPFMRX r5, FPINST2, ne         @ vmrsne
> >> +     bic     r6, r2, #FPEXC_EX       @ FPEXC_EX disable
> >> +     VFPFMXR FPEXC, r6
> >> +1:
> >> +     VFPFSTMIA \vfp_base, r6         @ Save VFP registers
> >> +     stm     \vfp_base, {r2-r5}      @ Save FPEXC, FPSCR, FPINST, FPINST2
> >> +.endm
> >> +
> >> +/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
> >> +.macro restore_vfp_state vfp_base
> >> +     VFPFLDMIA \vfp_base, r6         @ Load VFP registers
> >> +     ldm     \vfp_base, {r2-r5}      @ Load FPEXC, FPSCR, FPINST, FPINST2
> >> +
> >> +     VFPFMXR FPSCR, r3
> >> +     tst     r2, #FPEXC_EX           @ Check for VFP Subarchitecture
> >> +     beq     1f
> >> +     VFPFMXR FPINST, r4
> >> +     tst     r2, #FPEXC_FP2V
> >> +     VFPFMXR FPINST2, r5, ne
> >> +1:
> >> +     VFPFMXR FPEXC, r2       @ FPEXC (last, in case !EN)
> >> +.endm
> >> +
> >> +/* These are simply for the macros to work - values don't have meaning */
> >> +.equ usr, 0
> >> +.equ svc, 1
> >> +.equ abt, 2
> >> +.equ und, 3
> >> +.equ irq, 4
> >> +.equ fiq, 5
> >> +
> >> +.macro push_host_regs_mode mode
> >> +     mrs     r2, SP_\mode
> >> +     mrs     r3, LR_\mode
> >> +     mrs     r4, SPSR_\mode
> >> +     push    {r2, r3, r4}
> >> +.endm
> >> +
> >> +/*
> >> + * Store all host persistent registers on the stack.
> >> + * Clobbers all registers, in all modes, except r0 and r1.
> >> + */
> >> +.macro save_host_regs
> >> +     /* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
> >> +     mrs     r2, ELR_hyp
> >> +     push    {r2}
> >> +
> >> +     /* usr regs */
> >> +     push    {r4-r12}        @ r0-r3 are always clobbered
> >> +     mrs     r2, SP_usr
> >> +     mov     r3, lr
> >> +     push    {r2, r3}
> >> +
> >> +     push_host_regs_mode svc
> >> +     push_host_regs_mode abt
> >> +     push_host_regs_mode und
> >> +     push_host_regs_mode irq
> >> +
> >> +     /* fiq regs */
> >> +     mrs     r2, r8_fiq
> >> +     mrs     r3, r9_fiq
> >> +     mrs     r4, r10_fiq
> >> +     mrs     r5, r11_fiq
> >> +     mrs     r6, r12_fiq
> >> +     mrs     r7, SP_fiq
> >> +     mrs     r8, LR_fiq
> >> +     mrs     r9, SPSR_fiq
> >> +     push    {r2-r9}
> >> +.endm
> >> +
> >> +.macro pop_host_regs_mode mode
> >> +     pop     {r2, r3, r4}
> >> +     msr     SP_\mode, r2
> >> +     msr     LR_\mode, r3
> >> +     msr     SPSR_\mode, r4
> >> +.endm
> >> +
> >> +/*
> >> + * Restore all host registers from the stack.
> >> + * Clobbers all registers, in all modes, except r0 and r1.
> >> + */
> >> +.macro restore_host_regs
> >> +     pop     {r2-r9}
> >> +     msr     r8_fiq, r2
> >> +     msr     r9_fiq, r3
> >> +     msr     r10_fiq, r4
> >> +     msr     r11_fiq, r5
> >> +     msr     r12_fiq, r6
> >> +     msr     SP_fiq, r7
> >> +     msr     LR_fiq, r8
> >> +     msr     SPSR_fiq, r9
> >> +
> >> +     pop_host_regs_mode irq
> >> +     pop_host_regs_mode und
> >> +     pop_host_regs_mode abt
> >> +     pop_host_regs_mode svc
> >> +
> >> +     pop     {r2, r3}
> >> +     msr     SP_usr, r2
> >> +     mov     lr, r3
> >> +     pop     {r4-r12}
> >> +
> >> +     pop     {r2}
> >> +     msr     ELR_hyp, r2
> >> +.endm
> >> +
> >> +/*
> >> + * Restore SP, LR and SPSR for a given mode. offset is the offset of
> >> + * this mode's registers from the VCPU base.
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers r1, r2, r3, r4.
> >> + */
> >> +.macro restore_guest_regs_mode mode, offset
> >> +     add     r1, vcpu, \offset
> >> +     ldm     r1, {r2, r3, r4}
> >> +     msr     SP_\mode, r2
> >> +     msr     LR_\mode, r3
> >> +     msr     SPSR_\mode, r4
> >> +.endm
> >> +
> >> +/*
> >> + * Restore all guest registers from the vcpu struct.
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers *all* registers.
> >> + */
> >> +.macro restore_guest_regs
> >> +     restore_guest_regs_mode svc, #VCPU_SVC_REGS
> >> +     restore_guest_regs_mode abt, #VCPU_ABT_REGS
> >> +     restore_guest_regs_mode und, #VCPU_UND_REGS
> >> +     restore_guest_regs_mode irq, #VCPU_IRQ_REGS
> >> +
> >> +     add     r1, vcpu, #VCPU_FIQ_REGS
> >> +     ldm     r1, {r2-r9}
> >> +     msr     r8_fiq, r2
> >> +     msr     r9_fiq, r3
> >> +     msr     r10_fiq, r4
> >> +     msr     r11_fiq, r5
> >> +     msr     r12_fiq, r6
> >> +     msr     SP_fiq, r7
> >> +     msr     LR_fiq, r8
> >> +     msr     SPSR_fiq, r9
> >> +
> >> +     @ Load return state
> >> +     ldr     r2, [vcpu, #VCPU_PC]
> >> +     ldr     r3, [vcpu, #VCPU_CPSR]
> >> +     msr     ELR_hyp, r2
> >> +     msr     SPSR_cxsf, r3
> >> +
> >> +     @ Load user registers
> >> +     ldr     r2, [vcpu, #VCPU_USR_SP]
> >> +     ldr     r3, [vcpu, #VCPU_USR_LR]
> >> +     msr     SP_usr, r2
> >> +     mov     lr, r3
> >> +     add     vcpu, vcpu, #(VCPU_USR_REGS)
> >> +     ldm     vcpu, {r0-r12}
> >> +.endm
> >> +
> >> +/*
> >> + * Save SP, LR and SPSR for a given mode. offset is the offset of
> >> + * this mode's registers from the VCPU base.
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers r2, r3, r4, r5.
> >> + */
> >> +.macro save_guest_regs_mode mode, offset
> >> +     add     r2, vcpu, \offset
> >> +     mrs     r3, SP_\mode
> >> +     mrs     r4, LR_\mode
> >> +     mrs     r5, SPSR_\mode
> >> +     stm     r2, {r3, r4, r5}
> >> +.endm
> >> +
> >> +/*
> >> + * Save all guest registers to the vcpu struct
> >> + * Expects guest's r0, r1, r2 on the stack.
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers r2, r3, r4, r5.
> >> + */
> >> +.macro save_guest_regs
> >> +     @ Store usr registers
> >> +     add     r2, vcpu, #VCPU_USR_REG(3)
> >> +     stm     r2, {r3-r12}
> >> +     add     r2, vcpu, #VCPU_USR_REG(0)
> >> +     pop     {r3, r4, r5}            @ r0, r1, r2
> >> +     stm     r2, {r3, r4, r5}
> >> +     mrs     r2, SP_usr
> >> +     mov     r3, lr
> >> +     str     r2, [vcpu, #VCPU_USR_SP]
> >> +     str     r3, [vcpu, #VCPU_USR_LR]
> >> +
> >> +     @ Store return state
> >> +     mrs     r2, ELR_hyp
> >> +     mrs     r3, spsr
> >> +     str     r2, [vcpu, #VCPU_PC]
> >> +     str     r3, [vcpu, #VCPU_CPSR]
> >> +
> >> +     @ Store other guest registers
> >> +     save_guest_regs_mode svc, #VCPU_SVC_REGS
> >> +     save_guest_regs_mode abt, #VCPU_ABT_REGS
> >> +     save_guest_regs_mode und, #VCPU_UND_REGS
> >> +     save_guest_regs_mode irq, #VCPU_IRQ_REGS
> >> +.endm
> >> +
> >> +/* Reads cp15 registers from hardware and stores them in memory
> >> + * @store_to_vcpu: If 0, registers are written in-order to the stack,
> >> + *              otherwise to the VCPU struct pointed to by vcpup
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers r2 - r12
> >> + */
> >> +.macro read_cp15_state store_to_vcpu
> >> +     mrc     p15, 0, r2, c1, c0, 0   @ SCTLR
> >> +     mrc     p15, 0, r3, c1, c0, 2   @ CPACR
> >> +     mrc     p15, 0, r4, c2, c0, 2   @ TTBCR
> >> +     mrc     p15, 0, r5, c3, c0, 0   @ DACR
> >> +     mrrc    p15, 0, r6, r7, c2      @ TTBR 0
> >> +     mrrc    p15, 1, r8, r9, c2      @ TTBR 1
> >> +     mrc     p15, 0, r10, c10, c2, 0 @ PRRR
> >> +     mrc     p15, 0, r11, c10, c2, 1 @ NMRR
> >> +     mrc     p15, 2, r12, c0, c0, 0  @ CSSELR
> >> +
> >> +     .if \store_to_vcpu == 0
> >> +     push    {r2-r12}                @ Push CP15 registers
> >> +     .else
> >> +     str     r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> >> +     str     r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> >> +     str     r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
> >> +     str     r5, [vcpu, #CP15_OFFSET(c3_DACR)]
> >> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
> >> +     strd    r6, r7, [vcpu]
> >> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
> >> +     strd    r8, r9, [vcpu]
> >> +     sub     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
> >> +     str     r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
> >> +     str     r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
> >> +     str     r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
> >> +     .endif
> >> +
> >> +     mrc     p15, 0, r2, c13, c0, 1  @ CID
> >> +     mrc     p15, 0, r3, c13, c0, 2  @ TID_URW
> >> +     mrc     p15, 0, r4, c13, c0, 3  @ TID_URO
> >> +     mrc     p15, 0, r5, c13, c0, 4  @ TID_PRIV
> >> +     mrc     p15, 0, r6, c5, c0, 0   @ DFSR
> >> +     mrc     p15, 0, r7, c5, c0, 1   @ IFSR
> >> +     mrc     p15, 0, r8, c5, c1, 0   @ ADFSR
> >> +     mrc     p15, 0, r9, c5, c1, 1   @ AIFSR
> >> +     mrc     p15, 0, r10, c6, c0, 0  @ DFAR
> >> +     mrc     p15, 0, r11, c6, c0, 2  @ IFAR
> >> +     mrc     p15, 0, r12, c12, c0, 0 @ VBAR
> >> +
> >> +     .if \store_to_vcpu == 0
> >> +     push    {r2-r12}                @ Push CP15 registers
> >> +     .else
> >> +     str     r2, [vcpu, #CP15_OFFSET(c13_CID)]
> >> +     str     r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> >> +     str     r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
> >> +     str     r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
> >> +     str     r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
> >> +     str     r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
> >> +     str     r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
> >> +     str     r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
> >> +     str     r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
> >> +     str     r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
> >> +     str     r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
> >> +     .endif
> >> +.endm
> >> +
> >> +/*
> >> + * Reads cp15 registers from memory and writes them to hardware
> >> + * @read_from_vcpu: If 0, registers are read in-order from the stack,
> >> + *               otherwise from the VCPU struct pointed to by vcpup
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + */
> >> +.macro write_cp15_state read_from_vcpu
> >> +     .if \read_from_vcpu == 0
> >> +     pop     {r2-r12}
> >> +     .else
> >> +     ldr     r2, [vcpu, #CP15_OFFSET(c13_CID)]
> >> +     ldr     r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> >> +     ldr     r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
> >> +     ldr     r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
> >> +     ldr     r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
> >> +     ldr     r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
> >> +     ldr     r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
> >> +     ldr     r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
> >> +     ldr     r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
> >> +     ldr     r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
> >> +     ldr     r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
> >> +     .endif
> >> +
> >> +     mcr     p15, 0, r2, c13, c0, 1  @ CID
> >> +     mcr     p15, 0, r3, c13, c0, 2  @ TID_URW
> >> +     mcr     p15, 0, r4, c13, c0, 3  @ TID_URO
> >> +     mcr     p15, 0, r5, c13, c0, 4  @ TID_PRIV
> >> +     mcr     p15, 0, r6, c5, c0, 0   @ DFSR
> >> +     mcr     p15, 0, r7, c5, c0, 1   @ IFSR
> >> +     mcr     p15, 0, r8, c5, c1, 0   @ ADFSR
> >> +     mcr     p15, 0, r9, c5, c1, 1   @ AIFSR
> >> +     mcr     p15, 0, r10, c6, c0, 0  @ DFAR
> >> +     mcr     p15, 0, r11, c6, c0, 2  @ IFAR
> >> +     mcr     p15, 0, r12, c12, c0, 0 @ VBAR
> >> +
> >> +     .if \read_from_vcpu == 0
> >> +     pop     {r2-r12}
> >> +     .else
> >> +     ldr     r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> >> +     ldr     r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> >> +     ldr     r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
> >> +     ldr     r5, [vcpu, #CP15_OFFSET(c3_DACR)]
> >> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
> >> +     ldrd    r6, r7, [vcpu]
> >> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
> >> +     ldrd    r8, r9, [vcpu]
> >> +     sub     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
> >> +     ldr     r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
> >> +     ldr     r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
> >> +     ldr     r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
> >> +     .endif
> >> +
> >> +     mcr     p15, 0, r2, c1, c0, 0   @ SCTLR
> >> +     mcr     p15, 0, r3, c1, c0, 2   @ CPACR
> >> +     mcr     p15, 0, r4, c2, c0, 2   @ TTBCR
> >> +     mcr     p15, 0, r5, c3, c0, 0   @ DACR
> >> +     mcrr    p15, 0, r6, r7, c2      @ TTBR 0
> >> +     mcrr    p15, 1, r8, r9, c2      @ TTBR 1
> >> +     mcr     p15, 0, r10, c10, c2, 0 @ PRRR
> >> +     mcr     p15, 0, r11, c10, c2, 1 @ NMRR
> >> +     mcr     p15, 2, r12, c0, c0, 0  @ CSSELR
> >> +.endm
> >> +
> >> +/*
> >> + * Save the VGIC CPU state into memory
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + */
> >> +.macro save_vgic_state
> >> +.endm
> >> +
> >> +/*
> >> + * Restore the VGIC CPU state from memory
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + */
> >> +.macro restore_vgic_state
> >> +.endm
> >> +
> >> +.equ vmentry,        0
> >> +.equ vmexit, 1
> >> +
> >> +/* Configures the HSTR (Hyp System Trap Register) on entry/return
> >> + * (hardware reset value is 0) */
> >> +.macro set_hstr operation
> >> +     mrc     p15, 4, r2, c1, c1, 3
> >> +     ldr     r3, =HSTR_T(15)
> >> +     .if \operation == vmentry
> >> +     orr     r2, r2, r3              @ Trap CR{15}
> >> +     .else
> >> +     bic     r2, r2, r3              @ Don't trap any CRx accesses
> >> +     .endif
> >> +     mcr     p15, 4, r2, c1, c1, 3
> >> +.endm
> >> +
> >> +/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
> >> + * (hardware reset value is 0). Keep previous value in r2. */
> >> +.macro set_hcptr operation, mask
> >> +     mrc     p15, 4, r2, c1, c1, 2
> >> +     ldr     r3, =\mask
> >> +     .if \operation == vmentry
> >> +     orr     r3, r2, r3              @ Trap coproc-accesses defined in mask
> >> +     .else
> >> +     bic     r3, r2, r3              @ Don't trap defined coproc-accesses
> >> +     .endif
> >> +     mcr     p15, 4, r3, c1, c1, 2
> >> +.endm
> >> +
> >> +/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
> >> + * (hardware reset value is 0) */
> >> +.macro set_hdcr operation
> >> +     mrc     p15, 4, r2, c1, c1, 1
> >> +     ldr     r3, =(HDCR_TPM|HDCR_TPMCR)
> >> +     .if \operation == vmentry
> >> +     orr     r2, r2, r3              @ Trap some perfmon accesses
> >> +     .else
> >> +     bic     r2, r2, r3              @ Don't trap any perfmon accesses
> >> +     .endif
> >> +     mcr     p15, 4, r2, c1, c1, 1
> >> +.endm
> >> +
> >> +/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
> >> +.macro configure_hyp_role operation
> >> +     mrc     p15, 4, r2, c1, c1, 0   @ HCR
> >> +     bic     r2, r2, #HCR_VIRT_EXCP_MASK
> >> +     ldr     r3, =HCR_GUEST_MASK
> >> +     .if \operation == vmentry
> >> +     orr     r2, r2, r3
> >> +     ldr     r3, [vcpu, #VCPU_IRQ_LINES]
> > irq_lines are accessed atomically from vcpu_interrupt_line(), but there
> > is no memory barriers or atomic operations here. Looks suspicious though
> > I am not familiar with ARM memory model. As far as I understand
> > different translation regimes are used to access this memory, so who
> > knows what this does to access ordering.
> >
> >
> 
> there's an exception taken to switch to Hyp mode, which I'm quite sure
> implies a memory barrier.
> 
> >> +     orr     r2, r2, r3
> >> +     .else
> >> +     bic     r2, r2, r3
> >> +     .endif
> >> +     mcr     p15, 4, r2, c1, c1, 0
> >> +.endm
> >> +
> >> +.macro load_vcpu
> >> +     mrc     p15, 4, vcpu, c13, c0, 2        @ HTPIDR
> >> +.endm
> >>
> >
> 
> commit e290b507f0d31c895bd515d69c0c2b50d76b20db
> Author: Christoffer Dall <c.dall@virtualopensystems.com>
> Date:   Tue Jan 15 20:53:03 2013 -0500
> 
>     KVM: ARM: Honor vcpu->requests in the world-switch code
> 
>     Honor vcpu->requests by checking them accordingly and explicitly raise an
>     error if unsupported requests are set (we don't support any requests on
>     ARM currently).
> 
>     Also add some commenting to explain the synchronization in more details
>     here.  The commenting implied renaming a variable and changing error
>     handling slightly to improve readability.
> 
>     Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 6ff5337..b23a709 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -620,7 +620,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>   */
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	int ret;
> +	int guest_ret, ret;
>  	sigset_t sigsaved;
> 
>  	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
> @@ -640,9 +640,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	if (vcpu->sigset_active)
>  		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> 
> -	ret = 1;
>  	run->exit_reason = KVM_EXIT_UNKNOWN;
> -	while (ret > 0) {
> +	for (;;) {
>  		/*
>  		 * Check conditions before entering the guest
>  		 */
> @@ -650,18 +649,44 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> 
>  		update_vttbr(vcpu->kvm);
> 
> +		/*
> +		 * There is a dependency between setting IN_GUEST_MODE and
> +		 * sending requests.  We need to ensure:
> +		 *   1. Setting IN_GUEST_MODE before checking vcpu->requests.
> +		 *   2. We need to check vcpu->requests after disabling IRQs
> +		 *      (see comment about signal_pending below).
> +		 */
> +		vcpu->mode = IN_GUEST_MODE;
> +
>  		local_irq_disable();
> 
>  		/*
> -		 * Re-check atomic conditions
> +		 * We need to be careful to check these variables after
> +		 * disabling interrupts.  For example with signals:
> +		 *   1. If the signal comes before the signal_pending check,
> +		 *      we will return to user space and everything's good.
> +		 *   2. If the signal comes after the signal_pending check,
> +		 *      we rely on an IPI to exit the guest and continue the
> +		 *      while loop, which checks for pending signals again.
>  		 */
>  		if (signal_pending(current)) {
>  			ret = -EINTR;
>  			run->exit_reason = KVM_EXIT_INTR;
> +			local_irq_enable();
> +			vcpu->mode = OUTSIDE_GUEST_MODE;
> +			break;
>  		}
> 
> -		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> +		if (vcpu->requests) {
> +			ret = -ENOSYS; /* requests not supported */
>  			local_irq_enable();
> +			vcpu->mode = OUTSIDE_GUEST_MODE;
> +			break;
> +		}
> +
> +		if (need_new_vmid_gen(vcpu->kvm)) {
> +			local_irq_enable();
> +			vcpu->mode = OUTSIDE_GUEST_MODE;
>  			continue;
>  		}
> 
> @@ -670,17 +695,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		trace_kvm_entry(*vcpu_pc(vcpu));
>  		kvm_guest_enter();
> -		vcpu->mode = IN_GUEST_MODE;
> 
>  		smp_mb(); /* set mode before reading vcpu->arch.pause */
>  		if (unlikely(vcpu->arch.pause)) {
>  			/* This means ignore, try again. */
> -			ret = ARM_EXCEPTION_IRQ;
> +			guest_ret = ARM_EXCEPTION_IRQ;
>  		} else {
> -			ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +			guest_ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>  		}
> 
> -		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		vcpu->arch.last_pcpu = smp_processor_id();
>  		kvm_guest_exit();
>  		trace_kvm_exit(*vcpu_pc(vcpu));
> @@ -695,12 +718,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 * mode this time.
>  		 */
>  		local_irq_enable();
> +		vcpu->mode = OUTSIDE_GUEST_MODE;
> 
>  		/*
>  		 * Back from guest
>  		 *************************************************************/
> 
> -		ret = handle_exit(vcpu, run, ret);
> +		ret = handle_exit(vcpu, run, guest_ret);
> +		if (ret <= 0)
> +			break;
>  	}
> 
>  	if (vcpu->sigset_active)
> 
> commit fc9a9c5e9dd4eba4acd6bea5c8c083a9a854d662
> Author: Christoffer Dall <c.dall@virtualopensystems.com>
> Date:   Tue Jan 15 20:42:15 2013 -0500
> 
>     KVM: ARM: Remove unused memslot parameter
> 
> diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
> index 31ab9f5..571ccf0 100644
> --- a/arch/arm/include/asm/kvm_mmio.h
> +++ b/arch/arm/include/asm/kvm_mmio.h
> @@ -46,6 +46,6 @@ static inline void kvm_prepare_mmio(struct kvm_run *run,
> 
>  int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> -		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot);
> +		 phys_addr_t fault_ipa);
> 
>  #endif	/* __ARM_KVM_MMIO_H__ */
> diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
> index d6a4ca0..f655088 100644
> --- a/arch/arm/kvm/mmio.c
> +++ b/arch/arm/kvm/mmio.c
> @@ -117,7 +117,7 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  }
> 
>  int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> -		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
> +		 phys_addr_t fault_ipa)
>  {
>  	struct kvm_exit_mmio mmio;
>  	unsigned long rt;
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 2a83ac9..c806080 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -588,7 +588,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	unsigned long hsr_ec;
>  	unsigned long fault_status;
>  	phys_addr_t fault_ipa;
> -	struct kvm_memory_slot *memslot = NULL;
> +	struct kvm_memory_slot *memslot;
>  	bool is_iabt;
>  	gfn_t gfn;
>  	int ret;
> @@ -624,7 +624,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
> 
>  		/* Adjust page offset */
>  		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
> -		return io_mem_abort(vcpu, run, fault_ipa, memslot);
> +		return io_mem_abort(vcpu, run, fault_ipa);
>  	}
> 
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
> 
> commit 70667a06e445e240fb5e6352ccdc4bc8a290866e
> Author: Christoffer Dall <c.dall@virtualopensystems.com>
> Date:   Tue Jan 15 20:51:42 2013 -0500
> 
>     KVM: ARM: Grab kvm->srcu lock when handling page faults
> 
>     The memslots data structure is protected with an SRCU lock, so we should
>     grab the read side lock before traversing this data structure.
> 
>     Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index c806080..0b7eabf 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -591,7 +591,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	struct kvm_memory_slot *memslot;
>  	bool is_iabt;
>  	gfn_t gfn;
> -	int ret;
> +	int ret, idx;
> 
>  	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
>  	is_iabt = (hsr_ec == HSR_EC_IABT);
> @@ -627,13 +627,17 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		return io_mem_abort(vcpu, run, fault_ipa);
>  	}
> 
> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>  	if (!memslot->user_alloc) {
>  		kvm_err("non user-alloc memslots not supported\n");
> -		return -EINVAL;
> +		ret = -EINVAL;
> +		goto out_unlock;
>  	}
> 
>  	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
> +out_unlock:
> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  	return ret ? ret : 1;
>  }
> 
> --
> 
> Thanks,
> -Christoffer

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v5 07/14] KVM: ARM: World-switch implementation
@ 2013-01-16 12:12         ` Gleb Natapov
  0 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-16 12:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 15, 2013 at 09:08:11PM -0500, Christoffer Dall wrote:
> On Tue, Jan 15, 2013 at 4:43 AM, Gleb Natapov <gleb@redhat.com> wrote:
> > On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
> >> Provides complete world-switch implementation to switch to other guests
> >> running in non-secure modes. Includes Hyp exception handlers that
> >> capture necessary exception information and stores the information on
> >> the VCPU and KVM structures.
> >>
> >> The following Hyp-ABI is also documented in the code:
> >>
> >> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
> >>    Switching to Hyp mode is done through a simple HVC #0 instruction. The
> >>    exception vector code will check that the HVC comes from VMID==0 and if
> >>    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
> >>    - r0 contains a pointer to a HYP function
> >>    - r1, r2, and r3 contain arguments to the above function.
> >>    - The HYP function will be called with its arguments in r0, r1 and r2.
> >>    On HYP function return, we return directly to SVC.
> >>
> >> A call to a function executing in Hyp mode is performed like the following:
> >>
> >>         <svc code>
> >>         ldr     r0, =BSYM(my_hyp_fn)
> >>         ldr     r1, =my_param
> >>         hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
> >>         <svc code>
> >>
> >> Otherwise, the world-switch is pretty straightforward. All state that
> >> can be modified by the guest is first backed up on the Hyp stack and the
> >> VCPU values are loaded onto the hardware. State which is not loaded, but
> >> is theoretically modifiable by the guest, is protected through the
> >> virtualization features so that it generates a trap and causes software
> >> emulation. Upon guest return, all state is restored from hardware onto the
> >> VCPU struct and the original state is restored from the Hyp stack onto the
> >> hardware.
> >>
> >> SMP support, using a VMPIDR calculated from the host MPIDR with the low
> >> bits overridden by the KVM vcpu_id, was contributed by Marc Zyngier.
> >>
> >> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
> >> a separate patch into the appropriate patches introducing the
> >> functionality. Note that the VMIDs are stored per VM as required by the ARM
> >> architecture reference manual.
> >>
> >> To support VFP/NEON we trap those instructions using the HCPTR. When
> >> we trap, we switch the FPU.  After a guest exit, the VFP state is
> >> returned to the host.  When disabling access to floating point
> >> instructions, we also mask FPEXC_EN in order to avoid the guest
> >> receiving Undefined instruction exceptions before we have a chance to
> >> switch back the floating point state.  We are reusing vfp_hard_struct,
> >> so we depend on VFPv3 being enabled in the host kernel; if not, we still
> >> trap cp10 and cp11 in order to inject an undefined instruction exception
> >> whenever the guest tries to use VFP/NEON. VFP/NEON support was developed
> >> by Antonios Motakis and Rusty Russell.
> >>
> >> Aborts that are permission faults, and not part of a stage-1 page table
> >> walk, do not report the faulting address in the HPFAR.  We have to resolve
> >> the IPA and store it on the VCPU struct, just like the HPFAR register. If
> >> the IPA cannot be resolved, it means another CPU is playing with the
> >> page tables, and we simply restart the guest.  This quirk was fixed by
> >> Marc Zyngier.
> >>
> >> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> >> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> >> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> >> ---
> >>  arch/arm/include/asm/kvm_arm.h  |   51 ++++
> >>  arch/arm/include/asm/kvm_host.h |   10 +
> >>  arch/arm/kernel/asm-offsets.c   |   25 ++
> >>  arch/arm/kvm/arm.c              |  187 ++++++++++++++++
> >>  arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
> >>  arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
> >>  6 files changed, 1108 insertions(+), 4 deletions(-)
> >>  create mode 100644 arch/arm/kvm/interrupts_head.S
> >>
> >> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> >> index fb22ee8..a3262a2 100644
> >> --- a/arch/arm/include/asm/kvm_arm.h
> >> +++ b/arch/arm/include/asm/kvm_arm.h
> >> @@ -98,6 +98,18 @@
> >>  #define TTBCR_T0SZ   3
> >>  #define HTCR_MASK    (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
> >>
> >> +/* Hyp System Trap Register */
> >> +#define HSTR_T(x)    (1 << x)
> >> +#define HSTR_TTEE    (1 << 16)
> >> +#define HSTR_TJDBX   (1 << 17)
> >> +
> >> +/* Hyp Coprocessor Trap Register */
> >> +#define HCPTR_TCP(x) (1 << x)
> >> +#define HCPTR_TCP_MASK       (0x3fff)
> >> +#define HCPTR_TASE   (1 << 15)
> >> +#define HCPTR_TTA    (1 << 20)
> >> +#define HCPTR_TCPAC  (1 << 31)
> >> +
> >>  /* Hyp Debug Configuration Register bits */
> >>  #define HDCR_TDRA    (1 << 11)
> >>  #define HDCR_TDOSA   (1 << 10)
> >> @@ -144,6 +156,45 @@
> >>  #else
> >>  #define VTTBR_X              (5 - KVM_T0SZ)
> >>  #endif
> >> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
> >> +#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
> >> +#define VTTBR_VMID_SHIFT  (48LLU)
> >> +#define VTTBR_VMID_MASK        (0xffLLU << VTTBR_VMID_SHIFT)
> >> +
> >> +/* Hyp Syndrome Register (HSR) bits */
> >> +#define HSR_EC_SHIFT (26)
> >> +#define HSR_EC               (0x3fU << HSR_EC_SHIFT)
> >> +#define HSR_IL               (1U << 25)
> >> +#define HSR_ISS              (HSR_IL - 1)
> >> +#define HSR_ISV_SHIFT        (24)
> >> +#define HSR_ISV              (1U << HSR_ISV_SHIFT)
> >> +#define HSR_FSC              (0x3f)
> >> +#define HSR_FSC_TYPE (0x3c)
> >> +#define HSR_WNR              (1 << 6)
> >> +
> >> +#define FSC_FAULT    (0x04)
> >> +#define FSC_PERM     (0x0c)
> >> +
> >> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> >> +#define HPFAR_MASK   (~0xf)
> >>
> >> +#define HSR_EC_UNKNOWN       (0x00)
> >> +#define HSR_EC_WFI   (0x01)
> >> +#define HSR_EC_CP15_32       (0x03)
> >> +#define HSR_EC_CP15_64       (0x04)
> >> +#define HSR_EC_CP14_MR       (0x05)
> >> +#define HSR_EC_CP14_LS       (0x06)
> >> +#define HSR_EC_CP_0_13       (0x07)
> >> +#define HSR_EC_CP10_ID       (0x08)
> >> +#define HSR_EC_JAZELLE       (0x09)
> >> +#define HSR_EC_BXJ   (0x0A)
> >> +#define HSR_EC_CP14_64       (0x0C)
> >> +#define HSR_EC_SVC_HYP       (0x11)
> >> +#define HSR_EC_HVC   (0x12)
> >> +#define HSR_EC_SMC   (0x13)
> >> +#define HSR_EC_IABT  (0x20)
> >> +#define HSR_EC_IABT_HYP      (0x21)
> >> +#define HSR_EC_DABT  (0x24)
> >> +#define HSR_EC_DABT_HYP      (0x25)
> >>
> >>  #endif /* __ARM_KVM_ARM_H__ */
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index 1de6f0d..ddb09da 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -21,6 +21,7 @@
> >>
> >>  #include <asm/kvm.h>
> >>  #include <asm/kvm_asm.h>
> >> +#include <asm/fpstate.h>
> >>
> >>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
> >>  #define KVM_USER_MEM_SLOTS 32
> >> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
> >>       u32 hxfar;              /* Hyp Data/Inst Fault Address Register */
> >>       u32 hpfar;              /* Hyp IPA Fault Address Register */
> >>
> >> +     /* Floating point registers (VFP and Advanced SIMD/NEON) */
> >> +     struct vfp_hard_struct vfp_guest;
> >> +     struct vfp_hard_struct *vfp_host;
> >> +
> >> +     /*
> >> +      * Anything that is not used directly from assembly code goes
> >> +      * here.
> >> +      */
> >>       /* Interrupt related fields */
> >>       u32 irq_lines;          /* IRQ and FIQ levels */
> >>
> >> @@ -112,6 +121,7 @@ struct kvm_one_reg;
> >>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> >>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> >>  u64 kvm_call_hyp(void *hypfn, ...);
> >> +void force_vm_exit(const cpumask_t *mask);
> >>
> >>  #define KVM_ARCH_WANT_MMU_NOTIFIER
> >>  struct kvm;
> >> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> >> index c985b48..c8b3272 100644
> >> --- a/arch/arm/kernel/asm-offsets.c
> >> +++ b/arch/arm/kernel/asm-offsets.c
> >> @@ -13,6 +13,9 @@
> >>  #include <linux/sched.h>
> >>  #include <linux/mm.h>
> >>  #include <linux/dma-mapping.h>
> >> +#ifdef CONFIG_KVM_ARM_HOST
> >> +#include <linux/kvm_host.h>
> >> +#endif
> >>  #include <asm/cacheflush.h>
> >>  #include <asm/glue-df.h>
> >>  #include <asm/glue-pf.h>
> >> @@ -146,5 +149,27 @@ int main(void)
> >>    DEFINE(DMA_BIDIRECTIONAL,  DMA_BIDIRECTIONAL);
> >>    DEFINE(DMA_TO_DEVICE,              DMA_TO_DEVICE);
> >>    DEFINE(DMA_FROM_DEVICE,    DMA_FROM_DEVICE);
> >> +#ifdef CONFIG_KVM_ARM_HOST
> >> +  DEFINE(VCPU_KVM,           offsetof(struct kvm_vcpu, kvm));
> >> +  DEFINE(VCPU_MIDR,          offsetof(struct kvm_vcpu, arch.midr));
> >> +  DEFINE(VCPU_CP15,          offsetof(struct kvm_vcpu, arch.cp15));
> >> +  DEFINE(VCPU_VFP_GUEST,     offsetof(struct kvm_vcpu, arch.vfp_guest));
> >> +  DEFINE(VCPU_VFP_HOST,              offsetof(struct kvm_vcpu, arch.vfp_host));
> >> +  DEFINE(VCPU_REGS,          offsetof(struct kvm_vcpu, arch.regs));
> >> +  DEFINE(VCPU_USR_REGS,              offsetof(struct kvm_vcpu, arch.regs.usr_regs));
> >> +  DEFINE(VCPU_SVC_REGS,              offsetof(struct kvm_vcpu, arch.regs.svc_regs));
> >> +  DEFINE(VCPU_ABT_REGS,              offsetof(struct kvm_vcpu, arch.regs.abt_regs));
> >> +  DEFINE(VCPU_UND_REGS,              offsetof(struct kvm_vcpu, arch.regs.und_regs));
> >> +  DEFINE(VCPU_IRQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.irq_regs));
> >> +  DEFINE(VCPU_FIQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
> >> +  DEFINE(VCPU_PC,            offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
> >> +  DEFINE(VCPU_CPSR,          offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
> >> +  DEFINE(VCPU_IRQ_LINES,     offsetof(struct kvm_vcpu, arch.irq_lines));
> >> +  DEFINE(VCPU_HSR,           offsetof(struct kvm_vcpu, arch.hsr));
> >> +  DEFINE(VCPU_HxFAR,         offsetof(struct kvm_vcpu, arch.hxfar));
> >> +  DEFINE(VCPU_HPFAR,         offsetof(struct kvm_vcpu, arch.hpfar));
> >> +  DEFINE(VCPU_HYP_PC,                offsetof(struct kvm_vcpu, arch.hyp_pc));
> >> +  DEFINE(KVM_VTTBR,          offsetof(struct kvm, arch.vttbr));
> >> +#endif
> >>    return 0;
> >>  }
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index 9b4566e..c94d278 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -40,6 +40,7 @@
> >>  #include <asm/kvm_arm.h>
> >>  #include <asm/kvm_asm.h>
> >>  #include <asm/kvm_mmu.h>
> >> +#include <asm/kvm_emulate.h>
> >>
> >>  #ifdef REQUIRES_VIRT
> >>  __asm__(".arch_extension     virt");
> >> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> >>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
> >>  static unsigned long hyp_default_vectors;
> >>
> >> +/* The VMID used in the VTTBR */
> >> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> >> +static u8 kvm_next_vmid;
> >> +static DEFINE_SPINLOCK(kvm_vmid_lock);
> >>
> >>  int kvm_arch_hardware_enable(void *garbage)
> >>  {
> >> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
> >>
> >>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >>  {
> >> +     /* Force users to call KVM_ARM_VCPU_INIT */
> >> +     vcpu->arch.target = -1;
> >>       return 0;
> >>  }
> >>
> >> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> >>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>  {
> >>       vcpu->cpu = cpu;
> >> +     vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
> >>  }
> >>
> >>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> >>
> >>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
> > As far as I see the function is unused.
> >
> >>  {
> >> +     return v->mode == IN_GUEST_MODE;
> >> +}
> >> +
> >> +/* Just ensure a guest exit from a particular CPU */
> >> +static void exit_vm_noop(void *info)
> >> +{
> >> +}
> >> +
> >> +void force_vm_exit(const cpumask_t *mask)
> >> +{
> >> +     smp_call_function_many(mask, exit_vm_noop, NULL, true);
> >> +}
> > There is make_all_cpus_request() for that. It actually sends IPIs only
> > to cpus that are running vcpus.
> >
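
(For reference, the request-based alternative would look roughly like the
sketch below.  KVM_REQ_FLUSH_VMID is a made-up request bit used purely for
illustration, while kvm_for_each_vcpu(), kvm_make_request() and
kvm_vcpu_kick() are the existing generic helpers; this is only the shape of
the idea, not code that was posted.)

	/* Sketch only: assumes <linux/kvm_host.h> and a hypothetical
	 * KVM_REQ_FLUSH_VMID request bit defined by the architecture. */
	static void kick_all_vcpus_sketch(struct kvm *kvm)
	{
		int i;
		struct kvm_vcpu *vcpu;

		kvm_for_each_vcpu(i, vcpu, kvm) {
			kvm_make_request(KVM_REQ_FLUSH_VMID, vcpu);
			kvm_vcpu_kick(vcpu);	/* IPIs only CPUs running a vcpu */
		}
	}
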
> >> +
> >> +/**
> >> + * need_new_vmid_gen - check that the VMID is still valid
> >> + * @kvm: The VM's VMID to check
> >> + *
> >> + * return true if there is a new generation of VMIDs being used
> >> + *
> >> + * The hardware supports only 256 values with the value zero reserved for the
> >> + * host, so we check if an assigned value belongs to a previous generation,
> >> + * which requires us to assign a new value. If we're the first to use a
> >> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> >> + * CPUs.
> >> + */
> >> +static bool need_new_vmid_gen(struct kvm *kvm)
> >> +{
> >> +     return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> >> +}
> >> +
> >> +/**
> >> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> >> + * @kvm      The guest that we are about to run
> >> + *
> >> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> >> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> >> + * caches and TLBs.
> >> + */
> >> +static void update_vttbr(struct kvm *kvm)
> >> +{
> >> +     phys_addr_t pgd_phys;
> >> +     u64 vmid;
> >> +
> >> +     if (!need_new_vmid_gen(kvm))
> >> +             return;
> >> +
> >> +     spin_lock(&kvm_vmid_lock);
> >> +
> >> +     /*
> >> +      * We need to re-check the vmid_gen here to ensure that if another vcpu
> >> +      * already allocated a valid vmid for this vm, then this vcpu should
> >> +      * use the same vmid.
> >> +      */
> >> +     if (!need_new_vmid_gen(kvm)) {
> >> +             spin_unlock(&kvm_vmid_lock);
> >> +             return;
> >> +     }
> >> +
> >> +     /* First user of a new VMID generation? */
> >> +     if (unlikely(kvm_next_vmid == 0)) {
> >> +             atomic64_inc(&kvm_vmid_gen);
> >> +             kvm_next_vmid = 1;
> >> +
> >> +             /*
> >> +              * On SMP we know no other CPUs can use this CPU's or each
> >> +              * other's VMID after force_vm_exit returns since the
> >> +              * kvm_vmid_lock blocks them from reentry to the guest.
> >> +              */
> >> +             force_vm_exit(cpu_all_mask);
> >> +             /*
> >> +              * Now broadcast TLB + ICACHE invalidation over the inner
> >> +              * shareable domain to make sure all data structures are
> >> +              * clean.
> >> +              */
> >> +             kvm_call_hyp(__kvm_flush_vm_context);
> >> +     }
> >> +
> >> +     kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> >> +     kvm->arch.vmid = kvm_next_vmid;
> >> +     kvm_next_vmid++;
> >> +
> >> +     /* update vttbr to be used with the new vmid */
> >> +     pgd_phys = virt_to_phys(kvm->arch.pgd);
> >> +     vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
> >> +     kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
> >> +     kvm->arch.vttbr |= vmid;
> >> +
> >> +     spin_unlock(&kvm_vmid_lock);
> >> +}
> >> +
> >> +/*
> >> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
> >> + * proper exit to QEMU.
> >> + */
> >> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >> +                    int exception_index)
> >> +{
> >> +     run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> >>       return 0;
> >>  }
> >>
> >> +/**
> >> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> >> + * @vcpu:    The VCPU pointer
> >> + * @run:     The kvm_run structure pointer used for userspace state exchange
> >> + *
> >> + * This function is called through the VCPU_RUN ioctl called from user space. It
> >> + * will execute VM code in a loop until the time slice for the process is used
> >> + * up or some emulation is needed from user space, in which case the function will
> >> + * return with return value 0 and with the kvm_run structure filled in with the
> >> + * required data for the requested emulation.
> >> + */
> >>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>  {
> >> -     return -EINVAL;
> >> +     int ret;
> >> +     sigset_t sigsaved;
> >> +
> >> +     /* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
> >> +     if (unlikely(vcpu->arch.target < 0))
> >> +             return -ENOEXEC;
> >> +
> >> +     if (vcpu->sigset_active)
> >> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> >> +
> >> +     ret = 1;
> >> +     run->exit_reason = KVM_EXIT_UNKNOWN;
> >> +     while (ret > 0) {
> >> +             /*
> >> +              * Check conditions before entering the guest
> >> +              */
> >> +             cond_resched();
> >> +
> >> +             update_vttbr(vcpu->kvm);
> >> +
> >> +             local_irq_disable();
> >> +
> >> +             /*
> >> +              * Re-check atomic conditions
> >> +              */
> >> +             if (signal_pending(current)) {
> >> +                     ret = -EINTR;
> >> +                     run->exit_reason = KVM_EXIT_INTR;
> >> +             }
> >> +
> >> +             if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >> +                     local_irq_enable();
> >> +                     continue;
> >> +             }
> >> +
> >> +             /**************************************************************
> >> +              * Enter the guest
> >> +              */
> >> +             trace_kvm_entry(*vcpu_pc(vcpu));
> >> +             kvm_guest_enter();
> >> +             vcpu->mode = IN_GUEST_MODE;
> > You need to set mode to IN_GUEST_MODE before disabling interrupt and
> > check that mode != EXITING_GUEST_MODE after disabling interrupt but
> > before entering the guest. This way you will catch kicks that were sent
> > between setting of the mode and disabling the interrupts. Also you need
> > to check vcpu->requests and exit if it is not empty. I see that you do
> > not use vcpu->requests at all, but you should since common kvm code
> > assumes that it is used. make_all_cpus_request() uses it for instance.
> >
> 
> I don't quite agree, but almost:
> 
> Why would you set IN_GUEST_MODE before disabling interrupts? The only
> reason I can see for it to be a requirement is to leverage an implicit
> memory barrier. Receiving the IPI in this little window does nothing
> (the smp_cross_call is a noop).
> 
> Checking that mode != EXITING_GUEST_MODE is equally useless in my
> opinion; as I read the requests code, the only reason for this mode is
> to avoid sending an IPI twice.
> 
> Kicks sent between setting the mode and disabling the interrupts are
> not the point; the point is to check the requests field (which we
> don't use at all on ARM, and which generic code also doesn't use on ARM)
> after disabling interrupts and after setting IN_GUEST_MODE.
> 
Yes, you are right. There is no race here.

> The patch below fixes your issues, and while I would push back on
> anything other than direct bug fixes at this point, the current code is
> semantically incorrect wrt. KVM vcpu requests, so it's worth a fix,
> and the patch itself is trivial.
> 
> >> +
> >> +             ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> > You do not take kvm->srcu lock before entering the guest. It looks
> > wrong.
> >
> 
> Why would I take that before entering the guest? The only thing the
Right. No need to take it before entering a guest of course. I should
have said "after". Anyway, the absence of srcu handling raised my suspicion
and made me check all the code for kvm->srcu handling. I expected to see
kvm->srcu handling in vcpu_run() because x86 locks srcu at the beginning
of the vcpu loop, releases it before guest entry and retakes it after exit.
This way all the code called from the vcpu run loop is protected. This is of
course not the only way to tackle it; you can also do the locking only around
memslot use.
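
(Sketched out, the loop-wide, x86-style arrangement looks roughly like the
following.  This is only the shape of the locking, not the ARM code as
posted; kvm_call_hyp()/__kvm_vcpu_run and handle_exit() stand in for the
existing functions, and the irq/mode handling is omitted.)

	static int run_loop_srcu_sketch(struct kvm_vcpu *vcpu, struct kvm_run *run)
	{
		int ret = 1;
		int idx;

		idx = srcu_read_lock(&vcpu->kvm->srcu);	/* held for exit handling */
		while (ret > 0) {
			srcu_read_unlock(&vcpu->kvm->srcu, idx);
			ret = kvm_call_hyp(__kvm_vcpu_run, vcpu); /* enter guest unlocked */
			idx = srcu_read_lock(&vcpu->kvm->srcu);
			ret = handle_exit(vcpu, run, ret);	/* may touch memslots */
		}
		srcu_read_unlock(&vcpu->kvm->srcu, idx);
		return ret;
	}
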

> read side RCU protects against is the memslots data structure as far
> as I can see, so the second patch pasted below fixes this for the code
> that actually accesses this data structure.
Many memory-related functions that you call access memslots under the
hood and assume that the locking is done by the caller. From a quick look
I found these that you've missed:
kvm_is_visible_gfn()
kvm_read_guest()
gfn_to_hva()
gfn_to_pfn_prot()
kvm_memslots()

Maybe there are more. Can you enable RCU debugging in your kernel config
and check? This does not guarantee that it will catch all of the places,
but it is better than nothing.
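
(The narrower, per-call-site version of the same rule is sketched below: any
of the helpers listed above has to run inside the kvm->srcu read side, along
the same lines as the page-fault fix further down.  Illustrative only;
gfn_to_hva_locked_sketch() is not a real function.)

	/* Sketch: wrap a memslot-walking helper in the srcu read side. */
	static unsigned long gfn_to_hva_locked_sketch(struct kvm *kvm, gfn_t gfn)
	{
		int idx;
		unsigned long hva;

		idx = srcu_read_lock(&kvm->srcu);
		hva = gfn_to_hva(kvm, gfn);	/* walks memslots internally */
		srcu_read_unlock(&kvm->srcu, idx);

		return hva;
	}
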

> 
> >> +
> >> +             vcpu->mode = OUTSIDE_GUEST_MODE;
> >> +             kvm_guest_exit();
> >> +             trace_kvm_exit(*vcpu_pc(vcpu));
> >> +             /*
> >> +              * We may have taken a host interrupt in HYP mode (ie
> >> +              * while executing the guest). This interrupt is still
> >> +              * pending, as we haven't serviced it yet!
> >> +              *
> >> +              * We're now back in SVC mode, with interrupts
> >> +              * disabled.  Enabling the interrupts now will have
> >> +              * the effect of taking the interrupt again, in SVC
> >> +              * mode this time.
> >> +              */
> >> +             local_irq_enable();
> >> +
> >> +             /*
> >> +              * Back from guest
> >> +              *************************************************************/
> >> +
> >> +             ret = handle_exit(vcpu, run, ret);
> >> +     }
> >> +
> >> +     if (vcpu->sigset_active)
> >> +             sigprocmask(SIG_SETMASK, &sigsaved, NULL);
> >> +     return ret;
> >>  }
> >>
> >>  static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
> >> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> >> index a923590..08adcd5 100644
> >> --- a/arch/arm/kvm/interrupts.S
> >> +++ b/arch/arm/kvm/interrupts.S
> >> @@ -20,9 +20,12 @@
> >>  #include <linux/const.h>
> >>  #include <asm/unified.h>
> >>  #include <asm/page.h>
> >> +#include <asm/ptrace.h>
> >>  #include <asm/asm-offsets.h>
> >>  #include <asm/kvm_asm.h>
> >>  #include <asm/kvm_arm.h>
> >> +#include <asm/vfpmacros.h>
> >> +#include "interrupts_head.S"
> >>
> >>       .text
> >>
> >> @@ -31,36 +34,423 @@ __kvm_hyp_code_start:
> >>
> >>  /********************************************************************
> >>   * Flush per-VMID TLBs
> >> + *
> >> + * void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >> + *
> >> + * We rely on the hardware to broadcast the TLB invalidation to all CPUs
> >> + * inside the inner-shareable domain (which is the case for all v7
> >> + * implementations).  If we come across a non-IS SMP implementation, we'll
> >> + * have to use an IPI based mechanism. Until then, we stick to the simple
> >> + * hardware assisted version.
> >>   */
> >>  ENTRY(__kvm_tlb_flush_vmid)
> >> +     push    {r2, r3}
> >> +
> >> +     add     r0, r0, #KVM_VTTBR
> >> +     ldrd    r2, r3, [r0]
> >> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
> >> +     isb
> >> +     mcr     p15, 0, r0, c8, c3, 0   @ TLBIALLIS (rt ignored)
> >> +     dsb
> >> +     isb
> >> +     mov     r2, #0
> >> +     mov     r3, #0
> >> +     mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
> >> +     isb                             @ Not necessary if followed by eret
> >> +
> >> +     pop     {r2, r3}
> >>       bx      lr
> >>  ENDPROC(__kvm_tlb_flush_vmid)
> >>
> >>  /********************************************************************
> >> - * Flush TLBs and instruction caches of current CPU for all VMIDs
> >> + * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
> >> + * domain, for all VMIDs
> >> + *
> >> + * void __kvm_flush_vm_context(void);
> >>   */
> >>  ENTRY(__kvm_flush_vm_context)
> >> +     mov     r0, #0                  @ rn parameter for c15 flushes is SBZ
> >> +
> >> +     /* Invalidate NS Non-Hyp TLB Inner Shareable (TLBIALLNSNHIS) */
> >> +     mcr     p15, 4, r0, c8, c3, 4
> >> +     /* Invalidate instruction caches Inner Shareable (ICIALLUIS) */
> >> +     mcr     p15, 0, r0, c7, c1, 0
> >> +     dsb
> >> +     isb                             @ Not necessary if followed by eret
> >> +
> >>       bx      lr
> >>  ENDPROC(__kvm_flush_vm_context)
> >>
> >> +
> >>  /********************************************************************
> >>   *  Hypervisor world-switch code
> >> + *
> >> + *
> >> + * int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >>   */
> >>  ENTRY(__kvm_vcpu_run)
> >> -     bx      lr
> >> +     @ Save the vcpu pointer
> >> +     mcr     p15, 4, vcpu, c13, c0, 2        @ HTPIDR
> >> +
> >> +     save_host_regs
> >> +
> >> +     @ Store hardware CP15 state and load guest state
> >> +     read_cp15_state store_to_vcpu = 0
> >> +     write_cp15_state read_from_vcpu = 1
> >> +
> >> +     @ If the host kernel has not been configured with VFPv3 support,
> >> +     @ then it is safer if we deny guests from using it as well.
> >> +#ifdef CONFIG_VFPv3
> >> +     @ Set FPEXC_EN so the guest doesn't trap floating point instructions
> >> +     VFPFMRX r2, FPEXC               @ VMRS
> >> +     push    {r2}
> >> +     orr     r2, r2, #FPEXC_EN
> >> +     VFPFMXR FPEXC, r2               @ VMSR
> >> +#endif
> >> +
> >> +     @ Configure Hyp-role
> >> +     configure_hyp_role vmentry
> >> +
> >> +     @ Trap coprocessor CRx accesses
> >> +     set_hstr vmentry
> >> +     set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +     set_hdcr vmentry
> >> +
> >> +     @ Write configured ID register into MIDR alias
> >> +     ldr     r1, [vcpu, #VCPU_MIDR]
> >> +     mcr     p15, 4, r1, c0, c0, 0
> >> +
> >> +     @ Write guest view of MPIDR into VMPIDR
> >> +     ldr     r1, [vcpu, #CP15_OFFSET(c0_MPIDR)]
> >> +     mcr     p15, 4, r1, c0, c0, 5
> >> +
> >> +     @ Set up guest memory translation
> >> +     ldr     r1, [vcpu, #VCPU_KVM]
> >> +     add     r1, r1, #KVM_VTTBR
> >> +     ldrd    r2, r3, [r1]
> >> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
> >> +
> >> +     @ We're all done, just restore the GPRs and go to the guest
> >> +     restore_guest_regs
> >> +     clrex                           @ Clear exclusive monitor
> >> +     eret
> >> +
> >> +__kvm_vcpu_return:
> >> +     /*
> >> +      * return convention:
> >> +      * guest r0, r1, r2 saved on the stack
> >> +      * r0: vcpu pointer
> >> +      * r1: exception code
> >> +      */
> >> +     save_guest_regs
> >> +
> >> +     @ Set VMID == 0
> >> +     mov     r2, #0
> >> +     mov     r3, #0
> >> +     mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
> >> +
> >> +     @ Don't trap coprocessor accesses for host kernel
> >> +     set_hstr vmexit
> >> +     set_hdcr vmexit
> >> +     set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +
> >> +#ifdef CONFIG_VFPv3
> >> +     @ Save floating point registers if we let the guest use them.
> >> +     tst     r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +     bne     after_vfp_restore
> >> +
> >> +     @ Switch VFP/NEON hardware state to the host's
> >> +     add     r7, vcpu, #VCPU_VFP_GUEST
> >> +     store_vfp_state r7
> >> +     add     r7, vcpu, #VCPU_VFP_HOST
> >> +     ldr     r7, [r7]
> >> +     restore_vfp_state r7
> >> +
> >> +after_vfp_restore:
> >> +     @ Restore FPEXC_EN which we clobbered on entry
> >> +     pop     {r2}
> >> +     VFPFMXR FPEXC, r2
> >> +#endif
> >> +
> >> +     @ Reset Hyp-role
> >> +     configure_hyp_role vmexit
> >> +
> >> +     @ Let host read hardware MIDR
> >> +     mrc     p15, 0, r2, c0, c0, 0
> >> +     mcr     p15, 4, r2, c0, c0, 0
> >> +
> >> +     @ Back to hardware MPIDR
> >> +     mrc     p15, 0, r2, c0, c0, 5
> >> +     mcr     p15, 4, r2, c0, c0, 5
> >> +
> >> +     @ Store guest CP15 state and restore host state
> >> +     read_cp15_state store_to_vcpu = 1
> >> +     write_cp15_state read_from_vcpu = 0
> >> +
> >> +     restore_host_regs
> >> +     clrex                           @ Clear exclusive monitor
> >> +     mov     r0, r1                  @ Return the return code
> >> +     bx      lr                      @ return to IOCTL
> >>
> >>  ENTRY(kvm_call_hyp)
> >> +     hvc     #0
> >>       bx      lr
> >>
> >>
> >>  /********************************************************************
> >>   * Hypervisor exception vector and handlers
> >> + *
> >> + *
> >> + * The KVM/ARM Hypervisor ABI is defined as follows:
> >> + *
> >> + * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
> >> + * instruction is issued since all traps are disabled when running the host
> >> + * kernel as per the Hyp-mode initialization at boot time.
> >> + *
> >> + * HVC instructions cause a trap to the vector page + offset 0x18 (see hyp_hvc
> >> + * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
> >> + * host kernel) and they cause a trap to the vector page + offset 0xc when HVC
> >> + * instructions are called from within Hyp-mode.
> >> + *
> >> + * Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
> >> + *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
> >> + *    exception vector code will check that the HVC comes from VMID==0 and if
> >> + *    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
> >> + *    - r0 contains a pointer to a HYP function
> >> + *    - r1, r2, and r3 contain arguments to the above function.
> >> + *    - The HYP function will be called with its arguments in r0, r1 and r2.
> >> + *    On HYP function return, we return directly to SVC.
> >> + *
> >> + * Note that the above is used to execute code in Hyp-mode from a host-kernel
> >> + * point of view, and is a different concept from performing a world-switch and
> >> + * executing guest code SVC mode (with a VMID != 0).
> >>   */
> >>
> >> +/* Handle undef, svc, pabt, or dabt by crashing with a user notice */
> >> +.macro bad_exception exception_code, panic_str
> >> +     push    {r0-r2}
> >> +     mrrc    p15, 6, r0, r1, c2      @ Read VTTBR
> >> +     lsr     r1, r1, #16
> >> +     ands    r1, r1, #0xff
> >> +     beq     99f
> >> +
> >> +     load_vcpu                       @ Load VCPU pointer
> >> +     .if \exception_code == ARM_EXCEPTION_DATA_ABORT
> >> +     mrc     p15, 4, r2, c5, c2, 0   @ HSR
> >> +     mrc     p15, 4, r1, c6, c0, 0   @ HDFAR
> >> +     str     r2, [vcpu, #VCPU_HSR]
> >> +     str     r1, [vcpu, #VCPU_HxFAR]
> >> +     .endif
> >> +     .if \exception_code == ARM_EXCEPTION_PREF_ABORT
> >> +     mrc     p15, 4, r2, c5, c2, 0   @ HSR
> >> +     mrc     p15, 4, r1, c6, c0, 2   @ HIFAR
> >> +     str     r2, [vcpu, #VCPU_HSR]
> >> +     str     r1, [vcpu, #VCPU_HxFAR]
> >> +     .endif
> >> +     mov     r1, #\exception_code
> >> +     b       __kvm_vcpu_return
> >> +
> >> +     @ We were in the host already. Let's craft a panic-ing return to SVC.
> >> +99:  mrs     r2, cpsr
> >> +     bic     r2, r2, #MODE_MASK
> >> +     orr     r2, r2, #SVC_MODE
> >> +THUMB(       orr     r2, r2, #PSR_T_BIT      )
> >> +     msr     spsr_cxsf, r2
> >> +     mrs     r1, ELR_hyp
> >> +     ldr     r2, =BSYM(panic)
> >> +     msr     ELR_hyp, r2
> >> +     ldr     r0, =\panic_str
> >> +     eret
> >> +.endm
> >> +
> >> +     .text
> >> +
> >>       .align 5
> >>  __kvm_hyp_vector:
> >>       .globl __kvm_hyp_vector
> >> -     nop
> >> +
> >> +     @ Hyp-mode exception vector
> >> +     W(b)    hyp_reset
> >> +     W(b)    hyp_undef
> >> +     W(b)    hyp_svc
> >> +     W(b)    hyp_pabt
> >> +     W(b)    hyp_dabt
> >> +     W(b)    hyp_hvc
> >> +     W(b)    hyp_irq
> >> +     W(b)    hyp_fiq
> >> +
> >> +     .align
> >> +hyp_reset:
> >> +     b       hyp_reset
> >> +
> >> +     .align
> >> +hyp_undef:
> >> +     bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
> >> +
> >> +     .align
> >> +hyp_svc:
> >> +     bad_exception ARM_EXCEPTION_HVC, svc_die_str
> >> +
> >> +     .align
> >> +hyp_pabt:
> >> +     bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
> >> +
> >> +     .align
> >> +hyp_dabt:
> >> +     bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
> >> +
> >> +     .align
> >> +hyp_hvc:
> >> +     /*
> >> +      * Getting here is either because of a trap from a guest or from calling
> >> +      * HVC from the host kernel, which means "switch to Hyp mode".
> >> +      */
> >> +     push    {r0, r1, r2}
> >> +
> >> +     @ Check syndrome register
> >> +     mrc     p15, 4, r1, c5, c2, 0   @ HSR
> >> +     lsr     r0, r1, #HSR_EC_SHIFT
> >> +#ifdef CONFIG_VFPv3
> >> +     cmp     r0, #HSR_EC_CP_0_13
> >> +     beq     switch_to_guest_vfp
> >> +#endif
> >> +     cmp     r0, #HSR_EC_HVC
> >> +     bne     guest_trap              @ Not HVC instr.
> >> +
> >> +     /*
> >> +      * Let's check if the HVC came from VMID 0 and allow simple
> >> +      * switch to Hyp mode
> >> +      */
> >> +     mrrc    p15, 6, r0, r2, c2
> >> +     lsr     r2, r2, #16
> >> +     and     r2, r2, #0xff
> >> +     cmp     r2, #0
> >> +     bne     guest_trap              @ Guest called HVC
> >> +
> >> +host_switch_to_hyp:
> >> +     pop     {r0, r1, r2}
> >> +
> >> +     push    {lr}
> >> +     mrs     lr, SPSR
> >> +     push    {lr}
> >> +
> >> +     mov     lr, r0
> >> +     mov     r0, r1
> >> +     mov     r1, r2
> >> +     mov     r2, r3
> >> +
> >> +THUMB(       orr     lr, #1)
> >> +     blx     lr                      @ Call the HYP function
> >> +
> >> +     pop     {lr}
> >> +     msr     SPSR_csxf, lr
> >> +     pop     {lr}
> >> +     eret
> >> +
> >> +guest_trap:
> >> +     load_vcpu                       @ Load VCPU pointer to r0
> >> +     str     r1, [vcpu, #VCPU_HSR]
> >> +
> >> +     @ Check if we need the fault information
> >> +     lsr     r1, r1, #HSR_EC_SHIFT
> >> +     cmp     r1, #HSR_EC_IABT
> >> +     mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
> >> +     beq     2f
> >> +     cmp     r1, #HSR_EC_DABT
> >> +     bne     1f
> >> +     mrc     p15, 4, r2, c6, c0, 0   @ HDFAR
> >> +
> >> +2:   str     r2, [vcpu, #VCPU_HxFAR]
> >> +
> >> +     /*
> >> +      * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
> >> +      *
> >> +      * Abort on the stage 2 translation for a memory access from a
> >> +      * Non-secure PL1 or PL0 mode:
> >> +      *
> >> +      * For any Access flag fault or Translation fault, and also for any
> >> +      * Permission fault on the stage 2 translation of a memory access
> >> +      * made as part of a translation table walk for a stage 1 translation,
> >> +      * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
> >> +      * is UNKNOWN.
> >> +      */
> >> +
> >> +     /* Check for permission fault, and S1PTW */
> >> +     mrc     p15, 4, r1, c5, c2, 0   @ HSR
> >> +     and     r0, r1, #HSR_FSC_TYPE
> >> +     cmp     r0, #FSC_PERM
> >> +     tsteq   r1, #(1 << 7)           @ S1PTW
> >> +     mrcne   p15, 4, r2, c6, c0, 4   @ HPFAR
> >> +     bne     3f
> >> +
> >> +     /* Resolve IPA using the xFAR */
> >> +     mcr     p15, 0, r2, c7, c8, 0   @ ATS1CPR
> >> +     isb
> >> +     mrrc    p15, 0, r0, r1, c7      @ PAR
> >> +     tst     r0, #1
> >> +     bne     4f                      @ Failed translation
> >> +     ubfx    r2, r0, #12, #20
> >> +     lsl     r2, r2, #4
> >> +     orr     r2, r2, r1, lsl #24
> >> +
> >> +3:   load_vcpu                       @ Load VCPU pointer to r0
> >> +     str     r2, [r0, #VCPU_HPFAR]
> >> +
> >> +1:   mov     r1, #ARM_EXCEPTION_HVC
> >> +     b       __kvm_vcpu_return
> >> +
> >> +4:   pop     {r0, r1, r2}            @ Failed translation, return to guest
> >> +     eret
> >> +
> >> +/*
> >> + * If VFPv3 support is not available, then we will not switch the VFP
> >> + * registers; however cp10 and cp11 accesses will still trap and fall back
> >> + * to the regular coprocessor emulation code, which currently will
> >> + * inject an undefined exception to the guest.
> >> + */
> >> +#ifdef CONFIG_VFPv3
> >> +switch_to_guest_vfp:
> >> +     load_vcpu                       @ Load VCPU pointer to r0
> >> +     push    {r3-r7}
> >> +
> >> +     @ NEON/VFP used.  Turn on VFP access.
> >> +     set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
> >> +
> >> +     @ Switch VFP/NEON hardware state to the guest's
> >> +     add     r7, r0, #VCPU_VFP_HOST
> >> +     ldr     r7, [r7]
> >> +     store_vfp_state r7
> >> +     add     r7, r0, #VCPU_VFP_GUEST
> >> +     restore_vfp_state r7
> >> +
> >> +     pop     {r3-r7}
> >> +     pop     {r0-r2}
> >> +     eret
> >> +#endif
> >> +
> >> +     .align
> >> +hyp_irq:
> >> +     push    {r0, r1, r2}
> >> +     mov     r1, #ARM_EXCEPTION_IRQ
> >> +     load_vcpu                       @ Load VCPU pointer to r0
> >> +     b       __kvm_vcpu_return
> >> +
> >> +     .align
> >> +hyp_fiq:
> >> +     b       hyp_fiq
> >> +
> >> +     .ltorg
> >>
> >>  __kvm_hyp_code_end:
> >>       .globl  __kvm_hyp_code_end
> >> +
> >> +     .section ".rodata"
> >> +
> >> +und_die_str:
> >> +     .ascii  "unexpected undefined exception in Hyp mode at: %#08x"
> >> +pabt_die_str:
> >> +     .ascii  "unexpected prefetch abort in Hyp mode at: %#08x"
> >> +dabt_die_str:
> >> +     .ascii  "unexpected data abort in Hyp mode at: %#08x"
> >> +svc_die_str:
> >> +     .ascii  "unexpected HVC/SVC trap in Hyp mode at: %#08x"
> >> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> >> new file mode 100644
> >> index 0000000..f59a580
> >> --- /dev/null
> >> +++ b/arch/arm/kvm/interrupts_head.S
> >> @@ -0,0 +1,443 @@
> >> +#define VCPU_USR_REG(_reg_nr)        (VCPU_USR_REGS + (_reg_nr * 4))
> >> +#define VCPU_USR_SP          (VCPU_USR_REG(13))
> >> +#define VCPU_USR_LR          (VCPU_USR_REG(14))
> >> +#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))
> >> +
> >> +/*
> >> + * Many of these macros need to access the VCPU structure, which is always
> >> + * held in r0. These macros should never clobber r1, as it is used to hold the
> >> + * exception code on the return path (except of course the macro that switches
> >> + * all the registers before the final jump to the VM).
> >> + */
> >> +vcpu .req    r0              @ vcpu pointer always in r0
> >> +
> >> +/* Clobbers {r2-r6} */
> >> +.macro store_vfp_state vfp_base
> >> +     @ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
> >> +     VFPFMRX r2, FPEXC
> >> +     @ Make sure VFP is enabled so we can touch the registers.
> >> +     orr     r6, r2, #FPEXC_EN
> >> +     VFPFMXR FPEXC, r6
> >> +
> >> +     VFPFMRX r3, FPSCR
> >> +     tst     r2, #FPEXC_EX           @ Check for VFP Subarchitecture
> >> +     beq     1f
> >> +     @ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
> >> +     @ we only need to save them if FPEXC_EX is set.
> >> +     VFPFMRX r4, FPINST
> >> +     tst     r2, #FPEXC_FP2V
> >> +     VFPFMRX r5, FPINST2, ne         @ vmrsne
> >> +     bic     r6, r2, #FPEXC_EX       @ FPEXC_EX disable
> >> +     VFPFMXR FPEXC, r6
> >> +1:
> >> +     VFPFSTMIA \vfp_base, r6         @ Save VFP registers
> >> +     stm     \vfp_base, {r2-r5}      @ Save FPEXC, FPSCR, FPINST, FPINST2
> >> +.endm
> >> +
> >> +/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
> >> +.macro restore_vfp_state vfp_base
> >> +     VFPFLDMIA \vfp_base, r6         @ Load VFP registers
> >> +     ldm     \vfp_base, {r2-r5}      @ Load FPEXC, FPSCR, FPINST, FPINST2
> >> +
> >> +     VFPFMXR FPSCR, r3
> >> +     tst     r2, #FPEXC_EX           @ Check for VFP Subarchitecture
> >> +     beq     1f
> >> +     VFPFMXR FPINST, r4
> >> +     tst     r2, #FPEXC_FP2V
> >> +     VFPFMXR FPINST2, r5, ne
> >> +1:
> >> +     VFPFMXR FPEXC, r2       @ FPEXC (last, in case !EN)
> >> +.endm
> >> +
> >> +/* These are simply for the macros to work - values don't have meaning */
> >> +.equ usr, 0
> >> +.equ svc, 1
> >> +.equ abt, 2
> >> +.equ und, 3
> >> +.equ irq, 4
> >> +.equ fiq, 5
> >> +
> >> +.macro push_host_regs_mode mode
> >> +     mrs     r2, SP_\mode
> >> +     mrs     r3, LR_\mode
> >> +     mrs     r4, SPSR_\mode
> >> +     push    {r2, r3, r4}
> >> +.endm
> >> +
> >> +/*
> >> + * Store all host persistent registers on the stack.
> >> + * Clobbers all registers, in all modes, except r0 and r1.
> >> + */
> >> +.macro save_host_regs
> >> +     /* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
> >> +     mrs     r2, ELR_hyp
> >> +     push    {r2}
> >> +
> >> +     /* usr regs */
> >> +     push    {r4-r12}        @ r0-r3 are always clobbered
> >> +     mrs     r2, SP_usr
> >> +     mov     r3, lr
> >> +     push    {r2, r3}
> >> +
> >> +     push_host_regs_mode svc
> >> +     push_host_regs_mode abt
> >> +     push_host_regs_mode und
> >> +     push_host_regs_mode irq
> >> +
> >> +     /* fiq regs */
> >> +     mrs     r2, r8_fiq
> >> +     mrs     r3, r9_fiq
> >> +     mrs     r4, r10_fiq
> >> +     mrs     r5, r11_fiq
> >> +     mrs     r6, r12_fiq
> >> +     mrs     r7, SP_fiq
> >> +     mrs     r8, LR_fiq
> >> +     mrs     r9, SPSR_fiq
> >> +     push    {r2-r9}
> >> +.endm
> >> +
> >> +.macro pop_host_regs_mode mode
> >> +     pop     {r2, r3, r4}
> >> +     msr     SP_\mode, r2
> >> +     msr     LR_\mode, r3
> >> +     msr     SPSR_\mode, r4
> >> +.endm
> >> +
> >> +/*
> >> + * Restore all host registers from the stack.
> >> + * Clobbers all registers, in all modes, except r0 and r1.
> >> + */
> >> +.macro restore_host_regs
> >> +     pop     {r2-r9}
> >> +     msr     r8_fiq, r2
> >> +     msr     r9_fiq, r3
> >> +     msr     r10_fiq, r4
> >> +     msr     r11_fiq, r5
> >> +     msr     r12_fiq, r6
> >> +     msr     SP_fiq, r7
> >> +     msr     LR_fiq, r8
> >> +     msr     SPSR_fiq, r9
> >> +
> >> +     pop_host_regs_mode irq
> >> +     pop_host_regs_mode und
> >> +     pop_host_regs_mode abt
> >> +     pop_host_regs_mode svc
> >> +
> >> +     pop     {r2, r3}
> >> +     msr     SP_usr, r2
> >> +     mov     lr, r3
> >> +     pop     {r4-r12}
> >> +
> >> +     pop     {r2}
> >> +     msr     ELR_hyp, r2
> >> +.endm
> >> +
> >> +/*
> >> + * Restore SP, LR and SPSR for a given mode. offset is the offset of
> >> + * this mode's registers from the VCPU base.
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers r1, r2, r3, r4.
> >> + */
> >> +.macro restore_guest_regs_mode mode, offset
> >> +     add     r1, vcpu, \offset
> >> +     ldm     r1, {r2, r3, r4}
> >> +     msr     SP_\mode, r2
> >> +     msr     LR_\mode, r3
> >> +     msr     SPSR_\mode, r4
> >> +.endm
> >> +
> >> +/*
> >> + * Restore all guest registers from the vcpu struct.
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers *all* registers.
> >> + */
> >> +.macro restore_guest_regs
> >> +     restore_guest_regs_mode svc, #VCPU_SVC_REGS
> >> +     restore_guest_regs_mode abt, #VCPU_ABT_REGS
> >> +     restore_guest_regs_mode und, #VCPU_UND_REGS
> >> +     restore_guest_regs_mode irq, #VCPU_IRQ_REGS
> >> +
> >> +     add     r1, vcpu, #VCPU_FIQ_REGS
> >> +     ldm     r1, {r2-r9}
> >> +     msr     r8_fiq, r2
> >> +     msr     r9_fiq, r3
> >> +     msr     r10_fiq, r4
> >> +     msr     r11_fiq, r5
> >> +     msr     r12_fiq, r6
> >> +     msr     SP_fiq, r7
> >> +     msr     LR_fiq, r8
> >> +     msr     SPSR_fiq, r9
> >> +
> >> +     @ Load return state
> >> +     ldr     r2, [vcpu, #VCPU_PC]
> >> +     ldr     r3, [vcpu, #VCPU_CPSR]
> >> +     msr     ELR_hyp, r2
> >> +     msr     SPSR_cxsf, r3
> >> +
> >> +     @ Load user registers
> >> +     ldr     r2, [vcpu, #VCPU_USR_SP]
> >> +     ldr     r3, [vcpu, #VCPU_USR_LR]
> >> +     msr     SP_usr, r2
> >> +     mov     lr, r3
> >> +     add     vcpu, vcpu, #(VCPU_USR_REGS)
> >> +     ldm     vcpu, {r0-r12}
> >> +.endm
> >> +
> >> +/*
> >> + * Save SP, LR and SPSR for a given mode. offset is the offset of
> >> + * this mode's registers from the VCPU base.
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers r2, r3, r4, r5.
> >> + */
> >> +.macro save_guest_regs_mode mode, offset
> >> +     add     r2, vcpu, \offset
> >> +     mrs     r3, SP_\mode
> >> +     mrs     r4, LR_\mode
> >> +     mrs     r5, SPSR_\mode
> >> +     stm     r2, {r3, r4, r5}
> >> +.endm
> >> +
> >> +/*
> >> + * Save all guest registers to the vcpu struct
> >> + * Expects guest's r0, r1, r2 on the stack.
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers r2, r3, r4, r5.
> >> + */
> >> +.macro save_guest_regs
> >> +     @ Store usr registers
> >> +     add     r2, vcpu, #VCPU_USR_REG(3)
> >> +     stm     r2, {r3-r12}
> >> +     add     r2, vcpu, #VCPU_USR_REG(0)
> >> +     pop     {r3, r4, r5}            @ r0, r1, r2
> >> +     stm     r2, {r3, r4, r5}
> >> +     mrs     r2, SP_usr
> >> +     mov     r3, lr
> >> +     str     r2, [vcpu, #VCPU_USR_SP]
> >> +     str     r3, [vcpu, #VCPU_USR_LR]
> >> +
> >> +     @ Store return state
> >> +     mrs     r2, ELR_hyp
> >> +     mrs     r3, spsr
> >> +     str     r2, [vcpu, #VCPU_PC]
> >> +     str     r3, [vcpu, #VCPU_CPSR]
> >> +
> >> +     @ Store other guest registers
> >> +     save_guest_regs_mode svc, #VCPU_SVC_REGS
> >> +     save_guest_regs_mode abt, #VCPU_ABT_REGS
> >> +     save_guest_regs_mode und, #VCPU_UND_REGS
> >> +     save_guest_regs_mode irq, #VCPU_IRQ_REGS
> >> +.endm
> >> +
> >> +/* Reads cp15 registers from hardware and stores them in memory
> >> + * @store_to_vcpu: If 0, registers are written in-order to the stack,
> >> + *              otherwise to the VCPU struct pointed to by vcpup
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + *
> >> + * Clobbers r2 - r12
> >> + */
> >> +.macro read_cp15_state store_to_vcpu
> >> +     mrc     p15, 0, r2, c1, c0, 0   @ SCTLR
> >> +     mrc     p15, 0, r3, c1, c0, 2   @ CPACR
> >> +     mrc     p15, 0, r4, c2, c0, 2   @ TTBCR
> >> +     mrc     p15, 0, r5, c3, c0, 0   @ DACR
> >> +     mrrc    p15, 0, r6, r7, c2      @ TTBR 0
> >> +     mrrc    p15, 1, r8, r9, c2      @ TTBR 1
> >> +     mrc     p15, 0, r10, c10, c2, 0 @ PRRR
> >> +     mrc     p15, 0, r11, c10, c2, 1 @ NMRR
> >> +     mrc     p15, 2, r12, c0, c0, 0  @ CSSELR
> >> +
> >> +     .if \store_to_vcpu == 0
> >> +     push    {r2-r12}                @ Push CP15 registers
> >> +     .else
> >> +     str     r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> >> +     str     r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> >> +     str     r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
> >> +     str     r5, [vcpu, #CP15_OFFSET(c3_DACR)]
> >> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
> >> +     strd    r6, r7, [vcpu]
> >> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
> >> +     strd    r8, r9, [vcpu]
> >> +     sub     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
> >> +     str     r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
> >> +     str     r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
> >> +     str     r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
> >> +     .endif
> >> +
> >> +     mrc     p15, 0, r2, c13, c0, 1  @ CID
> >> +     mrc     p15, 0, r3, c13, c0, 2  @ TID_URW
> >> +     mrc     p15, 0, r4, c13, c0, 3  @ TID_URO
> >> +     mrc     p15, 0, r5, c13, c0, 4  @ TID_PRIV
> >> +     mrc     p15, 0, r6, c5, c0, 0   @ DFSR
> >> +     mrc     p15, 0, r7, c5, c0, 1   @ IFSR
> >> +     mrc     p15, 0, r8, c5, c1, 0   @ ADFSR
> >> +     mrc     p15, 0, r9, c5, c1, 1   @ AIFSR
> >> +     mrc     p15, 0, r10, c6, c0, 0  @ DFAR
> >> +     mrc     p15, 0, r11, c6, c0, 2  @ IFAR
> >> +     mrc     p15, 0, r12, c12, c0, 0 @ VBAR
> >> +
> >> +     .if \store_to_vcpu == 0
> >> +     push    {r2-r12}                @ Push CP15 registers
> >> +     .else
> >> +     str     r2, [vcpu, #CP15_OFFSET(c13_CID)]
> >> +     str     r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> >> +     str     r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
> >> +     str     r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
> >> +     str     r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
> >> +     str     r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
> >> +     str     r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
> >> +     str     r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
> >> +     str     r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
> >> +     str     r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
> >> +     str     r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
> >> +     .endif
> >> +.endm
> >> +
> >> +/*
> >> + * Reads cp15 registers from memory and writes them to hardware
> >> + * @read_from_vcpu: If 0, registers are read in-order from the stack,
> >> + *               otherwise from the VCPU struct pointed to by vcpup
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + */
> >> +.macro write_cp15_state read_from_vcpu
> >> +     .if \read_from_vcpu == 0
> >> +     pop     {r2-r12}
> >> +     .else
> >> +     ldr     r2, [vcpu, #CP15_OFFSET(c13_CID)]
> >> +     ldr     r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
> >> +     ldr     r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
> >> +     ldr     r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
> >> +     ldr     r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
> >> +     ldr     r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
> >> +     ldr     r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
> >> +     ldr     r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
> >> +     ldr     r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
> >> +     ldr     r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
> >> +     ldr     r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
> >> +     .endif
> >> +
> >> +     mcr     p15, 0, r2, c13, c0, 1  @ CID
> >> +     mcr     p15, 0, r3, c13, c0, 2  @ TID_URW
> >> +     mcr     p15, 0, r4, c13, c0, 3  @ TID_URO
> >> +     mcr     p15, 0, r5, c13, c0, 4  @ TID_PRIV
> >> +     mcr     p15, 0, r6, c5, c0, 0   @ DFSR
> >> +     mcr     p15, 0, r7, c5, c0, 1   @ IFSR
> >> +     mcr     p15, 0, r8, c5, c1, 0   @ ADFSR
> >> +     mcr     p15, 0, r9, c5, c1, 1   @ AIFSR
> >> +     mcr     p15, 0, r10, c6, c0, 0  @ DFAR
> >> +     mcr     p15, 0, r11, c6, c0, 2  @ IFAR
> >> +     mcr     p15, 0, r12, c12, c0, 0 @ VBAR
> >> +
> >> +     .if \read_from_vcpu == 0
> >> +     pop     {r2-r12}
> >> +     .else
> >> +     ldr     r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
> >> +     ldr     r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
> >> +     ldr     r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
> >> +     ldr     r5, [vcpu, #CP15_OFFSET(c3_DACR)]
> >> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR0)
> >> +     ldrd    r6, r7, [vcpu]
> >> +     add     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1) - CP15_OFFSET(c2_TTBR0)
> >> +     ldrd    r8, r9, [vcpu]
> >> +     sub     vcpu, vcpu, #CP15_OFFSET(c2_TTBR1)
> >> +     ldr     r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
> >> +     ldr     r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
> >> +     ldr     r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
> >> +     .endif
> >> +
> >> +     mcr     p15, 0, r2, c1, c0, 0   @ SCTLR
> >> +     mcr     p15, 0, r3, c1, c0, 2   @ CPACR
> >> +     mcr     p15, 0, r4, c2, c0, 2   @ TTBCR
> >> +     mcr     p15, 0, r5, c3, c0, 0   @ DACR
> >> +     mcrr    p15, 0, r6, r7, c2      @ TTBR 0
> >> +     mcrr    p15, 1, r8, r9, c2      @ TTBR 1
> >> +     mcr     p15, 0, r10, c10, c2, 0 @ PRRR
> >> +     mcr     p15, 0, r11, c10, c2, 1 @ NMRR
> >> +     mcr     p15, 2, r12, c0, c0, 0  @ CSSELR
> >> +.endm
> >> +
> >> +/*
> >> + * Save the VGIC CPU state into memory
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + */
> >> +.macro save_vgic_state
> >> +.endm
> >> +
> >> +/*
> >> + * Restore the VGIC CPU state from memory
> >> + *
> >> + * Assumes vcpu pointer in vcpu reg
> >> + */
> >> +.macro restore_vgic_state
> >> +.endm
> >> +
> >> +.equ vmentry,        0
> >> +.equ vmexit, 1
> >> +
> >> +/* Configures the HSTR (Hyp System Trap Register) on entry/return
> >> + * (hardware reset value is 0) */
> >> +.macro set_hstr operation
> >> +     mrc     p15, 4, r2, c1, c1, 3
> >> +     ldr     r3, =HSTR_T(15)
> >> +     .if \operation == vmentry
> >> +     orr     r2, r2, r3              @ Trap CR{15}
> >> +     .else
> >> +     bic     r2, r2, r3              @ Don't trap any CRx accesses
> >> +     .endif
> >> +     mcr     p15, 4, r2, c1, c1, 3
> >> +.endm
> >> +
> >> +/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
> >> + * (hardware reset value is 0). Keep previous value in r2. */
> >> +.macro set_hcptr operation, mask
> >> +     mrc     p15, 4, r2, c1, c1, 2
> >> +     ldr     r3, =\mask
> >> +     .if \operation == vmentry
> >> +     orr     r3, r2, r3              @ Trap coproc-accesses defined in mask
> >> +     .else
> >> +     bic     r3, r2, r3              @ Don't trap defined coproc-accesses
> >> +     .endif
> >> +     mcr     p15, 4, r3, c1, c1, 2
> >> +.endm
> >> +
> >> +/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
> >> + * (hardware reset value is 0) */
> >> +.macro set_hdcr operation
> >> +     mrc     p15, 4, r2, c1, c1, 1
> >> +     ldr     r3, =(HDCR_TPM|HDCR_TPMCR)
> >> +     .if \operation == vmentry
> >> +     orr     r2, r2, r3              @ Trap some perfmon accesses
> >> +     .else
> >> +     bic     r2, r2, r3              @ Don't trap any perfmon accesses
> >> +     .endif
> >> +     mcr     p15, 4, r2, c1, c1, 1
> >> +.endm
> >> +
> >> +/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
> >> +.macro configure_hyp_role operation
> >> +     mrc     p15, 4, r2, c1, c1, 0   @ HCR
> >> +     bic     r2, r2, #HCR_VIRT_EXCP_MASK
> >> +     ldr     r3, =HCR_GUEST_MASK
> >> +     .if \operation == vmentry
> >> +     orr     r2, r2, r3
> >> +     ldr     r3, [vcpu, #VCPU_IRQ_LINES]
> > irq_lines is accessed atomically from vcpu_interrupt_line(), but there
> > are no memory barriers or atomic operations here. Looks suspicious, though
> > I am not familiar with the ARM memory model. As far as I understand,
> > different translation regimes are used to access this memory, so who
> > knows what this does to access ordering.
> >
> >
> 
> there's an exception taken to switch to Hyp mode, which I'm quite sure
> implies a memory barrier.
> 
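
(For context, the producer side in vcpu_interrupt_line() boils down to an
atomic bit operation on irq_lines followed by a kick, roughly as sketched
below.  The bit index handling and error checking are omitted, and the code
is only an approximation of what the series does, not a copy of it.)

	/* Rough shape of the userspace IRQ/FIQ line update. */
	static int set_irq_line_sketch(struct kvm_vcpu *vcpu, int bit, bool level)
	{
		unsigned long *ptr = (unsigned long *)&vcpu->arch.irq_lines;
		bool was_set;

		if (level)
			was_set = test_and_set_bit(bit, ptr);
		else
			was_set = test_and_clear_bit(bit, ptr);

		if (was_set != level)
			kvm_vcpu_kick(vcpu);	/* force a world switch to pick up the change */

		return 0;
	}
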
> >> +     orr     r2, r2, r3
> >> +     .else
> >> +     bic     r2, r2, r3
> >> +     .endif
> >> +     mcr     p15, 4, r2, c1, c1, 0
> >> +.endm
> >> +
> >> +.macro load_vcpu
> >> +     mrc     p15, 4, vcpu, c13, c0, 2        @ HTPIDR
> >> +.endm
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majordomo at vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> commit e290b507f0d31c895bd515d69c0c2b50d76b20db
> Author: Christoffer Dall <c.dall@virtualopensystems.com>
> Date:   Tue Jan 15 20:53:03 2013 -0500
> 
>     KVM: ARM: Honor vcpu->requests in the world-switch code
> 
>     Honor vcpu->requests by checking them accordingly and explicitly raising an
>     error if unsupported requests are set (we don't support any requests on
>     ARM currently).
> 
>     Also add some commenting to explain the synchronization in more detail
>     here.  The commenting implied renaming a variable and changing the error
>     handling slightly to improve readability.
> 
>     Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 6ff5337..b23a709 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -620,7 +620,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>   */
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	int ret;
> +	int guest_ret, ret;
>  	sigset_t sigsaved;
> 
>  	/* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
> @@ -640,9 +640,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
>  	if (vcpu->sigset_active)
>  		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> 
> -	ret = 1;
>  	run->exit_reason = KVM_EXIT_UNKNOWN;
> -	while (ret > 0) {
> +	for (;;) {
>  		/*
>  		 * Check conditions before entering the guest
>  		 */
> @@ -650,18 +649,44 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
> *vcpu, struct kvm_run *run)
> 
>  		update_vttbr(vcpu->kvm);
> 
> +		/*
> +		 * There is a dependency between setting IN_GUEST_MODE and
> +		 * sending requests.  We need to ensure:
> +		 *   1. Setting IN_GUEST_MODE before checking vcpu->requests.
> +		 *   2. Checking vcpu->requests after disabling IRQs
> +		 *      (see comment about signal_pending below).
> +		 */
> +		vcpu->mode = IN_GUEST_MODE;
> +
>  		local_irq_disable();
> 
>  		/*
> -		 * Re-check atomic conditions
> +		 * We need to be careful to check these variables after
> +		 * disabling interrupts.  For example with signals:
> +		 *   1. If the signal comes before the signal_pending check,
> +		 *      we will return to user space and everything's good.
> +		 *   2. If the signal comes after the signal_pending check,
> +		 *      we rely on an IPI to exit the guest and continue the
> +		 *      run loop, which checks for pending signals again.
>  		 */
>  		if (signal_pending(current)) {
>  			ret = -EINTR;
>  			run->exit_reason = KVM_EXIT_INTR;
> +			local_irq_enable();
> +			vcpu->mode = OUTSIDE_GUEST_MODE;
> +			break;
>  		}
> 
> -		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> +		if (vcpu->requests) {
> +			ret = -ENOSYS; /* requests not supported */
>  			local_irq_enable();
> +			vcpu->mode = OUTSIDE_GUEST_MODE;
> +			break;
> +		}
> +
> +		if (need_new_vmid_gen(vcpu->kvm)) {
> +			local_irq_enable();
> +			vcpu->mode = OUTSIDE_GUEST_MODE;
>  			continue;
>  		}
> 
> @@ -670,17 +695,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		trace_kvm_entry(*vcpu_pc(vcpu));
>  		kvm_guest_enter();
> -		vcpu->mode = IN_GUEST_MODE;
> 
>  		smp_mb(); /* set mode before reading vcpu->arch.pause */
>  		if (unlikely(vcpu->arch.pause)) {
>  			/* This means ignore, try again. */
> -			ret = ARM_EXCEPTION_IRQ;
> +			guest_ret = ARM_EXCEPTION_IRQ;
>  		} else {
> -			ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +			guest_ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>  		}
> 
> -		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		vcpu->arch.last_pcpu = smp_processor_id();
>  		kvm_guest_exit();
>  		trace_kvm_exit(*vcpu_pc(vcpu));
> @@ -695,12 +718,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 * mode this time.
>  		 */
>  		local_irq_enable();
> +		vcpu->mode = OUTSIDE_GUEST_MODE;
> 
>  		/*
>  		 * Back from guest
>  		 *************************************************************/
> 
> -		ret = handle_exit(vcpu, run, ret);
> +		ret = handle_exit(vcpu, run, guest_ret);
> +		if (ret <= 0)
> +			break;
>  	}
> 
>  	if (vcpu->sigset_active)
> 
> commit fc9a9c5e9dd4eba4acd6bea5c8c083a9a854d662
> Author: Christoffer Dall <c.dall@virtualopensystems.com>
> Date:   Tue Jan 15 20:42:15 2013 -0500
> 
>     KVM: ARM: Remove unused memslot parameter
> 
> diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
> index 31ab9f5..571ccf0 100644
> --- a/arch/arm/include/asm/kvm_mmio.h
> +++ b/arch/arm/include/asm/kvm_mmio.h
> @@ -46,6 +46,6 @@ static inline void kvm_prepare_mmio(struct kvm_run *run,
> 
>  int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> -		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot);
> +		 phys_addr_t fault_ipa);
> 
>  #endif	/* __ARM_KVM_MMIO_H__ */
> diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
> index d6a4ca0..f655088 100644
> --- a/arch/arm/kvm/mmio.c
> +++ b/arch/arm/kvm/mmio.c
> @@ -117,7 +117,7 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  }
> 
>  int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> -		 phys_addr_t fault_ipa, struct kvm_memory_slot *memslot)
> +		 phys_addr_t fault_ipa)
>  {
>  	struct kvm_exit_mmio mmio;
>  	unsigned long rt;
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 2a83ac9..c806080 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -588,7 +588,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	unsigned long hsr_ec;
>  	unsigned long fault_status;
>  	phys_addr_t fault_ipa;
> -	struct kvm_memory_slot *memslot = NULL;
> +	struct kvm_memory_slot *memslot;
>  	bool is_iabt;
>  	gfn_t gfn;
>  	int ret;
> @@ -624,7 +624,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
> 
>  		/* Adjust page offset */
>  		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
> -		return io_mem_abort(vcpu, run, fault_ipa, memslot);
> +		return io_mem_abort(vcpu, run, fault_ipa);
>  	}
> 
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
> 
> commit 70667a06e445e240fb5e6352ccdc4bc8a290866e
> Author: Christoffer Dall <c.dall@virtualopensystems.com>
> Date:   Tue Jan 15 20:51:42 2013 -0500
> 
>     KVM: ARM: Grab kvm->srcu lock when handling page faults
> 
>     The memslots data structure is protected with an SRCU lock, so we should
>     grab the read side lock before traversing this data structure.
> 
>     Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index c806080..0b7eabf 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -591,7 +591,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	struct kvm_memory_slot *memslot;
>  	bool is_iabt;
>  	gfn_t gfn;
> -	int ret;
> +	int ret, idx;
> 
>  	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
>  	is_iabt = (hsr_ec == HSR_EC_IABT);
> @@ -627,13 +627,17 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		return io_mem_abort(vcpu, run, fault_ipa);
>  	}
> 
> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>  	if (!memslot->user_alloc) {
>  		kvm_err("non user-alloc memslots not supported\n");
> -		return -EINVAL;
> +		ret = -EINVAL;
> +		goto out_unlock;
>  	}
> 
>  	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
> +out_unlock:
> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  	return ret ? ret : 1;
>  }
> 
> --
> 
> Thanks,
> -Christoffer

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16  4:08         ` Christoffer Dall
@ 2013-01-16 12:57           ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-16 12:57 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

On Tue, Jan 15, 2013 at 11:08:24PM -0500, Christoffer Dall wrote:
> On Tue, Jan 15, 2013 at 9:08 PM, Christoffer Dall
> <c.dall@virtualopensystems.com> wrote:
> > On Tue, Jan 15, 2013 at 4:43 AM, Gleb Natapov <gleb@redhat.com> wrote:
> >> On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
> >>> Provides complete world-switch implementation to switch to other guests
> >>> running in non-secure modes. Includes Hyp exception handlers that
> >>> capture necessary exception information and stores the information on
> >>> the VCPU and KVM structures.
> >>>
> >>> The following Hyp-ABI is also documented in the code:
> >>>
> >>> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
> >>>    Switching to Hyp mode is done through a simple HVC #0 instruction. The
> >>>    exception vector code will check that the HVC comes from VMID==0 and if
> >>>    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
> >>>    - r0 contains a pointer to a HYP function
> >>>    - r1, r2, and r3 contain arguments to the above function.
> >>>    - The HYP function will be called with its arguments in r0, r1 and r2.
> >>>    On HYP function return, we return directly to SVC.
> >>>
> >>> A call to a function executing in Hyp mode is performed like the following:
> >>>
> >>>         <svc code>
> >>>         ldr     r0, =BSYM(my_hyp_fn)
> >>>         ldr     r1, =my_param
> >>>         hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
> >>>         <svc code>
> >>>
> >>> Otherwise, the world-switch is pretty straightforward. All state that
> >>> can be modified by the guest is first backed up on the Hyp stack and the
> >>> VCPU values are loaded onto the hardware. State which is not loaded, but
> >>> theoretically modifiable by the guest, is protected through the
> >>> virtualization features to generate a trap and cause software emulation.
> >>> Upon guest return, all state is restored from hardware onto the VCPU
> >>> struct and the original state is restored from the Hyp-stack onto the
> >>> hardware.
> >>>
> >>> SMP support using the VMPIDR calculated on the basis of the host MPIDR
> >>> and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
> >>>
> >>> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
> >>> a separate patch into the appropriate patches introducing the
> >>> functionality. Note that the VMIDs are stored per VM as required by the ARM
> >>> architecture reference manual.
> >>>
> >>> To support VFP/NEON we trap those instructions using the HCPTR. When
> >>> we trap, we switch the FPU.  After a guest exit, the VFP state is
> >>> returned to the host.  When disabling access to floating point
> >>> instructions, we also mask FPEXC_EN in order to avoid the guest
> >>> receiving Undefined instruction exceptions before we have a chance to
> >>> switch back the floating point state.  We are reusing vfp_hard_struct,
> >>> so we depend on VFPv3 being enabled in the host kernel; if not, we still
> >>> trap cp10 and cp11 in order to inject an undefined instruction exception
> >>> whenever the guest tries to use VFP/NEON. VFP/NEON support developed by
> >>> Antonios Motakis and Rusty Russell.
> >>>
> >>> Aborts that are permission faults, and not stage-1 page table walk, do
> >>> not report the faulting address in the HPFAR.  We have to resolve the
> >>> IPA, and store it just like the HPFAR register on the VCPU struct. If
> >>> the IPA cannot be resolved, it means another CPU is playing with the
> >>> page tables, and we simply restart the guest.  This quirk was fixed by
> >>> Marc Zyngier.
> >>>
> >>> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
> >>> Signed-off-by: Rusty Russell <rusty.russell@linaro.org>
> >>> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> >>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> >>> ---
> >>>  arch/arm/include/asm/kvm_arm.h  |   51 ++++
> >>>  arch/arm/include/asm/kvm_host.h |   10 +
> >>>  arch/arm/kernel/asm-offsets.c   |   25 ++
> >>>  arch/arm/kvm/arm.c              |  187 ++++++++++++++++
> >>>  arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
> >>>  arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
> >>>  6 files changed, 1108 insertions(+), 4 deletions(-)
> >>>  create mode 100644 arch/arm/kvm/interrupts_head.S
> >>>
> >>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> >>> index fb22ee8..a3262a2 100644
> >>> --- a/arch/arm/include/asm/kvm_arm.h
> >>> +++ b/arch/arm/include/asm/kvm_arm.h
> >>> @@ -98,6 +98,18 @@
> >>>  #define TTBCR_T0SZ   3
> >>>  #define HTCR_MASK    (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
> >>>
> >>> +/* Hyp System Trap Register */
> >>> +#define HSTR_T(x)    (1 << x)
> >>> +#define HSTR_TTEE    (1 << 16)
> >>> +#define HSTR_TJDBX   (1 << 17)
> >>> +
> >>> +/* Hyp Coprocessor Trap Register */
> >>> +#define HCPTR_TCP(x) (1 << x)
> >>> +#define HCPTR_TCP_MASK       (0x3fff)
> >>> +#define HCPTR_TASE   (1 << 15)
> >>> +#define HCPTR_TTA    (1 << 20)
> >>> +#define HCPTR_TCPAC  (1 << 31)
> >>> +
> >>>  /* Hyp Debug Configuration Register bits */
> >>>  #define HDCR_TDRA    (1 << 11)
> >>>  #define HDCR_TDOSA   (1 << 10)
> >>> @@ -144,6 +156,45 @@
> >>>  #else
> >>>  #define VTTBR_X              (5 - KVM_T0SZ)
> >>>  #endif
> >>> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
> >>> +#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
> >>> +#define VTTBR_VMID_SHIFT  (48LLU)
> >>> +#define VTTBR_VMID_MASK        (0xffLLU << VTTBR_VMID_SHIFT)
> >>> +
> >>> +/* Hyp Syndrome Register (HSR) bits */
> >>> +#define HSR_EC_SHIFT (26)
> >>> +#define HSR_EC               (0x3fU << HSR_EC_SHIFT)
> >>> +#define HSR_IL               (1U << 25)
> >>> +#define HSR_ISS              (HSR_IL - 1)
> >>> +#define HSR_ISV_SHIFT        (24)
> >>> +#define HSR_ISV              (1U << HSR_ISV_SHIFT)
> >>> +#define HSR_FSC              (0x3f)
> >>> +#define HSR_FSC_TYPE (0x3c)
> >>> +#define HSR_WNR              (1 << 6)
> >>> +
> >>> +#define FSC_FAULT    (0x04)
> >>> +#define FSC_PERM     (0x0c)
> >>> +
> >>> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> >>> +#define HPFAR_MASK   (~0xf)
> >>>
> >>> +#define HSR_EC_UNKNOWN       (0x00)
> >>> +#define HSR_EC_WFI   (0x01)
> >>> +#define HSR_EC_CP15_32       (0x03)
> >>> +#define HSR_EC_CP15_64       (0x04)
> >>> +#define HSR_EC_CP14_MR       (0x05)
> >>> +#define HSR_EC_CP14_LS       (0x06)
> >>> +#define HSR_EC_CP_0_13       (0x07)
> >>> +#define HSR_EC_CP10_ID       (0x08)
> >>> +#define HSR_EC_JAZELLE       (0x09)
> >>> +#define HSR_EC_BXJ   (0x0A)
> >>> +#define HSR_EC_CP14_64       (0x0C)
> >>> +#define HSR_EC_SVC_HYP       (0x11)
> >>> +#define HSR_EC_HVC   (0x12)
> >>> +#define HSR_EC_SMC   (0x13)
> >>> +#define HSR_EC_IABT  (0x20)
> >>> +#define HSR_EC_IABT_HYP      (0x21)
> >>> +#define HSR_EC_DABT  (0x24)
> >>> +#define HSR_EC_DABT_HYP      (0x25)
> >>>
> >>>  #endif /* __ARM_KVM_ARM_H__ */
> >>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >>> index 1de6f0d..ddb09da 100644
> >>> --- a/arch/arm/include/asm/kvm_host.h
> >>> +++ b/arch/arm/include/asm/kvm_host.h
> >>> @@ -21,6 +21,7 @@
> >>>
> >>>  #include <asm/kvm.h>
> >>>  #include <asm/kvm_asm.h>
> >>> +#include <asm/fpstate.h>
> >>>
> >>>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
> >>>  #define KVM_USER_MEM_SLOTS 32
> >>> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
> >>>       u32 hxfar;              /* Hyp Data/Inst Fault Address Register */
> >>>       u32 hpfar;              /* Hyp IPA Fault Address Register */
> >>>
> >>> +     /* Floating point registers (VFP and Advanced SIMD/NEON) */
> >>> +     struct vfp_hard_struct vfp_guest;
> >>> +     struct vfp_hard_struct *vfp_host;
> >>> +
> >>> +     /*
> >>> +      * Anything that is not used directly from assembly code goes
> >>> +      * here.
> >>> +      */
> >>>       /* Interrupt related fields */
> >>>       u32 irq_lines;          /* IRQ and FIQ levels */
> >>>
> >>> @@ -112,6 +121,7 @@ struct kvm_one_reg;
> >>>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> >>>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> >>>  u64 kvm_call_hyp(void *hypfn, ...);
> >>> +void force_vm_exit(const cpumask_t *mask);
> >>>
> >>>  #define KVM_ARCH_WANT_MMU_NOTIFIER
> >>>  struct kvm;
> >>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> >>> index c985b48..c8b3272 100644
> >>> --- a/arch/arm/kernel/asm-offsets.c
> >>> +++ b/arch/arm/kernel/asm-offsets.c
> >>> @@ -13,6 +13,9 @@
> >>>  #include <linux/sched.h>
> >>>  #include <linux/mm.h>
> >>>  #include <linux/dma-mapping.h>
> >>> +#ifdef CONFIG_KVM_ARM_HOST
> >>> +#include <linux/kvm_host.h>
> >>> +#endif
> >>>  #include <asm/cacheflush.h>
> >>>  #include <asm/glue-df.h>
> >>>  #include <asm/glue-pf.h>
> >>> @@ -146,5 +149,27 @@ int main(void)
> >>>    DEFINE(DMA_BIDIRECTIONAL,  DMA_BIDIRECTIONAL);
> >>>    DEFINE(DMA_TO_DEVICE,              DMA_TO_DEVICE);
> >>>    DEFINE(DMA_FROM_DEVICE,    DMA_FROM_DEVICE);
> >>> +#ifdef CONFIG_KVM_ARM_HOST
> >>> +  DEFINE(VCPU_KVM,           offsetof(struct kvm_vcpu, kvm));
> >>> +  DEFINE(VCPU_MIDR,          offsetof(struct kvm_vcpu, arch.midr));
> >>> +  DEFINE(VCPU_CP15,          offsetof(struct kvm_vcpu, arch.cp15));
> >>> +  DEFINE(VCPU_VFP_GUEST,     offsetof(struct kvm_vcpu, arch.vfp_guest));
> >>> +  DEFINE(VCPU_VFP_HOST,              offsetof(struct kvm_vcpu, arch.vfp_host));
> >>> +  DEFINE(VCPU_REGS,          offsetof(struct kvm_vcpu, arch.regs));
> >>> +  DEFINE(VCPU_USR_REGS,              offsetof(struct kvm_vcpu, arch.regs.usr_regs));
> >>> +  DEFINE(VCPU_SVC_REGS,              offsetof(struct kvm_vcpu, arch.regs.svc_regs));
> >>> +  DEFINE(VCPU_ABT_REGS,              offsetof(struct kvm_vcpu, arch.regs.abt_regs));
> >>> +  DEFINE(VCPU_UND_REGS,              offsetof(struct kvm_vcpu, arch.regs.und_regs));
> >>> +  DEFINE(VCPU_IRQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.irq_regs));
> >>> +  DEFINE(VCPU_FIQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
> >>> +  DEFINE(VCPU_PC,            offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
> >>> +  DEFINE(VCPU_CPSR,          offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
> >>> +  DEFINE(VCPU_IRQ_LINES,     offsetof(struct kvm_vcpu, arch.irq_lines));
> >>> +  DEFINE(VCPU_HSR,           offsetof(struct kvm_vcpu, arch.hsr));
> >>> +  DEFINE(VCPU_HxFAR,         offsetof(struct kvm_vcpu, arch.hxfar));
> >>> +  DEFINE(VCPU_HPFAR,         offsetof(struct kvm_vcpu, arch.hpfar));
> >>> +  DEFINE(VCPU_HYP_PC,                offsetof(struct kvm_vcpu, arch.hyp_pc));
> >>> +  DEFINE(KVM_VTTBR,          offsetof(struct kvm, arch.vttbr));
> >>> +#endif
> >>>    return 0;
> >>>  }
> >>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>> index 9b4566e..c94d278 100644
> >>> --- a/arch/arm/kvm/arm.c
> >>> +++ b/arch/arm/kvm/arm.c
> >>> @@ -40,6 +40,7 @@
> >>>  #include <asm/kvm_arm.h>
> >>>  #include <asm/kvm_asm.h>
> >>>  #include <asm/kvm_mmu.h>
> >>> +#include <asm/kvm_emulate.h>
> >>>
> >>>  #ifdef REQUIRES_VIRT
> >>>  __asm__(".arch_extension     virt");
> >>> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> >>>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
> >>>  static unsigned long hyp_default_vectors;
> >>>
> >>> +/* The VMID used in the VTTBR */
> >>> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> >>> +static u8 kvm_next_vmid;
> >>> +static DEFINE_SPINLOCK(kvm_vmid_lock);
> >>>
> >>>  int kvm_arch_hardware_enable(void *garbage)
> >>>  {
> >>> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
> >>>
> >>>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >>>  {
> >>> +     /* Force users to call KVM_ARM_VCPU_INIT */
> >>> +     vcpu->arch.target = -1;
> >>>       return 0;
> >>>  }
> >>>
> >>> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> >>>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>>  {
> >>>       vcpu->cpu = cpu;
> >>> +     vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
> >>>  }
> >>>
> >>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >>> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> >>>
> >>>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
> >> As far as I see the function is unused.
> >>
> >>>  {
> >>> +     return v->mode == IN_GUEST_MODE;
> >>> +}
> >>> +
> >>> +/* Just ensure a guest exit from a particular CPU */
> >>> +static void exit_vm_noop(void *info)
> >>> +{
> >>> +}
> >>> +
> >>> +void force_vm_exit(const cpumask_t *mask)
> >>> +{
> >>> +     smp_call_function_many(mask, exit_vm_noop, NULL, true);
> >>> +}
> >> There is make_all_cpus_request() for that. It actually sends IPIs only
> >> to cpus that are running vcpus.
> >>
> >>> +
> >>> +/**
> >>> + * need_new_vmid_gen - check that the VMID is still valid
> >>> + * @kvm: The VM's VMID to check
> >>> + *
> >>> + * return true if there is a new generation of VMIDs being used
> >>> + *
> >>> + * The hardware supports only 256 values with the value zero reserved for the
> >>> + * host, so we check if an assigned value belongs to a previous generation,
> >>> + * which requires us to assign a new value. If we're the first to use a
> >>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> >>> + * CPUs.
> >>> + */
> >>> +static bool need_new_vmid_gen(struct kvm *kvm)
> >>> +{
> >>> +     return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> >>> +}
> >>> +
> >>> +/**
> >>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> >>> + * @kvm      The guest that we are about to run
> >>> + *
> >>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> >>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> >>> + * caches and TLBs.
> >>> + */
> >>> +static void update_vttbr(struct kvm *kvm)
> >>> +{
> >>> +     phys_addr_t pgd_phys;
> >>> +     u64 vmid;
> >>> +
> >>> +     if (!need_new_vmid_gen(kvm))
> >>> +             return;
> >>> +
> >>> +     spin_lock(&kvm_vmid_lock);
> >>> +
> >>> +     /*
> >>> +      * We need to re-check the vmid_gen here to ensure that if another vcpu
> >>> +      * already allocated a valid vmid for this vm, then this vcpu should
> >>> +      * use the same vmid.
> >>> +      */
> >>> +     if (!need_new_vmid_gen(kvm)) {
> >>> +             spin_unlock(&kvm_vmid_lock);
> >>> +             return;
> >>> +     }
> >>> +
> >>> +     /* First user of a new VMID generation? */
> >>> +     if (unlikely(kvm_next_vmid == 0)) {
> >>> +             atomic64_inc(&kvm_vmid_gen);
> >>> +             kvm_next_vmid = 1;
> >>> +
> >>> +             /*
> >>> +              * On SMP we know no other CPUs can use this CPU's or each
> >>> +              * other's VMID after force_vm_exit returns since the
> >>> +              * kvm_vmid_lock blocks them from reentry to the guest.
> >>> +              */
> >>> +             force_vm_exit(cpu_all_mask);
> >>> +             /*
> >>> +              * Now broadcast TLB + ICACHE invalidation over the inner
> >>> +              * shareable domain to make sure all data structures are
> >>> +              * clean.
> >>> +              */
> >>> +             kvm_call_hyp(__kvm_flush_vm_context);
> >>> +     }
> >>> +
> >>> +     kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> >>> +     kvm->arch.vmid = kvm_next_vmid;
> >>> +     kvm_next_vmid++;
> >>> +
> >>> +     /* update vttbr to be used with the new vmid */
> >>> +     pgd_phys = virt_to_phys(kvm->arch.pgd);
> >>> +     vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
> >>> +     kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
> >>> +     kvm->arch.vttbr |= vmid;
> >>> +
> >>> +     spin_unlock(&kvm_vmid_lock);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
> >>> + * proper exit to QEMU.
> >>> + */
> >>> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >>> +                    int exception_index)
> >>> +{
> >>> +     run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> >>>       return 0;
> >>>  }
> >>>
> >>> +/**
> >>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> >>> + * @vcpu:    The VCPU pointer
> >>> + * @run:     The kvm_run structure pointer used for userspace state exchange
> >>> + *
> >>> + * This function is called through the VCPU_RUN ioctl called from user space. It
> >>> + * will execute VM code in a loop until the time slice for the process is used
> >>> + * or some emulation is needed from user space in which case the function will
> >>> + * return with return value 0 and with the kvm_run structure filled in with the
> >>> + * required data for the requested emulation.
> >>> + */
> >>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  {
> >>> -     return -EINVAL;
> >>> +     int ret;
> >>> +     sigset_t sigsaved;
> >>> +
> >>> +     /* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
> >>> +     if (unlikely(vcpu->arch.target < 0))
> >>> +             return -ENOEXEC;
> >>> +
> >>> +     if (vcpu->sigset_active)
> >>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> >>> +
> >>> +     ret = 1;
> >>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
> >>> +     while (ret > 0) {
> >>> +             /*
> >>> +              * Check conditions before entering the guest
> >>> +              */
> >>> +             cond_resched();
> >>> +
> >>> +             update_vttbr(vcpu->kvm);
> >>> +
> >>> +             local_irq_disable();
> >>> +
> >>> +             /*
> >>> +              * Re-check atomic conditions
> >>> +              */
> >>> +             if (signal_pending(current)) {
> >>> +                     ret = -EINTR;
> >>> +                     run->exit_reason = KVM_EXIT_INTR;
> >>> +             }
> >>> +
> >>> +             if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >>> +                     local_irq_enable();
> >>> +                     continue;
> >>> +             }
> >>> +
> >>> +             /**************************************************************
> >>> +              * Enter the guest
> >>> +              */
> >>> +             trace_kvm_entry(*vcpu_pc(vcpu));
> >>> +             kvm_guest_enter();
> >>> +             vcpu->mode = IN_GUEST_MODE;
> >> You need to set mode to IN_GUEST_MODE before disabling interrupts and
> >> check that mode != EXITING_GUEST_MODE after disabling interrupts but
> >> before entering the guest. This way you will catch kicks that were sent
> >> between setting of the mode and disabling the interrupts. Also you need
> >> to check vcpu->requests and exit if it is not empty. I see that you do
> >> not use vcpu->requests at all, but you should since common kvm code
> >> assumes that it is used. make_all_cpus_request() uses it for instance.
> >>
> >
> > I don't quite agree, but almost:
> >
> > Why would you set IN_GUEST_MODE before disabling interrupts? The only
> > reason I can see for it to be a requirement is to leverage an implicit
> > memory barrier. Receiving the IPI in this little window does nothing
> > (the smp_cross_call is a noop).
> >
> > Checking that mode != EXITING_GUEST_MODE is equally useless in my
> > opinion; as I read the requests code, the only reason for this mode is
> > to avoid sending an IPI twice.
> >
> > Kicks sent between setting the mode and disabling the interrupts are
> > not the point; the point is to check the requests field (which we
> > don't use at all on ARM, and generic code also doesn't use on ARM)
> > after disabling interrupts, and after setting IN_GUEST_MODE.
> >
> > The patch below fixes your issues, and while I would push back on
> > anything other than direct bug fixes at this point, the current code is
> > semantically incorrect wrt. KVM vcpu requests, so it's worth a fix,
> > and the patch itself is trivial.
> >
> [...]
> 
> Actually, I take that back, the kvm_vcpu_block function does make a
> request, which we don't need to handle, so adding code that checks for
> features we don't support is useless at this point. Please ignore the
> patch I sent earlier.
> 
The archs which are not interested in the unhalt request just clear it
after returning from kvm_vcpu_block.
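
For reference, a minimal sketch of that pattern, using the generic
kvm_check_request() helper (this is not code from this series):

        kvm_vcpu_block(vcpu);
        /*
         * We only blocked waiting for a wakeup; drop the unhalt request,
         * since this architecture re-evaluates runnability on its own.
         * kvm_check_request() clears the bit as a side effect.
         */
        kvm_check_request(KVM_REQ_UNHALT, vcpu);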

> Later on we can change some of the code to use the vcpu->features map
> if there's a real benefit, but right now the priority is to merge this
> code, so anything that's not a bugfix should not go in.
> 

Agree. Let's merge it and change later. The vcpu run loop is simple
enough at this point. The question of using vcpu->requests is not a
question of "real benefit", though; of course you can introduce your
own mechanism to pass requests to vcpus instead of using what kvm
provides you, but from a maintenance and code-sharing point of view this
is the wrong thing to do. Look at this code, for instance:

        /* Kick out any which are still running. */
        kvm_for_each_vcpu(i, v, vcpu->kvm) {
                /* Guest could exit now, making cpu wrong. That's OK. */
                if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
                        force_vm_exit(get_cpu_mask(v->cpu));
                }
        }

Why not make_all_cpus_request(vcpu->kvm, KVM_REQ_PAUSE)?
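
A sketch of what that could look like -- note that KVM_REQ_PAUSE is a
hypothetical request bit here, and make_all_cpus_request() would first
have to be exported from kvm_main.c:

        /* requester side: replaces the kvm_for_each_vcpu() loop above */
        make_all_cpus_request(vcpu->kvm, KVM_REQ_PAUSE);

        /* vcpu side: in the run loop, with interrupts disabled and
         * vcpu->mode already set to IN_GUEST_MODE */
        if (kvm_check_request(KVM_REQ_PAUSE, vcpu)) {
                local_irq_enable();
                vcpu->mode = OUTSIDE_GUEST_MODE;
                continue;       /* do not enter the guest this round */
        }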

And I am not sure KVM_REQ_UNHALT is so useless to you in the first
place. kvm_vcpu_block() can return even when the vcpu is not runnable (if
a signal is pending). KVM_REQ_UNHALT is the way to check for that. Hmm,
this actually looks like a BUG in the current code.

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16 12:12         ` Gleb Natapov
@ 2013-01-16 13:14           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 160+ messages in thread
From: Russell King - ARM Linux @ 2013-01-16 13:14 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christoffer Dall, Rusty Russell, kvm, Marc Zyngier,
	Marcelo Tosatti, nicolas, Antonios Motakis, kvmarm,
	linux-arm-kernel

If you're going to bother commenting on a big long email, please
_CHOP OUT_ content which is not relevant to your reply.  I paged down 5
pages, hit end, paged up, found no comment from you, so I'm not going to
bother reading anything further in this message.

Help your readers to read your email.  Don't expect them to search a
1600-line email message for a one line reply.

(This has been said many times over the history of the Internet.  There
are etiquette documents on Internet mail stating it too.  Please comply
with it or you will find people will ignore your messages.)

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16 12:57           ` Gleb Natapov
@ 2013-01-16 15:40             ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-16 15:40 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

[...]

>>
>
> Agree. Let's merge it and change later. The vcpu run loop is simple
> enough at this point. The question of using vcpu->requests is not a
> question of "real benefit", though; of course you can introduce your
> own mechanism to pass requests to vcpus instead of using what kvm
> provides you, but from a maintenance and code-sharing point of view this
> is the wrong thing to do. Look at this code, for instance:
>
>         /* Kick out any which are still running. */
>         kvm_for_each_vcpu(i, v, vcpu->kvm) {
>                 /* Guest could exit now, making cpu wrong. That's OK. */
>                 if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
>                         force_vm_exit(get_cpu_mask(v->cpu));
>                 }
>         }
>
> Why not make_all_cpus_request(vcpu->kvm, KVM_REQ_PAUSE)?

well, for one, make_all_cpus_request is a static function in kvm_main.c,
and the semantics of that one are really tricky with respect to locking
and require (imho) a much clearer explanation with commenting (see the
separate e-mail to the kvm list). And now is not the time to do this.

>
> And I am not sure KVM_REQ_UNHALT is so useless to you in the first
> place. kvm_vcpu_block() can return even when the vcpu is not runnable (if
> a signal is pending). KVM_REQ_UNHALT is the way to check for that. Hmm,
> this actually looks like a BUG in the current code.
>
there's no guarantee that you won't be woken up from a WFI instruction
by spurious interrupts on ARM, so we don't care about this; we simply
return to the guest, and it must go back to sleep if that's what it
wants to do.
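
For context, the WFI handling then reduces to something like the sketch
below (not necessarily the exact code in the series):

        static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
        {
                /*
                 * Block until something wakes the vcpu; a spurious wakeup
                 * is harmless, the guest will simply execute WFI again.
                 */
                kvm_vcpu_block(vcpu);
                return 1;       /* go straight back into the guest */
        }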

-Christoffer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16 12:12         ` Gleb Natapov
@ 2013-01-16 15:42           ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-16 15:42 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

[...]

>
>> read side RCU protects against is the memslots data structure as far
>> as I can see, so the second patch pasted below fixes this for the code
>> that actually accesses this data structure.
> Many memory related functions that you call access memslots under the
> hood and assume that locking is done by the caller. From the quick look
> I found those that you've missed:
> kvm_is_visible_gfn()
> kvm_read_guest()
> gfn_to_hva()
> gfn_to_pfn_prot()
> kvm_memslots()
>
> Maybe there are more. Can you enable RCU debugging in your kernel config
> and check? This does not guarantee that it will catch all of the places,
> but it's better than nothing.
>
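
For reference, the RCU debugging options being referred to are roughly
the following (exact option names depend on the kernel version):

        CONFIG_PROVE_LOCKING=y
        CONFIG_PROVE_RCU=y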

yeah, I missed the call to is_visible_gfn and friends, this fixes it:

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index c806080..f30e131 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -591,7 +591,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	struct kvm_memory_slot *memslot;
 	bool is_iabt;
 	gfn_t gfn;
-	int ret;
+	int ret, idx;

 	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
 	is_iabt = (hsr_ec == HSR_EC_IABT);
@@ -608,33 +608,43 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		return -EFAULT;
 	}

+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
 	gfn = fault_ipa >> PAGE_SHIFT;
 	if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
 		if (is_iabt) {
 			/* Prefetch Abort on I/O address */
 			kvm_inject_pabt(vcpu, vcpu->arch.hxfar);
-			return 1;
+			ret = 1;
+			goto out_unlock;
 		}

 		if (fault_status != FSC_FAULT) {
 			kvm_err("Unsupported fault status on io memory: %#lx\n",
 				fault_status);
-			return -EFAULT;
+			ret = -EFAULT;
+			goto out_unlock;
 		}

 		/* Adjust page offset */
 		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
-		return io_mem_abort(vcpu, run, fault_ipa);
+		ret = io_mem_abort(vcpu, run, fault_ipa);
+		goto out_unlock;
 	}

 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	if (!memslot->user_alloc) {
 		kvm_err("non user-alloc memslots not supported\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out_unlock;
 	}

 	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
-	return ret ? ret : 1;
+	if (ret == 0)
+		ret = 1;
+out_unlock:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+	return ret;
 }

 static void handle_hva_to_gpa(struct kvm *kvm,
--

Thanks,
-Christoffer

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16 15:42           ` Christoffer Dall
@ 2013-01-16 15:52             ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-16 15:52 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

On Wed, Jan 16, 2013 at 10:42:02AM -0500, Christoffer Dall wrote:
> [...]
> 
> >
> >> read side RCU protects against is the memslots data structure as far
> >> as I can see, so the second patch pasted below fixes this for the code
> >> that actually accesses this data structure.
> > Many memory-related functions that you call access memslots under the
> > hood and assume that locking is done by the caller. From a quick look
> > I found these that you've missed:
> > kvm_is_visible_gfn()
> > kvm_read_guest()
> > gfn_to_hva()
> > gfn_to_pfn_prot()
> > kvm_memslots()
> >
> > Maybe there are more. Can you enable RCU debugging in your kernel
> > config and check? This does not guarantee that it will catch all of
> > the places, but it's better than nothing.
> >
> 
> yeah, I missed the call to is_visible_gfn and friends, this fixes it:
> 
Thank you. One more kvm_read_guest() in emulate.c.

> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index c806080..f30e131 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -591,7 +591,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	struct kvm_memory_slot *memslot;
>  	bool is_iabt;
>  	gfn_t gfn;
> -	int ret;
> +	int ret, idx;
> 
>  	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
>  	is_iabt = (hsr_ec == HSR_EC_IABT);
> @@ -608,33 +608,43 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		return -EFAULT;
>  	}
> 
> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
> +
>  	gfn = fault_ipa >> PAGE_SHIFT;
>  	if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
>  		if (is_iabt) {
>  			/* Prefetch Abort on I/O address */
>  			kvm_inject_pabt(vcpu, vcpu->arch.hxfar);
> -			return 1;
> +			ret = 1;
> +			goto out_unlock;
>  		}
> 
>  		if (fault_status != FSC_FAULT) {
>  			kvm_err("Unsupported fault status on io memory: %#lx\n",
>  				fault_status);
> -			return -EFAULT;
> +			ret = -EFAULT;
> +			goto out_unlock;
>  		}
> 
>  		/* Adjust page offset */
>  		fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
> -		return io_mem_abort(vcpu, run, fault_ipa);
> +		ret = io_mem_abort(vcpu, run, fault_ipa);
> +		goto out_unlock;
>  	}
> 
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>  	if (!memslot->user_alloc) {
>  		kvm_err("non user-alloc memslots not supported\n");
> -		return -EINVAL;
> +		ret = -EINVAL;
> +		goto out_unlock;
>  	}
> 
>  	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
> -	return ret ? ret : 1;
> +	if (ret == 0)
> +		ret = 1;
> +out_unlock:
> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> +	return ret;
>  }
> 
>  static void handle_hva_to_gpa(struct kvm *kvm,
> --
> 
> Thanks,
> -Christoffer

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16 15:52             ` Gleb Natapov
@ 2013-01-16 16:17               ` Christoffer Dall
  -1 siblings, 0 replies; 160+ messages in thread
From: Christoffer Dall @ 2013-01-16 16:17 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

On Wed, Jan 16, 2013 at 10:52 AM, Gleb Natapov <gleb@redhat.com> wrote:
> On Wed, Jan 16, 2013 at 10:42:02AM -0500, Christoffer Dall wrote:
>> [...]
>>
>> >
>> >> read side RCU protects against is the memslots data structure as far
>> >> as I can see, so the second patch pasted below fixes this for the code
>> >> that actually accesses this data structure.
>> > Many memory-related functions that you call access memslots under the
>> > hood and assume that locking is done by the caller. From a quick look
>> > I found these that you've missed:
>> > kvm_is_visible_gfn()
>> > kvm_read_guest()
>> > gfn_to_hva()
>> > gfn_to_pfn_prot()
>> > kvm_memslots()
>> >
>> > Maybe there are more. Can you enable RCU debugging in your kernel
>> > config and check? This does not guarantee that it will catch all of
>> > the places, but it's better than nothing.
>> >
>>
>> yeah, I missed the call to is_visible_gfn and friends, this fixes it:
>>
> Thank you. One more kvm_read_guest() in emulate.c.
>

this one is going out for now (see the i/o discussion).

>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index c806080..f30e131 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -591,7 +591,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>       struct kvm_memory_slot *memslot;
>>       bool is_iabt;
>>       gfn_t gfn;
>> -     int ret;
>> +     int ret, idx;
>>
>>       hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
>>       is_iabt = (hsr_ec == HSR_EC_IABT);
>> @@ -608,33 +608,43 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>               return -EFAULT;
>>       }
>>
>> +     idx = srcu_read_lock(&vcpu->kvm->srcu);
>> +
>>       gfn = fault_ipa >> PAGE_SHIFT;
>>       if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
>>               if (is_iabt) {
>>                       /* Prefetch Abort on I/O address */
>>                       kvm_inject_pabt(vcpu, vcpu->arch.hxfar);
>> -                     return 1;
>> +                     ret = 1;
>> +                     goto out_unlock;
>>               }
>>
>>               if (fault_status != FSC_FAULT) {
>>                       kvm_err("Unsupported fault status on io memory: %#lx\n",
>>                               fault_status);
>> -                     return -EFAULT;
>> +                     ret = -EFAULT;
>> +                     goto out_unlock;
>>               }
>>
>>               /* Adjust page offset */
>>               fault_ipa |= vcpu->arch.hxfar & ~PAGE_MASK;
>> -             return io_mem_abort(vcpu, run, fault_ipa);
>> +             ret = io_mem_abort(vcpu, run, fault_ipa);
>> +             goto out_unlock;
>>       }
>>
>>       memslot = gfn_to_memslot(vcpu->kvm, gfn);
>>       if (!memslot->user_alloc) {
>>               kvm_err("non user-alloc memslots not supported\n");
>> -             return -EINVAL;
>> +             ret = -EINVAL;
>> +             goto out_unlock;
>>       }
>>
>>       ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
>> -     return ret ? ret : 1;
>> +     if (ret == 0)
>> +             ret = 1;
>> +out_unlock:
>> +     srcu_read_unlock(&vcpu->kvm->srcu, idx);
>> +     return ret;
>>  }
>>
>>  static void handle_hva_to_gpa(struct kvm *kvm,
>> --
>>
>> Thanks,
>> -Christoffer
>
> --
>                         Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16 15:40             ` Christoffer Dall
@ 2013-01-16 16:17               ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-16 16:17 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

On Wed, Jan 16, 2013 at 10:40:37AM -0500, Christoffer Dall wrote:
> [...]
> 
> >>
> >
> > Agreed. Let's merge it and change it later. The vcpu run loop is
> > simple enough at this point. The question of using vcpu->requests is
> > not a question of "real benefit", though; of course you can introduce
> > your own mechanism to pass requests to vcpus instead of using whatever
> > kvm provides you, but from a maintenance and code-sharing point of
> > view that is the wrong thing to do. Look at this code, for instance:
> >
> >         /* Kick out any which are still running. */
> >         kvm_for_each_vcpu(i, v, vcpu->kvm) {
> >                 /* Guest could exit now, making cpu wrong. That's OK. */
> >                 if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
> >                         force_vm_exit(get_cpu_mask(v->cpu));
> >                 }
> >         }
> >
> > Why not make_all_cpus_request(vcpu->kvm, KVM_REQ_PAUSE)?
> 
> Well, for one, make_all_cpus_request() is a static function in
> kvm_main.c, and its semantics are really tricky with respect to locking
> and require (imho) a much clearer explanation in the comments (see my
> separate e-mail to the kvm list). And now is not the time to do this.
> 
All current users add an exported function that calls
make_all_cpus_request(). But it is a very valid question why not just
export it directly. A patch is welcome :)
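
To make this concrete, a minimal sketch of what such an exported helper
could look like, assuming make_all_cpus_request() were made non-static
and a KVM_REQ_PAUSE request bit existed (neither is true today; the
wrapper name is made up):

#include <linux/kvm_host.h>

/*
 * Sketch only: replaces the open-coded kvm_for_each_vcpu() /
 * force_vm_exit() loop quoted earlier.  make_all_cpus_request() sets
 * the request bit on every vcpu and IPIs the ones currently in guest
 * mode.
 */
void kvm_arm_pause_vcpus(struct kvm *kvm)
{
        make_all_cpus_request(kvm, KVM_REQ_PAUSE);
}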

> >
> > And I am not sure KVM_REQ_UNHALT is so useless to you in the first
> > place. kvm_vcpu_block() can return even when the vcpu is not runnable
> > (if a signal is pending). KVM_REQ_UNHALT is the way to check for that.
> > Hmm, this actually looks like a BUG in the current code.
> >
> There's no guarantee that you won't be woken up from a WFI instruction
> by spurious interrupts on ARM, so we don't care about this; we simply
> return to the guest, and it must go back to sleep if that's what it
> wants to do.
> 
If the guest can handle it then we can ignore it (at least for now),
but it is still strange that a signal unhalts the vcpu.
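
For reference, the pattern KVM_REQ_UNHALT enables looks roughly like the
sketch below (not code from this series; the function name is made up):

#include <linux/kvm_host.h>

/*
 * Sketch: kvm_vcpu_block() sets KVM_REQ_UNHALT only when the vcpu
 * became runnable, so the caller can tell a real unhalt apart from a
 * wakeup caused by something else, such as a pending signal.
 */
static void block_vcpu_example(struct kvm_vcpu *vcpu)
{
        kvm_vcpu_block(vcpu);
        if (!kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
                /* Woken without becoming runnable, e.g. a pending signal. */
                return;
        }
        /* The vcpu is genuinely runnable again. */
}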

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 07/14] KVM: ARM: World-switch implementation
  2013-01-16 16:17               ` Christoffer Dall
@ 2013-01-16 16:21                 ` Gleb Natapov
  -1 siblings, 0 replies; 160+ messages in thread
From: Gleb Natapov @ 2013-01-16 16:21 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-arm-kernel, kvmarm, Marc Zyngier, Antonios Motakis,
	Marcelo Tosatti, Rusty Russell, nicolas

On Wed, Jan 16, 2013 at 11:17:06AM -0500, Christoffer Dall wrote:
> On Wed, Jan 16, 2013 at 10:52 AM, Gleb Natapov <gleb@redhat.com> wrote:
> > On Wed, Jan 16, 2013 at 10:42:02AM -0500, Christoffer Dall wrote:
> >> [...]
> >>
> >> >
> >> >> read side RCU protects against is the memslots data structure as far
> >> >> as I can see, so the second patch pasted below fixes this for the code
> >> >> that actually accesses this data structure.
> >> > Many memory-related functions that you call access memslots under the
> >> > hood and assume that locking is done by the caller. From a quick look
> >> > I found these that you've missed:
> >> > kvm_is_visible_gfn()
> >> > kvm_read_guest()
> >> > gfn_to_hva()
> >> > gfn_to_pfn_prot()
> >> > kvm_memslots()
> >> >
> >> > Maybe there are more. Can you enable RCU debugging in your kernel
> >> > config and check? This does not guarantee that it will catch all of
> >> > the places, but it's better than nothing.
> >> >
> >>
> >> yeah, I missed the call to is_visible_gfn and friends, this fixes it:
> >>
> > Thank you. One more kvm_read_guest() in emulate.c.
> >
> 
> this one is going out for now (see the i/o discussion).
> 
I thought there wasn't a resolution yet. I guess I missed something. If
kvm_read_guest() is removed from the emulator then the patch looks good
to me.

--
			Gleb.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support
  2013-01-16  9:44           ` Russell King - ARM Linux
@ 2013-01-17  2:11             ` Rusty Russell
  -1 siblings, 0 replies; 160+ messages in thread
From: Rusty Russell @ 2013-01-17  2:11 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Christoffer Dall, kvm, Marc Zyngier, Marcelo Tosatti,
	Rusty Russell, kvmarm, linux-arm-kernel

Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
> On Wed, Jan 16, 2013 at 01:26:01PM +1030, Rusty Russell wrote:
>> Christoffer Dall <c.dall@virtualopensystems.com> writes:
>> 
>> > On Mon, Jan 14, 2013 at 11:24 AM, Russell King - ARM Linux
>> > <linux@arm.linux.org.uk> wrote:
>> >> On Tue, Jan 08, 2013 at 01:38:55PM -0500, Christoffer Dall wrote:
>> >>> +     /* -ENOENT for unknown features, -EINVAL for invalid combinations. */
>> >>> +     for (i = 0; i < sizeof(init->features)*8; i++) {
>> >>> +             if (init->features[i / 32] & (1 << (i % 32))) {
>> >>
>> >> Isn't this an open-coded version of test_bit() ?
>> >
>> > indeed, nicely spotted:
>> 
>> BTW, I wrote it that way out of excessive paranoia: it's a userspace
>> API, and test_bit() won't be right on 64-bit BE systems.
>
> So why is this a concern for 32-bit systems (which are, by definition,
> only in arch/arm) ?

Because this feature bitmap system is the same code which other archs
*should* be using for specifying kvm cpus :)

Cheers,
Rusty.



^ permalink raw reply	[flat|nested] 160+ messages in thread

end of thread, other threads:[~2013-01-17  2:38 UTC | newest]

Thread overview: 160+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-08 18:38 [PATCH v5 00/14] KVM/ARM Implementation Christoffer Dall
2013-01-08 18:38 ` Christoffer Dall
2013-01-08 18:38 ` [PATCH v5 01/14] ARM: Add page table and page defines needed by KVM Christoffer Dall
2013-01-08 18:38   ` Christoffer Dall
2013-01-08 18:38 ` [PATCH v5 02/14] ARM: Section based HYP idmap Christoffer Dall
2013-01-08 18:38   ` Christoffer Dall
2013-01-14 10:27   ` Gleb Natapov
2013-01-14 10:27     ` Gleb Natapov
2013-01-14 10:49     ` Will Deacon
2013-01-14 10:49       ` Will Deacon
2013-01-14 11:07       ` Gleb Natapov
2013-01-14 11:07         ` Gleb Natapov
2013-01-14 13:07         ` Russell King - ARM Linux
2013-01-14 13:07           ` Russell King - ARM Linux
2013-01-14 16:13   ` Russell King - ARM Linux
2013-01-14 16:13     ` Russell King - ARM Linux
2013-01-14 17:09     ` Christoffer Dall
2013-01-14 17:09       ` Christoffer Dall
2013-01-08 18:38 ` [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support Christoffer Dall
2013-01-08 18:38   ` Christoffer Dall
2013-01-14 15:09   ` Will Deacon
2013-01-14 15:09     ` Will Deacon
2013-01-14 15:40     ` Christoffer Dall
2013-01-14 15:40       ` Christoffer Dall
2013-01-14 16:24   ` Russell King - ARM Linux
2013-01-14 16:24     ` Russell King - ARM Linux
2013-01-14 17:33     ` Christoffer Dall
2013-01-14 17:33       ` Christoffer Dall
2013-01-16  2:56       ` Rusty Russell
2013-01-16  2:56         ` Rusty Russell
2013-01-16  9:44         ` Russell King - ARM Linux
2013-01-16  9:44           ` Russell King - ARM Linux
2013-01-17  2:11           ` Rusty Russell
2013-01-17  2:11             ` Rusty Russell
2013-01-14 18:49   ` Gleb Natapov
2013-01-14 18:49     ` Gleb Natapov
2013-01-14 22:17     ` Christoffer Dall
2013-01-14 22:17       ` Christoffer Dall
2013-01-15 13:32       ` Gleb Natapov
2013-01-15 13:32         ` Gleb Natapov
2013-01-15 13:43         ` [kvmarm] " Alexander Graf
2013-01-15 13:43           ` Alexander Graf
2013-01-15 15:35           ` Gleb Natapov
2013-01-15 15:35             ` Gleb Natapov
2013-01-15 16:21             ` Alexander Graf
2013-01-15 16:21               ` Alexander Graf
2013-01-08 18:39 ` [PATCH v5 04/14] KVM: ARM: Hypervisor initialization Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-14 15:11   ` Will Deacon
2013-01-14 15:11     ` Will Deacon
2013-01-14 16:35     ` Christoffer Dall
2013-01-14 16:35       ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 05/14] KVM: ARM: Memory virtualization setup Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-15  9:56   ` Gleb Natapov
2013-01-15  9:56     ` Gleb Natapov
2013-01-15 12:15     ` [kvmarm] " Peter Maydell
2013-01-15 12:15       ` Peter Maydell
2013-01-15 12:52       ` Gleb Natapov
2013-01-15 12:52         ` Gleb Natapov
2013-01-15 14:04         ` Peter Maydell
2013-01-15 14:04           ` Peter Maydell
2013-01-15 14:40           ` Christoffer Dall
2013-01-15 14:40             ` Christoffer Dall
2013-01-15 15:17           ` Gleb Natapov
2013-01-15 15:17             ` Gleb Natapov
2013-01-15 16:25             ` Alexander Graf
2013-01-15 16:25               ` Alexander Graf
2013-01-16 10:40               ` Gleb Natapov
2013-01-16 10:40                 ` Gleb Natapov
2013-01-08 18:39 ` [PATCH v5 07/14] KVM: ARM: World-switch implementation Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-15  9:43   ` Gleb Natapov
2013-01-15  9:43     ` Gleb Natapov
2013-01-16  2:08     ` Christoffer Dall
2013-01-16  2:08       ` Christoffer Dall
2013-01-16  4:08       ` Christoffer Dall
2013-01-16  4:08         ` Christoffer Dall
2013-01-16 12:57         ` Gleb Natapov
2013-01-16 12:57           ` Gleb Natapov
2013-01-16 15:40           ` Christoffer Dall
2013-01-16 15:40             ` Christoffer Dall
2013-01-16 16:17             ` Gleb Natapov
2013-01-16 16:17               ` Gleb Natapov
2013-01-16 12:12       ` Gleb Natapov
2013-01-16 12:12         ` Gleb Natapov
2013-01-16 13:14         ` Russell King - ARM Linux
2013-01-16 13:14           ` Russell King - ARM Linux
2013-01-16 15:42         ` Christoffer Dall
2013-01-16 15:42           ` Christoffer Dall
2013-01-16 15:52           ` Gleb Natapov
2013-01-16 15:52             ` Gleb Natapov
2013-01-16 16:17             ` Christoffer Dall
2013-01-16 16:17               ` Christoffer Dall
2013-01-16 16:21               ` Gleb Natapov
2013-01-16 16:21                 ` Gleb Natapov
2013-01-08 18:39 ` [PATCH v5 08/14] KVM: ARM: Emulation framework and CP15 emulation Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-14 16:36   ` Russell King - ARM Linux
2013-01-14 16:36     ` Russell King - ARM Linux
2013-01-14 17:38     ` Christoffer Dall
2013-01-14 17:38       ` Christoffer Dall
2013-01-14 18:33       ` Russell King - ARM Linux
2013-01-14 18:33         ` Russell King - ARM Linux
2013-01-08 18:39 ` [PATCH v5 09/14] KVM: ARM: User space API for getting/setting co-proc registers Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 10/14] KVM: ARM: Demux CCSIDR in the userspace API Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 11/14] KVM: ARM: VFP userspace interface Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 12/14] KVM: ARM: Handle guest faults in KVM Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:40 ` [PATCH v5 13/14] KVM: ARM: Handle I/O aborts Christoffer Dall
2013-01-08 18:40   ` Christoffer Dall
2013-01-14 16:43   ` Russell King - ARM Linux
2013-01-14 16:43     ` Russell King - ARM Linux
2013-01-14 18:25     ` Christoffer Dall
2013-01-14 18:25       ` Christoffer Dall
2013-01-14 18:43       ` Russell King - ARM Linux
2013-01-14 18:43         ` Russell King - ARM Linux
2013-01-14 18:50         ` Will Deacon
2013-01-14 18:50           ` Will Deacon
2013-01-14 18:53           ` [kvmarm] " Alexander Graf
2013-01-14 18:53             ` Alexander Graf
2013-01-14 18:56             ` Christoffer Dall
2013-01-14 18:56               ` Christoffer Dall
2013-01-14 19:00             ` Will Deacon
2013-01-14 19:00               ` Will Deacon
2013-01-14 19:12               ` Christoffer Dall
2013-01-14 19:12                 ` Christoffer Dall
2013-01-14 22:36                 ` Will Deacon
2013-01-14 22:36                   ` Will Deacon
2013-01-14 22:51                   ` Christoffer Dall
2013-01-14 22:51                     ` Christoffer Dall
2013-01-15  7:00                   ` Gleb Natapov
2013-01-15  7:00                     ` Gleb Natapov
2013-01-15 13:18   ` Gleb Natapov
2013-01-15 13:18     ` Gleb Natapov
2013-01-15 13:29     ` Marc Zyngier
2013-01-15 13:29       ` Marc Zyngier
2013-01-15 13:34       ` Gleb Natapov
2013-01-15 13:34         ` Gleb Natapov
2013-01-15 13:46         ` Marc Zyngier
2013-01-15 13:46           ` Marc Zyngier
2013-01-15 14:27           ` Gleb Natapov
2013-01-15 14:27             ` Gleb Natapov
2013-01-15 14:42             ` Christoffer Dall
2013-01-15 14:42               ` Christoffer Dall
2013-01-15 14:48             ` Marc Zyngier
2013-01-15 14:48               ` Marc Zyngier
2013-01-15 15:31               ` Gleb Natapov
2013-01-15 15:31                 ` Gleb Natapov
2013-01-08 18:40 ` [PATCH v5 14/14] KVM: ARM: Add maintainer entry for KVM/ARM Christoffer Dall
2013-01-08 18:40   ` Christoffer Dall
2013-01-14 16:00 ` [PATCH v5 00/14] KVM/ARM Implementation Will Deacon
2013-01-14 16:00   ` Will Deacon
2013-01-14 22:31   ` Christoffer Dall
2013-01-14 22:31     ` Christoffer Dall
