* [RFC PATCH v5 00/38] KVM: arm64: Add Statistical Profiling Extension (SPE) support
@ 2021-11-17 15:38 ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

The series is based on v5.16-rc1 and can be found on gitlab at [1]. kvmtool
support is needed to create a VM with SPE enabled; a branch with the
necessary changes can be found at [2].

v4 of the patches can be found at [3]. v3 can be found at [4], and the
original series at [5].

Introduction
============

Statistical Profiling Extension (SPE) is an optional feature added in
ARMv8.2. It allows the operations executed by the PE to be sampled at
regular intervals, with a record of each sampled operation stored in a
memory buffer. A high level overview of the extension is presented in an
article on arm.com [4].

More information about how I implemented the SPE emulation and why can be
found in the cover letter for version 3 of the series [4].

Changes from v4
===============

Mostly fixes, but some small features were added too:

- Implemented review comments, many thanks!

- Reworked heterogeneous support, because probing was broken on systems
  which had more than one SPE instance.

- Allow locking a memslot after the VCPU has run to make it possible for
  the VMM to cancel migration, as migration requires unlocking the memslots
  before it is initiated.

- Unmap the memory from stage 2 if a memslot is unlocked before any of the
  VCPUs have run. This is so KVM can perform the needed dcache maintenance
  operations in the stage 2 abort handler (details in patch #7 "KVM: arm64:
  Unmap unlocked memslot from stage 2 if kvm_mmu_has_pending_ops()").

- Dropped the KVM_ARM_SUPPORTED_CPUS ioctl in favor of KVM setting the
  cpumask of allowed CPUs implicitly when the SPE VCPU feature is set.

Missing features
================

Although it might not look like it, I've tried to keep the series as small
as possible to make it easier to review, while implementing the core
functionality needed for the SPE emulation. As such, I've chosen not to
implement several features:

- Host profiling a guest which has the SPE feature bit set.

- No errata workarounds impacting SPE have been implemented yet.

- Disabling CONFIG_NUMA_BALANCING is a hack to get KVM SPE to work, and I am
  investigating other ways to get around automatic NUMA balancing, like
  requiring userspace to disable it via set_mempolicy(). I am also going to
  look at how VFIO gets around it. Suggestions welcome.

- There's plenty of room for optimization. Off the top of my head: using
  block mappings at stage 2, batching the pinning of pages (similar to what
  VFIO does), optimizing the way KVM keeps track of pinned pages (using a
  linked list triples the memory usage), context-switching the SPE registers
  on vcpu_load/vcpu_put on VHE if the host is not profiling, improving the
  locking scheme (especially when a memslot is locked), etc.

- ...and others. I'm sure I'm missing at least a few things which are
  important for someone.

Known issues
============

This is an RFC, so keep in mind that there will almost certainly be scary
bugs. For example, below is a list of known issues which don't affect the
correctness of the emulation, and which I'm planning to fix in a future
iteration:

- With CONFIG_PROVE_LOCKING=y, lockdep complains about lock contention when
  the VCPU executes the dcache clean pending ops.

- With CONFIG_PROVE_LOCKING=y, KVM will hit a BUG at
  kvm_lock_all_vcpus()->mutex_trylock(&vcpu->mutex) with 47 or more VCPUs.

This BUG is benign and can also be triggered with mainline; it can be made
to go away by increasing MAX_LOCK_DEPTH. I've hacked kvmtool to reproduce
the splat with a mainline kernel, which can be found at [6]
(instructions in the commit message).
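
For reference, the limit being hit is the size of lockdep's per-task
held-lock array, defined in mainline include/linux/sched.h as:

    #define MAX_LOCK_DEPTH	48UL

so holding kvm->lock plus one vcpu->mutex per VCPU presumably exhausts the
array once the VCPU count approaches that limit.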

Open questions
==============

1. Userspace is not allowed to profile a CPU event (not bound to a task) if
!perf_allow_cpu(). It is my understanding that this is done for security
reasons, as we don't want a task to profile another task. Because a VM
will only be able to profile itself, I don't think it's necessary to
restrict the VM in any way based on perf_allow_cpu(), like we do with
perfmon_capable() and physical timer timestamps.

2. How to handle guest triggered SPE errors. By error I mean all syndromes
reported by the PMBSR_EL1 register other than buffer full and stage 1
fault (because those can and should be handled by the guest). The SPE
driver disables profiling when it encounters a buffer syndrome other than
buffer full. I see several options here:

a. KVM can do the same thing as the SPE driver and disable SPE emulation
for that guest.

b. KVM returns an error from KVM_RUN.

c. KVM allows the guest direct access to the buffer registers (they aren't
trapped anymore), but, because the guest can trigger a maintenance
interrupt with a write to PMBSR_EL1, KVM will ignore all syndromes,
including SError or stage 2 fault.

3. Related to 2, SPE can report an SError. The SPE driver doesn't treat
this separately from the other syndromes. Should KVM treat it like any
other syndrome? Should KVM do more?

At the moment, KVM injects an external abort when it encounters an error
syndrome, and for a Stage 2 fault, prints a warning. The warning has proved
very useful for testing and debugging.

Testing
=======

Testing was done on an Altra server with two sockets, both populated. The
patches are based on v5.16-rc1.

For testing, I've used a version of kvmtool with SPE support [2]. For the
SPE_STOP API, I used a special version of kvmtool which starts the guest in
one of the stopped states; that can be found at [7] (compile from a
different commit if a different state and/or transition is desired).

Finally, in the VM I used a defconfig Linux guest compiled from v5.15-rc5
and kvm-unit-tests patches which I wrote to test SPE [8].

All tests were run twice: once with VHE enabled, once in nVHE mode
(kvm-arm.mode=nvhe).

The first test that I ran was the kvm-unit-tests SPE test. This is also the
test that I used to check that KVM_ARM_VCPU_SPE_STOP_{TRAP,EXIT,RESUME}
work correctly with kvmtool.

Then I profiled iperf3 in the guest (32 VCPUs to limit the size of
perf.data, 32GiB memory), while concurrently profiling in the host. This is
the command that I used:

# perf record -ae arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ -- iperf3 -c 127.0.0.1 -t 30

Everything looked right to me and I didn't see any kernel warnings or bugs.

[1] https://gitlab.arm.com/linux-arm/linux-ae/-/tree/kvm-spe-v5
[2] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/kvm-spe-v5
[3] https://www.spinics.net/lists/arm-kernel/msg917220.html
[4] https://lore.kernel.org/linux-arm-kernel/20201027172705.15181-1-alexandru.elisei@arm.com/
[5] https://www.spinics.net/lists/arm-kernel/msg776228.html
[6] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/vgic-lock-all-vcpus-lockdep-bug-v1
[7] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/kvm-spe-v5-stop-tests
[8] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/kvm-spe-v4

Alexandru Elisei (35):
  KVM: arm64: Make lock_all_vcpus() available to the rest of KVM
  KVM: arm64: Add lock/unlock memslot user API
  KVM: arm64: Implement the memslot lock/unlock functionality
  KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
  KVM: arm64: Perform CMOs on locked memslots when userspace resets
    VCPUs
  KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs
  KVM: arm64: Unmap unlocked memslot from stage 2 if
    kvm_mmu_has_pending_ops()
  KVM: arm64: Unlock memslots after stage 2 tables are freed
  KVM: arm64: Deny changes to locked memslots
  KVM: Add kvm_warn{,_ratelimited} macros
  KVM: arm64: Print a warning for unexpected faults on locked memslots
  KVM: arm64: Allow userspace to lock and unlock memslots
  KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option
  KVM: arm64: Add SPE capability and VCPU feature
  perf: arm_spe_pmu: Move struct arm_spe_pmu to a separate header file
  KVM: arm64: Allow SPE emulation when the SPE hardware is present
  KVM: arm64: Allow userspace to set the SPE feature only if SPE is
    present
  KVM: arm64: Expose SPE version to guests
  KVM: arm64: Do not run a VCPU on a CPU without SPE
  KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE
  KVM: arm64: Move accesses to MDCR_EL2 out of
    __{activate,deactivate}_traps_common
  KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE
  KVM: arm64: Add SPE system registers to VCPU context
  KVM: arm64: nVHE: Save PMSCR_EL1 to the host context
  KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags
  KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE
  KVM: arm64: VHE: Context switch SPE state if VCPU has SPE
  KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load
  KVM: arm64: Allow guest to use physical timestamps if
    perfmon_capable()
  KVM: arm64: Emulate SPE buffer management interrupt
  KVM: arm64: Add an userspace API to stop a VCPU profiling
  KVM: arm64: Implement userspace API to stop a VCPU profiling
  KVM: arm64: Add PMSIDR_EL1 to the SPE register context
  KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING
  KVM: arm64: Allow userspace to enable SPE for guests

Sudeep Holla (3):
  KVM: arm64: Add a new VCPU device control group for SPE
  KVM: arm64: Add SPE VCPU device attribute to set the interrupt number
  KVM: arm64: Add SPE VCPU device attribute to initialize SPE

 Documentation/virt/kvm/api.rst          |  69 ++++
 Documentation/virt/kvm/devices/vcpu.rst |  76 ++++
 arch/arm64/include/asm/kvm_arm.h        |   1 +
 arch/arm64/include/asm/kvm_host.h       |  75 +++-
 arch/arm64/include/asm/kvm_hyp.h        |  49 ++-
 arch/arm64/include/asm/kvm_mmu.h        |   8 +
 arch/arm64/include/asm/kvm_spe.h        | 103 ++++++
 arch/arm64/include/asm/sysreg.h         |   3 +
 arch/arm64/include/uapi/asm/kvm.h       |  11 +
 arch/arm64/kvm/Kconfig                  |   8 +
 arch/arm64/kvm/Makefile                 |   1 +
 arch/arm64/kvm/arm.c                    | 120 ++++++-
 arch/arm64/kvm/debug.c                  |  55 ++-
 arch/arm64/kvm/guest.c                  |  10 +
 arch/arm64/kvm/hyp/include/hyp/spe-sr.h |  32 ++
 arch/arm64/kvm/hyp/include/hyp/switch.h |  16 +-
 arch/arm64/kvm/hyp/nvhe/Makefile        |   1 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c      |  24 +-
 arch/arm64/kvm/hyp/nvhe/spe-sr.c        | 133 +++++++
 arch/arm64/kvm/hyp/nvhe/switch.c        |  35 +-
 arch/arm64/kvm/hyp/vhe/Makefile         |   1 +
 arch/arm64/kvm/hyp/vhe/spe-sr.c         | 193 ++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c         |  21 ++
 arch/arm64/kvm/mmu.c                    | 444 +++++++++++++++++++++++-
 arch/arm64/kvm/reset.c                  |   8 +
 arch/arm64/kvm/spe.c                    | 383 ++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c               |  77 +++-
 arch/arm64/kvm/vgic/vgic-init.c         |   4 +-
 arch/arm64/kvm/vgic/vgic-its.c          |   8 +-
 arch/arm64/kvm/vgic/vgic-kvm-device.c   |  50 +--
 arch/arm64/kvm/vgic/vgic.h              |   3 -
 drivers/perf/arm_spe_pmu.c              |  31 +-
 include/linux/kvm_host.h                |   4 +
 include/linux/perf/arm_spe_pmu.h        |  55 +++
 include/uapi/linux/kvm.h                |   9 +
 35 files changed, 1968 insertions(+), 153 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_spe.h
 create mode 100644 arch/arm64/kvm/hyp/include/hyp/spe-sr.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/spe-sr.c
 create mode 100644 arch/arm64/kvm/hyp/vhe/spe-sr.c
 create mode 100644 arch/arm64/kvm/spe.c
 create mode 100644 include/linux/perf/arm_spe_pmu.h

-- 
2.33.1



* [RFC PATCH v5 01/38] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

The VGIC code uses the lock_all_vcpus() function to make sure no VCPUs are
run while it fiddles with the global VGIC state. Move the declarations of
lock_all_vcpus() and the corresponding unlock function into asm/kvm_host.h,
where they can be reused by other parts of KVM/arm64, and rename the
functions to kvm_{lock,unlock}_all_vcpus() to make them more generic.

Because the scope of the code potentially using the functions has
increased, add a lockdep check that the kvm->lock is held by the caller.
Holding the lock is necessary because otherwise userspace would be able to
create new VCPUs and run them while the existing VCPUs are locked.
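
A minimal sketch of the intended calling pattern (illustrative only, not
part of this patch; the function and the state it modifies are made up):

	static int modify_vm_global_state(struct kvm *kvm)
	{
		int ret = 0;

		mutex_lock(&kvm->lock);

		/* Fails if any VCPU is in KVM_RUN and holds its vcpu->mutex. */
		if (!kvm_lock_all_vcpus(kvm)) {
			mutex_unlock(&kvm->lock);
			return -EBUSY;
		}

		/* ... safely modify VM-wide state here ... */

		kvm_unlock_all_vcpus(kvm);
		mutex_unlock(&kvm->lock);

		return ret;
	}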

No functional change intended.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h     |  3 ++
 arch/arm64/kvm/arm.c                  | 41 ++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic-init.c       |  4 +--
 arch/arm64/kvm/vgic/vgic-its.c        |  8 ++---
 arch/arm64/kvm/vgic/vgic-kvm-device.c | 50 ++++-----------------------
 arch/arm64/kvm/vgic/vgic.h            |  3 --
 6 files changed, 56 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2a5f7f38006f..733621e41900 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -606,6 +606,9 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
 
+bool kvm_lock_all_vcpus(struct kvm *kvm);
+void kvm_unlock_all_vcpus(struct kvm *kvm);
+
 #ifndef __KVM_NVHE_HYPERVISOR__
 #define kvm_call_hyp_nvhe(f, ...)						\
 	({								\
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2f03cbfefe67..e9b4ad7b5c82 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -651,6 +651,47 @@ void kvm_arm_resume_guest(struct kvm *kvm)
 	}
 }
 
+/* unlocks vcpus from @vcpu_lock_idx and smaller */
+static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
+{
+	struct kvm_vcpu *tmp_vcpu;
+
+	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
+		tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
+		mutex_unlock(&tmp_vcpu->mutex);
+	}
+}
+
+void kvm_unlock_all_vcpus(struct kvm *kvm)
+{
+	lockdep_assert_held(&kvm->lock);
+	unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
+}
+
+/* Returns true if all vcpus were locked, false otherwise */
+bool kvm_lock_all_vcpus(struct kvm *kvm)
+{
+	struct kvm_vcpu *tmp_vcpu;
+	int c;
+
+	lockdep_assert_held(&kvm->lock);
+
+	/*
+	 * Any time a vcpu is run, vcpu_load is called which tries to grab the
+	 * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure that
+	 * no other VCPUs are run and it is safe to fiddle with KVM global
+	 * state.
+	 */
+	kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
+		if (!mutex_trylock(&tmp_vcpu->mutex)) {
+			unlock_vcpus(kvm, c - 1);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
 {
 	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 0a06d0648970..cd045c7abde8 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -87,7 +87,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 		return -ENODEV;
 
 	ret = -EBUSY;
-	if (!lock_all_vcpus(kvm))
+	if (!kvm_lock_all_vcpus(kvm))
 		return ret;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
@@ -117,7 +117,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 		INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
 
 out_unlock:
-	unlock_all_vcpus(kvm);
+	kvm_unlock_all_vcpus(kvm);
 	return ret;
 }
 
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 089fc2ffcb43..bc4197e87d95 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -2005,7 +2005,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
 		goto out;
 	}
 
-	if (!lock_all_vcpus(dev->kvm)) {
+	if (!kvm_lock_all_vcpus(dev->kvm)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -2023,7 +2023,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
 	} else {
 		*reg = region->its_read(dev->kvm, its, addr, len);
 	}
-	unlock_all_vcpus(dev->kvm);
+	kvm_unlock_all_vcpus(dev->kvm);
 out:
 	mutex_unlock(&dev->kvm->lock);
 	return ret;
@@ -2668,7 +2668,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 	mutex_lock(&kvm->lock);
 	mutex_lock(&its->its_lock);
 
-	if (!lock_all_vcpus(kvm)) {
+	if (!kvm_lock_all_vcpus(kvm)) {
 		mutex_unlock(&its->its_lock);
 		mutex_unlock(&kvm->lock);
 		return -EBUSY;
@@ -2686,7 +2686,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 		break;
 	}
 
-	unlock_all_vcpus(kvm);
+	kvm_unlock_all_vcpus(kvm);
 	mutex_unlock(&its->its_lock);
 	mutex_unlock(&kvm->lock);
 	return ret;
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 0d000d2fe8d2..c5de904643cc 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -305,44 +305,6 @@ int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
 	return 0;
 }
 
-/* unlocks vcpus from @vcpu_lock_idx and smaller */
-static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
-{
-	struct kvm_vcpu *tmp_vcpu;
-
-	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
-		tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
-		mutex_unlock(&tmp_vcpu->mutex);
-	}
-}
-
-void unlock_all_vcpus(struct kvm *kvm)
-{
-	unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
-}
-
-/* Returns true if all vcpus were locked, false otherwise */
-bool lock_all_vcpus(struct kvm *kvm)
-{
-	struct kvm_vcpu *tmp_vcpu;
-	int c;
-
-	/*
-	 * Any time a vcpu is run, vcpu_load is called which tries to grab the
-	 * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure
-	 * that no other VCPUs are run and fiddle with the vgic state while we
-	 * access it.
-	 */
-	kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
-		if (!mutex_trylock(&tmp_vcpu->mutex)) {
-			unlock_vcpus(kvm, c - 1);
-			return false;
-		}
-	}
-
-	return true;
-}
-
 /**
  * vgic_v2_attr_regs_access - allows user space to access VGIC v2 state
  *
@@ -373,7 +335,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
 	if (ret)
 		goto out;
 
-	if (!lock_all_vcpus(dev->kvm)) {
+	if (!kvm_lock_all_vcpus(dev->kvm)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -390,7 +352,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
 		break;
 	}
 
-	unlock_all_vcpus(dev->kvm);
+	kvm_unlock_all_vcpus(dev->kvm);
 out:
 	mutex_unlock(&dev->kvm->lock);
 	return ret;
@@ -539,7 +501,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 		goto out;
 	}
 
-	if (!lock_all_vcpus(dev->kvm)) {
+	if (!kvm_lock_all_vcpus(dev->kvm)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -589,7 +551,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 		break;
 	}
 
-	unlock_all_vcpus(dev->kvm);
+	kvm_unlock_all_vcpus(dev->kvm);
 out:
 	mutex_unlock(&dev->kvm->lock);
 	return ret;
@@ -644,12 +606,12 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
 		case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
 			mutex_lock(&dev->kvm->lock);
 
-			if (!lock_all_vcpus(dev->kvm)) {
+			if (!kvm_lock_all_vcpus(dev->kvm)) {
 				mutex_unlock(&dev->kvm->lock);
 				return -EBUSY;
 			}
 			ret = vgic_v3_save_pending_tables(dev->kvm);
-			unlock_all_vcpus(dev->kvm);
+			kvm_unlock_all_vcpus(dev->kvm);
 			mutex_unlock(&dev->kvm->lock);
 			return ret;
 		}
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 3fd6c86a7ef3..e69c839a6941 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -255,9 +255,6 @@ int vgic_init(struct kvm *kvm);
 void vgic_debug_init(struct kvm *kvm);
 void vgic_debug_destroy(struct kvm *kvm);
 
-bool lock_all_vcpus(struct kvm *kvm);
-void unlock_all_vcpus(struct kvm *kvm);
-
 static inline int vgic_v3_max_apr_idx(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *cpu_if = &vcpu->arch.vgic_cpu;
-- 
2.33.1



* [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Stage 2 faults triggered by the profiling buffer attempting to write to
memory are reported by the SPE hardware by asserting a buffer management
event interrupt. Interrupts are by their nature asynchronous, which means
that the guest might have changed its stage 1 translation tables since the
attempted write. SPE reports the guest virtual address that caused the data
abort, not the IPA, which means that KVM would have to walk the guest's
stage 1 tables to find the IPA. Using the AT instruction to walk the
guest's tables in hardware is not an option because it doesn't report the
IPA in the case of a stage 2 fault on a stage 1 table walk.

Avoid both issues by pre-mapping the guest memory at stage 2. This is being
done by adding a capability that allows the user to pin the memory backing
a memslot. The same capability can be used to unlock a memslot, which
unpins the pages associated with the memslot, but doesn't unmap the IPA
range from stage 2; in this case, the addresses will be unmapped from stage
2 via the MMU notifiers when the process' address space changes.

For now, the capability doesn't actually do anything other than checking
that the usage is correct; the memory operations will be added in future
patches.
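
As a rough sketch of how a VMM could enable the capability from userspace
(illustrative only; vm_fd and slot are assumed to come from the VMM's
existing VM setup):

	struct kvm_enable_cap cap = {
		.cap	 = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
		.flags	 = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK,
		.args[0] = slot,	/* memslot to lock */
		.args[1] = KVM_ARM_LOCK_MEM_READ | KVM_ARM_LOCK_MEM_WRITE,
	};

	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		perror("KVM_CAP_ARM_LOCK_USER_MEMORY_REGION");

Unlocking uses KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK as the flags
value, with args[1] either 0 or KVM_ARM_UNLOCK_MEM_ALL.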

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/api.rst   | 57 ++++++++++++++++++++++++++
 arch/arm64/include/asm/kvm_mmu.h |  3 ++
 arch/arm64/kvm/arm.c             | 42 ++++++++++++++++++--
 arch/arm64/kvm/mmu.c             | 68 ++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h         |  8 ++++
 5 files changed, 174 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index aeeb071c7688..16aa59eae3d9 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6925,6 +6925,63 @@ indicated by the fd to the VM this is called on.
 This is intended to support intra-host migration of VMs between userspace VMMs,
 upgrading the VMM process without interrupting the guest.
 
+7.30 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
+----------------------------------------
+
+:Architectures: arm64
+:Target: VM
+:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
+                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
+             args[0] is the slot number
+             args[1] specifies the permissions when the memslot is locked or if
+                     all memslots should be unlocked
+
+The presence of this capability indicates that KVM supports locking the memory
+associated with the memslot, and unlocking a previously locked memslot.
+
+The 'flags' parameter is defined as follows:
+
+7.30.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
+-------------------------------------------------
+
+:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
+:Architectures: arm64
+:Target: VM
+:Parameters: args[0] contains the memory slot number
+             args[1] contains the permissions for the locked memory:
+                     KVM_ARM_LOCK_MEM_READ (mandatory) to map it with
+                     read permissions and KVM_ARM_LOCK_MEM_WRITE
+                     (optional) with write permissions
+:Returns: 0 on success; negative error code on failure
+
+Enabling this capability causes the memory described by the memslot to be
+pinned in the process address space and the corresponding stage 2 IPA range
+mapped at stage 2. The permissions specified in args[1] apply to both
+mappings. The memory pinned with this capability counts towards the max
+locked memory limit for the current process.
+
+The capability should be enabled when no VCPUs are in the kernel executing an
+ioctl (and in particular, KVM_RUN); otherwise the ioctl will block until all
+VCPUs have returned. The virtual memory range described by the memslot must be
+mapped in the userspace process without any gaps. It is considered an error if
+write permissions are specified for a memslot which logs dirty pages.
+
+7.30.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
+---------------------------------------------------
+
+:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
+:Architectures: arm64
+:Target: VM
+:Parameters: args[0] contains the memory slot number
+             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
+                     which unlocks all previously locked memslots.
+:Returns: 0 on success; negative error code on failure
+
+Enabling this capability causes the memory pinned when locking the memslot
+specified in args[0] to be unpinned, or, optionally, all memslots to be
+unlocked. The IPA range is not unmapped from stage 2.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 02d378887743..2c50734f048d 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
+int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
+int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
+
 static inline unsigned int kvm_get_vmid_bits(void)
 {
 	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9b4ad7b5c82..d49905d18cee 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -78,16 +78,43 @@ int kvm_arch_check_processor_compat(void *opaque)
 	return 0;
 }
 
+static int kvm_arm_lock_memslot_supported(void)
+{
+	return 0;
+}
+
+static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
+					     struct kvm_enable_cap *cap)
+{
+	u64 slot, action_flags;
+	u32 action;
+
+	if (cap->args[2] || cap->args[3])
+		return -EINVAL;
+
+	slot = cap->args[0];
+	action = cap->flags;
+	action_flags = cap->args[1];
+
+	switch (action) {
+	case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
+		return kvm_mmu_lock_memslot(kvm, slot, action_flags);
+	case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
+		return kvm_mmu_unlock_memslot(kvm, slot, action_flags);
+	default:
+		return -EINVAL;
+	}
+}
+
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
 	int r;
 
-	if (cap->flags)
-		return -EINVAL;
-
 	switch (cap->cap) {
 	case KVM_CAP_ARM_NISV_TO_USER:
+		if (cap->flags)
+			return -EINVAL;
 		r = 0;
 		kvm->arch.return_nisv_io_abort_to_user = true;
 		break;
@@ -101,6 +128,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
+		if (!kvm_arm_lock_memslot_supported())
+			return -EINVAL;
+		r = kvm_lock_user_memory_region_ioctl(kvm, cap);
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -168,7 +200,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
-
 /**
  * kvm_arch_destroy_vm - destroy the VM data structure
  * @kvm:	pointer to the KVM struct
@@ -276,6 +307,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_PTRAUTH_GENERIC:
 		r = system_has_full_ptr_auth();
 		break;
+	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
+		r = kvm_arm_lock_memslot_supported();
+		break;
 	default:
 		r = 0;
 	}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 326cdfec74a1..f65bcbc9ae69 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1296,6 +1296,74 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
+{
+	struct kvm_memory_slot *memslot;
+	int ret;
+
+	if (slot >= KVM_MEM_SLOTS_NUM)
+		return -EINVAL;
+
+	if (!(flags & KVM_ARM_LOCK_MEM_READ))
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+	if (!kvm_lock_all_vcpus(kvm)) {
+		ret = -EBUSY;
+		goto out_unlock_kvm;
+	}
+	mutex_lock(&kvm->slots_lock);
+
+	memslot = id_to_memslot(kvm_memslots(kvm), slot);
+	if (!memslot) {
+		ret = -EINVAL;
+		goto out_unlock_slots;
+	}
+	if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
+	    ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
+		ret = -EPERM;
+		goto out_unlock_slots;
+	}
+
+	ret = -EINVAL;
+
+out_unlock_slots:
+	mutex_unlock(&kvm->slots_lock);
+	kvm_unlock_all_vcpus(kvm);
+out_unlock_kvm:
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
+{
+	bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
+	struct kvm_memory_slot *memslot;
+	int ret;
+
+	if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
+		return -EINVAL;
+
+	mutex_lock(&kvm->slots_lock);
+
+	if (unlock_all) {
+		ret = -EINVAL;
+		goto out_unlock_slots;
+	}
+
+	memslot = id_to_memslot(kvm_memslots(kvm), slot);
+	if (!memslot) {
+		ret = -EINVAL;
+		goto out_unlock_slots;
+	}
+
+	ret = -EINVAL;
+
+out_unlock_slots:
+	mutex_unlock(&kvm->slots_lock);
+	return ret;
+}
+
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	if (!kvm->arch.mmu.pgt)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..70c969967557 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
+#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
 
+/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
+#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK	(1 << 0)
+#define   KVM_ARM_LOCK_MEM_READ				(1 << 0)
+#define   KVM_ARM_LOCK_MEM_WRITE			(1 << 1)
+#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK	(1 << 1)
+#define   KVM_ARM_UNLOCK_MEM_ALL			(1 << 0)
+
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
 
-- 
2.33.1


+	    ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
+		ret = -EPERM;
+		goto out_unlock_slots;
+	}
+
+	ret = -EINVAL;
+
+out_unlock_slots:
+	mutex_unlock(&kvm->slots_lock);
+	kvm_unlock_all_vcpus(kvm);
+out_unlock_kvm:
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
+{
+	bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
+	struct kvm_memory_slot *memslot;
+	int ret;
+
+	if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
+		return -EINVAL;
+
+	mutex_lock(&kvm->slots_lock);
+
+	if (unlock_all) {
+		ret = -EINVAL;
+		goto out_unlock_slots;
+	}
+
+	memslot = id_to_memslot(kvm_memslots(kvm), slot);
+	if (!memslot) {
+		ret = -EINVAL;
+		goto out_unlock_slots;
+	}
+
+	ret = -EINVAL;
+
+out_unlock_slots:
+	mutex_unlock(&kvm->slots_lock);
+	return ret;
+}
+
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	if (!kvm->arch.mmu.pgt)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..70c969967557 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
+#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
 
+/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
+#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK	(1 << 0)
+#define   KVM_ARM_LOCK_MEM_READ				(1 << 0)
+#define   KVM_ARM_LOCK_MEM_WRITE			(1 << 1)
+#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK	(1 << 1)
+#define   KVM_ARM_UNLOCK_MEM_ALL			(1 << 0)
+
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
 
-- 
2.33.1



* [RFC PATCH v5 03/38] KVM: arm64: Implement the memslot lock/unlock functionality
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Pin memory in the process address space and map it in the stage 2 tables
when userspace enables the KVM_CAP_ARM_LOCK_USER_MEMORY_REGION capability,
and unpin it from the process address space when the capability is used
with the KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK flag.

The current implementation has two drawbacks which will be fixed in future
patches:

- The dcache maintenance is done when the memslot is locked, which means
  that it is possible that memory changes made by userspace after the ioctl
  completes won't be visible to a guest running with the MMU off.

- Tag scrubbing is done when the memslot is locked. If the MTE capability
  is enabled after the ioctl, the guest will be able to access unsanitised
  pages. This is prevented by forbidding userspace to enable the MTE
  capability if any memslots are locked.

Only PAGE_SIZE mappings are supported at stage 2.
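
For completeness, a similar sketch of the unlock side (again just an
illustration, not part of the patch; vm_fd and slot are placeholders):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Unlock a single memslot that was previously locked. */
static int unlock_memslot(int vm_fd, __u64 slot)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
		.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK,
		.args[0] = slot,
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

/* Unlock every locked memslot in one call; args[0] is ignored. */
static int unlock_all_memslots(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
		.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK,
		.args[1] = KVM_ARM_UNLOCK_MEM_ALL,
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}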

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/api.rst    |   4 +-
 arch/arm64/include/asm/kvm_host.h |  11 ++
 arch/arm64/kvm/arm.c              |  22 +++-
 arch/arm64/kvm/mmu.c              | 204 ++++++++++++++++++++++++++++--
 4 files changed, 226 insertions(+), 15 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 16aa59eae3d9..0ac12a730013 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6979,8 +6979,8 @@ write permissions are specified for a memslot which logs dirty pages.
 
 Enabling this capability causes the memory pinned when locking the memslot
 specified in args[0] to be unpinned, or, optionally, all memslots to be
-unlocked. The IPA range is not unmapped from stage 2.
->>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
+unlocked. The IPA range is not unmapped from stage 2. It is considered an error
+to attempt to unlock a memslot which is not locked.
 
 8. Other capabilities.
 ======================
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 733621e41900..7fd70ad90c16 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -99,7 +99,18 @@ struct kvm_s2_mmu {
 	struct kvm_arch *arch;
 };
 
+#define KVM_MEMSLOT_LOCK_READ		(1 << 0)
+#define KVM_MEMSLOT_LOCK_WRITE		(1 << 1)
+#define KVM_MEMSLOT_LOCK_MASK		0x3
+
+struct kvm_memory_slot_page {
+	struct list_head list;
+	struct page *page;
+};
+
 struct kvm_arch_memory_slot {
+	struct kvm_memory_slot_page pages;
+	u32 flags;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index d49905d18cee..b9b8b43835e3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -106,6 +106,25 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
 	}
 }
 
+static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot;
+	bool has_locked_memslots = false;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	kvm_for_each_memslot(memslot, slots) {
+		if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
+			has_locked_memslots = true;
+			break;
+		}
+	}
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return has_locked_memslots;
+}
+
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
@@ -120,7 +139,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	case KVM_CAP_ARM_MTE:
 		mutex_lock(&kvm->lock);
-		if (!system_supports_mte() || kvm->created_vcpus) {
+		if (!system_supports_mte() || kvm->created_vcpus ||
+		    (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) {
 			r = -EINVAL;
 		} else {
 			r = 0;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f65bcbc9ae69..b0a8e61315e4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -72,6 +72,11 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
 	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
 }
 
+static bool memslot_is_locked(struct kvm_memory_slot *memslot)
+{
+	return memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK;
+}
+
 /**
  * kvm_flush_remote_tlbs() - flush all VM TLB entries for v7/8
  * @kvm:	pointer to kvm structure.
@@ -769,6 +774,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 	if (map_size == PAGE_SIZE)
 		return true;
 
+	/* Allow only PAGE_SIZE mappings for locked memslots */
+	if (memslot_is_locked(memslot))
+		return false;
+
 	size = memslot->npages * PAGE_SIZE;
 
 	gpa_start = memslot->base_gfn << PAGE_SHIFT;
@@ -1296,6 +1305,159 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+static int try_rlimit_memlock(unsigned long npages)
+{
+	unsigned long lock_limit;
+	bool has_lock_cap;
+	int ret = 0;
+
+	has_lock_cap = capable(CAP_IPC_LOCK);
+	if (has_lock_cap)
+		goto out;
+
+	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+	mmap_read_lock(current->mm);
+	if (npages + current->mm->locked_vm > lock_limit)
+		ret = -ENOMEM;
+	mmap_read_unlock(current->mm);
+
+out:
+	return ret;
+}
+
+static void unpin_memslot_pages(struct kvm_memory_slot *memslot, bool writable)
+{
+	struct kvm_memory_slot_page *entry, *tmp;
+
+	list_for_each_entry_safe(entry, tmp, &memslot->arch.pages.list, list) {
+		if (writable)
+			set_page_dirty_lock(entry->page);
+		unpin_user_page(entry->page);
+		kfree(entry);
+	}
+}
+
+static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
+			u64 flags)
+{
+	struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
+	struct kvm_memory_slot_page *page_entry;
+	bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+	struct vm_area_struct *vma;
+	unsigned long npages = memslot->npages;
+	unsigned int pin_flags = FOLL_LONGTERM;
+	unsigned long i, hva, ipa, mmu_seq;
+	int ret;
+
+	ret = try_rlimit_memlock(npages);
+	if (ret)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&memslot->arch.pages.list);
+
+	if (writable) {
+		prot |= KVM_PGTABLE_PROT_W;
+		pin_flags |= FOLL_WRITE;
+	}
+
+	hva = memslot->userspace_addr;
+	ipa = memslot->base_gfn << PAGE_SHIFT;
+
+	mmu_seq = kvm->mmu_notifier_seq;
+	smp_rmb();
+
+	for (i = 0; i < npages; i++) {
+		page_entry = kzalloc(sizeof(*page_entry), GFP_KERNEL);
+		if (!page_entry) {
+			unpin_memslot_pages(memslot, writable);
+			ret = -ENOMEM;
+			goto out_err;
+		}
+
+		mmap_read_lock(current->mm);
+		ret = pin_user_pages(hva, 1, pin_flags, &page_entry->page, &vma);
+		if (ret != 1) {
+			mmap_read_unlock(current->mm);
+			unpin_memslot_pages(memslot, writable);
+			ret = -ENOMEM;
+			goto out_err;
+		}
+		if (kvm_has_mte(kvm)) {
+			if (vma->vm_flags & VM_SHARED) {
+				ret = -EFAULT;
+			} else {
+				ret = sanitise_mte_tags(kvm,
+					page_to_pfn(page_entry->page),
+					PAGE_SIZE);
+			}
+			if (ret) {
+				mmap_read_unlock(current->mm);
+				goto out_err;
+			}
+		}
+		mmap_read_unlock(current->mm);
+
+		ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
+		if (ret) {
+			unpin_memslot_pages(memslot, writable);
+			goto out_err;
+		}
+
+		spin_lock(&kvm->mmu_lock);
+		if (mmu_notifier_retry(kvm, mmu_seq)) {
+			spin_unlock(&kvm->mmu_lock);
+			unpin_memslot_pages(memslot, writable);
+			ret = -EAGAIN;
+			goto out_err;
+		}
+
+		ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
+					     page_to_phys(page_entry->page),
+					     prot, &cache);
+		spin_unlock(&kvm->mmu_lock);
+
+		if (ret) {
+			kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
+						 i << PAGE_SHIFT);
+			unpin_memslot_pages(memslot, writable);
+			goto out_err;
+		}
+		list_add(&page_entry->list, &memslot->arch.pages.list);
+
+		hva += PAGE_SIZE;
+		ipa += PAGE_SIZE;
+	}
+
+
+	/*
+	 * Even though we've checked the limit at the start, we can still exceed
+	 * it if userspace locked other pages in the meantime or if the
+	 * CAP_IPC_LOCK capability has been revoked.
+	 */
+	ret = account_locked_vm(current->mm, npages, true);
+	if (ret) {
+		kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
+					 npages << PAGE_SHIFT);
+		unpin_memslot_pages(memslot, writable);
+		goto out_err;
+	}
+
+	memslot->arch.flags = KVM_MEMSLOT_LOCK_READ;
+	if (writable)
+		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
+
+	kvm_mmu_free_memory_cache(&cache);
+
+	return 0;
+
+out_err:
+	kvm_mmu_free_memory_cache(&cache);
+	return ret;
+}
+
 int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 {
 	struct kvm_memory_slot *memslot;
@@ -1325,7 +1487,12 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 		goto out_unlock_slots;
 	}
 
-	ret = -EINVAL;
+	if (memslot_is_locked(memslot)) {
+		ret = -EBUSY;
+		goto out_unlock_slots;
+	}
+
+	ret = lock_memslot(kvm, memslot, flags);
 
 out_unlock_slots:
 	mutex_unlock(&kvm->slots_lock);
@@ -1335,11 +1502,22 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 	return ret;
 }
 
+static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+	bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
+	unsigned long npages = memslot->npages;
+
+	unpin_memslot_pages(memslot, writable);
+	account_locked_vm(current->mm, npages, false);
+
+	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
+}
+
 int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 {
 	bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
 	struct kvm_memory_slot *memslot;
-	int ret;
+	int ret = 0;
 
 	if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
 		return -EINVAL;
@@ -1347,18 +1525,20 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 	mutex_lock(&kvm->slots_lock);
 
 	if (unlock_all) {
-		ret = -EINVAL;
-		goto out_unlock_slots;
-	}
-
-	memslot = id_to_memslot(kvm_memslots(kvm), slot);
-	if (!memslot) {
-		ret = -EINVAL;
-		goto out_unlock_slots;
+		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
+			if (!memslot_is_locked(memslot))
+				continue;
+			unlock_memslot(kvm, memslot);
+		}
+	} else {
+		memslot = id_to_memslot(kvm_memslots(kvm), slot);
+		if (!memslot || !memslot_is_locked(memslot)) {
+			ret = -EINVAL;
+			goto out_unlock_slots;
+		}
+		unlock_memslot(kvm, memslot);
 	}
 
-	ret = -EINVAL;
-
 out_unlock_slots:
 	mutex_unlock(&kvm->slots_lock);
 	return ret;
-- 
2.33.1



* [RFC PATCH v5 04/38] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

KVM relies on doing dcache maintenance on stage 2 faults so that a guest
running with the MMU off sees the same view of memory as userspace. For
locked memslots, KVM so far has done the dcache maintenance when the
memslot is locked, but that leaves KVM in a rather awkward position: what
userspace writes to guest memory after the memslot is locked, but before a
VCPU is run, might not be visible to the guest.

Fix this by deferring the dcache maintenance until the first VCPU is run.
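
The ordering this patch is meant to support, as a hypothetical userspace
sequence (lock_memslot() is the sketch from the lock/unlock API patch; the
file descriptors and buffers are placeholders):

#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int lock_memslot(int vm_fd, __u64 slot, bool writable); /* earlier sketch */

static int load_image_and_run(int vm_fd, int vcpu_fd, __u64 slot,
			      void *guest_ram, const void *image, size_t size)
{
	/* Pins the pages and maps them at stage 2. */
	int ret = lock_memslot(vm_fd, slot, true);

	if (ret)
		return ret;

	/* Writes after locking are still visible to an MMU-off guest... */
	memcpy(guest_ram, image, size);

	/* ...because the dcache maintenance now happens on the first run. */
	return ioctl(vcpu_fd, KVM_RUN, 0);
}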

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  7 ++++
 arch/arm64/include/asm/kvm_mmu.h  |  5 +++
 arch/arm64/kvm/arm.c              |  3 ++
 arch/arm64/kvm/mmu.c              | 55 ++++++++++++++++++++++++++++---
 4 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7fd70ad90c16..3b4839b447c4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -113,6 +113,10 @@ struct kvm_arch_memory_slot {
 	u32 flags;
 };
 
+/* kvm->arch.mmu_pending_ops flags */
+#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
+#define KVM_MAX_MMU_PENDING_OPS		1
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -136,6 +140,9 @@ struct kvm_arch {
 	 */
 	bool return_nisv_io_abort_to_user;
 
+	/* Defer MMU operations until a VCPU is run. */
+	unsigned long mmu_pending_ops;
+
 	/*
 	 * VM-wide PMU filter, implemented as a bitmap and big enough for
 	 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 2c50734f048d..cbf57c474fea 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -219,6 +219,11 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
 int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
 
+#define kvm_mmu_has_pending_ops(kvm)	\
+	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
+
+void kvm_mmu_perform_pending_ops(struct kvm *kvm);
+
 static inline unsigned int kvm_get_vmid_bits(void)
 {
 	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b9b8b43835e3..96ed48455cdd 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -870,6 +870,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	if (unlikely(!kvm_vcpu_initialized(vcpu)))
 		return -ENOEXEC;
 
+	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
+		kvm_mmu_perform_pending_ops(vcpu->kvm);
+
 	ret = kvm_vcpu_first_run_init(vcpu);
 	if (ret)
 		return ret;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b0a8e61315e4..8e4787019840 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1305,6 +1305,40 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+/*
+ * It's safe to do the CMOs when the first VCPU is run because:
+ * - VCPUs cannot run until mmu_pending_ops is cleared.
+ * - Memslots cannot be modified because we hold the kvm->slots_lock.
+ *
+ * It's safe to periodically release the mmu_lock because:
+ * - VCPUs cannot run.
+ * - Any changes to the stage 2 tables triggered by the MMU notifiers also take
+ *   the mmu_lock, which means accesses will be serialized.
+ * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
+ *   is live, which means that the VM will be live.
+ */
+void kvm_mmu_perform_pending_ops(struct kvm *kvm)
+{
+	struct kvm_memory_slot *memslot;
+
+	mutex_lock(&kvm->slots_lock);
+	if (!kvm_mmu_has_pending_ops(kvm))
+		goto out_unlock;
+
+	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
+		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
+			if (!memslot_is_locked(memslot))
+				continue;
+			stage2_flush_memslot(kvm, memslot);
+		}
+		clear_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
+	}
+
+out_unlock:
+	mutex_unlock(&kvm->slots_lock);
+	return;
+}
+
 static int try_rlimit_memlock(unsigned long npages)
 {
 	unsigned long lock_limit;
@@ -1345,7 +1379,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	struct kvm_memory_slot_page *page_entry;
 	bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
-	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+	struct kvm_pgtable pgt;
+	struct kvm_pgtable_mm_ops mm_ops;
 	struct vm_area_struct *vma;
 	unsigned long npages = memslot->npages;
 	unsigned int pin_flags = FOLL_LONGTERM;
@@ -1363,6 +1398,16 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		pin_flags |= FOLL_WRITE;
 	}
 
+	/*
+	 * Make a copy of the stage 2 translation table struct to remove the
+	 * dcache callback so we can postpone the cache maintenance operations
+	 * until the first VCPU is run.
+	 */
+	mm_ops = *kvm->arch.mmu.pgt->mm_ops;
+	mm_ops.dcache_clean_inval_poc = NULL;
+	pgt = *kvm->arch.mmu.pgt;
+	pgt.mm_ops = &mm_ops;
+
 	hva = memslot->userspace_addr;
 	ipa = memslot->base_gfn << PAGE_SHIFT;
 
@@ -1414,13 +1459,13 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			goto out_err;
 		}
 
-		ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
+		ret = kvm_pgtable_stage2_map(&pgt, ipa, PAGE_SIZE,
 					     page_to_phys(page_entry->page),
 					     prot, &cache);
 		spin_unlock(&kvm->mmu_lock);
 
 		if (ret) {
-			kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
+			kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
 						 i << PAGE_SHIFT);
 			unpin_memslot_pages(memslot, writable);
 			goto out_err;
@@ -1439,7 +1484,7 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	 */
 	ret = account_locked_vm(current->mm, npages, true);
 	if (ret) {
-		kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
+		kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
 					 npages << PAGE_SHIFT);
 		unpin_memslot_pages(memslot, writable);
 		goto out_err;
@@ -1449,6 +1494,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	if (writable)
 		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
 
+	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
+
 	kvm_mmu_free_memory_cache(&cache);
 
 	return 0;
-- 
2.33.1



* [RFC PATCH v5 05/38] KVM: arm64: Perform CMOs on locked memslots when userspace resets VCPUs
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Userspace resets a VCPU that has already run by means of a
KVM_ARM_VCPU_INIT ioctl. This is usually done after a VM shutdown and
before the same VM is rebooted, and during this interval the VM memory can
be modified by userspace (for example, to copy the original guest kernel
image). In this situation, KVM unmaps the entire stage 2 to trigger stage 2
faults, which ensures that the guest has the same view of memory as the
host's userspace.

Unmapping stage 2 is not an option for locked memslots, so instead do the
cache maintenance the first time a VCPU is run, similar to what KVM does
when a memslot is locked.
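
A hypothetical reboot path on the userspace side would then look roughly
like this (the kvm_vcpu_init passed here must be the same one used for the
first init; all names are placeholders):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int reboot_vcpu(int vcpu_fd, const struct kvm_vcpu_init *init)
{
	/* Re-init with the same target/features used for the first init. */
	int ret = ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, init);

	if (ret)
		return ret;

	/*
	 * The pending cache maintenance on locked memslots is performed
	 * before the guest runs, instead of relying on stage 2 faults.
	 */
	return ioctl(vcpu_fd, KVM_RUN, 0);
}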

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++-
 arch/arm64/kvm/mmu.c              | 15 ++++++++++++++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3b4839b447c4..5f49a27ce289 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -115,7 +115,8 @@ struct kvm_arch_memory_slot {
 
 /* kvm->arch.mmu_pending_ops flags */
 #define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
-#define KVM_MAX_MMU_PENDING_OPS		1
+#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE	1
+#define KVM_MAX_MMU_PENDING_OPS		2
 
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8e4787019840..188064c5839c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -607,8 +607,16 @@ void stage2_unmap_vm(struct kvm *kvm)
 	spin_lock(&kvm->mmu_lock);
 
 	slots = kvm_memslots(kvm);
-	kvm_for_each_memslot(memslot, slots)
+	kvm_for_each_memslot(memslot, slots) {
+		if (memslot_is_locked(memslot)) {
+			set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE,
+				&kvm->arch.mmu_pending_ops);
+			set_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE,
+				&kvm->arch.mmu_pending_ops);
+			continue;
+		}
 		stage2_unmap_memslot(kvm, memslot);
+	}
 
 	spin_unlock(&kvm->mmu_lock);
 	mmap_read_unlock(current->mm);
@@ -1334,6 +1342,11 @@ void kvm_mmu_perform_pending_ops(struct kvm *kvm)
 		clear_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
 	}
 
+	if (test_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE, &kvm->arch.mmu_pending_ops)) {
+		icache_inval_all_pou();
+		clear_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE, &kvm->arch.mmu_pending_ops);
+	}
+
 out_unlock:
 	mutex_unlock(&kvm->slots_lock);
 	return;
-- 
2.33.1



* [RFC PATCH v5 06/38] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

When an MTE-enabled guest first accesses a physical page, that page must be
scrubbed for tags. This is normally done by KVM on a translation fault, but
with locked memslots we will not get translation faults. So far, this has
been handled by forbidding userspace to enable the MTE capability after
locking a memslot.

Remove this constraint by deferring tag cleaning until the first VCPU is
run, similar to how KVM handles cache maintenance operations.

When userspace resets a VCPU, KVM again performs cache maintenance
operations on locked memslots because userspace might have modified the
guest memory. Clean the tags the next time a VCPU is run for the same
reason.
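
As an illustration of the relaxed ordering (lock_memslot() is the earlier
sketch, and MTE must still be enabled before any VCPUs are created),
userspace could now do:

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int lock_memslot(int vm_fd, __u64 slot, bool writable); /* earlier sketch */

static int lock_then_enable_mte(int vm_fd, __u64 slot)
{
	struct kvm_enable_cap mte = { .cap = KVM_CAP_ARM_MTE };
	int ret = lock_memslot(vm_fd, slot, true);

	if (ret)
		return ret;

	/* No longer rejected just because a memslot is locked. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &mte);
}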

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  7 ++-
 arch/arm64/include/asm/kvm_mmu.h  |  2 +-
 arch/arm64/kvm/arm.c              | 29 ++--------
 arch/arm64/kvm/mmu.c              | 95 ++++++++++++++++++++++++++-----
 4 files changed, 91 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 5f49a27ce289..0ebdef158020 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -114,9 +114,10 @@ struct kvm_arch_memory_slot {
 };
 
 /* kvm->arch.mmu_pending_ops flags */
-#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
-#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE	1
-#define KVM_MAX_MMU_PENDING_OPS		2
+#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE		0
+#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE		1
+#define KVM_LOCKED_MEMSLOT_SANITISE_TAGS	2
+#define KVM_MAX_MMU_PENDING_OPS			3
 
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index cbf57c474fea..2d2f902000b3 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -222,7 +222,7 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
 #define kvm_mmu_has_pending_ops(kvm)	\
 	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
 
-void kvm_mmu_perform_pending_ops(struct kvm *kvm);
+int kvm_mmu_perform_pending_ops(struct kvm *kvm);
 
 static inline unsigned int kvm_get_vmid_bits(void)
 {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 96ed48455cdd..13f3af1f2e78 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -106,25 +106,6 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
 	}
 }
 
-static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
-{
-	struct kvm_memslots *slots = kvm_memslots(kvm);
-	struct kvm_memory_slot *memslot;
-	bool has_locked_memslots = false;
-	int idx;
-
-	idx = srcu_read_lock(&kvm->srcu);
-	kvm_for_each_memslot(memslot, slots) {
-		if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
-			has_locked_memslots = true;
-			break;
-		}
-	}
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	return has_locked_memslots;
-}
-
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
@@ -139,8 +120,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	case KVM_CAP_ARM_MTE:
 		mutex_lock(&kvm->lock);
-		if (!system_supports_mte() || kvm->created_vcpus ||
-		    (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) {
+		if (!system_supports_mte() || kvm->created_vcpus) {
 			r = -EINVAL;
 		} else {
 			r = 0;
@@ -870,8 +850,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	if (unlikely(!kvm_vcpu_initialized(vcpu)))
 		return -ENOEXEC;
 
-	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
-		kvm_mmu_perform_pending_ops(vcpu->kvm);
+	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm))) {
+		ret = kvm_mmu_perform_pending_ops(vcpu->kvm);
+		if (ret)
+			return ret;
+	}
 
 	ret = kvm_vcpu_first_run_init(vcpu);
 	if (ret)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 188064c5839c..2491e73e3d31 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -613,6 +613,15 @@ void stage2_unmap_vm(struct kvm *kvm)
 				&kvm->arch.mmu_pending_ops);
 			set_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE,
 				&kvm->arch.mmu_pending_ops);
+			/*
+			 * stage2_unmap_vm() is called after a VCPU has run, at
+			 * which point the state of the MTE cap (either enabled
+			 * or disabled) is final.
+			 */
+			if (kvm_has_mte(kvm)) {
+				set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS,
+					&kvm->arch.mmu_pending_ops);
+			}
 			continue;
 		}
 		stage2_unmap_memslot(kvm, memslot);
@@ -956,6 +965,55 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 	return 0;
 }
 
+static int sanitise_mte_tags_memslot(struct kvm *kvm,
+				     struct kvm_memory_slot *memslot)
+{
+	unsigned long hva, slot_size, slot_end;
+	struct kvm_memory_slot_page *entry;
+	struct page *page;
+	int ret = 0;
+
+	hva = memslot->userspace_addr;
+	slot_size = memslot->npages << PAGE_SHIFT;
+	slot_end = hva + slot_size;
+
+	/* First check that the VMAs spanning the memslot are not shared... */
+	do {
+		struct vm_area_struct *vma;
+
+		vma = find_vma_intersection(current->mm, hva, slot_end);
+		/* The VMAs spanning the memslot must be contiguous. */
+		if (!vma) {
+			ret = -EFAULT;
+			goto out;
+		}
+		/*
+		 * VM_SHARED mappings are not allowed with MTE to avoid races
+		 * when updating the PG_mte_tagged page flag, see
+		 * sanitise_mte_tags for more details.
+		 */
+		if (vma->vm_flags & VM_SHARED) {
+			ret = -EFAULT;
+			goto out;
+		}
+		hva = min(slot_end, vma->vm_end);
+	} while (hva < slot_end);
+
+	/* ... then clear the tags. */
+	list_for_each_entry(entry, &memslot->arch.pages.list, list) {
+		page = entry->page;
+		if (!test_bit(PG_mte_tagged, &page->flags)) {
+			mte_clear_page_tags(page_address(page));
+			set_bit(PG_mte_tagged, &page->flags);
+		}
+	}
+
+out:
+	mmap_read_unlock(current->mm);
+
+	return ret;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -1325,14 +1383,29 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
  * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
  *   is live, which means that the VM will be live.
  */
-void kvm_mmu_perform_pending_ops(struct kvm *kvm)
+int kvm_mmu_perform_pending_ops(struct kvm *kvm)
 {
 	struct kvm_memory_slot *memslot;
+	int ret = 0;
 
 	mutex_lock(&kvm->slots_lock);
 	if (!kvm_mmu_has_pending_ops(kvm))
 		goto out_unlock;
 
+	if (kvm_has_mte(kvm) &&
+	    (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops))) {
+		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
+			if (!memslot_is_locked(memslot))
+				continue;
+			mmap_read_lock(current->mm);
+			ret = sanitise_mte_tags_memslot(kvm, memslot);
+			mmap_read_unlock(current->mm);
+			if (ret)
+				goto out_unlock;
+		}
+		clear_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
+	}
+
 	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
 		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
 			if (!memslot_is_locked(memslot))
@@ -1349,7 +1422,7 @@ void kvm_mmu_perform_pending_ops(struct kvm *kvm)
 
 out_unlock:
 	mutex_unlock(&kvm->slots_lock);
-	return;
+	return ret;
 }
 
 static int try_rlimit_memlock(unsigned long npages)
@@ -1443,19 +1516,6 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			ret = -ENOMEM;
 			goto out_err;
 		}
-		if (kvm_has_mte(kvm)) {
-			if (vma->vm_flags & VM_SHARED) {
-				ret = -EFAULT;
-			} else {
-				ret = sanitise_mte_tags(kvm,
-					page_to_pfn(page_entry->page),
-					PAGE_SIZE);
-			}
-			if (ret) {
-				mmap_read_unlock(current->mm);
-				goto out_err;
-			}
-		}
 		mmap_read_unlock(current->mm);
 
 		ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
@@ -1508,6 +1568,11 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
 
 	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
+	/*
+	 * MTE might be enabled after we lock the memslot, set it here
+	 * unconditionally.
+	 */
+	set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
 
 	kvm_mmu_free_memory_cache(&cache);
 
-- 
2.33.1


* [RFC PATCH v5 07/38] KVM: arm64: Unmap unlocked memslot from stage 2 if kvm_mmu_has_pending_ops()
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

KVM relies on doing the necessary maintenance operations for locked
memslots when the first VCPU is run. If a memslot is locked and then
unlocked before the first VCPU is run, the maintenance operations won't be
performed for the region described by the memslot, but the memory remains
mapped at stage 2. This means that a guest running with the MMU off can
read stale values from memory instead of the newest values written by the
host (and not yet written back to memory).

In this case, unmap the memslot from stage 2 to trigger stage 2 data
aborts, which will take care of any synchronisation requirements.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/api.rst |  7 +++++--
 arch/arm64/kvm/mmu.c           | 20 ++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0ac12a730013..5a69b3b543c0 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6979,8 +6979,11 @@ write permissions are specified for a memslot which logs dirty pages.
 
 Enabling this capability causes the memory pinned when locking the memslot
 specified in args[0] to be unpinned, or, optionally, all memslots to be
-unlocked. The IPA range is not unmapped from stage 2. It is considered an error
-to attempt to unlock a memslot which is not locked.
+unlocked. If no VCPU has run between the user memory region being locked and
+the same region being unlocked, then unlocking the memory region also causes the
+corresponding IPA range to be unmapped from stage 2; otherwise, stage 2 is left
+unchanged. It is considered an error to attempt to unlock a memslot which is not
+locked.
 
 8. Other capabilities.
 ======================
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2491e73e3d31..cd6f1bc7842d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1632,6 +1632,14 @@ static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
 	bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
 	unsigned long npages = memslot->npages;
 
+	/*
+	 * MMU maintenance operations aren't performed on an unlocked memslot.
+	 * Unmap it from stage 2 so the abort handler performs the necessary
+	 * operations.
+	 */
+	if (kvm_mmu_has_pending_ops(kvm))
+		kvm_arch_flush_shadow_memslot(kvm, memslot);
+
 	unpin_memslot_pages(memslot, writable);
 	account_locked_vm(current->mm, npages, false);
 
@@ -1642,6 +1650,7 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 {
 	bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
 	struct kvm_memory_slot *memslot;
+	bool has_locked_memslot;
 	int ret = 0;
 
 	if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
@@ -1664,6 +1673,17 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 		unlock_memslot(kvm, memslot);
 	}
 
+	if (kvm_mmu_has_pending_ops(kvm)) {
+		has_locked_memslot = false;
+		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
+			if (memslot_is_locked(memslot)) {
+				has_locked_memslot = true;
+			}
+		}
+		if (!has_locked_memslot)
+			kvm->arch.mmu_pending_ops = 0;
+	}
+
 out_unlock_slots:
 	mutex_unlock(&kvm->slots_lock);
 	return ret;
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 08/38] KVM: arm64: Unlock memslots after stage 2 tables are freed
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Unpin the backing pages mapped at stage 2 after the stage 2 translation
tables are destroyed.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/mmu.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index cd6f1bc7842d..072e2aba371f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1627,11 +1627,19 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 	return ret;
 }
 
-static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
+static void __unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
 	bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
 	unsigned long npages = memslot->npages;
 
+	unpin_memslot_pages(memslot, writable);
+	account_locked_vm(current->mm, npages, false);
+
+	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
+}
+
+static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
 	/*
 	 * MMU maintenance operations aren't performed on an unlocked memslot.
 	 * Unmap it from stage 2 so the abort handler performs the necessary
@@ -1640,10 +1648,7 @@ static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
 	if (kvm_mmu_has_pending_ops(kvm))
 		kvm_arch_flush_shadow_memslot(kvm, memslot);
 
-	unpin_memslot_pages(memslot, writable);
-	account_locked_vm(current->mm, npages, false);
-
-	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
+	__unlock_memslot(kvm, memslot);
 }
 
 int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
@@ -1951,7 +1956,15 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
+	struct kvm_memory_slot *memslot;
+
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
+
+	kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
+		if (!memslot_is_locked(memslot))
+			continue;
+		__unlock_memslot(kvm, memslot);
+	}
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 09/38] KVM: arm64: Deny changes to locked memslots
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Forbid userspace from making changes to a locked memslot. If userspace
wants to modify a locked memslot, it must unlock it first.

One special case is allowed: memslots locked for read, but not for write,
can have dirty page logging turned on.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/mmu.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 072e2aba371f..bc2a546f65c3 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1890,8 +1890,23 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 {
 	hva_t hva = mem->userspace_addr;
 	hva_t reg_end = hva + mem->memory_size;
+	struct kvm_memory_slot *old;
 	int ret = 0;
 
+	/*
+	 * Forbid all changes to locked memslots with the exception of turning
+	 * on dirty page logging for memslots locked only for reads.
+	 */
+	old = id_to_memslot(kvm_memslots(kvm), memslot->id);
+	if (old && memslot_is_locked(old)) {
+		if (change == KVM_MR_FLAGS_ONLY &&
+		    memslot_is_logging(memslot) &&
+		    !(old->arch.flags & KVM_MEMSLOT_LOCK_WRITE))
+			memcpy(&memslot->arch, &old->arch, sizeof(old->arch));
+		else
+			return -EBUSY;
+	}
+
 	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
 			change != KVM_MR_FLAGS_ONLY)
 		return 0;
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread
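
The one change that remains allowed corresponds to a plain
KVM_SET_USER_MEMORY_REGION call from userspace. A sketch, assuming the slot
was previously locked for read only and that the fd and slot geometry are
values the VMM already tracks:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: turn on dirty logging for a memslot that was locked for read only.
 * With this patch, the same call on a write-locked slot is expected to fail
 * with EBUSY. All arguments are placeholders for values the VMM tracks.
 */
static int enable_dirty_logging(int vm_fd, __u32 slot, __u64 gpa,
				__u64 size, __u64 hva)
{
	struct kvm_userspace_memory_region region;

	memset(&region, 0, sizeof(region));
	region.slot = slot;
	region.flags = KVM_MEM_LOG_DIRTY_PAGES;	/* flags-only change */
	region.guest_phys_addr = gpa;
	region.memory_size = size;
	region.userspace_addr = hva;

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}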

* [RFC PATCH v5 10/38] KVM: Add kvm_warn{,_ratelimited} macros
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Add the kvm_warn() and kvm_warn_ratelimited() macros to print a kernel
warning.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/linux/kvm_host.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9e0667e3723e..79c6b448d95c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -627,6 +627,10 @@ struct kvm {
 
 #define kvm_err(fmt, ...) \
 	pr_err("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
+#define kvm_warn(fmt, ...) \
+	pr_warn("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
+#define kvm_warn_ratelimited(fmt, ...) \
+	pr_warn_ratelimited("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
 #define kvm_info(fmt, ...) \
 	pr_info("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
 #define kvm_debug(fmt, ...) \
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 11/38] KVM: arm64: Print a warning for unexpected faults on locked memslots
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

When userspace unmaps a VMA backing a memslot, the corresponding stage 2
address range gets unmapped via the MMU notifiers. This makes it possible
to get stage 2 faults on a locked memslot, which might not be what
userspace wants because the purpose of locking a memslot is to avoid stage
2 faults in the first place.

Addresses can be unmapped from stage 2 for other reasons too, like bugs in
the implementation of the lock memslot API, however unlikely that might
seem.

Let's try to make debugging easier by printing a warning when this happens.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/mmu.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index bc2a546f65c3..3e9ec646cc34 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1352,6 +1352,27 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	/* Userspace should not be able to register out-of-bounds IPAs */
 	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
 
+	if (memslot_is_locked(memslot)) {
+		const char *fault_type_str;
+
+		if (kvm_vcpu_trap_is_exec_fault(vcpu))
+			goto handle_fault;
+
+		if (fault_status == FSC_ACCESS)
+			fault_type_str = "access";
+		else if (write_fault && (memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE))
+			fault_type_str = "write";
+		else if (!write_fault)
+			fault_type_str = "read";
+		else
+			goto handle_fault;
+
+		kvm_warn_ratelimited("Unexpected stage 2 %s fault on locked memslot %d: IPA=%#llx, ESR_EL2=%#08x\n",
+				     fault_type_str, memslot->id, fault_ipa,
+				     kvm_vcpu_get_esr(vcpu));
+	}
+
+handle_fault:
 	if (fault_status == FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
 		ret = 1;
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 12/38] KVM: arm64: Allow userspace to lock and unlock memslots
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Now that the ioctls have been implemented, allow userspace to lock and
unlock memslots.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/arm.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 13f3af1f2e78..27e86e480355 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -78,11 +78,6 @@ int kvm_arch_check_processor_compat(void *opaque)
 	return 0;
 }
 
-static int kvm_arm_lock_memslot_supported(void)
-{
-	return 0;
-}
-
 static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
 					     struct kvm_enable_cap *cap)
 {
@@ -129,8 +124,6 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		mutex_unlock(&kvm->lock);
 		break;
 	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
-		if (!kvm_arm_lock_memslot_supported())
-			return -EINVAL;
 		r = kvm_lock_user_memory_region_ioctl(kvm, cap);
 		break;
 	default:
@@ -308,7 +301,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = system_has_full_ptr_auth();
 		break;
 	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
-		r = kvm_arm_lock_memslot_supported();
+		r = 1;
 		break;
 	default:
 		r = 0;
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread
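
From userspace, locking and unlocking goes through KVM_ENABLE_CAP on the VM
file descriptor. The sketch below assumes the uapi headers from this series;
the documentation added earlier in the series puts the memslot in args[0]
(it describes unlock, and lock is assumed to be symmetrical), while the
lock/unlock and read/write selectors are defined by earlier patches and
"flags" is only a placeholder for them:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch only: args[0] carries the memslot, per the API documentation in
 * this series; "flags" stands in for the lock/unlock and read/write
 * selectors whose exact encoding is defined by earlier patches.
 */
static int lock_user_memory_region(int vm_fd, __u64 slot, __u64 flags)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION;
	cap.args[0] = slot;
	cap.args[1] = flags;	/* placeholder, see the series' uapi headers */

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}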

* [RFC PATCH v5 13/38] KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Add a new configuration option that will be used for KVM SPE emulation.
CONFIG_KVM_ARM_SPE depends on the SPE driver being builtin because:

1. The cpumask of physical CPUs that support SPE will be used by KVM to
   emulate SPE on heterogeneous systems.

2. KVM will rely on the SPE driver enabling the SPE interrupt at the GIC
   level.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8ffcbe29395e..9c8c8424ab58 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -54,4 +54,12 @@ config NVHE_EL2_DEBUG
 
 	  If unsure, say N.
 
+config KVM_ARM_SPE
+	bool "Virtual Statistical Profiling Extension (SPE) support"
+	depends on KVM && ARM_SPE_PMU=y
+	default y
+	help
+	  Adds support for Statistical Profiling Extension (SPE) in virtual
+	  machines.
+
 endif # VIRTUALIZATION
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 14/38] KVM: arm64: Add SPE capability and VCPU feature
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Add the KVM_CAP_ARM_SPE capability which allows userspace to discover if
SPE emulation is available. Add the KVM_ARM_VCPU_SPE feature which
enables the emulation for a VCPU.

Both are disabled for now.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/api.rst    | 9 +++++++++
 arch/arm64/include/uapi/asm/kvm.h | 1 +
 arch/arm64/kvm/arm.c              | 3 +++
 include/uapi/linux/kvm.h          | 1 +
 4 files changed, 14 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 5a69b3b543c0..acd2b97c9ca9 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7544,3 +7544,12 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
 of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
 the hypercalls whose corresponding bit is in the argument, and return
 ENOSYS for the others.
+
+8.35 KVM_CAP_ARM_SPE
+--------------------
+
+:Architectures: arm64
+
+This capability indicates that Statistical Profiling Extension (SPE) emulation
+is available in KVM. SPE emulation is enabled for each VCPU which has the
+feature bit KVM_ARM_VCPU_SPE set.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..9f0a8ea50ea9 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_SPE		7 /* enable SPE for this CPU */
 
 struct kvm_vcpu_init {
 	__u32 target;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 27e86e480355..2cdb18d9a740 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -303,6 +303,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
 		r = 1;
 		break;
+	case KVM_CAP_ARM_SPE:
+		r = 0;
+		break;
 	default:
 		r = 0;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 70c969967557..6f63b0f4017d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1132,6 +1132,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
 #define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
+#define KVM_CAP_ARM_SPE 208
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread
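
Userspace would consume the new capability and feature bit with the usual
arm64 VCPU bring-up ioctls. A sketch (error handling trimmed); note that with
only this patch applied KVM_CHECK_EXTENSION still returns 0 for
KVM_CAP_ARM_SPE, since a later patch in the series flips it on:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: probe for SPE emulation and request it for a VCPU. Assumes the
 * headers from this series are in use.
 */
static int vcpu_init_with_spe(int vm_fd, int vcpu_fd)
{
	struct kvm_vcpu_init init;

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_SPE) <= 0)
		return -1;	/* SPE emulation not available */

	if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) < 0)
		return -1;

	init.features[KVM_ARM_VCPU_SPE / 32] |= 1U << (KVM_ARM_VCPU_SPE % 32);

	return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
}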

* [RFC PATCH v5 15/38] perf: arm_spe_pmu: Move struct arm_spe_pmu to a separate header file
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

KVM will soon want to make use of struct arm_spe_pmu, so move it to a
separate header where it will be easily accessible. This is a
straightforward move and functionality should not be impacted.

CC: Will Deacon <will@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 drivers/perf/arm_spe_pmu.c       | 29 +------------------
 include/linux/perf/arm_spe_pmu.h | 49 ++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 28 deletions(-)
 create mode 100644 include/linux/perf/arm_spe_pmu.h

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d99c..ccb92c182527 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -27,7 +27,7 @@
 #include <linux/of_address.h>
 #include <linux/of_device.h>
 #include <linux/perf_event.h>
-#include <linux/perf/arm_pmu.h>
+#include <linux/perf/arm_spe_pmu.h>
 #include <linux/platform_device.h>
 #include <linux/printk.h>
 #include <linux/slab.h>
@@ -47,33 +47,6 @@ struct arm_spe_pmu_buf {
 	void					*base;
 };
 
-struct arm_spe_pmu {
-	struct pmu				pmu;
-	struct platform_device			*pdev;
-	cpumask_t				supported_cpus;
-	struct hlist_node			hotplug_node;
-
-	int					irq; /* PPI */
-	u16					pmsver;
-	u16					min_period;
-	u16					counter_sz;
-
-#define SPE_PMU_FEAT_FILT_EVT			(1UL << 0)
-#define SPE_PMU_FEAT_FILT_TYP			(1UL << 1)
-#define SPE_PMU_FEAT_FILT_LAT			(1UL << 2)
-#define SPE_PMU_FEAT_ARCH_INST			(1UL << 3)
-#define SPE_PMU_FEAT_LDS			(1UL << 4)
-#define SPE_PMU_FEAT_ERND			(1UL << 5)
-#define SPE_PMU_FEAT_DEV_PROBED			(1UL << 63)
-	u64					features;
-
-	u16					max_record_sz;
-	u16					align;
-	struct perf_output_handle __percpu	*handle;
-};
-
-#define to_spe_pmu(p) (container_of(p, struct arm_spe_pmu, pmu))
-
 /* Convert a free-running index from perf into an SPE buffer offset */
 #define PERF_IDX2OFF(idx, buf)	((idx) % ((buf)->nr_pages << PAGE_SHIFT))
 
diff --git a/include/linux/perf/arm_spe_pmu.h b/include/linux/perf/arm_spe_pmu.h
new file mode 100644
index 000000000000..7711e59c5727
--- /dev/null
+++ b/include/linux/perf/arm_spe_pmu.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Split out from drivers/perf/arm_spe_pmu.c.
+ *
+ *  Copyright (C) 2021 ARM Limited
+ */
+
+#ifndef __ARM_SPE_PMU_H__
+#define __ARM_SPE_PMU_H__
+
+#include <linux/cpumask.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_ARM_SPE_PMU
+
+struct arm_spe_pmu {
+	struct pmu				pmu;
+	struct platform_device			*pdev;
+	cpumask_t				supported_cpus;
+	struct hlist_node			hotplug_node;
+
+	int					irq; /* PPI */
+	u16					pmsver;
+	u16					min_period;
+	u16					counter_sz;
+
+#define SPE_PMU_FEAT_FILT_EVT			(1UL << 0)
+#define SPE_PMU_FEAT_FILT_TYP			(1UL << 1)
+#define SPE_PMU_FEAT_FILT_LAT			(1UL << 2)
+#define SPE_PMU_FEAT_ARCH_INST			(1UL << 3)
+#define SPE_PMU_FEAT_LDS			(1UL << 4)
+#define SPE_PMU_FEAT_ERND			(1UL << 5)
+#define SPE_PMU_FEAT_DEV_PROBED			(1UL << 63)
+	u64					features;
+
+	u16					max_record_sz;
+	u16					align;
+	struct perf_output_handle __percpu	*handle;
+};
+
+#define to_spe_pmu(p) (container_of(p, struct arm_spe_pmu, pmu))
+
+#define ARMV8_SPE_PDEV_NAME "arm,spe-v1"
+
+#endif /* CONFIG_ARM_SPE_PMU */
+
+#endif /* __ARM_SPE_PMU_H__ */
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread
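
With the structure and to_spe_pmu() now visible outside the driver, code that
is handed the embedded struct pmu can recover the SPE instance the same way
the driver does. An illustrative helper (hypothetical, not part of the
series):

#include <linux/perf/arm_spe_pmu.h>

/*
 * Illustrative only: recover the arm_spe_pmu instance from the embedded
 * struct pmu and read one of its fields.
 */
static u16 spe_pmu_version(struct pmu *pmu)
{
	struct arm_spe_pmu *spe_pmu = to_spe_pmu(pmu);

	return spe_pmu->pmsver;
}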

* [RFC PATCH v5 16/38] KVM: arm64: Allow SPE emulation when the SPE hardware is present
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

KVM SPE emulation requires at least one physical CPU to support SPE.  Each
time the SPE driver successfully probes the SPE hardware, keep track of the
CPUs that belong to the SPE instance and enable the static key that allows
SPE to be emulated by KVM.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_spe.h | 20 ++++++++++++++++++++
 arch/arm64/kvm/Makefile          |  1 +
 arch/arm64/kvm/arm.c             |  1 +
 arch/arm64/kvm/spe.c             | 28 ++++++++++++++++++++++++++++
 drivers/perf/arm_spe_pmu.c       |  2 ++
 include/linux/perf/arm_spe_pmu.h |  6 ++++++
 6 files changed, 58 insertions(+)
 create mode 100644 arch/arm64/include/asm/kvm_spe.h
 create mode 100644 arch/arm64/kvm/spe.c

diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
new file mode 100644
index 000000000000..8f8b7dd7fd90
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ */
+
+#ifndef __ARM64_KVM_SPE_H__
+#define __ARM64_KVM_SPE_H__
+
+#ifdef CONFIG_KVM_ARM_SPE
+DECLARE_STATIC_KEY_FALSE(kvm_spe_available);
+
+static __always_inline bool kvm_supports_spe(void)
+{
+	return static_branch_likely(&kvm_spe_available);
+}
+#else
+#define kvm_supports_spe()	(false)
+#endif /* CONFIG_KVM_ARM_SPE */
+
+#endif /* __ARM64_KVM_SPE_H__ */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 989bb5dad2c8..86092a0f8367 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,3 +25,4 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
+kvm-$(CONFIG_KVM_ARM_SPE)     += spe.o
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2cdb18d9a740..b2997b919be2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -37,6 +37,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_spe.h>
 #include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
new file mode 100644
index 000000000000..6cd0e24ddeec
--- /dev/null
+++ b/arch/arm64/kvm/spe.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ */
+
+#include <linux/cpumask.h>
+#include <linux/kvm_host.h>
+#include <linux/perf/arm_spe_pmu.h>
+
+#include <asm/kvm_spe.h>
+
+DEFINE_STATIC_KEY_FALSE(kvm_spe_available);
+
+static cpumask_t supported_cpus;
+static DEFINE_MUTEX(supported_cpus_lock);
+
+void kvm_host_spe_init(struct arm_spe_pmu *spe_pmu)
+{
+	if (!spe_pmu->pmsver)
+		return;
+
+	mutex_lock(&supported_cpus_lock);
+
+	static_branch_enable(&kvm_spe_available);
+	cpumask_or(&supported_cpus, &supported_cpus, &spe_pmu->supported_cpus);
+
+	mutex_unlock(&supported_cpus_lock);
+}
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index ccb92c182527..4ffc02a98282 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -1183,6 +1183,8 @@ static int arm_spe_pmu_device_probe(struct platform_device *pdev)
 	if (ret)
 		goto out_teardown_dev;
 
+	kvm_host_spe_init(spe_pmu);
+
 	return 0;
 
 out_teardown_dev:
diff --git a/include/linux/perf/arm_spe_pmu.h b/include/linux/perf/arm_spe_pmu.h
index 7711e59c5727..505a8867daad 100644
--- a/include/linux/perf/arm_spe_pmu.h
+++ b/include/linux/perf/arm_spe_pmu.h
@@ -44,6 +44,12 @@ struct arm_spe_pmu {
 
 #define ARMV8_SPE_PDEV_NAME "arm,spe-v1"
 
+#ifdef CONFIG_KVM_ARM_SPE
+void kvm_host_spe_init(struct arm_spe_pmu *spe_pmu);
+#else
+#define kvm_host_spe_init(x)	do { } while (0)
+#endif
+
 #endif /* CONFIG_ARM_SPE_PMU */
 
 #endif /* __ARM_SPE_PMU_H__ */
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 17/38] KVM: arm64: Allow userspace to set the SPE feature only if SPE is present
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Check that KVM SPE emulation is supported before allowing userspace to set
the KVM_ARM_VCPU_SPE feature.

According to ARM DDI 0487G.a, page D9-2946, the Profiling Buffer is disabled
if the owning Exception level is 32 bit; therefore reject the SPE feature if
the VCPU's execution state at EL1 is AArch32.
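
For context, a VMM requests the feature at VCPU initialization roughly as
sketched below (illustrative only; vm_fd/vcpu_fd are assumed to be open
descriptors and error handling is omitted):

    struct kvm_vcpu_init init;

    ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init);
    init.features[KVM_ARM_VCPU_SPE / 32] |= 1U << (KVM_ARM_VCPU_SPE % 32);
    /* must not be combined with KVM_ARM_VCPU_EL1_32BIT, see above */
    ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);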

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/include/asm/kvm_spe.h  |  7 +++++++
 arch/arm64/kvm/reset.c            |  8 ++++++++
 arch/arm64/kvm/spe.c              | 16 ++++++++++++++++
 4 files changed, 34 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0ebdef158020..8b3faed48914 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -815,6 +815,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 #define kvm_vcpu_has_pmu(vcpu)					\
 	(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
 
+#define kvm_vcpu_has_spe(vcpu)					\
+	(test_bit(KVM_ARM_VCPU_SPE, (vcpu)->arch.features))
+
 int kvm_trng_call(struct kvm_vcpu *vcpu);
 #ifdef CONFIG_KVM
 extern phys_addr_t hyp_mem_base;
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 8f8b7dd7fd90..d33a46a74f78 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -13,8 +13,15 @@ static __always_inline bool kvm_supports_spe(void)
 {
 	return static_branch_likely(&kvm_spe_available);
 }
+
+int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu);
 #else
 #define kvm_supports_spe()	(false)
+
+static inline int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
 #endif /* CONFIG_KVM_ARM_SPE */
 
 #endif /* __ARM64_KVM_SPE_H__ */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 426bd7fbc3fd..8198156978bc 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -27,6 +27,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_spe.h>
 #include <asm/virt.h>
 
 /* Maximum phys_shift supported for any VM on this host */
@@ -251,6 +252,13 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		goto out;
 	}
 
+	if (kvm_vcpu_has_spe(vcpu)) {
+		if (kvm_spe_vcpu_enable_spe(vcpu)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 6cd0e24ddeec..7c6f94358cc1 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -7,6 +7,7 @@
 #include <linux/kvm_host.h>
 #include <linux/perf/arm_spe_pmu.h>
 
+#include <asm/kvm_emulate.h>
 #include <asm/kvm_spe.h>
 
 DEFINE_STATIC_KEY_FALSE(kvm_spe_available);
@@ -26,3 +27,18 @@ void kvm_host_spe_init(struct arm_spe_pmu *spe_pmu)
 
 	mutex_unlock(&supported_cpus_lock);
 }
+
+int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_supports_spe())
+		return -EINVAL;
+
+	/*
+	 * The Profiling Buffer is disabled if the owning Exception level is
+	 * aarch32.
+	 */
+	if (vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT))
+		return -EINVAL;
+
+	return 0;
+}
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 18/38] KVM: arm64: Expose SPE version to guests
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Set the ID_AA64DFR0_EL1.PMSVer field to a non-zero value if the VCPU SPE
feature is set. The SPE version is capped at FEAT_SPEv1p1 because KVM doesn't
yet implement freezing of PMU event counters on an SPE buffer management
event.
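
As an illustrative aside (not part of the patch), a guest can observe the
capped value with the usual cpufeature helpers; a sketch:

    u64 dfr0 = read_sysreg(id_aa64dfr0_el1);
    unsigned int pmsver = cpuid_feature_extract_unsigned_field(dfr0,
                                                ID_AA64DFR0_PMSVER_SHIFT);

    /* With this patch, pmsver reads as at most ID_AA64DFR0_PMSVER_8_3 (FEAT_SPEv1p1). */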

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e3ec1a44f94d..c36df734c1ad 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1105,8 +1105,10 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 		val = cpuid_feature_cap_perfmon_field(val,
 						      ID_AA64DFR0_PMUVER_SHIFT,
 						      kvm_vcpu_has_pmu(vcpu) ? ID_AA64DFR0_PMUVER_8_4 : 0);
-		/* Hide SPE from guests */
-		val &= ~ARM64_FEATURE_MASK(ID_AA64DFR0_PMSVER);
+		/* Limit guests to SPE for ARMv8.3 */
+		val = cpuid_feature_cap_perfmon_field(val,
+						      ID_AA64DFR0_PMSVER_SHIFT,
+						      kvm_vcpu_has_spe(vcpu) ? ID_AA64DFR0_PMSVER_8_3 : 0);
 		break;
 	case SYS_ID_DFR0_EL1:
 		/* Limit guests to PMUv3 for ARMv8.4 */
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 19/38] KVM: arm64: Do not run a VCPU on a CPU without SPE
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

The kernel allows heterogeneous systems where FEAT_SPE is not present on
all CPUs. This presents a challenge for KVM, as it will have to touch the
SPE registers when emulating SPE for a guest, and those accesses will cause
an undefined exception if SPE is not present on the CPU.

Avoid this situation by keeping a cpumask of CPUs that the VCPU is
allowed to run on, which for SPE is the union of all CPUs that support
SPE, and refuse to run the VCPU on a CPU which is not part of that
cpumask.
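
For completeness, a VMM that wants to avoid the resulting KVM_RUN failure
can pin its VCPU threads to the SPE-capable CPUs; a minimal sketch, where
spe_cpu and vcpu_fd are placeholders and <sched.h>, <errno.h>, <stdio.h>,
<sys/ioctl.h> and <linux/kvm.h> are assumed:

    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(spe_cpu, &set);                     /* a CPU known to implement SPE */
    sched_setaffinity(0, sizeof(set), &set);    /* 0 == calling (VCPU) thread */

    if (ioctl(vcpu_fd, KVM_RUN, 0) < 0 && errno == ENOEXEC)
            fprintf(stderr, "VCPU was scheduled on a CPU without SPE\n");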

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/arm.c              | 15 +++++++++++++++
 arch/arm64/kvm/spe.c              |  2 ++
 3 files changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8b3faed48914..96ce98f6135d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -405,6 +405,9 @@ struct kvm_vcpu_arch {
 		u64 last_steal;
 		gpa_t base;
 	} steal;
+
+	cpumask_var_t supported_cpus;
+	bool cpu_not_supported;
 };
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b2997b919be2..8a7c01d1df58 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -351,6 +351,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
 
+	if (!zalloc_cpumask_var(&vcpu->arch.supported_cpus, GFP_KERNEL))
+		return -ENOMEM;
+
 	/* Set up the timer */
 	kvm_timer_vcpu_init(vcpu);
 
@@ -378,6 +381,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.has_run_once && unlikely(!irqchip_in_kernel(vcpu->kvm)))
 		static_branch_dec(&userspace_irqchip_in_use);
 
+	free_cpumask_var(vcpu->arch.supported_cpus);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
 	kvm_timer_vcpu_terminate(vcpu);
 	kvm_pmu_vcpu_destroy(vcpu);
@@ -456,6 +460,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (vcpu_has_ptrauth(vcpu))
 		vcpu_ptrauth_disable(vcpu);
 	kvm_arch_vcpu_load_debug_state_flags(vcpu);
+
+	if (!cpumask_empty(vcpu->arch.supported_cpus) &&
+	    !cpumask_test_cpu(smp_processor_id(), vcpu->arch.supported_cpus))
+		vcpu->arch.cpu_not_supported = true;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -893,6 +901,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 */
 		preempt_disable();
 
+		if (unlikely(vcpu->arch.cpu_not_supported)) {
+			vcpu->arch.cpu_not_supported = false;
+			ret = -ENOEXEC;
+			preempt_enable();
+			continue;
+		}
+
 		kvm_pmu_flush_hwstate(vcpu);
 
 		local_irq_disable();
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 7c6f94358cc1..f3863728bab6 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -40,5 +40,7 @@ int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 	if (vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT))
 		return -EINVAL;
 
+	cpumask_copy(vcpu->arch.supported_cpus, &supported_cpus);
+
 	return 0;
 }
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 20/38] KVM: arm64: Add a new VCPU device control group for SPE
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland
  Cc: Sudeep Holla

From: Sudeep Holla <sudeep.holla@arm.com>

Add a new VCPU device control group to control various aspects of KVM's SPE
emulation. Functionality will be added in later patches.

[ Alexandru E: Rewrote patch ]
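
For reference, the attributes in this group are reached through the generic
VCPU device attribute ioctls; a sketch of the calling convention (vcpu_fd is
an open VCPU descriptor, and the concrete KVM_ARM_VCPU_SPE_* attributes only
appear in the following patches):

    int value;          /* attribute-specific payload */
    struct kvm_device_attr attr = {
            .group = KVM_ARM_VCPU_SPE_CTRL,
            .attr  = 0,  /* one of the KVM_ARM_VCPU_SPE_* attributes */
            .addr  = (__u64)(unsigned long)&value,
    };

    ioctl(vcpu_fd, KVM_HAS_DEVICE_ATTR, &attr); /* probe */
    ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr); /* write */
    ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr); /* read back */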

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/devices/vcpu.rst |  5 +++++
 arch/arm64/include/asm/kvm_spe.h        | 20 ++++++++++++++++++++
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/guest.c                  | 10 ++++++++++
 arch/arm64/kvm/spe.c                    | 15 +++++++++++++++
 5 files changed, 51 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 60a29972d3f1..c200545d4950 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -231,3 +231,8 @@ From the destination VMM process:
 
 7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
    respective value derived in the previous step.
+
+5. GROUP: KVM_ARM_VCPU_SPE_CTRL
+===============================
+
+:Architectures: ARM64
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index d33a46a74f78..6443f9b66e4c 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -15,6 +15,10 @@ static __always_inline bool kvm_supports_spe(void)
 }
 
 int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu);
+
+int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 #else
 #define kvm_supports_spe()	(false)
 
@@ -22,6 +26,22 @@ static inline int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 {
 	return 0;
 }
+
+static inline int kvm_spe_set_attr(struct kvm_vcpu *vcpu,
+				   struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+static inline int kvm_spe_get_attr(struct kvm_vcpu *vcpu,
+				   struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+static inline int kvm_spe_has_attr(struct kvm_vcpu *vcpu,
+				   struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
 #endif /* CONFIG_KVM_ARM_SPE */
 
 #endif /* __ARM64_KVM_SPE_H__ */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 9f0a8ea50ea9..7159a1e23da2 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -368,6 +368,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
 #define KVM_ARM_VCPU_PVTIME_CTRL	2
 #define   KVM_ARM_VCPU_PVTIME_IPA	0
+#define KVM_ARM_VCPU_SPE_CTRL		3
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index e116c7767730..d5b961a80592 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -24,6 +24,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_spe.h>
 #include <asm/sigcontext.h>
 
 #include "trace.h"
@@ -954,6 +955,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
 	case KVM_ARM_VCPU_PVTIME_CTRL:
 		ret = kvm_arm_pvtime_set_attr(vcpu, attr);
 		break;
+	case KVM_ARM_VCPU_SPE_CTRL:
+		ret = kvm_spe_set_attr(vcpu, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -977,6 +981,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 	case KVM_ARM_VCPU_PVTIME_CTRL:
 		ret = kvm_arm_pvtime_get_attr(vcpu, attr);
 		break;
+	case KVM_ARM_VCPU_SPE_CTRL:
+		ret = kvm_spe_get_attr(vcpu, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -1000,6 +1007,9 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 	case KVM_ARM_VCPU_PVTIME_CTRL:
 		ret = kvm_arm_pvtime_has_attr(vcpu, attr);
 		break;
+	case KVM_ARM_VCPU_SPE_CTRL:
+		ret = kvm_spe_has_attr(vcpu, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index f3863728bab6..e3f82be398a6 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -44,3 +44,18 @@ int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 
 	return 0;
 }
+
+int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 21/38] KVM: arm64: Add SPE VCPU device attribute to set the interrupt number
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland
  Cc: Sudeep Holla

From: Sudeep Holla <sudeep.holla@arm.com>

Add KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_IRQ) to allow the user to set
the interrupt number for the buffer management interrupt.

[ Alexandru E: Split from "KVM: arm64: Add a new VCPU device control group
               for SPE" ]

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/devices/vcpu.rst | 19 ++++++
 arch/arm64/include/asm/kvm_host.h       |  2 +
 arch/arm64/include/asm/kvm_spe.h        | 10 +++
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/spe.c                    | 86 +++++++++++++++++++++++++
 5 files changed, 118 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index c200545d4950..a27b149c3b8b 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -236,3 +236,22 @@ From the destination VMM process:
 ===============================
 
 :Architectures: ARM64
+
+5.1 ATTRIBUTE: KVM_ARM_VCPU_SPE_IRQ
+-----------------------------------
+
+:Parameters: in kvm_device_attr.addr the address for the Profiling Buffer
+             management interrupt number as a pointer to an int
+
+Returns:
+
+	 =======  ======================================================
+	 -EBUSY   Interrupt number already set for this VCPU
+	 -EFAULT  Error accessing the buffer management interrupt number
+	 -EINVAL  Invalid interrupt number
+	 -ENXIO   SPE not supported or not properly configured
+	 =======  ======================================================
+
+Specifies the Profiling Buffer management interrupt number. The interrupt
+number must be a PPI and must be the same for every VCPU. SPE emulation
+requires an in-kernel vGIC implementation.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 96ce98f6135d..8c6e6eef0ae9 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_spe.h>
 #include <asm/thread_info.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -357,6 +358,7 @@ struct kvm_vcpu_arch {
 	struct vgic_cpu vgic_cpu;
 	struct arch_timer_cpu timer_cpu;
 	struct kvm_pmu pmu;
+	struct kvm_vcpu_spe spe;
 
 	/*
 	 * Anything that is not used directly from assembly code goes
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 6443f9b66e4c..a5484953d06f 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -6,6 +6,8 @@
 #ifndef __ARM64_KVM_SPE_H__
 #define __ARM64_KVM_SPE_H__
 
+#include <linux/kvm.h>
+
 #ifdef CONFIG_KVM_ARM_SPE
 DECLARE_STATIC_KEY_FALSE(kvm_spe_available);
 
@@ -14,6 +16,11 @@ static __always_inline bool kvm_supports_spe(void)
 	return static_branch_likely(&kvm_spe_available);
 }
 
+struct kvm_vcpu_spe {
+	bool initialized;	/* SPE initialized for the VCPU */
+	int irq_num;		/* Buffer management interrupt number */
+};
+
 int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu);
 
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
@@ -22,6 +29,9 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 #else
 #define kvm_supports_spe()	(false)
 
+struct kvm_vcpu_spe {
+};
+
 static inline int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 {
 	return 0;
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 7159a1e23da2..c55d94a1a8f5 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -369,6 +369,7 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_ARM_VCPU_PVTIME_CTRL	2
 #define   KVM_ARM_VCPU_PVTIME_IPA	0
 #define KVM_ARM_VCPU_SPE_CTRL		3
+#define   KVM_ARM_VCPU_SPE_IRQ		0
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index e3f82be398a6..7520d7925460 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -45,17 +45,103 @@ int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static bool kvm_vcpu_supports_spe(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_supports_spe())
+		return false;
+
+	if (!kvm_vcpu_has_spe(vcpu))
+		return false;
+
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return false;
+
+	return true;
+}
+
+static bool kvm_spe_irq_is_valid(struct kvm *kvm, int irq)
+{
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	if (!irq_is_ppi(irq))
+		return false;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!vcpu->arch.spe.irq_num)
+			continue;
+
+		if (vcpu->arch.spe.irq_num != irq)
+			return false;
+	}
+
+	return true;
+}
+
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
+	if (!kvm_vcpu_supports_spe(vcpu))
+		return -ENXIO;
+
+	if (vcpu->arch.spe.initialized)
+		return -EBUSY;
+
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_SPE_IRQ: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int irq;
+
+		if (vcpu->arch.spe.irq_num)
+			return -EBUSY;
+
+		if (get_user(irq, uaddr))
+			return -EFAULT;
+
+		if (!kvm_spe_irq_is_valid(vcpu->kvm, irq))
+			return -EINVAL;
+
+		kvm_debug("Set KVM_ARM_VCPU_SPE_IRQ: %d\n", irq);
+		vcpu->arch.spe.irq_num = irq;
+		return 0;
+	}
+	}
+
 	return -ENXIO;
 }
 
 int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
+	if (!kvm_vcpu_supports_spe(vcpu))
+		return -ENXIO;
+
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_SPE_IRQ: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int irq;
+
+		if (!vcpu->arch.spe.irq_num)
+			return -ENXIO;
+
+		irq = vcpu->arch.spe.irq_num;
+		if (put_user(irq, uaddr))
+			return -EFAULT;
+
+		return 0;
+	}
+	}
+
 	return -ENXIO;
 }
 
 int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
+	if (!kvm_vcpu_supports_spe(vcpu))
+		return -ENXIO;
+
+	switch(attr->attr) {
+	case KVM_ARM_VCPU_SPE_IRQ:
+		return 0;
+	}
+
 	return -ENXIO;
 }
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 22/38] KVM: arm64: Add SPE VCPU device attribute to initialize SPE
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland
  Cc: Sudeep Holla

From: Sudeep Holla <sudeep.holla@arm.com>

Add KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_INIT) VCPU ioctl to initialize
SPE. Initialization can only be done once for a VCPU. If the feature bit is
set, then SPE must be initialized before the VCPU can be run.

[ Alexandru E: Split from "KVM: arm64: Add a new VCPU device control group
	       for SPE" ]

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/devices/vcpu.rst | 16 ++++++++++++++++
 arch/arm64/include/asm/kvm_spe.h        |  6 ++++++
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/arm.c                    |  4 ++++
 arch/arm64/kvm/spe.c                    | 24 ++++++++++++++++++++++++
 5 files changed, 51 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index a27b149c3b8b..0ed852315664 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -255,3 +255,19 @@ Returns:
 Specifies the Profiling Buffer management interrupt number. The interrupt number
 must be a PPI and the interrupt number must be the same for each VCPU. SPE
 emulation requires an in-kernel vGIC implementation.
+
+5.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_INIT
+-----------------------------------
+
+:Parameters: no additional parameter in kvm_device_attr.addr
+
+Returns:
+
+	 =======  ============================================
+	 -EBUSY   SPE already initialized for this VCPU
+	 -ENXIO   SPE not supported or not properly configured
+	 =======  ============================================
+
+Request initialization of the Statistical Profiling Extension for this VCPU.
+Must be done after initializing the in-kernel irqchip and after setting the
+Profiling Buffer management interrupt number for the VCPU.
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index a5484953d06f..14df2c830fda 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -22,6 +22,7 @@ struct kvm_vcpu_spe {
 };
 
 int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu);
+int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
@@ -37,6 +38,11 @@ static inline int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
 static inline int kvm_spe_set_attr(struct kvm_vcpu *vcpu,
 				   struct kvm_device_attr *attr)
 {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index c55d94a1a8f5..d4c0b53a5fb2 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -370,6 +370,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_PVTIME_IPA	0
 #define KVM_ARM_VCPU_SPE_CTRL		3
 #define   KVM_ARM_VCPU_SPE_IRQ		0
+#define   KVM_ARM_VCPU_SPE_INIT		1
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 8a7c01d1df58..5270f3b9886c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -652,6 +652,10 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 		return ret;
 
 	ret = kvm_arm_pmu_v3_enable(vcpu);
+	if (ret)
+		return ret;
+
+	ret = kvm_spe_vcpu_first_run_init(vcpu);
 
 	/*
 	 * Initialize traps for protected VMs.
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 7520d7925460..a3d5bcd1a96b 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -45,6 +45,17 @@ int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_vcpu_has_spe(vcpu))
+		return 0;
+
+	if (!vcpu->arch.spe.initialized)
+		return -EINVAL;
+
+	return 0;
+}
+
 static bool kvm_vcpu_supports_spe(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_supports_spe())
@@ -104,6 +115,18 @@ int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		vcpu->arch.spe.irq_num = irq;
 		return 0;
 	}
+	case KVM_ARM_VCPU_SPE_INIT:
+		if (!vcpu->arch.spe.irq_num)
+			return -ENXIO;
+
+		if (!vgic_initialized(vcpu->kvm))
+			return -ENXIO;
+
+		if (kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num, &vcpu->arch.spe))
+			return -ENXIO;
+
+		vcpu->arch.spe.initialized = true;
+		return 0;
 	}
 
 	return -ENXIO;
@@ -140,6 +163,7 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 	switch(attr->attr) {
 	case KVM_ARM_VCPU_SPE_IRQ:
+	case KVM_ARM_VCPU_SPE_INIT:
 		return 0;
 	}
 
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 23/38] KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Allow the guest running at EL1 to use SPE when that feature is enabled for
the VCPU by setting the profiling buffer owning translation regime to EL1&0
and disabling traps to the profiling control registers. Keep trapping
accesses to the buffer control registers because that's needed to emulate
the buffer management interrupt.
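
For reference, the relevant MDCR_EL2.E2PB encodings are roughly the
following (paraphrased here for convenience; the Arm ARM is the
authoritative source):

	/*
	 * MDCR_EL2.E2PB, bits [13:12]:
	 *
	 * 0b00 - the profiling buffer uses the EL2 (or EL2&0) translation
	 *        regime and EL1 accesses to the buffer control registers
	 *        are trapped to EL2 (what KVM keeps using when the VCPU
	 *        does not have SPE).
	 * 0b10 - the profiling buffer uses the EL1&0 translation regime and
	 *        EL1 accesses to the buffer control registers are still
	 *        trapped to EL2 (MDCR_EL2_E2PB_TRAP_EL1 in this patch).
	 * 0b11 - the profiling buffer uses the EL1&0 translation regime and
	 *        accesses to the buffer control registers are not trapped.
	 */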

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h |  1 +
 arch/arm64/kvm/debug.c           | 23 +++++++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index a39fcf318c77..3c98bcf7234f 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -302,6 +302,7 @@
 #define MDCR_EL2_TPMS		(UL(1) << 14)
 #define MDCR_EL2_E2PB_MASK	(UL(0x3))
 #define MDCR_EL2_E2PB_SHIFT	(UL(12))
+#define MDCR_EL2_E2PB_TRAP_EL1	(UL(2))
 #define MDCR_EL2_TDRA		(UL(1) << 11)
 #define MDCR_EL2_TDOSA		(UL(1) << 10)
 #define MDCR_EL2_TDA		(UL(1) << 9)
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index db9361338b2a..64629b4bb036 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -77,24 +77,39 @@ void kvm_arm_init_debug(void)
  *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
  *  - Debug ROM Address (MDCR_EL2_TDRA)
  *  - OS related registers (MDCR_EL2_TDOSA)
- *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
  *  - Self-hosted Trace Filter controls (MDCR_EL2_TTRF)
  *  - Self-hosted Trace (MDCR_EL2_TTRF/MDCR_EL2_E2TB)
  */
 static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
-	 * to disable guest access to the profiling and trace buffers
+	 * This also clears MDCR_EL2_E2TB_MASK to disable guest access to the
+	 * trace buffers.
 	 */
 	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
 	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
-				MDCR_EL2_TPMS |
 				MDCR_EL2_TTRF |
 				MDCR_EL2_TPMCR |
 				MDCR_EL2_TDRA |
 				MDCR_EL2_TDOSA);
 
+	if (kvm_supports_spe() && kvm_vcpu_has_spe(vcpu)) {
+		/*
+		 * Use EL1&0 for the profiling buffer translation regime and
+		 * trap accesses to the buffer control registers; leave
+		 * MDCR_EL2.TPMS unset and do not trap accesses to the profiling
+		 * control registers.
+		 */
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_E2PB_TRAP_EL1 << MDCR_EL2_E2PB_SHIFT;
+	} else {
+		/*
+		 * Trap accesses to the profiling control registers; leave
+		 * MDCR_EL2.E2PB unset and use the EL2&0 translation regime for
+		 * the profiling buffer.
+		 */
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_TPMS;
+	}
+
 	/* Is the VM being debugged by userspace? */
 	if (vcpu->guest_debug)
 		/* Route all software debug exceptions to EL2 */
-- 
2.33.1


* [RFC PATCH v5 24/38] KVM: arm64: Move accesses to MDCR_EL2 out of __{activate, deactivate}_traps_common
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

To run a guest with SPE, MDCR_EL2 must be configured such that the buffer
owning regime is EL1&0. With VHE enabled, the guest runs at EL2, and
switching the owning regime to EL1&0 at vcpu_load() time, only restoring
it at vcpu_put(), would mean creating an extended profiling blackout
window for the host.

Move the accesses to MDCR_EL2 out of __activate_traps_common() and
__deactivate_traps_common() to prepare for executing them later in the run
loop in the VHE case.

No functional change intended.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/hyp/include/hyp/switch.h | 16 +++++++++++-----
 arch/arm64/kvm/hyp/nvhe/switch.c        |  2 ++
 arch/arm64/kvm/hyp/vhe/switch.c         |  2 ++
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 7a0af1d39303..cf9b66a2acb0 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -97,20 +97,26 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
 		write_sysreg(0, pmselr_el0);
 		write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
 	}
-
-	vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
-	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 }
 
 static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
 {
-	write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
-
 	write_sysreg(0, hstr_el2);
 	if (kvm_arm_support_pmu_v3())
 		write_sysreg(0, pmuserenr_el0);
 }
 
+static inline void __restore_guest_mdcr_el2(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
+	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+}
+
+static inline void __restore_host_mdcr_el2(struct kvm_vcpu *vcpu)
+{
+	write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
+}
+
 static inline void ___activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 hcr = vcpu->arch.hcr_el2;
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index c0e3fed26d93..d1f55514effc 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -41,6 +41,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 
 	___activate_traps(vcpu);
 	__activate_traps_common(vcpu);
+	__restore_guest_mdcr_el2(vcpu);
 
 	val = vcpu->arch.cptr_el2;
 	val |= CPTR_EL2_TTA | CPTR_EL2_TAM;
@@ -91,6 +92,7 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
 		isb();
 	}
 
+	__restore_host_mdcr_el2(vcpu);
 	__deactivate_traps_common(vcpu);
 
 	write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 5a2cb5d9bc4b..f85a13bfad3d 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -89,10 +89,12 @@ NOKPROBE_SYMBOL(__deactivate_traps);
 void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 {
 	__activate_traps_common(vcpu);
+	__restore_guest_mdcr_el2(vcpu);
 }
 
 void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 {
+	__restore_host_mdcr_el2(vcpu);
 	__deactivate_traps_common(vcpu);
 }
 
-- 
2.33.1


* [RFC PATCH v5 25/38] KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

When a VCPU has the SPE feature, MDCR_EL2 sets the buffer owning regime to
EL1&0. Write the guest's MDCR_EL2 value as late as possible and restore the
host's value as soon as possible at each world switch to make the profiling
blackout window as small as possible for the host.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/debug.c          | 14 ++++++++++++--
 arch/arm64/kvm/hyp/vhe/switch.c | 15 +++++++++++++--
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 64629b4bb036..ee764ea0da5b 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -249,9 +249,19 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
 
 	/* Write mdcr_el2 changes since vcpu_load on VHE systems */
-	if (has_vhe() && orig_mdcr_el2 != vcpu->arch.mdcr_el2)
-		write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+	if (has_vhe()) {
+		/*
+		 * MDCR_EL2 can modify the SPE buffer owning regime, defer the
+		 * write until the VCPU is run.
+		 */
+		if (kvm_vcpu_has_spe(vcpu))
+			goto out;
+
+		if (orig_mdcr_el2 != vcpu->arch.mdcr_el2)
+			write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+	}
 
+out:
 	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
 }
 
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index f85a13bfad3d..1a46a4840d17 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -35,6 +35,9 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
+	if (kvm_vcpu_has_spe(vcpu))
+		__restore_guest_mdcr_el2(vcpu);
+
 	___activate_traps(vcpu);
 
 	val = read_sysreg(cpacr_el1);
@@ -70,6 +73,9 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
 {
 	extern char vectors[];	/* kernel exception vectors */
 
+	if (kvm_vcpu_has_spe(vcpu))
+		__restore_host_mdcr_el2(vcpu);
+
 	___deactivate_traps(vcpu);
 
 	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
@@ -82,6 +88,7 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
 	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
 
 	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
+
 	write_sysreg(vectors, vbar_el1);
 }
 NOKPROBE_SYMBOL(__deactivate_traps);
@@ -89,12 +96,16 @@ NOKPROBE_SYMBOL(__deactivate_traps);
 void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 {
 	__activate_traps_common(vcpu);
-	__restore_guest_mdcr_el2(vcpu);
+
+	if (!kvm_vcpu_has_spe(vcpu))
+		__restore_guest_mdcr_el2(vcpu);
 }
 
 void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 {
-	__restore_host_mdcr_el2(vcpu);
+	if (!kvm_vcpu_has_spe(vcpu))
+		__restore_host_mdcr_el2(vcpu);
+
 	__deactivate_traps_common(vcpu);
 }
 
-- 
2.33.1


* [RFC PATCH v5 26/38] KVM: arm64: Add SPE system registers to VCPU context
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Add the SPE system registers to the VCPU context. Omitted are
PMBIDR_EL1, which cannot be trapped, and PMSIDR_EL1, which is a
read-only register. The registers that KVM traps are stored in the
sys_regs array on a write, and returned on a read; complete emulation
and save/restore for all registers on world switch will be added in
future patches.

KVM exposes FEAT_SPEv1p1 to guests in the ID_AA64DFR0_EL1 register and
doesn't trap accesses to the profiling control registers. If the hardware
supports FEAT_SPEv1p2, the guest will be able to access the PMSNEVFR_EL1
register, which is UNDEFINED for FEAT_SPEv1p1. However, this mismatch is
in keeping with how the architecture already behaves, because PMBIDR_EL1
is similar: the register is UNDEFINED if SPE is missing, but a VCPU
without the SPE feature can still read it because there is no (easy) way
for KVM to trap accesses to the register.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 12 +++++++
 arch/arm64/include/asm/kvm_spe.h  |  7 ++++
 arch/arm64/kvm/spe.c              | 10 ++++++
 arch/arm64/kvm/sys_regs.c         | 54 ++++++++++++++++++++++++-------
 4 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8c6e6eef0ae9..dd7746836477 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -237,6 +237,18 @@ enum vcpu_sysreg {
 	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
 	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
 
+       /* Statistical Profiling Extension Registers. */
+	PMSCR_EL1,      /* Statistical Profiling Control Register */
+	PMSICR_EL1,     /* Sampling Interval Counter Register */
+	PMSIRR_EL1,     /* Sampling Interval Reload Register */
+	PMSFCR_EL1,     /* Sampling Filter Control Register */
+	PMSEVFR_EL1,    /* Sampling Event Filter Register */
+	PMSLATFR_EL1,   /* Sampling Latency Filter Register */
+	PMBLIMITR_EL1,  /* Profiling Buffer Limit Address Register */
+	PMBPTR_EL1,     /* Profiling Buffer Write Pointer Register */
+	PMBSR_EL1,      /* Profiling Buffer Status/syndrome Register */
+	PMSCR_EL2,	/* Statistical Profiling Control Register, EL2 */
+
 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
 	IFSR32_EL2,	/* Instruction Fault Status Register */
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 14df2c830fda..7c2d5695120a 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -24,9 +24,13 @@ struct kvm_vcpu_spe {
 int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu);
 int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 
+void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val);
+u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg);
+
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+
 #else
 #define kvm_supports_spe()	(false)
 
@@ -43,6 +47,9 @@ static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static inline void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val) {}
+static inline u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg) { return 0; }
+
 static inline int kvm_spe_set_attr(struct kvm_vcpu *vcpu,
 				   struct kvm_device_attr *attr)
 {
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index a3d5bcd1a96b..e8a8aa7f10b9 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -56,6 +56,16 @@ int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val)
+{
+	__vcpu_sys_reg(vcpu, reg) = val;
+}
+
+u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg)
+{
+	return __vcpu_sys_reg(vcpu, reg);
+}
+
 static bool kvm_vcpu_supports_spe(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_supports_spe())
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c36df734c1ad..2026eaebcc31 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -590,6 +590,33 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	vcpu_write_sys_reg(vcpu, (1ULL << 31) | mpidr, MPIDR_EL1);
 }
 
+static unsigned int spe_visibility(const struct kvm_vcpu *vcpu,
+				  const struct sys_reg_desc *r)
+{
+	if (kvm_vcpu_has_spe(vcpu))
+		return 0;
+
+	return REG_HIDDEN;
+}
+
+static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{	int reg = r->reg;
+	u64 val = p->regval;
+
+	if (reg < PMBLIMITR_EL1) {
+		print_sys_reg_msg(p, "Unsupported guest SPE register access at: %lx [%08lx]\n",
+				  *vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
+	}
+
+	if (p->is_write)
+		kvm_spe_write_sysreg(vcpu, reg, val);
+	else
+		p->regval = kvm_spe_read_sysreg(vcpu, reg);
+
+	return true;
+}
+
 static unsigned int pmu_visibility(const struct kvm_vcpu *vcpu,
 				   const struct sys_reg_desc *r)
 {
@@ -989,6 +1016,10 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	  .reset = reset_pmevtyper,					\
 	  .access = access_pmu_evtyper, .reg = (PMEVTYPER0_EL0 + n), }
 
+#define SPE_SYS_REG(r)							\
+	SYS_DESC(r), .access = access_spe_reg, .reset = reset_val,	\
+	.val = 0, .visibility = spe_visibility
+
 static bool undef_access(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 			 const struct sys_reg_desc *r)
 {
@@ -1582,18 +1613,17 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
-	{ SYS_DESC(SYS_PMSCR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSNEVFR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSICR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSIRR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSFCR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSEVFR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSLATFR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSIDR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMBPTR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMBSR_EL1), undef_access },
-	/* PMBIDR_EL1 is not trapped */
+	{ SPE_SYS_REG(SYS_PMSCR_EL1), .reg = PMSCR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSICR_EL1), .reg = PMSICR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSIRR_EL1), .reg = PMSIRR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSFCR_EL1), .reg = PMSFCR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSEVFR_EL1), .reg = PMSEVFR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSLATFR_EL1), .reg = PMSLATFR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSIDR_EL1), .reset = NULL },
+	{ SPE_SYS_REG(SYS_PMBLIMITR_EL1), .reg = PMBLIMITR_EL1 },
+	{ SPE_SYS_REG(SYS_PMBPTR_EL1), .reg = PMBPTR_EL1 },
+	{ SPE_SYS_REG(SYS_PMBSR_EL1), .reg = PMBSR_EL1 },
+	/* PMBIDR_EL1 and PMSCR_EL2 are not trapped */
 
 	{ PMU_SYS_REG(SYS_PMINTENSET_EL1),
 	  .access = access_pminten, .reg = PMINTENSET_EL1 },
-- 
2.33.1


* [RFC PATCH v5 27/38] KVM: arm64: nVHE: Save PMSCR_EL1 to the host context
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

The SPE registers are now part of the KVM register context; use the host
context to save the value of PMSCR_EL1 instead of a dedicated field in
host_debug_state.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h  |  2 --
 arch/arm64/include/asm/kvm_hyp.h   |  6 ++++--
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 10 ++++++----
 arch/arm64/kvm/hyp/nvhe/switch.c   |  4 ++--
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index dd7746836477..e29e9de42cfb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -360,8 +360,6 @@ struct kvm_vcpu_arch {
 	struct {
 		/* {Break,watch}point registers */
 		struct kvm_guest_debug_arch regs;
-		/* Statistical profiling extension */
-		u64 pmscr_el1;
 		/* Self-hosted trace */
 		u64 trfcr_el1;
 	} host_debug_state;
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 5afd14ab15b9..0a5ff4361069 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -84,8 +84,10 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
 void __debug_switch_to_host(struct kvm_vcpu *vcpu);
 
 #ifdef __KVM_NVHE_HYPERVISOR__
-void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu);
-void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
+void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
+				    struct kvm_cpu_context *host_ctxt);
+void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
+				       struct kvm_cpu_context *host_ctxt);
 #endif
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index df361d839902..565c31c15311 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -81,11 +81,12 @@ static void __debug_restore_trace(u64 trfcr_el1)
 	write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1);
 }
 
-void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
+void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
+				    struct kvm_cpu_context *host_ctxt)
 {
 	/* Disable and flush SPE data generation */
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_SPE)
-		__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
+		__debug_save_spe(__ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	/* Disable and flush Self-Hosted Trace generation */
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_TRBE)
 		__debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
@@ -96,10 +97,11 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 	__debug_switch_to_guest_common(vcpu);
 }
 
-void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
+void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
+				       struct kvm_cpu_context *host_ctxt)
 {
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_SPE)
-		__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
+		__debug_restore_spe(ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_TRBE)
 		__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index d1f55514effc..b6489e244025 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -290,7 +290,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 * translation regime to EL2 (via MDCR_EL2_E2PB == 0) and
 	 * before we load guest Stage1.
 	 */
-	__debug_save_host_buffers_nvhe(vcpu);
+	__debug_save_host_buffers_nvhe(vcpu, host_ctxt);
 
 	__kvm_adjust_pc(vcpu);
 
@@ -342,7 +342,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 * This must come after restoring the host sysregs, since a non-VHE
 	 * system may enable SPE here and make use of the TTBRs.
 	 */
-	__debug_restore_host_buffers_nvhe(vcpu);
+	__debug_restore_host_buffers_nvhe(vcpu, host_ctxt);
 
 	if (pmu_switch_needed)
 		__pmu_switch_to_host(host_ctxt);
-- 
2.33.1


* [RFC PATCH v5 28/38] KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Setting the KVM_ARM64_DEBUG_STATE_SAVE_SPE flag makes KVM stop profiling
and drain the buffer if the buffer is enabled when switching to the
guest, and then re-enable profiling on the return to the host. Rename the
flag to KVM_ARM64_DEBUG_SAVE_SPE_BUFFER to avoid any confusion with what
an SPE-enabled VCPU does, which is to save and restore the full SPE state
on every world switch, not just part of it some of the time. This also
matches the name of the function __debug_save_host_buffers_nvhe(), which
uses the flag to decide if the buffer should be drained.

KVM_ARM64_DEBUG_STATE_SAVE_TRBE receives the same treatment and is
renamed to KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER, for consistency and to
better reflect what it does.

CC: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h  | 24 ++++++++++++------------
 arch/arm64/kvm/debug.c             | 11 ++++++-----
 arch/arm64/kvm/hyp/nvhe/debug-sr.c |  8 ++++----
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e29e9de42cfb..082994f5fb0e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -443,18 +443,18 @@ struct kvm_vcpu_arch {
 })
 
 /* vcpu_arch flags field values: */
-#define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
-#define KVM_ARM64_FP_ENABLED		(1 << 1) /* guest FP regs loaded */
-#define KVM_ARM64_FP_HOST		(1 << 2) /* host FP regs loaded */
-#define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
-#define KVM_ARM64_HOST_SVE_ENABLED	(1 << 4) /* SVE enabled for EL0 */
-#define KVM_ARM64_GUEST_HAS_SVE		(1 << 5) /* SVE exposed to guest */
-#define KVM_ARM64_VCPU_SVE_FINALIZED	(1 << 6) /* SVE config completed */
-#define KVM_ARM64_GUEST_HAS_PTRAUTH	(1 << 7) /* PTRAUTH exposed to guest */
-#define KVM_ARM64_PENDING_EXCEPTION	(1 << 8) /* Exception pending */
-#define KVM_ARM64_EXCEPT_MASK		(7 << 9) /* Target EL/MODE */
-#define KVM_ARM64_DEBUG_STATE_SAVE_SPE	(1 << 12) /* Save SPE context if active  */
-#define KVM_ARM64_DEBUG_STATE_SAVE_TRBE	(1 << 13) /* Save TRBE context if active  */
+#define KVM_ARM64_DEBUG_DIRTY			(1 << 0)
+#define KVM_ARM64_FP_ENABLED			(1 << 1) /* guest FP regs loaded */
+#define KVM_ARM64_FP_HOST			(1 << 2) /* host FP regs loaded */
+#define KVM_ARM64_HOST_SVE_IN_USE		(1 << 3) /* backup for host TIF_SVE */
+#define KVM_ARM64_HOST_SVE_ENABLED		(1 << 4) /* SVE enabled for EL0 */
+#define KVM_ARM64_GUEST_HAS_SVE			(1 << 5) /* SVE exposed to guest */
+#define KVM_ARM64_VCPU_SVE_FINALIZED		(1 << 6) /* SVE config completed */
+#define KVM_ARM64_GUEST_HAS_PTRAUTH		(1 << 7) /* PTRAUTH exposed to guest */
+#define KVM_ARM64_PENDING_EXCEPTION		(1 << 8) /* Exception pending */
+#define KVM_ARM64_EXCEPT_MASK			(7 << 9) /* Target EL/MODE */
+#define KVM_ARM64_DEBUG_SAVE_SPE_BUFFER		(1 << 12) /* Save SPE buffer if active  */
+#define KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER	(1 << 13) /* Save TRBE buffer if active  */
 
 #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \
 				 KVM_GUESTDBG_USE_SW_BP | \
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index ee764ea0da5b..c09bbbe8f62b 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -299,22 +299,23 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
 		return;
 
 	dfr0 = read_sysreg(id_aa64dfr0_el1);
+
 	/*
 	 * If SPE is present on this CPU and is available at current EL,
-	 * we may need to check if the host state needs to be saved.
+	 * we may need to check if the host buffer needs to be drained.
 	 */
 	if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_PMSVER_SHIFT) &&
 	    !(read_sysreg_s(SYS_PMBIDR_EL1) & BIT(SYS_PMBIDR_EL1_P_SHIFT)))
-		vcpu->arch.flags |= KVM_ARM64_DEBUG_STATE_SAVE_SPE;
+		vcpu->arch.flags |= KVM_ARM64_DEBUG_SAVE_SPE_BUFFER;
 
 	/* Check if we have TRBE implemented and available at the host */
 	if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_TRBE_SHIFT) &&
 	    !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_PROG))
-		vcpu->arch.flags |= KVM_ARM64_DEBUG_STATE_SAVE_TRBE;
+		vcpu->arch.flags |= KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER;
 }
 
 void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.flags &= ~(KVM_ARM64_DEBUG_STATE_SAVE_SPE |
-			      KVM_ARM64_DEBUG_STATE_SAVE_TRBE);
+	vcpu->arch.flags &= ~(KVM_ARM64_DEBUG_SAVE_SPE_BUFFER |
+			      KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER);
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 565c31c15311..adabdcbbd753 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -85,10 +85,10 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *host_ctxt)
 {
 	/* Disable and flush SPE data generation */
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_SPE)
+	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
 		__debug_save_spe(__ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	/* Disable and flush Self-Hosted Trace generation */
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_TRBE)
+	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER)
 		__debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
 }
 
@@ -100,9 +100,9 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				       struct kvm_cpu_context *host_ctxt)
 {
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_SPE)
+	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
 		__debug_restore_spe(ctxt_sys_reg(host_ctxt, PMSCR_EL1));
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_TRBE)
+	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER)
 		__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
 }
 
-- 
2.33.1


* [RFC PATCH v5 29/38] KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

For non-VHE systems, make the SPE register state part of the context that
is saved and restored at each world switch. The SPE buffer management
interrupt will be handled in a later patch.
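
As a condensed illustration, this is the buffer drain sequence that the new
save paths below rely on. The patch open-codes it in each function; the
helper name here is made up purely for illustration:

static inline void spe_drain_buffer(void)	/* illustrative helper, not part of the patch */
{
	/* Nothing to drain if the profiling buffer is disabled. */
	if (!(read_sysreg_s(SYS_PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)))
		return;

	psb_csync();	/* complete any in-flight profiling operations */
	dsb(nsh);	/* wait for the buffer writes to complete */
	/*
	 * The buffer performs indirect writes to system registers; a context
	 * synchronization event is needed before the new PMBPTR_EL1 value is
	 * visible to subsequent direct reads.
	 */
	isb();
}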

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h        | 19 ++++++
 arch/arm64/kvm/hyp/include/hyp/spe-sr.h | 32 +++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile        |  1 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c      |  6 +-
 arch/arm64/kvm/hyp/nvhe/spe-sr.c        | 87 +++++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/switch.c        | 29 +++++++--
 6 files changed, 165 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/hyp/spe-sr.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/spe-sr.c

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 0a5ff4361069..08f020912103 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -88,6 +88,25 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *host_ctxt);
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				       struct kvm_cpu_context *host_ctxt);
+#ifdef CONFIG_KVM_ARM_SPE
+void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
+				struct kvm_cpu_context *host_ctxt);
+void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
+				 struct kvm_cpu_context *guest_ctxt);
+void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
+				   struct kvm_cpu_context *host_ctxt);
+void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
+				    struct kvm_cpu_context *guest_ctxt);
+#else
+static inline void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *host_ctxt) {}
+static inline void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *guest_ctxt) {}
+static inline void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *host_ctxt) {}
+static inline void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *guest_ctxt) {}
+#endif
 #endif
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/kvm/hyp/include/hyp/spe-sr.h b/arch/arm64/kvm/hyp/include/hyp/spe-sr.h
new file mode 100644
index 000000000000..d5f8f3ffc7d4
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/hyp/spe-sr.h
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ * Author: Alexandru Elisei <alexandru.elisei@arm.com>
+ */
+
+#ifndef __ARM64_KVM_HYP_SPE_SR_H__
+#define __ARM64_KVM_HYP_SPE_SR_H__
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_asm.h>
+
+static inline void __spe_save_common_state(struct kvm_cpu_context *ctxt)
+{
+	ctxt_sys_reg(ctxt, PMSICR_EL1) = read_sysreg_s(SYS_PMSICR_EL1);
+	ctxt_sys_reg(ctxt, PMSIRR_EL1) = read_sysreg_s(SYS_PMSIRR_EL1);
+	ctxt_sys_reg(ctxt, PMSFCR_EL1) = read_sysreg_s(SYS_PMSFCR_EL1);
+	ctxt_sys_reg(ctxt, PMSEVFR_EL1) = read_sysreg_s(SYS_PMSEVFR_EL1);
+	ctxt_sys_reg(ctxt, PMSLATFR_EL1) = read_sysreg_s(SYS_PMSLATFR_EL1);
+}
+
+static inline void __spe_restore_common_state(struct kvm_cpu_context *ctxt)
+{
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSICR_EL1), SYS_PMSICR_EL1);
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSIRR_EL1), SYS_PMSIRR_EL1);
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSFCR_EL1), SYS_PMSFCR_EL1);
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSEVFR_EL1), SYS_PMSEVFR_EL1);
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSLATFR_EL1), SYS_PMSLATFR_EL1);
+}
+
+#endif /* __ARM64_KVM_HYP_SPE_SR_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index c3c11974fa3b..06e66945eaab 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -17,6 +17,7 @@ obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
 	 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
+obj-$(CONFIG_KVM_ARM_SPE) += spe-sr.o
 obj-y += $(lib-objs)
 
 ##
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index adabdcbbd753..02171dcf29c3 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -85,7 +85,8 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *host_ctxt)
 {
 	/* Disable and flush SPE data generation */
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
+	if (!kvm_vcpu_has_spe(vcpu) &&
+	    vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
 		__debug_save_spe(__ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	/* Disable and flush Self-Hosted Trace generation */
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER)
@@ -100,7 +101,8 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				       struct kvm_cpu_context *host_ctxt)
 {
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
+	if (!kvm_vcpu_has_spe(vcpu) &&
+	    vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
 		__debug_restore_spe(ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER)
 		__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
new file mode 100644
index 000000000000..46e47c9fd08f
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ * Author: Alexandru Elisei <alexandru.elisei@arm.com>
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_hyp.h>
+
+#include <hyp/spe-sr.h>
+
+/*
+ * The buffer owning exception level remains EL1 during the world switch,
+ * which means that profiling is disabled for as long as we execute at EL2. KVM
+ * does not need to explicitly disable profiling, like it does when the VCPU
+ * does not have SPE and we change the buffer owning exception level, nor does
+ * it need to do any synchronization around sysreg save/restore.
+ */
+
+void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
+				struct kvm_cpu_context *host_ctxt)
+{
+	u64 pmblimitr;
+
+	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
+		psb_csync();
+		dsb(nsh);
+		/*
+		 * The buffer performs indirect writes to system registers, a
+		 * context synchronization event is needed before the new
+		 * PMBPTR_EL1 value is visible to subsequent direct reads.
+		 */
+		isb();
+	}
+
+	ctxt_sys_reg(host_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
+	ctxt_sys_reg(host_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	ctxt_sys_reg(host_ctxt, PMBLIMITR_EL1) = pmblimitr;
+	ctxt_sys_reg(host_ctxt, PMSCR_EL1) = read_sysreg_s(SYS_PMSCR_EL1);
+	ctxt_sys_reg(host_ctxt, PMSCR_EL2) = read_sysreg_el2(SYS_PMSCR);
+
+	__spe_save_common_state(host_ctxt);
+}
+
+void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
+				 struct kvm_cpu_context *guest_ctxt)
+{
+	if (read_sysreg_s(SYS_PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
+		psb_csync();
+		dsb(nsh);
+		/* Ensure hardware updates to PMBPTR_EL1 are visible. */
+		isb();
+	}
+
+	ctxt_sys_reg(guest_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
+	ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	/* PMBLIMITR_EL1 is updated only on a trapped write. */
+	ctxt_sys_reg(guest_ctxt, PMSCR_EL1) = read_sysreg_s(SYS_PMSCR_EL1);
+
+	__spe_save_common_state(guest_ctxt);
+}
+
+void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
+				   struct kvm_cpu_context *host_ctxt)
+{
+	__spe_restore_common_state(host_ctxt);
+
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMSCR_EL1), SYS_PMSCR_EL1);
+	write_sysreg_el2(ctxt_sys_reg(host_ctxt, PMSCR_EL2), SYS_PMSCR);
+}
+
+void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
+				    struct kvm_cpu_context *guest_ctxt)
+{
+	__spe_restore_common_state(guest_ctxt);
+
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR_EL1);
+	write_sysreg_el2(0, SYS_PMSCR);
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index b6489e244025..d97b56559e50 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -284,12 +284,16 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_state_nvhe(host_ctxt);
 	/*
-	 * We must flush and disable the SPE buffer for nVHE, as
-	 * the translation regime(EL1&0) is going to be loaded with
-	 * that of the guest. And we must do this before we change the
-	 * translation regime to EL2 (via MDCR_EL2_E2PB == 0) and
-	 * before we load guest Stage1.
+	 * If the VCPU has the SPE feature bit set, then we save the host's SPE
+	 * context.
+	 *
+	 * Otherwise, we only flush and disable the SPE buffer for nVHE, as the
+	 * translation regime(EL1&0) is going to be loaded with that of the
+	 * guest. And we must do this before we change the translation regime to
+	 * EL2 (via MDCR_EL2_E2PB == 0) and before we load guest Stage1.
 	 */
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_save_host_state_nvhe(vcpu, host_ctxt);
 	__debug_save_host_buffers_nvhe(vcpu, host_ctxt);
 
 	__kvm_adjust_pc(vcpu);
@@ -309,6 +313,9 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	__load_stage2(mmu, kern_hyp_va(mmu->arch));
 	__activate_traps(vcpu);
 
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_restore_guest_state_nvhe(vcpu, guest_ctxt);
+
 	__hyp_vgic_restore_state(vcpu);
 	__timer_enable_traps(vcpu);
 
@@ -326,6 +333,10 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_state_nvhe(guest_ctxt);
 	__sysreg32_save_state(vcpu);
+
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_save_guest_state_nvhe(vcpu, guest_ctxt);
+
 	__timer_disable_traps(vcpu);
 	__hyp_vgic_save_state(vcpu);
 
@@ -338,10 +349,14 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 		__fpsimd_save_fpexc32(vcpu);
 
 	__debug_switch_to_host(vcpu);
+
 	/*
-	 * This must come after restoring the host sysregs, since a non-VHE
-	 * system may enable SPE here and make use of the TTBRs.
+	 * Restoring the host context must come after restoring the host
+	 * sysregs, since a non-VHE system may enable SPE here and make use of
+	 * the TTBRs.
 	 */
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_restore_host_state_nvhe(vcpu, host_ctxt);
 	__debug_restore_host_buffers_nvhe(vcpu, host_ctxt);
 
 	if (pmu_switch_needed)
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 30/38] KVM: arm64: VHE: Context switch SPE state if VCPU has SPE
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Similar to the non-VHE case, save and restore the SPE register state at
each world switch for VHE enabled systems if the VCPU has the SPE
feature.
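
The key difference from the nVHE case is that the host may be actively
profiling at EL2, so the save path first disables profiling via PMSCR_EL2
before draining the buffer. A minimal sketch of that prologue, condensed from
the spe-sr.c hunk below:

	u64 pmscr_el2;

	/* Stop host profiling while the SPE context is switched. */
	pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
	write_sysreg_el2(0, SYS_PMSCR);
	isb();		/* the write must take effect before the buffer is drained */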

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h |  24 +++++-
 arch/arm64/include/asm/sysreg.h  |   2 +
 arch/arm64/kvm/hyp/vhe/Makefile  |   1 +
 arch/arm64/kvm/hyp/vhe/spe-sr.c  | 128 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c  |   8 ++
 5 files changed, 161 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/vhe/spe-sr.c

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 08f020912103..e8541ec9fca0 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -106,8 +106,28 @@ static inline void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
 					struct kvm_cpu_context *host_ctxt) {}
 static inline void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
 					struct kvm_cpu_context *guest_ctxt) {}
-#endif
-#endif
+#endif /* CONFIG_KVM_ARM_SPE */
+#else
+#ifdef CONFIG_KVM_ARM_SPE
+void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
+			       struct kvm_cpu_context *host_ctxt);
+void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
+				struct kvm_cpu_context *guest_ctxt);
+void __spe_restore_host_state_vhe(struct kvm_vcpu *vcpu,
+				  struct kvm_cpu_context *host_ctxt);
+void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
+				   struct kvm_cpu_context *guest_ctxt);
+#else
+static inline void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *host_ctxt) {}
+static inline void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *guest_ctxt) {}
+static inline void __spe_restore_host_state_vhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *host_ctxt) {}
+static inline void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *guest_ctxt) {}
+#endif /* CONFIG_KVM_ARM_SPE */
+#endif /* __KVM_NVHE_HYPERVISOR__ */
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 16b3f1a1d468..e8201aef165d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -270,6 +270,8 @@
 #define SYS_PMSCR_EL1_TS_SHIFT		5
 #define SYS_PMSCR_EL1_PCT_SHIFT		6
 
+#define SYS_PMSCR_EL12			sys_reg(3, 5, 9, 9, 0)
+
 #define SYS_PMSCR_EL2			sys_reg(3, 4, 9, 9, 0)
 #define SYS_PMSCR_EL2_E0HSPE_SHIFT	0
 #define SYS_PMSCR_EL2_E2SPE_SHIFT	1
diff --git a/arch/arm64/kvm/hyp/vhe/Makefile b/arch/arm64/kvm/hyp/vhe/Makefile
index 96bec0ecf9dd..7cb4a9e5ceb0 100644
--- a/arch/arm64/kvm/hyp/vhe/Makefile
+++ b/arch/arm64/kvm/hyp/vhe/Makefile
@@ -7,5 +7,6 @@ asflags-y := -D__KVM_VHE_HYPERVISOR__
 ccflags-y := -D__KVM_VHE_HYPERVISOR__
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o
+obj-$(CONFIG_KVM_ARM_SPE) += spe-sr.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
new file mode 100644
index 000000000000..00eab9e2ec60
--- /dev/null
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_hyp.h>
+#include <asm/kprobes.h>
+
+#include <hyp/spe-sr.h>
+
+/*
+ * Disable host profiling, drain the buffer and save the host SPE context.
+ * Extra care must be taken because profiling might be in progress.
+ */
+void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
+			       struct kvm_cpu_context *host_ctxt)
+{
+	u64 pmblimitr, pmscr_el2;
+
+	/* Disable profiling while the SPE context is being switched. */
+	pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
+	write_sysreg_el2(0, SYS_PMSCR);
+	isb();
+
+	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
+		psb_csync();
+		dsb(nsh);
+		/* Ensure hardware updates to PMBPTR_EL1 are visible. */
+		isb();
+	}
+
+	ctxt_sys_reg(host_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
+	ctxt_sys_reg(host_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	ctxt_sys_reg(host_ctxt, PMBLIMITR_EL1) = pmblimitr;
+	ctxt_sys_reg(host_ctxt, PMSCR_EL2) = pmscr_el2;
+
+	__spe_save_common_state(host_ctxt);
+}
+NOKPROBE_SYMBOL(__spe_save_host_state_vhe);
+
+/*
+ * Drain the guest's buffer and save the SPE state. Profiling is disabled
+ * because we're at EL2 and the buffer owning exception level is EL1.
+ */
+void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
+				struct kvm_cpu_context *guest_ctxt)
+{
+	u64 pmblimitr;
+
+	/*
+	 * We're at EL2 and the buffer owning regime is EL1, which means that
+	 * profiling is disabled. After we disable traps and restore the host's
+	 * MDCR_EL2, profiling will remain disabled because we've disabled it
+	 * via PMSCR_EL2 when we saved the host's SPE state. All that's needed
+	 * here is to drain the buffer.
+	 */
+	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
+		psb_csync();
+		dsb(nsh);
+		/* Ensure hardware updates to PMBPTR_EL1 are visible. */
+		isb();
+	}
+
+	ctxt_sys_reg(guest_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
+	ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	/* PMBLIMITR_EL1 is updated only on a trapped write. */
+	ctxt_sys_reg(guest_ctxt, PMSCR_EL1) = read_sysreg_el1(SYS_PMSCR);
+
+	__spe_save_common_state(guest_ctxt);
+}
+NOKPROBE_SYMBOL(__spe_save_guest_state_vhe);
+
+/*
+ * Restore the host SPE context. Special care must be taken because we're
+ * potentially resuming a profiling session which was stopped when we saved the
+ * host SPE register state.
+ */
+void __spe_restore_host_state_vhe(struct kvm_vcpu *vcpu,
+				  struct kvm_cpu_context *host_ctxt)
+{
+	__spe_restore_common_state(host_ctxt);
+
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
+
+	/*
+	 * Make sure the buffer pointer and limit are updated first, so we don't end
+	 * up in a situation where profiling is enabled and the buffer uses the
+	 * values programmed by the guest.
+	 *
+	 * This also serves to make sure the write to MDCR_EL2 which changes the
+	 * buffer owning Exception level is visible.
+	 *
+	 * After the synchronization, profiling is still disabled at EL2,
+	 * because we cleared PMSCR_EL2 when we saved the host context.
+	 */
+	isb();
+
+	write_sysreg_el2(ctxt_sys_reg(host_ctxt, PMSCR_EL2), SYS_PMSCR);
+}
+NOKPROBE_SYMBOL(__spe_restore_host_state_vhe);
+
+/*
+ * Restore the guest SPE context while profiling is disabled at EL2.
+ */
+void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
+				   struct kvm_cpu_context *guest_ctxt)
+{
+	__spe_restore_common_state(guest_ctxt);
+
+	/*
+	 * No synchronization needed here. Profiling is disabled at EL2 because
+	 * PMSCR_EL2 has been cleared when saving the host's context, and the
+	 * buffer has already been drained.
+	 */
+
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	write_sysreg_el1(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR);
+	/* PMSCR_EL2 has been cleared when saving the host state. */
+}
+NOKPROBE_SYMBOL(__spe_restore_guest_state_vhe);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 1a46a4840d17..fa95606af893 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -137,6 +137,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	guest_ctxt = &vcpu->arch.ctxt;
 
 	sysreg_save_host_state_vhe(host_ctxt);
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_save_host_state_vhe(vcpu, host_ctxt);
 
 	/*
 	 * ARM erratum 1165522 requires us to configure both stage 1 and
@@ -155,6 +157,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	__kvm_adjust_pc(vcpu);
 
 	sysreg_restore_guest_state_vhe(guest_ctxt);
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_restore_guest_state_vhe(vcpu, guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
 	do {
@@ -165,10 +169,14 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
 	sysreg_save_guest_state_vhe(guest_ctxt);
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_save_guest_state_vhe(vcpu, guest_ctxt);
 
 	__deactivate_traps(vcpu);
 
 	sysreg_restore_host_state_vhe(host_ctxt);
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_restore_host_state_vhe(vcpu, host_ctxt);
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
 		__fpsimd_save_fpexc32(vcpu);
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 31/38] KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

FEAT_SPEv1p2 introduced a new register, PMSNEVFR_EL1. The SPE driver does not
use the register, so save it to the guest context on vcpu_put() and restore it
on vcpu_load(): the host will not touch the register, and the value programmed
by the guest doesn't affect the host.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/include/asm/kvm_spe.h  |  6 ++++++
 arch/arm64/include/asm/sysreg.h   |  1 +
 arch/arm64/kvm/arm.c              |  2 ++
 arch/arm64/kvm/spe.c              | 29 +++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c         |  1 +
 6 files changed, 40 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 082994f5fb0e..3eef642d7bba 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -239,6 +239,7 @@ enum vcpu_sysreg {
 
        /* Statistical Profiling Extension Registers. */
 	PMSCR_EL1,      /* Statistical Profiling Control Register */
+	PMSNEVFR_EL1,   /* Sampling Inverted Event Filter Register */
 	PMSICR_EL1,     /* Sampling Interval Counter Register */
 	PMSIRR_EL1,     /* Sampling Interval Reload Register */
 	PMSFCR_EL1,     /* Sampling Filter Control Register */
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 7c2d5695120a..ce92d5f1db19 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -27,6 +27,9 @@ int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val);
 u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg);
 
+void kvm_spe_vcpu_load(struct kvm_vcpu *vcpu);
+void kvm_spe_vcpu_put(struct kvm_vcpu *vcpu);
+
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
@@ -50,6 +53,9 @@ static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 static inline void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val) {}
 static inline u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg) { return 0; }
 
+static inline void kvm_spe_vcpu_load(struct kvm_vcpu *vcpu) {}
+static inline void kvm_spe_vcpu_put(struct kvm_vcpu *vcpu) {}
+
 static inline int kvm_spe_set_attr(struct kvm_vcpu *vcpu,
 				   struct kvm_device_attr *attr)
 {
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index e8201aef165d..36c3185663ff 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -943,6 +943,7 @@
 
 #define ID_AA64DFR0_PMSVER_8_2		0x1
 #define ID_AA64DFR0_PMSVER_8_3		0x2
+#define ID_AA64DFR0_PMSVER_8_7		0x3
 
 #define ID_DFR0_PERFMON_SHIFT		24
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 5270f3b9886c..a4f17f7bf943 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -460,6 +460,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (vcpu_has_ptrauth(vcpu))
 		vcpu_ptrauth_disable(vcpu);
 	kvm_arch_vcpu_load_debug_state_flags(vcpu);
+	kvm_spe_vcpu_load(vcpu);
 
 	if (!cpumask_empty(vcpu->arch.supported_cpus) &&
 	    !cpumask_test_cpu(smp_processor_id(), vcpu->arch.supported_cpus))
@@ -468,6 +469,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_spe_vcpu_put(vcpu);
 	kvm_arch_vcpu_put_debug_state_flags(vcpu);
 	kvm_arch_vcpu_put_fp(vcpu);
 	if (has_vhe())
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index e8a8aa7f10b9..9c0567dadff1 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -66,6 +66,35 @@ u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg)
 	return __vcpu_sys_reg(vcpu, reg);
 }
 
+static unsigned int kvm_spe_get_pmsver(void)
+{
+	u64 dfr0 = read_sysreg(id_aa64dfr0_el1);
+
+	return cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_PMSVER_SHIFT);
+}
+
+void kvm_spe_vcpu_load(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_vcpu_has_spe(vcpu))
+		return;
+
+	if (kvm_spe_get_pmsver() < ID_AA64DFR0_PMSVER_8_7)
+		return;
+
+	write_sysreg_s(__vcpu_sys_reg(vcpu, PMSNEVFR_EL1), SYS_PMSNEVFR_EL1);
+}
+
+void kvm_spe_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_vcpu_has_spe(vcpu))
+		return;
+
+	if (kvm_spe_get_pmsver() < ID_AA64DFR0_PMSVER_8_7)
+		return;
+
+	__vcpu_sys_reg(vcpu, PMSNEVFR_EL1) = read_sysreg_s(SYS_PMSNEVFR_EL1);
+}
+
 static bool kvm_vcpu_supports_spe(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_supports_spe())
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2026eaebcc31..21b6b8bc1f25 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1614,6 +1614,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
 	{ SPE_SYS_REG(SYS_PMSCR_EL1), .reg = PMSCR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSNEVFR_EL1), .reg = PMSNEVFR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSICR_EL1), .reg = PMSICR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSIRR_EL1), .reg = PMSIRR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSFCR_EL1), .reg = PMSFCR_EL1 },
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 32/38] KVM: arm64: Allow guest to use physical timestamps if perfmon_capable()
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

The SPE driver allows userspace to use physical timestamps for records only
if the process is perfmon_capable(). Do the same for a virtual machine with
the SPE feature.
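
A condensed illustration of the approach taken in the hunks below: the
capability is latched once at VM creation and applied to the emulated
PMSCR_EL2 on the VCPU's first run, so all VCPUs agree on whether physical
timestamps are allowed:

	/* At VM creation. */
	kvm->arch.spe.perfmon_capable = perfmon_capable();

	/* On the VCPU's first run. */
	if (vcpu->kvm->arch.spe.perfmon_capable)
		__vcpu_sys_reg(vcpu, PMSCR_EL2) = BIT(SYS_PMSCR_EL1_PCT_SHIFT);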

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/include/asm/kvm_spe.h  |  9 +++++++++
 arch/arm64/kvm/arm.c              |  1 +
 arch/arm64/kvm/hyp/nvhe/spe-sr.c  |  2 +-
 arch/arm64/kvm/hyp/vhe/spe-sr.c   |  2 +-
 arch/arm64/kvm/spe.c              | 17 +++++++++++++++++
 6 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3eef642d7bba..102e1c087798 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -158,6 +158,8 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	struct kvm_spe spe;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index ce92d5f1db19..7b87cf1eed37 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -21,6 +21,11 @@ struct kvm_vcpu_spe {
+	int irq_num;		/* Buffer management interrupt number */
 };
 
+struct kvm_spe {
+	bool perfmon_capable;	/* Is the VM perfmon_capable()? */
+};
+
+void kvm_spe_init_vm(struct kvm *kvm);
 int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu);
 int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 
@@ -40,6 +45,10 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 struct kvm_vcpu_spe {
 };
 
+struct kvm_spe {
+};
+
+static inline void kvm_spe_init_vm(struct kvm *kvm) {}
 static inline int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 {
 	return 0;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a4f17f7bf943..5e166ffc6067 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -177,6 +177,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		goto out_free_stage2_pgd;
 
 	kvm_vgic_early_init(kvm);
+	kvm_spe_init_vm(kvm);
 
 	/* The maximum number of VCPUs is limited by the host's GIC model */
 	kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
index 46e47c9fd08f..4f6579daddb5 100644
--- a/arch/arm64/kvm/hyp/nvhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -83,5 +83,5 @@ void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR_EL1);
-	write_sysreg_el2(0, SYS_PMSCR);
+	write_sysreg_el2(ctxt_sys_reg(guest_ctxt, PMSCR_EL2), SYS_PMSCR);
 }
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
index 00eab9e2ec60..f557ac64a1cc 100644
--- a/arch/arm64/kvm/hyp/vhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -21,7 +21,7 @@ void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
 
 	/* Disable profiling while the SPE context is being switched. */
 	pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
-	write_sysreg_el2(0, SYS_PMSCR);
+	write_sysreg_el2(__vcpu_sys_reg(vcpu, PMSCR_EL2), SYS_PMSCR);
 	isb();
 
 	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 9c0567dadff1..f5e9dc249e9a 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2021 - ARM Ltd
  */
 
+#include <linux/capability.h>
 #include <linux/cpumask.h>
 #include <linux/kvm_host.h>
 #include <linux/perf/arm_spe_pmu.h>
@@ -28,6 +29,19 @@ void kvm_host_spe_init(struct arm_spe_pmu *spe_pmu)
 	mutex_unlock(&supported_cpus_lock);
 }
 
+void kvm_spe_init_vm(struct kvm *kvm)
+{
+	/*
+	 * Allow the guest to use the physical timer for timestamps only if the
+	 * VMM is perfmon_capable(), similar to what the SPE driver allows.
+	 *
+	 * CAP_PERFMON can be changed during the lifetime of the VM, so record
+	 * its value when the VM is created to avoid situations where only some
+	 * VCPUs allow physical timer timestamps, while others don't.
+	 */
+	kvm->arch.spe.perfmon_capable = perfmon_capable();
+}
+
 int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_supports_spe())
@@ -53,6 +67,9 @@ int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.spe.initialized)
 		return -EINVAL;
 
+	if (vcpu->kvm->arch.spe.perfmon_capable)
+		__vcpu_sys_reg(vcpu, PMSCR_EL2) = BIT(SYS_PMSCR_EL1_PCT_SHIFT);
+
 	return 0;
 }
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 33/38] KVM: arm64: Emulate SPE buffer management interrupt
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

A profiling buffer management interrupt is asserted when the buffer fills,
on a fault or on an external abort. The service bit, PMBSR_EL1.S, is set as
long as SPE asserts this interrupt. The interrupt can also be asserted
following a direct write to PMBSR_EL1 that sets the bit. The SPE hardware
stops asserting the interrupt only when the service bit is cleared.

KVM emulates the interrupt by reading the value of the service bit on
each guest exit to determine if the SPE hardware asserted the interrupt
(for example, if the buffer was full). Writes to the buffer registers are
trapped, to determine when the interrupt should be cleared or when the
guest wants to explicitly assert the interrupt by setting the service bit.
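
Condensed, the flow implemented by the diff below looks roughly like this
(names are taken from the patch, control flow is simplified for
illustration):

	/* Hyp, on guest exit: latch an interrupt asserted by the hardware. */
	pmbsr = read_sysreg_s(SYS_PMBSR_EL1);
	if (pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT)) {
		ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = pmbsr;
		vcpu->arch.spe.hwirq_level = true;
	}

	/* Run loop, kvm_spe_sync_hwstate(): forward it to the guest. */
	if (vcpu->arch.spe.hwirq_level) {
		vcpu->arch.spe.hwirq_level = false;
		kvm_spe_update_irq(vcpu, true);	/* kvm_vgic_inject_irq() */
	}

	/* Trapped write to PMBSR_EL1: the guest controls the service bit. */
	kvm_spe_update_irq(vcpu, val & BIT(SYS_PMBSR_EL1_S_SHIFT));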

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_spe.h |  4 ++
 arch/arm64/kvm/arm.c             |  3 ++
 arch/arm64/kvm/hyp/nvhe/spe-sr.c | 28 +++++++++++--
 arch/arm64/kvm/hyp/vhe/spe-sr.c  | 17 ++++++--
 arch/arm64/kvm/spe.c             | 72 ++++++++++++++++++++++++++++++++
 5 files changed, 117 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 7b87cf1eed37..7a7b1c2149a1 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -19,6 +19,8 @@ static __always_inline bool kvm_supports_spe(void)
 struct kvm_vcpu_spe {
 	bool initialized;	/* SPE initialized for the VCPU */
 	int irq_num;		/* Buffer management interrut number */
+	bool virq_level;	/* 'true' if the interrupt is asserted at the VGIC */
+	bool hwirq_level;	/* 'true' if the SPE hardware is asserting the interrupt */
 };
 
 struct kvm_spe {
@@ -28,6 +30,7 @@ struct kvm_spe {
 void kvm_spe_init_vm(struct kvm *kvm);
 int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu);
 int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
+void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
 
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val);
 u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg);
@@ -58,6 +61,7 @@ static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 {
 	return 0;
 }
+static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
 
 static inline void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val) {}
 static inline u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg) { return 0; }
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 5e166ffc6067..49b629e7e1aa 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -966,6 +966,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 */
 		kvm_pmu_sync_hwstate(vcpu);
 
+		if (kvm_supports_spe() && kvm_vcpu_has_spe(vcpu))
+			kvm_spe_sync_hwstate(vcpu);
+
 		/*
 		 * Sync the vgic state before syncing the timer state because
 		 * the timer code needs to know if the virtual timer
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
index 4f6579daddb5..4ef84c400d4f 100644
--- a/arch/arm64/kvm/hyp/nvhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -47,6 +47,8 @@ void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
 void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 				 struct kvm_cpu_context *guest_ctxt)
 {
+	u64 pmbsr;
+
 	if (read_sysreg_s(SYS_PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
 		psb_csync();
 		dsb(nsh);
@@ -55,7 +57,22 @@ void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 	}
 
 	ctxt_sys_reg(guest_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
-	ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	/*
+	 * We need to differentiate between the hardware asserting the interrupt
+	 * and the guest setting the service bit as a result of a direct
+	 * register write, hence the extra field in the spe struct.
+	 *
+	 * The PMBSR_EL1 register is not directly accessed by the guest, KVM
+	 * needs to update the in-memory copy when the hardware asserts the
+	 * interrupt as that's the only case when KVM will show the guest a
+	 * value which is different from what the guest last wrote to the
+	 * register.
+	 */
+	pmbsr = read_sysreg_s(SYS_PMBSR_EL1);
+	if (pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT)) {
+		ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = pmbsr;
+		vcpu->arch.spe.hwirq_level = true;
+	}
 	/* PMBLIMITR_EL1 is updated only on a trapped write. */
 	ctxt_sys_reg(guest_ctxt, PMSCR_EL1) = read_sysreg_s(SYS_PMSCR_EL1);
 
@@ -80,8 +97,13 @@ void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
 	__spe_restore_common_state(guest_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
-	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
-	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	/* The buffer management interrupt is virtual. */
+	write_sysreg_s(0, SYS_PMBSR_EL1);
+	/* The buffer is disabled when the interrupt is asserted. */
+	if (vcpu->arch.spe.virq_level)
+		write_sysreg_s(0, SYS_PMBLIMITR_EL1);
+	else
+		write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR_EL1);
 	write_sysreg_el2(ctxt_sys_reg(guest_ctxt, PMSCR_EL2), SYS_PMSCR);
 }
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
index f557ac64a1cc..3821807b3ec8 100644
--- a/arch/arm64/kvm/hyp/vhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -48,7 +48,7 @@ NOKPROBE_SYMBOL(__spe_save_host_state_vhe);
 void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 				struct kvm_cpu_context *guest_ctxt)
 {
-	u64 pmblimitr;
+	u64 pmblimitr, pmbsr;
 
 	/*
 	 * We're at EL2 and the buffer owning regime is EL1, which means that
@@ -66,7 +66,11 @@ void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 	}
 
 	ctxt_sys_reg(guest_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
-	ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	pmbsr = read_sysreg_s(SYS_PMBSR_EL1);
+	if (pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT)) {
+		ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = pmbsr;
+		vcpu->arch.spe.hwirq_level = true;
+	}
 	/* PMBLIMITR_EL1 is updated only on a trapped write. */
 	ctxt_sys_reg(guest_ctxt, PMSCR_EL1) = read_sysreg_el1(SYS_PMSCR);
 
@@ -120,8 +124,13 @@ void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
 	 */
 
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
-	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
-	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	/* The buffer management interrupt is virtual. */
+	write_sysreg_s(0, SYS_PMBSR_EL1);
+	/* The buffer is disabled when the interrupt is asserted. */
+	if (vcpu->arch.spe.virq_level)
+		write_sysreg_s(0, SYS_PMBLIMITR_EL1);
+	else
+		write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
 	write_sysreg_el1(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR);
 	/* PMSCR_EL2 has been cleared when saving the host state. */
 }
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index f5e9dc249e9a..e856554039a1 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -73,9 +73,81 @@ int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static void kvm_spe_update_irq(struct kvm_vcpu *vcpu, bool level)
+{
+	struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+	int ret;
+
+	if (spe->virq_level == level)
+		return;
+
+	spe->virq_level = level;
+	ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, spe->irq_num,
+				  level, spe);
+	WARN_ON(ret);
+}
+
+static __printf(2, 3)
+void print_buf_warn(struct kvm_vcpu *vcpu, char *fmt, ...)
+{
+	va_list va;
+
+	va_start(va, fmt);
+	kvm_warn_ratelimited("%pV [PMBSR=0x%016llx, PMBPTR=0x%016llx, PMBLIMITR=0x%016llx]\n",
+			    &(struct va_format){ fmt, &va },
+			    __vcpu_sys_reg(vcpu, PMBSR_EL1),
+			    __vcpu_sys_reg(vcpu, PMBPTR_EL1),
+			    __vcpu_sys_reg(vcpu, PMBLIMITR_EL1));
+	va_end(va);
+}
+
+static void kvm_spe_inject_ext_abt(struct kvm_vcpu *vcpu)
+{
+	__vcpu_sys_reg(vcpu, PMBSR_EL1) = BIT(SYS_PMBSR_EL1_EA_SHIFT) |
+					  BIT(SYS_PMBSR_EL1_S_SHIFT);
+	__vcpu_sys_reg(vcpu, PMBSR_EL1) |= SYS_PMBSR_EL1_EC_FAULT_S1;
+	/* Synchronous External Abort, not on translation table walk. */
+	__vcpu_sys_reg(vcpu, PMBSR_EL1) |= 0x10 << SYS_PMBSR_EL1_FAULT_FSC_SHIFT;
+}
+
+void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+	u64 pmbsr, pmbsr_ec;
+
+	if (!spe->hwirq_level)
+		return;
+	spe->hwirq_level = false;
+
+	pmbsr = __vcpu_sys_reg(vcpu, PMBSR_EL1);
+	pmbsr_ec = pmbsr & (SYS_PMBSR_EL1_EC_MASK << SYS_PMBSR_EL1_EC_SHIFT);
+
+	switch (pmbsr_ec) {
+	case SYS_PMBSR_EL1_EC_FAULT_S2:
+		print_buf_warn(vcpu, "SPE stage 2 data abort");
+		kvm_spe_inject_ext_abt(vcpu);
+		break;
+	case SYS_PMBSR_EL1_EC_FAULT_S1:
+	case SYS_PMBSR_EL1_EC_BUF:
+		/*
+		 * These two exception syndromes are entirely up to the guest to
+		 * figure out, leave PMBSR_EL1 unchanged.
+		 */
+		break;
+	default:
+		print_buf_warn(vcpu, "SPE unknown buffer syndrome");
+		kvm_spe_inject_ext_abt(vcpu);
+	}
+
+	kvm_spe_update_irq(vcpu, true);
+}
+
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val)
 {
 	__vcpu_sys_reg(vcpu, reg) = val;
+
+	if (reg == PMBSR_EL1)
+		kvm_spe_update_irq(vcpu, val & BIT(SYS_PMBSR_EL1_S_SHIFT));
 }
 
 u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg)
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 34/38] KVM: arm64: Add a userspace API to stop a VCPU from profiling
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Add the KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_STOP) VCPU attribute to
allow userspace to request that KVM disables profiling for that VCPU. The
ioctl does nothing yet.
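
For illustration, userspace is expected to drive this through the VCPU
device attribute ioctls; a minimal sketch, assuming the uapi additions
below are in place (error handling elided, vcpu_fd is an already created
VCPU file descriptor, the helper name is made up for the example):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int vcpu_spe_stop(int vcpu_fd, int flag)
	{
		struct kvm_device_attr attr = {
			.group	= KVM_ARM_VCPU_SPE_CTRL,
			.attr	= KVM_ARM_VCPU_SPE_STOP,
			.addr	= (__u64)(unsigned long)&flag,
		};

		/* flag is one of KVM_ARM_VCPU_SPE_STOP_TRAP,
		 * KVM_ARM_VCPU_SPE_STOP_EXIT or KVM_ARM_VCPU_SPE_RESUME. */
		return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
	}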

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/devices/vcpu.rst | 36 +++++++++++++++++++++++++
 arch/arm64/include/uapi/asm/kvm.h       |  4 +++
 arch/arm64/kvm/spe.c                    | 23 +++++++++++++---
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 0ed852315664..2e41928c50b1 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -271,3 +271,39 @@ Returns:
 Request initialization of the Statistical Profiling Extension for this VCPU.
 Must be done after initializaing the in-kernel irqchip and after setting the
 Profiling Buffer management interrupt number for the VCPU.
+
+5.3 ATTRIBUTE: KVM_ARM_VCPU_SPE_STOP
+------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address of the flag that specifies
+             what KVM should do when the guest enables profiling
+
+The flag must be exactly one of:
+
+- KVM_ARM_VCPU_SPE_STOP_TRAP: trap all register accesses and ignore the guest
+  trying to enable profiling.
+- KVM_ARM_VCPU_SPE_STOP_EXIT: exit to userspace when the guest tries to enable
+  profiling.
+- KVM_ARM_VCPU_SPE_RESUME: resume profiling, if it was previously stopped using
+  this attribute.
+
+If KVM detects that a vcpu is trying to run with SPE enabled when
+KVM_ARM_VCPU_SPE_STOP_EXIT is set, KVM_RUN will return without entering the guest
+with kvm_run.exit_reason equal to KVM_EXIT_FAIL_ENTRY, and the fail_entry struct
+will be zeroed.
+
+Returns:
+
+	 =======  ============================================
+	 -EAGAIN  SPE not initialized
+	 -EFAULT  Error accessing the flag
+	 -EINVAL  Invalid flag
+	 -ENXIO   SPE not supported or not properly configured
+	 =======  ============================================
+
+Request that KVM disables SPE for the given vcpu. This can be useful for
+migration, which relies on tracking dirty pages by write-protecting memory, but
+breaks SPE in the guest as KVM does not handle buffer stage 2 faults.
+
+The attribute must be set after SPE has been initialized successfully. It can be
+set multiple times, with the latest value overwriting the previous one.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index d4c0b53a5fb2..75a5113f610e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -371,6 +371,10 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_ARM_VCPU_SPE_CTRL		3
 #define   KVM_ARM_VCPU_SPE_IRQ		0
 #define   KVM_ARM_VCPU_SPE_INIT		1
+#define   KVM_ARM_VCPU_SPE_STOP		2
+#define     KVM_ARM_VCPU_SPE_STOP_TRAP		(1 << 0)
+#define     KVM_ARM_VCPU_SPE_STOP_EXIT		(1 << 1)
+#define     KVM_ARM_VCPU_SPE_RESUME		(1 << 2)
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index e856554039a1..95d00d8f4faf 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -222,14 +222,14 @@ int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	if (!kvm_vcpu_supports_spe(vcpu))
 		return -ENXIO;
 
-	if (vcpu->arch.spe.initialized)
-		return -EBUSY;
-
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_SPE_IRQ: {
 		int __user *uaddr = (int __user *)(long)attr->addr;
 		int irq;
 
+		if (vcpu->arch.spe.initialized)
+			return -EBUSY;
+
 		if (vcpu->arch.spe.irq_num)
 			return -EBUSY;
 
@@ -250,11 +250,27 @@ int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		if (!vgic_initialized(vcpu->kvm))
 			return -ENXIO;
 
+		if (vcpu->arch.spe.initialized)
+			return -EBUSY;
+
 		if (kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num, &vcpu->arch.spe))
 			return -ENXIO;
 
 		vcpu->arch.spe.initialized = true;
 		return 0;
+	case KVM_ARM_VCPU_SPE_STOP: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int flags;
+
+		if (!vcpu->arch.spe.initialized)
+			return -EAGAIN;
+
+		if (get_user(flags, uaddr))
+			return -EFAULT;
+
+		if (!flags)
+			return -EINVAL;
+	}
 	}
 
 	return -ENXIO;
@@ -292,6 +308,7 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	switch(attr->attr) {
 	case KVM_ARM_VCPU_SPE_IRQ:
 	case KVM_ARM_VCPU_SPE_INIT:
+	case KVM_ARM_VCPU_SPE_STOP:
 		return 0;
 	}
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 35/38] KVM: arm64: Implement userspace API to stop a VCPU from profiling
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

When userspace uses the KVM_ARM_VCPU_SPE_STOP attribute to request that a
VCPU is no longer allowed to profile, keep all the SPE register state in
memory, trap accesses to all the SPE registers, not just the buffer
registers, and don't copy any of this shadow state to the hardware.
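
As an illustration, a VMM could use this around live migration roughly as
below, reusing the example helper sketched in the previous patch (the flow
is an assumption, not code from this series):

	/* Before enabling dirty logging: stop profiling and request an exit
	 * if the guest tries to turn SPE back on. */
	int flag = KVM_ARM_VCPU_SPE_STOP_EXIT;
	vcpu_spe_stop(vcpu_fd, flag);

	/* In the run loop: a blocked profiling attempt surfaces as a failed
	 * entry carrying the SPE reason code. */
	if (run->exit_reason == KVM_EXIT_FAIL_ENTRY &&
	    run->fail_entry.hardware_entry_failure_reason == KVM_EXIT_FAIL_ENTRY_SPE) {
		/* abort or postpone the migration, or keep SPE stopped */
	}

	/* If the migration is cancelled: let the guest profile again. */
	flag = KVM_ARM_VCPU_SPE_RESUME;
	vcpu_spe_stop(vcpu_fd, flag);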

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h   |  2 +
 arch/arm64/include/asm/kvm_spe.h   | 14 +++++++
 arch/arm64/include/uapi/asm/kvm.h  |  3 ++
 arch/arm64/kvm/arm.c               |  9 ++++
 arch/arm64/kvm/debug.c             | 13 ++++--
 arch/arm64/kvm/hyp/nvhe/debug-sr.c |  4 +-
 arch/arm64/kvm/hyp/nvhe/spe-sr.c   | 24 +++++++++++
 arch/arm64/kvm/hyp/vhe/spe-sr.c    | 56 +++++++++++++++++++++++++
 arch/arm64/kvm/spe.c               | 67 ++++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c          |  2 +-
 10 files changed, 188 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index e8541ec9fca0..e5e2aab9fabe 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -86,8 +86,10 @@ void __debug_switch_to_host(struct kvm_vcpu *vcpu);
 #ifdef __KVM_NVHE_HYPERVISOR__
 void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *host_ctxt);
+void __debug_save_spe(u64 *pmscr_el1);
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				       struct kvm_cpu_context *host_ctxt);
+void __debug_restore_spe(u64 pmscr_el1);
 #ifdef CONFIG_KVM_ARM_SPE
 void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
 				struct kvm_cpu_context *host_ctxt);
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 7a7b1c2149a1..c4cdc16bfbf0 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -16,13 +16,23 @@ static __always_inline bool kvm_supports_spe(void)
 	return static_branch_likely(&kvm_spe_available);
 }
 
+/* Guest profiling disabled by the user. */
+#define KVM_VCPU_SPE_STOP_USER		(1 << 0)
+/* Stop profiling and exit to userspace when guest starts profiling. */
+#define KVM_VCPU_SPE_STOP_USER_EXIT	(1 << 1)
+
 struct kvm_vcpu_spe {
 	bool initialized;	/* SPE initialized for the VCPU */
 	int irq_num;		/* Buffer management interrut number */
 	bool virq_level;	/* 'true' if the interrupt is asserted at the VGIC */
 	bool hwirq_level;	/* 'true' if the SPE hardware is asserting the interrupt */
+	u64 flags;
 };
 
+#define kvm_spe_profiling_stopped(vcpu)					\
+	(((vcpu)->arch.spe.flags & KVM_VCPU_SPE_STOP_USER) ||		\
+	 ((vcpu)->arch.spe.flags & KVM_VCPU_SPE_STOP_USER_EXIT))
+
 struct kvm_spe {
 	bool perfmon_capable;	/* Is the VM perfmon_capable()? */
 };
@@ -31,6 +41,7 @@ void kvm_spe_init_vm(struct kvm *kvm);
 int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu);
 int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
+bool kvm_spe_exit_to_user(struct kvm_vcpu *vcpu);
 
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val);
 u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg);
@@ -48,6 +59,8 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 struct kvm_vcpu_spe {
 };
 
+#define kvm_spe_profiling_stopped(vcpu)		(false)
+
 struct kvm_spe {
 };
 
@@ -62,6 +75,7 @@ static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	return 0;
 }
 static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_spe_exit_to_user(struct kvm_vcpu *vcpu) { return false; }
 
 static inline void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val) {}
 static inline u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg) { return 0; }
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 75a5113f610e..63599ee39a7b 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -376,6 +376,9 @@ struct kvm_arm_copy_mte_tags {
 #define     KVM_ARM_VCPU_SPE_STOP_EXIT		(1 << 1)
 #define     KVM_ARM_VCPU_SPE_RESUME		(1 << 2)
 
+/* run->fail_entry.hardware_entry_failure_reason codes. */
+#define KVM_EXIT_FAIL_ENTRY_SPE		(1 << 0)
+
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
 #define KVM_ARM_IRQ_VCPU2_MASK		0xf
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 49b629e7e1aa..8c3ea26e7c29 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -915,6 +915,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 			continue;
 		}
 
+		if (unlikely(kvm_spe_exit_to_user(vcpu))) {
+			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+			run->fail_entry.hardware_entry_failure_reason
+				= KVM_EXIT_FAIL_ENTRY_SPE;
+			ret = -EAGAIN;
+			preempt_enable();
+			continue;
+		}
+
 		kvm_pmu_flush_hwstate(vcpu);
 
 		local_irq_disable();
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index c09bbbe8f62b..b11c4f633ea0 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -96,11 +96,18 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
 	if (kvm_supports_spe() && kvm_vcpu_has_spe(vcpu)) {
 		/*
 		 * Use EL1&0 for the profiling buffer translation regime and
-		 * trap accesses to the buffer control registers; leave
-		 * MDCR_EL2.TPMS unset and do not trap accesses to the profiling
-		 * control registers.
+		 * trap accesses to the buffer control registers; if profiling
+		 * is stopped, also set MDCR_EL2.TPMS to trap accesses to the
+		 * rest of the registers, otherwise leave it clear.
+		 *
+		 * Leaving MDCR_EL2.E2PB unset, like we do when the VCPU does not
+		 * have SPE, means that PMBIDR_EL1.P (which KVM does not
+		 * trap) will be set and the guest will detect SPE as being
+		 * unavailable.
 		 */
 		vcpu->arch.mdcr_el2 |= MDCR_EL2_E2PB_TRAP_EL1 << MDCR_EL2_E2PB_SHIFT;
+		if (kvm_spe_profiling_stopped(vcpu))
+			vcpu->arch.mdcr_el2 |= MDCR_EL2_TPMS;
 	} else {
 		/*
 		 * Trap accesses to the profiling control registers; leave
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 02171dcf29c3..501f787b6773 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -14,7 +14,7 @@
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 
-static void __debug_save_spe(u64 *pmscr_el1)
+void __debug_save_spe(u64 *pmscr_el1)
 {
 	u64 reg;
 
@@ -40,7 +40,7 @@ static void __debug_save_spe(u64 *pmscr_el1)
 	dsb(nsh);
 }
 
-static void __debug_restore_spe(u64 pmscr_el1)
+void __debug_restore_spe(u64 pmscr_el1)
 {
 	if (!pmscr_el1)
 		return;
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
index 4ef84c400d4f..11cf65b2050c 100644
--- a/arch/arm64/kvm/hyp/nvhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -23,6 +23,11 @@ void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmblimitr;
 
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__debug_save_spe(__ctxt_sys_reg(host_ctxt, PMSCR_EL1));
+		return;
+	}
+
 	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
 	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
 		psb_csync();
@@ -49,6 +54,13 @@ void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmbsr;
 
+	/*
+	 * Profiling is stopped and all register accesses are trapped, nothing
+	 * to save here.
+	 */
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	if (read_sysreg_s(SYS_PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
 		psb_csync();
 		dsb(nsh);
@@ -82,6 +94,11 @@ void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
 				   struct kvm_cpu_context *host_ctxt)
 {
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__debug_restore_spe(ctxt_sys_reg(host_ctxt, PMSCR_EL1));
+		return;
+	}
+
 	__spe_restore_common_state(host_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
@@ -94,6 +111,13 @@ void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
 void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *guest_ctxt)
 {
+	/*
+	 * Profiling is stopped and all register accesses are trapped, nothing
+	 * to restore here.
+	 */
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	__spe_restore_common_state(guest_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
index 3821807b3ec8..241ea8a1e5d4 100644
--- a/arch/arm64/kvm/hyp/vhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -10,6 +10,34 @@
 
 #include <hyp/spe-sr.h>
 
+static void __spe_save_host_buffer(u64 *pmscr_el2)
+{
+	u64 pmblimitr;
+
+	/* Disable guest profiling. */
+	write_sysreg_el1(0, SYS_PMSCR);
+
+	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (!(pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT))) {
+		*pmscr_el2 = 0;
+		return;
+	}
+
+	*pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
+
+	/* Disable profiling at EL2 so we can drain the buffer. */
+	write_sysreg_el2(0, SYS_PMSCR);
+	isb();
+
+	/*
+	 * We're going to change the buffer owning exception level when we
+	 * activate traps, drain the buffer now.
+	 */
+	psb_csync();
+	dsb(nsh);
+}
+NOKPROBE_SYMBOL(__spe_save_host_buffer);
+
 /*
  * Disable host profiling, drain the buffer and save the host SPE context.
  * Extra care must be taken because profiling might be in progress.
@@ -19,6 +47,11 @@ void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmblimitr, pmscr_el2;
 
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__spe_save_host_buffer(__ctxt_sys_reg(host_ctxt, PMSCR_EL2));
+		return;
+	}
+
 	/* Disable profiling while the SPE context is being switched. */
 	pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
 	write_sysreg_el2(__vcpu_sys_reg(vcpu, PMSCR_EL2), SYS_PMSCR);
@@ -50,6 +83,9 @@ void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmblimitr, pmbsr;
 
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	/*
 	 * We're at EL2 and the buffer owning regime is EL1, which means that
 	 * profiling is disabled. After we disable traps and restore the host's
@@ -78,6 +114,18 @@ void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 }
 NOKPROBE_SYMBOL(__spe_save_guest_state_vhe);
 
+static void __spe_restore_host_buffer(u64 pmscr_el2)
+{
+	if (!pmscr_el2)
+		return;
+
+	/* Synchronize MDCR_EL2 write. */
+	isb();
+
+	write_sysreg_el2(pmscr_el2, SYS_PMSCR);
+}
+NOKPROBE_SYMBOL(__spe_restore_host_buffer);
+
 /*
  * Restore the host SPE context. Special care must be taken because we're
  * potentially resuming a profiling session which was stopped when we saved the
@@ -86,6 +134,11 @@ NOKPROBE_SYMBOL(__spe_save_guest_state_vhe);
 void __spe_restore_host_state_vhe(struct kvm_vcpu *vcpu,
 				  struct kvm_cpu_context *host_ctxt)
 {
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__spe_restore_host_buffer(ctxt_sys_reg(host_ctxt, PMSCR_EL2));
+		return;
+	}
+
 	__spe_restore_common_state(host_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
@@ -115,6 +168,9 @@ NOKPROBE_SYMBOL(__spe_restore_host_state_vhe);
 void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
 				   struct kvm_cpu_context *guest_ctxt)
 {
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	__spe_restore_common_state(guest_ctxt);
 
 	/*
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 95d00d8f4faf..4fa3783562ef 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -142,6 +142,28 @@ void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
 	kvm_spe_update_irq(vcpu, true);
 }
 
+static bool kvm_spe_buffer_enabled(struct kvm_vcpu *vcpu)
+{
+	return !vcpu->arch.spe.virq_level  &&
+		(__vcpu_sys_reg(vcpu, PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT));
+}
+
+bool kvm_spe_exit_to_user(struct kvm_vcpu *vcpu)
+{
+	u64 pmscr_enabled_mask = BIT(SYS_PMSCR_EL1_E0SPE_SHIFT) |
+				 BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
+
+	if (!(vcpu->arch.spe.flags & KVM_VCPU_SPE_STOP_USER_EXIT))
+		return false;
+
+	/*
+	 * We don't trap the guest dropping to EL0, so exit even if profiling is
+	 * disabled at EL1, but enabled at EL0.
+	 */
+	return kvm_spe_buffer_enabled(vcpu) &&
+		(__vcpu_sys_reg(vcpu, PMSCR_EL1) & pmscr_enabled_mask);
+}
+
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val)
 {
 	__vcpu_sys_reg(vcpu, reg) = val;
@@ -217,6 +239,31 @@ static bool kvm_spe_irq_is_valid(struct kvm *kvm, int irq)
 	return true;
 }
 
+static int kvm_spe_stop_user(struct kvm_vcpu *vcpu, int flags)
+{
+	struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+
+	if (flags & KVM_ARM_VCPU_SPE_STOP_TRAP) {
+		if (flags & ~KVM_ARM_VCPU_SPE_STOP_TRAP)
+			return -EINVAL;
+		spe->flags = KVM_VCPU_SPE_STOP_USER;
+	}
+
+	if (flags & KVM_ARM_VCPU_SPE_STOP_EXIT) {
+		if (flags & ~KVM_ARM_VCPU_SPE_STOP_EXIT)
+			return -EINVAL;
+		spe->flags = KVM_VCPU_SPE_STOP_USER_EXIT;
+	}
+
+	if (flags & KVM_ARM_VCPU_SPE_RESUME) {
+		if (flags & ~KVM_ARM_VCPU_SPE_RESUME)
+			return -EINVAL;
+		spe->flags = 0;
+	}
+
+	return 0;
+}
+
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
 	if (!kvm_vcpu_supports_spe(vcpu))
@@ -270,6 +317,8 @@ int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 		if (!flags)
 			return -EINVAL;
+
+		return kvm_spe_stop_user(vcpu, flags);
 	}
 	}
 
@@ -295,6 +344,24 @@ int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 		return 0;
 	}
+	case KVM_ARM_VCPU_SPE_STOP: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+		int flag = 0;
+
+		if (!vcpu->arch.spe.initialized)
+			return -EAGAIN;
+
+		if (spe->flags & KVM_VCPU_SPE_STOP_USER)
+			flag = KVM_ARM_VCPU_SPE_STOP_TRAP;
+		else if (spe->flags & KVM_VCPU_SPE_STOP_USER_EXIT)
+			flag = KVM_ARM_VCPU_SPE_STOP_EXIT;
+
+		if (put_user(flag, uaddr))
+			return -EFAULT;
+
+		return 0;
+	}
 	}
 
 	return -ENXIO;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 21b6b8bc1f25..be8801f87567 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -604,7 +604,7 @@ static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 {	int reg = r->reg;
 	u64 val = p->regval;
 
-	if (reg < PMBLIMITR_EL1) {
+	if (reg < PMBLIMITR_EL1 && !kvm_spe_profiling_stopped(vcpu)) {
 		print_sys_reg_msg(p, "Unsupported guest SPE register access at: %lx [%08lx]\n",
 				  *vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
 	}
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 35/38] KVM: arm64: Implement userspace API to stop a VCPU profiling
@ 2021-11-17 15:38   ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

When userspace requests that a VCPU is not allowed to profile anymore via the
KVM_ARM_VCPU_SPE_STOP attribute, keep all the register state in memory and
trap all registers, not just the buffer registers, and don't copy any of
this shadow state on the hardware.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h   |  2 +
 arch/arm64/include/asm/kvm_spe.h   | 14 +++++++
 arch/arm64/include/uapi/asm/kvm.h  |  3 ++
 arch/arm64/kvm/arm.c               |  9 ++++
 arch/arm64/kvm/debug.c             | 13 ++++--
 arch/arm64/kvm/hyp/nvhe/debug-sr.c |  4 +-
 arch/arm64/kvm/hyp/nvhe/spe-sr.c   | 24 +++++++++++
 arch/arm64/kvm/hyp/vhe/spe-sr.c    | 56 +++++++++++++++++++++++++
 arch/arm64/kvm/spe.c               | 67 ++++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c          |  2 +-
 10 files changed, 188 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index e8541ec9fca0..e5e2aab9fabe 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -86,8 +86,10 @@ void __debug_switch_to_host(struct kvm_vcpu *vcpu);
 #ifdef __KVM_NVHE_HYPERVISOR__
 void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *host_ctxt);
+void __debug_save_spe(u64 *pmscr_el1);
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				       struct kvm_cpu_context *host_ctxt);
+void __debug_restore_spe(u64 pmscr_el1);
 #ifdef CONFIG_KVM_ARM_SPE
 void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
 				struct kvm_cpu_context *host_ctxt);
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 7a7b1c2149a1..c4cdc16bfbf0 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -16,13 +16,23 @@ static __always_inline bool kvm_supports_spe(void)
 	return static_branch_likely(&kvm_spe_available);
 }
 
+/* Guest profiling disabled by the user. */
+#define KVM_VCPU_SPE_STOP_USER		(1 << 0)
+/* Stop profiling and exit to userspace when guest starts profiling. */
+#define KVM_VCPU_SPE_STOP_USER_EXIT	(1 << 1)
+
 struct kvm_vcpu_spe {
 	bool initialized;	/* SPE initialized for the VCPU */
 	int irq_num;		/* Buffer management interrut number */
 	bool virq_level;	/* 'true' if the interrupt is asserted at the VGIC */
 	bool hwirq_level;	/* 'true' if the SPE hardware is asserting the interrupt */
+	u64 flags;
 };
 
+#define kvm_spe_profiling_stopped(vcpu)					\
+	(((vcpu)->arch.spe.flags & KVM_VCPU_SPE_STOP_USER) ||		\
+	 ((vcpu)->arch.spe.flags & KVM_VCPU_SPE_STOP_USER_EXIT))	\
+
 struct kvm_spe {
 	bool perfmon_capable;	/* Is the VM perfmon_capable()? */
 };
@@ -31,6 +41,7 @@ void kvm_spe_init_vm(struct kvm *kvm);
 int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu);
 int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
+bool kvm_spe_exit_to_user(struct kvm_vcpu *vcpu);
 
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val);
 u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg);
@@ -48,6 +59,8 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 struct kvm_vcpu_spe {
 };
 
+#define kvm_spe_profiling_stopped(vcpu)		(false)
+
 struct kvm_spe {
 };
 
@@ -62,6 +75,7 @@ static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	return 0;
 }
 static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_spe_exit_to_user(struct kvm_vcpu *vcpu) { return false; }
 
 static inline void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val) {}
 static inline u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg) { return 0; }
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 75a5113f610e..63599ee39a7b 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -376,6 +376,9 @@ struct kvm_arm_copy_mte_tags {
 #define     KVM_ARM_VCPU_SPE_STOP_EXIT		(1 << 1)
 #define     KVM_ARM_VCPU_SPE_RESUME		(1 << 2)
 
+/* run->fail_entry.hardware_entry_failure_reason codes. */
+#define KVM_EXIT_FAIL_ENTRY_SPE		(1 << 0)
+
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
 #define KVM_ARM_IRQ_VCPU2_MASK		0xf
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 49b629e7e1aa..8c3ea26e7c29 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -915,6 +915,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 			continue;
 		}
 
+		if (unlikely(kvm_spe_exit_to_user(vcpu))) {
+			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+			run->fail_entry.hardware_entry_failure_reason
+				= KVM_EXIT_FAIL_ENTRY_SPE;
+			ret = -EAGAIN;
+			preempt_enable();
+			continue;
+		}
+
 		kvm_pmu_flush_hwstate(vcpu);
 
 		local_irq_disable();
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index c09bbbe8f62b..b11c4f633ea0 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -96,11 +96,18 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
 	if (kvm_supports_spe() && kvm_vcpu_has_spe(vcpu)) {
 		/*
 		 * Use EL1&0 for the profiling buffer translation regime and
-		 * trap accesses to the buffer control registers; leave
-		 * MDCR_EL2.TPMS unset and do not trap accesses to the profiling
-		 * control registers.
+		 * trap accesses to the buffer control registers; if profiling
+		 * is stopped, also set MDCR_EL2.TPMS to trap accesses to the
+		 * rest of the registers, otherwise leave it clear.
+		 *
+		 * Leaving MDCR_EL2.E2PB unset, like we do when the VCPU does not
+		 * have SPE, means that the PMBIDR_EL1.P (which KVM does not
+		 * trap) will be set and the guest will detect SPE as being
+		 * unavailable.
 		 */
 		vcpu->arch.mdcr_el2 |= MDCR_EL2_E2PB_TRAP_EL1 << MDCR_EL2_E2PB_SHIFT;
+		if (kvm_spe_profiling_stopped(vcpu))
+			vcpu->arch.mdcr_el2 |= MDCR_EL2_TPMS;
 	} else {
 		/*
 		 * Trap accesses to the profiling control registers; leave
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 02171dcf29c3..501f787b6773 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -14,7 +14,7 @@
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 
-static void __debug_save_spe(u64 *pmscr_el1)
+void __debug_save_spe(u64 *pmscr_el1)
 {
 	u64 reg;
 
@@ -40,7 +40,7 @@ static void __debug_save_spe(u64 *pmscr_el1)
 	dsb(nsh);
 }
 
-static void __debug_restore_spe(u64 pmscr_el1)
+void __debug_restore_spe(u64 pmscr_el1)
 {
 	if (!pmscr_el1)
 		return;
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
index 4ef84c400d4f..11cf65b2050c 100644
--- a/arch/arm64/kvm/hyp/nvhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -23,6 +23,11 @@ void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmblimitr;
 
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__debug_save_spe(__ctxt_sys_reg(host_ctxt, PMSCR_EL1));
+		return;
+	}
+
 	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
 	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
 		psb_csync();
@@ -49,6 +54,13 @@ void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmbsr;
 
+	/*
+	 * Profiling is stopped and all register accesses are trapped, nothing
+	 * to save here.
+	 */
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	if (read_sysreg_s(SYS_PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
 		psb_csync();
 		dsb(nsh);
@@ -82,6 +94,11 @@ void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
 				   struct kvm_cpu_context *host_ctxt)
 {
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__debug_restore_spe(ctxt_sys_reg(host_ctxt, PMSCR_EL1));
+		return;
+	}
+
 	__spe_restore_common_state(host_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
@@ -94,6 +111,13 @@ void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
 void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *guest_ctxt)
 {
+	/*
+	 * Profiling is stopped and all register accesses are trapped, nothing
+	 * to restore here.
+	 */
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	__spe_restore_common_state(guest_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
index 3821807b3ec8..241ea8a1e5d4 100644
--- a/arch/arm64/kvm/hyp/vhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -10,6 +10,34 @@
 
 #include <hyp/spe-sr.h>
 
+static void __spe_save_host_buffer(u64 *pmscr_el2)
+{
+	u64 pmblimitr;
+
+	/* Disable guest profiling. */
+	write_sysreg_el1(0, SYS_PMSCR);
+
+	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (!(pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT))) {
+		*pmscr_el2 = 0;
+		return;
+	}
+
+	*pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
+
+	/* Disable profiling at EL2 so we can drain the buffer. */
+	write_sysreg_el2(0, SYS_PMSCR);
+	isb();
+
+	/*
+	 * We're going to change the buffer owning exception level when we
+	 * activate traps, so drain the buffer now.
+	 */
+	psb_csync();
+	dsb(nsh);
+}
+NOKPROBE_SYMBOL(__spe_save_host_buffer);
+
 /*
  * Disable host profiling, drain the buffer and save the host SPE context.
  * Extra care must be taken because profiling might be in progress.
@@ -19,6 +47,11 @@ void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmblimitr, pmscr_el2;
 
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__spe_save_host_buffer(__ctxt_sys_reg(host_ctxt, PMSCR_EL2));
+		return;
+	}
+
 	/* Disable profiling while the SPE context is being switched. */
 	pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
 	write_sysreg_el2(__vcpu_sys_reg(vcpu, PMSCR_EL2), SYS_PMSCR);
@@ -50,6 +83,9 @@ void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmblimitr, pmbsr;
 
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	/*
 	 * We're at EL2 and the buffer owning regime is EL1, which means that
 	 * profiling is disabled. After we disable traps and restore the host's
@@ -78,6 +114,18 @@ void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 }
 NOKPROBE_SYMBOL(__spe_save_guest_state_vhe);
 
+static void __spe_restore_host_buffer(u64 pmscr_el2)
+{
+	if (!pmscr_el2)
+		return;
+
+	/* Synchronize MDCR_EL2 write. */
+	isb();
+
+	write_sysreg_el2(pmscr_el2, SYS_PMSCR);
+}
+NOKPROBE_SYMBOL(__spe_restore_host_buffer);
+
 /*
  * Restore the host SPE context. Special care must be taken because we're
  * potentially resuming a profiling session which was stopped when we saved the
@@ -86,6 +134,11 @@ NOKPROBE_SYMBOL(__spe_save_guest_state_vhe);
 void __spe_restore_host_state_vhe(struct kvm_vcpu *vcpu,
 				  struct kvm_cpu_context *host_ctxt)
 {
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__spe_restore_host_buffer(ctxt_sys_reg(host_ctxt, PMSCR_EL2));
+		return;
+	}
+
 	__spe_restore_common_state(host_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
@@ -115,6 +168,9 @@ NOKPROBE_SYMBOL(__spe_restore_host_state_vhe);
 void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
 				   struct kvm_cpu_context *guest_ctxt)
 {
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	__spe_restore_common_state(guest_ctxt);
 
 	/*
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 95d00d8f4faf..4fa3783562ef 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -142,6 +142,28 @@ void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
 	kvm_spe_update_irq(vcpu, true);
 }
 
+static bool kvm_spe_buffer_enabled(struct kvm_vcpu *vcpu)
+{
+	return !vcpu->arch.spe.virq_level &&
+		(__vcpu_sys_reg(vcpu, PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT));
+}
+
+bool kvm_spe_exit_to_user(struct kvm_vcpu *vcpu)
+{
+	u64 pmscr_enabled_mask = BIT(SYS_PMSCR_EL1_E0SPE_SHIFT) |
+				 BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
+
+	if (!(vcpu->arch.spe.flags & KVM_VCPU_SPE_STOP_USER_EXIT))
+		return false;
+
+	/*
+	 * We don't trap the guest dropping to EL0, so exit even if profiling is
+	 * disabled at EL1, but enabled at EL0.
+	 */
+	return kvm_spe_buffer_enabled(vcpu) &&
+		(__vcpu_sys_reg(vcpu, PMSCR_EL1) & pmscr_enabled_mask);
+}
+
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val)
 {
 	__vcpu_sys_reg(vcpu, reg) = val;
@@ -217,6 +239,31 @@ static bool kvm_spe_irq_is_valid(struct kvm *kvm, int irq)
 	return true;
 }
 
+static int kvm_spe_stop_user(struct kvm_vcpu *vcpu, int flags)
+{
+	struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+
+	if (flags & KVM_ARM_VCPU_SPE_STOP_TRAP) {
+		if (flags & ~KVM_ARM_VCPU_SPE_STOP_TRAP)
+			return -EINVAL;
+		spe->flags = KVM_VCPU_SPE_STOP_USER;
+	}
+
+	if (flags & KVM_ARM_VCPU_SPE_STOP_EXIT) {
+		if (flags & ~KVM_ARM_VCPU_SPE_STOP_EXIT)
+			return -EINVAL;
+		spe->flags = KVM_VCPU_SPE_STOP_USER_EXIT;
+	}
+
+	if (flags & KVM_ARM_VCPU_SPE_RESUME) {
+		if (flags & ~KVM_ARM_VCPU_SPE_RESUME)
+			return -EINVAL;
+		spe->flags = 0;
+	}
+
+	return 0;
+}
+
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
 	if (!kvm_vcpu_supports_spe(vcpu))
@@ -270,6 +317,8 @@ int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 		if (!flags)
 			return -EINVAL;
+
+		return kvm_spe_stop_user(vcpu, flags);
 	}
 	}
 
@@ -295,6 +344,24 @@ int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 		return 0;
 	}
+	case KVM_ARM_VCPU_SPE_STOP: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+		int flag = 0;
+
+		if (!vcpu->arch.spe.initialized)
+			return -EAGAIN;
+
+		if (spe->flags & KVM_VCPU_SPE_STOP_USER)
+			flag = KVM_ARM_VCPU_SPE_STOP_TRAP;
+		else if (spe->flags & KVM_VCPU_SPE_STOP_USER_EXIT)
+			flag = KVM_ARM_VCPU_SPE_STOP_EXIT;
+
+		if (put_user(flag, uaddr))
+			return -EFAULT;
+
+		return 0;
+	}
 	}
 
 	return -ENXIO;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 21b6b8bc1f25..be8801f87567 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -604,7 +604,7 @@ static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 {	int reg = r->reg;
 	u64 val = p->regval;
 
-	if (reg < PMBLIMITR_EL1) {
+	if (reg < PMBLIMITR_EL1 && !kvm_spe_profiling_stopped(vcpu)) {
 		print_sys_reg_msg(p, "Unsupported guest SPE register access at: %lx [%08lx]\n",
 				  *vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
 	}
-- 
2.33.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 36/38] KVM: arm64: Add PMSIDR_EL1 to the SPE register context
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

PMSIDR_EL1 is not part of the VCPU register context because the profiling
control registers were not trapped and the register is read-only. With the
introduction of the KVM_ARM_VCPU_SPE_STOP API, KVM will start trapping
accesses to the profiling control registers; add PMSIDR_EL1 to the VCPU
register context to prevent KVM from injecting undefined exceptions.
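
As a side note, once PMSIDR_EL1 has a slot in the register context it can
plausibly also be read from userspace via the ONE_REG interface; the sketch
below assumes the register is exposed with the architectural sysreg encoding
(op0=3, op1=0, CRn=9, CRm=9, op2=7), which is not confirmed by this patch
alone.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hedged sketch: read the VCPU's shadow PMSIDR_EL1 with KVM_GET_ONE_REG,
 * assuming the register is exposed through the standard sysreg encoding.
 */
static int read_guest_pmsidr(int vcpu_fd, uint64_t *val)
{
	struct kvm_one_reg reg = {
		.id   = ARM64_SYS_REG(3, 0, 9, 9, 7),	/* PMSIDR_EL1 */
		.addr = (uint64_t)val,
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}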

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/sys_regs.c         | 22 +++++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 102e1c087798..95306ca8f1bc 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -247,6 +247,7 @@ enum vcpu_sysreg {
 	PMSFCR_EL1,     /* Sampling Filter Control Register */
 	PMSEVFR_EL1,    /* Sampling Event Filter Register */
 	PMSLATFR_EL1,   /* Sampling Latency Filter Register */
+	PMSIDR_EL1,	/* Sampling Profiling ID Register */
 	PMBLIMITR_EL1,  /* Profiling Buffer Limit Address Register */
 	PMBPTR_EL1,     /* Profiling Buffer Write Pointer Register */
 	PMBSR_EL1,      /* Profiling Buffer Status/syndrome Register */
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index be8801f87567..132bd6da84e2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -599,6 +599,18 @@ static unsigned int spe_visibility(const struct kvm_vcpu *vcpu,
 	return REG_HIDDEN;
 }
 
+static void reset_pmsidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	/*
+	 * When SPE is stopped by userspace, the guest reads the in-memory value
+	 * of the register. When SPE is resumed, accesses to the control
+	 * registers are not trapped and the guest reads the hardware
+	 * value. Reset PMSIDR_EL1 to the hardware value to avoid mismatches
+	 * between the two.
+	 */
+	vcpu_write_sys_reg(vcpu, read_sysreg_s(SYS_PMSIDR_EL1), PMSIDR_EL1);
+}
+
 static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 			   const struct sys_reg_desc *r)
 {	int reg = r->reg;
@@ -609,10 +621,14 @@ static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 				  *vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
 	}
 
-	if (p->is_write)
+	if (p->is_write) {
+		if (reg == PMSIDR_EL1)
+			return write_to_read_only(vcpu, p, r);
+
 		kvm_spe_write_sysreg(vcpu, reg, val);
-	else
+	} else {
 		p->regval = kvm_spe_read_sysreg(vcpu, reg);
+	}
 
 	return true;
 }
@@ -1620,7 +1636,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SPE_SYS_REG(SYS_PMSFCR_EL1), .reg = PMSFCR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSEVFR_EL1), .reg = PMSEVFR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSLATFR_EL1), .reg = PMSLATFR_EL1 },
-	{ SPE_SYS_REG(SYS_PMSIDR_EL1), .reset = NULL },
+	{ SPE_SYS_REG(SYS_PMSIDR_EL1), .reset = reset_pmsidr, .reg = PMSIDR_EL1 },
 	{ SPE_SYS_REG(SYS_PMBLIMITR_EL1), .reg = PMBLIMITR_EL1 },
 	{ SPE_SYS_REG(SYS_PMBPTR_EL1), .reg = PMBPTR_EL1 },
 	{ SPE_SYS_REG(SYS_PMBSR_EL1), .reg = PMBSR_EL1 },
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 37/38] KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Automatic NUMA balancing is a performance strategy that Linux uses to
reduce the cost associated with memory accesses by having a task use
the memory closest to the NUMA node where the task is executing. This is
accomplished by triggering periodic page faults to examine the memory
location that a task uses, and decide if page migration is necessary.

The periodic page faults that drive automatic NUMA balancing are triggered
by clearing permissions on certain pages from the task's address space.
Clearing the permissions invokes mmu_notifier_invalidate_range_start(),
which causes guest memory to be unmapped from stage 2. As a result,
SPE can start reporting stage 2 faults, which KVM has no way of handling.

Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING to keep SPE usable
for a guest.
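
For completeness, the runtime knob that controls automatic NUMA balancing on
kernels that enable it is the kernel.numa_balancing sysctl; the check below is
purely illustrative and not part of this series.

#include <stdio.h>

/*
 * Illustrative only: returns 1 if automatic NUMA balancing is currently
 * enabled, 0 if disabled, -1 if the knob is absent (e.g. the kernel was
 * built without CONFIG_NUMA_BALANCING).
 */
static int numa_balancing_enabled(void)
{
	FILE *f = fopen("/proc/sys/kernel/numa_balancing", "r");
	int val;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return val != 0;
}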

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 9c8c8424ab58..5899ee95fbda 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -56,7 +56,7 @@ config NVHE_EL2_DEBUG
 
 config KVM_ARM_SPE
 	bool "Virtual Statistical Profiling Extension (SPE) support"
-	depends on KVM && ARM_SPE_PMU=y
+	depends on KVM && ARM_SPE_PMU=y && !NUMA_BALANCING
 	default y
 	help
 	  Adds support for Statistical Profiling Extension (SPE) in virtual
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* [RFC PATCH v5 38/38] KVM: arm64: Allow userspace to enable SPE for guests
  2021-11-17 15:38 ` Alexandru Elisei
@ 2021-11-17 15:38   ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2021-11-17 15:38 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Everything is in place to emulate SPE for a guest; allow userspace to set
the VCPU feature.
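
A rough sketch of what enabling the feature from userspace might look like is
below; KVM_ARM_VCPU_SPE is an assumed name for the new feature bit (the bump
of KVM_VCPU_MAX_FEATURES to 8 suggests bit 7), and the SPE device attribute
setup from earlier patches is omitted.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hedged sketch: create a VCPU with the SPE feature set. The SPE interrupt
 * and init attributes from earlier patches still need to be configured
 * before KVM_RUN.
 */
static int vcpu_init_with_spe(int vm_fd, int vcpu_fd)
{
	struct kvm_vcpu_init init;

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_SPE) <= 0)
		return -1;	/* host does not support SPE for guests */

	memset(&init, 0, sizeof(init));
	if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init))
		return -1;
	init.features[0] |= 1 << KVM_ARM_VCPU_SPE;	/* assumed feature name */

	return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
}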

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/arm.c              | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 95306ca8f1bc..c095b4216c01 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -39,7 +39,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 8c3ea26e7c29..087566eccb1b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -306,7 +306,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 	case KVM_CAP_ARM_SPE:
-		r = 0;
+		r = kvm_supports_spe();
 		break;
 	default:
 		r = 0;
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 19/38] KVM: arm64: Do not run a VCPU on a CPU without SPE
  2021-11-17 15:38   ` Alexandru Elisei
@ 2022-01-10 11:40     ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-01-10 11:40 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Hello,

This patch will be dropped in the next iteration, and instead I'll
implement the same approach that PMU emulation uses, which is
currently being worked on [1].

Prospective reviewers can safely ignore this patch.

[1] https://lore.kernel.org/linux-arm-kernel/20211213152309.158462-1-alexandru.elisei@arm.com

Thanks,
Alex

On Wed, Nov 17, 2021 at 03:38:23PM +0000, Alexandru Elisei wrote:
> The kernel allows heterogeneous systems where FEAT_SPE is not present on
> all CPUs. This presents a challenge for KVM, as it will have to touch the
> SPE registers when emulating SPE for a guest, and those accesses will cause
> an undefined exception if SPE is not present on the CPU.
> 
> Avoid this situation by keeping a cpumask of CPUs that the VCPU is
> allowed to run on, which for SPE is the union of all CPUs that support
> SPE, and refuse to run the VCPU on a CPU which is not part of the
> cpumask.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  3 +++
>  arch/arm64/kvm/arm.c              | 15 +++++++++++++++
>  arch/arm64/kvm/spe.c              |  2 ++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 8b3faed48914..96ce98f6135d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -405,6 +405,9 @@ struct kvm_vcpu_arch {
>  		u64 last_steal;
>  		gpa_t base;
>  	} steal;
> +
> +	cpumask_var_t supported_cpus;
> +	bool cpu_not_supported;
>  };
>  
>  /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index b2997b919be2..8a7c01d1df58 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -351,6 +351,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
>  
> +	if (!zalloc_cpumask_var(&vcpu->arch.supported_cpus, GFP_KERNEL))
> +		return -ENOMEM;
> +
>  	/* Set up the timer */
>  	kvm_timer_vcpu_init(vcpu);
>  
> @@ -378,6 +381,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  	if (vcpu->arch.has_run_once && unlikely(!irqchip_in_kernel(vcpu->kvm)))
>  		static_branch_dec(&userspace_irqchip_in_use);
>  
> +	free_cpumask_var(vcpu->arch.supported_cpus);
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>  	kvm_timer_vcpu_terminate(vcpu);
>  	kvm_pmu_vcpu_destroy(vcpu);
> @@ -456,6 +460,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (vcpu_has_ptrauth(vcpu))
>  		vcpu_ptrauth_disable(vcpu);
>  	kvm_arch_vcpu_load_debug_state_flags(vcpu);
> +
> +	if (!cpumask_empty(vcpu->arch.supported_cpus) &&
> +	    !cpumask_test_cpu(smp_processor_id(), vcpu->arch.supported_cpus))
> +		vcpu->arch.cpu_not_supported = true;
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -893,6 +901,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  		 */
>  		preempt_disable();
>  
> +		if (unlikely(vcpu->arch.cpu_not_supported)) {
> +			vcpu->arch.cpu_not_supported = false;
> +			ret = -ENOEXEC;
> +			preempt_enable();
> +			continue;
> +		}
> +
>  		kvm_pmu_flush_hwstate(vcpu);
>  
>  		local_irq_disable();
> diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
> index 7c6f94358cc1..f3863728bab6 100644
> --- a/arch/arm64/kvm/spe.c
> +++ b/arch/arm64/kvm/spe.c
> @@ -40,5 +40,7 @@ int kvm_spe_vcpu_enable_spe(struct kvm_vcpu *vcpu)
>  	if (vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT))
>  		return -EINVAL;
>  
> +	cpumask_copy(vcpu->arch.supported_cpus, &supported_cpus);
> +
>  	return 0;
>  }
> -- 
> 2.33.1
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 01/38] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM
  2021-11-17 15:38   ` Alexandru Elisei
@ 2022-02-15  5:34     ` Reiji Watanabe
  -1 siblings, 0 replies; 118+ messages in thread
From: Reiji Watanabe @ 2022-02-15  5:34 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Alex,

On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> The VGIC code uses the lock_all_vcpus() function to make sure no VCPUs are
> run while it fiddles with the global VGIC state. Move the declaration of
> lock_all_vcpus() and the corresponding unlock function into asm/kvm_host.h
> where it can be reused by other parts of KVM/arm64 and rename the functions
> to kvm_{lock,unlock}_all_vcpus() to make them more generic.
>
> Because the scope of the code potentially using the functions has
> increased, add a lockdep check that the kvm->lock is held by the caller.
> Holding the lock is necessary because otherwise userspace would be able to
> create new VCPUs and run them while the existing VCPUs are locked.
>
> No functional change intended.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h     |  3 ++
>  arch/arm64/kvm/arm.c                  | 41 ++++++++++++++++++++++
>  arch/arm64/kvm/vgic/vgic-init.c       |  4 +--
>  arch/arm64/kvm/vgic/vgic-its.c        |  8 ++---
>  arch/arm64/kvm/vgic/vgic-kvm-device.c | 50 ++++-----------------------
>  arch/arm64/kvm/vgic/vgic.h            |  3 --
>  6 files changed, 56 insertions(+), 53 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2a5f7f38006f..733621e41900 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -606,6 +606,9 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
>  void kvm_arm_halt_guest(struct kvm *kvm);
>  void kvm_arm_resume_guest(struct kvm *kvm);
>
> +bool kvm_lock_all_vcpus(struct kvm *kvm);
> +void kvm_unlock_all_vcpus(struct kvm *kvm);
> +
>  #ifndef __KVM_NVHE_HYPERVISOR__
>  #define kvm_call_hyp_nvhe(f, ...)                                              \
>         ({                                                              \
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 2f03cbfefe67..e9b4ad7b5c82 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -651,6 +651,47 @@ void kvm_arm_resume_guest(struct kvm *kvm)
>         }
>  }
>
> +/* unlocks vcpus from @vcpu_lock_idx and smaller */
> +static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
> +{
> +       struct kvm_vcpu *tmp_vcpu;
> +
> +       for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> +               tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> +               mutex_unlock(&tmp_vcpu->mutex);
> +       }
> +}
> +
> +void kvm_unlock_all_vcpus(struct kvm *kvm)
> +{
> +       lockdep_assert_held(&kvm->lock);
> +       unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
> +}
> +
> +/* Returns true if all vcpus were locked, false otherwise */
> +bool kvm_lock_all_vcpus(struct kvm *kvm)
> +{
> +       struct kvm_vcpu *tmp_vcpu;
> +       int c;
> +
> +       lockdep_assert_held(&kvm->lock);
> +
> +       /*
> +        * Any time a vcpu is run, vcpu_load is called which tries to grab the
> +        * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure that

Nit: vcpu_load() doesn't try to grab the vcpu->mutex, but kvm_vcpu_ioctl()
does (The original comment in lock_all_vcpus() was outdated).

Reviewed-by: Reiji Watanabe <reijiw@google.com>

Thanks,
Reiji
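
For context, the calling convention that the new lockdep assertions enforce is
the one the VGIC call sites in this patch already follow; a minimal
kernel-side sketch (do_global_update() is a made-up name):

#include <linux/kvm_host.h>

/*
 * Sketch of the expected usage: kvm->lock keeps new VCPUs from being
 * created while the per-VCPU mutexes are taken.
 */
static int do_global_update(struct kvm *kvm)
{
	mutex_lock(&kvm->lock);

	if (!kvm_lock_all_vcpus(kvm)) {
		mutex_unlock(&kvm->lock);
		return -EBUSY;
	}

	/* ... safely fiddle with KVM global state here ... */

	kvm_unlock_all_vcpus(kvm);
	mutex_unlock(&kvm->lock);

	return 0;
}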


> +        * no other VCPUs are run and it is safe to fiddle with KVM global
> +        * state.
> +        */
> +       kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
> +               if (!mutex_trylock(&tmp_vcpu->mutex)) {
> +                       unlock_vcpus(kvm, c - 1);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
>  static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
>  {
>         struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index 0a06d0648970..cd045c7abde8 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -87,7 +87,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>                 return -ENODEV;
>
>         ret = -EBUSY;
> -       if (!lock_all_vcpus(kvm))
> +       if (!kvm_lock_all_vcpus(kvm))
>                 return ret;
>
>         kvm_for_each_vcpu(i, vcpu, kvm) {
> @@ -117,7 +117,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>                 INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
>
>  out_unlock:
> -       unlock_all_vcpus(kvm);
> +       kvm_unlock_all_vcpus(kvm);
>         return ret;
>  }
>
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index 089fc2ffcb43..bc4197e87d95 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -2005,7 +2005,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
>                 goto out;
>         }
>
> -       if (!lock_all_vcpus(dev->kvm)) {
> +       if (!kvm_lock_all_vcpus(dev->kvm)) {
>                 ret = -EBUSY;
>                 goto out;
>         }
> @@ -2023,7 +2023,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
>         } else {
>                 *reg = region->its_read(dev->kvm, its, addr, len);
>         }
> -       unlock_all_vcpus(dev->kvm);
> +       kvm_unlock_all_vcpus(dev->kvm);
>  out:
>         mutex_unlock(&dev->kvm->lock);
>         return ret;
> @@ -2668,7 +2668,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
>         mutex_lock(&kvm->lock);
>         mutex_lock(&its->its_lock);
>
> -       if (!lock_all_vcpus(kvm)) {
> +       if (!kvm_lock_all_vcpus(kvm)) {
>                 mutex_unlock(&its->its_lock);
>                 mutex_unlock(&kvm->lock);
>                 return -EBUSY;
> @@ -2686,7 +2686,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
>                 break;
>         }
>
> -       unlock_all_vcpus(kvm);
> +       kvm_unlock_all_vcpus(kvm);
>         mutex_unlock(&its->its_lock);
>         mutex_unlock(&kvm->lock);
>         return ret;
> diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> index 0d000d2fe8d2..c5de904643cc 100644
> --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
> +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> @@ -305,44 +305,6 @@ int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
>         return 0;
>  }
>
> -/* unlocks vcpus from @vcpu_lock_idx and smaller */
> -static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
> -{
> -       struct kvm_vcpu *tmp_vcpu;
> -
> -       for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> -               tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> -               mutex_unlock(&tmp_vcpu->mutex);
> -       }
> -}
> -
> -void unlock_all_vcpus(struct kvm *kvm)
> -{
> -       unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
> -}
> -
> -/* Returns true if all vcpus were locked, false otherwise */
> -bool lock_all_vcpus(struct kvm *kvm)
> -{
> -       struct kvm_vcpu *tmp_vcpu;
> -       int c;
> -
> -       /*
> -        * Any time a vcpu is run, vcpu_load is called which tries to grab the
> -        * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure
> -        * that no other VCPUs are run and fiddle with the vgic state while we
> -        * access it.
> -        */
> -       kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
> -               if (!mutex_trylock(&tmp_vcpu->mutex)) {
> -                       unlock_vcpus(kvm, c - 1);
> -                       return false;
> -               }
> -       }
> -
> -       return true;
> -}
> -
>  /**
>   * vgic_v2_attr_regs_access - allows user space to access VGIC v2 state
>   *
> @@ -373,7 +335,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
>         if (ret)
>                 goto out;
>
> -       if (!lock_all_vcpus(dev->kvm)) {
> +       if (!kvm_lock_all_vcpus(dev->kvm)) {
>                 ret = -EBUSY;
>                 goto out;
>         }
> @@ -390,7 +352,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
>                 break;
>         }
>
> -       unlock_all_vcpus(dev->kvm);
> +       kvm_unlock_all_vcpus(dev->kvm);
>  out:
>         mutex_unlock(&dev->kvm->lock);
>         return ret;
> @@ -539,7 +501,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
>                 goto out;
>         }
>
> -       if (!lock_all_vcpus(dev->kvm)) {
> +       if (!kvm_lock_all_vcpus(dev->kvm)) {
>                 ret = -EBUSY;
>                 goto out;
>         }
> @@ -589,7 +551,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
>                 break;
>         }
>
> -       unlock_all_vcpus(dev->kvm);
> +       kvm_unlock_all_vcpus(dev->kvm);
>  out:
>         mutex_unlock(&dev->kvm->lock);
>         return ret;
> @@ -644,12 +606,12 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
>                 case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
>                         mutex_lock(&dev->kvm->lock);
>
> -                       if (!lock_all_vcpus(dev->kvm)) {
> +                       if (!kvm_lock_all_vcpus(dev->kvm)) {
>                                 mutex_unlock(&dev->kvm->lock);
>                                 return -EBUSY;
>                         }
>                         ret = vgic_v3_save_pending_tables(dev->kvm);
> -                       unlock_all_vcpus(dev->kvm);
> +                       kvm_unlock_all_vcpus(dev->kvm);
>                         mutex_unlock(&dev->kvm->lock);
>                         return ret;
>                 }
> diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
> index 3fd6c86a7ef3..e69c839a6941 100644
> --- a/arch/arm64/kvm/vgic/vgic.h
> +++ b/arch/arm64/kvm/vgic/vgic.h
> @@ -255,9 +255,6 @@ int vgic_init(struct kvm *kvm);
>  void vgic_debug_init(struct kvm *kvm);
>  void vgic_debug_destroy(struct kvm *kvm);
>
> -bool lock_all_vcpus(struct kvm *kvm);
> -void unlock_all_vcpus(struct kvm *kvm);
> -
>  static inline int vgic_v3_max_apr_idx(struct kvm_vcpu *vcpu)
>  {
>         struct vgic_cpu *cpu_if = &vcpu->arch.vgic_cpu;
> --
> 2.33.1
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
  2021-11-17 15:38   ` Alexandru Elisei
@ 2022-02-15  5:59     ` Reiji Watanabe
  -1 siblings, 0 replies; 118+ messages in thread
From: Reiji Watanabe @ 2022-02-15  5:59 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Alex,

On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Stage 2 faults triggered by the profiling buffer attempting to write to
> memory are reported by the SPE hardware by asserting a buffer management
> event interrupt. Interrupts are by their nature asynchronous, which means
> that the guest might have changed its stage 1 translation tables since the
> attempted write. SPE reports the guest virtual address that caused the data
> abort, not the IPA, which means that KVM would have to walk the guest's
> stage 1 tables to find the IPA. Using the AT instruction to walk the
> guest's tables in hardware is not an option because it doesn't report the
> IPA in the case of a stage 2 fault on a stage 1 table walk.
>
> Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> done by adding a capability that allows the user to pin the memory backing
> a memslot. The same capability can be used to unlock a memslot, which
> unpins the pages associated with the memslot, but doesn't unmap the IPA
> range from stage 2; in this case, the addresses will be unmapped from stage
> 2 via the MMU notifiers when the process' address space changes.
>
> For now, the capability doesn't actually do anything other than checking
> that the usage is correct; the memory operations will be added in future
> patches.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  Documentation/virt/kvm/api.rst   | 57 ++++++++++++++++++++++++++
>  arch/arm64/include/asm/kvm_mmu.h |  3 ++
>  arch/arm64/kvm/arm.c             | 42 ++++++++++++++++++--
>  arch/arm64/kvm/mmu.c             | 68 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h         |  8 ++++
>  5 files changed, 174 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index aeeb071c7688..16aa59eae3d9 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6925,6 +6925,63 @@ indicated by the fd to the VM this is called on.
>  This is intended to support intra-host migration of VMs between userspace VMMs,
>  upgrading the VMM process without interrupting the guest.
>
> +7.30 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> +----------------------------------------
> +
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> +                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> +             args[0] is the slot number
> +             args[1] specifies the permissions when the memslot is locked or if
> +                     all memslots should be unlocked
> +
> +The presence of this capability indicates that KVM supports locking the memory
> +associated with the memslot, and unlocking a previously locked memslot.
> +
> +The 'flags' parameter is defined as follows:
> +
> +7.30.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> +-------------------------------------------------
> +
> +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains the memory slot number
> +             args[1] contains the permissions for the locked memory:
> +                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> +                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> +                     (optional) with write permissions

Nit: Those flag names don't match the ones in the code.
(Their names in the code are KVM_ARM_LOCK_MEM_READ/KVM_ARM_LOCK_MEM_WRITE)

Why do the KVM_ARM_LOCK_MEMORY_{READ,WRITE} flags need to be specified
even though the memslot already has similar flags?
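
For reference, a minimal sketch of how a VMM might drive this, going only
by the documentation above and the uapi additions later in the patch
(assumes headers with this series applied; the helper and its arguments
are illustrative, not part of the series):

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Pin a memslot's pages and pre-map them at stage 2. */
static int vmm_lock_memslot(int vm_fd, unsigned int slot, bool writable)
{
        struct kvm_enable_cap cap = {
                .cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
                .flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK,
                .args[0] = slot,
                .args[1] = KVM_ARM_LOCK_MEM_READ |
                           (writable ? KVM_ARM_LOCK_MEM_WRITE : 0),
        };

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

Unlocking would be the same KVM_ENABLE_CAP call with
KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK in 'flags' and either the
slot number in args[0] or KVM_ARM_UNLOCK_MEM_ALL in args[1].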

> +:Returns: 0 on success; negative error code on failure
> +
> +Enabling this capability causes the memory described by the memslot to be
> +pinned in the process address space and the corresponding stage 2 IPA range
> +mapped at stage 2. The permissions specified in args[1] apply to both
> +mappings. The memory pinned with this capability counts towards the max
> +locked memory limit for the current process.
> +
> +The capability should be enabled when no VCPUs are in the kernel executing an
> +ioctl (and in particular, KVM_RUN); otherwise the ioctl will block until all
> +VCPUs have returned. The virtual memory range described by the memslot must be
> +mapped in the userspace process without any gaps. It is considered an error if
> +write permissions are specified for a memslot which logs dirty pages.
> +
> +7.30.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> +---------------------------------------------------
> +
> +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains the memory slot number
> +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> +                     which unlocks all previously locked memslots.
> +:Returns: 0 on success; negative error code on failure
> +
> +Enabling this capability causes the memory pinned when locking the memslot
> +specified in args[0] to be unpinned, or, optionally, all memslots to be
> +unlocked. The IPA range is not unmapped from stage 2.
> +>>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)

Nit: An unnecessary line.

If a memslot with read/write permission is locked read-only and then
unlocked, can userspace expect the stage 2 mapping for the memslot to be
updated to read/write?
Can userspace delete a memslot that is locked (without unlocking it)?
If so, can userspace expect the corresponding range to be implicitly
unlocked?

Thanks,
Reiji

> +
>  8. Other capabilities.
>  ======================
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 02d378887743..2c50734f048d 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
>  void kvm_set_way_flush(struct kvm_vcpu *vcpu);
>  void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>
> +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> +
>  static inline unsigned int kvm_get_vmid_bits(void)
>  {
>         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e9b4ad7b5c82..d49905d18cee 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -78,16 +78,43 @@ int kvm_arch_check_processor_compat(void *opaque)
>         return 0;
>  }
>
> +static int kvm_arm_lock_memslot_supported(void)
> +{
> +       return 0;
> +}
> +
> +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> +                                            struct kvm_enable_cap *cap)
> +{
> +       u64 slot, action_flags;
> +       u32 action;
> +
> +       if (cap->args[2] || cap->args[3])
> +               return -EINVAL;
> +
> +       slot = cap->args[0];
> +       action = cap->flags;
> +       action_flags = cap->args[1];
> +
> +       switch (action) {
> +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
> +               return kvm_mmu_lock_memslot(kvm, slot, action_flags);
> +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
> +               return kvm_mmu_unlock_memslot(kvm, slot, action_flags);
> +       default:
> +               return -EINVAL;
> +       }
> +}
> +
>  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                             struct kvm_enable_cap *cap)
>  {
>         int r;
>
> -       if (cap->flags)
> -               return -EINVAL;
> -
>         switch (cap->cap) {
>         case KVM_CAP_ARM_NISV_TO_USER:
> +               if (cap->flags)
> +                       return -EINVAL;
>                 r = 0;
>                 kvm->arch.return_nisv_io_abort_to_user = true;
>                 break;
> @@ -101,6 +128,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                 }
>                 mutex_unlock(&kvm->lock);
>                 break;
> +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> +               if (!kvm_arm_lock_memslot_supported())
> +                       return -EINVAL;
> +               r = kvm_lock_user_memory_region_ioctl(kvm, cap);
> +               break;
>         default:
>                 r = -EINVAL;
>                 break;
> @@ -168,7 +200,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>         return VM_FAULT_SIGBUS;
>  }
>
> -
>  /**
>   * kvm_arch_destroy_vm - destroy the VM data structure
>   * @kvm:       pointer to the KVM struct
> @@ -276,6 +307,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>         case KVM_CAP_ARM_PTRAUTH_GENERIC:
>                 r = system_has_full_ptr_auth();
>                 break;
> +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> +               r = kvm_arm_lock_memslot_supported();
> +               break;
>         default:
>                 r = 0;
>         }
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 326cdfec74a1..f65bcbc9ae69 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1296,6 +1296,74 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>         return ret;
>  }
>
> +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> +{
> +       struct kvm_memory_slot *memslot;
> +       int ret;
> +
> +       if (slot >= KVM_MEM_SLOTS_NUM)
> +               return -EINVAL;
> +
> +       if (!(flags & KVM_ARM_LOCK_MEM_READ))
> +               return -EINVAL;
> +
> +       mutex_lock(&kvm->lock);
> +       if (!kvm_lock_all_vcpus(kvm)) {
> +               ret = -EBUSY;
> +               goto out_unlock_kvm;
> +       }
> +       mutex_lock(&kvm->slots_lock);
> +
> +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> +       if (!memslot) {
> +               ret = -EINVAL;
> +               goto out_unlock_slots;
> +       }
> +       if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
> +           ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
> +               ret = -EPERM;
> +               goto out_unlock_slots;
> +       }
> +
> +       ret = -EINVAL;
> +
> +out_unlock_slots:
> +       mutex_unlock(&kvm->slots_lock);
> +       kvm_unlock_all_vcpus(kvm);
> +out_unlock_kvm:
> +       mutex_unlock(&kvm->lock);
> +       return ret;
> +}
> +
> +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> +{
> +       bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> +       struct kvm_memory_slot *memslot;
> +       int ret;
> +
> +       if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> +               return -EINVAL;
> +
> +       mutex_lock(&kvm->slots_lock);
> +
> +       if (unlock_all) {
> +               ret = -EINVAL;
> +               goto out_unlock_slots;
> +       }
> +
> +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> +       if (!memslot) {
> +               ret = -EINVAL;
> +               goto out_unlock_slots;
> +       }
> +
> +       ret = -EINVAL;
> +
> +out_unlock_slots:
> +       mutex_unlock(&kvm->slots_lock);
> +       return ret;
> +}
> +
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
>         if (!kvm->arch.mmu.pgt)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 1daa45268de2..70c969967557 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>  #define KVM_CAP_ARM_MTE 205
>  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> +#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_PPC_SVM_OFF                  _IO(KVMIO,  0xb3)
>  #define KVM_ARM_MTE_COPY_TAGS    _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
>
> +/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
> +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK     (1 << 0)
> +#define   KVM_ARM_LOCK_MEM_READ                                (1 << 0)
> +#define   KVM_ARM_LOCK_MEM_WRITE                       (1 << 1)
> +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK   (1 << 1)
> +#define   KVM_ARM_UNLOCK_MEM_ALL                       (1 << 0)
> +
>  /* ioctl for vm fd */
>  #define KVM_CREATE_DEVICE        _IOWR(KVMIO,  0xe0, struct kvm_create_device)
>
> --
> 2.33.1
>

* Re: [RFC PATCH v5 03/38] KVM: arm64: Implement the memslot lock/unlock functionality
  2021-11-17 15:38   ` Alexandru Elisei
@ 2022-02-15  7:46     ` Reiji Watanabe
  -1 siblings, 0 replies; 118+ messages in thread
From: Reiji Watanabe @ 2022-02-15  7:46 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Alex,

On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Pin memory in the process address space and map it in the stage 2 tables as
> a result of userspace enabling the KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> capability; and unpin it from the process address space when the capability
> is used with the KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK flag.
>
> The current implementation has two drawbacks which will be fixed in future
> patches:
>
> - The dcache maintenance is done when the memslot is locked, which means
>   that it is possible that memory changes made by userspace after the ioctl
>   completes won't be visible to a guest running with the MMU off.
>
> - Tag scrubbing is done when the memslot is locked. If the MTE capability
>   is enabled after the ioctl, the guest will be able to access unsanitised
>   pages. This is prevented by forbidding userspace to enable the MTE
>   capability if any memslots are locked.
>
> Only PAGE_SIZE mappings are supported at stage 2.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  Documentation/virt/kvm/api.rst    |   4 +-
>  arch/arm64/include/asm/kvm_host.h |  11 ++
>  arch/arm64/kvm/arm.c              |  22 +++-
>  arch/arm64/kvm/mmu.c              | 204 ++++++++++++++++++++++++++++--
>  4 files changed, 226 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 16aa59eae3d9..0ac12a730013 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6979,8 +6979,8 @@ write permissions are specified for a memslot which logs dirty pages.
>
>  Enabling this capability causes the memory pinned when locking the memslot
>  specified in args[0] to be unpinned, or, optionally, all memslots to be
> -unlocked. The IPA range is not unmapped from stage 2.
> ->>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
> +unlocked. The IPA range is not unmapped from stage 2. It is considered an error
> +to attempt to unlock a memslot which is not locked.
>
>  8. Other capabilities.
>  ======================
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 733621e41900..7fd70ad90c16 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -99,7 +99,18 @@ struct kvm_s2_mmu {
>         struct kvm_arch *arch;
>  };
>
> +#define KVM_MEMSLOT_LOCK_READ          (1 << 0)
> +#define KVM_MEMSLOT_LOCK_WRITE         (1 << 1)
> +#define KVM_MEMSLOT_LOCK_MASK          0x3
> +
> +struct kvm_memory_slot_page {
> +       struct list_head list;
> +       struct page *page;
> +};
> +
>  struct kvm_arch_memory_slot {
> +       struct kvm_memory_slot_page pages;
> +       u32 flags;
>  };
>
>  struct kvm_arch {
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index d49905d18cee..b9b8b43835e3 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -106,6 +106,25 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
>         }
>  }
>
> +static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
> +{
> +       struct kvm_memslots *slots = kvm_memslots(kvm);
> +       struct kvm_memory_slot *memslot;
> +       bool has_locked_memslots = false;
> +       int idx;
> +
> +       idx = srcu_read_lock(&kvm->srcu);
> +       kvm_for_each_memslot(memslot, slots) {
> +               if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
> +                       has_locked_memslots = true;
> +                       break;
> +               }
> +       }
> +       srcu_read_unlock(&kvm->srcu, idx);
> +
> +       return has_locked_memslots;
> +}
> +
>  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                             struct kvm_enable_cap *cap)
>  {
> @@ -120,7 +139,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                 break;
>         case KVM_CAP_ARM_MTE:
>                 mutex_lock(&kvm->lock);
> -               if (!system_supports_mte() || kvm->created_vcpus) {
> +               if (!system_supports_mte() || kvm->created_vcpus ||
> +                   (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) {
>                         r = -EINVAL;
>                 } else {
>                         r = 0;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index f65bcbc9ae69..b0a8e61315e4 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -72,6 +72,11 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>         return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
>  }
>
> +static bool memslot_is_locked(struct kvm_memory_slot *memslot)
> +{
> +       return memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK;
> +}
> +
>  /**
>   * kvm_flush_remote_tlbs() - flush all VM TLB entries for v7/8
>   * @kvm:       pointer to kvm structure.
> @@ -769,6 +774,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>         if (map_size == PAGE_SIZE)
>                 return true;
>
> +       /* Allow only PAGE_SIZE mappings for locked memslots */
> +       if (memslot_is_locked(memslot))
> +               return false;
> +
>         size = memslot->npages * PAGE_SIZE;
>
>         gpa_start = memslot->base_gfn << PAGE_SHIFT;
> @@ -1296,6 +1305,159 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>         return ret;
>  }
>
> +static int try_rlimit_memlock(unsigned long npages)
> +{
> +       unsigned long lock_limit;
> +       bool has_lock_cap;
> +       int ret = 0;
> +
> +       has_lock_cap = capable(CAP_IPC_LOCK);
> +       if (has_lock_cap)
> +               goto out;
> +
> +       lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +
> +       mmap_read_lock(current->mm);
> +       if (npages + current->mm->locked_vm > lock_limit)
> +               ret = -ENOMEM;
> +       mmap_read_unlock(current->mm);
> +
> +out:
> +       return ret;
> +}
> +
> +static void unpin_memslot_pages(struct kvm_memory_slot *memslot, bool writable)
> +{
> +       struct kvm_memory_slot_page *entry, *tmp;
> +
> +       list_for_each_entry_safe(entry, tmp, &memslot->arch.pages.list, list) {
> +               if (writable)
> +                       set_page_dirty_lock(entry->page);
> +               unpin_user_page(entry->page);
> +               kfree(entry);
> +       }
> +}

Shouldn't this be done when the memslot is deleted?
(Or should a locked memslot be prevented from being deleted?)
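
(If the former, one possible shape, purely as a sketch;
kvm_arch_free_memslot() is the existing arch callback and the helpers are
the ones introduced in this patch:)

        void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
        {
                /* Drop the pin and locked_vm accounting for a locked slot. */
                if (memslot_is_locked(slot))
                        unlock_memslot(kvm, slot);
        }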

> +
> +static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> +                       u64 flags)
> +{
> +       struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
> +       struct kvm_memory_slot_page *page_entry;
> +       bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
> +       enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> +       struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +       struct vm_area_struct *vma;
> +       unsigned long npages = memslot->npages;
> +       unsigned int pin_flags = FOLL_LONGTERM;
> +       unsigned long i, hva, ipa, mmu_seq;
> +       int ret;
> +
> +       ret = try_rlimit_memlock(npages);

Even if the memory for the hva described by the memslot is already
'locked' with mlock() or similar, is this check needed?


> +       if (ret)
> +               return -ENOMEM;
> +
> +       INIT_LIST_HEAD(&memslot->arch.pages.list);
> +
> +       if (writable) {
> +               prot |= KVM_PGTABLE_PROT_W;
> +               pin_flags |= FOLL_WRITE;

The lock flag is just for the stage 2 mapping, correct?
I wonder if it is appropriate for KVM to set 'pin_flags', which is
passed to pin_user_pages(), based on the lock flag.

> +       }
> +
> +       hva = memslot->userspace_addr;
> +       ipa = memslot->base_gfn << PAGE_SHIFT;
> +
> +       mmu_seq = kvm->mmu_notifier_seq;
> +       smp_rmb();
> +
> +       for (i = 0; i < npages; i++) {
> +               page_entry = kzalloc(sizeof(*page_entry), GFP_KERNEL);
> +               if (!page_entry) {
> +                       unpin_memslot_pages(memslot, writable);
> +                       ret = -ENOMEM;
> +                       goto out_err;

Nit: It seems we can call unpin_memslot_pages() from 'out_err'
instead of calling it from each of the error cases.
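
Roughly like this, as a sketch (names taken from the patch; the error
paths that have already mapped something at stage 2 would still need the
kvm_pgtable_stage2_unmap() call before jumping):

        for (i = 0; i < npages; i++) {
                page_entry = kzalloc(sizeof(*page_entry), GFP_KERNEL);
                if (!page_entry) {
                        ret = -ENOMEM;
                        goto out_err;
                }
                /* ... pin, sanitise tags, map; on failure goto out_err ... */
        }
        /* ... success path (accounting, flags, cache free) returns 0 ... */

out_err:
        unpin_memslot_pages(memslot, writable);
        kvm_mmu_free_memory_cache(&cache);
        return ret;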

> +               }
> +
> +               mmap_read_lock(current->mm);
> +               ret = pin_user_pages(hva, 1, pin_flags, &page_entry->page, &vma);
> +               if (ret != 1) {
> +                       mmap_read_unlock(current->mm);
> +                       unpin_memslot_pages(memslot, writable);
> +                       ret = -ENOMEM;
> +                       goto out_err;
> +               }
> +               if (kvm_has_mte(kvm)) {
> +                       if (vma->vm_flags & VM_SHARED) {
> +                               ret = -EFAULT;
> +                       } else {
> +                               ret = sanitise_mte_tags(kvm,
> +                                       page_to_pfn(page_entry->page),
> +                                       PAGE_SIZE);
> +                       }
> +                       if (ret) {
> +                               mmap_read_unlock(current->mm);
> +                               goto out_err;
> +                       }
> +               }
> +               mmap_read_unlock(current->mm);
> +
> +               ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
> +               if (ret) {
> +                       unpin_memslot_pages(memslot, writable);
> +                       goto out_err;
> +               }
> +
> +               spin_lock(&kvm->mmu_lock);
> +               if (mmu_notifier_retry(kvm, mmu_seq)) {
> +                       spin_unlock(&kvm->mmu_lock);
> +                       unpin_memslot_pages(memslot, writable);
> +                       ret = -EAGAIN;
> +                       goto out_err;
> +               }
> +
> +               ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> +                                            page_to_phys(page_entry->page),
> +                                            prot, &cache);
> +               spin_unlock(&kvm->mmu_lock);
> +
> +               if (ret) {
> +                       kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +                                                i << PAGE_SHIFT);
> +                       unpin_memslot_pages(memslot, writable);
> +                       goto out_err;
> +               }
> +               list_add(&page_entry->list, &memslot->arch.pages.list);
> +
> +               hva += PAGE_SIZE;
> +               ipa += PAGE_SIZE;
> +       }
> +
> +
> +       /*
> +        * Even though we've checked the limit at the start, we can still exceed
> +        * it if userspace locked other pages in the meantime or if the
> +        * CAP_IPC_LOCK capability has been revoked.
> +        */
> +       ret = account_locked_vm(current->mm, npages, true);
> +       if (ret) {
> +               kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +                                        npages << PAGE_SHIFT);
> +               unpin_memslot_pages(memslot, writable);
> +               goto out_err;
> +       }
> +
> +       memslot->arch.flags = KVM_MEMSLOT_LOCK_READ;
> +       if (writable)
> +               memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
> +
> +       kvm_mmu_free_memory_cache(&cache);
> +
> +       return 0;
> +
> +out_err:
> +       kvm_mmu_free_memory_cache(&cache);
> +       return ret;
> +}
> +
>  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
>  {
>         struct kvm_memory_slot *memslot;
> @@ -1325,7 +1487,12 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
>                 goto out_unlock_slots;
>         }
>
> -       ret = -EINVAL;
> +       if (memslot_is_locked(memslot)) {
> +               ret = -EBUSY;
> +               goto out_unlock_slots;
> +       }
> +
> +       ret = lock_memslot(kvm, memslot, flags);
>
>  out_unlock_slots:
>         mutex_unlock(&kvm->slots_lock);
> @@ -1335,11 +1502,22 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
>         return ret;
>  }
>
> +static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> +{
> +       bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
> +       unsigned long npages = memslot->npages;
> +
> +       unpin_memslot_pages(memslot, writable);
> +       account_locked_vm(current->mm, npages, false);
> +
> +       memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
> +}

What if the memslot was locked read-only but the memslot has read/write
permission set? Shouldn't the stage 2 mapping be updated if KVM allows
that scenario?

Thanks,
Reiji


> +
>  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
>  {
>         bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
>         struct kvm_memory_slot *memslot;
> -       int ret;
> +       int ret = 0;
>
>         if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
>                 return -EINVAL;
> @@ -1347,18 +1525,20 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
>         mutex_lock(&kvm->slots_lock);
>
>         if (unlock_all) {
> -               ret = -EINVAL;
> -               goto out_unlock_slots;
> -       }
> -
> -       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> -       if (!memslot) {
> -               ret = -EINVAL;
> -               goto out_unlock_slots;
> +               kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> +                       if (!memslot_is_locked(memslot))
> +                               continue;
> +                       unlock_memslot(kvm, memslot);
> +               }
> +       } else {
> +               memslot = id_to_memslot(kvm_memslots(kvm), slot);
> +               if (!memslot || !memslot_is_locked(memslot)) {
> +                       ret = -EINVAL;
> +                       goto out_unlock_slots;
> +               }
> +               unlock_memslot(kvm, memslot);
>         }
>
> -       ret = -EINVAL;
> -
>  out_unlock_slots:
>         mutex_unlock(&kvm->slots_lock);
>         return ret;
> --
> 2.33.1
>

* Re: [RFC PATCH v5 01/38] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM
  2022-02-15  5:34     ` Reiji Watanabe
@ 2022-02-15 10:34       ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-02-15 10:34 UTC (permalink / raw)
  To: Reiji Watanabe; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Reiji,

On Mon, Feb 14, 2022 at 09:34:30PM -0800, Reiji Watanabe wrote:
> Hi Alex,
> 
> On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > The VGIC code uses the lock_all_vcpus() function to make sure no VCPUs are
> > run while it fiddles with the global VGIC state. Move the declaration of
> > lock_all_vcpus() and the corresponding unlock function into asm/kvm_host.h
> > where it can be reused by other parts of KVM/arm64 and rename the functions
> > to kvm_{lock,unlock}_all_vcpus() to make them more generic.
> >
> > Because the scope of the code potentially using the functions has
> > increased, add a lockdep check that the kvm->lock is held by the caller.
> > Holding the lock is necessary because otherwise userspace would be able to
> > create new VCPUs and run them while the existing VCPUs are locked.
> >
> > No functional change intended.
> >
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h     |  3 ++
> >  arch/arm64/kvm/arm.c                  | 41 ++++++++++++++++++++++
> >  arch/arm64/kvm/vgic/vgic-init.c       |  4 +--
> >  arch/arm64/kvm/vgic/vgic-its.c        |  8 ++---
> >  arch/arm64/kvm/vgic/vgic-kvm-device.c | 50 ++++-----------------------
> >  arch/arm64/kvm/vgic/vgic.h            |  3 --
> >  6 files changed, 56 insertions(+), 53 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 2a5f7f38006f..733621e41900 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -606,6 +606,9 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >  void kvm_arm_halt_guest(struct kvm *kvm);
> >  void kvm_arm_resume_guest(struct kvm *kvm);
> >
> > +bool kvm_lock_all_vcpus(struct kvm *kvm);
> > +void kvm_unlock_all_vcpus(struct kvm *kvm);
> > +
> >  #ifndef __KVM_NVHE_HYPERVISOR__
> >  #define kvm_call_hyp_nvhe(f, ...)                                              \
> >         ({                                                              \
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 2f03cbfefe67..e9b4ad7b5c82 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -651,6 +651,47 @@ void kvm_arm_resume_guest(struct kvm *kvm)
> >         }
> >  }
> >
> > +/* unlocks vcpus from @vcpu_lock_idx and smaller */
> > +static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
> > +{
> > +       struct kvm_vcpu *tmp_vcpu;
> > +
> > +       for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> > +               tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> > +               mutex_unlock(&tmp_vcpu->mutex);
> > +       }
> > +}
> > +
> > +void kvm_unlock_all_vcpus(struct kvm *kvm)
> > +{
> > +       lockdep_assert_held(&kvm->lock);
> > +       unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
> > +}
> > +
> > +/* Returns true if all vcpus were locked, false otherwise */
> > +bool kvm_lock_all_vcpus(struct kvm *kvm)
> > +{
> > +       struct kvm_vcpu *tmp_vcpu;
> > +       int c;
> > +
> > +       lockdep_assert_held(&kvm->lock);
> > +
> > +       /*
> > +        * Any time a vcpu is run, vcpu_load is called which tries to grab the
> > +        * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure that
> 
> Nit: vcpu_load() doesn't try to grab the vcpu->mutex, but kvm_vcpu_ioctl()
> does (The original comment in lock_all_vcpus() was outdated).

Will change.
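
For the record, the path that actually takes the mutex is the generic vCPU
ioctl handler. Roughly, paraphrasing virt/kvm/kvm_main.c from memory rather
than quoting it exactly:

static long kvm_vcpu_ioctl(struct file *filp,
                           unsigned int ioctl, unsigned long arg)
{
        struct kvm_vcpu *vcpu = filp->private_data;

        /* Every vCPU ioctl, KVM_RUN included, serializes on this mutex. */
        if (mutex_lock_killable(&vcpu->mutex))
                return -EINTR;
        /* ... dispatch the ioctl, then mutex_unlock(&vcpu->mutex) ... */
}

which is why holding every vcpu->mutex guarantees that no VCPU can enter the
kernel while the global state is being modified.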

> 
> Reviewed-by: Reiji Watanabe <reijiw@google.com>

Thanks!

Alex

> 
> Thanks,
> Reiji
> 
> 
> > +        * no other VCPUs are run and it is safe to fiddle with KVM global
> > +        * state.
> > +        */
> > +       kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
> > +               if (!mutex_trylock(&tmp_vcpu->mutex)) {
> > +                       unlock_vcpus(kvm, c - 1);
> > +                       return false;
> > +               }
> > +       }
> > +
> > +       return true;
> > +}
> > +
> >  static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
> >  {
> >         struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
> > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > index 0a06d0648970..cd045c7abde8 100644
> > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > @@ -87,7 +87,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
> >                 return -ENODEV;
> >
> >         ret = -EBUSY;
> > -       if (!lock_all_vcpus(kvm))
> > +       if (!kvm_lock_all_vcpus(kvm))
> >                 return ret;
> >
> >         kvm_for_each_vcpu(i, vcpu, kvm) {
> > @@ -117,7 +117,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
> >                 INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
> >
> >  out_unlock:
> > -       unlock_all_vcpus(kvm);
> > +       kvm_unlock_all_vcpus(kvm);
> >         return ret;
> >  }
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> > index 089fc2ffcb43..bc4197e87d95 100644
> > --- a/arch/arm64/kvm/vgic/vgic-its.c
> > +++ b/arch/arm64/kvm/vgic/vgic-its.c
> > @@ -2005,7 +2005,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
> >                 goto out;
> >         }
> >
> > -       if (!lock_all_vcpus(dev->kvm)) {
> > +       if (!kvm_lock_all_vcpus(dev->kvm)) {
> >                 ret = -EBUSY;
> >                 goto out;
> >         }
> > @@ -2023,7 +2023,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
> >         } else {
> >                 *reg = region->its_read(dev->kvm, its, addr, len);
> >         }
> > -       unlock_all_vcpus(dev->kvm);
> > +       kvm_unlock_all_vcpus(dev->kvm);
> >  out:
> >         mutex_unlock(&dev->kvm->lock);
> >         return ret;
> > @@ -2668,7 +2668,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
> >         mutex_lock(&kvm->lock);
> >         mutex_lock(&its->its_lock);
> >
> > -       if (!lock_all_vcpus(kvm)) {
> > +       if (!kvm_lock_all_vcpus(kvm)) {
> >                 mutex_unlock(&its->its_lock);
> >                 mutex_unlock(&kvm->lock);
> >                 return -EBUSY;
> > @@ -2686,7 +2686,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
> >                 break;
> >         }
> >
> > -       unlock_all_vcpus(kvm);
> > +       kvm_unlock_all_vcpus(kvm);
> >         mutex_unlock(&its->its_lock);
> >         mutex_unlock(&kvm->lock);
> >         return ret;
> > diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> > index 0d000d2fe8d2..c5de904643cc 100644
> > --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
> > +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> > @@ -305,44 +305,6 @@ int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
> >         return 0;
> >  }
> >
> > -/* unlocks vcpus from @vcpu_lock_idx and smaller */
> > -static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
> > -{
> > -       struct kvm_vcpu *tmp_vcpu;
> > -
> > -       for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> > -               tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> > -               mutex_unlock(&tmp_vcpu->mutex);
> > -       }
> > -}
> > -
> > -void unlock_all_vcpus(struct kvm *kvm)
> > -{
> > -       unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
> > -}
> > -
> > -/* Returns true if all vcpus were locked, false otherwise */
> > -bool lock_all_vcpus(struct kvm *kvm)
> > -{
> > -       struct kvm_vcpu *tmp_vcpu;
> > -       int c;
> > -
> > -       /*
> > -        * Any time a vcpu is run, vcpu_load is called which tries to grab the
> > -        * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure
> > -        * that no other VCPUs are run and fiddle with the vgic state while we
> > -        * access it.
> > -        */
> > -       kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
> > -               if (!mutex_trylock(&tmp_vcpu->mutex)) {
> > -                       unlock_vcpus(kvm, c - 1);
> > -                       return false;
> > -               }
> > -       }
> > -
> > -       return true;
> > -}
> > -
> >  /**
> >   * vgic_v2_attr_regs_access - allows user space to access VGIC v2 state
> >   *
> > @@ -373,7 +335,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
> >         if (ret)
> >                 goto out;
> >
> > -       if (!lock_all_vcpus(dev->kvm)) {
> > +       if (!kvm_lock_all_vcpus(dev->kvm)) {
> >                 ret = -EBUSY;
> >                 goto out;
> >         }
> > @@ -390,7 +352,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
> >                 break;
> >         }
> >
> > -       unlock_all_vcpus(dev->kvm);
> > +       kvm_unlock_all_vcpus(dev->kvm);
> >  out:
> >         mutex_unlock(&dev->kvm->lock);
> >         return ret;
> > @@ -539,7 +501,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
> >                 goto out;
> >         }
> >
> > -       if (!lock_all_vcpus(dev->kvm)) {
> > +       if (!kvm_lock_all_vcpus(dev->kvm)) {
> >                 ret = -EBUSY;
> >                 goto out;
> >         }
> > @@ -589,7 +551,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
> >                 break;
> >         }
> >
> > -       unlock_all_vcpus(dev->kvm);
> > +       kvm_unlock_all_vcpus(dev->kvm);
> >  out:
> >         mutex_unlock(&dev->kvm->lock);
> >         return ret;
> > @@ -644,12 +606,12 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
> >                 case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
> >                         mutex_lock(&dev->kvm->lock);
> >
> > -                       if (!lock_all_vcpus(dev->kvm)) {
> > +                       if (!kvm_lock_all_vcpus(dev->kvm)) {
> >                                 mutex_unlock(&dev->kvm->lock);
> >                                 return -EBUSY;
> >                         }
> >                         ret = vgic_v3_save_pending_tables(dev->kvm);
> > -                       unlock_all_vcpus(dev->kvm);
> > +                       kvm_unlock_all_vcpus(dev->kvm);
> >                         mutex_unlock(&dev->kvm->lock);
> >                         return ret;
> >                 }
> > diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
> > index 3fd6c86a7ef3..e69c839a6941 100644
> > --- a/arch/arm64/kvm/vgic/vgic.h
> > +++ b/arch/arm64/kvm/vgic/vgic.h
> > @@ -255,9 +255,6 @@ int vgic_init(struct kvm *kvm);
> >  void vgic_debug_init(struct kvm *kvm);
> >  void vgic_debug_destroy(struct kvm *kvm);
> >
> > -bool lock_all_vcpus(struct kvm *kvm);
> > -void unlock_all_vcpus(struct kvm *kvm);
> > -
> >  static inline int vgic_v3_max_apr_idx(struct kvm_vcpu *vcpu)
> >  {
> >         struct vgic_cpu *cpu_if = &vcpu->arch.vgic_cpu;
> > --
> > 2.33.1
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 01/38] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM
@ 2022-02-15 10:34       ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-02-15 10:34 UTC (permalink / raw)
  To: Reiji Watanabe
  Cc: Marc Zyngier, James Morse, Suzuki K Poulose, Linux ARM, kvmarm,
	Will Deacon, Mark Rutland

Hi Reiji,

On Mon, Feb 14, 2022 at 09:34:30PM -0800, Reiji Watanabe wrote:
> Hi Alex,
> 
> On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > The VGIC code uses the lock_all_vcpus() function to make sure no VCPUs are
> > run while it fiddles with the global VGIC state. Move the declaration of
> > lock_all_vcpus() and the corresponding unlock function into asm/kvm_host.h
> > where it can be reused by other parts of KVM/arm64 and rename the functions
> > to kvm_{lock,unlock}_all_vcpus() to make them more generic.
> >
> > Because the scope of the code potentially using the functions has
> > increased, add a lockdep check that the kvm->lock is held by the caller.
> > Holding the lock is necessary because otherwise userspace would be able to
> > create new VCPUs and run them while the existing VCPUs are locked.
> >
> > No functional change intended.
> >
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h     |  3 ++
> >  arch/arm64/kvm/arm.c                  | 41 ++++++++++++++++++++++
> >  arch/arm64/kvm/vgic/vgic-init.c       |  4 +--
> >  arch/arm64/kvm/vgic/vgic-its.c        |  8 ++---
> >  arch/arm64/kvm/vgic/vgic-kvm-device.c | 50 ++++-----------------------
> >  arch/arm64/kvm/vgic/vgic.h            |  3 --
> >  6 files changed, 56 insertions(+), 53 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 2a5f7f38006f..733621e41900 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -606,6 +606,9 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >  void kvm_arm_halt_guest(struct kvm *kvm);
> >  void kvm_arm_resume_guest(struct kvm *kvm);
> >
> > +bool kvm_lock_all_vcpus(struct kvm *kvm);
> > +void kvm_unlock_all_vcpus(struct kvm *kvm);
> > +
> >  #ifndef __KVM_NVHE_HYPERVISOR__
> >  #define kvm_call_hyp_nvhe(f, ...)                                              \
> >         ({                                                              \
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 2f03cbfefe67..e9b4ad7b5c82 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -651,6 +651,47 @@ void kvm_arm_resume_guest(struct kvm *kvm)
> >         }
> >  }
> >
> > +/* unlocks vcpus from @vcpu_lock_idx and smaller */
> > +static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
> > +{
> > +       struct kvm_vcpu *tmp_vcpu;
> > +
> > +       for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> > +               tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> > +               mutex_unlock(&tmp_vcpu->mutex);
> > +       }
> > +}
> > +
> > +void kvm_unlock_all_vcpus(struct kvm *kvm)
> > +{
> > +       lockdep_assert_held(&kvm->lock);
> > +       unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
> > +}
> > +
> > +/* Returns true if all vcpus were locked, false otherwise */
> > +bool kvm_lock_all_vcpus(struct kvm *kvm)
> > +{
> > +       struct kvm_vcpu *tmp_vcpu;
> > +       int c;
> > +
> > +       lockdep_assert_held(&kvm->lock);
> > +
> > +       /*
> > +        * Any time a vcpu is run, vcpu_load is called which tries to grab the
> > +        * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure that
> 
> Nit: vcpu_load() doesn't try to grab the vcpu->mutex, but kvm_vcpu_ioctl()
> does (The original comment in lock_all_vcpus() was outdated).

Will change.

> 
> Reviewed-by: Reiji Watanabe <reijiw@google.com>

Thanks!

Alex

> 
> Thanks,
> Reiji
> 
> 
> > +        * no other VCPUs are run and it is safe to fiddle with KVM global
> > +        * state.
> > +        */
> > +       kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
> > +               if (!mutex_trylock(&tmp_vcpu->mutex)) {
> > +                       unlock_vcpus(kvm, c - 1);
> > +                       return false;
> > +               }
> > +       }
> > +
> > +       return true;
> > +}
> > +
> >  static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
> >  {
> >         struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
> > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > index 0a06d0648970..cd045c7abde8 100644
> > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > @@ -87,7 +87,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
> >                 return -ENODEV;
> >
> >         ret = -EBUSY;
> > -       if (!lock_all_vcpus(kvm))
> > +       if (!kvm_lock_all_vcpus(kvm))
> >                 return ret;
> >
> >         kvm_for_each_vcpu(i, vcpu, kvm) {
> > @@ -117,7 +117,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
> >                 INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
> >
> >  out_unlock:
> > -       unlock_all_vcpus(kvm);
> > +       kvm_unlock_all_vcpus(kvm);
> >         return ret;
> >  }
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> > index 089fc2ffcb43..bc4197e87d95 100644
> > --- a/arch/arm64/kvm/vgic/vgic-its.c
> > +++ b/arch/arm64/kvm/vgic/vgic-its.c
> > @@ -2005,7 +2005,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
> >                 goto out;
> >         }
> >
> > -       if (!lock_all_vcpus(dev->kvm)) {
> > +       if (!kvm_lock_all_vcpus(dev->kvm)) {
> >                 ret = -EBUSY;
> >                 goto out;
> >         }
> > @@ -2023,7 +2023,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
> >         } else {
> >                 *reg = region->its_read(dev->kvm, its, addr, len);
> >         }
> > -       unlock_all_vcpus(dev->kvm);
> > +       kvm_unlock_all_vcpus(dev->kvm);
> >  out:
> >         mutex_unlock(&dev->kvm->lock);
> >         return ret;
> > @@ -2668,7 +2668,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
> >         mutex_lock(&kvm->lock);
> >         mutex_lock(&its->its_lock);
> >
> > -       if (!lock_all_vcpus(kvm)) {
> > +       if (!kvm_lock_all_vcpus(kvm)) {
> >                 mutex_unlock(&its->its_lock);
> >                 mutex_unlock(&kvm->lock);
> >                 return -EBUSY;
> > @@ -2686,7 +2686,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
> >                 break;
> >         }
> >
> > -       unlock_all_vcpus(kvm);
> > +       kvm_unlock_all_vcpus(kvm);
> >         mutex_unlock(&its->its_lock);
> >         mutex_unlock(&kvm->lock);
> >         return ret;
> > diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> > index 0d000d2fe8d2..c5de904643cc 100644
> > --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
> > +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> > @@ -305,44 +305,6 @@ int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
> >         return 0;
> >  }
> >
> > -/* unlocks vcpus from @vcpu_lock_idx and smaller */
> > -static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
> > -{
> > -       struct kvm_vcpu *tmp_vcpu;
> > -
> > -       for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> > -               tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> > -               mutex_unlock(&tmp_vcpu->mutex);
> > -       }
> > -}
> > -
> > -void unlock_all_vcpus(struct kvm *kvm)
> > -{
> > -       unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
> > -}
> > -
> > -/* Returns true if all vcpus were locked, false otherwise */
> > -bool lock_all_vcpus(struct kvm *kvm)
> > -{
> > -       struct kvm_vcpu *tmp_vcpu;
> > -       int c;
> > -
> > -       /*
> > -        * Any time a vcpu is run, vcpu_load is called which tries to grab the
> > -        * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure
> > -        * that no other VCPUs are run and fiddle with the vgic state while we
> > -        * access it.
> > -        */
> > -       kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
> > -               if (!mutex_trylock(&tmp_vcpu->mutex)) {
> > -                       unlock_vcpus(kvm, c - 1);
> > -                       return false;
> > -               }
> > -       }
> > -
> > -       return true;
> > -}
> > -
> >  /**
> >   * vgic_v2_attr_regs_access - allows user space to access VGIC v2 state
> >   *
> > @@ -373,7 +335,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
> >         if (ret)
> >                 goto out;
> >
> > -       if (!lock_all_vcpus(dev->kvm)) {
> > +       if (!kvm_lock_all_vcpus(dev->kvm)) {
> >                 ret = -EBUSY;
> >                 goto out;
> >         }
> > @@ -390,7 +352,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
> >                 break;
> >         }
> >
> > -       unlock_all_vcpus(dev->kvm);
> > +       kvm_unlock_all_vcpus(dev->kvm);
> >  out:
> >         mutex_unlock(&dev->kvm->lock);
> >         return ret;
> > @@ -539,7 +501,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
> >                 goto out;
> >         }
> >
> > -       if (!lock_all_vcpus(dev->kvm)) {
> > +       if (!kvm_lock_all_vcpus(dev->kvm)) {
> >                 ret = -EBUSY;
> >                 goto out;
> >         }
> > @@ -589,7 +551,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
> >                 break;
> >         }
> >
> > -       unlock_all_vcpus(dev->kvm);
> > +       kvm_unlock_all_vcpus(dev->kvm);
> >  out:
> >         mutex_unlock(&dev->kvm->lock);
> >         return ret;
> > @@ -644,12 +606,12 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
> >                 case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
> >                         mutex_lock(&dev->kvm->lock);
> >
> > -                       if (!lock_all_vcpus(dev->kvm)) {
> > +                       if (!kvm_lock_all_vcpus(dev->kvm)) {
> >                                 mutex_unlock(&dev->kvm->lock);
> >                                 return -EBUSY;
> >                         }
> >                         ret = vgic_v3_save_pending_tables(dev->kvm);
> > -                       unlock_all_vcpus(dev->kvm);
> > +                       kvm_unlock_all_vcpus(dev->kvm);
> >                         mutex_unlock(&dev->kvm->lock);
> >                         return ret;
> >                 }
> > diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
> > index 3fd6c86a7ef3..e69c839a6941 100644
> > --- a/arch/arm64/kvm/vgic/vgic.h
> > +++ b/arch/arm64/kvm/vgic/vgic.h
> > @@ -255,9 +255,6 @@ int vgic_init(struct kvm *kvm);
> >  void vgic_debug_init(struct kvm *kvm);
> >  void vgic_debug_destroy(struct kvm *kvm);
> >
> > -bool lock_all_vcpus(struct kvm *kvm);
> > -void unlock_all_vcpus(struct kvm *kvm);
> > -
> >  static inline int vgic_v3_max_apr_idx(struct kvm_vcpu *vcpu)
> >  {
> >         struct vgic_cpu *cpu_if = &vcpu->arch.vgic_cpu;
> > --
> > 2.33.1
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
  2022-02-15  5:59     ` Reiji Watanabe
@ 2022-02-15 11:03       ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-02-15 11:03 UTC (permalink / raw)
  To: Reiji Watanabe; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Reiji,

On Mon, Feb 14, 2022 at 09:59:09PM -0800, Reiji Watanabe wrote:
> Hi Alex,
> 
> On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Stage 2 faults triggered by the profiling buffer attempting to write to
> > memory are reported by the SPE hardware by asserting a buffer management
> > event interrupt. Interrupts are by their nature asynchronous, which means
> > that the guest might have changed its stage 1 translation tables since the
> > attempted write. SPE reports the guest virtual address that caused the data
> > abort, not the IPA, which means that KVM would have to walk the guest's
> > stage 1 tables to find the IPA. Using the AT instruction to walk the
> > guest's tables in hardware is not an option because it doesn't report the
> > IPA in the case of a stage 2 fault on a stage 1 table walk.
> >
> > Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> > done by adding a capability that allows the user to pin the memory backing
> > a memslot. The same capability can be used to unlock a memslot, which
> > unpins the pages associated with the memslot, but doesn't unmap the IPA
> > range from stage 2; in this case, the addresses will be unmapped from stage
> > 2 via the MMU notifiers when the process' address space changes.
> >
> > For now, the capability doesn't actually do anything other than checking
> > that the usage is correct; the memory operations will be added in future
> > patches.
> >
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  Documentation/virt/kvm/api.rst   | 57 ++++++++++++++++++++++++++
> >  arch/arm64/include/asm/kvm_mmu.h |  3 ++
> >  arch/arm64/kvm/arm.c             | 42 ++++++++++++++++++--
> >  arch/arm64/kvm/mmu.c             | 68 ++++++++++++++++++++++++++++++++
> >  include/uapi/linux/kvm.h         |  8 ++++
> >  5 files changed, 174 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index aeeb071c7688..16aa59eae3d9 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6925,6 +6925,63 @@ indicated by the fd to the VM this is called on.
> >  This is intended to support intra-host migration of VMs between userspace VMMs,
> >  upgrading the VMM process without interrupting the guest.
> >
> > +7.30 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > +----------------------------------------
> > +
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> > +                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > +             args[0] is the slot number
> > +             args[1] specifies the permisions when the memslot is locked or if
> > +                     all memslots should be unlocked
> > +
> > +The presence of this capability indicates that KVM supports locking the memory
> > +associated with the memslot, and unlocking a previously locked memslot.
> > +
> > +The 'flags' parameter is defined as follows:
> > +
> > +7.30.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> > +-------------------------------------------------
> > +
> > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: args[0] contains the memory slot number
> > +             args[1] contains the permissions for the locked memory:
> > +                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> > +                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> > +                     (optional) with write permissions
> 
> Nit: Those flag names don't match the ones in the code.
> (Their names in the code are KVM_ARM_LOCK_MEM_READ/KVM_ARM_LOCK_MEM_WRITE)

That's true, I'll change the flags to match.

> 
> What is the reason why KVM_ARM_LOCK_MEMORY_{READ,WRITE} flags need
> to be specified even though memslot already has similar flags ??

I added both flags to make the ABI more flexible, and I don't think it's a
burden on userspace to specify the flags when locking a memslot.

For this reason, I would rather keep it like this for now, unless you think
there's a good reason to remove them.
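
For illustration, locking a memslot from the VMM boils down to something like
the snippet below (vm_fd and slot are assumed to come from the usual KVM
setup; the constants are the ones this series adds to the uapi header):

struct kvm_enable_cap cap = {
        .cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
        .flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK,
        .args[0] = slot,                        /* memslot number */
        .args[1] = KVM_ARM_LOCK_MEM_READ |      /* mandatory */
                   KVM_ARM_LOCK_MEM_WRITE,      /* optional */
};

if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap))
        perror("KVM_ENABLE_CAP(lock memslot)");

which I don't consider a significant amount of extra code on the userspace
side.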

> 
> > +:Returns: 0 on success; negative error code on failure
> > +
> > +Enabling this capability causes the memory described by the memslot to be
> > +pinned in the process address space and the corresponding stage 2 IPA range
> > +mapped at stage 2. The permissions specified in args[1] apply to both
> > +mappings. The memory pinned with this capability counts towards the max
> > +locked memory limit for the current process.
> > +
> > +The capability should be enabled when no VCPUs are in the kernel executing an
> > +ioctl (and in particular, KVM_RUN); otherwise the ioctl will block until all
> > +VCPUs have returned. The virtual memory range described by the memslot must be
> > +mapped in the userspace process without any gaps. It is considered an error if
> > +write permissions are specified for a memslot which logs dirty pages.
> > +
> > +7.30.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > +---------------------------------------------------
> > +
> > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: args[0] contains the memory slot number
> > +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> > +                     which unlocks all previously locked memslots.
> > +:Returns: 0 on success; negative error code on failure
> > +
> > +Enabling this capability causes the memory pinned when locking the memslot
> > +specified in args[0] to be unpinned, or, optionally, all memslots to be
> > +unlocked. The IPA range is not unmapped from stage 2.
> > +>>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
> 
> Nit: An unnecessary line.
> 
> If a memslot with read/write permission is locked with read only,
> and then unlocked, can userspace expect stage 2 mapping for the
> memslot to be updated with read/write ?

Locking a memslot with the read flag would map the memory described by the
memslot with read permissions at stage 2. When the memslot is unlocked, KVM
won't touch the stage 2 entries.

When the memslot is unlocked, the pages (as in, struct page) backing the VM
memory as described by the memslot are unpinned. Then the host's MM subsystem
can treat the memory like any other pages (make them old, new, unmap them, do
nothing, etc), and the MMU notifier will take care of updating the stage 2
entries as necessary.

I guess I should have been more precise in the description. I'll change "causes
the memory pinned when locking the memslot specified in args[0] to be unpinned"
to something that clearly states that the memory in the host that backs the
memslot is unpinned.
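
Concretely, when the host later changes those mappings, the MMU notifier ends
up in kvm_unmap_gfn_range(), which (paraphrasing arch/arm64/kvm/mmu.c from
memory, not quoting it exactly) boils down to:

bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
        if (!kvm->arch.mmu.pgt)
                return false;

        /* Tear down the stage 2 mappings for the affected gfn range. */
        __unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
                             (range->end - range->start) << PAGE_SHIFT,
                             range->may_block);

        return false;
}

so stage 2 stays consistent without KVM having to unmap anything at unlock
time.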

> Can userspace delete the memslot that is locked (without unlocking) ?

No, it cannot.

> If so, userspace can expect the corresponding range to be implicitly
> unlocked, correct ?

Userspace must explicitly unlock the memslot before deleting it. I want
userspace to be explicit in its intent.
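
In other words, something along these lines in the VMM before the
KVM_SET_USER_MEMORY_REGION call that deletes the slot (same assumptions as
the locking snippet above):

struct kvm_enable_cap cap = {
        .cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
        .flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK,
        .args[0] = slot,
        /*
         * Alternatively, args[1] = KVM_ARM_UNLOCK_MEM_ALL unlocks every
         * locked memslot in one go.
         */
};

ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
/* Only now may the memslot be deleted or moved. */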

Thanks,
Alex

> 
> Thanks,
> Reiji
> 
> > +
> >  8. Other capabilities.
> >  ======================
> >
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index 02d378887743..2c50734f048d 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
> >  void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> >  void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> >
> > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > +
> >  static inline unsigned int kvm_get_vmid_bits(void)
> >  {
> >         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index e9b4ad7b5c82..d49905d18cee 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -78,16 +78,43 @@ int kvm_arch_check_processor_compat(void *opaque)
> >         return 0;
> >  }
> >
> > +static int kvm_arm_lock_memslot_supported(void)
> > +{
> > +       return 0;
> > +}
> > +
> > +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> > +                                            struct kvm_enable_cap *cap)
> > +{
> > +       u64 slot, action_flags;
> > +       u32 action;
> > +
> > +       if (cap->args[2] || cap->args[3])
> > +               return -EINVAL;
> > +
> > +       slot = cap->args[0];
> > +       action = cap->flags;
> > +       action_flags = cap->args[1];
> > +
> > +       switch (action) {
> > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
> > +               return kvm_mmu_lock_memslot(kvm, slot, action_flags);
> > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
> > +               return kvm_mmu_unlock_memslot(kvm, slot, action_flags);
> > +       default:
> > +               return -EINVAL;
> > +       }
> > +}
> > +
> >  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                             struct kvm_enable_cap *cap)
> >  {
> >         int r;
> >
> > -       if (cap->flags)
> > -               return -EINVAL;
> > -
> >         switch (cap->cap) {
> >         case KVM_CAP_ARM_NISV_TO_USER:
> > +               if (cap->flags)
> > +                       return -EINVAL;
> >                 r = 0;
> >                 kvm->arch.return_nisv_io_abort_to_user = true;
> >                 break;
> > @@ -101,6 +128,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                 }
> >                 mutex_unlock(&kvm->lock);
> >                 break;
> > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > +               if (!kvm_arm_lock_memslot_supported())
> > +                       return -EINVAL;
> > +               r = kvm_lock_user_memory_region_ioctl(kvm, cap);
> > +               break;
> >         default:
> >                 r = -EINVAL;
> >                 break;
> > @@ -168,7 +200,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> >         return VM_FAULT_SIGBUS;
> >  }
> >
> > -
> >  /**
> >   * kvm_arch_destroy_vm - destroy the VM data structure
> >   * @kvm:       pointer to the KVM struct
> > @@ -276,6 +307,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >         case KVM_CAP_ARM_PTRAUTH_GENERIC:
> >                 r = system_has_full_ptr_auth();
> >                 break;
> > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > +               r = kvm_arm_lock_memslot_supported();
> > +               break;
> >         default:
> >                 r = 0;
> >         }
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 326cdfec74a1..f65bcbc9ae69 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1296,6 +1296,74 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >         return ret;
> >  }
> >
> > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > +{
> > +       struct kvm_memory_slot *memslot;
> > +       int ret;
> > +
> > +       if (slot >= KVM_MEM_SLOTS_NUM)
> > +               return -EINVAL;
> > +
> > +       if (!(flags & KVM_ARM_LOCK_MEM_READ))
> > +               return -EINVAL;
> > +
> > +       mutex_lock(&kvm->lock);
> > +       if (!kvm_lock_all_vcpus(kvm)) {
> > +               ret = -EBUSY;
> > +               goto out_unlock_kvm;
> > +       }
> > +       mutex_lock(&kvm->slots_lock);
> > +
> > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > +       if (!memslot) {
> > +               ret = -EINVAL;
> > +               goto out_unlock_slots;
> > +       }
> > +       if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
> > +           ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
> > +               ret = -EPERM;
> > +               goto out_unlock_slots;
> > +       }
> > +
> > +       ret = -EINVAL;
> > +
> > +out_unlock_slots:
> > +       mutex_unlock(&kvm->slots_lock);
> > +       kvm_unlock_all_vcpus(kvm);
> > +out_unlock_kvm:
> > +       mutex_unlock(&kvm->lock);
> > +       return ret;
> > +}
> > +
> > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > +{
> > +       bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> > +       struct kvm_memory_slot *memslot;
> > +       int ret;
> > +
> > +       if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> > +               return -EINVAL;
> > +
> > +       mutex_lock(&kvm->slots_lock);
> > +
> > +       if (unlock_all) {
> > +               ret = -EINVAL;
> > +               goto out_unlock_slots;
> > +       }
> > +
> > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > +       if (!memslot) {
> > +               ret = -EINVAL;
> > +               goto out_unlock_slots;
> > +       }
> > +
> > +       ret = -EINVAL;
> > +
> > +out_unlock_slots:
> > +       mutex_unlock(&kvm->slots_lock);
> > +       return ret;
> > +}
> > +
> >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> >         if (!kvm->arch.mmu.pgt)
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 1daa45268de2..70c969967557 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> >  #define KVM_CAP_ARM_MTE 205
> >  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> > +#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> > @@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
> >  #define KVM_PPC_SVM_OFF                  _IO(KVMIO,  0xb3)
> >  #define KVM_ARM_MTE_COPY_TAGS    _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
> >
> > +/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
> > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK     (1 << 0)
> > +#define   KVM_ARM_LOCK_MEM_READ                                (1 << 0)
> > +#define   KVM_ARM_LOCK_MEM_WRITE                       (1 << 1)
> > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK   (1 << 1)
> > +#define   KVM_ARM_UNLOCK_MEM_ALL                       (1 << 0)
> > +
> >  /* ioctl for vm fd */
> >  #define KVM_CREATE_DEVICE        _IOWR(KVMIO,  0xe0, struct kvm_create_device)
> >
> > --
> > 2.33.1
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
@ 2022-02-15 11:03       ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-02-15 11:03 UTC (permalink / raw)
  To: Reiji Watanabe
  Cc: Marc Zyngier, James Morse, Suzuki K Poulose, Linux ARM, kvmarm,
	Will Deacon, Mark Rutland

Hi Reiji,

On Mon, Feb 14, 2022 at 09:59:09PM -0800, Reiji Watanabe wrote:
> Hi Alex,
> 
> On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Stage 2 faults triggered by the profiling buffer attempting to write to
> > memory are reported by the SPE hardware by asserting a buffer management
> > event interrupt. Interrupts are by their nature asynchronous, which means
> > that the guest might have changed its stage 1 translation tables since the
> > attempted write. SPE reports the guest virtual address that caused the data
> > abort, not the IPA, which means that KVM would have to walk the guest's
> > stage 1 tables to find the IPA. Using the AT instruction to walk the
> > guest's tables in hardware is not an option because it doesn't report the
> > IPA in the case of a stage 2 fault on a stage 1 table walk.
> >
> > Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> > done by adding a capability that allows the user to pin the memory backing
> > a memslot. The same capability can be used to unlock a memslot, which
> > unpins the pages associated with the memslot, but doesn't unmap the IPA
> > range from stage 2; in this case, the addresses will be unmapped from stage
> > 2 via the MMU notifiers when the process' address space changes.
> >
> > For now, the capability doesn't actually do anything other than checking
> > that the usage is correct; the memory operations will be added in future
> > patches.
> >
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  Documentation/virt/kvm/api.rst   | 57 ++++++++++++++++++++++++++
> >  arch/arm64/include/asm/kvm_mmu.h |  3 ++
> >  arch/arm64/kvm/arm.c             | 42 ++++++++++++++++++--
> >  arch/arm64/kvm/mmu.c             | 68 ++++++++++++++++++++++++++++++++
> >  include/uapi/linux/kvm.h         |  8 ++++
> >  5 files changed, 174 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index aeeb071c7688..16aa59eae3d9 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6925,6 +6925,63 @@ indicated by the fd to the VM this is called on.
> >  This is intended to support intra-host migration of VMs between userspace VMMs,
> >  upgrading the VMM process without interrupting the guest.
> >
> > +7.30 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > +----------------------------------------
> > +
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> > +                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > +             args[0] is the slot number
> > +             args[1] specifies the permisions when the memslot is locked or if
> > +                     all memslots should be unlocked
> > +
> > +The presence of this capability indicates that KVM supports locking the memory
> > +associated with the memslot, and unlocking a previously locked memslot.
> > +
> > +The 'flags' parameter is defined as follows:
> > +
> > +7.30.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> > +-------------------------------------------------
> > +
> > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: args[0] contains the memory slot number
> > +             args[1] contains the permissions for the locked memory:
> > +                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> > +                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> > +                     (optional) with write permissions
> 
> Nit: Those flag names don't match the ones in the code.
> (Their names in the code are KVM_ARM_LOCK_MEM_READ/KVM_ARM_LOCK_MEM_WRITE)

That's true, I'll change the flags to match.

> 
> What is the reason why KVM_ARM_LOCK_MEMORY_{READ,WRITE} flags need
> to be specified even though memslot already has similar flags ??

I added both flags to make the ABI more flexible, and I don't think it's a
burden on userspace to specify the flags when locking a memslot.

For this reason, I would rather keep it like this for now, unless you think
there's a good reason to remove them.

> 
> > +:Returns: 0 on success; negative error code on failure
> > +
> > +Enabling this capability causes the memory described by the memslot to be
> > +pinned in the process address space and the corresponding stage 2 IPA range
> > +mapped at stage 2. The permissions specified in args[1] apply to both
> > +mappings. The memory pinned with this capability counts towards the max
> > +locked memory limit for the current process.
> > +
> > +The capability should be enabled when no VCPUs are in the kernel executing an
> > +ioctl (and in particular, KVM_RUN); otherwise the ioctl will block until all
> > +VCPUs have returned. The virtual memory range described by the memslot must be
> > +mapped in the userspace process without any gaps. It is considered an error if
> > +write permissions are specified for a memslot which logs dirty pages.
> > +
> > +7.30.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > +---------------------------------------------------
> > +
> > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: args[0] contains the memory slot number
> > +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> > +                     which unlocks all previously locked memslots.
> > +:Returns: 0 on success; negative error code on failure
> > +
> > +Enabling this capability causes the memory pinned when locking the memslot
> > +specified in args[0] to be unpinned, or, optionally, all memslots to be
> > +unlocked. The IPA range is not unmapped from stage 2.
> > +>>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
> 
> Nit: An unnecessary line.
> 
> If a memslot with read/write permission is locked with read only,
> and then unlocked, can userspace expect stage 2 mapping for the
> memslot to be updated with read/write ?

Locking a memslot with the read flag would map the memory described by the
memslot with read permissions at stage 2. When the memslot is unlocked, KVM
won't touch the stage 2 entries.

When the memslot is unlocked, the pages (as in, struct page) backing the VM
memory as described by the memslot are unpinned. Then the host's MM subsystem
can treat the memory like any other pages (make them old, new, unmap them, do
nothing, etc), and the MMU notifier will take care of updating the stage 2
entries as necessary.

I guess I should have been more precise in the description. I'll change "causes
the memory pinned when locking the memslot specified in args[0] to be unpinned"
to something that clearly states that the memory in the host that backs the
memslot is unpinned.

> Can userspace delete the memslot that is locked (without unlocking) ?

No, it cannot.

> If so, userspace can expect the corresponding range to be implicitly
> unlocked, correct ?

Userspace must explicitly unlock the memslot before deleting it. I want
userspace to be explicit in its intent.

Thanks,
Alex

> 
> Thanks,
> Reiji
> 
> > +
> >  8. Other capabilities.
> >  ======================
> >
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index 02d378887743..2c50734f048d 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
> >  void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> >  void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> >
> > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > +
> >  static inline unsigned int kvm_get_vmid_bits(void)
> >  {
> >         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index e9b4ad7b5c82..d49905d18cee 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -78,16 +78,43 @@ int kvm_arch_check_processor_compat(void *opaque)
> >         return 0;
> >  }
> >
> > +static int kvm_arm_lock_memslot_supported(void)
> > +{
> > +       return 0;
> > +}
> > +
> > +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> > +                                            struct kvm_enable_cap *cap)
> > +{
> > +       u64 slot, action_flags;
> > +       u32 action;
> > +
> > +       if (cap->args[2] || cap->args[3])
> > +               return -EINVAL;
> > +
> > +       slot = cap->args[0];
> > +       action = cap->flags;
> > +       action_flags = cap->args[1];
> > +
> > +       switch (action) {
> > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
> > +               return kvm_mmu_lock_memslot(kvm, slot, action_flags);
> > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
> > +               return kvm_mmu_unlock_memslot(kvm, slot, action_flags);
> > +       default:
> > +               return -EINVAL;
> > +       }
> > +}
> > +
> >  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                             struct kvm_enable_cap *cap)
> >  {
> >         int r;
> >
> > -       if (cap->flags)
> > -               return -EINVAL;
> > -
> >         switch (cap->cap) {
> >         case KVM_CAP_ARM_NISV_TO_USER:
> > +               if (cap->flags)
> > +                       return -EINVAL;
> >                 r = 0;
> >                 kvm->arch.return_nisv_io_abort_to_user = true;
> >                 break;
> > @@ -101,6 +128,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                 }
> >                 mutex_unlock(&kvm->lock);
> >                 break;
> > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > +               if (!kvm_arm_lock_memslot_supported())
> > +                       return -EINVAL;
> > +               r = kvm_lock_user_memory_region_ioctl(kvm, cap);
> > +               break;
> >         default:
> >                 r = -EINVAL;
> >                 break;
> > @@ -168,7 +200,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> >         return VM_FAULT_SIGBUS;
> >  }
> >
> > -
> >  /**
> >   * kvm_arch_destroy_vm - destroy the VM data structure
> >   * @kvm:       pointer to the KVM struct
> > @@ -276,6 +307,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >         case KVM_CAP_ARM_PTRAUTH_GENERIC:
> >                 r = system_has_full_ptr_auth();
> >                 break;
> > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > +               r = kvm_arm_lock_memslot_supported();
> > +               break;
> >         default:
> >                 r = 0;
> >         }
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 326cdfec74a1..f65bcbc9ae69 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1296,6 +1296,74 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >         return ret;
> >  }
> >
> > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > +{
> > +       struct kvm_memory_slot *memslot;
> > +       int ret;
> > +
> > +       if (slot >= KVM_MEM_SLOTS_NUM)
> > +               return -EINVAL;
> > +
> > +       if (!(flags & KVM_ARM_LOCK_MEM_READ))
> > +               return -EINVAL;
> > +
> > +       mutex_lock(&kvm->lock);
> > +       if (!kvm_lock_all_vcpus(kvm)) {
> > +               ret = -EBUSY;
> > +               goto out_unlock_kvm;
> > +       }
> > +       mutex_lock(&kvm->slots_lock);
> > +
> > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > +       if (!memslot) {
> > +               ret = -EINVAL;
> > +               goto out_unlock_slots;
> > +       }
> > +       if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
> > +           ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
> > +               ret = -EPERM;
> > +               goto out_unlock_slots;
> > +       }
> > +
> > +       ret = -EINVAL;
> > +
> > +out_unlock_slots:
> > +       mutex_unlock(&kvm->slots_lock);
> > +       kvm_unlock_all_vcpus(kvm);
> > +out_unlock_kvm:
> > +       mutex_unlock(&kvm->lock);
> > +       return ret;
> > +}
> > +
> > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > +{
> > +       bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> > +       struct kvm_memory_slot *memslot;
> > +       int ret;
> > +
> > +       if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> > +               return -EINVAL;
> > +
> > +       mutex_lock(&kvm->slots_lock);
> > +
> > +       if (unlock_all) {
> > +               ret = -EINVAL;
> > +               goto out_unlock_slots;
> > +       }
> > +
> > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > +       if (!memslot) {
> > +               ret = -EINVAL;
> > +               goto out_unlock_slots;
> > +       }
> > +
> > +       ret = -EINVAL;
> > +
> > +out_unlock_slots:
> > +       mutex_unlock(&kvm->slots_lock);
> > +       return ret;
> > +}
> > +
> >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> >         if (!kvm->arch.mmu.pgt)
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 1daa45268de2..70c969967557 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> >  #define KVM_CAP_ARM_MTE 205
> >  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> > +#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> > @@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
> >  #define KVM_PPC_SVM_OFF                  _IO(KVMIO,  0xb3)
> >  #define KVM_ARM_MTE_COPY_TAGS    _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
> >
> > +/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
> > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK     (1 << 0)
> > +#define   KVM_ARM_LOCK_MEM_READ                                (1 << 0)
> > +#define   KVM_ARM_LOCK_MEM_WRITE                       (1 << 1)
> > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK   (1 << 1)
> > +#define   KVM_ARM_UNLOCK_MEM_ALL                       (1 << 0)
> > +
> >  /* ioctl for vm fd */
> >  #define KVM_CREATE_DEVICE        _IOWR(KVMIO,  0xe0, struct kvm_create_device)
> >
> > --
> > 2.33.1
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 03/38] KVM: arm64: Implement the memslot lock/unlock functionality
  2022-02-15  7:46     ` Reiji Watanabe
@ 2022-02-15 11:26       ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-02-15 11:26 UTC (permalink / raw)
  To: Reiji Watanabe; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Reiji,

On Mon, Feb 14, 2022 at 11:46:38PM -0800, Reiji Watanabe wrote:
> Hi Alex,
> 
> On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Pin memory in the process address space and map it in the stage 2 tables as
> > a result of userspace enabling the KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > capability; and unpin it from the process address space when the capability
> > is used with the KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK flag.
> >
> > The current implementation has two drawbacks which will be fixed in future
> > patches:
> >
> > - The dcache maintenance is done when the memslot is locked, which means
> >   that it is possible that memory changes made by userspace after the ioctl
> >   completes won't be visible to a guest running with the MMU off.
> >
> > - Tag scrubbing is done when the memslot is locked. If the MTE capability
> >   is enabled after the ioctl, the guest will be able to access unsanitised
> >   pages. This is prevented by forbidding userspace to enable the MTE
> >   capability if any memslots are locked.
> >
> > Only PAGE_SIZE mappings are supported at stage 2.
> >
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  Documentation/virt/kvm/api.rst    |   4 +-
> >  arch/arm64/include/asm/kvm_host.h |  11 ++
> >  arch/arm64/kvm/arm.c              |  22 +++-
> >  arch/arm64/kvm/mmu.c              | 204 ++++++++++++++++++++++++++++--
> >  4 files changed, 226 insertions(+), 15 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 16aa59eae3d9..0ac12a730013 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6979,8 +6979,8 @@ write permissions are specified for a memslot which logs dirty pages.
> >
> >  Enabling this capability causes the memory pinned when locking the memslot
> >  specified in args[0] to be unpinned, or, optionally, all memslots to be
> > -unlocked. The IPA range is not unmapped from stage 2.
> > ->>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
> > +unlocked. The IPA range is not unmapped from stage 2. It is considered an error
> > +to attempt to unlock a memslot which is not locked.
> >
> >  8. Other capabilities.
> >  ======================
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 733621e41900..7fd70ad90c16 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -99,7 +99,18 @@ struct kvm_s2_mmu {
> >         struct kvm_arch *arch;
> >  };
> >
> > +#define KVM_MEMSLOT_LOCK_READ          (1 << 0)
> > +#define KVM_MEMSLOT_LOCK_WRITE         (1 << 1)
> > +#define KVM_MEMSLOT_LOCK_MASK          0x3
> > +
> > +struct kvm_memory_slot_page {
> > +       struct list_head list;
> > +       struct page *page;
> > +};
> > +
> >  struct kvm_arch_memory_slot {
> > +       struct kvm_memory_slot_page pages;
> > +       u32 flags;
> >  };
> >
> >  struct kvm_arch {
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index d49905d18cee..b9b8b43835e3 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -106,6 +106,25 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> >         }
> >  }
> >
> > +static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
> > +{
> > +       struct kvm_memslots *slots = kvm_memslots(kvm);
> > +       struct kvm_memory_slot *memslot;
> > +       bool has_locked_memslots = false;
> > +       int idx;
> > +
> > +       idx = srcu_read_lock(&kvm->srcu);
> > +       kvm_for_each_memslot(memslot, slots) {
> > +               if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
> > +                       has_locked_memslots = true;
> > +                       break;
> > +               }
> > +       }
> > +       srcu_read_unlock(&kvm->srcu, idx);
> > +
> > +       return has_locked_memslots;
> > +}
> > +
> >  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                             struct kvm_enable_cap *cap)
> >  {
> > @@ -120,7 +139,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                 break;
> >         case KVM_CAP_ARM_MTE:
> >                 mutex_lock(&kvm->lock);
> > -               if (!system_supports_mte() || kvm->created_vcpus) {
> > +               if (!system_supports_mte() || kvm->created_vcpus ||
> > +                   (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) {
> >                         r = -EINVAL;
> >                 } else {
> >                         r = 0;
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index f65bcbc9ae69..b0a8e61315e4 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -72,6 +72,11 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
> >         return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
> >  }
> >
> > +static bool memslot_is_locked(struct kvm_memory_slot *memslot)
> > +{
> > +       return memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK;
> > +}
> > +
> >  /**
> >   * kvm_flush_remote_tlbs() - flush all VM TLB entries for v7/8
> >   * @kvm:       pointer to kvm structure.
> > @@ -769,6 +774,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
> >         if (map_size == PAGE_SIZE)
> >                 return true;
> >
> > +       /* Allow only PAGE_SIZE mappings for locked memslots */
> > +       if (memslot_is_locked(memslot))
> > +               return false;
> > +
> >         size = memslot->npages * PAGE_SIZE;
> >
> >         gpa_start = memslot->base_gfn << PAGE_SHIFT;
> > @@ -1296,6 +1305,159 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >         return ret;
> >  }
> >
> > +static int try_rlimit_memlock(unsigned long npages)
> > +{
> > +       unsigned long lock_limit;
> > +       bool has_lock_cap;
> > +       int ret = 0;
> > +
> > +       has_lock_cap = capable(CAP_IPC_LOCK);
> > +       if (has_lock_cap)
> > +               goto out;
> > +
> > +       lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > +
> > +       mmap_read_lock(current->mm);
> > +       if (npages + current->mm->locked_vm > lock_limit)
> > +               ret = -ENOMEM;
> > +       mmap_read_unlock(current->mm);
> > +
> > +out:
> > +       return ret;
> > +}
> > +
> > +static void unpin_memslot_pages(struct kvm_memory_slot *memslot, bool writable)
> > +{
> > +       struct kvm_memory_slot_page *entry, *tmp;
> > +
> > +       list_for_each_entry_safe(entry, tmp, &memslot->arch.pages.list, list) {
> > +               if (writable)
> > +                       set_page_dirty_lock(entry->page);
> > +               unpin_user_page(entry->page);
> > +               kfree(entry);
> > +       }
> > +}
> 
> Shouldn't this be done when the memslot is deleted ?
> (Or should the locked memslot be prevented from being deleted ?)

I add code to prevent changes to a locked memslot in patch #9 ("KVM: arm64:
Deny changes to locked memslots").

> 
> > +
> > +static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> > +                       u64 flags)
> > +{
> > +       struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
> > +       struct kvm_memory_slot_page *page_entry;
> > +       bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
> > +       enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > +       struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> > +       struct vm_area_struct *vma;
> > +       unsigned long npages = memslot->npages;
> > +       unsigned int pin_flags = FOLL_LONGTERM;
> > +       unsigned long i, hva, ipa, mmu_seq;
> > +       int ret;
> > +
> > +       ret = try_rlimit_memlock(npages);
> 
> Even if the memory for the hva described by the memslot is already
> 'locked' by mlock() or the like, is this checking needed ?

I believe it is; mlock uses a different mechanism to pin the pages: it sets the
VM_LOCKED VMA flag. And even if a VMA is mlocked, it doesn't mean that the
host's stage 1 is populated, because of the MLOCK_ONFAULT mlock() flag. If
userspace wants to lock the same memory twice, then it's free to do it, and to
suffer any possible consequences (like running into a size limit).
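
Purely as an illustration of the difference (this snippet is not from the
series): a minimal userspace sketch, assuming a guest_mem/guest_mem_size
mapping and glibc 2.27+ for the mlock2() wrapper:

#define _GNU_SOURCE
#include <sys/mman.h>

/*
 * Marks the VMA VM_LOCKED; with MLOCK_ONFAULT the pages are only faulted in
 * when first touched, so the host's stage 1 for the range can stay
 * unpopulated and nothing gets pinned the way pin_user_pages() pins.
 */
static int lock_vma_on_fault(void *guest_mem, size_t guest_mem_size)
{
        return mlock2(guest_mem, guest_mem_size, MLOCK_ONFAULT);
}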

> 
> 
> > +       if (ret)
> > +               return -ENOMEM;
> > +
> > +       INIT_LIST_HEAD(&memslot->arch.pages.list);
> > +
> > +       if (writable) {
> > +               prot |= KVM_PGTABLE_PROT_W;
> > +               pin_flags |= FOLL_WRITE;
> 
> The lock flag is just for stage 2 mapping, correct ?
> I wonder if it is appropriate for KVM to set 'pin_flags', which is
> passed to pin_user_pages(), based on the lock flag.

I don't see why not; KVM is the consumer of the GUP API.

> 
> > +       }
> > +
> > +       hva = memslot->userspace_addr;
> > +       ipa = memslot->base_gfn << PAGE_SHIFT;
> > +
> > +       mmu_seq = kvm->mmu_notifier_seq;
> > +       smp_rmb();
> > +
> > +       for (i = 0; i < npages; i++) {
> > +               page_entry = kzalloc(sizeof(*page_entry), GFP_KERNEL);
> > +               if (!page_entry) {
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       ret = -ENOMEM;
> > +                       goto out_err;
> 
> Nit: It seems we can call unpin_memslot_pages() from 'out_err'
> instead of calling it from each of the error cases.

I'll see if I can remove the repetition.
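
Probably something like folding the common unpin behind an extra label; a rough
sketch only, not the actual rework (the per-site kvm_pgtable_stage2_unmap()
calls would stay where they are):

        for (i = 0; i < npages; i++) {
                page_entry = kzalloc(sizeof(*page_entry), GFP_KERNEL);
                if (!page_entry) {
                        ret = -ENOMEM;
                        goto out_unpin;
                }
                ...
        }
        ...
        return 0;

out_unpin:
        unpin_memslot_pages(memslot, writable);
out_err:
        kvm_mmu_free_memory_cache(&cache);
        return ret;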

> 
> > +               }
> > +
> > +               mmap_read_lock(current->mm);
> > +               ret = pin_user_pages(hva, 1, pin_flags, &page_entry->page, &vma);
> > +               if (ret != 1) {
> > +                       mmap_read_unlock(current->mm);
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       ret = -ENOMEM;
> > +                       goto out_err;
> > +               }
> > +               if (kvm_has_mte(kvm)) {
> > +                       if (vma->vm_flags & VM_SHARED) {
> > +                               ret = -EFAULT;
> > +                       } else {
> > +                               ret = sanitise_mte_tags(kvm,
> > +                                       page_to_pfn(page_entry->page),
> > +                                       PAGE_SIZE);
> > +                       }
> > +                       if (ret) {
> > +                               mmap_read_unlock(current->mm);
> > +                               goto out_err;
> > +                       }
> > +               }
> > +               mmap_read_unlock(current->mm);
> > +
> > +               ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
> > +               if (ret) {
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       goto out_err;
> > +               }
> > +
> > +               spin_lock(&kvm->mmu_lock);
> > +               if (mmu_notifier_retry(kvm, mmu_seq)) {
> > +                       spin_unlock(&kvm->mmu_lock);
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       ret = -EAGAIN;
> > +                       goto out_err;
> > +               }
> > +
> > +               ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> > +                                            page_to_phys(page_entry->page),
> > +                                            prot, &cache);
> > +               spin_unlock(&kvm->mmu_lock);
> > +
> > +               if (ret) {
> > +                       kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> > +                                                i << PAGE_SHIFT);
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       goto out_err;
> > +               }
> > +               list_add(&page_entry->list, &memslot->arch.pages.list);
> > +
> > +               hva += PAGE_SIZE;
> > +               ipa += PAGE_SIZE;
> > +       }
> > +
> > +
> > +       /*
> > +        * Even though we've checked the limit at the start, we can still exceed
> > +        * it if userspace locked other pages in the meantime or if the
> > +        * CAP_IPC_LOCK capability has been revoked.
> > +        */
> > +       ret = account_locked_vm(current->mm, npages, true);
> > +       if (ret) {
> > +               kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> > +                                        npages << PAGE_SHIFT);
> > +               unpin_memslot_pages(memslot, writable);
> > +               goto out_err;
> > +       }
> > +
> > +       memslot->arch.flags = KVM_MEMSLOT_LOCK_READ;
> > +       if (writable)
> > +               memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
> > +
> > +       kvm_mmu_free_memory_cache(&cache);
> > +
> > +       return 0;
> > +
> > +out_err:
> > +       kvm_mmu_free_memory_cache(&cache);
> > +       return ret;
> > +}
> > +
> >  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >  {
> >         struct kvm_memory_slot *memslot;
> > @@ -1325,7 +1487,12 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >                 goto out_unlock_slots;
> >         }
> >
> > -       ret = -EINVAL;
> > +       if (memslot_is_locked(memslot)) {
> > +               ret = -EBUSY;
> > +               goto out_unlock_slots;
> > +       }
> > +
> > +       ret = lock_memslot(kvm, memslot, flags);
> >
> >  out_unlock_slots:
> >         mutex_unlock(&kvm->slots_lock);
> > @@ -1335,11 +1502,22 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >         return ret;
> >  }
> >
> > +static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> > +{
> > +       bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
> > +       unsigned long npages = memslot->npages;
> > +
> > +       unpin_memslot_pages(memslot, writable);
> > +       account_locked_vm(current->mm, npages, false);
> > +
> > +       memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
> > +}
> 
> What if the memslot was locked with read only but the memslot
> has read/write permission set ? Shouldn't the stage 2 mapping be
> updated if KVM allows for the scenario ?

If the memslot is locked with only the read flag, then the stage 2 entries are
mapped read-only, and subsequent stage 2 data aborts will relax the permissions
if needed. Userspace clearly wants the memory to be mapped at stage 2 with
read-only permissions; otherwise it would have specified both read and write
permissions when locking the memslot. I don't see why KVM should do more than
what was requested of it.
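
To put it in userspace terms, locking read-only is simply a matter of passing
only the read flag; a minimal sketch using the UAPI added in patch #2 (vm_fd
and slot stand for whatever the VMM already has):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Only KVM_ARM_LOCK_MEM_READ is passed, so the stage 2 entries are created
 * without write permission.
 */
static int lock_memslot_readonly(int vm_fd, __u64 slot)
{
        struct kvm_enable_cap cap = {
                .cap    = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
                .flags  = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK,
                .args   = { slot, KVM_ARM_LOCK_MEM_READ },
        };

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}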

If you find this awkward, there is already a case in KVM where userspace wants
the stage 2 entries to be read-only so the guest will cause write faults:
userspace does this when it wants to migrate the VM by setting the
KVM_MEM_LOG_DIRTY_PAGES memslot flag.

Thanks,
Alex

> 
> Thanks,
> Reiji
> 
> 
> > +
> >  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >  {
> >         bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> >         struct kvm_memory_slot *memslot;
> > -       int ret;
> > +       int ret = 0;
> >
> >         if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> >                 return -EINVAL;
> > @@ -1347,18 +1525,20 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >         mutex_lock(&kvm->slots_lock);
> >
> >         if (unlock_all) {
> > -               ret = -EINVAL;
> > -               goto out_unlock_slots;
> > -       }
> > -
> > -       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > -       if (!memslot) {
> > -               ret = -EINVAL;
> > -               goto out_unlock_slots;
> > +               kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > +                       if (!memslot_is_locked(memslot))
> > +                               continue;
> > +                       unlock_memslot(kvm, memslot);
> > +               }
> > +       } else {
> > +               memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > +               if (!memslot || !memslot_is_locked(memslot)) {
> > +                       ret = -EINVAL;
> > +                       goto out_unlock_slots;
> > +               }
> > +               unlock_memslot(kvm, memslot);
> >         }
> >
> > -       ret = -EINVAL;
> > -
> >  out_unlock_slots:
> >         mutex_unlock(&kvm->slots_lock);
> >         return ret;
> > --
> > 2.33.1
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 03/38] KVM: arm64: Implement the memslot lock/unlock functionality
@ 2022-02-15 11:26       ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-02-15 11:26 UTC (permalink / raw)
  To: Reiji Watanabe
  Cc: Marc Zyngier, James Morse, Suzuki K Poulose, Linux ARM, kvmarm,
	Will Deacon, Mark Rutland

Hi Reiji,

On Mon, Feb 14, 2022 at 11:46:38PM -0800, Reiji Watanabe wrote:
> Hi Alex,
> 
> On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Pin memory in the process address space and map it in the stage 2 tables as
> > a result of userspace enabling the KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > capability; and unpin it from the process address space when the capability
> > is used with the KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK flag.
> >
> > The current implementation has two drawbacks which will be fixed in future
> > patches:
> >
> > - The dcache maintenance is done when the memslot is locked, which means
> >   that it is possible that memory changes made by userspace after the ioctl
> >   completes won't be visible to a guest running with the MMU off.
> >
> > - Tag scrubbing is done when the memslot is locked. If the MTE capability
> >   is enabled after the ioctl, the guest will be able to access unsanitised
> >   pages. This is prevented by forbidding userspace to enable the MTE
> >   capability if any memslots are locked.
> >
> > Only PAGE_SIZE mappings are supported at stage 2.
> >
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  Documentation/virt/kvm/api.rst    |   4 +-
> >  arch/arm64/include/asm/kvm_host.h |  11 ++
> >  arch/arm64/kvm/arm.c              |  22 +++-
> >  arch/arm64/kvm/mmu.c              | 204 ++++++++++++++++++++++++++++--
> >  4 files changed, 226 insertions(+), 15 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 16aa59eae3d9..0ac12a730013 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6979,8 +6979,8 @@ write permissions are specified for a memslot which logs dirty pages.
> >
> >  Enabling this capability causes the memory pinned when locking the memslot
> >  specified in args[0] to be unpinned, or, optionally, all memslots to be
> > -unlocked. The IPA range is not unmapped from stage 2.
> > ->>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
> > +unlocked. The IPA range is not unmapped from stage 2. It is considered an error
> > +to attempt to unlock a memslot which is not locked.
> >
> >  8. Other capabilities.
> >  ======================
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 733621e41900..7fd70ad90c16 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -99,7 +99,18 @@ struct kvm_s2_mmu {
> >         struct kvm_arch *arch;
> >  };
> >
> > +#define KVM_MEMSLOT_LOCK_READ          (1 << 0)
> > +#define KVM_MEMSLOT_LOCK_WRITE         (1 << 1)
> > +#define KVM_MEMSLOT_LOCK_MASK          0x3
> > +
> > +struct kvm_memory_slot_page {
> > +       struct list_head list;
> > +       struct page *page;
> > +};
> > +
> >  struct kvm_arch_memory_slot {
> > +       struct kvm_memory_slot_page pages;
> > +       u32 flags;
> >  };
> >
> >  struct kvm_arch {
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index d49905d18cee..b9b8b43835e3 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -106,6 +106,25 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> >         }
> >  }
> >
> > +static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
> > +{
> > +       struct kvm_memslots *slots = kvm_memslots(kvm);
> > +       struct kvm_memory_slot *memslot;
> > +       bool has_locked_memslots = false;
> > +       int idx;
> > +
> > +       idx = srcu_read_lock(&kvm->srcu);
> > +       kvm_for_each_memslot(memslot, slots) {
> > +               if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
> > +                       has_locked_memslots = true;
> > +                       break;
> > +               }
> > +       }
> > +       srcu_read_unlock(&kvm->srcu, idx);
> > +
> > +       return has_locked_memslots;
> > +}
> > +
> >  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                             struct kvm_enable_cap *cap)
> >  {
> > @@ -120,7 +139,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >                 break;
> >         case KVM_CAP_ARM_MTE:
> >                 mutex_lock(&kvm->lock);
> > -               if (!system_supports_mte() || kvm->created_vcpus) {
> > +               if (!system_supports_mte() || kvm->created_vcpus ||
> > +                   (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) {
> >                         r = -EINVAL;
> >                 } else {
> >                         r = 0;
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index f65bcbc9ae69..b0a8e61315e4 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -72,6 +72,11 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
> >         return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
> >  }
> >
> > +static bool memslot_is_locked(struct kvm_memory_slot *memslot)
> > +{
> > +       return memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK;
> > +}
> > +
> >  /**
> >   * kvm_flush_remote_tlbs() - flush all VM TLB entries for v7/8
> >   * @kvm:       pointer to kvm structure.
> > @@ -769,6 +774,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
> >         if (map_size == PAGE_SIZE)
> >                 return true;
> >
> > +       /* Allow only PAGE_SIZE mappings for locked memslots */
> > +       if (memslot_is_locked(memslot))
> > +               return false;
> > +
> >         size = memslot->npages * PAGE_SIZE;
> >
> >         gpa_start = memslot->base_gfn << PAGE_SHIFT;
> > @@ -1296,6 +1305,159 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >         return ret;
> >  }
> >
> > +static int try_rlimit_memlock(unsigned long npages)
> > +{
> > +       unsigned long lock_limit;
> > +       bool has_lock_cap;
> > +       int ret = 0;
> > +
> > +       has_lock_cap = capable(CAP_IPC_LOCK);
> > +       if (has_lock_cap)
> > +               goto out;
> > +
> > +       lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > +
> > +       mmap_read_lock(current->mm);
> > +       if (npages + current->mm->locked_vm > lock_limit)
> > +               ret = -ENOMEM;
> > +       mmap_read_unlock(current->mm);
> > +
> > +out:
> > +       return ret;
> > +}
> > +
> > +static void unpin_memslot_pages(struct kvm_memory_slot *memslot, bool writable)
> > +{
> > +       struct kvm_memory_slot_page *entry, *tmp;
> > +
> > +       list_for_each_entry_safe(entry, tmp, &memslot->arch.pages.list, list) {
> > +               if (writable)
> > +                       set_page_dirty_lock(entry->page);
> > +               unpin_user_page(entry->page);
> > +               kfree(entry);
> > +       }
> > +}
> 
> Shouldn't this be done when the memslot is deleted ?
> (Or should the locked memslot be prevented from being deleted ?)

I add code to prevent changes to a locked memslot in patch #9 ("KVM: arm64:
Deny changes to locked memslots").

> 
> > +
> > +static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> > +                       u64 flags)
> > +{
> > +       struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
> > +       struct kvm_memory_slot_page *page_entry;
> > +       bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
> > +       enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > +       struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> > +       struct vm_area_struct *vma;
> > +       unsigned long npages = memslot->npages;
> > +       unsigned int pin_flags = FOLL_LONGTERM;
> > +       unsigned long i, hva, ipa, mmu_seq;
> > +       int ret;
> > +
> > +       ret = try_rlimit_memlock(npages);
> 
> Even if the memory for the hva described by the memslot is already
> 'locked' by mlock() or the like, is this checking needed ?

I believe it is; mlock uses a different mechanism to pin the pages: it sets the
VM_LOCKED VMA flag. And even if a VMA is mlocked, it doesn't mean that the
host's stage 1 is populated, because of the MLOCK_ONFAULT mlock() flag. If
userspace wants to lock the same memory twice, then it's free to do it, and to
suffer any possible consequences (like running into a size limit).
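
If userspace expects to hit the limit, the usual remedies apply: run with
CAP_IPC_LOCK, which makes try_rlimit_memlock() skip the check, or raise
RLIMIT_MEMLOCK beforehand. An illustrative sketch of the latter (nothing
specific to this series):

#include <sys/resource.h>

/* Raising rlim_max above the current hard limit requires CAP_SYS_RESOURCE. */
static int raise_memlock_limit(void)
{
        struct rlimit rl = {
                .rlim_cur = RLIM_INFINITY,
                .rlim_max = RLIM_INFINITY,
        };

        return setrlimit(RLIMIT_MEMLOCK, &rl);
}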

> 
> 
> > +       if (ret)
> > +               return -ENOMEM;
> > +
> > +       INIT_LIST_HEAD(&memslot->arch.pages.list);
> > +
> > +       if (writable) {
> > +               prot |= KVM_PGTABLE_PROT_W;
> > +               pin_flags |= FOLL_WRITE;
> 
> The lock flag is just for stage 2 mapping, correct ?
> I wonder if it is appropriate for KVM to set 'pin_flags', which is
> passed to pin_user_pages(), based on the lock flag.

I don't see why not; KVM is the consumer of the GUP API.

> 
> > +       }
> > +
> > +       hva = memslot->userspace_addr;
> > +       ipa = memslot->base_gfn << PAGE_SHIFT;
> > +
> > +       mmu_seq = kvm->mmu_notifier_seq;
> > +       smp_rmb();
> > +
> > +       for (i = 0; i < npages; i++) {
> > +               page_entry = kzalloc(sizeof(*page_entry), GFP_KERNEL);
> > +               if (!page_entry) {
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       ret = -ENOMEM;
> > +                       goto out_err;
> 
> Nit: It seems we can call unpin_memslot_pages() from 'out_err'
> instead of calling it from each of the error cases.

I'll see if I can remove the repetition.

> 
> > +               }
> > +
> > +               mmap_read_lock(current->mm);
> > +               ret = pin_user_pages(hva, 1, pin_flags, &page_entry->page, &vma);
> > +               if (ret != 1) {
> > +                       mmap_read_unlock(current->mm);
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       ret = -ENOMEM;
> > +                       goto out_err;
> > +               }
> > +               if (kvm_has_mte(kvm)) {
> > +                       if (vma->vm_flags & VM_SHARED) {
> > +                               ret = -EFAULT;
> > +                       } else {
> > +                               ret = sanitise_mte_tags(kvm,
> > +                                       page_to_pfn(page_entry->page),
> > +                                       PAGE_SIZE);
> > +                       }
> > +                       if (ret) {
> > +                               mmap_read_unlock(current->mm);
> > +                               goto out_err;
> > +                       }
> > +               }
> > +               mmap_read_unlock(current->mm);
> > +
> > +               ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
> > +               if (ret) {
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       goto out_err;
> > +               }
> > +
> > +               spin_lock(&kvm->mmu_lock);
> > +               if (mmu_notifier_retry(kvm, mmu_seq)) {
> > +                       spin_unlock(&kvm->mmu_lock);
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       ret = -EAGAIN;
> > +                       goto out_err;
> > +               }
> > +
> > +               ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> > +                                            page_to_phys(page_entry->page),
> > +                                            prot, &cache);
> > +               spin_unlock(&kvm->mmu_lock);
> > +
> > +               if (ret) {
> > +                       kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> > +                                                i << PAGE_SHIFT);
> > +                       unpin_memslot_pages(memslot, writable);
> > +                       goto out_err;
> > +               }
> > +               list_add(&page_entry->list, &memslot->arch.pages.list);
> > +
> > +               hva += PAGE_SIZE;
> > +               ipa += PAGE_SIZE;
> > +       }
> > +
> > +
> > +       /*
> > +        * Even though we've checked the limit at the start, we can still exceed
> > +        * it if userspace locked other pages in the meantime or if the
> > +        * CAP_IPC_LOCK capability has been revoked.
> > +        */
> > +       ret = account_locked_vm(current->mm, npages, true);
> > +       if (ret) {
> > +               kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> > +                                        npages << PAGE_SHIFT);
> > +               unpin_memslot_pages(memslot, writable);
> > +               goto out_err;
> > +       }
> > +
> > +       memslot->arch.flags = KVM_MEMSLOT_LOCK_READ;
> > +       if (writable)
> > +               memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
> > +
> > +       kvm_mmu_free_memory_cache(&cache);
> > +
> > +       return 0;
> > +
> > +out_err:
> > +       kvm_mmu_free_memory_cache(&cache);
> > +       return ret;
> > +}
> > +
> >  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >  {
> >         struct kvm_memory_slot *memslot;
> > @@ -1325,7 +1487,12 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >                 goto out_unlock_slots;
> >         }
> >
> > -       ret = -EINVAL;
> > +       if (memslot_is_locked(memslot)) {
> > +               ret = -EBUSY;
> > +               goto out_unlock_slots;
> > +       }
> > +
> > +       ret = lock_memslot(kvm, memslot, flags);
> >
> >  out_unlock_slots:
> >         mutex_unlock(&kvm->slots_lock);
> > @@ -1335,11 +1502,22 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >         return ret;
> >  }
> >
> > +static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> > +{
> > +       bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
> > +       unsigned long npages = memslot->npages;
> > +
> > +       unpin_memslot_pages(memslot, writable);
> > +       account_locked_vm(current->mm, npages, false);
> > +
> > +       memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
> > +}
> 
> What if the memslot was locked with read only but the memslot
> has read/write permission set ? Shouldn't the stage 2 mapping be
> updated if KVM allows for the scenario ?

If the memslot is locked with only the read flag, then the stage 2 entries are
mapped read-only, and subsequent stage 2 data aborts will relax the permissions
if needed. Userspace clearly wants the memory to be mapped at stage 2 with
read-only permissions; otherwise it would have specified both read and write
permissions when locking the memslot. I don't see why KVM should do more than
what was requested of it.

If you find this awkward, there is already a case in KVM where userspace wants
the stage 2 entries to be read-only so the guest will cause write faults:
userspace does this when it wants to migrate the VM by setting the
KVM_MEM_LOG_DIRTY_PAGES memslot flag.
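
For reference, that existing path goes through the plain memslot API; a rough
sketch, with every value standing in for whatever the VMM actually uses:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Write-protects the slot at stage 2 so that guest writes fault and get
 * recorded in the dirty bitmap.
 */
static int enable_dirty_logging(int vm_fd, __u32 slot, __u64 gpa, __u64 size,
                                void *host_mem)
{
        struct kvm_userspace_memory_region region = {
                .slot            = slot,
                .flags           = KVM_MEM_LOG_DIRTY_PAGES,
                .guest_phys_addr = gpa,
                .memory_size     = size,
                .userspace_addr  = (unsigned long)host_mem,
        };

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}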

Thanks,
Alex

> 
> Thanks,
> Reiji
> 
> 
> > +
> >  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >  {
> >         bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> >         struct kvm_memory_slot *memslot;
> > -       int ret;
> > +       int ret = 0;
> >
> >         if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> >                 return -EINVAL;
> > @@ -1347,18 +1525,20 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >         mutex_lock(&kvm->slots_lock);
> >
> >         if (unlock_all) {
> > -               ret = -EINVAL;
> > -               goto out_unlock_slots;
> > -       }
> > -
> > -       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > -       if (!memslot) {
> > -               ret = -EINVAL;
> > -               goto out_unlock_slots;
> > +               kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > +                       if (!memslot_is_locked(memslot))
> > +                               continue;
> > +                       unlock_memslot(kvm, memslot);
> > +               }
> > +       } else {
> > +               memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > +               if (!memslot || !memslot_is_locked(memslot)) {
> > +                       ret = -EINVAL;
> > +                       goto out_unlock_slots;
> > +               }
> > +               unlock_memslot(kvm, memslot);
> >         }
> >
> > -       ret = -EINVAL;
> > -
> >  out_unlock_slots:
> >         mutex_unlock(&kvm->slots_lock);
> >         return ret;
> > --
> > 2.33.1
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
  2022-02-15 11:03       ` Alexandru Elisei
@ 2022-02-15 12:02         ` Marc Zyngier
  -1 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2022-02-15 12:02 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Will Deacon, kvmarm, Linux ARM

On Tue, 15 Feb 2022 11:03:59 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> > If a memslot with read/write permission is locked with read only,
> > and then unlocked, can userspace expect stage 2 mapping for the
> > memslot to be updated with read/write ?
> 
> Locking a memslot with the read flag would map the memory described by the
> memslot with read permissions at stage 2. When the memslot is unlocked, KVM
> won't touch the stage 2 entries.
> 
> When the memslot is unlocked, the pages (as in, struct page) backing the VM
> memory as described by the memslot are unpinned. Then the host's MM subsystem
> can treat the memory like any other pages (make them old, new, unmap them, do
> nothing, etc), and the MMU notifier will take care of updating the stage 2
> entries as necessary.
> 
> I guess I should have been more precise in the description. I'll
> change "causes the memory pinned when locking the memslot specified
> in args[0] to be unpinned" to something that clearly states that the
> memory in the host that backs the memslot is unpinned.
> 
> > Can userspace delete the memslot that is locked (without unlocking) ?
> 
> No, it cannot.
> 
> > If so, userspace can expect the corresponding range to be implicitly
> > unlocked, correct ?
> 
> Userspace must explicitly unlock the memslot before deleting it. I want
> userspace to be explicit in its intent.

Does it get in the way of making this robust wrt userspace being
killed (or terminating without unlock first)?

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
@ 2022-02-15 12:02         ` Marc Zyngier
  0 siblings, 0 replies; 118+ messages in thread
From: Marc Zyngier @ 2022-02-15 12:02 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Reiji Watanabe, James Morse, Suzuki K Poulose, Linux ARM, kvmarm,
	Will Deacon, Mark Rutland

On Tue, 15 Feb 2022 11:03:59 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> > If a memslot with read/write permission is locked with read only,
> > and then unlocked, can userspace expect stage 2 mapping for the
> > memslot to be updated with read/write ?
> 
> Locking a memslot with the read flag would map the memory described by the
> memslot with read permissions at stage 2. When the memslot is unlocked, KVM
> won't touch the stage 2 entries.
> 
> When the memslot is unlocked, the pages (as in, struct page) backing the VM
> memory as described by the memslot are unpinned. Then the host's MM subsystem
> can treat the memory like any other pages (make them old, new, unmap them, do
> nothing, etc), and the MMU notifier will take care of updating the stage 2
> entries as necessary.
> 
> I guess I should have been more precise in the description. I'll
> change "causes the memory pinned when locking the memslot specified
> in args[0] to be unpinned" to something that clearly states that the
> memory in the host that backs the memslot is unpinned.
> 
> > Can userspace delete the memslot that is locked (without unlocking) ?
> 
> No, it cannot.
> 
> > If so, userspace can expect the corresponding range to be implicitly
> > unlocked, correct ?
> 
> Userspace must explicitly unlock the memslot before deleting it. I want
> userspace to be explicit in its intent.

Does it get in the way of making this robust wrt userspace being
killed (or terminating without unlock first)?

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
  2022-02-15 12:02         ` Marc Zyngier
@ 2022-02-15 12:13           ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-02-15 12:13 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Will Deacon, kvmarm, Linux ARM

Hi,

On Tue, Feb 15, 2022 at 12:02:26PM +0000, Marc Zyngier wrote:
> On Tue, 15 Feb 2022 11:03:59 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > > If a memslot with read/write permission is locked with read only,
> > > and then unlocked, can userspace expect stage 2 mapping for the
> > > memslot to be updated with read/write ?
> > 
> > Locking a memslot with the read flag would map the memory described by the
> > memslot with read permissions at stage 2. When the memslot is unlocked, KVM
> > won't touch the stage 2 entries.
> > 
> > When the memslot is unlocked, the pages (as in, struct page) backing the VM
> > memory as described by the memslot are unpinned. Then the host's MM subsystem
> > can treat the memory like any other pages (make them old, new, unmap them, do
> > nothing, etc), and the MMU notifier will take care of updating the stage 2
> > entries as necessary.
> > 
> > I guess I should have been more precise in the description. I'll
> > change "causes the memory pinned when locking the memslot specified
> > in args[0] to be unpinned" to something that clearly states that the
> > memory in the host that backs the memslot is unpinned.
> > 
> > > Can userspace delete the memslot that is locked (without unlocking) ?
> > 
> > No, it cannot.
> > 
> > > If so, userspace can expect the corresponding range to be implicitly
> > > unlocked, correct ?
> > 
> > Userspace must explicitly unlock the memslot before deleting it. I want
> > userspace to be explicit in its intent.
> 
> Does it get in the way of making this robust wrt userspace being
> killed (or terminating without unlock first)?

Patch #8 ("KVM: arm64: Unlock memslots after stage 2 tables are freed")
teaches kvm_arch_flush_shadow_all() to unlock all locked memslots.
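
The explicit unlock itself is just another KVM_ENABLE_CAP call; a sketch based
on the UAPI from patch #2, with vm_fd and slot as placeholders:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Unpins the pages backing the memslot; passing KVM_ARM_UNLOCK_MEM_ALL in
 * args[1] instead unlocks every locked memslot.
 */
static int vmm_unlock_memslot(int vm_fd, __u64 slot)
{
        struct kvm_enable_cap cap = {
                .cap    = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
                .flags  = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK,
                .args   = { slot },
        };

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}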

Thanks,
Alex

> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
@ 2022-02-15 12:13           ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-02-15 12:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Reiji Watanabe, James Morse, Suzuki K Poulose, Linux ARM, kvmarm,
	Will Deacon, Mark Rutland

Hi,

On Tue, Feb 15, 2022 at 12:02:26PM +0000, Marc Zyngier wrote:
> On Tue, 15 Feb 2022 11:03:59 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > > If a memslot with read/write permission is locked with read only,
> > > and then unlocked, can userspace expect stage 2 mapping for the
> > > memslot to be updated with read/write ?
> > 
> > Locking a memslot with the read flag would map the memory described by the
> > memslot with read permissions at stage 2. When the memslot is unlocked, KVM
> > won't touch the stage 2 entries.
> > 
> > When the memslot is unlocked, the pages (as in, struct page) backing the VM
> > memory as described by the memslot are unpinned. Then the host's MM subsystem
> > can treat the memory like any other pages (make them old, new, unmap them, do
> > nothing, etc), and the MMU notifier will take care of updating the stage 2
> > entries as necessary.
> > 
> > I guess I should have been more precise in the description. I'll
> > change "causes the memory pinned when locking the memslot specified
> > in args[0] to be unpinned" to something that clearly states that the
> > memory in the host that backs the memslot is unpinned.
> > 
> > > Can userspace delete the memslot that is locked (without unlocking) ?
> > 
> > No, it cannot.
> > 
> > > If so, userspace can expect the corresponding range to be implicitly
> > > unlocked, correct ?
> > 
> > Userspace must explicitly unlock the memslot before deleting it. I want
> > userspace to be explicit in its intent.
> 
> Does it get in the way of making this robust wrt userspace being
> killed (or terminating without unlock first)?

Patch #8 ("KVM: arm64: Unlock memslots after stage 2 tables are freed")
teaches kvm_arch_flush_shadow_all() to unlock all locked memslots.

Thanks,
Alex

> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
  2022-02-15 11:03       ` Alexandru Elisei
@ 2022-02-17  7:35         ` Reiji Watanabe
  -1 siblings, 0 replies; 118+ messages in thread
From: Reiji Watanabe @ 2022-02-17  7:35 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Alex,

On Tue, Feb 15, 2022 at 3:03 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi Reiji,
>
> On Mon, Feb 14, 2022 at 09:59:09PM -0800, Reiji Watanabe wrote:
> > Hi Alex,
> >
> > On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Stage 2 faults triggered by the profiling buffer attempting to write to
> > > memory are reported by the SPE hardware by asserting a buffer management
> > > event interrupt. Interrupts are by their nature asynchronous, which means
> > > that the guest might have changed its stage 1 translation tables since the
> > > attempted write. SPE reports the guest virtual address that caused the data
> > > abort, not the IPA, which means that KVM would have to walk the guest's
> > > stage 1 tables to find the IPA. Using the AT instruction to walk the
> > > guest's tables in hardware is not an option because it doesn't report the
> > > IPA in the case of a stage 2 fault on a stage 1 table walk.
> > >
> > > Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> > > done by adding a capability that allows the user to pin the memory backing
> > > a memslot. The same capability can be used to unlock a memslot, which
> > > unpins the pages associated with the memslot, but doesn't unmap the IPA
> > > range from stage 2; in this case, the addresses will be unmapped from stage
> > > 2 via the MMU notifiers when the process' address space changes.
> > >
> > > For now, the capability doesn't actually do anything other than checking
> > > that the usage is correct; the memory operations will be added in future
> > > patches.
> > >
> > > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > ---
> > >  Documentation/virt/kvm/api.rst   | 57 ++++++++++++++++++++++++++
> > >  arch/arm64/include/asm/kvm_mmu.h |  3 ++
> > >  arch/arm64/kvm/arm.c             | 42 ++++++++++++++++++--
> > >  arch/arm64/kvm/mmu.c             | 68 ++++++++++++++++++++++++++++++++
> > >  include/uapi/linux/kvm.h         |  8 ++++
> > >  5 files changed, 174 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index aeeb071c7688..16aa59eae3d9 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -6925,6 +6925,63 @@ indicated by the fd to the VM this is called on.
> > >  This is intended to support intra-host migration of VMs between userspace VMMs,
> > >  upgrading the VMM process without interrupting the guest.
> > >
> > > +7.30 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > +----------------------------------------
> > > +
> > > +:Architectures: arm64
> > > +:Target: VM
> > > +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> > > +                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > > +             args[0] is the slot number
> > > +             args[1] specifies the permissions when the memslot is locked or if
> > > +                     all memslots should be unlocked
> > > +
> > > +The presence of this capability indicates that KVM supports locking the memory
> > > +associated with the memslot, and unlocking a previously locked memslot.
> > > +
> > > +The 'flags' parameter is defined as follows:
> > > +
> > > +7.30.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> > > +-------------------------------------------------
> > > +
> > > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > +:Architectures: arm64
> > > +:Target: VM
> > > +:Parameters: args[0] contains the memory slot number
> > > +             args[1] contains the permissions for the locked memory:
> > > +                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> > > +                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> > > +                     (optional) with write permissions
> >
> > Nit: Those flag names don't match the ones in the code.
> > (Their names in the code are KVM_ARM_LOCK_MEM_READ/KVM_ARM_LOCK_MEM_WRITE)
>
> That's true, I'll change the flags to match.
>
> >
> > What is the reason why KVM_ARM_LOCK_MEMORY_{READ,WRITE} flags need
> > to be specified even though memslot already has similar flags ??
>
> I added both flags to make the ABI more flexible, and I don't think it's a
> burden on userspace to specify the flags when locking a memslot.
>
> For this reason, I would rather keep it like this for now, unless you think
> there's a good reason to remove them.

Understood.
Just to confirm, KVM_ARM_LOCK_MEMORY_READ is practically unnecessary,
isn't it ? (Perhaps it might be more straightforward to have a read-only
lock flag similar to the memslot's, considering this is an operation on
a memslot?)

>
> >
> > > +:Returns: 0 on success; negative error code on failure
> > > +
> > > +Enabling this capability causes the memory described by the memslot to be
> > > +pinned in the process address space and the corresponding stage 2 IPA range
> > > +mapped at stage 2. The permissions specified in args[1] apply to both
> > > +mappings. The memory pinned with this capability counts towards the max

I assume 'both mappings' means the permission is applied to the
mapping for the process address space as well as to stage 2.
Why do you want to apply the permission to the process address space
mapping as well (not just to the stage 2 mapping) ?


> > > +locked memory limit for the current process.
> > > +
> > > +The capability should be enabled when no VCPUs are in the kernel executing an
> > > +ioctl (and in particular, KVM_RUN); otherwise the ioctl will block until all
> > > +VCPUs have returned. The virtual memory range described by the memslot must be
> > > +mapped in the userspace process without any gaps. It is considered an error if
> > > +write permissions are specified for a memslot which logs dirty pages.
> > > +
> > > +7.30.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > > +---------------------------------------------------
> > > +
> > > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > +:Architectures: arm64
> > > +:Target: VM
> > > +:Parameters: args[0] contains the memory slot number
> > > +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> > > +                     which unlocks all previously locked memslots.
> > > +:Returns: 0 on success; negative error code on failure
> > > +
> > > +Enabling this capability causes the memory pinned when locking the memslot
> > > +specified in args[0] to be unpinned, or, optionally, all memslots to be
> > > +unlocked. The IPA range is not unmapped from stage 2.
> > > +>>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
> >
> > Nit: An unnecessary line.
> >
> > If a memslot with read/write permission is locked with read only,
> > and then unlocked, can userspace expect stage 2 mapping for the
> > memslot to be updated with read/write ?
>
> Locking a memslot with the read flag would map the memory described by the
> memslot with read permissions at stage 2. When the memslot is unlocked, KVM
> won't touch the stage 2 entries.
>
> When the memslot is unlocked, the pages (as in, struct page) backing the VM
> memory as described by the memslot are unpinned. Then the host's MM subsystem
> can treat the memory like any other pages (make them old, new, unmap them, do
> nothing, etc), and the MMU notifier will take care of updating the stage 2
> entries as necessary.
>
> I guess I should have been more precise in the description. I'll change "causes
> the memory pinned when locking the memslot specified in args[0] to be unpinned"
> to something that clearly states that the memory in the host that backs the
> memslot is unpinned.

Thank you for the explanation.
It seems I misunderstood the read/write lock flag.
These lock flags only affect the permission of locked mappings, not
the guest's permission to access the memory, which depends only on
the memslot permission (not on the lock flags) as before. Correct ?


>
> > Can userspace delete the memslot that is locked (without unlocking) ?
>
> No, it cannot.
>
> > If so, userspace can expect the corresponding range to be implicitly
> > unlocked, correct ?
>
> Userspace must explicitly unlock the memslot before deleting it. I want
> userspace to be explicit in its intent.

I see. Thank you for the clarification (I checked patch-9 as well).

Thanks!
Reiji



>
> Thanks,
> Alex
>
> >
> > Thanks,
> > Reiji
> >
> > > +
> > >  8. Other capabilities.
> > >  ======================
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > > index 02d378887743..2c50734f048d 100644
> > > --- a/arch/arm64/include/asm/kvm_mmu.h
> > > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
> > >  void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> > >  void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> > >
> > > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > > +
> > >  static inline unsigned int kvm_get_vmid_bits(void)
> > >  {
> > >         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index e9b4ad7b5c82..d49905d18cee 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -78,16 +78,43 @@ int kvm_arch_check_processor_compat(void *opaque)
> > >         return 0;
> > >  }
> > >
> > > +static int kvm_arm_lock_memslot_supported(void)
> > > +{
> > > +       return 0;
> > > +}
> > > +
> > > +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> > > +                                            struct kvm_enable_cap *cap)
> > > +{
> > > +       u64 slot, action_flags;
> > > +       u32 action;
> > > +
> > > +       if (cap->args[2] || cap->args[3])
> > > +               return -EINVAL;
> > > +
> > > +       slot = cap->args[0];
> > > +       action = cap->flags;
> > > +       action_flags = cap->args[1];
> > > +
> > > +       switch (action) {
> > > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
> > > +               return kvm_mmu_lock_memslot(kvm, slot, action_flags);
> > > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
> > > +               return kvm_mmu_unlock_memslot(kvm, slot, action_flags);
> > > +       default:
> > > +               return -EINVAL;
> > > +       }
> > > +}
> > > +
> > >  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > >                             struct kvm_enable_cap *cap)
> > >  {
> > >         int r;
> > >
> > > -       if (cap->flags)
> > > -               return -EINVAL;
> > > -
> > >         switch (cap->cap) {
> > >         case KVM_CAP_ARM_NISV_TO_USER:
> > > +               if (cap->flags)
> > > +                       return -EINVAL;
> > >                 r = 0;
> > >                 kvm->arch.return_nisv_io_abort_to_user = true;
> > >                 break;
> > > @@ -101,6 +128,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > >                 }
> > >                 mutex_unlock(&kvm->lock);
> > >                 break;
> > > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > > +               if (!kvm_arm_lock_memslot_supported())
> > > +                       return -EINVAL;
> > > +               r = kvm_lock_user_memory_region_ioctl(kvm, cap);
> > > +               break;
> > >         default:
> > >                 r = -EINVAL;
> > >                 break;
> > > @@ -168,7 +200,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> > >         return VM_FAULT_SIGBUS;
> > >  }
> > >
> > > -
> > >  /**
> > >   * kvm_arch_destroy_vm - destroy the VM data structure
> > >   * @kvm:       pointer to the KVM struct
> > > @@ -276,6 +307,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > >         case KVM_CAP_ARM_PTRAUTH_GENERIC:
> > >                 r = system_has_full_ptr_auth();
> > >                 break;
> > > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > > +               r = kvm_arm_lock_memslot_supported();
> > > +               break;
> > >         default:
> > >                 r = 0;
> > >         }
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 326cdfec74a1..f65bcbc9ae69 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1296,6 +1296,74 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > >         return ret;
> > >  }
> > >
> > > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > > +{
> > > +       struct kvm_memory_slot *memslot;
> > > +       int ret;
> > > +
> > > +       if (slot >= KVM_MEM_SLOTS_NUM)
> > > +               return -EINVAL;
> > > +
> > > +       if (!(flags & KVM_ARM_LOCK_MEM_READ))
> > > +               return -EINVAL;
> > > +
> > > +       mutex_lock(&kvm->lock);
> > > +       if (!kvm_lock_all_vcpus(kvm)) {
> > > +               ret = -EBUSY;
> > > +               goto out_unlock_kvm;
> > > +       }
> > > +       mutex_lock(&kvm->slots_lock);
> > > +
> > > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > > +       if (!memslot) {
> > > +               ret = -EINVAL;
> > > +               goto out_unlock_slots;
> > > +       }
> > > +       if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
> > > +           ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
> > > +               ret = -EPERM;
> > > +               goto out_unlock_slots;
> > > +       }
> > > +
> > > +       ret = -EINVAL;
> > > +
> > > +out_unlock_slots:
> > > +       mutex_unlock(&kvm->slots_lock);
> > > +       kvm_unlock_all_vcpus(kvm);
> > > +out_unlock_kvm:
> > > +       mutex_unlock(&kvm->lock);
> > > +       return ret;
> > > +}
> > > +
> > > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > > +{
> > > +       bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> > > +       struct kvm_memory_slot *memslot;
> > > +       int ret;
> > > +
> > > +       if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> > > +               return -EINVAL;
> > > +
> > > +       mutex_lock(&kvm->slots_lock);
> > > +
> > > +       if (unlock_all) {
> > > +               ret = -EINVAL;
> > > +               goto out_unlock_slots;
> > > +       }
> > > +
> > > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > > +       if (!memslot) {
> > > +               ret = -EINVAL;
> > > +               goto out_unlock_slots;
> > > +       }
> > > +
> > > +       ret = -EINVAL;
> > > +
> > > +out_unlock_slots:
> > > +       mutex_unlock(&kvm->slots_lock);
> > > +       return ret;
> > > +}
> > > +
> > >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> > >  {
> > >         if (!kvm->arch.mmu.pgt)
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 1daa45268de2..70c969967557 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> > >  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> > >  #define KVM_CAP_ARM_MTE 205
> > >  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> > > +#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
> > >
> > >  #ifdef KVM_CAP_IRQ_ROUTING
> > >
> > > @@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
> > >  #define KVM_PPC_SVM_OFF                  _IO(KVMIO,  0xb3)
> > >  #define KVM_ARM_MTE_COPY_TAGS    _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
> > >
> > > +/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
> > > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK     (1 << 0)
> > > +#define   KVM_ARM_LOCK_MEM_READ                                (1 << 0)
> > > +#define   KVM_ARM_LOCK_MEM_WRITE                       (1 << 1)
> > > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK   (1 << 1)
> > > +#define   KVM_ARM_UNLOCK_MEM_ALL                       (1 << 0)
> > > +
> > >  /* ioctl for vm fd */
> > >  #define KVM_CREATE_DEVICE        _IOWR(KVMIO,  0xe0, struct kvm_create_device)
> > >
> > > --
> > > 2.33.1
> > >
> > > _______________________________________________
> > > kvmarm mailing list
> > > kvmarm@lists.cs.columbia.edu
> > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
@ 2022-02-17  7:35         ` Reiji Watanabe
  0 siblings, 0 replies; 118+ messages in thread
From: Reiji Watanabe @ 2022-02-17  7:35 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, James Morse, Suzuki K Poulose, Linux ARM, kvmarm,
	Will Deacon, Mark Rutland

Hi Alex,

On Tue, Feb 15, 2022 at 3:03 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi Reiji,
>
> On Mon, Feb 14, 2022 at 09:59:09PM -0800, Reiji Watanabe wrote:
> > Hi Alex,
> >
> > On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Stage 2 faults triggered by the profiling buffer attempting to write to
> > > memory are reported by the SPE hardware by asserting a buffer management
> > > event interrupt. Interrupts are by their nature asynchronous, which means
> > > that the guest might have changed its stage 1 translation tables since the
> > > attempted write. SPE reports the guest virtual address that caused the data
> > > abort, not the IPA, which means that KVM would have to walk the guest's
> > > stage 1 tables to find the IPA. Using the AT instruction to walk the
> > > guest's tables in hardware is not an option because it doesn't report the
> > > IPA in the case of a stage 2 fault on a stage 1 table walk.
> > >
> > > Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> > > done by adding a capability that allows the user to pin the memory backing
> > > a memslot. The same capability can be used to unlock a memslot, which
> > > unpins the pages associated with the memslot, but doesn't unmap the IPA
> > > range from stage 2; in this case, the addresses will be unmapped from stage
> > > 2 via the MMU notifiers when the process' address space changes.
> > >
> > > For now, the capability doesn't actually do anything other than checking
> > > that the usage is correct; the memory operations will be added in future
> > > patches.
> > >
> > > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > ---
> > >  Documentation/virt/kvm/api.rst   | 57 ++++++++++++++++++++++++++
> > >  arch/arm64/include/asm/kvm_mmu.h |  3 ++
> > >  arch/arm64/kvm/arm.c             | 42 ++++++++++++++++++--
> > >  arch/arm64/kvm/mmu.c             | 68 ++++++++++++++++++++++++++++++++
> > >  include/uapi/linux/kvm.h         |  8 ++++
> > >  5 files changed, 174 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index aeeb071c7688..16aa59eae3d9 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -6925,6 +6925,63 @@ indicated by the fd to the VM this is called on.
> > >  This is intended to support intra-host migration of VMs between userspace VMMs,
> > >  upgrading the VMM process without interrupting the guest.
> > >
> > > +7.30 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > +----------------------------------------
> > > +
> > > +:Architectures: arm64
> > > +:Target: VM
> > > +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> > > +                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > > +             args[0] is the slot number
> > > +             args[1] specifies the permissions when the memslot is locked or if
> > > +                     all memslots should be unlocked
> > > +
> > > +The presence of this capability indicates that KVM supports locking the memory
> > > +associated with the memslot, and unlocking a previously locked memslot.
> > > +
> > > +The 'flags' parameter is defined as follows:
> > > +
> > > +7.30.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> > > +-------------------------------------------------
> > > +
> > > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > +:Architectures: arm64
> > > +:Target: VM
> > > +:Parameters: args[0] contains the memory slot number
> > > +             args[1] contains the permissions for the locked memory:
> > > +                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> > > +                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> > > +                     (optional) with write permissions
> >
> > Nit: Those flag names don't match the ones in the code.
> > (Their names in the code are KVM_ARM_LOCK_MEM_READ/KVM_ARM_LOCK_MEM_WRITE)
>
> That's true, I'll change the flags to match.
>
> >
> > What is the reason why KVM_ARM_LOCK_MEMORY_{READ,WRITE} flags need
> > to be specified even though memslot already has similar flags ??
>
> I added both flags to make the ABI more flexible, and I don't think it's a
> burden on userspace to specify the flags when locking a memslot.
>
> For this reason, I would rather keep it like this for now, unless you think
> there's a good reason to remove them.

Understood.
Just to confirm, KVM_ARM_LOCK_MEMORY_READ is practically unnecessary,
isn't it? (Perhaps it would be more straightforward to have a read-only
lock flag similar to the memslot's, considering this is an operation on
a memslot?)

>
> >
> > > +:Returns: 0 on success; negative error code on failure
> > > +
> > > +Enabling this capability causes the memory described by the memslot to be
> > > +pinned in the process address space and the corresponding stage 2 IPA range
> > > +mapped at stage 2. The permissions specified in args[1] apply to both
> > > +mappings. The memory pinned with this capability counts towards the max

I assume 'both mappings' means the permissions are applied to the
process address space mapping as well as to the stage 2 mapping.
Why do you want to apply the permissions to the process address space
mapping as well (not just to the stage 2 mapping)?


> > > +locked memory limit for the current process.
> > > +
> > > +The capability should be enabled when no VCPUs are in the kernel executing an
> > > +ioctl (and in particular, KVM_RUN); otherwise the ioctl will block until all
> > > +VCPUs have returned. The virtual memory range described by the memslot must be
> > > +mapped in the userspace process without any gaps. It is considered an error if
> > > +write permissions are specified for a memslot which logs dirty pages.
> > > +
> > > +7.30.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > > +---------------------------------------------------
> > > +
> > > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > +:Architectures: arm64
> > > +:Target: VM
> > > +:Parameters: args[0] contains the memory slot number
> > > +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> > > +                     which unlocks all previously locked memslots.
> > > +:Returns: 0 on success; negative error code on failure
> > > +
> > > +Enabling this capability causes the memory pinned when locking the memslot
> > > +specified in args[0] to be unpinned, or, optionally, all memslots to be
> > > +unlocked. The IPA range is not unmapped from stage 2.
> > > +>>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
> >
> > Nit: An unnecessary line.
> >
> > If a memslot with read/write permission is locked with read only,
> > and then unlocked, can userspace expect stage 2 mapping for the
> > memslot to be updated with read/write ?
>
> Locking a memslot with the read flag would map the memory described by the
> memslot with read permissions at stage 2. When the memslot is unlocked, KVM
> won't touch the stage 2 entries.
>
> When the memslot is unlocked, the pages (as in, struct page) backing the VM
> memory as described by the memslot are unpinned. Then the host's MM subsystem
> can treat the memory like any other pages (make them old, new, unmap them, do
> nothing, etc), and the MMU notifier will take care of updating the stage 2
> entries as necessary.
>
> I guess I should have been more precise in the description. I'll change "causes
> the memory pinned when locking the memslot specified in args[0] to be unpinned"
> to something that clearly states that the memory in the host that backs the
> memslot is unpinned.

Thank you for the explanation.
It seems I misunderstood the read/write lock flags.
These lock flags only affect the permissions of the locked mappings, not
the guest's permission to access the memory, which depends only on
the memslot permissions (not on the lock flags), as before. Correct?
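
Just to make sure I understand the intended usage, here is a rough
userspace sketch (the helper names, vm_fd and slot are illustrative, and
the constants only exist with this series applied):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Rough sketch: lock memslot 'slot' read-only with the proposed API. */
static int lock_memslot_ro(int vm_fd, __u64 slot)
{
        struct kvm_enable_cap cap = { 0 };

        cap.cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION;
        cap.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK;
        cap.args[0] = slot;                  /* memslot number */
        cap.args[1] = KVM_ARM_LOCK_MEM_READ; /* no write permission */

        /* Must be issued while no VCPU is inside KVM_RUN. */
        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

/* Rough sketch: unlock every previously locked memslot. */
static int unlock_all_memslots(int vm_fd)
{
        struct kvm_enable_cap cap = { 0 };

        cap.cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION;
        cap.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK;
        cap.args[1] = KVM_ARM_UNLOCK_MEM_ALL; /* slot in args[0] unused */

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}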


>
> > Can userspace delete the memslot that is locked (without unlocking) ?
>
> No, it cannot.
>
> > If so, userspace can expect the corresponding range to be implicitly
> > unlocked, correct ?
>
> > Userspace must explicitly unlock the memslot before deleting it. I want
> userspace to be explicit in its intent.

I see. Thank you for the clarification (I checked patch-9 as well).

Thanks!
Reiji



>
> Thanks,
> Alex
>
> >
> > Thanks,
> > Reiji
> >
> > > +
> > >  8. Other capabilities.
> > >  ======================
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > > index 02d378887743..2c50734f048d 100644
> > > --- a/arch/arm64/include/asm/kvm_mmu.h
> > > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
> > >  void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> > >  void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> > >
> > > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > > +
> > >  static inline unsigned int kvm_get_vmid_bits(void)
> > >  {
> > >         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index e9b4ad7b5c82..d49905d18cee 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -78,16 +78,43 @@ int kvm_arch_check_processor_compat(void *opaque)
> > >         return 0;
> > >  }
> > >
> > > +static int kvm_arm_lock_memslot_supported(void)
> > > +{
> > > +       return 0;
> > > +}
> > > +
> > > +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> > > +                                            struct kvm_enable_cap *cap)
> > > +{
> > > +       u64 slot, action_flags;
> > > +       u32 action;
> > > +
> > > +       if (cap->args[2] || cap->args[3])
> > > +               return -EINVAL;
> > > +
> > > +       slot = cap->args[0];
> > > +       action = cap->flags;
> > > +       action_flags = cap->args[1];
> > > +
> > > +       switch (action) {
> > > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
> > > +               return kvm_mmu_lock_memslot(kvm, slot, action_flags);
> > > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
> > > +               return kvm_mmu_unlock_memslot(kvm, slot, action_flags);
> > > +       default:
> > > +               return -EINVAL;
> > > +       }
> > > +}
> > > +
> > >  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > >                             struct kvm_enable_cap *cap)
> > >  {
> > >         int r;
> > >
> > > -       if (cap->flags)
> > > -               return -EINVAL;
> > > -
> > >         switch (cap->cap) {
> > >         case KVM_CAP_ARM_NISV_TO_USER:
> > > +               if (cap->flags)
> > > +                       return -EINVAL;
> > >                 r = 0;
> > >                 kvm->arch.return_nisv_io_abort_to_user = true;
> > >                 break;
> > > @@ -101,6 +128,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > >                 }
> > >                 mutex_unlock(&kvm->lock);
> > >                 break;
> > > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > > +               if (!kvm_arm_lock_memslot_supported())
> > > +                       return -EINVAL;
> > > +               r = kvm_lock_user_memory_region_ioctl(kvm, cap);
> > > +               break;
> > >         default:
> > >                 r = -EINVAL;
> > >                 break;
> > > @@ -168,7 +200,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> > >         return VM_FAULT_SIGBUS;
> > >  }
> > >
> > > -
> > >  /**
> > >   * kvm_arch_destroy_vm - destroy the VM data structure
> > >   * @kvm:       pointer to the KVM struct
> > > @@ -276,6 +307,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > >         case KVM_CAP_ARM_PTRAUTH_GENERIC:
> > >                 r = system_has_full_ptr_auth();
> > >                 break;
> > > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > > +               r = kvm_arm_lock_memslot_supported();
> > > +               break;
> > >         default:
> > >                 r = 0;
> > >         }
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 326cdfec74a1..f65bcbc9ae69 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1296,6 +1296,74 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > >         return ret;
> > >  }
> > >
> > > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > > +{
> > > +       struct kvm_memory_slot *memslot;
> > > +       int ret;
> > > +
> > > +       if (slot >= KVM_MEM_SLOTS_NUM)
> > > +               return -EINVAL;
> > > +
> > > +       if (!(flags & KVM_ARM_LOCK_MEM_READ))
> > > +               return -EINVAL;
> > > +
> > > +       mutex_lock(&kvm->lock);
> > > +       if (!kvm_lock_all_vcpus(kvm)) {
> > > +               ret = -EBUSY;
> > > +               goto out_unlock_kvm;
> > > +       }
> > > +       mutex_lock(&kvm->slots_lock);
> > > +
> > > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > > +       if (!memslot) {
> > > +               ret = -EINVAL;
> > > +               goto out_unlock_slots;
> > > +       }
> > > +       if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
> > > +           ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
> > > +               ret = -EPERM;
> > > +               goto out_unlock_slots;
> > > +       }
> > > +
> > > +       ret = -EINVAL;
> > > +
> > > +out_unlock_slots:
> > > +       mutex_unlock(&kvm->slots_lock);
> > > +       kvm_unlock_all_vcpus(kvm);
> > > +out_unlock_kvm:
> > > +       mutex_unlock(&kvm->lock);
> > > +       return ret;
> > > +}
> > > +
> > > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > > +{
> > > +       bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> > > +       struct kvm_memory_slot *memslot;
> > > +       int ret;
> > > +
> > > +       if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> > > +               return -EINVAL;
> > > +
> > > +       mutex_lock(&kvm->slots_lock);
> > > +
> > > +       if (unlock_all) {
> > > +               ret = -EINVAL;
> > > +               goto out_unlock_slots;
> > > +       }
> > > +
> > > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > > +       if (!memslot) {
> > > +               ret = -EINVAL;
> > > +               goto out_unlock_slots;
> > > +       }
> > > +
> > > +       ret = -EINVAL;
> > > +
> > > +out_unlock_slots:
> > > +       mutex_unlock(&kvm->slots_lock);
> > > +       return ret;
> > > +}
> > > +
> > >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> > >  {
> > >         if (!kvm->arch.mmu.pgt)
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 1daa45268de2..70c969967557 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> > >  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> > >  #define KVM_CAP_ARM_MTE 205
> > >  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> > > +#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
> > >
> > >  #ifdef KVM_CAP_IRQ_ROUTING
> > >
> > > @@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
> > >  #define KVM_PPC_SVM_OFF                  _IO(KVMIO,  0xb3)
> > >  #define KVM_ARM_MTE_COPY_TAGS    _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
> > >
> > > +/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
> > > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK     (1 << 0)
> > > +#define   KVM_ARM_LOCK_MEM_READ                                (1 << 0)
> > > +#define   KVM_ARM_LOCK_MEM_WRITE                       (1 << 1)
> > > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK   (1 << 1)
> > > +#define   KVM_ARM_UNLOCK_MEM_ALL                       (1 << 0)
> > > +
> > >  /* ioctl for vm fd */
> > >  #define KVM_CREATE_DEVICE        _IOWR(KVMIO,  0xe0, struct kvm_create_device)
> > >
> > > --
> > > 2.33.1
> > >
> > > _______________________________________________
> > > kvmarm mailing list
> > > kvmarm@lists.cs.columbia.edu
> > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
  2022-02-17  7:35         ` Reiji Watanabe
@ 2022-02-17 10:31           ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-02-17 10:31 UTC (permalink / raw)
  To: Reiji Watanabe; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Reiji,

On Wed, Feb 16, 2022 at 11:35:06PM -0800, Reiji Watanabe wrote:
> Hi Alex,
> 
> On Tue, Feb 15, 2022 at 3:03 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Hi Reiji,
> >
> > On Mon, Feb 14, 2022 at 09:59:09PM -0800, Reiji Watanabe wrote:
> > > Hi Alex,
> > >
> > > On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> > > <alexandru.elisei@arm.com> wrote:
> > > >
> > > > Stage 2 faults triggered by the profiling buffer attempting to write to
> > > > memory are reported by the SPE hardware by asserting a buffer management
> > > > event interrupt. Interrupts are by their nature asynchronous, which means
> > > > that the guest might have changed its stage 1 translation tables since the
> > > > attempted write. SPE reports the guest virtual address that caused the data
> > > > abort, not the IPA, which means that KVM would have to walk the guest's
> > > > stage 1 tables to find the IPA. Using the AT instruction to walk the
> > > > guest's tables in hardware is not an option because it doesn't report the
> > > > IPA in the case of a stage 2 fault on a stage 1 table walk.
> > > >
> > > > Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> > > > done by adding a capability that allows the user to pin the memory backing
> > > > a memslot. The same capability can be used to unlock a memslot, which
> > > > unpins the pages associated with the memslot, but doesn't unmap the IPA
> > > > range from stage 2; in this case, the addresses will be unmapped from stage
> > > > 2 via the MMU notifiers when the process' address space changes.
> > > >
> > > > For now, the capability doesn't actually do anything other than checking
> > > > that the usage is correct; the memory operations will be added in future
> > > > patches.
> > > >
> > > > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > > ---
> > > >  Documentation/virt/kvm/api.rst   | 57 ++++++++++++++++++++++++++
> > > >  arch/arm64/include/asm/kvm_mmu.h |  3 ++
> > > >  arch/arm64/kvm/arm.c             | 42 ++++++++++++++++++--
> > > >  arch/arm64/kvm/mmu.c             | 68 ++++++++++++++++++++++++++++++++
> > > >  include/uapi/linux/kvm.h         |  8 ++++
> > > >  5 files changed, 174 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > index aeeb071c7688..16aa59eae3d9 100644
> > > > --- a/Documentation/virt/kvm/api.rst
> > > > +++ b/Documentation/virt/kvm/api.rst
> > > > @@ -6925,6 +6925,63 @@ indicated by the fd to the VM this is called on.
> > > >  This is intended to support intra-host migration of VMs between userspace VMMs,
> > > >  upgrading the VMM process without interrupting the guest.
> > > >
> > > > +7.30 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > > +----------------------------------------
> > > > +
> > > > +:Architectures: arm64
> > > > +:Target: VM
> > > > +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> > > > +                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > > > +             args[0] is the slot number
> > > > +             args[1] specifies the permissions when the memslot is locked or if
> > > > +                     all memslots should be unlocked
> > > > +
> > > > +The presence of this capability indicates that KVM supports locking the memory
> > > > +associated with the memslot, and unlocking a previously locked memslot.
> > > > +
> > > > +The 'flags' parameter is defined as follows:
> > > > +
> > > > +7.30.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> > > > +-------------------------------------------------
> > > > +
> > > > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > > +:Architectures: arm64
> > > > +:Target: VM
> > > > +:Parameters: args[0] contains the memory slot number
> > > > +             args[1] contains the permissions for the locked memory:
> > > > +                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> > > > +                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> > > > +                     (optional) with write permissions
> > >
> > > Nit: Those flag names don't match the ones in the code.
> > > (Their names in the code are KVM_ARM_LOCK_MEM_READ/KVM_ARM_LOCK_MEM_WRITE)
> >
> > That's true, I'll change the flags to match.
> >
> > >
> > > What is the reason why KVM_ARM_LOCK_MEMORY_{READ,WRITE} flags need
> > > to be specified even though memslot already has similar flags ??
> >
> > I added both flags to make the ABI more flexible, and I don't think it's a
> > burden on userspace to specify the flags when locking a memslot.
> >
> > For this reason, I would rather keep it like this for now, unless you think
> > there's a good reason to remove them.
> 
> Understood.
> Just to confirm, KVM_ARM_LOCK_MEMORY_READ is practically unnecessary,
> isn't it ? (Perhaps it might be more straightforward to have a similar

Even if the flag looks unnecessary now, that might change in the future, if new
flags are added.

> read only lock flag to the memslot considering this is an operation for
> a memslot?)

I don't see a reason to add a read-only flag to this ioctl; the memslot's
read-only flag should be enough.

> 
> >
> > >
> > > > +:Returns: 0 on success; negative error code on failure
> > > > +
> > > > +Enabling this capability causes the memory described by the memslot to be
> > > > +pinned in the process address space and the corresponding stage 2 IPA range
> > > > +mapped at stage 2. The permissions specified in args[1] apply to both
> > > > +mappings. The memory pinned with this capability counts towards the max
> 
> I assume 'both mappings' mean the permission is applied to the
> mapping for process address space as well as stage 2.
> Why do you want to apply the permission to the process address space
> mapping as well (not just for the stage 2 mapping) ?

pin_user_pages() does not apply the permissions. It checks that they are already
there (look at pin_user_pages -> __gup_longterm_locked -> __get_user_pages_locked).

I put the "both mappings" bit there to let userspace know that the userspace
mapping backing the memslot must have at least the same permissions as those
specified when locking the memslot. I'll remove this part, because it's
confusing and redundant.
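
To illustrate the point, here is only a sketch (not the code added by the
later patches in this series; the helper name, hva, nr_pages, lock_flags
and pages are placeholders): the lock flags would be translated into GUP
flags, and pinning only succeeds if the existing VMA already allows the
requested access.

#include <linux/mm.h>   /* pin_user_pages(), FOLL_WRITE, FOLL_LONGTERM */

/*
 * Sketch only: pin the pages backing the locked memslot. The caller is
 * assumed to hold the mmap_read_lock of the VMM's mm. pin_user_pages()
 * does not change the userspace mapping's permissions; it fails (or
 * pins fewer pages) when the VMA doesn't already grant the access
 * requested via gup_flags.
 */
static long pin_memslot_pages(unsigned long hva, unsigned long nr_pages,
                              u64 lock_flags, struct page **pages)
{
        unsigned int gup_flags = FOLL_LONGTERM;

        if (lock_flags & KVM_ARM_LOCK_MEM_WRITE)
                gup_flags |= FOLL_WRITE;

        return pin_user_pages(hva, nr_pages, gup_flags, pages, NULL);
}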

> 
> 
> > > > +locked memory limit for the current process.
> > > > +
> > > > +The capability should be enabled when no VCPUs are in the kernel executing an
> > > > +ioctl (and in particular, KVM_RUN); otherwise the ioctl will block until all
> > > > +VCPUs have returned. The virtual memory range described by the memslot must be
> > > > +mapped in the userspace process without any gaps. It is considered an error if
> > > > +write permissions are specified for a memslot which logs dirty pages.
> > > > +
> > > > +7.30.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > > > +---------------------------------------------------
> > > > +
> > > > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > > +:Architectures: arm64
> > > > +:Target: VM
> > > > +:Parameters: args[0] contains the memory slot number
> > > > +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> > > > +                     which unlocks all previously locked memslots.
> > > > +:Returns: 0 on success; negative error code on failure
> > > > +
> > > > +Enabling this capability causes the memory pinned when locking the memslot
> > > > +specified in args[0] to be unpinned, or, optionally, all memslots to be
> > > > +unlocked. The IPA range is not unmapped from stage 2.
> > > > +>>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
> > >
> > > Nit: An unnecessary line.
> > >
> > > If a memslot with read/write permission is locked with read only,
> > > and then unlocked, can userspace expect stage 2 mapping for the
> > > memslot to be updated with read/write ?
> >
> > Locking a memslot with the read flag would map the memory described by the
> > memslot with read permissions at stage 2. When the memslot is unlocked, KVM
> > won't touch the stage 2 entries.
> >
> > When the memslot is unlocked, the pages (as in, struct page) backing the VM
> > memory as described by the memslot are unpinned. Then the host's MM subsystem
> > can treat the memory like any other pages (make them old, new, unmap them, do
> > nothing, etc), and the MMU notifier will take care of updating the stage 2
> > entries as necessary.
> >
> > I guess I should have been more precise in the description. I'll change "causes
> > the memory pinned when locking the memslot specified in args[0] to be unpinned"
> > to something that clearly states that the memory in the host that backs the
> > memslot is unpinned.
> 
> Thank you for the explanation.
> It seems I misunderstood the read/write lock flag.
> These lock flags only affect the permission of locked mappings, not
> the guest's permission to access the memory, which depends only on
> the memslot permission (not on the lock flags) as before. Correct ?

Yes, the lock flags only affect the initial permissions with which the IPAs
are mapped at stage 2.
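
Roughly speaking, and only as a sketch of the idea (it mirrors what the
regular fault path does for writable mappings and is not code from a
later patch; the helper name and lock_flags are placeholders):

#include <asm/kvm_pgtable.h>    /* enum kvm_pgtable_prot */

/*
 * Sketch only: the lock flags decide the protection used when the
 * locked IPA range is pre-mapped at stage 2; they have no effect on
 * what happens to those entries later (MMU notifiers after unlocking,
 * permission faults, etc.).
 */
static enum kvm_pgtable_prot lock_flags_to_stage2_prot(u64 lock_flags)
{
        enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;

        if (lock_flags & KVM_ARM_LOCK_MEM_WRITE)
                prot |= KVM_PGTABLE_PROT_W;

        return prot;
}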

Thanks,
Alex

> 
> 
> >
> > > Can userspace delete the memslot that is locked (without unlocking) ?
> >
> > No, it cannot.
> >
> > > If so, userspace can expect the corresponding range to be implicitly
> > > unlocked, correct ?
> >
> > Userspace must explicitly unlock the memslot before deleting it. I want
> > userspace to be explicit in its intent.
> 
> I see. Thank you for the clarification (I checked patch-9 as well).
> 
> Thanks!
> Reiji
> 
> 
> 
> >
> > Thanks,
> > Alex
> >
> > >
> > > Thanks,
> > > Reiji
> > >
> > > > +
> > > >  8. Other capabilities.
> > > >  ======================
> > > >
> > > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > > > index 02d378887743..2c50734f048d 100644
> > > > --- a/arch/arm64/include/asm/kvm_mmu.h
> > > > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > > @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
> > > >  void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> > > >  void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> > > >
> > > > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > > > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > > > +
> > > >  static inline unsigned int kvm_get_vmid_bits(void)
> > > >  {
> > > >         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > > index e9b4ad7b5c82..d49905d18cee 100644
> > > > --- a/arch/arm64/kvm/arm.c
> > > > +++ b/arch/arm64/kvm/arm.c
> > > > @@ -78,16 +78,43 @@ int kvm_arch_check_processor_compat(void *opaque)
> > > >         return 0;
> > > >  }
> > > >
> > > > +static int kvm_arm_lock_memslot_supported(void)
> > > > +{
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> > > > +                                            struct kvm_enable_cap *cap)
> > > > +{
> > > > +       u64 slot, action_flags;
> > > > +       u32 action;
> > > > +
> > > > +       if (cap->args[2] || cap->args[3])
> > > > +               return -EINVAL;
> > > > +
> > > > +       slot = cap->args[0];
> > > > +       action = cap->flags;
> > > > +       action_flags = cap->args[1];
> > > > +
> > > > +       switch (action) {
> > > > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
> > > > +               return kvm_mmu_lock_memslot(kvm, slot, action_flags);
> > > > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
> > > > +               return kvm_mmu_unlock_memslot(kvm, slot, action_flags);
> > > > +       default:
> > > > +               return -EINVAL;
> > > > +       }
> > > > +}
> > > > +
> > > >  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > > >                             struct kvm_enable_cap *cap)
> > > >  {
> > > >         int r;
> > > >
> > > > -       if (cap->flags)
> > > > -               return -EINVAL;
> > > > -
> > > >         switch (cap->cap) {
> > > >         case KVM_CAP_ARM_NISV_TO_USER:
> > > > +               if (cap->flags)
> > > > +                       return -EINVAL;
> > > >                 r = 0;
> > > >                 kvm->arch.return_nisv_io_abort_to_user = true;
> > > >                 break;
> > > > @@ -101,6 +128,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > > >                 }
> > > >                 mutex_unlock(&kvm->lock);
> > > >                 break;
> > > > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > > > +               if (!kvm_arm_lock_memslot_supported())
> > > > +                       return -EINVAL;
> > > > +               r = kvm_lock_user_memory_region_ioctl(kvm, cap);
> > > > +               break;
> > > >         default:
> > > >                 r = -EINVAL;
> > > >                 break;
> > > > @@ -168,7 +200,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> > > >         return VM_FAULT_SIGBUS;
> > > >  }
> > > >
> > > > -
> > > >  /**
> > > >   * kvm_arch_destroy_vm - destroy the VM data structure
> > > >   * @kvm:       pointer to the KVM struct
> > > > @@ -276,6 +307,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > > >         case KVM_CAP_ARM_PTRAUTH_GENERIC:
> > > >                 r = system_has_full_ptr_auth();
> > > >                 break;
> > > > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > > > +               r = kvm_arm_lock_memslot_supported();
> > > > +               break;
> > > >         default:
> > > >                 r = 0;
> > > >         }
> > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > index 326cdfec74a1..f65bcbc9ae69 100644
> > > > --- a/arch/arm64/kvm/mmu.c
> > > > +++ b/arch/arm64/kvm/mmu.c
> > > > @@ -1296,6 +1296,74 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > > >         return ret;
> > > >  }
> > > >
> > > > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > > > +{
> > > > +       struct kvm_memory_slot *memslot;
> > > > +       int ret;
> > > > +
> > > > +       if (slot >= KVM_MEM_SLOTS_NUM)
> > > > +               return -EINVAL;
> > > > +
> > > > +       if (!(flags & KVM_ARM_LOCK_MEM_READ))
> > > > +               return -EINVAL;
> > > > +
> > > > +       mutex_lock(&kvm->lock);
> > > > +       if (!kvm_lock_all_vcpus(kvm)) {
> > > > +               ret = -EBUSY;
> > > > +               goto out_unlock_kvm;
> > > > +       }
> > > > +       mutex_lock(&kvm->slots_lock);
> > > > +
> > > > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > > > +       if (!memslot) {
> > > > +               ret = -EINVAL;
> > > > +               goto out_unlock_slots;
> > > > +       }
> > > > +       if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
> > > > +           ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
> > > > +               ret = -EPERM;
> > > > +               goto out_unlock_slots;
> > > > +       }
> > > > +
> > > > +       ret = -EINVAL;
> > > > +
> > > > +out_unlock_slots:
> > > > +       mutex_unlock(&kvm->slots_lock);
> > > > +       kvm_unlock_all_vcpus(kvm);
> > > > +out_unlock_kvm:
> > > > +       mutex_unlock(&kvm->lock);
> > > > +       return ret;
> > > > +}
> > > > +
> > > > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > > > +{
> > > > +       bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> > > > +       struct kvm_memory_slot *memslot;
> > > > +       int ret;
> > > > +
> > > > +       if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> > > > +               return -EINVAL;
> > > > +
> > > > +       mutex_lock(&kvm->slots_lock);
> > > > +
> > > > +       if (unlock_all) {
> > > > +               ret = -EINVAL;
> > > > +               goto out_unlock_slots;
> > > > +       }
> > > > +
> > > > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > > > +       if (!memslot) {
> > > > +               ret = -EINVAL;
> > > > +               goto out_unlock_slots;
> > > > +       }
> > > > +
> > > > +       ret = -EINVAL;
> > > > +
> > > > +out_unlock_slots:
> > > > +       mutex_unlock(&kvm->slots_lock);
> > > > +       return ret;
> > > > +}
> > > > +
> > > >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> > > >  {
> > > >         if (!kvm->arch.mmu.pgt)
> > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > index 1daa45268de2..70c969967557 100644
> > > > --- a/include/uapi/linux/kvm.h
> > > > +++ b/include/uapi/linux/kvm.h
> > > > @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> > > >  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> > > >  #define KVM_CAP_ARM_MTE 205
> > > >  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> > > > +#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
> > > >
> > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > >
> > > > @@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
> > > >  #define KVM_PPC_SVM_OFF                  _IO(KVMIO,  0xb3)
> > > >  #define KVM_ARM_MTE_COPY_TAGS    _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
> > > >
> > > > +/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
> > > > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK     (1 << 0)
> > > > +#define   KVM_ARM_LOCK_MEM_READ                                (1 << 0)
> > > > +#define   KVM_ARM_LOCK_MEM_WRITE                       (1 << 1)
> > > > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK   (1 << 1)
> > > > +#define   KVM_ARM_UNLOCK_MEM_ALL                       (1 << 0)
> > > > +
> > > >  /* ioctl for vm fd */
> > > >  #define KVM_CREATE_DEVICE        _IOWR(KVMIO,  0xe0, struct kvm_create_device)
> > > >
> > > > --
> > > > 2.33.1
> > > >
> > > > _______________________________________________
> > > > kvmarm mailing list
> > > > kvmarm@lists.cs.columbia.edu
> > > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API
  2022-02-17 10:31           ` Alexandru Elisei
@ 2022-02-18  4:41             ` Reiji Watanabe
  -1 siblings, 0 replies; 118+ messages in thread
From: Reiji Watanabe @ 2022-02-18  4:41 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Alex,

On Thu, Feb 17, 2022 at 2:31 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi Reiji,
>
> On Wed, Feb 16, 2022 at 11:35:06PM -0800, Reiji Watanabe wrote:
> > Hi Alex,
> >
> > On Tue, Feb 15, 2022 at 3:03 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Hi Reiji,
> > >
> > > On Mon, Feb 14, 2022 at 09:59:09PM -0800, Reiji Watanabe wrote:
> > > > Hi Alex,
> > > >
> > > > On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> > > > <alexandru.elisei@arm.com> wrote:
> > > > >
> > > > > Stage 2 faults triggered by the profiling buffer attempting to write to
> > > > > memory are reported by the SPE hardware by asserting a buffer management
> > > > > event interrupt. Interrupts are by their nature asynchronous, which means
> > > > > that the guest might have changed its stage 1 translation tables since the
> > > > > attempted write. SPE reports the guest virtual address that caused the data
> > > > > abort, not the IPA, which means that KVM would have to walk the guest's
> > > > > stage 1 tables to find the IPA. Using the AT instruction to walk the
> > > > > guest's tables in hardware is not an option because it doesn't report the
> > > > > IPA in the case of a stage 2 fault on a stage 1 table walk.
> > > > >
> > > > > Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> > > > > done by adding a capability that allows the user to pin the memory backing
> > > > > a memslot. The same capability can be used to unlock a memslot, which
> > > > > unpins the pages associated with the memslot, but doesn't unmap the IPA
> > > > > range from stage 2; in this case, the addresses will be unmapped from stage
> > > > > 2 via the MMU notifiers when the process' address space changes.
> > > > >
> > > > > For now, the capability doesn't actually do anything other than checking
> > > > > that the usage is correct; the memory operations will be added in future
> > > > > patches.
> > > > >
> > > > > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > > > ---
> > > > >  Documentation/virt/kvm/api.rst   | 57 ++++++++++++++++++++++++++
> > > > >  arch/arm64/include/asm/kvm_mmu.h |  3 ++
> > > > >  arch/arm64/kvm/arm.c             | 42 ++++++++++++++++++--
> > > > >  arch/arm64/kvm/mmu.c             | 68 ++++++++++++++++++++++++++++++++
> > > > >  include/uapi/linux/kvm.h         |  8 ++++
> > > > >  5 files changed, 174 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > index aeeb071c7688..16aa59eae3d9 100644
> > > > > --- a/Documentation/virt/kvm/api.rst
> > > > > +++ b/Documentation/virt/kvm/api.rst
> > > > > @@ -6925,6 +6925,63 @@ indicated by the fd to the VM this is called on.
> > > > >  This is intended to support intra-host migration of VMs between userspace VMMs,
> > > > >  upgrading the VMM process without interrupting the guest.
> > > > >
> > > > > +7.30 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > > > +----------------------------------------
> > > > > +
> > > > > +:Architectures: arm64
> > > > > +:Target: VM
> > > > > +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> > > > > +                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > > > > +             args[0] is the slot number
> > > > > +             args[1] specifies the permissions when the memslot is locked or if
> > > > > +                     all memslots should be unlocked
> > > > > +
> > > > > +The presence of this capability indicates that KVM supports locking the memory
> > > > > +associated with the memslot, and unlocking a previously locked memslot.
> > > > > +
> > > > > +The 'flags' parameter is defined as follows:
> > > > > +
> > > > > +7.30.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> > > > > +-------------------------------------------------
> > > > > +
> > > > > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > > > +:Architectures: arm64
> > > > > +:Target: VM
> > > > > +:Parameters: args[0] contains the memory slot number
> > > > > +             args[1] contains the permissions for the locked memory:
> > > > > +                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> > > > > +                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> > > > > +                     (optional) with write permissions
> > > >
> > > > Nit: Those flag names don't match the ones in the code.
> > > > (Their names in the code are KVM_ARM_LOCK_MEM_READ/KVM_ARM_LOCK_MEM_WRITE)
> > >
> > > That's true, I'll change the flags to match.
> > >
> > > >
> > > > What is the reason why KVM_ARM_LOCK_MEMORY_{READ,WRITE} flags need
> > > > to be specified even though memslot already has similar flags ??
> > >
> > > I added both flags to make the ABI more flexible, and I don't think it's a
> > > burden on userspace to specify the flags when locking a memslot.
> > >
> > > For this reason, I would rather keep it like this for now, unless you think
> > > there's a good reason to remove them.
> >
> > Understood.
> > Just to confirm, KVM_ARM_LOCK_MEMORY_READ is practically unnecessary,
> > isn't it ? (Perhaps it might be more straightforward to have a similar
>
> Even if the flag looks unnecessary now, that might change in the future, if new
> flags are added.
>
> > read only lock flag to the memslot considering this is an operation for
> > a memslot?)
>
> I don't see a reason to add a read-only flag to this ioctl, the memslot's
> read-only flag should be enough.

What I meant was: if we didn't need KVM_ARM_LOCK_MEMORY_READ
and could remove it (which isn't the case), then instead of the write lock
flag (KVM_ARM_LOCK_MEMORY_WRITE), a read-only lock flag
(e.g. KVM_ARM_LOCK_MEMORY_READ_ONLY) might be better.
Anyway, please disregard this comment.
I understand you want to keep KVM_ARM_LOCK_MEMORY_READ and why.
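
(For my own understanding, here is a minimal sketch of how I would expect
a VMM to drive this capability, assuming the UAPI names from this patch;
vm_fd and slot are placeholders, includes and error handling are omitted:

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
		.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK,
		/* args[0] = slot number, args[1] = lock permissions */
		.args = { slot, KVM_ARM_LOCK_MEM_READ | KVM_ARM_LOCK_MEM_WRITE },
	};

	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* ... later, before deleting the memslot or starting migration: */
	cap.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK;
	cap.args[1] = 0;
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

Please correct me if that doesn't match the intended usage.)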


> >
> > >
> > > >
> > > > > +:Returns: 0 on success; negative error code on failure
> > > > > +
> > > > > +Enabling this capability causes the memory described by the memslot to be
> > > > > +pinned in the process address space and the corresponding stage 2 IPA range
> > > > > +mapped at stage 2. The permissions specified in args[1] apply to both
> > > > > +mappings. The memory pinned with this capability counts towards the max
> >
> > I assume 'both mappings' mean the permission is applied to the
> > mapping for process address space as well as stage 2.
> > Why do you want to apply the permission to the process address space
> > mapping as well (not just for the stage 2 mapping) ?
>
> pin_user_pages() does not apply the permissions. It checks that they are already
> there (look at pin_user_pages -> __gup_longterm_locked -> __get_user_pages_locked).
>
> I put the "both mappings" bit there to let userspace know that the userspace
> mapping backing the memslot must have at least the same permissions as those
> specified when locking the memslot. I'll remove this part, because it's
> confusing and redundant.

The above explanation ("backing the memslot must have at least...")
looks clear to me; it might be better to replace the original statement with
it rather than removing it.
(I don't think that is explained in the doc in this patch.)


> >
> >
> > > > > +locked memory limit for the current process.
> > > > > +
> > > > > +The capability should be enabled when no VCPUs are in the kernel executing an
> > > > > +ioctl (and in particular, KVM_RUN); otherwise the ioctl will block until all
> > > > > +VCPUs have returned. The virtual memory range described by the memslot must be
> > > > > +mapped in the userspace process without any gaps. It is considered an error if
> > > > > +write permissions are specified for a memslot which logs dirty pages.
> > > > > +
> > > > > +7.30.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > > > > +---------------------------------------------------
> > > > > +
> > > > > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > > > > +:Architectures: arm64
> > > > > +:Target: VM
> > > > > +:Parameters: args[0] contains the memory slot number
> > > > > +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> > > > > +                     which unlocks all previously locked memslots.
> > > > > +:Returns: 0 on success; negative error code on failure
> > > > > +
> > > > > +Enabling this capability causes the memory pinned when locking the memslot
> > > > > +specified in args[0] to be unpinned, or, optionally, all memslots to be
> > > > > +unlocked. The IPA range is not unmapped from stage 2.
> > > > > +>>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)
> > > >
> > > > Nit: An unnecessary line.
> > > >
> > > > If a memslot with read/write permission is locked with read only,
> > > > and then unlocked, can userspace expect stage 2 mapping for the
> > > > memslot to be updated with read/write ?
> > >
> > > Locking a memslot with the read flag would map the memory described by the
> > > memslot with read permissions at stage 2. When the memslot is unlocked, KVM
> > > won't touch the stage 2 entries.
> > >
> > > When the memslot is unlocked, the pages (as in, struct page) backing the VM
> > > memory as described by the memslot are unpinned. Then the host's MM subsystem
> > > can treat the memory like any other pages (make them old, new, unmap them, do
> > > nothing, etc), and the MMU notifier will take care of updating the stage 2
> > > entries as necessary.
> > >
> > > I guess I should have been more precise in the description. I'll change "causes
> > > the memory pinned when locking the memslot specified in args[0] to be unpinned"
> > > to something that clearly states that the memory in the host that backs the
> > > memslot is unpinned.
> >
> > Thank you for the explanation.
> > It seems I misunderstood the read/write lock flag.
> > These lock flags only affect the permission of locked mappings, not
> > the guest's permission to access the memory, which depends only on
> > the memslot permission (not on the lock flags) as before. Correct ?
>
> Yes, the locked flags only affect the initial permissions with which the IPAs
> are mapped at stage 2.

Thank you for all the clarification!!
Now I think I understand the API (and your intention) better, and
I will continue to take a look at the implementation more closely.

Regards,
Reiji


>
> Thanks,
> Alex
>
> >
> >
> > >
> > > > Can userspace delete the memslot that is locked (without unlocking) ?
> > >
> > > No, it cannot.
> > >
> > > > If so, userspace can expect the corresponding range to be implicitly
> > > > unlocked, correct ?
> > >
> > > Userspace must explicitly unlock the memslot before deleting it. I want
> > > userspace to be explicit in its intent.
> >
> > I see. Thank you for the clarification (I checked patch-9 as well).
> >
> > Thanks!
> > Reiji
> >
> >
> >
> > >
> > > Thanks,
> > > Alex
> > >
> > > >
> > > > Thanks,
> > > > Reiji
> > > >
> > > > > +
> > > > >  8. Other capabilities.
> > > > >  ======================
> > > > >
> > > > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > > > > index 02d378887743..2c50734f048d 100644
> > > > > --- a/arch/arm64/include/asm/kvm_mmu.h
> > > > > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > > > @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
> > > > >  void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> > > > >  void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> > > > >
> > > > > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > > > > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > > > > +
> > > > >  static inline unsigned int kvm_get_vmid_bits(void)
> > > > >  {
> > > > >         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > > > index e9b4ad7b5c82..d49905d18cee 100644
> > > > > --- a/arch/arm64/kvm/arm.c
> > > > > +++ b/arch/arm64/kvm/arm.c
> > > > > @@ -78,16 +78,43 @@ int kvm_arch_check_processor_compat(void *opaque)
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > +static int kvm_arm_lock_memslot_supported(void)
> > > > > +{
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > > +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> > > > > +                                            struct kvm_enable_cap *cap)
> > > > > +{
> > > > > +       u64 slot, action_flags;
> > > > > +       u32 action;
> > > > > +
> > > > > +       if (cap->args[2] || cap->args[3])
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       slot = cap->args[0];
> > > > > +       action = cap->flags;
> > > > > +       action_flags = cap->args[1];
> > > > > +
> > > > > +       switch (action) {
> > > > > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
> > > > > +               return kvm_mmu_lock_memslot(kvm, slot, action_flags);
> > > > > +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
> > > > > +               return kvm_mmu_unlock_memslot(kvm, slot, action_flags);
> > > > > +       default:
> > > > > +               return -EINVAL;
> > > > > +       }
> > > > > +}
> > > > > +
> > > > >  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > > > >                             struct kvm_enable_cap *cap)
> > > > >  {
> > > > >         int r;
> > > > >
> > > > > -       if (cap->flags)
> > > > > -               return -EINVAL;
> > > > > -
> > > > >         switch (cap->cap) {
> > > > >         case KVM_CAP_ARM_NISV_TO_USER:
> > > > > +               if (cap->flags)
> > > > > +                       return -EINVAL;
> > > > >                 r = 0;
> > > > >                 kvm->arch.return_nisv_io_abort_to_user = true;
> > > > >                 break;
> > > > > @@ -101,6 +128,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > > > >                 }
> > > > >                 mutex_unlock(&kvm->lock);
> > > > >                 break;
> > > > > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > > > > +               if (!kvm_arm_lock_memslot_supported())
> > > > > +                       return -EINVAL;
> > > > > +               r = kvm_lock_user_memory_region_ioctl(kvm, cap);
> > > > > +               break;
> > > > >         default:
> > > > >                 r = -EINVAL;
> > > > >                 break;
> > > > > @@ -168,7 +200,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> > > > >         return VM_FAULT_SIGBUS;
> > > > >  }
> > > > >
> > > > > -
> > > > >  /**
> > > > >   * kvm_arch_destroy_vm - destroy the VM data structure
> > > > >   * @kvm:       pointer to the KVM struct
> > > > > @@ -276,6 +307,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > > > >         case KVM_CAP_ARM_PTRAUTH_GENERIC:
> > > > >                 r = system_has_full_ptr_auth();
> > > > >                 break;
> > > > > +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> > > > > +               r = kvm_arm_lock_memslot_supported();
> > > > > +               break;
> > > > >         default:
> > > > >                 r = 0;
> > > > >         }
> > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > > index 326cdfec74a1..f65bcbc9ae69 100644
> > > > > --- a/arch/arm64/kvm/mmu.c
> > > > > +++ b/arch/arm64/kvm/mmu.c
> > > > > @@ -1296,6 +1296,74 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > > > >         return ret;
> > > > >  }
> > > > >
> > > > > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > > > > +{
> > > > > +       struct kvm_memory_slot *memslot;
> > > > > +       int ret;
> > > > > +
> > > > > +       if (slot >= KVM_MEM_SLOTS_NUM)
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       if (!(flags & KVM_ARM_LOCK_MEM_READ))
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       mutex_lock(&kvm->lock);
> > > > > +       if (!kvm_lock_all_vcpus(kvm)) {
> > > > > +               ret = -EBUSY;
> > > > > +               goto out_unlock_kvm;
> > > > > +       }
> > > > > +       mutex_lock(&kvm->slots_lock);
> > > > > +
> > > > > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > > > > +       if (!memslot) {
> > > > > +               ret = -EINVAL;
> > > > > +               goto out_unlock_slots;
> > > > > +       }
> > > > > +       if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
> > > > > +           ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
> > > > > +               ret = -EPERM;
> > > > > +               goto out_unlock_slots;
> > > > > +       }
> > > > > +
> > > > > +       ret = -EINVAL;
> > > > > +
> > > > > +out_unlock_slots:
> > > > > +       mutex_unlock(&kvm->slots_lock);
> > > > > +       kvm_unlock_all_vcpus(kvm);
> > > > > +out_unlock_kvm:
> > > > > +       mutex_unlock(&kvm->lock);
> > > > > +       return ret;
> > > > > +}
> > > > > +
> > > > > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > > > > +{
> > > > > +       bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> > > > > +       struct kvm_memory_slot *memslot;
> > > > > +       int ret;
> > > > > +
> > > > > +       if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       mutex_lock(&kvm->slots_lock);
> > > > > +
> > > > > +       if (unlock_all) {
> > > > > +               ret = -EINVAL;
> > > > > +               goto out_unlock_slots;
> > > > > +       }
> > > > > +
> > > > > +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> > > > > +       if (!memslot) {
> > > > > +               ret = -EINVAL;
> > > > > +               goto out_unlock_slots;
> > > > > +       }
> > > > > +
> > > > > +       ret = -EINVAL;
> > > > > +
> > > > > +out_unlock_slots:
> > > > > +       mutex_unlock(&kvm->slots_lock);
> > > > > +       return ret;
> > > > > +}
> > > > > +
> > > > >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> > > > >  {
> > > > >         if (!kvm->arch.mmu.pgt)
> > > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > > index 1daa45268de2..70c969967557 100644
> > > > > --- a/include/uapi/linux/kvm.h
> > > > > +++ b/include/uapi/linux/kvm.h
> > > > > @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
> > > > >  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> > > > >  #define KVM_CAP_ARM_MTE 205
> > > > >  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> > > > > +#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
> > > > >
> > > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > > >
> > > > > @@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
> > > > >  #define KVM_PPC_SVM_OFF                  _IO(KVMIO,  0xb3)
> > > > >  #define KVM_ARM_MTE_COPY_TAGS    _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
> > > > >
> > > > > +/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
> > > > > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK     (1 << 0)
> > > > > +#define   KVM_ARM_LOCK_MEM_READ                                (1 << 0)
> > > > > +#define   KVM_ARM_LOCK_MEM_WRITE                       (1 << 1)
> > > > > +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK   (1 << 1)
> > > > > +#define   KVM_ARM_UNLOCK_MEM_ALL                       (1 << 0)
> > > > > +
> > > > >  /* ioctl for vm fd */
> > > > >  #define KVM_CREATE_DEVICE        _IOWR(KVMIO,  0xe0, struct kvm_create_device)
> > > > >
> > > > > --
> > > > > 2.33.1
> > > > >
> > > > > _______________________________________________
> > > > > kvmarm mailing list
> > > > > kvmarm@lists.cs.columbia.edu
> > > > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 04/38] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
  2021-11-17 15:38   ` Alexandru Elisei
@ 2022-02-24  5:56     ` Reiji Watanabe
  -1 siblings, 0 replies; 118+ messages in thread
From: Reiji Watanabe @ 2022-02-24  5:56 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi Alex,

On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> KVM relies on doing dcache maintenance on stage 2 faults to present to a
> guest running with the MMU off the same view of memory as userspace. For
> locked memslots, KVM so far has done the dcache maintenance when a memslot
> is locked, but that leaves KVM in a rather awkward position: what userspace
> writes to guest memory after the memslot is locked, but before a VCPU is
> run, might not be visible to the guest.
>
> Fix this by deferring the dcache maintenance until the first VCPU is run.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  7 ++++
>  arch/arm64/include/asm/kvm_mmu.h  |  5 +++
>  arch/arm64/kvm/arm.c              |  3 ++
>  arch/arm64/kvm/mmu.c              | 55 ++++++++++++++++++++++++++++---
>  4 files changed, 66 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7fd70ad90c16..3b4839b447c4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -113,6 +113,10 @@ struct kvm_arch_memory_slot {
>         u32 flags;
>  };
>
> +/* kvm->arch.mmu_pending_ops flags */
> +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE        0
> +#define KVM_MAX_MMU_PENDING_OPS                1
> +
>  struct kvm_arch {
>         struct kvm_s2_mmu mmu;
>
> @@ -136,6 +140,9 @@ struct kvm_arch {
>          */
>         bool return_nisv_io_abort_to_user;
>
> +       /* Defer MMU operations until a VCPU is run. */
> +       unsigned long mmu_pending_ops;
> +
>         /*
>          * VM-wide PMU filter, implemented as a bitmap and big enough for
>          * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 2c50734f048d..cbf57c474fea 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -219,6 +219,11 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>
> +#define kvm_mmu_has_pending_ops(kvm)   \
> +       (!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> +
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> +
>  static inline unsigned int kvm_get_vmid_bits(void)
>  {
>         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index b9b8b43835e3..96ed48455cdd 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -870,6 +870,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>         if (unlikely(!kvm_vcpu_initialized(vcpu)))
>                 return -ENOEXEC;
>
> +       if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> +               kvm_mmu_perform_pending_ops(vcpu->kvm);
> +
>         ret = kvm_vcpu_first_run_init(vcpu);
>         if (ret)
>                 return ret;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index b0a8e61315e4..8e4787019840 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1305,6 +1305,40 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>         return ret;
>  }
>
> +/*
> + * It's safe to do the CMOs when the first VCPU is run because:
> + * - VCPUs cannot run until mmu_cmo_needed is cleared.

What does 'mmu_cmo_needed' mean ? Do you mean 'mmu_pending_ops' instead ?


> + * - Memslots cannot be modified because we hold the kvm->slots_lock.
> + *
> + * It's safe to periodically release the mmu_lock because:
> + * - VCPUs cannot run.
> + * - Any changes to the stage 2 tables triggered by the MMU notifiers also take
> + *   the mmu_lock, which means accesses will be serialized.
> + * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> + *   is live, which means that the VM will be live.
> + */
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> +{
> +       struct kvm_memory_slot *memslot;
> +
> +       mutex_lock(&kvm->slots_lock);
> +       if (!kvm_mmu_has_pending_ops(kvm))
> +               goto out_unlock;
> +
> +       if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> +               kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> +                       if (!memslot_is_locked(memslot))
> +                               continue;

Shouldn't the code hold the mmu_lock to call stage2_flush_memslot() ?
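
Something like the below is what I had in mind (just a sketch of the
coarse-grained approach; the comment above this function says periodically
releasing the mmu_lock is safe, so it could also be taken and dropped per
memslot):

	spin_lock(&kvm->mmu_lock);
	stage2_flush_memslot(kvm, memslot);
	spin_unlock(&kvm->mmu_lock);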

> +                       stage2_flush_memslot(kvm, memslot);

Since stage2_flush_memslot() won't do anything when stage2_has_fwb()
returns true, I wonder if it can be checked even before iterating
memslots (so those iterations can be skipped when not needed).
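
i.e., roughly something like the below (just a sketch; I'm assuming
stage2_has_fwb(), which is currently static in pgtable.c, or an equivalent
FWB check can be made reachable from mmu.c, and I've left out the mmu_lock
handling mentioned in my previous comment):

	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
		if (!stage2_has_fwb(kvm->arch.mmu.pgt)) {
			kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
				if (memslot_is_locked(memslot))
					stage2_flush_memslot(kvm, memslot);
			}
		}
		clear_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
	}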

Thanks,
Reiji

> +               }
> +               clear_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> +       }
> +
> +out_unlock:
> +       mutex_unlock(&kvm->slots_lock);
> +       return;
> +}
> +
>  static int try_rlimit_memlock(unsigned long npages)
>  {
>         unsigned long lock_limit;
> @@ -1345,7 +1379,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>         struct kvm_memory_slot_page *page_entry;
>         bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
>         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> -       struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +       struct kvm_pgtable pgt;
> +       struct kvm_pgtable_mm_ops mm_ops;
>         struct vm_area_struct *vma;
>         unsigned long npages = memslot->npages;
>         unsigned int pin_flags = FOLL_LONGTERM;
> @@ -1363,6 +1398,16 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>                 pin_flags |= FOLL_WRITE;
>         }
>
> +       /*
> +        * Make a copy of the stage 2 translation table struct to remove the
> +        * dcache callback so we can postpone the cache maintenance operations
> +        * until the first VCPU is run.
> +        */
> +       mm_ops = *kvm->arch.mmu.pgt->mm_ops;
> +       mm_ops.dcache_clean_inval_poc = NULL;
> +       pgt = *kvm->arch.mmu.pgt;
> +       pgt.mm_ops = &mm_ops;
> +
>         hva = memslot->userspace_addr;
>         ipa = memslot->base_gfn << PAGE_SHIFT;
>
> @@ -1414,13 +1459,13 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>                         goto out_err;
>                 }
>
> -               ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> +               ret = kvm_pgtable_stage2_map(&pgt, ipa, PAGE_SIZE,
>                                              page_to_phys(page_entry->page),
>                                              prot, &cache);
>                 spin_unlock(&kvm->mmu_lock);
>
>                 if (ret) {
> -                       kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +                       kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>                                                  i << PAGE_SHIFT);
>                         unpin_memslot_pages(memslot, writable);
>                         goto out_err;
> @@ -1439,7 +1484,7 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>          */
>         ret = account_locked_vm(current->mm, npages, true);
>         if (ret) {
> -               kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +               kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>                                          npages << PAGE_SHIFT);
>                 unpin_memslot_pages(memslot, writable);
>                 goto out_err;
> @@ -1449,6 +1494,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>         if (writable)
>                 memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
>
> +       set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> +
>         kvm_mmu_free_memory_cache(&cache);
>
>         return 0;
> --
> 2.33.1
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 06/38] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs
  2021-11-17 15:38   ` Alexandru Elisei
@ 2022-03-18  5:03     ` Reiji Watanabe
  -1 siblings, 0 replies; 118+ messages in thread
From: Reiji Watanabe @ 2022-03-18  5:03 UTC (permalink / raw)
  To: Alexandru Elisei, maz, james.morse, suzuki.poulose,
	linux-arm-kernel, kvmarm, will, mark.rutland

Hi Alex,

On 11/17/21 7:38 AM, Alexandru Elisei wrote:
> When an MTE-enabled guest first accesses a physical page, that page must be
> scrubbed for tags. This is normally done by KVM on a translation fault, but
> with locked memslots we will not get translation faults. So far, this has
> been handled by forbidding userspace to enable the MTE capability after
> locking a memslot.
> 
> Remove this constraint by deferring tag cleaning until the first VCPU is
> run, similar to how KVM handles cache maintenance operations.
> 
> When userspace resets a VCPU, KVM again performs cache maintenance
> operations on locked memslots because userspace might have modified the
> guest memory. Clean the tags the next time a VCPU is run for the same
> reason.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>   arch/arm64/include/asm/kvm_host.h |  7 ++-
>   arch/arm64/include/asm/kvm_mmu.h  |  2 +-
>   arch/arm64/kvm/arm.c              | 29 ++--------
>   arch/arm64/kvm/mmu.c              | 95 ++++++++++++++++++++++++++-----
>   4 files changed, 91 insertions(+), 42 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 5f49a27ce289..0ebdef158020 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -114,9 +114,10 @@ struct kvm_arch_memory_slot {
>   };
>   
>   /* kvm->arch.mmu_pending_ops flags */
> -#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
> -#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE	1
> -#define KVM_MAX_MMU_PENDING_OPS		2
> +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE		0
> +#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE		1
> +#define KVM_LOCKED_MEMSLOT_SANITISE_TAGS	2
> +#define KVM_MAX_MMU_PENDING_OPS			3
>   
>   struct kvm_arch {
>   	struct kvm_s2_mmu mmu;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index cbf57c474fea..2d2f902000b3 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -222,7 +222,7 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>   #define kvm_mmu_has_pending_ops(kvm)	\
>   	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
>   
> -void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> +int kvm_mmu_perform_pending_ops(struct kvm *kvm);
>   
>   static inline unsigned int kvm_get_vmid_bits(void)
>   {
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 96ed48455cdd..13f3af1f2e78 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -106,25 +106,6 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
>   	}
>   }
>   
> -static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
> -{
> -	struct kvm_memslots *slots = kvm_memslots(kvm);
> -	struct kvm_memory_slot *memslot;
> -	bool has_locked_memslots = false;
> -	int idx;
> -
> -	idx = srcu_read_lock(&kvm->srcu);
> -	kvm_for_each_memslot(memslot, slots) {
> -		if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
> -			has_locked_memslots = true;
> -			break;
> -		}
> -	}
> -	srcu_read_unlock(&kvm->srcu, idx);
> -
> -	return has_locked_memslots;
> -}
> -
>   int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   			    struct kvm_enable_cap *cap)
>   {
> @@ -139,8 +120,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		break;
>   	case KVM_CAP_ARM_MTE:
>   		mutex_lock(&kvm->lock);
> -		if (!system_supports_mte() || kvm->created_vcpus ||
> -		    (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) {
> +		if (!system_supports_mte() || kvm->created_vcpus) {
>   			r = -EINVAL;
>   		} else {
>   			r = 0;
> @@ -870,8 +850,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>   	if (unlikely(!kvm_vcpu_initialized(vcpu)))
>   		return -ENOEXEC;
>   
> -	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> -		kvm_mmu_perform_pending_ops(vcpu->kvm);
> +	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm))) {
> +		ret = kvm_mmu_perform_pending_ops(vcpu->kvm);
> +		if (ret)
> +			return ret;
> +	}
>   
>   	ret = kvm_vcpu_first_run_init(vcpu);
>   	if (ret)
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 188064c5839c..2491e73e3d31 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -613,6 +613,15 @@ void stage2_unmap_vm(struct kvm *kvm)
>   				&kvm->arch.mmu_pending_ops);
>   			set_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE,
>   				&kvm->arch.mmu_pending_ops);
> +			/*
> +			 * stage2_unmap_vm() is called after a VCPU has run, at
> +			 * which point the state of the MTE cap (either enabled
> +			 * or disabled) is final.
> +			 */
> +			if (kvm_has_mte(kvm)) {
> +				set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS,
> +					&kvm->arch.mmu_pending_ops);
> +			}
>   			continue;
>   		}
>   		stage2_unmap_memslot(kvm, memslot);
> @@ -956,6 +965,55 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>   	return 0;
>   }
>   
> +static int sanitise_mte_tags_memslot(struct kvm *kvm,
> +				     struct kvm_memory_slot *memslot)
> +{
> +	unsigned long hva, slot_size, slot_end;
> +	struct kvm_memory_slot_page *entry;
> +	struct page *page;
> +	int ret = 0;
> +
> +	hva = memslot->userspace_addr;
> +	slot_size = memslot->npages << PAGE_SHIFT;
> +	slot_end = hva + slot_size;
> +
> +	/* First check that the VMAs spanning the memslot are not shared... */
> +	do {
> +		struct vm_area_struct *vma;
> +
> +		vma = find_vma_intersection(current->mm, hva, slot_end);
> +		/* The VMAs spanning the memslot must be contiguous. */
> +		if (!vma) {
> +			ret = -EFAULT;
> +			goto out;
> +		}
> +		/*
> +		 * VM_SHARED mappings are not allowed with MTE to avoid races
> +		 * when updating the PG_mte_tagged page flag, see
> +		 * sanitise_mte_tags for more details.
> +		 */
> +		if (vma->vm_flags & VM_SHARED) {
> +			ret = -EFAULT;
> +			goto out;
> +		}
> +		hva = min(slot_end, vma->vm_end);
> +	} while (hva < slot_end);
> +
> +	/* ... then clear the tags. */
> +	list_for_each_entry(entry, &memslot->arch.pages.list, list) {
> +		page = entry->page;
> +		if (!test_bit(PG_mte_tagged, &page->flags)) {
> +			mte_clear_page_tags(page_address(page));
> +			set_bit(PG_mte_tagged, &page->flags);
> +		}
> +	}
> +
> +out:
> +	mmap_read_unlock(current->mm);

This appears unnecessary (it's taken care of by the caller).
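
The caller in kvm_mmu_perform_pending_ops() already wraps the call in
the hunk below:

        mmap_read_lock(current->mm);
        ret = sanitise_mte_tags_memslot(kvm, memslot);
        mmap_read_unlock(current->mm);

so unlocking here as well would unbalance the mmap lock.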



> +
> +	return ret;
> +}
> +
>   static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   			  struct kvm_memory_slot *memslot, unsigned long hva,
>   			  unsigned long fault_status)
> @@ -1325,14 +1383,29 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>    * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
>    *   is live, which means that the VM will be live.
>    */
> -void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> +int kvm_mmu_perform_pending_ops(struct kvm *kvm)
>   {
>   	struct kvm_memory_slot *memslot;
> +	int ret = 0;
>   
>   	mutex_lock(&kvm->slots_lock);
>   	if (!kvm_mmu_has_pending_ops(kvm))
>   		goto out_unlock;
>   
> +	if (kvm_has_mte(kvm) &&
> +	    (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops))) {
> +		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> +			if (!memslot_is_locked(memslot))
> +				continue;
> +			mmap_read_lock(current->mm);
> +			ret = sanitise_mte_tags_memslot(kvm, memslot);
> +			mmap_read_unlock(current->mm);
> +			if (ret)
> +				goto out_unlock;
> +		}
> +		clear_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
> +	}
> +
>   	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
>   		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
>   			if (!memslot_is_locked(memslot))
> @@ -1349,7 +1422,7 @@ void kvm_mmu_perform_pending_ops(struct kvm *kvm)
>   
>   out_unlock:
>   	mutex_unlock(&kvm->slots_lock);
> -	return;
> +	return ret;
>   }
>   
>   static int try_rlimit_memlock(unsigned long npages)
> @@ -1443,19 +1516,6 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>   			ret = -ENOMEM;
>   			goto out_err;
>   		}
> -		if (kvm_has_mte(kvm)) {
> -			if (vma->vm_flags & VM_SHARED) {
> -				ret = -EFAULT;
> -			} else {
> -				ret = sanitise_mte_tags(kvm,
> -					page_to_pfn(page_entry->page),
> -					PAGE_SIZE);
> -			}
> -			if (ret) {
> -				mmap_read_unlock(current->mm);
> -				goto out_err;
> -			}
> -		}
>   		mmap_read_unlock(current->mm);
>   
>   		ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
> @@ -1508,6 +1568,11 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>   		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
>   
>   	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> +	/*
> +	 * MTE might be enabled after we lock the memslot, set it here
> +	 * unconditionally.
> +	 */
> +	set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);


Since this won't be needed when the system doesn't support MTE,
shouldn't the code check whether MTE is supported on the system?

What is the reason for setting this here rather than when MTE is
enabled?
When MTE is not used, once KVM_LOCKED_MEMSLOT_SANITISE_TAGS is set,
it appears that the bit won't be cleared until all memslots are
unlocked (correct?). I would think it shouldn't be set when it is
unnecessary, or it should be cleared once it turns out to be
unnecessary.
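
Just to illustrate the first point, something like this (untested
sketch, and it doesn't address the separate question about when the
cap is enabled):

        set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
        /* Only request tag scrubbing if the hardware has MTE at all. */
        if (system_supports_mte())
                set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS,
                        &kvm->arch.mmu_pending_ops);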

Thanks,
Reiji


>   
>   	kvm_mmu_free_memory_cache(&cache);
>   
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 08/38] KVM: arm64: Unlock memslots after stage 2 tables are freed
  2021-11-17 15:38   ` Alexandru Elisei
@ 2022-03-18  5:19     ` Reiji Watanabe
  0 siblings, 0 replies; 118+ messages in thread
From: Reiji Watanabe @ 2022-03-18  5:19 UTC (permalink / raw)
  To: Alexandru Elisei, maz, james.morse, suzuki.poulose,
	linux-arm-kernel, kvmarm, will, mark.rutland

Hi Alex,

On 11/17/21 7:38 AM, Alexandru Elisei wrote:
> Unpin the backing pages mapped at stage 2 after the stage 2 translation
> tables are destroyed.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>   arch/arm64/kvm/mmu.c | 23 ++++++++++++++++++-----
>   1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index cd6f1bc7842d..072e2aba371f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1627,11 +1627,19 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
>   	return ret;
>   }
>   
> -static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> +static void __unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
>   {
>   	bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
>   	unsigned long npages = memslot->npages;
>   
> +	unpin_memslot_pages(memslot, writable);
> +	account_locked_vm(current->mm, npages, false);
> +
> +	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
> +}
> +
> +static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> +{
>   	/*
>   	 * MMU maintenace operations aren't performed on an unlocked memslot.
>   	 * Unmap it from stage 2 so the abort handler performs the necessary
> @@ -1640,10 +1648,7 @@ static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
>   	if (kvm_mmu_has_pending_ops(kvm))
>   		kvm_arch_flush_shadow_memslot(kvm, memslot);
>   
> -	unpin_memslot_pages(memslot, writable);
> -	account_locked_vm(current->mm, npages, false);
> -
> -	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
> +	__unlock_memslot(kvm, memslot);
>   }
>   
>   int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> @@ -1951,7 +1956,15 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
>   
>   void kvm_arch_flush_shadow_all(struct kvm *kvm)
>   {
> +	struct kvm_memory_slot *memslot;
> +
>   	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +
> +	kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> +		if (!memslot_is_locked(memslot))
> +			continue;
> +		__unlock_memslot(kvm, memslot);
> +	}
>   }

Perhaps it might be useful to keep track of the number of locked
memslots? (The count could also be used in the fix for
kvm_mmu_unlock_memslot() in patch 7.)
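
Something along these lines, maybe (untested sketch, and the field name
nr_memslots_locked is made up):

        /* struct kvm_arch: running count of locked memslots */
        unsigned int nr_memslots_locked;

        /* lock_memslot(), once the memslot is marked as locked */
        kvm->arch.nr_memslots_locked++;

        /* __unlock_memslot() */
        kvm->arch.nr_memslots_locked--;

        /* kvm_arch_flush_shadow_all(): skip the walk when nothing is locked */
        if (!kvm->arch.nr_memslots_locked)
                return;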
                                                  
Thanks,
Reiji


>   
>   void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 04/38] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
  2022-02-24  5:56     ` Reiji Watanabe
@ 2022-03-21 17:10       ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-03-21 17:10 UTC (permalink / raw)
  To: Reiji Watanabe; +Cc: Marc Zyngier, Will Deacon, kvmarm, Linux ARM

Hi,

On Wed, Feb 23, 2022 at 09:56:01PM -0800, Reiji Watanabe wrote:
> Hi Alex,
> 
> On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > KVM relies on doing dcache maintenance on stage 2 faults to present to a
> > guest running with the MMU off the same view of memory as userspace. For
> > locked memslots, KVM so far has done the dcache maintenance when a memslot
> > is locked, but that leaves KVM in a rather awkward position: what userspace
> > writes to guest memory after the memslot is locked, but before a VCPU is
> > run, might not be visible to the guest.
> >
> > Fix this by deferring the dcache maintenance until the first VCPU is run.
> >
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  7 ++++
> >  arch/arm64/include/asm/kvm_mmu.h  |  5 +++
> >  arch/arm64/kvm/arm.c              |  3 ++
> >  arch/arm64/kvm/mmu.c              | 55 ++++++++++++++++++++++++++++---
> >  4 files changed, 66 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 7fd70ad90c16..3b4839b447c4 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -113,6 +113,10 @@ struct kvm_arch_memory_slot {
> >         u32 flags;
> >  };
> >
> > +/* kvm->arch.mmu_pending_ops flags */
> > +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE        0
> > +#define KVM_MAX_MMU_PENDING_OPS                1
> > +
> >  struct kvm_arch {
> >         struct kvm_s2_mmu mmu;
> >
> > @@ -136,6 +140,9 @@ struct kvm_arch {
> >          */
> >         bool return_nisv_io_abort_to_user;
> >
> > +       /* Defer MMU operations until a VCPU is run. */
> > +       unsigned long mmu_pending_ops;
> > +
> >         /*
> >          * VM-wide PMU filter, implemented as a bitmap and big enough for
> >          * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index 2c50734f048d..cbf57c474fea 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -219,6 +219,11 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> >  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> >  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> >
> > +#define kvm_mmu_has_pending_ops(kvm)   \
> > +       (!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> > +
> > +void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> > +
> >  static inline unsigned int kvm_get_vmid_bits(void)
> >  {
> >         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index b9b8b43835e3..96ed48455cdd 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -870,6 +870,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >         if (unlikely(!kvm_vcpu_initialized(vcpu)))
> >                 return -ENOEXEC;
> >
> > +       if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> > +               kvm_mmu_perform_pending_ops(vcpu->kvm);
> > +
> >         ret = kvm_vcpu_first_run_init(vcpu);
> >         if (ret)
> >                 return ret;
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index b0a8e61315e4..8e4787019840 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1305,6 +1305,40 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >         return ret;
> >  }
> >
> > +/*
> > + * It's safe to do the CMOs when the first VCPU is run because:
> > + * - VCPUs cannot run until mmu_cmo_needed is cleared.
> 
> What does 'mmu_cmo_needed' mean ? Do you mean 'mmu_pending_ops' instead ?

Yes, I meant mmu_pending_ops here. mmu_cmo_needed was the field's name
while I was working on the patch and I forgot to update the comment.
Will fix it.

> 
> 
> > + * - Memslots cannot be modified because we hold the kvm->slots_lock.
> > + *
> > + * It's safe to periodically release the mmu_lock because:
> > + * - VCPUs cannot run.
> > + * - Any changes to the stage 2 tables triggered by the MMU notifiers also take
> > + *   the mmu_lock, which means accesses will be serialized.
> > + * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> > + *   is live, which means that the VM will be live.
> > + */
> > +void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> > +{
> > +       struct kvm_memory_slot *memslot;
> > +
> > +       mutex_lock(&kvm->slots_lock);
> > +       if (!kvm_mmu_has_pending_ops(kvm))
> > +               goto out_unlock;
> > +
> > +       if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> > +               kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > +                       if (!memslot_is_locked(memslot))
> > +                               continue;
> 
> Shouldn't the code hold the mmu_lock to call stage2_flush_memslot() ?

There will be no contention between different VCPUs because the stage 2
translation tables are protected against concurrent accesses by the
kvm->slots_lock mutex above. But stage2_flush_memslot() expects the
mmu_lock to be held, and that lock is periodically released by
cond_resched_lock() in stage2_apply_range(); if the lock is not held,
lockdep will complain about it.

Your observation actually explains why I was seeing intermittent warnings
when lockdep was enabled: __cond_resched_lock() was complaining that KVM
was trying to release a lock it wasn't holding. Thank you for pointing
out the missing lock acquire operation.

I'll change the code to avoid the lockdep warning.
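
Something like this, I think (untested):

        if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
                kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
                        if (!memslot_is_locked(memslot))
                                continue;
                        /* stage2_flush_memslot() expects mmu_lock to be held. */
                        spin_lock(&kvm->mmu_lock);
                        stage2_flush_memslot(kvm, memslot);
                        spin_unlock(&kvm->mmu_lock);
                }
                clear_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
        }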

> 
> > +                       stage2_flush_memslot(kvm, memslot);
> 
> Since stage2_flush_memslot() won't do anything when stage2_has_fwb()
> returns true, I wonder if it can be checked even before iterating
> memslots (so those iterations can be skipped when not needed).

I think this can be further improved by setting the
KVM_LOCKED_MEMSLOT_FLUSH_DCACHE bit only if FWB is not present.
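
In lock_memslot(), something like this (untested; I believe
ARM64_HAS_STAGE2_FWB is the right capability to check here, but I still
need to double-check):

        if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
                set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE,
                        &kvm->arch.mmu_pending_ops);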

Thanks,
Alex

> 
> Thanks,
> Reiji
> 
> > +               }
> > +               clear_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> > +       }
> > +
> > +out_unlock:
> > +       mutex_unlock(&kvm->slots_lock);
> > +       return;
> > +}
> > +
> >  static int try_rlimit_memlock(unsigned long npages)
> >  {
> >         unsigned long lock_limit;
> > @@ -1345,7 +1379,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >         struct kvm_memory_slot_page *page_entry;
> >         bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
> >         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > -       struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> > +       struct kvm_pgtable pgt;
> > +       struct kvm_pgtable_mm_ops mm_ops;
> >         struct vm_area_struct *vma;
> >         unsigned long npages = memslot->npages;
> >         unsigned int pin_flags = FOLL_LONGTERM;
> > @@ -1363,6 +1398,16 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >                 pin_flags |= FOLL_WRITE;
> >         }
> >
> > +       /*
> > +        * Make a copy of the stage 2 translation table struct to remove the
> > +        * dcache callback so we can postpone the cache maintenance operations
> > +        * until the first VCPU is run.
> > +        */
> > +       mm_ops = *kvm->arch.mmu.pgt->mm_ops;
> > +       mm_ops.dcache_clean_inval_poc = NULL;
> > +       pgt = *kvm->arch.mmu.pgt;
> > +       pgt.mm_ops = &mm_ops;
> > +
> >         hva = memslot->userspace_addr;
> >         ipa = memslot->base_gfn << PAGE_SHIFT;
> >
> > @@ -1414,13 +1459,13 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >                         goto out_err;
> >                 }
> >
> > -               ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> > +               ret = kvm_pgtable_stage2_map(&pgt, ipa, PAGE_SIZE,
> >                                              page_to_phys(page_entry->page),
> >                                              prot, &cache);
> >                 spin_unlock(&kvm->mmu_lock);
> >
> >                 if (ret) {
> > -                       kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> > +                       kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
> >                                                  i << PAGE_SHIFT);
> >                         unpin_memslot_pages(memslot, writable);
> >                         goto out_err;
> > @@ -1439,7 +1484,7 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >          */
> >         ret = account_locked_vm(current->mm, npages, true);
> >         if (ret) {
> > -               kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> > +               kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
> >                                          npages << PAGE_SHIFT);
> >                 unpin_memslot_pages(memslot, writable);
> >                 goto out_err;
> > @@ -1449,6 +1494,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >         if (writable)
> >                 memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
> >
> > +       set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> > +
> >         kvm_mmu_free_memory_cache(&cache);
> >
> >         return 0;
> > --
> > 2.33.1
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 06/38] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs
  2022-03-18  5:03     ` Reiji Watanabe
@ 2022-03-21 17:17       ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-03-21 17:17 UTC (permalink / raw)
  To: Reiji Watanabe; +Cc: maz, will, kvmarm, linux-arm-kernel

Hi,

On Thu, Mar 17, 2022 at 10:03:47PM -0700, Reiji Watanabe wrote:
> Hi Alex,
> 
> On 11/17/21 7:38 AM, Alexandru Elisei wrote:
> > When an MTE-enabled guest first accesses a physical page, that page must be
> > scrubbed for tags. This is normally done by KVM on a translation fault, but
> > with locked memslots we will not get translation faults. So far, this has
> > been handled by forbidding userspace to enable the MTE capability after
> > locking a memslot.
> > 
> > Remove this constraint by deferring tag cleaning until the first VCPU is
> > run, similar to how KVM handles cache maintenance operations.
> > 
> > When userspace resets a VCPU, KVM again performs cache maintenance
> > operations on locked memslots because userspace might have modified the
> > guest memory. Clean the tags the next time a VCPU is run for the same
> > reason.
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >   arch/arm64/include/asm/kvm_host.h |  7 ++-
> >   arch/arm64/include/asm/kvm_mmu.h  |  2 +-
> >   arch/arm64/kvm/arm.c              | 29 ++--------
> >   arch/arm64/kvm/mmu.c              | 95 ++++++++++++++++++++++++++-----
> >   4 files changed, 91 insertions(+), 42 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 5f49a27ce289..0ebdef158020 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -114,9 +114,10 @@ struct kvm_arch_memory_slot {
> >   };
> >   /* kvm->arch.mmu_pending_ops flags */
> > -#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
> > -#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE	1
> > -#define KVM_MAX_MMU_PENDING_OPS		2
> > +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE		0
> > +#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE		1
> > +#define KVM_LOCKED_MEMSLOT_SANITISE_TAGS	2
> > +#define KVM_MAX_MMU_PENDING_OPS			3
> >   struct kvm_arch {
> >   	struct kvm_s2_mmu mmu;
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index cbf57c474fea..2d2f902000b3 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -222,7 +222,7 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> >   #define kvm_mmu_has_pending_ops(kvm)	\
> >   	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> > -void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> > +int kvm_mmu_perform_pending_ops(struct kvm *kvm);
> >   static inline unsigned int kvm_get_vmid_bits(void)
> >   {
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 96ed48455cdd..13f3af1f2e78 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -106,25 +106,6 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> >   	}
> >   }
> > -static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
> > -{
> > -	struct kvm_memslots *slots = kvm_memslots(kvm);
> > -	struct kvm_memory_slot *memslot;
> > -	bool has_locked_memslots = false;
> > -	int idx;
> > -
> > -	idx = srcu_read_lock(&kvm->srcu);
> > -	kvm_for_each_memslot(memslot, slots) {
> > -		if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
> > -			has_locked_memslots = true;
> > -			break;
> > -		}
> > -	}
> > -	srcu_read_unlock(&kvm->srcu, idx);
> > -
> > -	return has_locked_memslots;
> > -}
> > -
> >   int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >   			    struct kvm_enable_cap *cap)
> >   {
> > @@ -139,8 +120,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >   		break;
> >   	case KVM_CAP_ARM_MTE:
> >   		mutex_lock(&kvm->lock);
> > -		if (!system_supports_mte() || kvm->created_vcpus ||
> > -		    (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) {
> > +		if (!system_supports_mte() || kvm->created_vcpus) {
> >   			r = -EINVAL;
> >   		} else {
> >   			r = 0;
> > @@ -870,8 +850,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >   	if (unlikely(!kvm_vcpu_initialized(vcpu)))
> >   		return -ENOEXEC;
> > -	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> > -		kvm_mmu_perform_pending_ops(vcpu->kvm);
> > +	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm))) {
> > +		ret = kvm_mmu_perform_pending_ops(vcpu->kvm);
> > +		if (ret)
> > +			return ret;
> > +	}
> >   	ret = kvm_vcpu_first_run_init(vcpu);
> >   	if (ret)
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 188064c5839c..2491e73e3d31 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -613,6 +613,15 @@ void stage2_unmap_vm(struct kvm *kvm)
> >   				&kvm->arch.mmu_pending_ops);
> >   			set_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE,
> >   				&kvm->arch.mmu_pending_ops);
> > +			/*
> > +			 * stage2_unmap_vm() is called after a VCPU has run, at
> > +			 * which point the state of the MTE cap (either enabled
> > +			 * or disabled) is final.
> > +			 */
> > +			if (kvm_has_mte(kvm)) {
> > +				set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS,
> > +					&kvm->arch.mmu_pending_ops);
> > +			}
> >   			continue;
> >   		}
> >   		stage2_unmap_memslot(kvm, memslot);
> > @@ -956,6 +965,55 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
> >   	return 0;
> >   }
> > +static int sanitise_mte_tags_memslot(struct kvm *kvm,
> > +				     struct kvm_memory_slot *memslot)
> > +{
> > +	unsigned long hva, slot_size, slot_end;
> > +	struct kvm_memory_slot_page *entry;
> > +	struct page *page;
> > +	int ret = 0;
> > +
> > +	hva = memslot->userspace_addr;
> > +	slot_size = memslot->npages << PAGE_SHIFT;
> > +	slot_end = hva + slot_size;
> > +
> > +	/* First check that the VMAs spanning the memslot are not shared... */
> > +	do {
> > +		struct vm_area_struct *vma;
> > +
> > +		vma = find_vma_intersection(current->mm, hva, slot_end);
> > +		/* The VMAs spanning the memslot must be contiguous. */
> > +		if (!vma) {
> > +			ret = -EFAULT;
> > +			goto out;
> > +		}
> > +		/*
> > +		 * VM_SHARED mappings are not allowed with MTE to avoid races
> > +		 * when updating the PG_mte_tagged page flag, see
> > +		 * sanitise_mte_tags for more details.
> > +		 */
> > +		if (vma->vm_flags & VM_SHARED) {
> > +			ret = -EFAULT;
> > +			goto out;
> > +		}
> > +		hva = min(slot_end, vma->vm_end);
> > +	} while (hva < slot_end);
> > +
> > +	/* ... then clear the tags. */
> > +	list_for_each_entry(entry, &memslot->arch.pages.list, list) {
> > +		page = entry->page;
> > +		if (!test_bit(PG_mte_tagged, &page->flags)) {
> > +			mte_clear_page_tags(page_address(page));
> > +			set_bit(PG_mte_tagged, &page->flags);
> > +		}
> > +	}
> > +
> > +out:
> > +	mmap_read_unlock(current->mm);
> 
> This appears unnecessary (taken care by the caller).

Indeed, this was a refactoring artefact.

> 
> 
> 
> > +
> > +	return ret;
> > +}
> > +
> >   static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >   			  struct kvm_memory_slot *memslot, unsigned long hva,
> >   			  unsigned long fault_status)
> > @@ -1325,14 +1383,29 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >    * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> >    *   is live, which means that the VM will be live.
> >    */
> > -void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> > +int kvm_mmu_perform_pending_ops(struct kvm *kvm)
> >   {
> >   	struct kvm_memory_slot *memslot;
> > +	int ret = 0;
> >   	mutex_lock(&kvm->slots_lock);
> >   	if (!kvm_mmu_has_pending_ops(kvm))
> >   		goto out_unlock;
> > +	if (kvm_has_mte(kvm) &&
> > +	    (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops))) {
> > +		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > +			if (!memslot_is_locked(memslot))
> > +				continue;
> > +			mmap_read_lock(current->mm);
> > +			ret = sanitise_mte_tags_memslot(kvm, memslot);
> > +			mmap_read_unlock(current->mm);
> > +			if (ret)
> > +				goto out_unlock;
> > +		}
> > +		clear_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
> > +	}
> > +
> >   	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> >   		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> >   			if (!memslot_is_locked(memslot))
> > @@ -1349,7 +1422,7 @@ void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> >   out_unlock:
> >   	mutex_unlock(&kvm->slots_lock);
> > -	return;
> > +	return ret;
> >   }
> >   static int try_rlimit_memlock(unsigned long npages)
> > @@ -1443,19 +1516,6 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >   			ret = -ENOMEM;
> >   			goto out_err;
> >   		}
> > -		if (kvm_has_mte(kvm)) {
> > -			if (vma->vm_flags & VM_SHARED) {
> > -				ret = -EFAULT;
> > -			} else {
> > -				ret = sanitise_mte_tags(kvm,
> > -					page_to_pfn(page_entry->page),
> > -					PAGE_SIZE);
> > -			}
> > -			if (ret) {
> > -				mmap_read_unlock(current->mm);
> > -				goto out_err;
> > -			}
> > -		}
> >   		mmap_read_unlock(current->mm);
> >   		ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
> > @@ -1508,6 +1568,11 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >   		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
> >   	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> > +	/*
> > +	 * MTE might be enabled after we lock the memslot, set it here
> > +	 * unconditionally.
> > +	 */
> > +	set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
> 
> 
> Since this won't be needed when the system doesn't support MTE,
> shouldn't the code check whether MTE is supported on the system?
> 
> What is the reason for setting this here rather than when MTE is
> enabled?
> When MTE is not used, once KVM_LOCKED_MEMSLOT_SANITISE_TAGS is set,
> it appears that the bit won't be cleared until all memslots are
> unlocked (correct?). I would think it shouldn't be set when it is
> unnecessary, or it should be cleared once it turns out to be
> unnecessary.

Indeed, if the user doesn't enable the MTE capability then the bit will
always be set.

The bit must always be set here because KVM has no way of looking into the
future and knowing if the user will enable the MTE capability, as there is
no ordering enforced between creating a memslot and creating a VCPU.

What I can do is clear the bit regardless of the value of kvm_has_mte() in
kvm_mmu_perform_pending_ops(), because at that point the user cannot enable
MTE anymore (at least one VCPU has been created).
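
Roughly, in kvm_mmu_perform_pending_ops() (untested):

        if (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops)) {
                if (kvm_has_mte(kvm)) {
                        kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
                                if (!memslot_is_locked(memslot))
                                        continue;
                                mmap_read_lock(current->mm);
                                ret = sanitise_mte_tags_memslot(kvm, memslot);
                                mmap_read_unlock(current->mm);
                                if (ret)
                                        goto out_unlock;
                        }
                }
                /* At this point the MTE cap can no longer change. */
                clear_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
        }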

Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 06/38] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs
@ 2022-03-21 17:17       ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-03-21 17:17 UTC (permalink / raw)
  To: Reiji Watanabe
  Cc: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Hi,

On Thu, Mar 17, 2022 at 10:03:47PM -0700, Reiji Watanabe wrote:
> Hi Alex,
> 
> On 11/17/21 7:38 AM, Alexandru Elisei wrote:
> > When an MTE-enabled guest first accesses a physical page, that page must be
> > scrubbed for tags. This is normally done by KVM on a translation fault, but
> > with locked memslots we will not get translation faults. So far, this has
> > been handled by forbidding userspace to enable the MTE capability after
> > locking a memslot.
> > 
> > Remove this constraint by deferring tag cleaning until the first VCPU is
> > run, similar to how KVM handles cache maintenance operations.
> > 
> > When userspace resets a VCPU, KVM again performs cache maintenance
> > operations on locked memslots because userspace might have modified the
> > guest memory. Clean the tags the next time a VCPU is run for the same
> > reason.
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >   arch/arm64/include/asm/kvm_host.h |  7 ++-
> >   arch/arm64/include/asm/kvm_mmu.h  |  2 +-
> >   arch/arm64/kvm/arm.c              | 29 ++--------
> >   arch/arm64/kvm/mmu.c              | 95 ++++++++++++++++++++++++++-----
> >   4 files changed, 91 insertions(+), 42 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 5f49a27ce289..0ebdef158020 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -114,9 +114,10 @@ struct kvm_arch_memory_slot {
> >   };
> >   /* kvm->arch.mmu_pending_ops flags */
> > -#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
> > -#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE	1
> > -#define KVM_MAX_MMU_PENDING_OPS		2
> > +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE		0
> > +#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE		1
> > +#define KVM_LOCKED_MEMSLOT_SANITISE_TAGS	2
> > +#define KVM_MAX_MMU_PENDING_OPS			3
> >   struct kvm_arch {
> >   	struct kvm_s2_mmu mmu;
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index cbf57c474fea..2d2f902000b3 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -222,7 +222,7 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> >   #define kvm_mmu_has_pending_ops(kvm)	\
> >   	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> > -void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> > +int kvm_mmu_perform_pending_ops(struct kvm *kvm);
> >   static inline unsigned int kvm_get_vmid_bits(void)
> >   {
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 96ed48455cdd..13f3af1f2e78 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -106,25 +106,6 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> >   	}
> >   }
> > -static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
> > -{
> > -	struct kvm_memslots *slots = kvm_memslots(kvm);
> > -	struct kvm_memory_slot *memslot;
> > -	bool has_locked_memslots = false;
> > -	int idx;
> > -
> > -	idx = srcu_read_lock(&kvm->srcu);
> > -	kvm_for_each_memslot(memslot, slots) {
> > -		if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
> > -			has_locked_memslots = true;
> > -			break;
> > -		}
> > -	}
> > -	srcu_read_unlock(&kvm->srcu, idx);
> > -
> > -	return has_locked_memslots;
> > -}
> > -
> >   int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >   			    struct kvm_enable_cap *cap)
> >   {
> > @@ -139,8 +120,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >   		break;
> >   	case KVM_CAP_ARM_MTE:
> >   		mutex_lock(&kvm->lock);
> > -		if (!system_supports_mte() || kvm->created_vcpus ||
> > -		    (kvm_arm_lock_memslot_supported() && kvm_arm_has_locked_memslots(kvm))) {
> > +		if (!system_supports_mte() || kvm->created_vcpus) {
> >   			r = -EINVAL;
> >   		} else {
> >   			r = 0;
> > @@ -870,8 +850,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >   	if (unlikely(!kvm_vcpu_initialized(vcpu)))
> >   		return -ENOEXEC;
> > -	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> > -		kvm_mmu_perform_pending_ops(vcpu->kvm);
> > +	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm))) {
> > +		ret = kvm_mmu_perform_pending_ops(vcpu->kvm);
> > +		if (ret)
> > +			return ret;
> > +	}
> >   	ret = kvm_vcpu_first_run_init(vcpu);
> >   	if (ret)
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 188064c5839c..2491e73e3d31 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -613,6 +613,15 @@ void stage2_unmap_vm(struct kvm *kvm)
> >   				&kvm->arch.mmu_pending_ops);
> >   			set_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE,
> >   				&kvm->arch.mmu_pending_ops);
> > +			/*
> > +			 * stage2_unmap_vm() is called after a VCPU has run, at
> > +			 * which point the state of the MTE cap (either enabled
> > +			 * or disabled) is final.
> > +			 */
> > +			if (kvm_has_mte(kvm)) {
> > +				set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS,
> > +					&kvm->arch.mmu_pending_ops);
> > +			}
> >   			continue;
> >   		}
> >   		stage2_unmap_memslot(kvm, memslot);
> > @@ -956,6 +965,55 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
> >   	return 0;
> >   }
> > +static int sanitise_mte_tags_memslot(struct kvm *kvm,
> > +				     struct kvm_memory_slot *memslot)
> > +{
> > +	unsigned long hva, slot_size, slot_end;
> > +	struct kvm_memory_slot_page *entry;
> > +	struct page *page;
> > +	int ret = 0;
> > +
> > +	hva = memslot->userspace_addr;
> > +	slot_size = memslot->npages << PAGE_SHIFT;
> > +	slot_end = hva + slot_size;
> > +
> > +	/* First check that the VMAs spanning the memslot are not shared... */
> > +	do {
> > +		struct vm_area_struct *vma;
> > +
> > +		vma = find_vma_intersection(current->mm, hva, slot_end);
> > +		/* The VMAs spanning the memslot must be contiguous. */
> > +		if (!vma) {
> > +			ret = -EFAULT;
> > +			goto out;
> > +		}
> > +		/*
> > +		 * VM_SHARED mappings are not allowed with MTE to avoid races
> > +		 * when updating the PG_mte_tagged page flag, see
> > +		 * sanitise_mte_tags for more details.
> > +		 */
> > +		if (vma->vm_flags & VM_SHARED) {
> > +			ret = -EFAULT;
> > +			goto out;
> > +		}
> > +		hva = min(slot_end, vma->vm_end);
> > +	} while (hva < slot_end);
> > +
> > +	/* ... then clear the tags. */
> > +	list_for_each_entry(entry, &memslot->arch.pages.list, list) {
> > +		page = entry->page;
> > +		if (!test_bit(PG_mte_tagged, &page->flags)) {
> > +			mte_clear_page_tags(page_address(page));
> > +			set_bit(PG_mte_tagged, &page->flags);
> > +		}
> > +	}
> > +
> > +out:
> > +	mmap_read_unlock(current->mm);
> 
> This appears unnecessary (taken care by the caller).

Indeed, this was a refactoring artefact.

> 
> 
> 
> > +
> > +	return ret;
> > +}
> > +
> >   static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >   			  struct kvm_memory_slot *memslot, unsigned long hva,
> >   			  unsigned long fault_status)
> > @@ -1325,14 +1383,29 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >    * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> >    *   is live, which means that the VM will be live.
> >    */
> > -void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> > +int kvm_mmu_perform_pending_ops(struct kvm *kvm)
> >   {
> >   	struct kvm_memory_slot *memslot;
> > +	int ret = 0;
> >   	mutex_lock(&kvm->slots_lock);
> >   	if (!kvm_mmu_has_pending_ops(kvm))
> >   		goto out_unlock;
> > +	if (kvm_has_mte(kvm) &&
> > +	    (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops))) {
> > +		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > +			if (!memslot_is_locked(memslot))
> > +				continue;
> > +			mmap_read_lock(current->mm);
> > +			ret = sanitise_mte_tags_memslot(kvm, memslot);
> > +			mmap_read_unlock(current->mm);
> > +			if (ret)
> > +				goto out_unlock;
> > +		}
> > +		clear_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
> > +	}
> > +
> >   	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> >   		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> >   			if (!memslot_is_locked(memslot))
> > @@ -1349,7 +1422,7 @@ void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> >   out_unlock:
> >   	mutex_unlock(&kvm->slots_lock);
> > -	return;
> > +	return ret;
> >   }
> >   static int try_rlimit_memlock(unsigned long npages)
> > @@ -1443,19 +1516,6 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >   			ret = -ENOMEM;
> >   			goto out_err;
> >   		}
> > -		if (kvm_has_mte(kvm)) {
> > -			if (vma->vm_flags & VM_SHARED) {
> > -				ret = -EFAULT;
> > -			} else {
> > -				ret = sanitise_mte_tags(kvm,
> > -					page_to_pfn(page_entry->page),
> > -					PAGE_SIZE);
> > -			}
> > -			if (ret) {
> > -				mmap_read_unlock(current->mm);
> > -				goto out_err;
> > -			}
> > -		}
> >   		mmap_read_unlock(current->mm);
> >   		ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
> > @@ -1508,6 +1568,11 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >   		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
> >   	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> > +	/*
> > +	 * MTE might be enabled after we lock the memslot, set it here
> > +	 * unconditionally.
> > +	 */
> > +	set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
> 
> 
> Since this won't be needed when the system doesn't support MTE,
> shouldn't the code check if MTE is supported on the system ?
> 
> What is the reason to set this here rather than when the mte
> is enabled ?
> When MTE is not used, once KVM_LOCKED_MEMSLOT_SANITISE_TAGS is set,
> it appears that KVM_LOCKED_MEMSLOT_SANITISE_TAGS won't be cleared
> until all memslots are unlocked (Correct ?). I would think it
> shouldn't be set when unnecessary or should be cleared once it turns
> out to be unnecessary.

Indeed, if the user doesn't enable the MTE capability then the bit will
always be set.

The bit must always be set here because KVM has no way of looking into the
future and knowing if the user will enable the MTE capability, as there is
no ordering enforced between creating a memslot and creating a VCPU.

What I can do is clear the bit regardless of the value of kvm_has_mte() in
kvm_mmu_perform_pending_ops(), because at that point the user cannot enable
MTE anymore (at least one VCPU has been created).
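
Roughly, the SANITISE_TAGS handling would become something like the below
(untested sketch against the hunk quoted above; only the relevant part of
kvm_mmu_perform_pending_ops() is shown, the dcache/icache handling stays
the same):

int kvm_mmu_perform_pending_ops(struct kvm *kvm)
{
	struct kvm_memory_slot *memslot;
	int ret = 0;

	mutex_lock(&kvm->slots_lock);
	if (!kvm_mmu_has_pending_ops(kvm))
		goto out_unlock;

	if (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops)) {
		/* Tags only need scrubbing if the MTE cap was enabled... */
		if (kvm_has_mte(kvm)) {
			kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
				if (!memslot_is_locked(memslot))
					continue;
				mmap_read_lock(current->mm);
				ret = sanitise_mte_tags_memslot(kvm, memslot);
				mmap_read_unlock(current->mm);
				if (ret)
					goto out_unlock;
			}
		}
		/*
		 * ...but the bit is cleared unconditionally: at least one
		 * VCPU has been created, so the MTE cap cannot be enabled
		 * from this point on.
		 */
		clear_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
	}

	/* Dcache/icache pending ops handled as before. */
	[...]

out_unlock:
	mutex_unlock(&kvm->slots_lock);
	return ret;
}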

Thanks,
Alex

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 08/38] KVM: arm64: Unlock memslots after stage 2 tables are freed
  2022-03-18  5:19     ` Reiji Watanabe
@ 2022-03-21 17:29       ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-03-21 17:29 UTC (permalink / raw)
  To: Reiji Watanabe; +Cc: maz, will, kvmarm, linux-arm-kernel

Hi,

On Thu, Mar 17, 2022 at 10:19:56PM -0700, Reiji Watanabe wrote:
> Hi Alex,
> 
> On 11/17/21 7:38 AM, Alexandru Elisei wrote:
> > Unpin the backing pages mapped at stage 2 after the stage 2 translation
> > tables are destroyed.
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >   arch/arm64/kvm/mmu.c | 23 ++++++++++++++++++-----
> >   1 file changed, 18 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index cd6f1bc7842d..072e2aba371f 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1627,11 +1627,19 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >   	return ret;
> >   }
> > -static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> > +static void __unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> >   {
> >   	bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
> >   	unsigned long npages = memslot->npages;
> > +	unpin_memslot_pages(memslot, writable);
> > +	account_locked_vm(current->mm, npages, false);
> > +
> > +	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
> > +}
> > +
> > +static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> > +{
> >   	/*
> >   	 * MMU maintenace operations aren't performed on an unlocked memslot.
> >   	 * Unmap it from stage 2 so the abort handler performs the necessary
> > @@ -1640,10 +1648,7 @@ static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> >   	if (kvm_mmu_has_pending_ops(kvm))
> >   		kvm_arch_flush_shadow_memslot(kvm, memslot);
> > -	unpin_memslot_pages(memslot, writable);
> > -	account_locked_vm(current->mm, npages, false);
> > -
> > -	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
> > +	__unlock_memslot(kvm, memslot);
> >   }
> >   int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > @@ -1951,7 +1956,15 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
> >   void kvm_arch_flush_shadow_all(struct kvm *kvm)
> >   {
> > +	struct kvm_memory_slot *memslot;
> > +
> >   	kvm_free_stage2_pgd(&kvm->arch.mmu);
> > +
> > +	kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > +		if (!memslot_is_locked(memslot))
> > +			continue;
> > +		__unlock_memslot(kvm, memslot);
> > +	}
> >   }
> 
> Perhaps it might be useful to manage the number of locked memslots ?
> (can be used in the fix for kvm_mmu_unlock_memslot in the patch-7 as well)

I don't think keeping a count is very useful: we usually want to find all
the locked memslots anyway, and there's no guarantee they sit at the start
of the list, so at best a count saves iterating over the last few memslots.

In the case above, this is done when the VM is being destroyed, which is
not particularly performance sensitive. And certainly a few linked list
accesses won't make much of a difference.

In patch #7, KVM iterates through the memslots and calls
kvm_arch_flush_shadow_memslot(), which is several orders of magnitude
slower than iterating through a few extra memslots. Also, I don't think
userspace locking then unlocking a memslot before running any VCPUs is
something that will happen very often.
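
For reference, the bookkeeping being suggested would look roughly like the
below (hypothetical nr_locked_memslots field in struct kvm_arch, not part
of this series, incremented in lock_memslot() and decremented in
__unlock_memslot() under kvm->slots_lock); the only gain is that a walk
over the memslots can stop early:

void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
	struct kvm_memory_slot *memslot;
	/* Snapshot the count; __unlock_memslot() decrements the field. */
	unsigned long remaining = kvm->arch.nr_locked_memslots;

	kvm_free_stage2_pgd(&kvm->arch.mmu);

	kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
		if (!remaining)
			break;
		if (!memslot_is_locked(memslot))
			continue;
		__unlock_memslot(kvm, memslot);
		remaining--;
	}
}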

Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 08/38] KVM: arm64: Unlock memslots after stage 2 tables are freed
@ 2022-03-21 17:29       ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-03-21 17:29 UTC (permalink / raw)
  To: Reiji Watanabe
  Cc: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Hi,

On Thu, Mar 17, 2022 at 10:19:56PM -0700, Reiji Watanabe wrote:
> Hi Alex,
> 
> On 11/17/21 7:38 AM, Alexandru Elisei wrote:
> > Unpin the backing pages mapped at stage 2 after the stage 2 translation
> > tables are destroyed.
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >   arch/arm64/kvm/mmu.c | 23 ++++++++++++++++++-----
> >   1 file changed, 18 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index cd6f1bc7842d..072e2aba371f 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1627,11 +1627,19 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> >   	return ret;
> >   }
> > -static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> > +static void __unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> >   {
> >   	bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
> >   	unsigned long npages = memslot->npages;
> > +	unpin_memslot_pages(memslot, writable);
> > +	account_locked_vm(current->mm, npages, false);
> > +
> > +	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
> > +}
> > +
> > +static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> > +{
> >   	/*
> >   	 * MMU maintenace operations aren't performed on an unlocked memslot.
> >   	 * Unmap it from stage 2 so the abort handler performs the necessary
> > @@ -1640,10 +1648,7 @@ static void unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
> >   	if (kvm_mmu_has_pending_ops(kvm))
> >   		kvm_arch_flush_shadow_memslot(kvm, memslot);
> > -	unpin_memslot_pages(memslot, writable);
> > -	account_locked_vm(current->mm, npages, false);
> > -
> > -	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
> > +	__unlock_memslot(kvm, memslot);
> >   }
> >   int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> > @@ -1951,7 +1956,15 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
> >   void kvm_arch_flush_shadow_all(struct kvm *kvm)
> >   {
> > +	struct kvm_memory_slot *memslot;
> > +
> >   	kvm_free_stage2_pgd(&kvm->arch.mmu);
> > +
> > +	kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > +		if (!memslot_is_locked(memslot))
> > +			continue;
> > +		__unlock_memslot(kvm, memslot);
> > +	}
> >   }
> 
> Perhaps it might be useful to manage the number of locked memslots ?
> (can be used in the fix for kvm_mmu_unlock_memslot in the patch-7 as well)

I don't think keeping a count is very useful: we usually want to find all
the locked memslots anyway, and there's no guarantee they sit at the start
of the list, so at best a count saves iterating over the last few memslots.

In the case above, this is done when the VM is being destroyed, which is
not particularly performance sensitive. And certainly a few linked list
accesses won't make much of a difference.

In patch #7, KVM iterates through the memslots and calls
kvm_arch_flush_shadow_memslot(), which is several orders of magnitude
slower than iterating through a few extra memslots. Also, I don't think
userspace locking then unlocking a memslot before running any VCPUs is
something that will happen very often.

Thanks,
Alex

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 15/38] perf: arm_spe_pmu: Move struct arm_spe_pmu to a separate header file
  2021-11-17 15:38   ` Alexandru Elisei
@ 2022-07-05 16:57     ` Calvin Owens
  -1 siblings, 0 replies; 118+ messages in thread
From: Calvin Owens @ 2022-07-05 16:57 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Hi Alexandru,

I've been taking a look at this series; it needs a little tweak to
build successfully as a module, which I've appended below.

Cheers,
Calvin

On Wed, Nov 17, 2021 at 03:38:19PM +0000, Alexandru Elisei wrote:
> KVM will soon want to make use of struct arm_spe_pmu, move it to a separate
> header where it will be easily accessible. This is a straightforward move
> and functionality should not be impacted.
>
> CC: Will Deacon <will@kernel.org>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---

<snip>

> +++ b/include/linux/perf/arm_spe_pmu.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Split from from drivers/perf/arm_spe_pmu.c.
> + *
> + *  Copyright (C) 2021 ARM Limited
> + */
> +
> +#ifndef __ARM_SPE_PMU_H__
> +#define __ARM_SPE_PMU_H__
> +
> +#include <linux/cpumask.h>
> +#include <linux/perf_event.h>
> +#include <linux/platform_device.h>
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_ARM_SPE_PMU

Here, we need to use the IS_ENABLED() macro for the ARM_SPE_PMU=m case.

Signed-off-by: Calvin Owens <calvinow@qti.qualcomm.com>
---
 include/linux/perf/arm_spe_pmu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf/arm_spe_pmu.h b/include/linux/perf/arm_spe_pmu.h
index 505a8867daad..b643e5e7a766 100644
--- a/include/linux/perf/arm_spe_pmu.h
+++ b/include/linux/perf/arm_spe_pmu.h
@@ -13,7 +13,7 @@
 #include <linux/platform_device.h>
 #include <linux/types.h>
 
-#ifdef CONFIG_ARM_SPE_PMU
+#if IS_ENABLED(CONFIG_ARM_SPE_PMU)
 
 struct arm_spe_pmu {
 	struct pmu				pmu;
@@ -50,6 +50,6 @@ void kvm_host_spe_init(struct arm_spe_pmu *spe_pmu);
 #define kvm_host_spe_init(x)	do { } while(0)
 #endif
 
-#endif /* CONFIG_ARM_SPE_PMU */
+#endif /* IS_ENABLED(CONFIG_ARM_SPE_PMU) */
 
 #endif /* __ARM_SPE_PMU_H__ */
-- 
2.30.2

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 15/38] perf: arm_spe_pmu: Move struct arm_spe_pmu to a separate header file
@ 2022-07-05 16:57     ` Calvin Owens
  0 siblings, 0 replies; 118+ messages in thread
From: Calvin Owens @ 2022-07-05 16:57 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: maz, will, kvmarm, linux-arm-kernel

Hi Alexandru,

I've been taking a look at this series; it needs a little tweak to
build successfully as a module, which I've appended below.

Cheers,
Calvin

On Wed, Nov 17, 2021 at 03:38:19PM +0000, Alexandru Elisei wrote:
> KVM will soon want to make use of struct arm_spe_pmu, move it to a separate
> header where it will be easily accessible. This is a straightforward move
> and functionality should not be impacted.
>
> CC: Will Deacon <will@kernel.org>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---

<snip>

> +++ b/include/linux/perf/arm_spe_pmu.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Split from from drivers/perf/arm_spe_pmu.c.
> + *
> + *  Copyright (C) 2021 ARM Limited
> + */
> +
> +#ifndef __ARM_SPE_PMU_H__
> +#define __ARM_SPE_PMU_H__
> +
> +#include <linux/cpumask.h>
> +#include <linux/perf_event.h>
> +#include <linux/platform_device.h>
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_ARM_SPE_PMU

Here, we need to use the IS_ENABLED() macro for the ARM_SPE_PMU=m case.

Signed-off-by: Calvin Owens <calvinow@qti.qualcomm.com>
---
 include/linux/perf/arm_spe_pmu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf/arm_spe_pmu.h b/include/linux/perf/arm_spe_pmu.h
index 505a8867daad..b643e5e7a766 100644
--- a/include/linux/perf/arm_spe_pmu.h
+++ b/include/linux/perf/arm_spe_pmu.h
@@ -13,7 +13,7 @@
 #include <linux/platform_device.h>
 #include <linux/types.h>
 
-#ifdef CONFIG_ARM_SPE_PMU
+#if IS_ENABLED(CONFIG_ARM_SPE_PMU)
 
 struct arm_spe_pmu {
 	struct pmu				pmu;
@@ -50,6 +50,6 @@ void kvm_host_spe_init(struct arm_spe_pmu *spe_pmu);
 #define kvm_host_spe_init(x)	do { } while(0)
 #endif
 
-#endif /* CONFIG_ARM_SPE_PMU */
+#endif /* IS_ENABLED(CONFIG_ARM_SPE_PMU) */
 
 #endif /* __ARM_SPE_PMU_H__ */
-- 
2.30.2
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 15/38] perf: arm_spe_pmu: Move struct arm_spe_pmu to a separate header file
  2022-07-05 16:57     ` Calvin Owens
@ 2022-07-06 10:51       ` Alexandru Elisei
  -1 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-07-06 10:51 UTC (permalink / raw)
  To: Calvin Owens; +Cc: maz, will, kvmarm, linux-arm-kernel

Hi Calvin,

Thank you for your interest! FYI, I'm working on the next iteration of the
series, where I'm planning to remove the dependency on CONFIG_NUMA_BALANCING
being unset.

On Tue, Jul 05, 2022 at 09:57:22AM -0700, Calvin Owens wrote:
> Hi Alexandru,
> 
> I've been taking a look at this series; it needs a little tweak to
> build successfully as a module, which I've appended below.
> 
> Cheers,
> Calvin
> 
> On Wed, Nov 17, 2021 at 03:38:19PM +0000, Alexandru Elisei wrote:
> > KVM will soon want to make use of struct arm_spe_pmu, move it to a separate
> > header where it will be easily accessible. This is a straightforward move
> > and functionality should not be impacted.
> >
> > CC: Will Deacon <will@kernel.org>
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> 
> <snip>
> 
> > +++ b/include/linux/perf/arm_spe_pmu.h
> > @@ -0,0 +1,49 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Split from from drivers/perf/arm_spe_pmu.c.
> > + *
> > + *  Copyright (C) 2021 ARM Limited
> > + */
> > +
> > +#ifndef __ARM_SPE_PMU_H__
> > +#define __ARM_SPE_PMU_H__
> > +
> > +#include <linux/cpumask.h>
> > +#include <linux/perf_event.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/types.h>
> > +
> > +#ifdef CONFIG_ARM_SPE_PMU
> 
> Here, we need to use the IS_ENABLED() macro for the ARM_SPE_PMU=m case.
> 
> Signed-off-by: Calvin Owens <calvinow@qti.qualcomm.com>
> ---
>  include/linux/perf/arm_spe_pmu.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/perf/arm_spe_pmu.h b/include/linux/perf/arm_spe_pmu.h
> index 505a8867daad..b643e5e7a766 100644
> --- a/include/linux/perf/arm_spe_pmu.h
> +++ b/include/linux/perf/arm_spe_pmu.h
> @@ -13,7 +13,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/types.h>
>  
> -#ifdef CONFIG_ARM_SPE_PMU
> +#if IS_ENABLED(CONFIG_ARM_SPE_PMU)
>  
>  struct arm_spe_pmu {
>  	struct pmu				pmu;
> @@ -50,6 +50,6 @@ void kvm_host_spe_init(struct arm_spe_pmu *spe_pmu);
>  #define kvm_host_spe_init(x)	do { } while(0)
>  #endif
>  
> -#endif /* CONFIG_ARM_SPE_PMU */
> +#endif /* IS_ENABLED(CONFIG_ARM_SPE_PMU) */
>  
>  #endif /* __ARM_SPE_PMU_H__ */
> -- 
> 2.30.2

This indeed fixes the nasty screenfuls of errors that I get when trying to
compile with CONFIG_ARM_SPE_PMU=m.
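
As a short aside for anyone running into the same failure, a minimal
illustration (not taken from the series) of what the macro changes: for a
tristate symbol built as a module, autoconf.h defines
CONFIG_ARM_SPE_PMU_MODULE rather than CONFIG_ARM_SPE_PMU, so a plain #ifdef
compiles the header's contents out of the =m build, while IS_ENABLED()
covers both cases.

#include <linux/kconfig.h>

#ifdef CONFIG_ARM_SPE_PMU		/* defined only for ARM_SPE_PMU=y */
/* ...visible to built-in code only... */
#endif

#if IS_ENABLED(CONFIG_ARM_SPE_PMU)	/* true for both =y and =m */
/* ...visible to built-in code and to the module... */
#endif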

If that's alright with you, I'll fold the fix into the patch and I'll CC
you.

Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [RFC PATCH v5 15/38] perf: arm_spe_pmu: Move struct arm_spe_pmu to a separate header file
@ 2022-07-06 10:51       ` Alexandru Elisei
  0 siblings, 0 replies; 118+ messages in thread
From: Alexandru Elisei @ 2022-07-06 10:51 UTC (permalink / raw)
  To: Calvin Owens
  Cc: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	mark.rutland

Hi Calvin,

Thank you for your interest! FYI, I'm working on the next iteration of the
series, where I'm planning to remove the dependency on CONFIG_NUMA_BALANCING
being unset.

On Tue, Jul 05, 2022 at 09:57:22AM -0700, Calvin Owens wrote:
> Hi Alexandru,
> 
> I've been taking a look at this series; it needs a little tweak to
> build successfully as a module, which I've appended below.
> 
> Cheers,
> Calvin
> 
> On Wed, Nov 17, 2021 at 03:38:19PM +0000, Alexandru Elisei wrote:
> > KVM will soon want to make use of struct arm_spe_pmu, move it to a separate
> > header where it will be easily accessible. This is a straightforward move
> > and functionality should not be impacted.
> >
> > CC: Will Deacon <will@kernel.org>
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> 
> <snip>
> 
> > +++ b/include/linux/perf/arm_spe_pmu.h
> > @@ -0,0 +1,49 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Split from from drivers/perf/arm_spe_pmu.c.
> > + *
> > + *  Copyright (C) 2021 ARM Limited
> > + */
> > +
> > +#ifndef __ARM_SPE_PMU_H__
> > +#define __ARM_SPE_PMU_H__
> > +
> > +#include <linux/cpumask.h>
> > +#include <linux/perf_event.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/types.h>
> > +
> > +#ifdef CONFIG_ARM_SPE_PMU
> 
> Here, we need to use the IS_ENABLED() macro for the ARM_SPE_PMU=m case.
> 
> Signed-off-by: Calvin Owens <calvinow@qti.qualcomm.com>
> ---
>  include/linux/perf/arm_spe_pmu.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/perf/arm_spe_pmu.h b/include/linux/perf/arm_spe_pmu.h
> index 505a8867daad..b643e5e7a766 100644
> --- a/include/linux/perf/arm_spe_pmu.h
> +++ b/include/linux/perf/arm_spe_pmu.h
> @@ -13,7 +13,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/types.h>
>  
> -#ifdef CONFIG_ARM_SPE_PMU
> +#if IS_ENABLED(CONFIG_ARM_SPE_PMU)
>  
>  struct arm_spe_pmu {
>  	struct pmu				pmu;
> @@ -50,6 +50,6 @@ void kvm_host_spe_init(struct arm_spe_pmu *spe_pmu);
>  #define kvm_host_spe_init(x)	do { } while(0)
>  #endif
>  
> -#endif /* CONFIG_ARM_SPE_PMU */
> +#endif /* IS_ENABLED(CONFIG_ARM_SPE_PMU) */
>  
>  #endif /* __ARM_SPE_PMU_H__ */
> -- 
> 2.30.2

This indeed fixes the nasty screenfuls of errors that I get when trying to
compile with CONFIG_ARM_SPE_PMU=m.

If that's alright with you, I'll fold the fix into the patch and I'll CC
you.

Thanks,
Alex

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 118+ messages in thread

end of thread, other threads:[~2022-07-06 10:52 UTC | newest]

Thread overview: 118+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17 15:38 [RFC PATCH v5 00/38] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
2021-11-17 15:38 ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 01/38] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2022-02-15  5:34   ` Reiji Watanabe
2022-02-15  5:34     ` Reiji Watanabe
2022-02-15 10:34     ` Alexandru Elisei
2022-02-15 10:34       ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 02/38] KVM: arm64: Add lock/unlock memslot user API Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2022-02-15  5:59   ` Reiji Watanabe
2022-02-15  5:59     ` Reiji Watanabe
2022-02-15 11:03     ` Alexandru Elisei
2022-02-15 11:03       ` Alexandru Elisei
2022-02-15 12:02       ` Marc Zyngier
2022-02-15 12:02         ` Marc Zyngier
2022-02-15 12:13         ` Alexandru Elisei
2022-02-15 12:13           ` Alexandru Elisei
2022-02-17  7:35       ` Reiji Watanabe
2022-02-17  7:35         ` Reiji Watanabe
2022-02-17 10:31         ` Alexandru Elisei
2022-02-17 10:31           ` Alexandru Elisei
2022-02-18  4:41           ` Reiji Watanabe
2022-02-18  4:41             ` Reiji Watanabe
2021-11-17 15:38 ` [RFC PATCH v5 03/38] KVM: arm64: Implement the memslot lock/unlock functionality Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2022-02-15  7:46   ` Reiji Watanabe
2022-02-15  7:46     ` Reiji Watanabe
2022-02-15 11:26     ` Alexandru Elisei
2022-02-15 11:26       ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 04/38] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2022-02-24  5:56   ` Reiji Watanabe
2022-02-24  5:56     ` Reiji Watanabe
2022-03-21 17:10     ` Alexandru Elisei
2022-03-21 17:10       ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 05/38] KVM: arm64: Perform CMOs on locked memslots when userspace resets VCPUs Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 06/38] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2022-03-18  5:03   ` Reiji Watanabe
2022-03-18  5:03     ` Reiji Watanabe
2022-03-21 17:17     ` Alexandru Elisei
2022-03-21 17:17       ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 07/38] KVM: arm64: Unmap unlocked memslot from stage 2 if kvm_mmu_has_pending_ops() Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 08/38] KVM: arm64: Unlock memslots after stage 2 tables are freed Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2022-03-18  5:19   ` Reiji Watanabe
2022-03-18  5:19     ` Reiji Watanabe
2022-03-21 17:29     ` Alexandru Elisei
2022-03-21 17:29       ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 09/38] KVM: arm64: Deny changes to locked memslots Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 10/38] KVM: Add kvm_warn{,_ratelimited} macros Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 11/38] KVM: arm64: Print a warning for unexpected faults on locked memslots Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 12/38] KVM: arm64: Allow userspace to lock and unlock memslots Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 13/38] KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 14/38] KVM: arm64: Add SPE capability and VCPU feature Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 15/38] perf: arm_spe_pmu: Move struct arm_spe_pmu to a separate header file Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2022-07-05 16:57   ` Calvin Owens
2022-07-05 16:57     ` Calvin Owens
2022-07-06 10:51     ` Alexandru Elisei
2022-07-06 10:51       ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 16/38] KVM: arm64: Allow SPE emulation when the SPE hardware is present Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 17/38] KVM: arm64: Allow userspace to set the SPE feature only if SPE " Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 18/38] KVM: arm64: Expose SPE version to guests Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 19/38] KVM: arm64: Do not run a VCPU on a CPU without SPE Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2022-01-10 11:40   ` Alexandru Elisei
2022-01-10 11:40     ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 20/38] KVM: arm64: Add a new VCPU device control group for SPE Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 21/38] KVM: arm64: Add SPE VCPU device attribute to set the interrupt number Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 22/38] KVM: arm64: Add SPE VCPU device attribute to initialize SPE Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 23/38] KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 24/38] KVM: arm64: Move accesses to MDCR_EL2 out of __{activate, deactivate}_traps_common Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 25/38] KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 26/38] KVM: arm64: Add SPE system registers to VCPU context Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 27/38] KVM: arm64: nVHE: Save PMSCR_EL1 to the host context Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 28/38] KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 29/38] KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 30/38] KVM: arm64: VHE: " Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 31/38] KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 32/38] KVM: arm64: Allow guest to use physical timestamps if perfmon_capable() Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 33/38] KVM: arm64: Emulate SPE buffer management interrupt Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 34/38] KVM: arm64: Add an userspace API to stop a VCPU profiling Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 35/38] KVM: arm64: Implement " Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 36/38] KVM: arm64: Add PMSIDR_EL1 to the SPE register context Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 37/38] KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
2021-11-17 15:38 ` [RFC PATCH v5 38/38] KVM: arm64: Allow userspace to enable SPE for guests Alexandru Elisei
2021-11-17 15:38   ` Alexandru Elisei
